Blog
Behind Blue Eyes - Part 3 : Comparing the Distributions
Now that we have our data, we can finally compare the eye color distribution of actors to the baseline established in Part 1. As a reminder, the baseline from 29 US states is: Blue/Grey (27.3%), Brown/Hazel (62.8%) and Green/Other (9.9%). Full Cast: We start by aggregating appearance counts by year and eye color. For each group, we compute an appearance rate. Before running any test, it is worth looking at the data visually.
May 3, 2026
Behind Blue Eyes - Part 2 : Collecting Data
To answer the question raised in Part 1, we need data on the eye color of lead actors. To my knowledge, no database exists with this information. Wikidata sometimes contains it, but the coverage is very patchy. The simplest and most reliable approach is probably to build an actor dataset, collect their photos, and annotate them. Luckily, that’s exactly what we’re going to do today. The analysis will then follow in a third post.
May 1, 2026
Behind Blue Eyes - Part 1 : Establishing a Baseline
Like many people, I watch movies and TV shows. A significant portion of the content available in France comes from the US. For years, I’ve often caught myself thinking: “huh, the lead actor has blue eyes again.” Apparently, I’m not the only one asking that question. People on Reddit are asking it too. Since this question has been nagging at me, I’d like to try to answer it as rigorously as possible. There’s some work involved, but it should be manageable. On paper, it’s fairly straightforward. The broad approach:
April 29, 2026
Fixing Kubernetes Load Balancing with HTTP/3
This work is actually a few months old. I recently dug it back up and thought it would be worth sharing. Versions used at the time: FastAPI 0.115.2, Hypercorn 0.17.3, Pydantic 2.9.2, Locust 2.32.0, Niquests 3.9.1 and Trio 0.27.0. Kubernetes is a great technology. I use it at work, but also personally, through a lightweight distribution called k3s. It really simplifies application deployment and scaling.
April 10, 2026
Following Up on Pydantic & Polymorphism
In my previous blog post, I discussed polymorphism with Pydantic and the issues it caused. I suggested an elegant solution (at least I think so), but one that was perhaps not the most plug-and-play option. That said, we can try to simplify things and rely solely on Pydantic’s features to achieve the same goal, but this will require a few compromises. In particular, we will need to leverage Pydantic’s core schema generation API and use a specific annotation. It’s not ideal, but there is not much choice, unless we switch back to the previous solution.
February 1, 2026
Pydantic & Polymorphism
For several years now, I have been using the Pydantic library quite a lot at work as well as in my personal projects. It is handy for validation, easy to pick up, a real game changer for managing an application’s settings via Pydantic Settings, and for simple use cases, I have nothing to complain about. On the other hand, as soon as you start doing things that are a bit more complex, it gets trickier.
January 24, 2026
Hardening Home Infrastructure
For the past two years, I’ve been proudly running a “cluster” based on k3s (a lightweight Kubernetes distribution). The setup was minimalistic, yet almost elegant: one control plane with 8 cores, 64 GB of RAM and a casual 1 TB NVMe SSD; and one worker node with 8 cores (and the performance of a sleepy cow 🐄), 16 GB of RAM and a 1 TB HDD for that authentic “retro datacenter” feel. Up until now, it worked. Which is to say: nothing was on fire most of the time, but “working” and “being a good idea” are, as it turns out, two very different concepts.
January 18, 2026
Exploring CUDA, Threading and Async Python - Part 3
Previously, we discussed the impact of the GIL on CPU utilization, particularly relevant for pre-processing. We also looked at how batch size affects GPU utilization (and consequently FPS) in an ideal scenario. However, that was far from a real-world case. In practice, there’s usually a pre-processing phase handled by the CPU, with some parts potentially offloaded to the GPU (like normalization, if it makes sense). Regardless, the CPU needs to send data to the GPU while also providing it with instructions to execute.
November 5, 2024
Exploring CUDA, Threading and Async Python - Part 2
In my previous blog post, we discussed pre-processing, particularly through multithreading. Now, let’s try to understand what it means to put the GPU under pressure. First, we’ll focus on “native” PyTorch. We might dive into model-level optimizations later (and what that means for execution). CUDA: CUDA is a parallel computing platform and programming model developed by NVIDIA, for NVIDIA GPUs (breaking news). Basically, it allows developers to harness the massive computing power of NVIDIA GPUs for tasks far beyond just graphics rendering. While CPUs (processors) are optimized for handling sequential tasks, GPUs are built for parallelism and can process thousands of operations simultaneously.
September 29, 2024
Exploring CUDA, Threading and Async Python - Part 1
I’ve been working with Python for quite some time now. When it comes to AI application development, I particularly enjoy using PyTorch. This has given me the opportunity to tinker a bit with CUDA. As models continue to grow in size, optimizing compute resources becomes critical, both for training and inference. Given the cost of GPUs (👋 NVIDIA), it’s essential to make the most out of the hardware. Put simply: the GPU needs to be running at full throttle. All the time. In my experience, that’s easier said than done. The Python x CUDA combo can often be a real headache when it comes to maximizing performance. Not that it’s impossible, but it’s definitely not straightforward.
September 14, 2024