Blog

Exploring CUDA, Threading and Async Python - Part 3

Previously, we discussed the impact of the GIL on CPU utilization, particularly relevant for pre-processing. We also looked at how batch size affects GPU utilization (and consequently FPS) in an ideal scenario. However, that was far from a real-world case. In practice, there’s usually a pre-processing phase handled by the CPU, with some parts potentially offloaded to the GPU (like normalization, if it makes sense). Regardless, the CPU needs to send data to the GPU while also providing it with instructions to execute.
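The GIL effect mentioned above can be sketched with the stdlib alone: two threads running CPU-bound pure-Python work take roughly as long as running it sequentially, since only one thread can hold the GIL at a time (a minimal sketch, not a rigorous benchmark):

```python
import threading
import time

def busy(n: int) -> None:
    # CPU-bound pure-Python work: the running thread holds the GIL
    while n:
        n -= 1

N = 2_000_000

t0 = time.perf_counter()
busy(N)
busy(N)
sequential = time.perf_counter() - t0

t0 = time.perf_counter()
threads = [threading.Thread(target=busy, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - t0

# With the GIL, the threaded run is not ~2x faster for CPU-bound
# work; the two timings typically land in the same ballpark.
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

This is why multithreading mostly pays off for I/O-bound pre-processing steps, while CPU-bound ones need multiprocessing or native code that releases the GIL.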

Read more →

November 5, 2024

Exploring CUDA, Threading and Async Python - Part 2

In my previous blog post, we discussed pre-processing, particularly through multithreading. Now, let’s try to understand what it means to put the GPU under pressure. First, we’ll focus on “native” PyTorch. We might dive into model-level optimizations later (and what that means for execution). CUDA is a parallel computing platform and programming model developed by NVIDIA, for NVIDIA GPUs (breaking news). Basically, it allows developers to harness the massive computing power of NVIDIA GPUs for tasks far beyond graphics rendering. While CPUs are optimized for handling sequential tasks, GPUs are built for parallelism and can process thousands of operations simultaneously.

Read more →

September 29, 2024

Exploring CUDA, Threading and Async Python - Part 1

I’ve been working with Python for quite some time now. When it comes to AI application development, I particularly enjoy using PyTorch. This has given me the opportunity to tinker a bit with CUDA. As models continue to grow in size, optimizing compute resources becomes critical, both for training and inference. Given the cost of GPUs (👋 NVIDIA), it’s essential to make the most out of the hardware. Put simply: the GPU needs to be running at full throttle. All the time. In my experience, that’s easier said than done. The Python x CUDA combo can often be a real headache when it comes to maximizing performance. Not that it’s impossible, but it’s definitely not straightforward.

Read more →

September 14, 2024

Improved Observability

Lately, I’ve been using Langfuse for observability. The tool is great for observing your LLM applications, tracking costs, etc. It even has an evaluation feature, but I wouldn’t say it’s as polished as the other features. Let’s just say that I don’t think evaluation is the main goal of Langfuse. Anyway. While playing around, I was a bit frustrated by the Python SDK. It provides a high-level abstraction to track the components of your app (through a decorator), but I don’t find this feature to be consistent. Some elements really annoyed me. One could argue that I should then stick to the low-level API, but that would increase verbosity and code duplication, so I don’t think it’s a good idea. Good libraries provide good abstractions. LangChain is a perfect counterexample. Perhaps I’ll write something about it someday…

Read more →

September 8, 2024

MyPy & PEP 702 (@deprecated)

I tend to use Python a lot, both in my job and personal projects. It’s a good language, but it still lacks some useful features. However, it seems that Python is evolving in a good direction. The recent PEP 695 signals a promising step forward in my opinion (even though it probably won’t be supported by MyPy any time soon). Alongside this progression, modern code checkers like linters (such as Ruff), formatters (like Black), and type checkers (including MyPy) have become indispensable allies for improving your codebase (and for saving a tremendous amount of time by spotting mistakes before running any code).
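Since the title points at PEP 702, here’s a minimal sketch of `@deprecated`: on Python 3.13+ it ships as `warnings.deprecated`, so type checkers like MyPy can flag call sites statically while a `DeprecationWarning` is also emitted at runtime. The fallback below is a hypothetical shim for older interpreters, not the real implementation:

```python
import functools
import warnings

try:
    from warnings import deprecated  # Python 3.13+, per PEP 702
except ImportError:
    # Hypothetical stand-in for older interpreters: runtime warning
    # only, no static-analysis support.
    def deprecated(msg):
        def wrap(fn):
            @functools.wraps(fn)
            def inner(*args, **kwargs):
                warnings.warn(msg, DeprecationWarning, stacklevel=2)
                return fn(*args, **kwargs)
            return inner
        return wrap

@deprecated("use new_api() instead")
def old_api() -> int:
    return 1

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_api()  # still works, but emits a DeprecationWarning
```

With the real PEP 702 decorator, MyPy (and other checkers) can report every call to `old_api` at type-check time, before the code ever runs.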

Read more →

March 16, 2024