Note: alternative to regex splitting in byte tokenizers
In a previous note, I discussed an alternative to setting split to true in a ByteLevel pretokenizer. I suggested using a ByteLevel normalizer first, and then splitting using a complicated regex in “byte space”. However, this turned out to not work very well: there are certain character classes in the original regex, such as \s, that are very difficult to convert to a pattern in byte space.
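To make the failure mode concrete, here is a minimal sketch that reimplements the GPT-2 byte-to-unicode table (the mapping ByteLevel components are based on) purely for illustration, showing why a class like \s stops meaning “whitespace” once the text lives in byte space:

```python
import re

def bytes_to_unicode() -> dict[int, str]:
    # The GPT-2 byte-to-unicode table: printable bytes map to themselves,
    # everything else maps to code points at or above U+0100.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

mapping = bytes_to_unicode()

# A space byte becomes "Ġ" and a newline becomes "Ċ" in byte space.
print(mapping[ord(" ")], mapping[ord("\n")])

# Neither character is matched by \s, so the whitespace class from the
# original pattern no longer selects anything useful after normalization.
print(re.match(r"\s", mapping[ord(" ")]))  # None
```

You would have to enumerate the byte-space counterparts of every whitespace code point by hand, which is exactly the kind of bookkeeping that made the approach unattractive.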
Turning any tokenizer into a greedy one
I recently re-read Greed is All You Need: An Evaluation of Tokenizer Inference Methods. In this paper, the authors show that switching out inference methods for tokenizers can improve performance on various tasks.
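The paper’s exact setups aren’t reproduced here, but the core idea of greedy inference is easy to sketch: segment by repeatedly taking the longest vocabulary entry that matches at the current position, regardless of how the tokenizer was trained. A minimal, hypothetical version:

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match inference over a fixed vocabulary."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking until we find a match.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            # Fall back to a single (possibly out-of-vocabulary) character
            # so the loop always makes progress.
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

vocab = {"un", "happi", "happiness", "ness", "h", "a", "p", "i", "n", "e", "s", "u"}
print(greedy_tokenize("unhappiness", vocab))  # ['un', 'happiness']
```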
Tokenizer decasing
In this post I will talk about something I call tokenizer decasing. Decasing is very similar to putting a lowercase normalizer in front of a tokenizer, but works better.
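For reference, the baseline that decasing is compared against is just a lowercase normalizer; with the HuggingFace tokenizers library that looks roughly like this (a sketch of the baseline, not of the decasing procedure itself):

```python
from tokenizers.normalizers import Lowercase

# The naive baseline: fold case before the tokenizer ever sees the text.
normalizer = Lowercase()
print(normalizer.normalize_str("Hello WORLD"))  # "hello world"

# Attached to an existing tokenizer it would sit in front of everything else,
# e.g. tokenizer.normalizer = Lowercase()
```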
Using overload to handle tagged union return types
Here’s a function with an idiom I’ve seen a lot (probably copied from sentence-transformers):
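The post’s actual snippet isn’t reproduced here; the shape of the idiom, with made-up names, is a boolean flag that decides which member of the union you get back, which typing.overload can express:

```python
from typing import Literal, overload

# Hypothetical signatures: the flag decides whether you get token strings
# or token ids, so the return type is a tagged union on `return_ids`.
@overload
def tokenize(text: str, return_ids: Literal[True]) -> list[int]: ...
@overload
def tokenize(text: str, return_ids: Literal[False] = ...) -> list[str]: ...

def tokenize(text: str, return_ids: bool = False) -> list[int] | list[str]:
    tokens = text.split()  # stand-in for a real tokenizer
    if return_ids:
        return [hash(t) % 30_000 for t in tokens]
    return tokens

ids = tokenize("hello world", return_ids=True)  # checker infers list[int]
strings = tokenize("hello world")               # checker infers list[str]
```

Without the overloads, every caller would be handed the full union and forced to narrow it themselves.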
Protocols to make untyped code behave
Working with external untyped code in a typed code base can be challenging: you’ll get lots of Any or Unknown, which might propagate through your codebase. This can force you to reach for typing.cast or # type: ignore statements, which kind of defeats the purpose of using static typing in the first place.
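A typing.Protocol lets you describe just the slice of the external object you actually depend on, so the Any stops spreading at the boundary. A minimal sketch with made-up names:

```python
from typing import Any, Protocol

class Embedder(Protocol):
    """Only the part of the external object we rely on."""
    def encode(self, texts: list[str]) -> list[list[float]]: ...

def average_dim(model: Embedder, texts: list[str]) -> float:
    # From here on, the checker knows exactly what `model` offers.
    vectors = model.encode(texts)
    return sum(len(v) for v in vectors) / len(vectors)

# Stand-in for an untyped third-party loader that returns Any.
def load_model() -> Any: ...

model: Embedder = load_model()  # the Any stops propagating at this annotation
```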
Rethinking evaluation and relative performance
Here’s a pop quiz: classifier A scores 90% accuracy on some benchmark. Classifier B scores 80%. How much better is A?
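As a quick sketch of the arithmetic the quiz hinges on: the answer depends on whether you compare accuracies or error rates.

```python
acc_a, acc_b = 0.90, 0.80
err_a, err_b = 1 - acc_a, 1 - acc_b

print(f"{acc_a - acc_b:.0%}")      # 10 percentage points of accuracy
print(f"{acc_a / acc_b - 1:.1%}")  # 12.5% relatively more accurate
print(f"{1 - err_a / err_b:.0%}")  # a 50% reduction in errors
```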
Exposing string types to maximize user happiness
Regular users of my blog will know that I am opposed to what is known as stringly typing: using strings in place of more strongly typed identifiers. As an example, consider a language-specific tokenizer:
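The tokenizer from the post isn’t shown here, but a hypothetical version of both styles makes the term concrete:

```python
from typing import Literal

# Stringly typed: any string is accepted, and typos only fail at runtime.
def tokenize_stringly(text: str, language: str) -> list[str]:
    if language not in ("en", "nl", "de"):
        raise ValueError(f"unsupported language: {language}")
    return text.split()

# More strongly typed: the checker rejects tokenize(text, "engish") outright.
Language = Literal["en", "nl", "de"]

def tokenize(text: str, language: Language) -> list[str]:
    return text.split()
```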