Stéphan Tulkens

NLP Person

TypeVars and Unions in Python

This post is about Unions, TypeVars, and unions of types that share a relevant common supertype (i.e., something other than object). I’ll show how union types are often used incorrectly, and how a TypeVar can solve some of these problems. So, having said that, let’s dive in!

First, let’s pretend we only have three types: str, int, and float. I’ll define a function, and then show why some typing options work, and some don’t, for a combination of these input arguments. To make everything a bit more interesting, I’ll first create my own type, PositiveInt, which is meant to be an int that can only be >= 0. The class below doesn’t actually enforce that; all that matters here is that it is a subclass of int with an extra method.

class PositiveInt(int):
    # NOTE: this function is intentionally dumb
    def f(self) -> int:
        return 0

So, first let’s start with a simple example: a function that repeats an item multiple times:

All = int | str | float

def repeat(x: All, n: int) -> list[All]:
    return [x] * n

repeat(10, 5)
# [10, 10, 10, 10, 10]

If you check this using mypy, it works nicely! But there’s actually something wrong here, which we can show using reveal_type.

my_thing = 10
# note: Revealed type is "builtins.int"
result = repeat(my_thing, 5)
# note: Revealed type is "builtins.list[Union[builtins.int, builtins.str, builtins.float]]"

The revealed type of the input argument is int, but the return type has become list[int | str | float]! So, even though we just repeated an int 5 times and put the result into a list, our return type has been severely mangled. In fact, if we try to use it as an int later on, we’ll get mypy errors:

my_thing = 10
result = repeat(my_thing, 5)
squared = [x ** 2 for x in result]

This gives us error: Unsupported operand types for ** ("str" and "int") [operator], which is expected, because ** is not implemented for str at all.
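To see what staying with the Union costs, here is a minimal sketch of the workaround: every caller has to narrow the elements back to int before using them. The isinstance filter is my own addition for illustration, not part of the original function, and it should satisfy mypy because the check narrows x inside the comprehension.

```python
from __future__ import annotations


def repeat(x: int | str | float, n: int) -> list[int | str | float]:
    return [x] * n


result = repeat(10, 5)

# The declared element type is still int | str | float, so we narrow each
# element to int ourselves before applying an int-only operation.
squared = [x ** 2 for x in result if isinstance(x, int)]
print(squared)  # [100, 100, 100, 100, 100]
```

This runs fine, but the knowledge that result only holds ints now lives in the caller instead of in the signature of repeat.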

The solution here is to use what is known as a TypeVar. A TypeVar is pretty similar to a Union, in that it can stand for a bunch of specific types, but with an important caveat: within any single call, the type checker resolves a TypeVar to exactly one of those types, and uses that same type everywhere the TypeVar appears in the signature. A Union, as we saw above, always stands for all of its member types at once. Let’s try an example:

from typing import TypeVar

AllType = TypeVar("AllType", str, int, float)

def repeat(x: AllType, n: int) -> list[AllType]:
    return [x] * n

my_thing = 10
# note: Revealed type is "builtins.int"
result = repeat(my_thing, 5)
# note: Revealed type is "builtins.list[builtins.int]"

Nice! The TypeVar knows that it got an int as an input, and should thus output a list[int].

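Let’s dive a bit deeper into how a TypeVar should be defined: a TypeVar takes a name, which should always be exactly equal to the name of the variable it is assigned to, and optionally either a single type to which it is bound, or two or more constraints (a single constraint is not allowed, and a bound cannot be combined with constraints). In the example above, we used three constraints. In simple terms, a bound means that the bound type and any of its subclasses are allowed, and the type checker keeps track of which subclass was passed. Constraints let you list whatever unrelated types you want, but the TypeVar always resolves to exactly one of the listed types, so a subclass of a constraint is treated as its closest listed parent.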
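The name, bound, and constraints of a TypeVar can also be inspected at runtime, which makes the three ways of defining one easy to compare. This is a small sketch with made-up TypeVar names (Constrained, Bounded, Free); it only reads attributes that the typing module exposes, and the actual checking is still done by mypy rather than by these attributes.

```python
from typing import TypeVar

# Hypothetical names, chosen for this sketch only.
Constrained = TypeVar("Constrained", str, int, float)
Bounded = TypeVar("Bounded", bound=int)
Free = TypeVar("Free")  # neither a bound nor constraints

print(Constrained.__constraints__)  # the three constraint types, as a tuple
print(Constrained.__bound__)        # None
print(Bounded.__bound__)            # the int class
print(Bounded.__constraints__)      # an empty tuple

# A single constraint is rejected as soon as the TypeVar is created.
try:
    TypeVar("Single", int)
    single_constraint_ok = True
except TypeError:
    single_constraint_ok = False
print(single_constraint_ok)
```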

We can test the difference between bounds and constraints by passing a PositiveInt to the constrained function. As PositiveInt is an int, its type should be preserved if int is specified as a bound, but not if int is a constraint. Using the repeat function from above:

p = PositiveInt(10)
# note: Revealed type is "PositiveInt"
result = repeat(p, 10)
# note: Revealed type is "builtins.list[builtins.int]"
[x.f() for x in result]
# error: "int" has no attribute "f"

So, as you can see, the mypy type checker accepts us passing a PositiveInt, but turns it into an int in the process. It therefore doesn’t believe that members of result have a function f, even though they actually do (at run-time, they are all still PositiveInts of course.)

One solution is adding PositiveInt as an additional constraint. This, of course, would solve the problem, but does leave a little bit of a bad taste in my mouth: it’s patching the problem, not really solving it. Another option is using a Union type as the bound of the TypeVar, which seems a bit wild to me, but actually works.

from typing import TypeVar

AllTypesIWant = str | int | float
AllType = TypeVar("AllType", bound=AllTypesIWant)

def repeat(x: AllType, n: int) -> list[AllType]:
    return [x] * n

p = PositiveInt(10)
# note: Revealed type is "PositiveInt"
result = repeat(p, 10)
# note: Revealed type is "builtins.list[PositiveInt]"
[x.f() for x in result]
# No error!

o = object()
result = repeat(o, 10)
# Error: we are not allowed to put `object` into this function.
# Nice!

So this works! We got the best of both worlds: a set of types that can be bound to the TypeVar, which correctly deals with subtypes.

Some caveats on the use of TypeVar

A TypeVar on an input argument only makes sense if the same TypeVar is also used somewhere else in the signature, typically in the return type. So if you find yourself writing something like the following:

def my_function(x: AllType) -> str:
    return str(x)

You might be better off using a Union instead.
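A sketch of what the Union version could look like (the name describe is made up for this example). Since the return type is str no matter which input type was passed, there is nothing for a TypeVar to link together:

```python
from __future__ import annotations


def describe(x: int | str | float) -> str:
    # The output type does not depend on the input type, so a plain Union
    # says everything the signature needs to say.
    return str(x)


print(describe(10))   # 10
print(describe(1.5))  # 1.5
```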

A TypeVar is a nice way to type classmethods, which can be useful in alternative constructors, among other things:

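from __future__ import annotations

from typing import Any, TypeVar

# The bound is given as a string (a forward reference), because MyClass
# is only defined further down.
T_ = TypeVar("T_", bound="MyClass")

class MyClass:

    @classmethod
    def load_from_json(cls: type[T_], json_data: dict[str, Any]) -> T_:
        return cls(**json_data)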

If we were to type this classmethod using MyClass directly, we would run into trouble as soon as we subclass MyClass: calling load_from_json on the subclass would still be typed as returning a MyClass. Since Python 3.11, typing.Self exists for exactly this purpose, making this pattern a bit redundant.
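For comparison, here is a rough sketch of the same alternative-constructor pattern written with typing.Self. The Config and DebugConfig classes are hypothetical, and Self is imported only under TYPE_CHECKING so that the snippet still runs on interpreters older than 3.11:

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # typing.Self needs Python 3.11+. Importing it only for the type checker
    # keeps this sketch runnable on older interpreters.
    from typing import Self


class Config:  # hypothetical class for this example
    def __init__(self, name: str = "default") -> None:
        self.name = name

    @classmethod
    def load_from_json(cls, json_data: dict[str, Any]) -> Self:
        return cls(**json_data)


class DebugConfig(Config):  # hypothetical subclass
    pass


cfg = DebugConfig.load_from_json({"name": "debug"})
print(type(cfg).__name__)  # DebugConfig
```

No TypeVar and no annotation on cls are needed here; the type checker treats Self as whichever class the method was called on.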

I think it is very easy to over-use TypeVar. Because TypeVars work in any context in which a Union also works, it is easy to get carried away, and use them gratuitously throughout your codebase. But, in my opinion, TypeVars can be quite difficult to follow and use directly. So, as always, try to find out the minimal use case for whatever you are doing, and only use things when they are absolutely necessary.