Stéphan Tulkens

NLP Person

NewType in python

New week, new post! This post is about NewType, an underused construct in Python, in my opinion, and a good way to show the difference between typingtime and runtime.

In short, a NewType is a type you can create, but which only exists for mypy and other type checkers. In other words, a NewType does not exist during runtime, and only matters when type checking. Here’s an example:

from typing import NewType

PhoneNumber = NewType("PhoneNumber", str)
my_number = PhoneNumber("04 1234 5678")

reveal_type(my_number)
print(f"My type is: {type(my_number)}")

First, let’s dive into how a NewType is defined. A NewType takes 2 arguments. Just as was the case with a TypeVar, the first argument has to be the same string as the one you assign the NewType to. The second argument is a single type, which is the type the NewType will get at runtime.

If we type check this code, it will give use the following result:

note: Revealed type is "PhoneNumber"

but when actually running the code, we get the following result:

My type is: <class 'str'>

This is interesting! So, like I said before, a NewType is its own type when type checking, but not during runtime. Hence, defining a NewType does not change anything about your code. In fact, if you don’t ever type check your code, the defined NewTypes are invisible (except for the fact that they also signal intent.)

Uses for NewType

If NewType is functionally inert, why use it? In this section I’ll briefly discuss some use-cases and one anti-use-case.

Guard rails

As written before in the great blog by Jakub Beránek, Writing Python like it’s Rust, you can use NewType to guard against switching up arguments:

from typing import NewType

FirstName = NewType("FirstName", str)
LastName = NewType("LastName", str)


def get_person_id_from_db(first_name: FirstName, last_name: LastName) -> str:
    ...

As you can see, using NewType here makes it impossible to accidentally switch up first names and last names when calling this function. As an anecdote: I once completely missed that Python’s re.sub implementation has the following argument order:

  • pattern: the pattern to look for
  • repl: the thing to replace pattern with
  • string: the string in which to replace pattern with repl

I accidentally switched up the “repl” and “string” arguments, which meant that I was replacing all strings with whatever value “repl” had. Because I was just passing in really small strings in which I was replacing digits, I didn’t actually see what was happening. Having a NewType-based implementation here makes this impossible.

Similarly, consider the fact that numpy.save and json.dump don’t have the same argument order. In numpy.save, the file is the first argument, and the thing you want to save the second. In json.dump, however, the thing you want to save comes before the file path. Regardless of whichever you think is more logical, it is easy to do it wrong if you work in codebases in which you use both.

Clarifying intent

Even if it is not possible to mix up argument types, simply because the arguments all have different, incompatible, types, it can still be useful to involve NewTypes to clarify what the semantic content of the arguments is. I especially think this is useful when using return types. Consider the following:

def get_customer_id(first_name: str, last_name: str) -> str:
    ...


def get_customer_information(customer_id: str) -> tuple[str, str]:
    ...

And now consider the following:

from typing import NewType

FirstName = NewType("FirstName", str)
LastName = NewType("LastName", str)
CustomerId = NewType("CustomerId", str)


def get_customer_id(first_name: FirstName, last_name: LastName) -> CustomerId:
    ...


def get_customer_information(customer_id: CustomerId) -> tuple[FirstName, LastName]:
    ...

While this is a bit of a convoluted example (I would normally not return a tuple of strings, but a dataclass), I think it drives home that these methods are each other’s dual, in a way that a regularly typed codebase wouldn’t.

Downsides

A big downside of using NewType is cognitive overhead: if the types are defined far away from the place where they are used, it might be difficult to understand that what we are actually using NewType in the first place. Unsuspecting programmers might be misled, and think that the types need to be converted to be used, which can lead to redundant cast, uses of Any, etc.

One way to circumvent this issue is to be very strict about only defining NewTypes in the same file as where they are used. If you use the same NewType in multiple places, you are probably better off just redefining it to reduce the surface of interaction between different parts of your codebase. Converting between NewTypes is practically free anyway:

from codebase.part1 import PhoneNumber
from codebase.part2 import PhoneNumber as OtherPhoneNumber


def call_number(number: PhoneNumber) -> None:
    ...


number = OtherPhoneNumber("04 1234 5678")

call_number(PhoneNumber(number))

While this looks messy, the alternative is even messier: it would require you to tie together two disparate parts of your codebase for a type that is functionally inert. Typing is useful, but it should not lead you to create dependencies between otherwise independent pieces of code, which seems to me like putting the cart before the horse.