Pydantic & Polymorphism
For several years now, I have been using the Pydantic library quite a lot at work as well as in my personal projects. It is handy for validation, easy to pick up, a real game changer for managing an application’s settings via Pydantic Settings, and for simple use cases, I have nothing to complain about. On the other hand, as soon as you start doing things that are a bit more complex, it gets trickier.
In the examples that follow, I used Pydantic 2.12 and Python 3.12, but what I describe should hold true for any version of Pydantic 2 and Python 3.10+.
Issues
Handling Polymorphism
What really annoys me is how inheritance is handled. If you want a clean object-oriented architecture that respects the Liskov Substitution Principle (LSP) and polymorphism, things can get pretty painful with Pydantic, especially if you want it to work properly with serialization and deserialization, which is kind of the whole point anyway.
For example, imagine the following class hierarchy. It’s fairly simple, but it should illustrate my point well enough.
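The original snippet isn't reproduced here, so below is a minimal sketch consistent with the serialized output shown further down (the `speak()` bodies and return values are my own assumption):

```python
from abc import abstractmethod

from pydantic import BaseModel


class Animal(BaseModel):
    """Abstract base model: Pydantic's metaclass builds on ABCMeta,
    so @abstractmethod works as usual."""

    name: str
    age: int

    @abstractmethod
    def speak(self) -> str: ...


class Dog(Animal):
    breed: str

    def speak(self) -> str:
        return "Woof!"


class Cat(Animal):
    color: str

    def speak(self) -> str:
        return "Meow!"
```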
So far, so good. You can serialize each object independently, no headaches there.
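A sketch of those serialization calls, with trimmed versions of the models (`speak()` omitted since it plays no role here); the values match the output below:

```python
from pydantic import BaseModel


# Trimmed models: just the fields, no behavior.
class Animal(BaseModel):
    name: str
    age: int


class Dog(Animal):
    breed: str


class Cat(Animal):
    color: str


dog = Dog(name="Buddy", age=3, breed="Golden Retriever")
cat = Cat(name="Whiskers", age=2, color="Tabby")

# Each model serializes on its own, subclass fields included.
print(dog.model_dump_json(indent=2))
print(cat.model_dump_json(indent=2))
```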
{
"name": "Buddy",
"age": 3,
"breed": "Golden Retriever"
}
{
"name": "Whiskers",
"age": 2,
"color": "Tabby"
}

In that case, deserialization works fine as well. Pydantic finds its way without any trouble.
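For instance, validating against the concrete class directly (a sketch with trimmed models, `speak()` omitted):

```python
from pydantic import BaseModel


class Animal(BaseModel):
    name: str
    age: int


class Dog(Animal):
    breed: str


# Validating against the concrete class restores every field.
dog = Dog.model_validate_json(
    '{"name": "Buddy", "age": 3, "breed": "Golden Retriever"}'
)
```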
Now, let’s say we want to maintain a collection of animals. For example:
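Something like this (a sketch with trimmed models; the field is declared with the parent class):

```python
from pydantic import BaseModel


class Animal(BaseModel):
    name: str
    age: int


class Dog(Animal):
    breed: str


class Cat(Animal):
    color: str


class House(BaseModel):
    animals: list[Animal]  # declared with the parent class


house = House(
    animals=[
        Dog(name="Buddy", age=3, breed="Golden Retriever"),
        Cat(name="Whiskers", age=2, color="Tabby"),
    ]
)
print(house.model_dump_json(indent=2))  # breed and color are gone
```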
Here, Pydantic serializes according to the declared field type, a mechanism that makes sense on its own, but it’s a pain when it comes to serialization and deserialization: you lose track of the actual type, and you can even lose the data specific to the child classes. In this case:
{
"animals": [
{
"name": "Buddy",
"age": 3
},
{
"name": "Whiskers",
"age": 2
}
]
}

You could use SerializeAsAny[Animal] instead of Animal, but it’s clunky and easy to forget. You can also pass serialize_as_any=True to model_dump or model_dump_json, but again, it’s easy to overlook… In short, not great. On top of that, things completely fall apart when you try to deserialize.
TypeError: Can't instantiate abstract class Animal without an implementation for abstract method 'speak'

Yep, the Animal class is abstract (and even if it weren’t, we would still lose the characteristic values of each subclass without using the “serialize as any” feature).
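For reference, here’s what the SerializeAsAny workaround mentioned above could look like (a sketch using a non-abstract Animal to keep it short; only the dump side is shown, since deserialization back into the abstract base still fails as described):

```python
from pydantic import BaseModel, SerializeAsAny


class Animal(BaseModel):  # non-abstract here, for brevity
    name: str
    age: int


class Dog(Animal):
    breed: str


class House(BaseModel):
    # SerializeAsAny switches to duck-typed serialization for this field.
    animals: list[SerializeAsAny[Animal]]


house = House(animals=[Dog(name="Buddy", age=3, breed="Golden Retriever")])
print(house.model_dump_json())  # duck-typed dump: breed is kept
```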
However, Pydantic offers a solution through unions, especially with discriminated unions.
Voilà, magic. It works.
{
"animals": [
{
"name": "Buddy",
"age": 3,
"breed": "Golden Retriever"
},
{
"name": "Whiskers",
"age": 2,
"color": "Tabby"
}
]
}

House(animals=[Dog(name='Buddy', age=3, breed='Golden Retriever'), Cat(name='Whiskers', age=2, color='Tabby')])

Super cool, but it actually becomes a huge pain as soon as you want to add a new animal. It creates a bunch of other problems and goes against the Liskov Substitution Principle.
LSP says you should be able to substitute any creature as long as it’s an Animal, but in this setup, strictly speaking, a Dog | Cat is not an Animal (even though Python relies on duck typing, I’m trying to be precise). On top of that, some type checkers can’t resolve the common class, so they start complaining, and you end up having to disable them locally, which is a clear code smell in my opinion.
You could try type narrowing, but that really messes with polymorphism. For example, instead of just calling animal.speak(), you’d need something like this:

from typing import assert_never

for animal in house.animals:
    match animal:
        case Dog():
            print(f"{animal.name} says {animal.speak()}")
        case Cat():
            print(f"{animal.name} says {animal.speak()}")
        case never:
            assert_never(never)

This makes the code more verbose and harder to maintain because it has to be exhaustive. Every time you add a new animal type, you need to update all these match statements. In a few rare cases, it might actually be useful to have sealed classes to handle pattern matching more intelligently… but otherwise, it’s a real headache and pushes you toward hacky solutions. Just because Python lets you do whatever doesn’t mean you should.
JSON Coupling
Well, in reality, this is only half a problem. By default, Pydantic is very much geared toward JSON and doesn’t natively support other formats. It’s not a huge issue, but it can be annoying in some cases.
You can work around it pretty easily by splitting serialization into two steps: first dump the model to JSON-compatible Python primitives, then serialize those with whatever library you want. I still measured the performance to estimate the hit, given that Pydantic 2 is backed by quite a bit of Rust. The performance loss is pretty minimal (around 7%). Yay.
%timeit dog.model_dump_json()
541 ns ± 6.01 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
%timeit orjson.dumps(dog.model_dump(mode="json"))
578 ns ± 2.32 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Keep in mind that the custom marshalling solution we’ll develop later will have additional overhead on top of this, but it’s a reasonable trade-off for the flexibility and correctness it provides.
Possible Solutions
Luckily, we can still try to find solutions to our problems.
Limit Polymorphism
One approach would be to avoid structures that rely on polymorphism. For example, instead of storing all the animals in a single list, you could have a dedicated list for each class.
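A sketch of that variant, with trimmed models (`speak()` omitted):

```python
from pydantic import BaseModel


class Animal(BaseModel):
    name: str
    age: int


class Dog(Animal):
    breed: str


class Cat(Animal):
    color: str


class House(BaseModel):
    # One homogeneous list per concrete class, no polymorphism involved.
    dogs: list[Dog]
    cats: list[Cat]


house = House(
    dogs=[Dog(name="Buddy", age=3, breed="Golden Retriever")],
    cats=[Cat(name="Whiskers", age=2, color="Tabby")],
)
```

Serialization and deserialization round-trip without losing anything, since every field is declared with its concrete type.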
At least you know exactly what you’re dealing with, but in this case, you still run into trouble whenever you want to add a new animal…
Another option would be to use generics, which Pydantic handles correctly, but this limits you to a single animal type, and if you define the generic with the parent class (House[Animal]), you end up in the same situation as before.
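For example (a sketch; House[Dog] keeps the subclass fields, while House[Animal] falls back to the previous lossy behavior):

```python
from typing import Generic, TypeVar

from pydantic import BaseModel


class Animal(BaseModel):
    name: str
    age: int


class Dog(Animal):
    breed: str


T = TypeVar("T", bound=Animal)


class House(BaseModel, Generic[T]):
    animals: list[T]


# Parametrized with the concrete class: breed survives serialization.
kennel = House[Dog](animals=[Dog(name="Buddy", age=3, breed="Golden Retriever")])

# Parametrized with the parent class: breed is dropped again on dump.
shelter = House[Animal](animals=[Dog(name="Buddy", age=3, breed="Golden Retriever")])
```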
In short, these are clearly not the best approaches. Ideally, we’d like to handle polymorphism transparently.
Tinkering a bit… but not too much
We can try to get by with a bit of tinkering. The idea is to avoid anything too messy or anything relying on dark magic: we want something simple enough that it doesn’t turn into a dirty hack itself. Ideally, our tweak should also not break Pydantic’s normal behavior, so that it can easily be disabled when we don’t want it (for integration with other libraries, for example).
First, we define a parent class that will manage our different schemas. The logic is pretty straightforward. As soon as we define a concrete class (or a generic), we store a mapping between an identifier and the Pydantic class. This lets us inject a tag into the serialized data and patch the data on the fly to reconstruct the objects correctly.
For the following examples, I’m organizing the code into separate modules (the relative imports like .schema and .animals indicate they’re all part of the same package). You can structure it differently in practice, but this makes the examples clearer.
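The original implementation isn’t shown here, so the following is a sketch of what that base class could look like. The registry name and the "module.qualname" key format are assumptions based on the __type__ values visible in the output further down (handling of generic specializations is left out for brevity):

```python
import inspect
from typing import Any, ClassVar

from pydantic import BaseModel


class Schema(BaseModel):
    """Base model that records every concrete subclass in a registry,
    so that a serialized tag can be mapped back to the right class."""

    # Shared mapping: "<module>.<qualname>" -> concrete model class.
    registry: ClassVar[dict[str, type["Schema"]]] = {}

    @classmethod
    def __pydantic_init_subclass__(cls, **kwargs: Any) -> None:
        super().__pydantic_init_subclass__(**kwargs)
        # Abstract intermediates are skipped: only instantiable
        # classes can ever appear in serialized data.
        if not inspect.isabstract(cls):
            Schema.registry[cls.type_key()] = cls

    @classmethod
    def type_key(cls) -> str:
        return f"{cls.__module__}.{cls.__qualname__}"
```

Using `__pydantic_init_subclass__` rather than plain `__init_subclass__` means the hook runs after Pydantic has fully built the model class.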
The adaptation for our existing classes is minimal. Basically, we just inherit from Schema.
That’s just the first step. Next, we handle serialization. We introduce a new intermediate step to convert our schemas into dictionaries. This intermediate representation will then be serialized. The following new class handles our specific logic and patches the data when needed.
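Since the article’s actual code isn’t reproduced here, below is a self-contained reconstruction from the description: a compact version of the Schema registry, the adapted models, and a Marshaller that injects a __type__ tag when typed=True and uses the registry to rebuild objects. The recursion only handles dicts, lists, and JSON-ready leaf values; a real version would need to cover more types:

```python
import inspect
from typing import Any, ClassVar

from pydantic import BaseModel


class Schema(BaseModel):
    """Compact registry base class (see the previous section)."""

    registry: ClassVar[dict[str, type["Schema"]]] = {}

    @classmethod
    def __pydantic_init_subclass__(cls, **kwargs: Any) -> None:
        super().__pydantic_init_subclass__(**kwargs)
        if not inspect.isabstract(cls):
            Schema.registry[cls.type_key()] = cls

    @classmethod
    def type_key(cls) -> str:
        return f"{cls.__module__}.{cls.__qualname__}"


class Owner(BaseModel):  # plain BaseModel: serialized without a tag
    name: str


class Animal(Schema):
    name: str
    age: int


class Dog(Animal):
    breed: str


class Cat(Animal):
    color: str


class House(Schema):
    owner: Owner
    animals: list[Animal]


class Marshaller:
    """Converts Schema objects to plain dicts and back, injecting a
    `__type__` tag (when `typed=True`) so concrete classes survive."""

    def __init__(self, typed: bool = True) -> None:
        self.typed = typed

    def marshal(self, obj: Schema) -> dict[str, Any]:
        data = {
            name: self._convert(getattr(obj, name))
            for name in type(obj).model_fields
        }
        if self.typed:
            data["__type__"] = obj.type_key()
        return data

    def _convert(self, value: Any) -> Any:
        if isinstance(value, Schema):
            return self.marshal(value)  # tagged, recursively
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")  # plain model: untagged
        if isinstance(value, list):
            return [self._convert(item) for item in value]
        return value  # leaf: assumed JSON-ready (str, int, ...)

    def unmarshal(self, data: Any) -> Any:
        if isinstance(data, dict):
            payload = {
                key: self.unmarshal(value)
                for key, value in data.items()
                if key != "__type__"
            }
            if "__type__" in data:
                # Patch on the fly: look the class up and rebuild it.
                return Schema.registry[data["__type__"]].model_validate(payload)
            return payload
        if isinstance(data, list):
            return [self.unmarshal(item) for item in data]
        return data
```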
Here’s what the dictionaries look like when we apply the marshal method. With a few tweaks, we could adapt the Marshaller to handle other Pydantic models as well, but I’m a bit lazy. This whole article is just a simple proof of concept.
{
"__type__": "__main__.House",
"owner": {
"name": "Alice"
},
"animals": [
{
"__type__": "__main__.Dog",
"name": "Buddy",
"age": 3,
"breed": "Golden Retriever"
},
{
"__type__": "__main__.Cat",
"name": "Whiskers",
"age": 2,
"color": "Tabby"
}
]
}

We can then define multiple serialization methods (JSON, YAML, MessagePack, etc.). Below is an example for JSON and YAML.
Serialization could be fully customized since we’re producing bytes. Right now it’s very basic, but we could tweak it to compress the class key into a unique identifier, for example by hashing it. There are many possibilities.
Finally, we make life a bit easier with a class that handles everything end to end. The cherry on top.
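A sketch of such a class, modeled on the usage snippet below (Codec(marshaller=..., serializer=...) with encode/decode); the Protocol definitions are my own addition to keep the snippet self-contained:

```python
from dataclasses import dataclass
from typing import Any, Protocol


class Marshaller(Protocol):
    def marshal(self, obj: Any) -> dict[str, Any]: ...
    def unmarshal(self, data: dict[str, Any]) -> Any: ...


class Serializer(Protocol):
    def dumps(self, data: dict[str, Any]) -> bytes: ...
    def loads(self, raw: bytes) -> dict[str, Any]: ...


@dataclass
class Codec:
    """Glues a marshaller and a serializer into one encode/decode API."""

    marshaller: Marshaller
    serializer: Serializer

    def encode(self, obj: Any) -> bytes:
        return self.serializer.dumps(self.marshaller.marshal(obj))

    def decode(self, raw: bytes) -> Any:
        return self.marshaller.unmarshal(self.serializer.loads(raw))
```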
Usage is pretty straightforward.
from __future__ import annotations
from .animals import Dog, Cat, House, Owner
from .codec import Codec
from .marshaller import Marshaller
from .serializers import JSONSerializer
dog = Dog(name="Buddy", age=3, breed="Golden Retriever")
cat = Cat(name="Whiskers", age=2, color="Tabby")
owner = Owner(name="Alice")
house = House(owner=owner, animals=[dog, cat])
codec = Codec(
marshaller=Marshaller(typed=True),
serializer=JSONSerializer(indent=2),
)
print(codec.decode(codec.encode(house)))

Yay. Everything works.
House(owner=Owner(name='Alice'), animals=[Dog(name='Buddy', age=3, breed='Golden Retriever'), Cat(name='Whiskers', age=2, color='Tabby')])

Conclusion
Pydantic’s discriminated union approach to polymorphism forces you to explicitly list all subtypes, violating the Liskov Substitution Principle and creating a maintenance burden every time you extend your class hierarchy. The lightweight marshalling layer presented here solves this by injecting type information during serialization, preserving polymorphism while keeping Pydantic’s validation strengths intact. The overhead is minimal (comparable to the 7% from two-step serialization) and you gain format flexibility beyond JSON.
The trade-offs are straightforward. You lose precise type information during deserialization (everything returns as Schema) and a production system would need better error handling and potentially versioning support.
One solution I’ve deliberately left aside would be to define a custom type, similar to SerializeAsAny, to adapt the Pydantic schema and automatically construct a union of subclasses under the hood. It starts to get really hacky, but it would be the closest to staying within the Pydantic ecosystem. The catch is that you must not forget this extra annotation…
Alternative libraries like msgspec offer tagged unions with less boilerplate and might be worth considering for greenfield projects. For existing Pydantic codebases, the marshalling approach provides a practical path forward without a rewrite. The key takeaway is that polymorphism and type safety don’t have to be mutually exclusive.