Pydantic & Polymorphism

January 24, 2026 · Ben

For several years now, I have been using the Pydantic library quite a lot at work as well as in my personal projects. It is handy for validation, easy to pick up, a real game changer for managing an application’s settings via Pydantic Settings, and for simple use cases, I have nothing to complain about. On the other hand, as soon as you start doing things that are a bit more complex, it gets trickier.

In the examples that follow, I used Pydantic 2.12 and Python 3.12, but what I describe should hold true for any version of Pydantic 2 and Python 3.10+.

Issues

Handling Polymorphism

What really annoys me is how inheritance is handled. If you want a clean object-oriented architecture that respects the Liskov Substitution Principle (LSP) and polymorphism, it can get pretty painful with Pydantic, especially if you want it to work properly with serialization and deserialization, which is kind of the whole point anyway.

For example, imagine the following class hierarchy. It’s fairly simple, but it should illustrate my point well enough.

animals.py
from __future__ import annotations

import abc
from typing import final, override

from pydantic import BaseModel


class Animal(BaseModel, abc.ABC):
    name: str
    age: int

    @abc.abstractmethod
    def speak(self) -> str:
        raise NotImplementedError


@final
class Dog(Animal):
    breed: str

    @override
    def speak(self) -> str:
        return "Woof!"


@final
class Cat(Animal):
    color: str

    @override
    def speak(self) -> str:
        return "Meow!"

So far, so good. You can serialize each object independently, no headaches there.

main.py
from __future__ import annotations

from .animals import Cat, Dog

dog = Dog(name="Buddy", age=3, breed="Golden Retriever")
cat = Cat(name="Whiskers", age=2, color="Tabby")

print(dog.model_dump_json(indent=2))
print(cat.model_dump_json(indent=2))
{
  "name": "Buddy",
  "age": 3,
  "breed": "Golden Retriever"
}
{
  "name": "Whiskers",
  "age": 2,
  "color": "Tabby"
}

In that case, deserialization works fine as well. Pydantic finds its way without any trouble.

main.py
from __future__ import annotations

from .animals import Cat, Dog

dog = Dog.model_validate({"name": "Buddy", "age": 3, "breed": "Golden Retriever"})
cat = Cat.model_validate({"name": "Whiskers", "age": 2, "color": "Tabby"})

Now, let’s say we want to maintain a collection of animals. For example:

house.py
from __future__ import annotations

from collections.abc import Sequence
from typing import final

from pydantic import BaseModel

from .animals import Animal

@final
class House(BaseModel):
    animals: Sequence[Animal]

Here, Pydantic’s behavior is internally consistent, but it’s a pain when it comes to serialization and deserialization: you lose track of the actual type, and you can even lose the data specific to the child classes. In this case:

{
  "animals": [
    {
      "name": "Buddy",
      "age": 3
    },
    {
      "name": "Whiskers",
      "age": 2
    }
  ]
}

You could use SerializeAsAny[Animal] instead of Animal, but it’s clunky and easy to forget. You can also pass serialize_as_any=True to model_dump or model_dump_json, but again, it’s easy to overlook… In short, not great. On top of that, things completely fall apart when you try to deserialize.

TypeError: Can't instantiate abstract class Animal without an implementation for abstract method 'speak'

Yep, the Animal class is abstract (and even if it weren’t, we would still lose the characteristic values of each subclass without using the “serialize as any” feature).
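
For reference, here is roughly what the SerializeAsAny route looks like. A minimal sketch: the annotation and the serialize_as_any flag are both standard Pydantic, but keep in mind this only fixes serialization, not the deserialization failure above.

from __future__ import annotations

from collections.abc import Sequence
from typing import final

from pydantic import BaseModel, SerializeAsAny

from .animals import Animal


@final
class House(BaseModel):
    # SerializeAsAny dumps each item using its runtime type's fields
    animals: Sequence[SerializeAsAny[Animal]]


# Or, equivalently, opt in per call without touching the annotation:
# house.model_dump_json(serialize_as_any=True)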

However, Pydantic offers a solution through unions, especially with discriminated unions.

house.py
from __future__ import annotations

from collections.abc import Sequence
from typing import final

from pydantic import BaseModel

from .animals import Cat, Dog

@final
class House(BaseModel):
    animals: Sequence[Dog | Cat]

Voilà, magic. It works.

{
  "animals": [
    {
      "name": "Buddy",
      "age": 3,
      "breed": "Golden Retriever"
    },
    {
      "name": "Whiskers",
      "age": 2,
      "color": "Tabby"
    }
  ]
}
House(animals=[Dog(name='Buddy', age=3, breed='Golden Retriever'), Cat(name='Whiskers', age=2, color='Tabby')])
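
For completeness, the discriminated variant adds a Literal tag to each subclass and tells Pydantic which field to dispatch on. A sketch, where the kind field is a hypothetical addition of mine:

from __future__ import annotations

from collections.abc import Sequence
from typing import Annotated, Literal, final, override

from pydantic import BaseModel, Field

from .animals import Animal


@final
class Dog(Animal):
    kind: Literal["dog"] = "dog"  # hypothetical discriminator tag
    breed: str

    @override
    def speak(self) -> str:
        return "Woof!"


@final
class Cat(Animal):
    kind: Literal["cat"] = "cat"  # hypothetical discriminator tag
    color: str

    @override
    def speak(self) -> str:
        return "Meow!"


@final
class House(BaseModel):
    # Pydantic dispatches on "kind" instead of trying each branch in order
    animals: Sequence[Annotated[Dog | Cat, Field(discriminator="kind")]]

It gives better validation errors and constant-time dispatch, but the union still has to be spelled out by hand.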

Super cool, but it actually becomes a huge pain as soon as you want to add a new animal. It creates a bunch of other problems and goes against the Liskov Substitution Principle.

LSP says you should be able to substitute any creature as long as it’s an Animal, but in this setup, strictly speaking, a Dog | Cat is not an Animal (even if Python relies on duck typing, I’m trying to be precise). On top of that, some type checkers can’t resolve the common class, so they start complaining, and you end up having to disable them locally, which is a clear code smell in my opinion.

You could try type narrowing, but that really messes with polymorphism. For example, instead of just calling animal.speak(), you’d need something like this:

from typing import assert_never

for animal in house.animals:
    match animal:
        case Dog():
            print(f"{animal.name} says {animal.speak()}")
        case Cat():
            print(f"{animal.name} says {animal.speak()}")
        case never:
            assert_never(never)

This makes the code more verbose and harder to maintain because it has to be exhaustive. Every time you add a new animal type, you need to update all these match statements. In a few rare cases, it might actually be useful to have sealed classes to handle pattern matching more intelligently… but otherwise, it’s a real headache and pushes you toward hacky solutions. Just because Python lets you do whatever doesn’t mean you should.

JSON Coupling

Well, in reality, this is only half a problem. By default, Pydantic is very much geared toward JSON. It doesn’t natively support other formats. It’s not a huge issue, but it can be annoying in some cases.

You can work around it pretty easily by splitting serialization into two steps. Still, I measured the performance to estimate the hit, given that Pydantic 2 is backed by quite a bit of Rust. The performance loss is pretty minimal (around 7%). Yay.

%timeit dog.model_dump_json()
541 ns ± 6.01 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

%timeit orjson.dumps(dog.model_dump(mode="json"))
578 ns ± 2.32 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Keep in mind that the custom marshalling solution we’ll develop later will have additional overhead on top of this, but it’s a reasonable trade-off for the flexibility and correctness it provides.

Possible Solutions

Luckily, we can still try to find solutions to our problems.

Limit Polymorphism

One approach would be to avoid structures that rely on polymorphism. For example, instead of storing all the animals in a single list, you could have a dedicated list for each class.

house.py
from __future__ import annotations

from collections.abc import Sequence
from typing import final

from pydantic import BaseModel

from .animals import Cat, Dog

@final
class House(BaseModel):
    dogs: Sequence[Dog]
    cats: Sequence[Cat]

At least you know exactly what you’re dealing with, but in this case, you still run into trouble whenever you want to add a new animal…

Another option would be to use generics, which Pydantic handles correctly, but this limits you to a single type, and if you define the generic with the parent class (House[Animal]), you end up in the same situation as before.

house.py
from __future__ import annotations

from collections.abc import Sequence
from typing import final

from pydantic import BaseModel

from .animals import Animal

@final
class House[A: Animal](BaseModel):
    animals: Sequence[A]
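
To make the limitation concrete, here is a hedged sketch of how that generic behaves, reusing the classes from earlier (and assuming the generic House lives in a house module):

from .animals import Animal, Dog
from .house import House

# A homogeneous house is fine, and fully typed:
kennel = House[Dog](animals=[Dog(name="Buddy", age=3, breed="Golden Retriever")])
print(kennel.model_dump())  # breed is preserved

# But parametrizing with the parent class reintroduces the original problem:
shelter = House[Animal](animals=[Dog(name="Rex", age=5, breed="Beagle")])
print(shelter.model_dump())  # breed is gone, exactly as with Sequence[Animal]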

In short, these are clearly not the best approaches. Ideally, we’d like to handle polymorphism transparently.

Tinkering a bit… but not too much

We can try to get by with a bit of tinkering. The idea is to avoid anything too messy or reliant on dark magic: we put something simple in place that keeps us away from dirty hacks, so it should stay clean. Ideally, we also want our tweak not to break Pydantic’s logic, so that it can easily be disabled if we don’t want to use it (or for integration with other libraries, for example).

First, we define a parent class that will manage our different schemas. The logic is pretty straightforward: as soon as a concrete class is defined (abstract and generic classes are skipped), we store a mapping between an identifier and the Pydantic class. This lets us inject a tag into the serialized data and patch the data on the fly to reconstruct the objects correctly.

For the following examples, I’m organizing the code into separate modules (the relative imports like .schema and .animals indicate they’re all part of the same package). You can structure it differently in practice, but this makes the examples clearer.

schema.py
from __future__ import annotations

from inspect import isabstract
from typing import Any, ClassVar, TypedDict, final, override

from pydantic import (
    BaseModel,
    SerializationInfo,
    SerializerFunctionWrapHandler,
    model_serializer,
)


@final
class Context(TypedDict):
    typed: bool


class Schema(BaseModel):
    _SCHEMAS: ClassVar[dict[str, type[Schema]]] = {}

    @override
    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)

        key = cls._key()

        if key in cls._SCHEMAS:
            raise ValueError(f"Schema '{key}' is already registered.")

        # Skip abstract classes and generic classes
        if isabstract(cls) or getattr(cls, "__parameters__", False):
            return

        cls._SCHEMAS[key] = cls

    @model_serializer(mode="wrap")
    def _inject_type(
        self,
        next_: SerializerFunctionWrapHandler,
        info: SerializationInfo[Context | None],
    ) -> Any:
        default = next_(self)

        if info.context is not None and info.context.get("typed", False):
            return {
                "__type__": self._key(),
                **default,
            }

        return default

    @classmethod
    def _key(cls) -> str:
        # Opinionated default,
        # but with opportunity for customization for backward compatibility
        return f"{cls.__module__}.{cls.__qualname__}"

    @final
    @classmethod
    def schemas(cls) -> dict[str, type[Schema]]:
        return cls._SCHEMAS.copy()

The adaptation for our existing classes is minimal. Basically, we just inherit from Schema.

animals.py
from __future__ import annotations

import abc
from collections.abc import Sequence
from typing import final, override

from pydantic import BaseModel

from .schema import Schema


class Animal(Schema, abc.ABC):
    name: str
    age: int

    @abc.abstractmethod
    def speak(self) -> str:
        raise NotImplementedError


@final
class Dog(Animal):
    breed: str

    @override
    def speak(self) -> str:
        return "Woof!"


@final
class Cat(Animal):
    color: str

    @override
    def speak(self) -> str:
        return "Meow!"


@final
class Owner(BaseModel):
    name: str


@final
class House(Schema):
    owner: Owner  # Just to ensure that our logic still works with regular Pydantic classes
    animals: Sequence[Animal]
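
Before wiring up the marshaller, a quick sanity check helps confirm the registry and the opt-in behavior. A sketch; the exact registry keys depend on where the modules live (the marshalled output further down shows __main__.* keys, presumably from running everything as a single script):

from .animals import Dog  # importing the module registers all concrete schemas
from .schema import Schema

# Abstract Animal is skipped, and Owner is a plain BaseModel, so neither is registered
print(Schema.schemas().keys())

dog = Dog(name="Buddy", age=3, breed="Golden Retriever")
print(dog.model_dump(context={"typed": True}))  # includes "__type__"
print(dog.model_dump())  # no context: plain Pydantic behavior, tweak disabled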

That’s just the first step. Next, we handle serialization by introducing an intermediate step that converts our schemas into dictionaries; this intermediate representation is what actually gets serialized. The following class handles our specific logic and patches the data when needed.

marshaller.py
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any, final

from .schema import Schema, Context

@final
class Marshaller:
    def __init__(self, *, typed: bool = True, aliased: bool = True) -> None:
        self.typed = typed
        self.aliased = aliased
        self._schemas = Schema.schemas()

    @property
    def context(self) -> Context:
        return {
            "typed": self.typed,
        }

    def marshal(self, schema: Schema) -> dict[str, Any]:
        return schema.model_dump(
            mode="json",
            context=self.context,
            by_alias=self.aliased,
            serialize_as_any=True,
        )

    def _validate(self, data: Any, *, key: str) -> Schema:
        return self._schemas[key].model_validate(
            data,
            context=self.context,
            by_alias=self.aliased,
        )

    def _patch(self, data: Any) -> Any:
        match data:
            case str() | int() | float() | bool() | None:
                return data
            case Sequence():
                return [self._patch(item) for item in data]
            case Mapping():
                patched = {key: self._patch(value) for key, value in data.items()}
                key = patched.get("__type__")

                # If the key is not registered, then let Pydantic handle it
                if key not in self._schemas:
                    return patched

                # Otherwise, we patch with the validated object
                return self._validate(patched, key=key)
            case _:
                return data

    def unmarshal(self, data: dict[str, Any]) -> Schema:
        key = data.get("__type__")

        if key not in self._schemas:
            raise ValueError(f"Unknown schema type: {key!r}")

        return self._validate({k: self._patch(v) for k, v in data.items()}, key=key)

Here’s what the dictionaries look like when we apply the marshal method. With a few tweaks, we could adapt the Marshaller to handle other Pydantic models as well, but I’m a bit lazy; this whole article is just a simple proof of concept.

{
  "__type__": "__main__.House",
  "owner": {
    "name": "Alice"
  },
  "animals": [
    {
      "__type__": "__main__.Dog",
      "name": "Buddy",
      "age": 3,
      "breed": "Golden Retriever"
    },
    {
      "__type__": "__main__.Cat",
      "name": "Whiskers",
      "age": 2,
      "color": "Tabby"
    }
  ]
}
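
For context, the dictionary above can be produced with something along these lines (json is used here only for pretty-printing):

import json

from .animals import Cat, Dog, House, Owner
from .marshaller import Marshaller

house = House(
    owner=Owner(name="Alice"),
    animals=[
        Dog(name="Buddy", age=3, breed="Golden Retriever"),
        Cat(name="Whiskers", age=2, color="Tabby"),
    ],
)

marshaller = Marshaller(typed=True)
print(json.dumps(marshaller.marshal(house), indent=2))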

We can then define multiple serialization methods (JSON, YAML, MessagePack, etc.). Below is an example for JSON and YAML.

serializers.py
from __future__ import annotations

import abc
from typing import Any, Literal, cast, final, override

import orjson
import yaml


class Serializer(abc.ABC):
    @abc.abstractmethod
    def serialize(self, schema: dict[str, Any]) -> bytes:
        raise NotImplementedError

    @abc.abstractmethod
    def deserialize(self, schema: bytes) -> dict[str, Any]:
        raise NotImplementedError


@final
class JSONSerializer(Serializer):
    def __init__(self, *, indent: Literal[2] | None = None) -> None:
        self.indent = indent

    @override
    def serialize(self, schema: dict[str, Any]) -> bytes:
        options = 0

        if self.indent == 2:
            options |= orjson.OPT_INDENT_2

        return orjson.dumps(schema, option=options)

    @override
    def deserialize(self, schema: bytes) -> dict[str, Any]:
        obj = orjson.loads(schema)

        if not isinstance(obj, dict):
            raise TypeError("Deserialized schema is not a dictionary.")

        return cast("dict[str, Any]", obj)


@final
class YAMLSerializer(Serializer):
    def __init__(self, *, indent: int | None = None) -> None:
        self.indent = indent

    @override
    def serialize(self, schema: dict[str, Any]) -> bytes:
        return yaml.dump(schema, indent=self.indent, encoding="utf-8")

    @override
    def deserialize(self, schema: bytes) -> dict[str, Any]:
        obj = yaml.load(schema, Loader=yaml.SafeLoader)

        if not isinstance(obj, dict):
            raise TypeError("Deserialized schema is not a dictionary.")

        return cast("dict[str, Any]", obj)

Serialization could be fully customized since we’re producing bytes. Right now it’s very basic, but we could tweak it to compress the class key into a unique identifier, for example by hashing it. There are many possibilities.
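
For instance, since _key was designed as the customization point, a subclass could ship a short digest instead of the full dotted path. A sketch; the sha256 choice and the 12-character truncation are arbitrary:

import hashlib

from .schema import Schema


class CompactSchema(Schema):
    @classmethod
    def _key(cls) -> str:
        # Hash the dotted path into a short, stable identifier
        path = f"{cls.__module__}.{cls.__qualname__}"
        return hashlib.sha256(path.encode()).hexdigest()[:12]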

Finally, we make life a bit easier with a class that handles everything end to end. The icing on the cake.

codec.py
from __future__ import annotations

from typing import final

from .marshaller import Marshaller
from .serializers import Serializer
from .schema import Schema

@final
class Codec:
    def __init__(self, *, marshaller: Marshaller, serializer: Serializer) -> None:
        self.marshaller = marshaller
        self.serializer = serializer

    def encode(self, schema: Schema) -> bytes:
        return self.serializer.serialize(self.marshaller.marshal(schema))

    def decode(self, data: bytes) -> Schema:
        return self.marshaller.unmarshal(self.serializer.deserialize(data))

Usage is pretty straightforward.

from __future__ import annotations

from .animals import Cat, Dog, House, Owner
from .codec import Codec
from .marshaller import Marshaller
from .serializers import JSONSerializer

dog = Dog(name="Buddy", age=3, breed="Golden Retriever")
cat = Cat(name="Whiskers", age=2, color="Tabby")

owner = Owner(name="Alice")
house = House(owner=owner, animals=[dog, cat])

codec = Codec(
    marshaller=Marshaller(typed=True),
    serializer=JSONSerializer(indent=2),
)

print(codec.decode(codec.encode(house)))

Yay. Everything works.

House(owner=Owner(name='Alice'), animals=[Dog(name='Buddy', age=3, breed='Golden Retriever'), Cat(name='Whiskers', age=2, color='Tabby')])
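
And since the Codec only depends on the Serializer interface, swapping formats is a one-line change. A quick sketch reusing the same house:

from .serializers import YAMLSerializer

yaml_codec = Codec(
    marshaller=Marshaller(typed=True),
    serializer=YAMLSerializer(indent=2),
)

assert yaml_codec.decode(yaml_codec.encode(house)) == house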

Conclusion

Pydantic’s discriminated union approach to polymorphism forces you to explicitly list all subtypes, violating the Liskov Substitution Principle and creating a maintenance burden every time you extend your class hierarchy. The lightweight marshalling layer presented here solves this by injecting type information during serialization, preserving polymorphism while keeping Pydantic’s validation strengths intact. The overhead is minimal (comparable to the 7% from two-step serialization) and you gain format flexibility beyond JSON.

The trade-offs are straightforward. You lose precise type information during deserialization (everything comes back as Schema), and a production system would need better error handling and potentially versioning support.

One solution I’ve deliberately left aside would be to define a custom type, similar to SerializeAsAny, to adapt the Pydantic schema and automatically construct a union of subclasses under the hood. It starts to get really hacky, but it would be the closest to staying within the Pydantic ecosystem. The catch is that you must remember to add this extra annotation everywhere…
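
Just to give an idea of the shape it could take, such a type might hook into __get_pydantic_core_schema__ and build the union dynamically. A very rough, untested sketch; the class name is hypothetical and it only walks direct subclasses:

from __future__ import annotations

from inspect import isabstract
from typing import Any

from pydantic import GetCoreSchemaHandler
from pydantic_core import CoreSchema, core_schema


class AsSubclassUnion:
    """Hypothetical marker: validate a field as the union of concrete subclasses."""

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        # A real version would recurse through the whole subclass tree
        concrete = [s for s in source.__subclasses__() if not isabstract(s)]
        return core_schema.union_schema([handler(s) for s in concrete])


# Hypothetical usage: animals: Sequence[Annotated[Animal, AsSubclassUnion]]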

Alternative libraries like msgspec offer tagged unions with less boilerplate and might be worth considering for greenfield projects. For existing Pydantic codebases, the marshalling approach provides a practical path forward without a rewrite. The key takeaway is that polymorphism and type safety don’t have to be mutually exclusive.
