Pydantic: Getting started

Introduction

In this tutorial we will learn how to get started with pydantic, a data validation library for Python based on type annotations.

Pydantic enforces type hints at runtime and exposes user-friendly errors when data is invalid [1]. It allows us to define data models for validation using canonical Python, which makes it very easy to get started without having to learn a new schema definition language [1].

The working principle of the library is very simple: we define classes with type annotations that reflect how the data should be structured. Then, when we receive some data that we don't trust (for example, the JSON body of an HTTP request), we simply use the classes we defined to parse and validate it.

One advantage of using pydantic is that the parsed data structures will be instances of the classes we defined. This makes it easier to reason about our data, gives us nice features such as auto-completion in IDEs, and also allows us to use tools such as mypy for static type checking.
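As a minimal sketch of this principle, assuming the untrusted input arrives as a JSON string (the body variable below is a hypothetical stand-in for an HTTP request body), we can decode it with the standard json module and hand the resulting dictionary to a model. The Person model anticipates the one defined in the next section.

```python
import json

from pydantic import BaseModel


class Person(BaseModel):
    age: int
    name: str
    is_married: bool


# Hypothetical untrusted input, e.g. the raw body of an HTTP request.
body = '{"name": "John", "age": 20, "is_married": false}'

# Decode the JSON text, then validate the resulting dict against the model.
person = Person(**json.loads(body))

# The result is an instance of our own class, with typed attributes.
print(isinstance(person, Person))  # True
print(person.name, person.age)     # John 20
```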

For more information about type hints in Python, you can check this cheat sheet.

The code from this tutorial was tested with Python v3.7.2 and pydantic v1.9.0, on Windows.

A simple example

We will start by importing the BaseModel class, which is the class from which all our data models should inherit. We will also import the ValidationError exception, which we will use for error handling, in case the parsing and validation of the data into our model fails.

from pydantic import BaseModel, ValidationError

After this, we will define our model class. For illustration purposes, we will define an example Person class with three fields:

  • age: an integer with the age of the person. The type hint should be int.
  • name: a string with the name of the person. The type hint should be str.
  • is_married: a Boolean indicating if the person is married or not. The type hint should be bool.

As already mentioned, our Person class should inherit from the BaseModel class we have imported.

class Person(BaseModel):
    age: int
    name: str
    is_married: bool

Now that we have defined the model for our Person, we are going to define a dictionary containing the data that we want to parse and validate against our model. In this example, we are going to define a dictionary with valid data.

data = {
    'name': 'John',
    'age': 20,
    'is_married': False
}

Then we will create an object of the Person class, passing the fields of our data dictionary as keyword arguments to the constructor. To keep the code compact, we will use the dictionary unpacking operator (double asterisk).

person = Person(**data)
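Unpacking is only a shorthand: the line above is equivalent to passing each field as an explicit keyword argument. A quick check of this, assuming the same Person model and data dictionary:

```python
from pydantic import BaseModel


class Person(BaseModel):
    age: int
    name: str
    is_married: bool


data = {'name': 'John', 'age': 20, 'is_married': False}

# Dictionary unpacking...
unpacked = Person(**data)
# ...is the same as writing out the keyword arguments by hand.
explicit = Person(name='John', age=20, is_married=False)

print(unpacked == explicit)  # True
```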

Although the dictionary we are passing conforms to our model specification, we need to take into consideration that this may not be true when receiving external data. If the data is invalid, a ValidationError will be raised, so we will enclose the instantiation of our Person object in a try except block.

In our except block, we will simply print the exception.

try:
    person = Person(**data)
    # use the data

except ValidationError as e:
    print(e)
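As a side note, in pydantic v1.9 ValidationError is a subclass of Python's built-in ValueError, so a generic handler for ValueError will also receive validation failures. The sketch below checks this; catching the more specific ValidationError, as above, is still the clearer choice.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    age: int
    name: str
    is_married: bool


print(issubclass(ValidationError, ValueError))  # True

try:
    # Invalid age and missing name, so validation fails.
    Person(age='test', is_married=False)
except ValueError as e:
    # The generic handler still receives the pydantic exception.
    caught = e

print(type(caught).__name__)  # ValidationError
```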

For illustration purposes, we will now convert our model object back to a dictionary by calling the dict method. Although this method accepts some optional arguments to customize the conversion, for this test we will pass none, so we get the default behavior. We will print the resulting dictionary.

print(person.dict())
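Note that the keys of the resulting dictionary follow the order in which the fields are declared in the model, not the order of the input dictionary, so age comes first. The sketch below also tries two of the optional arguments mentioned above, include and exclude, which respectively keep only or drop the listed fields (assuming pydantic v1.9, as in this tutorial).

```python
from pydantic import BaseModel


class Person(BaseModel):
    age: int
    name: str
    is_married: bool


person = Person(**{'name': 'John', 'age': 20, 'is_married': False})

# Keys follow the field declaration order of the model.
print(person.dict())                  # {'age': 20, 'name': 'John', 'is_married': False}
# Keep only the listed fields...
print(person.dict(include={'name'}))  # {'name': 'John'}
# ...or drop them.
print(person.dict(exclude={'age'}))   # {'name': 'John', 'is_married': False}
```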

The complete code is available below.

from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    age: int
    name: str
    is_married: bool

data = {
    'name': 'John',
    'age': 20,
    'is_married': False
}

try:
    person = Person(**data)
    print(person.dict())

except ValidationError as e:
    print(e)

To test the code, simply run it in a tool of your choice. In my case, I’m using PyCharm, a Python IDE.

You should get a result similar to figure 1. As can be seen, we did not get any exception, meaning that the model parsing and validation occurred without any issue. We also obtained the dictionary representation of our model, as expected.

Figure 1 – Dictionary obtained from the Person model object.

We will now make some small changes to render the data invalid according to our model definition:

  • We will remove the name property from the dictionary
  • We will set the age property to an arbitrary string

data = {
    'age': "test",
    'is_married': False
}

If we run the code again after replacing the data variable with the dictionary above, we should get the output shown in figure 2. As can be seen, the exception indicates that the age is not a valid integer and that the name field is missing.

Figure 2 – Validation errors thrown by Pydantic.
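It is worth noting that, by default, pydantic does not reject every value whose Python type differs from the annotation: it first tries to convert it. A string such as "20" is coerced to the integer 20, while "test" cannot be converted and produces an error. A small sketch of both cases, which also checks that the invalid dictionary above yields one error per problematic field:

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    age: int
    name: str
    is_married: bool


# A numeric string is converted to an int rather than rejected.
coerced = Person(name='John', age='20', is_married=False)
print(coerced.age, type(coerced.age))  # 20 <class 'int'>

# The invalid data from above: age is not numeric and name is missing.
try:
    Person(**{'age': 'test', 'is_married': False})
except ValidationError as e:
    # One entry per failing field, in field declaration order.
    locations = [error['loc'] for error in e.errors()]

print(locations)  # [('age',), ('name',)]
```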

In our case, we have shown a very simple example where we just printed the dictionary representation of our Person object. In a real scenario, however, we would probably use that object for other operations. From a development experience perspective, if we are using an IDE (in my case, PyCharm), these models also give us some nice features, such as auto-completion. This may seem a small detail, but when working with large and complex code bases, it is much more convenient than having to find out which keys may exist in a dictionary object.

Figure 3 – Accessing a field of the Person object in PyCharm.

Handling errors

In the previous section, when we used an invalid data object, we handled the errors simply by printing the exception that was raised. Nonetheless, as described in the documentation, the exception has other methods that we can use to obtain the validation errors.

The errors method returns a list of errors, where each error is a dictionary with the following keys:

  • loc: the error location
  • type: the type of the error
  • msg: a human-readable message explaining the error

This representation makes it easy to programmatically treat the errors. One such example could be mapping this format to custom error messages we want to show in our application or even translating them to other languages.
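A minimal sketch of that idea, using only the standard library: the hypothetical to_friendly_messages function below looks up a custom (here, Portuguese) message for each field from the error's loc, and falls back to pydantic's own msg when no custom message exists. The sample errors list is written by hand in the format pydantic v1.9 produces; the exact msg and type strings may differ in other versions.

```python
from typing import Dict, List


def to_friendly_messages(errors: List[dict], messages: Dict[str, str]) -> List[str]:
    """Hypothetical helper: map pydantic-style error dicts to custom messages."""
    result = []
    for error in errors:
        # Join the location path, e.g. ('address', 'building') -> 'address.building'.
        field = '.'.join(str(part) for part in error['loc'])
        # Prefer our own message; fall back to pydantic's msg otherwise.
        result.append(messages.get(field, f"{field}: {error['msg']}"))
    return result


# Hand-written sample in the pydantic v1.9 errors() format (illustrative only).
sample_errors = [
    {'loc': ('age',), 'msg': 'value is not a valid integer', 'type': 'type_error.integer'},
    {'loc': ('name',), 'msg': 'field required', 'type': 'value_error.missing'},
]

# Hypothetical Portuguese messages, keyed by field location.
pt_messages = {'age': 'A idade deve ser um inteiro.'}

for message in to_friendly_messages(sample_errors, pt_messages):
    print(message)
```

Keying the lookup on loc rather than on type keeps the custom messages independent of pydantic's internal error type names.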

In the example below, we will pass an invalid data object to our model and then call the errors method in the exception handling block. After printing the full list of errors, we will print the first error and then print each property of the second error individually.

from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    age: int
    name: str
    is_married: bool

data = {
    'age': "test",
    'is_married': False
}

try:
    person = Person(**data)
    print(person.dict())

except ValidationError as e:
    errors = e.errors()
    print(errors)

    print()
    print(errors[0])

    print()
    print(errors[1]["loc"])
    print(errors[1]["msg"])
    print(errors[1]["type"])

Upon running the code, we get the result shown below in figure 4. As can be seen, we have obtained the expected errors structure.

Figure 4 – Validation errors printed to the console.

The json method returns the errors serialized as a JSON string. In the snippet below, which reuses the Person model and the invalid data dictionary from the previous example, we print this JSON representation in the exception block.

try:
    person = Person(**data)
except ValidationError as e:
    print(e.json())

Figure 5 shows the output in the console.

Figure 5 – JSON representation of the validation errors.
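Since json returns a string, we can decode it again with the standard json module if we need the data back as Python objects, for example before embedding it in an HTTP response. A sketch, assuming the same invalid data as before; note that loc comes back as a list, because JSON has no tuple type:

```python
import json

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    age: int
    name: str
    is_married: bool


try:
    Person(**{'age': 'test', 'is_married': False})
except ValidationError as e:
    payload = e.json()

print(type(payload).__name__)  # str

# Decoding the string gives back a list of error dictionaries.
decoded = json.loads(payload)
print(decoded[0]['loc'])       # ['age']
```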

Nested models and lists

The code in this section will be similar to the one from the first section, so we will focus on the new parts. We will add an import for the List type from the typing module, so we can add lists to our pydantic models.

from typing import List

We will now create a new model called Address. It will contain the following properties, for illustration purposes:

  • street: a string for the street where the person lives
  • building: an integer for the number of the building where the person lives

class Address(BaseModel):
    street: str
    building: int

Now we will enhance our Person model to contain two new fields:

  • address: an object of the Address class we have just defined, representing the address of the person.
  • languages: a list of strings holding the codes of the languages the person speaks.

class Person(BaseModel):
    age: int
    name: str
    is_married: bool
    address: Address
    languages: List[str]

We will also add the new fields to our data dictionary, so that it is valid according to our updated Person model.

data = {
    'age': 10,
    'name': 'John',
    'is_married': False,
    'address': {
        'street': 'st street',
        'building': 10
    },
    'languages':['pt-pt', 'en-us']
}

The complete code is shown below. As before, we will try to parse the data dictionary to our Person model and then print its dictionary representation.

from typing import List
from pydantic import BaseModel, ValidationError

class Address(BaseModel):
    street: str
    building: int

class Person(BaseModel):
    age: int
    name: str
    is_married: bool
    address: Address
    languages: List[str]


data = {
    'age': 10,
    'name': 'John',
    'is_married': False,
    'address': {
        'street': 'st street',
        'building': 10
    },
    'languages':['pt-pt', 'en-us']
}

try:
    person = Person(**data)
    print(person.dict())

except ValidationError as e:
    print("Exception as str:")
    print(e)
    print("Exception as json:")
    print(e.json())

Upon running the code, you should get a result similar to figure 6. As can be seen, the data was correctly validated and parsed to the Person model.

Figure 6 – Dictionary representation of the updated Person instance.
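As with the flat model, the nested dictionary is not kept as a plain dict: pydantic validates it into an Address instance, so its fields can be accessed as attributes. A short sketch with the same models and data:

```python
from typing import List

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    building: int


class Person(BaseModel):
    age: int
    name: str
    is_married: bool
    address: Address
    languages: List[str]


person = Person(**{
    'age': 10,
    'name': 'John',
    'is_married': False,
    'address': {'street': 'st street', 'building': 10},
    'languages': ['pt-pt', 'en-us'],
})

# The nested dictionary became an Address object.
print(isinstance(person.address, Address))  # True
print(person.address.building)              # 10
print(person.languages[0])                  # pt-pt
```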

To finish, we will introduce some errors in the data:

  • We will set the building property to an arbitrary string
  • We will set the first element of the languages list to an object (an empty dictionary)

data = {
    'age': 10,
    'name': 'John',
    'is_married': False,
    'address': {
        'street': 'st street',
        'building': 'test'
    },
    'languages':[{}, 'en-us']
}

Upon running the previous code with the updated data dictionary, we should get a result similar to figure 7. As can be seen, both errors were listed. It is also noteworthy that the error location (loc) contains the full path to the nested address property that has the error. The same applies to the error in the languages list, whose location includes index 0, the position of the invalid element.

Figure 7 – Errors in data parsed to the updated Person model.
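Because each loc is a tuple describing the path from the top-level model down to the failing value, it can be turned into a readable path string. A sketch, assuming the invalid data above; the dotted format is just one possible choice:

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    street: str
    building: int


class Person(BaseModel):
    age: int
    name: str
    is_married: bool
    address: Address
    languages: List[str]


data = {
    'age': 10,
    'name': 'John',
    'is_married': False,
    'address': {'street': 'st street', 'building': 'test'},
    'languages': [{}, 'en-us'],
}

try:
    Person(**data)
except ValidationError as e:
    # ('address', 'building') -> 'address.building', ('languages', 0) -> 'languages.0'
    paths = ['.'.join(str(part) for part in error['loc']) for error in e.errors()]

print(paths)  # ['address.building', 'languages.0']
```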

References

[1] https://pydantic-docs.helpmanual.io/
