Write DRY data models with partials and Pydantic

Introduction
Pydantic is an incredibly powerful library for data modeling and validation that should become a standard part of your data pipelines.
Part of what makes it so powerful is that it can easily accommodate nested data formats like JSON.
For a quick refresher, Pydantic is a Python library that lets you define a data model in a Pythonic way, and use that model to validate data inputs, mainly through type hints.
Standard-library type hints are not enforced at runtime, whereas Pydantic enforces its type hints whenever a model is created and lets you customize them with extra constraints.
Partials, on the other hand, let you preload a function call with specific args and kwargs, which is particularly helpful if you're going to call the same function with the same parameters many times.
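As a quick illustration of what "preloading" a call means, here is a minimal sketch using only the standard library. The describe function and the character names are hypothetical and exist only for this example.

```python
import functools


def describe(name, level, greeting="Hello"):
    """Hypothetical helper used only to illustrate partials."""
    return f"{greeting}, {name} (level {level})"


# Pin the keyword argument once instead of repeating it at every call site.
hail = functools.partial(describe, greeting="Hail")

first = hail("Aria", 3)
second = hail("Borin", 7)

# A pinned keyword argument can still be overridden at call time.
third = hail("Cass", 1, greeting="Welcome")
```

The partial behaves like the original function with greeting already filled in, which is the property we lean on later with pydantic.Field.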
In a previous article, I talked about using Enums to define valid string inputs for validating.
We can take this further by nesting Pydantic data models inside other Pydantic data models.
The Data
I created our sample data using various random name generators. The dataset represents characters in a Dungeons and Dragons-type game.
Let's start with an inspection of our data:

As we can see, rather than a flat data structure, we now have data within data.
Inspecting our previous data model we can see that we now have to make some changes to accommodate the nested data structures:
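from datetime import datetime
import pydantic

# GenderEnum, RaceEnum and ClassEnum are the enums from the previous article
# (they are also defined in the full listing at the end of this article)
class RpgCharacterModel(pydantic.BaseModel):
    CREATION_DATE: datetime
    NAME: str = pydantic.Field(...)
    GENDER: GenderEnum
    RACE: RaceEnum = pydantic.Field(...)
    CLASS: ClassEnum = pydantic.Field(...)
    HOME: str
    GUILD: str
    PAY: int = pydantic.Field(..., ge=1, le=500)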
Visualizing the problem
One tool I find helpful for exploring nested data structures is JSONCrack, which provides a fantastic visualization of JSON data:

We can see that four other models sit underneath our top-level model and support it.
DRY defining our model:
Using the fields we can easily see in JSONCrack, we can make our first pass at the model like this:
import pydantic

class RpgRaceModelBasic(pydantic.BaseModel):
    RACE_ID: int = pydantic.Field(..., ge=10, le=99)
    RACE: RaceEnum = pydantic.Field(...)
    HP_MODIFIER_PER_LEVEL: int = pydantic.Field(..., ge=-6, le=6)
    STR_MODIFIER: int = pydantic.Field(..., ge=-6, le=6, description="Character's strength")
    CON_MODIFIER: int = pydantic.Field(..., ge=-6, le=6, description="Character's constitution")
    DEX_MODIFIER: int = pydantic.Field(..., ge=-6, le=6, description="Character's dexterity")
    INT_MODIFIER: int = pydantic.Field(..., ge=-6, le=6, description="Character's intelligence")
    WIS_MODIFIER: int = pydantic.Field(..., ge=-6, le=6, description="Character's wisdom")
    CHR_MODIFIER: int = pydantic.Field(..., ge=-6, le=6, description="Character's charisma")
pydantic.Field() lets us specify additional parameters for our model beyond type hints.
... indicates that it is a required field.
ge indicates that the field must be greater than or equal to this value.
le indicates that the field must be less than or equal to this value.
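To see those three pieces in action, here is a minimal sketch with a hypothetical single-field model (ModifierModel and is_valid are made up for this example and are not part of the article's data model). It only checks whether a payload passes validation.

```python
import pydantic


class ModifierModel(pydantic.BaseModel):
    # Hypothetical one-field model: required, and bounded to -6..6 inclusive
    STR_MODIFIER: int = pydantic.Field(..., ge=-6, le=6)


def is_valid(payload):
    """Return True if the payload passes validation, False otherwise."""
    try:
        ModifierModel(**payload)
        return True
    except pydantic.ValidationError:
        return False


in_range = is_valid({"STR_MODIFIER": 2})      # inside the bounds
on_boundary = is_valid({"STR_MODIFIER": -6})  # ge is inclusive
too_high = is_valid({"STR_MODIFIER": 7})      # violates le=6
missing = is_valid({})                        # violates the required marker (...)
```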
However, there is also a lot of repeated code in the fields defining our attribute modifiers: seven of them must be between -6 and 6. Consequently, changing that range in the future would mean editing seven lines of code.
We can simplify our definitions using a partial function from the functools library. A partial lets us pin parameters in place for a function, which is perfect for a situation like this, where we call the same function with the same arguments over and over again:
import functools
import pydantic

# Pin the arguments shared by every ID field and every attribute modifier
id_partial = functools.partial(pydantic.Field, ..., ge=10, le=99)
attribute_partial = functools.partial(pydantic.Field, ..., ge=-6, le=6)

class RpgRaceModelDry(pydantic.BaseModel):
    # The partial already supplies ..., ge and le, so we call it with no arguments
    RACE_ID: int = id_partial()
    RACE: RaceEnum = pydantic.Field(...)
    HP_MODIFIER_PER_LEVEL: int = attribute_partial()
    STR_MODIFIER: int = attribute_partial(description="Character's strength")
    CON_MODIFIER: int = attribute_partial(description="Character's constitution")
    DEX_MODIFIER: int = attribute_partial(description="Character's dexterity")
    INT_MODIFIER: int = attribute_partial(description="Character's intelligence")
    WIS_MODIFIER: int = attribute_partial(description="Character's wisdom")
    CHR_MODIFIER: int = attribute_partial(description="Character's charisma")
We now have a single place we can make changes to our code if we ever need to change the range of the attribute modifiers. This is a much cleaner and DRYer way of writing code.
Also, notice that we can still pass other parameters to the partial function, in this case description.
Partials are also flexible and let you override keyword arguments you've already passed to them.
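If you're curious what the partial is actually holding on to, the sketch below inspects it and then confirms that every field built from it shares the same bounds. The TwoModifiers model and the failing_fields helper are hypothetical and exist only for this illustration.

```python
import functools

import pydantic

attribute_partial = functools.partial(pydantic.Field, ..., ge=-6, le=6)

# A partial object remembers exactly what was pinned to it.
pinned_args = attribute_partial.args        # the positional ... (Ellipsis)
pinned_kwargs = attribute_partial.keywords  # the ge/le bounds


class TwoModifiers(pydantic.BaseModel):
    # Hypothetical cut-down model with just two of the modifiers
    STR_MODIFIER: int = attribute_partial(description="Character's strength")
    DEX_MODIFIER: int = attribute_partial(description="Character's dexterity")


def failing_fields(payload):
    """Return the sorted names of the fields that failed validation."""
    try:
        TwoModifiers(**payload)
        return []
    except pydantic.ValidationError as e:
        return sorted(err["loc"][0] for err in e.errors())


both_bad = failing_fields({"STR_MODIFIER": 7, "DEX_MODIFIER": -7})
both_ok = failing_fields({"STR_MODIFIER": 6, "DEX_MODIFIER": -6})
```

Changing ge or le in the one attribute_partial definition would change the bounds for both fields at once.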
Finishing out the model:
We now have to define our other three models as well as a couple of enums to go with them:
import enum
import pydantic

class AttributeEnum(enum.Enum):
    STR = 'STR'
    DEX = 'DEX'
    CON = 'CON'
    INT = 'INT'
    WIS = 'WIS'
    CHR = 'CHR'

class AlignmentEnum(enum.Enum):
    LAWFUL_GOOD = 'Lawful Good'
    LAWFUL_NEUTRAL = 'Lawful Neutral'
    LAWFUL_EVIL = 'Lawful Evil'
    NEUTRAL_GOOD = 'Neutral Good'
    TRUE_NEUTRAL = 'True Neutral'
    NEUTRAL_EVIL = 'Neutral Evil'
    CHAOTIC_GOOD = 'Chaotic Good'
    CHAOTIC_NEUTRAL = 'Chaotic Neutral'
    CHAOTIC_EVIL = 'Chaotic Evil'

class RpgClassModel(pydantic.BaseModel):
    CLASS_ID: int = id_partial()
    CLASS: ClassEnum = pydantic.Field(...)
    PRIMARY_CLASS_ATTRIBUTE: AttributeEnum = pydantic.Field(...)
    SPELLCASTER: bool = pydantic.Field(...)

class RpgPolityModel(pydantic.BaseModel):
    KINGDOM_ID: int = id_partial(ge=100, le=999)
    POLITY: str = pydantic.Field(...)
    TYPE: str = pydantic.Field(...)

class RpgGuildModel(pydantic.BaseModel):
    GUILD_ID: int = id_partial(ge=100, le=999)
    GUILD: str = pydantic.Field(...)
    ALIGNMENT: AlignmentEnum
    WEEKLY_DUES: int = pydantic.Field(..., ge=10, le=100)
Pay particular attention to KINGDOM_ID and GUILD_ID. We overrode the ge and le arguments in the partial function, which is fine: the call still preserves the ..., which indicates that it's a required field.
By calling the partial function on our ID columns we never have to worry about forgetting to make them required fields.
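Here is a small sketch of that override behavior. PolityIdOnly and error_fields are hypothetical names for this example; the point is that the new bounds replace the pinned ones while the field stays required.

```python
import functools

import pydantic

id_partial = functools.partial(pydantic.Field, ..., ge=10, le=99)


class PolityIdOnly(pydantic.BaseModel):
    # Hypothetical one-field model: ge/le are overridden, ... is kept
    KINGDOM_ID: int = id_partial(ge=100, le=999)


def error_fields(payload):
    """Return the sorted names of the fields that failed validation."""
    try:
        PolityIdOnly(**payload)
        return []
    except pydantic.ValidationError as e:
        return sorted(err["loc"][0] for err in e.errors())


three_digit_id = error_fields({"KINGDOM_ID": 150})  # fits the overridden range
two_digit_id = error_fields({"KINGDOM_ID": 50})     # only fits the original 10..99 range
missing_id = error_fields({})                       # the field is still required
```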
Defining our top-level model
Now that all the supporting models have been built, we can define our top-level model, which looks like this:
from datetime import datetime
from typing import Optional
import pydantic

class RpgCharacterModel(pydantic.BaseModel):
    CREATION_DATE: datetime
    NAME: str = pydantic.Field(...)
    GENDER: GenderEnum
    RACE_NESTED: RpgRaceModelDry = pydantic.Field(...)
    CLASS_NESTED: RpgClassModel = pydantic.Field(...)
    # Optional with a default of None: the nested record may be left out
    HOME_NESTED: Optional[RpgPolityModel] = None
    GUILD_NESTED: Optional[RpgGuildModel] = None
    PAY: int = pydantic.Field(..., ge=1, le=500)
Focus on HOME_NESTED and GUILD_NESTED:
They are annotated as Optional with a default of None, so they aren't required in our top-level model, even though the models behind them have required fields. (A bare annotation such as HOME_NESTED: RpgPolityModel, with no default, would make the field required.) That means that if you pass data to the field, it must conform to the nested model, but if you don't pass any data, the record is still considered valid.
In effect, you can pass a valid model or nothing.
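The "valid model or nothing" behavior can be sketched with a cut-down pair of models. This assumes the nested field is annotated Optional[...] with a default of None; GuildSketch, CharacterSketch and the sample values are hypothetical stand-ins, not the full models from this article.

```python
from typing import Optional

import pydantic


class GuildSketch(pydantic.BaseModel):
    # Hypothetical nested model with two required fields
    GUILD_ID: int = pydantic.Field(..., ge=100, le=999)
    GUILD: str = pydantic.Field(...)


class CharacterSketch(pydantic.BaseModel):
    NAME: str = pydantic.Field(...)
    # Optional nested model: may be omitted, but must be valid if present
    GUILD_NESTED: Optional[GuildSketch] = None


def top_level_error_fields(payload):
    """Return the sorted top-level field names that failed validation."""
    try:
        CharacterSketch(**payload)
        return []
    except pydantic.ValidationError as e:
        return sorted({err["loc"][0] for err in e.errors()})


# Nothing passed for the nested field: still valid.
no_guild = CharacterSketch(NAME="Aria")

# A conforming nested record is parsed into a GuildSketch instance.
with_guild = CharacterSketch(
    NAME="Borin", GUILD_NESTED={"GUILD_ID": 101, "GUILD": "Ironhands"}
)

# A nested record missing its required GUILD field is rejected.
broken_guild_errors = top_level_error_fields(
    {"NAME": "Cass", "GUILD_NESTED": {"GUILD_ID": 101}}
)
```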
Conclusion
Combining functools.partial() with Pydantic data models can make your code cleaner and easier to understand, and help ensure that you're handling invalid data in a scalable way.
In our example, we only built a single level of nesting, but you can nest repeatedly to manage any gnarly JSON object you encounter.
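As a rough sketch of deeper nesting, the hypothetical models below go two levels down (a character has a home, and the home has a region). None of these names come from the sample dataset. A validation error deep in the structure reports the full path to the offending field.

```python
import pydantic


class RegionSketch(pydantic.BaseModel):
    REGION: str = pydantic.Field(...)


class PolitySketch(pydantic.BaseModel):
    POLITY: str = pydantic.Field(...)
    REGION_NESTED: RegionSketch = pydantic.Field(...)


class CharacterSketch(pydantic.BaseModel):
    NAME: str = pydantic.Field(...)
    HOME_NESTED: PolitySketch = pydantic.Field(...)


def first_error_path(payload):
    """Return the location of the first validation error, or None if valid."""
    try:
        CharacterSketch(**payload)
        return None
    except pydantic.ValidationError as e:
        return tuple(e.errors()[0]["loc"])


character = CharacterSketch(
    NAME="Aria",
    HOME_NESTED={"POLITY": "Eastmarch", "REGION_NESTED": {"REGION": "North"}},
)
deep_value = character.HOME_NESTED.REGION_NESTED.REGION

# The innermost record is missing its REGION field.
deep_error = first_error_path(
    {"NAME": "Borin", "HOME_NESTED": {"POLITY": "Eastmarch", "REGION_NESTED": {}}}
)
```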
Likewise, a well-built data model gives downstream consumers confidence that the data you're sending to them is exactly what they're expecting.
How do you think you can apply these techniques to your data pipelines?
About
Charles Mendelson is a Data Engineer working at PitchBook data. If you would like to get in touch with him, the best way is on LinkedIn.
All the code:
# Standard Library imports
from datetime import datetime
import enum
import functools
from typing import Optional

# 3rd Party package imports
import pydantic

# Enums for limiting string data in our model
class GenderEnum(enum.Enum):
    M = 'M'
    F = 'F'
    NB = 'NB'

class ClassEnum(enum.Enum):
    Druid = 'Druid'
    Fighter = 'Fighter'
    Warlock = 'Warlock'
    Ranger = 'Ranger'
    Bard = 'Bard'
    Sorcerer = 'Sorcerer'
    Paladin = 'Paladin'
    Rogue = 'Rogue'
    Wizard = 'Wizard'
    Monk = 'Monk'
    Barbarian = 'Barbarian'
    Cleric = 'Cleric'

class RaceEnum(enum.Enum):
    Human = 'Human'
    Dwarf = 'Dwarf'
    Halfling = 'Halfling'
    Elf = 'Elf'
    Dragonborn = 'Dragonborn'
    Tiefling = 'Tiefling'
    Half_Orc = 'Half-Orc'
    Gnome = 'Gnome'
    Half_Elf = 'Half-Elf'

class AttributeEnum(enum.Enum):
    STR = 'STR'
    DEX = 'DEX'
    CON = 'CON'
    INT = 'INT'
    WIS = 'WIS'
    CHR = 'CHR'

class AlignmentEnum(enum.Enum):
    LAWFUL_GOOD = 'Lawful Good'
    LAWFUL_NEUTRAL = 'Lawful Neutral'
    LAWFUL_EVIL = 'Lawful Evil'
    NEUTRAL_GOOD = 'Neutral Good'
    TRUE_NEUTRAL = 'True Neutral'
    NEUTRAL_EVIL = 'Neutral Evil'
    CHAOTIC_GOOD = 'Chaotic Good'
    CHAOTIC_NEUTRAL = 'Chaotic Neutral'
    CHAOTIC_EVIL = 'Chaotic Evil'

# partial functions for siloing our logic in one place:
id_partial = functools.partial(pydantic.Field, ..., ge=10, le=99)
attribute_partial = functools.partial(pydantic.Field, ..., ge=-6, le=6)

# models that make up our main model
class RpgRaceModelDry(pydantic.BaseModel):
    RACE_ID: int = id_partial()
    RACE: RaceEnum = pydantic.Field(...)
    HP_MODIFIER_PER_LEVEL: int = attribute_partial()
    STR_MODIFIER: int = attribute_partial()
    CON_MODIFIER: int = attribute_partial()
    DEX_MODIFIER: int = attribute_partial()
    INT_MODIFIER: int = attribute_partial()
    WIS_MODIFIER: int = attribute_partial()
    CHR_MODIFIER: int = attribute_partial()

class RpgClassModel(pydantic.BaseModel):
    CLASS_ID: int = id_partial()
    CLASS: ClassEnum = pydantic.Field(...)
    PRIMARY_CLASS_ATTRIBUTE: AttributeEnum = pydantic.Field(...)
    SPELLCASTER: bool = pydantic.Field(...)

class RpgPolityModel(pydantic.BaseModel):
    KINGDOM_ID: int = id_partial(ge=100, le=999)
    POLITY: str = pydantic.Field(...)
    TYPE: str = pydantic.Field(...)

class RpgGuildModel(pydantic.BaseModel):
    GUILD_ID: int = id_partial(ge=100, le=999)
    GUILD: str = pydantic.Field(...)
    ALIGNMENT: AlignmentEnum
    WEEKLY_DUES: int = pydantic.Field(..., ge=10, le=100)

# Our top level model:
class RpgCharacterModel(pydantic.BaseModel):
    CREATION_DATE: datetime
    NAME: str = pydantic.Field(...)
    GENDER: GenderEnum
    RACE_NESTED: RpgRaceModelDry = pydantic.Field(...)
    CLASS_NESTED: RpgClassModel = pydantic.Field(...)
    # Optional with a default of None, so these nested records may be left out
    HOME_NESTED: Optional[RpgPolityModel] = None
    GUILD_NESTED: Optional[RpgGuildModel] = None
    PAY: int = pydantic.Field(..., ge=1, le=500)
# We didn't talk about this one, but from a previous article,
# this validates each row of data
def validate_data(list_o_dicts, model: pydantic.BaseModel, index_offset: int = 0):
    # capturing our good data and our bad data
    good_data = []
    bad_data = []
    for index, row in enumerate(list_o_dicts):
        try:
            model(**row)  # unpacks our dictionary into our keyword arguments
            good_data.append(row)  # appends valid data to a new list of dictionaries
        except pydantic.ValidationError as e:
            # Copy the row so the caller's dictionary is left untouched
            # (a shallow list copy would still share the row dictionaries)
            bad_row = dict(row)
            # Adds all validation error messages associated with the error
            # to the copied dictionary
            bad_row['Errors'] = e.errors()
            bad_row['Error_row_num'] = index + index_offset
            # appends bad data to a different list of dictionaries
            bad_data.append(bad_row)
    return (good_data, bad_data)
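For completeness, here is a usage sketch of validate_data. To keep the block runnable on its own, it restates the function in compact form (this version copies a failing row before annotating it) and uses a hypothetical two-field model, PaySketch, with made-up rows rather than the full character model.

```python
import pydantic


class PaySketch(pydantic.BaseModel):
    # Hypothetical cut-down model used only for this usage example
    NAME: str = pydantic.Field(...)
    PAY: int = pydantic.Field(..., ge=1, le=500)


def validate_data(list_o_dicts, model, index_offset=0):
    """Compact restatement of validate_data: split rows into good and bad."""
    good_data, bad_data = [], []
    for index, row in enumerate(list_o_dicts):
        try:
            model(**row)
            good_data.append(row)
        except pydantic.ValidationError as e:
            bad_row = dict(row)  # copy so the input row is not modified
            bad_row["Errors"] = e.errors()
            bad_row["Error_row_num"] = index + index_offset
            bad_data.append(bad_row)
    return good_data, bad_data


rows = [
    {"NAME": "Aria", "PAY": 120},
    {"NAME": "Borin", "PAY": 9000},  # PAY is above le=500
]

# index_offset=1 reports row numbers starting from 1 instead of 0
good, bad = validate_data(rows, PaySketch, index_offset=1)
```

The good rows can go straight on to the next step of the pipeline, while the bad rows carry their error details and row number with them for follow-up.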
Originally published at https://charlesmendelson.com on February 23, 2023.