This post is about schema inference in Python. I’m focusing on dict and JSON data here, since both are ubiquitous.
This is useful for understanding the structure and content of new data sources. The initial schema can be used as a draft for a validation model later on.
I’ll use Faker (see my last post) to create a mock dictionary with Faker.pydict:
sample_dict = {'wind': 6.50703274499078,
 'interesting': 34.6591059348561,
 'to': 690698.415313144,
 'space': {'ten': 'NoVHnFEHRdQDnxsnwHRL',
  'morning': 9547,
  'him': 'https://vargas.net/blog/blog/postsabout.php'},
 'against': 419}
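For reference, a minimal sketch of how a dictionary like this can be produced with Faker (keys, values, and nesting are random, so your output will differ from the sample above):

from faker import Faker

fake = Faker()
# pydict() returns a dict of random keys and values; structure varies run to run
sample_dict = fake.pydict()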
I’ll then be using datamodel-code-generator, which you can install with pip. It’s often used as a CLI but can also be called from code to analyze data. This example creates a Python file containing a Pydantic v1 model (the default output) for the sample dictionary.
from datamodel_code_generator import InputFileType, generate
import pathlib
generate(
    sample_dict,
    input_file_type=InputFileType.Dict,
    output=pathlib.Path('/tmp/sample_dict_model.py')
)
generate doesn’t return a string because it could potentially produce multiple .py files. More info on that here.
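As a hedged sketch of that multi-file case (not run here, and the input path is made up): pointing output at a directory rather than a single file is what lets the generator write several modules, for example when an OpenAPI spec references multiple schema files.

# Assumption for illustration: 'openapi/api.yaml' is a multi-file OpenAPI spec
generate(
    pathlib.Path('openapi/api.yaml'),
    input_file_type=InputFileType.OpenAPI,
    output=pathlib.Path('/tmp/generated_models')  # a directory, so several .py files can be written
)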
with open('/tmp/sample_dict_model.py', 'r') as fh:
    dict_model_str = fh.read()

print(dict_model_str)
# generated by datamodel-codegen:
# filename: <dict>
# timestamp: 2025-09-22T08:11:25+00:00
from __future__ import annotations
from pydantic import BaseModel
class Space(BaseModel):
    ten: str
    morning: int
    him: str


class Model(BaseModel):
    wind: float
    interesting: float
    to: float
    space: Space
    against: int
Input data can be any of the following:
- OpenAPI 3
- JSON Schema
- JSON/YAML Data
- Python dictionary
- GraphQL schema
See here for more info.
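For example, a hand-written JSON Schema document can be fed in the same way; here is a minimal sketch (the schema contents are made up for illustration):

schema_str = '''
{
  "type": "object",
  "properties": {
    "wind": {"type": "number"},
    "against": {"type": "integer"}
  },
  "required": ["wind", "against"]
}
'''

generate(
    schema_str,
    input_file_type=InputFileType.JsonSchema,
    output=pathlib.Path('/tmp/sample_schema_model.py')
)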
Dictionaries are converted to JSON before the model is generated. If the data is already serialized as JSON, the generator can use it directly.
In this example I’ll convert the dict to a JSON str.
import json

sample_json = json.dumps(sample_dict)
On my system, Pydantic v2 is installed:
import pydantic
pydantic.version.VERSION
'2.11.9'
In the JSON example, I’m using output_model_type=DataModelType.PydanticV2BaseModel
to specify a v2 Pydantic output model:
from datamodel_code_generator import DataModelType
generate(
    sample_json,
    input_file_type=InputFileType.Json,
    output=pathlib.Path('/tmp/sample_json_model.py'),
    output_model_type=DataModelType.PydanticV2BaseModel
)
with open('/tmp/sample_json_model.py', 'r') as fh:
    json_model_str = fh.read()

print(json_model_str)
# generated by datamodel-codegen:
# filename: <stdin>
# timestamp: 2025-09-22T08:11:25+00:00
from __future__ import annotations
from pydantic import BaseModel
class Space(BaseModel):
    ten: str
    morning: int
    him: str


class Model(BaseModel):
    wind: float
    interesting: float
    to: float
    space: Space
    against: int
The .py file can of course be copied and modified to serve as the basis for a validation model.
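For example, a hypothetical hand-tuned version of the generated draft might validate the URL field properly and add a simple constraint (these refinements are mine, not part of the generated output):

from pydantic import BaseModel, Field, HttpUrl

class Space(BaseModel):
    ten: str
    morning: int
    him: HttpUrl                   # the sample value is a URL, so validate it as one

class Model(BaseModel):
    wind: float
    interesting: float
    to: float
    space: Space
    against: int = Field(ge=0)     # hypothetical constraint: must be non-negative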
It can also be used on the fly by importing directly from the file:
import importlib.util
import sys

spec = importlib.util.spec_from_file_location('pydantic_model', '/tmp/sample_dict_model.py')
pydantic_model = importlib.util.module_from_spec(spec)
sys.modules['pydantic_model'] = pydantic_model
spec.loader.exec_module(pydantic_model)
Having imported the module I can then use the model to validate the original sample:
pydantic_model.Model.model_validate(sample_dict)
Model(wind=6.50703274499078, interesting=34.6591059348561, to=690698.415313144, space=Space(ten='NoVHnFEHRdQDnxsnwHRL', morning=9547, him='https://vargas.net/blog/blog/postsabout.php'), against=419)
And just to see what happens with invalid data:
try:
    pydantic_model.Model.model_validate({'wrong': 'dict'})
except Exception as e:
    print(type(e).__name__)
ValidationError
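To get more than the exception name, Pydantic’s ValidationError can list the individual failures:

from pydantic import ValidationError

try:
    pydantic_model.Model.model_validate({'wrong': 'dict'})
except ValidationError as e:
    # each entry describes one problem: which field (loc) and why (msg)
    for err in e.errors():
        print(err['loc'], err['msg'])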
datamodel-code-generator and Pydantic are easy to use in just a few lines of code, but they are also very complex and powerful tools. Together they are a massive help with discovery and validation.
Banner image by Freepik