Validators approach in Python - Pydantic vs. Dataclasses

Introduction

In the world of Python, various tools and libraries are being created to make your life as a developer much, much easier. Two such tools that often come into play when dealing with data validation and class structures are Pydantic and Python Dataclasses. Both serve similar purposes but have distinct features and use cases. Feel free to pick your poison.

Seems similar?

At first glance, Pydantic models and dataclasses look alike, mostly because both let us declare the type of each field:

from dataclasses import dataclass
from pydantic import BaseModel

@dataclass
class Foo:
   name: str
   fur_color: str

class PydanticFoo(BaseModel):
   name: str
   fur_color: str

It looks the same, doesn't it?

The key difference is that the @dataclass decorator does not bind the type to the field at runtime. The annotation is only a hint and nothing checks it, whereas Pydantic validates every value against the declared type and, where it can, coerces the input into that type. In other words, declaring a field type in Pydantic also switches on validation and type coercion for that field.

This is where Pydantic shines: it validates input data automatically, whereas dataclasses require manual validation.
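To make the difference concrete, here is a minimal sketch. The PlainCat and PydanticCat classes and their age field are made up for illustration and are not part of the example above. The sketch assumes Pydantic is installed and relies only on behaviour shared by v1 and v2: a numeric string is coerced to int, and a non-numeric one is rejected.

```python
from dataclasses import dataclass

from pydantic import BaseModel, ValidationError


@dataclass
class PlainCat:  # hypothetical class, for illustration only
    name: str
    age: int


class PydanticCat(BaseModel):  # hypothetical class, for illustration only
    name: str
    age: int


# The dataclass stores whatever it is given; the annotation is not checked.
plain = PlainCat(name="Tom", age="seven")
print(type(plain.age).__name__)  # str

# Pydantic coerces a numeric string into the declared int type...
coerced = PydanticCat(name="Tom", age="7")
print(type(coerced.age).__name__)  # int

# ...and rejects input that cannot be interpreted as an int.
rejected = False
try:
    PydanticCat(name="Tom", age="seven")
except ValidationError:
    rejected = True
print(rejected)  # True
```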

A manual validation approach in dataclasses can look like this:

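from dataclasses import dataclass

@dataclass
class Foo:
    name: str
    surname: str
    city: str
    country: str

    def __post_init__(self):
        # AcceptedCountries is assumed to be defined elsewhere and to be
        # falsy for unsupported countries.
        if not AcceptedCountries(self.country):
            raise ValueError("Must be a valid country to proceed")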

As you can see, Python dataclasses rely on the dunder method __post_init__ to enforce validation. Inside it we can coerce types ourselves, guard the class against incorrect arguments, or restrict values to an expected range, so that the object is only created when its data is acceptable.

To summarize the difference in type coercion: Pydantic does it out of the box, while dataclasses require the developer to write it explicitly.
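As a rough illustration of the three jobs __post_init__ can take on (guarding, coercing, range-checking), here is a standard-library-only sketch. The Person class, its fields and the 0 to 150 bounds are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Person:  # hypothetical class, for illustration only
    name: str
    age: int

    def __post_init__(self):
        # Guard: reject arguments of the wrong type outright.
        if not isinstance(self.name, str):
            raise TypeError("name must be a str")
        # Manual coercion: int() raises ValueError for input such as "abc".
        self.age = int(self.age)
        # Range check: keep the value inside the expected bounds.
        if not 0 <= self.age <= 150:
            raise ValueError("age must be between 0 and 150")


person = Person(name="Ann", age="30")
print(person)  # Person(name='Ann', age=30)
```

Every rule here is hand-written, which is exactly the work Pydantic takes over.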

Validators

Ahh... the almighty validators, the most common thing a backend developer will ever encounter in code. Here, too, the two options differ completely in approach and outcome.

Pydantic handles this with decorator syntax: @validator in Pydantic v1, renamed @field_validator in v2.

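from pydantic import BaseModel, field_validator

class Foo(BaseModel):
    name: str
    surname: str
    country: str
    postcode: str

    # field_validator is the Pydantic v2 name; v1 called it validator.
    @field_validator("postcode")
    @classmethod
    def postcode_is_valid(cls, value):
        # PLPostcode is assumed to be defined elsewhere and to be falsy
        # for strings that are not valid Polish postcodes.
        if not PLPostcode(value):
            raise ValueError("Must be a valid PL Postcode.")
        return value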

As we see above, the decorator receives the name of the field to check, postcode, and wraps a class-level function that validates the incoming value before the instance is created. If the value does not match the required pattern, the ValueError we raise is reported by Pydantic as a ValidationError. Simple, isn't it?

We can also attach validation logic directly to the field declarations:
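Here is a runnable sketch of the same idea, assuming Pydantic v2. The PLPostcode helper from the snippet is replaced by a hypothetical regular expression (two digits, a hyphen, three digits), and the trimmed-down Address model is likewise made up for the example.

```python
import re

from pydantic import BaseModel, ValidationError, field_validator

# Assumed format, for illustration only: two digits, a hyphen, three digits.
PL_POSTCODE = re.compile(r"\d{2}-\d{3}")


class Address(BaseModel):  # hypothetical model, for illustration only
    country: str
    postcode: str

    @field_validator("postcode")
    @classmethod
    def postcode_is_valid(cls, value: str) -> str:
        if not PL_POSTCODE.fullmatch(value):
            raise ValueError("Must be a valid PL Postcode.")
        return value


ok = Address(country="PL", postcode="65-392")

first_error = None
try:
    Address(country="PL", postcode="65392")
except ValidationError as exc:
    # errors() returns one dict per failed field, with its location and message.
    first_error = exc.errors()[0]
    print(first_error["loc"], first_error["msg"])
```

The caller catches ValidationError rather than ValueError, and the error entry names the offending field.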

from pydantic import BaseModel, Field

class Foo(BaseModel):
    positive: int = Field(gt=0)
    non_negative: int = Field(ge=0)
    negative: int = Field(lt=0)
    non_positive: int = Field(le=0)
    even: int = Field(multiple_of=2)

The constraints used above are:

  • gt - greater than
  • lt - less than
  • ge - greater than or equal to
  • le - less than or equal to
  • multiple_of - a multiple of the given number
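A short sketch of these constraints in action, assuming Pydantic is installed. The Limits model is a cut-down, hypothetical variant of the class above. Note that Pydantic reports every failing field in one ValidationError instead of stopping at the first.

```python
from pydantic import BaseModel, Field, ValidationError


class Limits(BaseModel):  # hypothetical model, for illustration only
    positive: int = Field(gt=0)
    non_positive: int = Field(le=0)
    even: int = Field(multiple_of=2)


# All three constraints are satisfied, so the model is created.
valid = Limits(positive=1, non_positive=0, even=4)

# All three constraints are violated; collect the names of the failed fields.
failed_fields = []
try:
    Limits(positive=0, non_positive=1, even=3)
except ValidationError as exc:
    failed_fields = [error["loc"][0] for error in exc.errors()]

print(sorted(failed_fields))  # ['even', 'non_positive', 'positive']
```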

For comparison, here is the approach Python dataclasses need to achieve the same result:

from dataclasses import dataclass

@dataclass
class Foo:
    positive: int
    non_negative: int
    negative: int
    non_positive: int
    even: int

    def __post_init__(self):
        self.validate_positive()
        self.validate_non_negative()
        self.validate_negative()
        self.validate_non_positive()
        # rest of the checks (e.g. for even) can be added underneath

    def validate_positive(self):
        if self.positive <= 0:
            raise ValueError("Value of positive must be greater than 0")

    def validate_non_negative(self):
        if self.non_negative < 0:
            raise ValueError("Value of non_negative must not be less than 0")

    def validate_negative(self):
        if self.negative >= 0:
            raise ValueError("Value of negative must be less than 0")

    def validate_non_positive(self):
        if self.non_positive > 0:
            raise ValueError("Value of non_positive must not be greater than 0")

Now you can see how much boilerplate code Pydantic removes. Some hardheads may ask: what about performance? I can assure you that Pydantic's validation logic does not need a lot of memory.

But wouldn't it be ideal to bridge the two: the simplicity of dataclasses with the robust validation and type coercion of Pydantic? Is that actually possible? The answer is YES!
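For a sense of what closing that gap with plain dataclasses would take, here is one possible sketch using only the standard library. The checked helper and the trimmed-down Foo class are assumptions made for illustration, not an established pattern from the dataclasses module. The idea is to stash a predicate in each field's metadata and run them all from __post_init__.

```python
from dataclasses import dataclass, field, fields
from typing import Callable


def checked(predicate: Callable[[int], bool], message: str):
    """Hypothetical helper: attach a predicate and message to a dataclass field."""
    return field(metadata={"check": predicate, "message": message})


@dataclass
class Foo:
    positive: int = checked(lambda v: v > 0, "must be greater than 0")
    non_negative: int = checked(lambda v: v >= 0, "must be greater than or equal to 0")
    even: int = checked(lambda v: v % 2 == 0, "must be a multiple of 2")

    def __post_init__(self):
        # Run every predicate stored in the field metadata.
        for f in fields(self):
            check = f.metadata.get("check")
            if check is not None and not check(getattr(self, f.name)):
                raise ValueError(f"{f.name} {f.metadata['message']}")


foo = Foo(positive=1, non_negative=0, even=2)
```

It works, but the helper, the loop and the error reporting are all code you now own and maintain yourself.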

Pydantic dataclasses

Pydantic also serves us developers yet another flavour of validation, built on the dataclasses that entered the standard library in Python 3.7: pydantic.dataclasses.

from datetime import datetime
from typing import Optional

from pydantic.dataclasses import dataclass

@dataclass
class Foo:
    id: int
    name: str = 'John Doe'
    signup_ts: Optional[datetime] = None

This approach gives us a mix of two important things: the ease of use of the @dataclass decorator and automatic validation and type coercion of its fields.

But it shows its biggest edge when paired with initialization hooks, which can be used alongside the default dunder method __post_init__. Check this out!
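A quick sketch of that coercion at work, assuming Pydantic is installed. The input values are made up, and the class is restated (with Optional added for the None default) so the example is self-contained.

```python
from datetime import datetime
from typing import Optional

from pydantic import ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class Foo:
    id: int
    name: str = "John Doe"
    signup_ts: Optional[datetime] = None


# Strings are coerced into the declared int and datetime types.
foo = Foo(id="42", signup_ts="2032-06-21T12:00:00")
print(type(foo.id).__name__, type(foo.signup_ts).__name__)  # int datetime

# Input that cannot be coerced is rejected, unlike with a plain dataclass.
rejected = False
try:
    Foo(id="not a number")
except ValidationError:
    rejected = True
```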

from typing import Any

from pydantic import model_validator
from pydantic.dataclasses import dataclass

@dataclass
class Foo:
    a: int
    b: int
    c: int

@dataclass
class Baz:
    d: Foo

    @model_validator(mode='before')
    @classmethod
    def pre_root(cls, values: Any) -> Any:
        # For a Pydantic dataclass the raw input is not necessarily a dict,
        # hence the loose Any annotation.
        print(f'First: {values}')
        return values

    @model_validator(mode='after')
    def post_root(self) -> 'Baz':
        print(f'Third: {self}')
        return self  # an "after" validator must return the instance

    def __post_init__(self):
        print(f'Second: {self.d}')

As you can see, __post_init__ in pydantic.dataclasses is executed in the middle of the validators. The sequence is listed below.

  • model_validator(mode='before')
  • field_validator(mode='before')
  • Inner validators, e.g. validation for types like int, str, ...
  • field_validator(mode='after')
  • __post_init__
  • model_validator(mode='after')

So you can validate the whole model, sanitize fields, or do some setup for @property attributes at whichever stage fits. Voilà! The more needs you have, the more possibilities this opens up.
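To watch the stages fire, here is a small sketch assuming Pydantic v2. The Order class and the calls list are made up for the example. Each hook records its name, and the field validators also record the type of the value they receive, which shows where the built-in int coercion happens.

```python
from typing import Any, List

from pydantic import field_validator, model_validator
from pydantic.dataclasses import dataclass

calls: List[str] = []  # records the order in which the hooks run


@dataclass
class Order:  # hypothetical class, for illustration only
    qty: int

    @model_validator(mode="before")
    @classmethod
    def model_before(cls, values: Any) -> Any:
        calls.append("model before")
        return values

    @field_validator("qty", mode="before")
    @classmethod
    def field_before(cls, value: Any) -> Any:
        # Sees the raw input, before Pydantic's own int validation.
        calls.append(f"field before ({type(value).__name__})")
        return value

    @field_validator("qty", mode="after")
    @classmethod
    def field_after(cls, value: int) -> int:
        # Sees the value after Pydantic's own int validation.
        calls.append(f"field after ({type(value).__name__})")
        return value

    def __post_init__(self):
        calls.append("__post_init__")

    @model_validator(mode="after")
    def model_after(self) -> "Order":
        calls.append("model after")
        return self


order = Order(qty="3")
for step in calls:
    print(step)
```

The before-validator records str and the after-validator records int, so the coercion sits between them, and __post_init__ already works with validated fields.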

Summary

So we're at the end of our tour of validation patterns in Python. Neither approach is wrong, but to be concise I will wrap up my thoughts in two points:

  • Pydantic is a mighty workhorse, offering robust validation, data sanitization, and type coercion almost implicitly. The drawbacks are slightly increased memory consumption and a learning curve that could be overwhelming for fresh Pythonistas.

  • The Python dataclass is a neat, quick, and cheap approach to building models, validating them, and keeping a single source of truth. The drawbacks are the need to tweak its structures to your needs and to write your own validators or validation methods.

Happy Pythoning, fellas!


© 2015-2024 Codetain. All rights reserved.
