# 3. Pydantic for Data Validation
Date: 2025-11-04
Status: Accepted
Deciders: Core Team
Related: ADR-005 (spot-sdk Package)
## Context
With a microservices architecture, services exchange data via APIs and message queues. We need:
- Strong data validation at service boundaries
- Clear data schemas that serve as contracts
- Automatic API documentation
- Type safety in Python code
- Serialization/deserialization of complex objects
- JSON Schema generation for cross-language interop
Requirements:
- Runtime validation of incoming data
- IDE autocomplete and type checking support
- Integration with FastAPI for API docs
- Performance (high throughput validation)
- Clear validation error messages
- Support for complex nested structures
## Decision
Use Pydantic (v2) for all data validation and schema definitions across the platform.
## Rationale
- Type Safety: Pydantic models provide runtime validation + static type checking
- FastAPI Integration: Native integration with FastAPI for automatic API docs
- Performance: Pydantic v2 is built on a Rust core (pydantic-core), making validation very fast
- Developer Experience: Excellent IDE support, clear error messages
- JSON Schema: Automatic generation of JSON schemas for OpenAPI specs
- Serialization: Built-in JSON serialization with proper type handling
- Validation: Rich validation rules (regex, ranges, custom validators)
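The validation rules mentioned above combine two mechanisms in Pydantic v2: declarative `Field` constraints (patterns, ranges, lengths) and custom `field_validator` functions for rules constraints cannot express. A minimal sketch, with a hypothetical `Order` model invented for illustration:

```python
from pydantic import BaseModel, Field, field_validator

class Order(BaseModel):
    # Declarative constraints: regex pattern, numeric range, max length
    order_id: str = Field(pattern=r"^ord_[a-z0-9]+$")
    quantity: int = Field(gt=0, le=1000)
    note: str = Field(default="", max_length=200)

    # Custom validator for a rule Field constraints cannot express
    @field_validator("note")
    @classmethod
    def no_control_chars(cls, v: str) -> str:
        if any(ord(c) < 32 for c in v):
            raise ValueError("note must not contain control characters")
        return v
```

Constraints run at instantiation time, so invalid data raises `ValidationError` at the service boundary instead of propagating downstream.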
## Consequences
### Positive
- Strong contracts between services prevent invalid data
- Automatic API documentation via OpenAPI/Swagger
- Early detection of data issues at boundaries
- Excellent IDE autocomplete and type hints
- Clear, actionable error messages for clients
- Type-safe code reduces bugs
- JSON Schema export for language-agnostic contracts
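The JSON Schema export mentioned above comes from Pydantic v2's `model_json_schema()`, which emits a standard schema that other languages can consume. A minimal sketch, using a hypothetical `HealthCheck` model:

```python
from pydantic import BaseModel, Field

class HealthCheck(BaseModel):
    service: str = Field(description="Service name")
    healthy: bool

# Export a language-agnostic JSON Schema for this contract;
# FastAPI uses the same mechanism to build OpenAPI specs.
schema = HealthCheck.model_json_schema()
```

The resulting dictionary includes field types, descriptions, and the list of required properties, so a non-Python consumer can validate the same payloads.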
### Negative
- Learning curve for team members unfamiliar with Pydantic
- Validation overhead (though minimal with v2)
- Need to maintain model definitions alongside code
- Pydantic-specific patterns may not translate to other languages
- Breaking changes in Pydantic updates require migration
## Alternatives Considered
### Alternative 1: Python dataclasses
- Pros:
  - Built into Python standard library
  - Simple and lightweight
  - Good IDE support
- Cons:
  - No runtime validation
  - No serialization/deserialization
  - No JSON Schema generation
  - No FastAPI integration
  - Manual validation code needed
- Why rejected: Lacks the validation and serialization we need
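The "no runtime validation" drawback is easy to demonstrate: a dataclass records type hints but never enforces them, so wrong-typed data passes through silently. A minimal sketch:

```python
from dataclasses import dataclass

@dataclass
class EmailDC:
    sender: str
    recipients: list[str]

# Type hints are not enforced at runtime: an int sender and a
# plain string for recipients are accepted without complaint.
msg = EmailDC(sender=42, recipients="not-a-list")
```

Catching this would require hand-written validation in `__post_init__` for every model, which is exactly the boilerplate Pydantic eliminates.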
### Alternative 2: marshmallow
- Pros:
  - Mature library with large community
  - Flexible validation and serialization
  - Good documentation
- Cons:
  - Slower than Pydantic v2
  - Less tight FastAPI integration
  - Separate schema and model classes
  - Verbose syntax
  - Less IDE support for type hints
- Why rejected: Pydantic offers better performance and FastAPI integration
### Alternative 3: JSON Schema + jsonschema library
- Pros:
  - Language-agnostic schemas
  - Standard format for API contracts
  - Widely supported
- Cons:
  - Schemas separate from code
  - No Python type hints
  - Manual serialization code
  - Verbose schema definitions
  - Poor IDE support
- Why rejected: Doesn't provide the developer experience we want
## Implementation Notes
All service contracts use Pydantic models:
```python
from pydantic import BaseModel, ConfigDict, EmailStr, Field

class Email(BaseModel):
    # Pydantic v2 model configuration (replaces the v1 `class Config`)
    model_config = ConfigDict(
        json_schema_extra={
            "example": {
                "id": "email_123",
                "sender": "user@example.com",
                "recipients": ["recipient@example.com"],
                "subject": "Test Email",
                "body": "Email body text",
            }
        }
    )

    id: str = Field(..., description="Unique email identifier")
    sender: EmailStr
    recipients: list[EmailStr]
    subject: str
    body: str
```
Key practices:
- Use Pydantic v2 syntax (Field, ConfigDict)
- Define clear field descriptions for API docs
- Provide example data for documentation
- Use appropriate validators (EmailStr, constr, etc.)
- Keep models in spot-sdk package for sharing
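The "clear validation error messages" requirement is worth illustrating: when validation fails, Pydantic v2 raises a `ValidationError` whose `errors()` method lists each offending field and constraint, which services can return directly to clients. A minimal sketch, using a hypothetical `Message` model (simpler than the `Email` example above, to avoid the extra email-validator dependency):

```python
from pydantic import BaseModel, Field, ValidationError

class Message(BaseModel):
    id: str
    retries: int = Field(ge=0)

try:
    # Validate an untrusted payload at the service boundary
    Message.model_validate({"id": "msg_1", "retries": -1})
except ValidationError as exc:
    # Each entry names the field ("loc"), the violated rule
    # ("type"), and a human-readable message ("msg")
    errors = exc.errors()
```

Because every error carries the field path and rule name, clients get actionable feedback instead of a generic 400 response.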
## References
- Pydantic Documentation
- Pydantic Performance
- FastAPI + Pydantic
- Implementation: `spot-sdk/interfaces/`