Database Management Guide¶
SPOT Platform uses SQLAlchemy as its ORM and Alembic for database migrations. This guide covers database operations and best practices.
Quick Start¶
# Check current database revision
make db:status
# Apply pending migrations
make db:migrate
# Show migration history
make db:history
# Create a new migration
make db:revision MESSAGE="Add user table"
Deploy-Time Migrations¶
Important: Migrations must be run at deploy-time, before starting services.
Services do not run migrations automatically at startup. This is a deliberate design choice:
- Predictable deployments: Migrations run once per deployment, not per service replica
- Faster startup: Services start immediately without waiting for migration locks
- No race conditions: Concurrent service startups don't conflict on migrations
- Clear separation: Infrastructure changes (schema) are separate from application startup
Deployment Workflow¶
CI/CD Integration¶
In CI/CD pipelines, ensure migrations run before service deployment:
deploy:
  script:
    - make db:migrate       # Run migrations first
    - docker compose up -d  # Then start services
Kubernetes/Container Orchestration¶
For container orchestration platforms, use an init container or deployment hook:
initContainers:
  - name: migrate
    image: spot-devtools:latest
    command: ["alembic", "upgrade", "head"]
    env:
      - name: DATABASE_URL
        valueFrom:
          secretKeyRef:
            name: spot-secrets
            key: database-url
Architecture¶
The database layer is organized into focused components:
- Models (shared/database/models.py, shared/database/plugin_models.py): Define database schema
- Session Management (shared/database/session.py): Handle database connections
- Migrations (shared/database/alembic/): Manage schema changes
Components¶
- SQLAlchemy Models: Python classes that map to database tables
- Alembic Migrations: Version-controlled schema changes
- Session Management: Context managers for safe database access
Models vs Contracts¶
SPOT uses two types of models that serve different purposes:
| Type | Purpose | Location | Technology | Use Case |
|---|---|---|---|---|
| Contract Models | API contracts and validation | spot-sdk package | Pydantic v2 | External communication, analyzer interfaces |
| Database Models | Data persistence | spot-platform/shared/database/ | SQLAlchemy | Database tables, ORM queries |
Key Differences:
- Contract Models (spot-sdk): Define what data looks like when transmitted (API requests/responses, analyzer communication)
- Database Models (platform): Define how data is stored in PostgreSQL (tables, relationships, indexes)
Example:
# Contract model (spot-sdk) - For API communication
from spot_sdk.api_gateway import Email, AnalysisResult

# Used when calling analyzers or returning API responses
result = AnalysisResult(
    is_phishing=True,
    threat_level="high",
    confidence=0.85,
    # ... contract fields
)

# Database model (platform) - For persistence
import uuid
from datetime import datetime

from database.models import EmailRecord
from database.session import get_db_session

# Used when saving to database
with get_db_session() as session:
    email_record = EmailRecord(
        id=uuid.uuid4(),
        sender_hash="hash123",
        created_at=datetime.utcnow(),
        # ... database fields
    )
    session.add(email_record)
Why separate?
- Contract models: Focused on external interfaces, versioned independently
- Database models: Internal implementation detail, can evolve separately
- No tight coupling: Platform can change storage without breaking API contracts
Note: Platform services often convert between these models as needed (contract models → database models for storage, database models → contract models for API responses).
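The conversion described in the note can be sketched with plain dataclasses standing in for the real Pydantic and SQLAlchemy models (class and field names here are illustrative, not the actual SPOT schemas):

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class AnalysisResultContract:
    """Stand-in for the spot-sdk contract model."""
    is_phishing: bool
    threat_level: str
    confidence: float

@dataclass
class AnalysisRecord:
    """Stand-in for the platform's SQLAlchemy model."""
    id: uuid.UUID
    is_phishing: bool
    threat_level: str
    confidence: float
    created_at: datetime

def to_record(contract: AnalysisResultContract) -> AnalysisRecord:
    """Map an API contract onto a persistence record, adding storage-only
    fields (primary key, timestamp) that the contract does not carry."""
    return AnalysisRecord(
        id=uuid.uuid4(),
        is_phishing=contract.is_phishing,
        threat_level=contract.threat_level,
        confidence=contract.confidence,
        created_at=datetime.now(timezone.utc),
    )

record = to_record(AnalysisResultContract(True, "high", 0.85))
```

Keeping the mapping in a single function like `to_record` means the storage schema can change without touching the API contract.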
Using the ORM¶
Basic Session Usage¶
from database.session import get_db_session
from database.models import Email

# Query example
with get_db_session() as session:
    email = session.query(Email).filter_by(id=email_id).first()
    print(email.sender_hash)

# Insert example
with get_db_session() as session:
    new_email = Email(
        sender_hash="hash123",
        recipient_hash="hash456",
        subject_encrypted="encrypted_subject"
    )
    session.add(new_email)
    # Commit happens automatically on context exit
Advanced Usage¶
from database.session import create_db_engine, create_session_factory

# Create engine with custom config
engine = create_db_engine(pool_size=20, max_overflow=10)

# Create session factory
SessionFactory = create_session_factory(engine)

# Use in your service
session = SessionFactory()
try:
    # Your database operations
    session.commit()
except Exception:
    session.rollback()
    raise
finally:
    session.close()
Migration Commands¶
Check Migration Status¶
# Show current database revision
make db:status
# Show migration history (LIMIT=N to cap)
make db:history
Apply Migrations¶
# Apply all pending migrations
make db:migrate
# Or upgrade to a specific revision
make db:migrate REVISION=003_add_priority
Rollback Migrations¶
# Rollback the last migration (REVISION defaults to -1)
make db:downgrade
# Rollback the last 3 migrations
make db:downgrade REVISION=-3
Create New Migration¶
# Autogenerate migration from model changes
make db:revision MESSAGE="Add user roles table"
# Disable autogeneration (for an empty manual migration)
make db:revision MESSAGE="Custom data backfill" AUTOGEN=0
Manual Migration Management¶
For advanced operations not exposed by the Make targets (stamp, branches, merge, etc.), invoke Alembic directly inside the api-gateway container:
# Stamp database to specific revision (advanced)
docker compose exec api-gateway alembic stamp 001_initial_schema
Creating Database Models¶
Example Model¶
from datetime import datetime
import uuid

from sqlalchemy import Column, DateTime, String
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import relationship

from database.models import Base

class User(Base):
    """User model following Single Responsibility Principle."""
    __tablename__ = 'users'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    email = Column(String(255), unique=True, nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)

    # Relationships
    emails = relationship("Email", back_populates="user")
Best Practices¶
- Single Responsibility: Each model represents one table
- Use UUIDs: For primary keys (better for distributed systems)
- Add Indexes: For frequently queried columns
- Use Relationships: Define FK relationships using SQLAlchemy ORM
- Default Values: Use default= for columns with default values
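Taken together, these practices look like the sketch below. AuditLog is a hypothetical model; the UUID is stored as a string so the snippet runs on SQLite, whereas the platform's models use the PostgreSQL UUID(as_uuid=True) type:

```python
import uuid
from datetime import datetime

from sqlalchemy import Column, DateTime, Index, String, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class AuditLog(Base):
    """One model per table (Single Responsibility)."""
    __tablename__ = "audit_logs"

    # UUID primary key (string-typed here for portability in this sketch)
    id = Column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
    actor = Column(String(255), nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)  # default value

    # Index for a frequently queried column
    __table_args__ = (Index("ix_audit_logs_actor", "actor"),)

# Verify the schema builds
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
```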
Migration Workflow¶
1. Modify Models¶
Edit shared/database/models.py or shared/database/plugin_models.py:
# Add new column
class Email(Base):
    __tablename__ = 'emails'
    # ... existing columns ...

    # New column
    priority = Column(Integer, default=0)
2. Generate Migration¶
make db:revision MESSAGE="Add priority to emails"
This creates a file like alembic/versions/002_add_priority_to_emails.py
3. Review Migration¶
Check the generated file in shared/database/alembic/versions/:
def upgrade() -> None:
    op.add_column('emails', sa.Column('priority', sa.Integer(), nullable=True))

def downgrade() -> None:
    op.drop_column('emails', 'priority')
4. Apply Migration¶
make db:migrate
5. Verify¶
make db:status
Configuration¶
Database URL¶
Set in .env:
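For example (illustrative values: the user, password, and database names must match your deployment; the troubleshooting section below connects as user spot to database spot):

```shell
DATABASE_URL=postgresql://spot:spot@postgres:5432/spot
```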
Connection Pooling¶
Configure in your service:
engine = create_db_engine(
    pool_size=10,        # Number of connections to keep open
    max_overflow=20,     # Max connections above pool_size
    pool_recycle=3600,   # Recycle connections after 1 hour
    pool_pre_ping=True   # Verify connections before using
)
Debug Mode¶
Enable SQL logging:
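Assuming create_db_engine forwards keyword arguments to SQLAlchemy's create_engine (an assumption here), pass echo=True; with plain SQLAlchemy the equivalent is:

```python
from sqlalchemy import create_engine

# echo=True makes SQLAlchemy log every emitted statement and its parameters
engine = create_engine("sqlite://", echo=True)
with engine.connect() as conn:
    value = conn.exec_driver_sql("SELECT 1").scalar()
```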
Troubleshooting¶
Migration Failed¶
# Check current state
make db:status
# Review migration history
make db:history
# If stuck, stamp to a specific revision (advanced; runs alembic in
# the api-gateway container)
docker compose exec api-gateway alembic stamp <revision_id>
Connection Issues¶
# Verify DATABASE_URL is set
echo $DATABASE_URL
# Check if postgres is running
docker compose ps postgres
# Test connection
docker compose exec postgres psql -U spot -d spot -c "SELECT 1"
Permission Errors¶
If you see permission errors when creating migrations, make sure the shared/database/alembic/versions/ directory is writable.
Directory Structure¶
shared/database/
├── alembic/ # Alembic configuration
│ ├── env.py # Environment configuration
│ ├── script.py.mako # Migration template
│ └── versions/ # Migration files
│ └── 001_initial_schema.py
├── alembic.ini # Alembic config file
├── models.py # Core SQLAlchemy models
├── plugin_models.py # Plugin-related models
├── session.py # Session management utilities
└── .archive/ # Old migration system (deprecated)