Skip to content

Database Management Guide

SPOT Platform uses SQLAlchemy for ORM and Alembic for database migrations. This guide covers database operations and best practices.

Quick Start

# Check current database revision
make db:status

# Apply pending migrations
make db:migrate

# Show migration history
make db:history

# Create a new migration
make db:revision MESSAGE="Add user table"

Deploy-Time Migrations

Important: Migrations must be run at deploy-time, before starting services.

Services do not run migrations automatically at startup. This is a deliberate design choice:

  1. Predictable deployments: Migrations run once per deployment, not per service replica
  2. Faster startup: Services start immediately without waiting for migration locks
  3. No race conditions: Concurrent service startups don't conflict on migrations
  4. Clear separation: Infrastructure changes (schema) are separate from application startup

Deployment Workflow

# 1. Apply migrations first
make db:migrate

# 2. Then start services
make service:start

CI/CD Integration

In CI/CD pipelines, ensure migrations run before service deployment:

deploy:
  script:
    - make db:migrate              # Run migrations first
    - docker compose up -d         # Then start services

Kubernetes/Container Orchestration

For container orchestration platforms, use an init container or deployment hook:

initContainers:
  - name: migrate
    image: spot-devtools:latest
    command: ["alembic", "upgrade", "head"]
    env:
      - name: DATABASE_URL
        valueFrom:
          secretKeyRef:
            name: spot-secrets
            key: database-url

Architecture

The database layer is organized into focused components:

  • Models (shared/database/models.py, shared/database/plugin_models.py): Define database schema
  • Session Management (shared/database/session.py): Handle database connections
  • Migrations (shared/database/alembic/): Manage schema changes

Components

  1. SQLAlchemy Models: Python classes that map to database tables
  2. Alembic Migrations: Version-controlled schema changes
  3. Session Management: Context managers for safe database access

Models vs Contracts

SPOT uses two types of models that serve different purposes:

Type Purpose Location Technology Use Case
Contract Models API contracts and validation spot-sdk package Pydantic v2 External communication, analyzer interfaces
Database Models Data persistence spot-platform/shared/database/ SQLAlchemy Database tables, ORM queries

Key Differences:

  • Contract Models (spot-sdk): Define what data looks like when transmitted (API requests/responses, analyzer communication)
  • Database Models (platform): Define how data is stored in PostgreSQL (tables, relationships, indexes)

Example:

# Contract model (spot-sdk) - For API communication
from spot_sdk.api_gateway import Email, AnalysisResult

# Used when calling analyzers or returning API responses
result = AnalysisResult(
    is_phishing=True,
    threat_level="high",
    confidence=0.85,
    # ... contract fields
)

# Database model (platform) - For persistence
from database.models import EmailRecord

# Used when saving to database
with get_db_session() as session:
    email_record = EmailRecord(
        id=uuid.uuid4(),
        sender_hash="hash123",
        created_at=datetime.utcnow(),
        # ... database fields
    )
    session.add(email_record)

Why separate?

  • Contract models: Focused on external interfaces, versioned independently
  • Database models: Internal implementation detail, can evolve separately
  • No tight coupling: Platform can change storage without breaking API contracts

Note: Platform services often convert between these models as needed (contract models → database models for storage, database models → contract models for API responses).

Using the ORM

Basic Session Usage

from database.session import get_db_session
from database.models import Email

# Query example
with get_db_session() as session:
    email = session.query(Email).filter_by(id=email_id).first()
    print(email.sender_hash)

# Insert example
with get_db_session() as session:
    new_email = Email(
        sender_hash="hash123",
        recipient_hash="hash456",
        subject_encrypted="encrypted_subject"
    )
    session.add(new_email)
    # Commit happens automatically on context exit

Advanced Usage

from database.session import create_db_engine, create_session_factory

# Create engine with custom config
engine = create_db_engine(pool_size=20, max_overflow=10)

# Create session factory
SessionFactory = create_session_factory(engine)

# Use in your service
session = SessionFactory()
try:
    # Your database operations
    session.commit()
except Exception:
    session.rollback()
    raise
finally:
    session.close()

Migration Commands

Check Migration Status

# Show current database revision
make db:status

# Show migration history (LIMIT=N to cap)
make db:history

Apply Migrations

# Apply all pending migrations
make db:migrate

# Or upgrade to a specific revision
make db:migrate REVISION=003_add_priority

Rollback Migrations

# Rollback the last migration (REVISION defaults to -1)
make db:downgrade

# Rollback the last 3 migrations
make db:downgrade REVISION=-3

Create New Migration

# Autogenerate migration from model changes
make db:revision MESSAGE="Add user roles table"

# Disable autogeneration (for an empty manual migration)
make db:revision MESSAGE="Custom data backfill" AUTOGEN=0

Manual Migration Management

For advanced operations not exposed by the Make targets (stamp, branches, merge...), invoke alembic directly inside the api-gateway container:

# Stamp database to specific revision (advanced)
docker compose exec api-gateway alembic stamp 001_initial_schema

Creating Database Models

Example Model

from sqlalchemy import Column, String, DateTime, ForeignKey
from sqlalchemy.orm import relationship
from datetime import datetime
import uuid

from database.models import Base

class User(Base):
    """User model following Single Responsibility Principle."""

    __tablename__ = 'users'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    email = Column(String(255), unique=True, nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)

    # Relationships
    emails = relationship("Email", back_populates="user")

Best Practices

  1. Single Responsibility: Each model represents one table
  2. Use UUIDs: For primary keys (better for distributed systems)
  3. Add Indexes: For frequently queried columns
  4. Use Relationships: Define FK relationships using SQLAlchemy ORM
  5. Default Values: Use default= for columns with default values

Migration Workflow

1. Modify Models

Edit shared/database/models.py or shared/database/plugin_models.py:

# Add new column
class Email(Base):
    __tablename__ = 'emails'

    # ... existing columns ...

    # New column
    priority = Column(Integer, default=0)

2. Generate Migration

make db:revision MESSAGE="Add priority to emails"

This creates a file like alembic/versions/002_add_priority_to_emails.py

3. Review Migration

Check the generated file in shared/database/alembic/versions/:

def upgrade() -> None:
    op.add_column('emails', sa.Column('priority', sa.Integer(), nullable=True))

def downgrade() -> None:
    op.drop_column('emails', 'priority')

4. Apply Migration

make db:migrate

5. Verify

make db:status
make db:history

Configuration

Database URL

Set in .env:

DATABASE_URL=postgresql://spot:spot@postgres:5432/spot

Connection Pooling

Configure in your service:

engine = create_db_engine(
    pool_size=10,        # Number of connections to keep open
    max_overflow=20,     # Max connections above pool_size
    pool_recycle=3600,   # Recycle connections after 1 hour
    pool_pre_ping=True   # Verify connections before using
)

Debug Mode

Enable SQL logging:

SQL_DEBUG=true

Troubleshooting

Migration Failed

# Check current state
make db:status

# Review migration history
make db:history

# If stuck, stamp to a specific revision (advanced; runs alembic in
# the api-gateway container)
docker compose exec api-gateway alembic stamp <revision_id>

Connection Issues

# Verify DATABASE_URL is set
echo $DATABASE_URL

# Check if postgres is running
docker compose ps postgres

# Test connection
docker compose exec postgres psql -U spot -d spot -c "SELECT 1"

Permission Errors

If you see permission errors when creating migrations, make sure the shared/database/alembic/versions/ directory is writable.

Directory Structure

shared/database/
├── alembic/              # Alembic configuration
│   ├── env.py           # Environment configuration
│   ├── script.py.mako   # Migration template
│   └── versions/        # Migration files
│       └── 001_initial_schema.py
├── alembic.ini          # Alembic config file
├── models.py            # Core SQLAlchemy models
├── plugin_models.py     # Plugin-related models
├── session.py           # Session management utilities
└── .archive/            # Old migration system (deprecated)

Resources