Configuration Reference

SPOT Platform uses environment variables for configuration. All settings can be set via a .env file or directly as environment variables.

Quick Start

Copy .env.example to .env and customize:

cp .env.example .env

See .env.example for comprehensive documentation of all available variables.

Environment Control

# Set environment mode
APP_ENV=prod|dev|test

# Default: prod (for safety)

See Environment Configuration for details on environment management.

Required Variables

These variables MUST be set in production:

| Variable | Description | Example |
|---|---|---|
| APP_ENV | Environment mode | prod, dev, or test |
| SECRET_KEY | JWT token signing key | Generate with: python -c "import secrets; print(secrets.token_urlsafe(32))" |
| POSTGRES_DB | PostgreSQL database name | spot |
| POSTGRES_USER | PostgreSQL username | spot |
| POSTGRES_PASSWORD | PostgreSQL password | secure_password |
| RABBITMQ_DEFAULT_USER | RabbitMQ username | guest |
| RABBITMQ_DEFAULT_PASS | RabbitMQ password | secure_password |
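A minimal startup check for these variables could look like the sketch below. The function name and the fail-fast approach are illustrative, not part of the platform; only the variable names come from the table above.

```python
import os

# Required production variables, as listed in the table above.
REQUIRED_VARS = [
    "APP_ENV",
    "SECRET_KEY",
    "POSTGRES_DB",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "RABBITMQ_DEFAULT_USER",
    "RABBITMQ_DEFAULT_PASS",
]

def missing_required_vars(env: dict) -> list:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example: an environment where only SECRET_KEY is missing.
env = {name: "x" for name in REQUIRED_VARS}
env.pop("SECRET_KEY")
print(missing_required_vars(env))  # ['SECRET_KEY']
```

In production you would call `missing_required_vars(dict(os.environ))` before starting services and abort if the list is non-empty.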

Core Configuration

Docker Compose

| Variable | Default | Description |
|---|---|---|
| COMPOSE_PROJECT_NAME | spot | Project name for consistent container/network naming |

Application Settings

| Variable | Default | Description |
|---|---|---|
| LOG_LEVEL | INFO | Logging level: DEBUG, INFO, WARNING, ERROR |
| DEBUG | false | Enable debug mode |
| TZ | UTC | Timezone |

Security

| Variable | Default | Description |
|---|---|---|
| SECRET_KEY | (required) | Secret key for JWT token signing - MUST be set |
| INTERNAL_API_KEY | (none) | Internal API key for service-to-service auth (optional) |
| TRUSTED_HOSTS | localhost,127.0.0.1 | Comma-separated list of trusted hosts |

Database (PostgreSQL)

| Variable | Default | Description |
|---|---|---|
| POSTGRES_DB | spot | Database name |
| POSTGRES_USER | spot | Database username |
| POSTGRES_PASSWORD | spot123 | Database password |
| POSTGRES_PORT | 5432 | Database port |
| DATABASE_URL | (auto-constructed) | Full connection URL |
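When DATABASE_URL is not set explicitly, it is assembled from the component variables. A sketch of how such a URL could be built is shown below; the scheme (`postgresql://`) and the default host name (`postgres`) are assumptions, since the platform may use a driver-specific scheme or a different service name.

```python
from urllib.parse import quote

def build_database_url(user, password, db, host="postgres", port=5432):
    """Assemble a PostgreSQL connection URL from component settings.

    Scheme and host are illustrative assumptions; credentials are
    percent-encoded so special characters survive URL parsing.
    """
    return f"postgresql://{quote(user)}:{quote(password)}@{host}:{port}/{db}"

print(build_database_url("spot", "spot123", "spot"))
# postgresql://spot:spot123@postgres:5432/spot
```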

Redis Cache

| Variable | Default | Description |
|---|---|---|
| REDIS_PASSWORD | (empty) | Redis password (empty = no auth) |
| REDIS_PORT | 6379 | Redis port |
| REDIS_URL | (auto-constructed) | Full connection URL |
| REDIS_MAXMEMORY | 256mb | Maximum memory |
| REDIS_MAXMEMORY_POLICY | allkeys-lru | Eviction policy |

RabbitMQ Message Queue

| Variable | Default | Description |
|---|---|---|
| RABBITMQ_DEFAULT_USER | guest | RabbitMQ username |
| RABBITMQ_DEFAULT_PASS | guest | RabbitMQ password |
| RABBITMQ_PORT | 5672 | AMQP port |
| RABBITMQ_MGMT_PORT | 15672 | Management UI port |
| RABBITMQ_URL | (auto-constructed) | Full connection URL |

Service Configuration

API Gateway

| Variable | Default | Description |
|---|---|---|
| API_GATEWAY_PORT | 8001 | External API port |

Plugin Configuration

"Plugin" is the umbrella vocabulary for anything pluggable into SPOT. Two kinds exist today:

  • analyzers handle POST /internal/analyze and produce a phishing verdict (AnalysisResult).
  • context providers handle POST /internal/enrich and enrich emails with organisational data (EnrichmentResult) before analyzers run.

Both are configured under the plugins: section of config/spot.yaml:

plugins:
  analyzers:
    analyzer-nlp:
      enabled: true
      url: "http://analyzer-nlp:8000"
      settings: {}
    analyzer-llm:
      enabled: true
      url: "http://analyzer-llm:8000"
      settings: {}
  context_providers:
    employee-dir:
      enabled: true
      url: "http://provider-employee-dir:8000"
      settings:
        LDAP_HOST: ldap.example.com

Each entry needs at minimum url and enabled. Installed plugins also carry image, version, container_id, container_name, installed_at (set automatically by the installer).

Context providers expose POST /internal/enrich and return an EnrichmentResult. See the Context Providers guide for the full contract and implementation examples.

Context providers are referenced from workflow stages via a context_providers list:

stages:
  - name: enrichment
    type: parallel
    context_providers:
      - id: employee-dir
        timeout_ms: 5000
        required: true
    analyzers: [...]

Custom Plugin Configuration

Plugin behaviour (model paths, API tokens, feature flags, ...) is managed differently from platform orchestration config:

| Aspect | Platform Configuration | Plugin Configuration |
|---|---|---|
| Purpose | Register and connect to plugins | Configure plugin behaviour |
| Location | config/spot.yaml | Plugin repository .env |
| Format | plugins.{analyzers,context_providers}: section | Plugin-specific settings |
| Scope | Platform-wide orchestration | Single plugin instance |

Platform side (config/spot.yaml):

plugins:
  analyzers:
    my-analyzer:
      enabled: true
      url: "http://my-analyzer:8000"
      settings: {}

Analyzer side (your analyzer/.env):

# Analyzer-specific configuration
MODEL_PATH=/models/my-model.bin
CONFIDENCE_THRESHOLD=0.75
MAX_EMAIL_SIZE=10MB

Mail Retriever Configuration

| Variable | Default | Description |
|---|---|---|
| SPOT_MAIL_RETRIEVERS | {} | JSON object with retriever configs |

Example SPOT_MAIL_RETRIEVERS:

SPOT_MAIL_RETRIEVERS='{"imap":{"url":"http://mail-retriever:8000","priority":1,"enabled":true}}'
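A consumer of this variable could parse and order retrievers like the sketch below. The priority-ordering logic is an assumption for illustration; only the variable name and JSON shape come from the example above.

```python
import json
import os

# Parse SPOT_MAIL_RETRIEVERS; the default mirrors the example above.
raw = os.environ.get(
    "SPOT_MAIL_RETRIEVERS",
    '{"imap":{"url":"http://mail-retriever:8000","priority":1,"enabled":true}}',
)
retrievers = json.loads(raw)

# Enabled retrievers, lowest priority number first (assumed semantics).
enabled = sorted(
    (name for name, cfg in retrievers.items() if cfg.get("enabled", True)),
    key=lambda name: retrievers[name].get("priority", 0),
)
print(enabled)  # ['imap']
```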

Workflow and Analyzer Configuration Files

| Variable | Default | Description |
|---|---|---|
| SPOT_CONFIG_FILE | config/spot.yaml | Path to analyzer configuration (falls back to config/defaults/spot.yaml if not found) |
| SPOT_WORKFLOWS_FILE | config/workflows.yaml | Path to workflow configuration (falls back to config/defaults/workflows.yaml if not found) |

See Workflow YAML Schema and Analyzer Settings below for detailed schema documentation.

Workflow YAML Schema

Workflows define how analyzers are orchestrated to detect spear-phishing emails. Configuration is in config/workflows.yaml.

Workflow Structure

workflows:
  - id: "workflow-id"           # Required: Unique identifier
    name: "Human Readable Name" # Required: Display name
    version: 1                  # Schema version (integer)
    description: "Description"  # Optional description
    stages: [...]               # Required: List of stages
    timeout_ms: 300000          # Total workflow timeout (default: 5 min)
    max_parallel_analyzers: 10  # Max concurrent analyzers
    final_stage_name: "stage"   # Stage that produces final result
    confidence_threshold: 0.7   # Min confidence for detection (0.0-1.0)
    created_by: "system"        # Creator identifier

Stage Configuration

Each stage groups analyzers that run together:

stages:
  - name: "stage-name"          # Required: Unique within workflow
    type: "parallel"            # parallel | sequential | conditional
    depends_on: []              # List of stage names this depends on
    continue_on_failure: true   # Continue if some analyzers fail
    min_successful_analyzers: 2 # Minimum analyzers that must succeed
    aggregation_method: "weighted_average"  # How to combine results
    analyzers: [...]            # List of analyzer configs
    condition: null             # Optional: Expression for conditional stages

Stage Types:

  • parallel - Run all analyzers concurrently
  • sequential - Run analyzers one after another
  • conditional - Run based on condition expression

Aggregation Methods:

  • weighted_average - Combine scores using analyzer weights
  • max_confidence - Take highest confidence score
  • majority_vote - Use most common classification
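The three aggregation methods can be sketched as follows. The field names (`weight`, `confidence`, `is_phishing`) mirror the schemas in this document, but the functions themselves are illustrative, not the platform's actual implementation.

```python
from collections import Counter

def weighted_average(results):
    """Combine confidence scores using analyzer weights."""
    total_weight = sum(r["weight"] for r in results)
    return sum(r["confidence"] * r["weight"] for r in results) / total_weight

def max_confidence(results):
    """Take the highest confidence score."""
    return max(r["confidence"] for r in results)

def majority_vote(results):
    """Use the most common is_phishing classification."""
    votes = Counter(r["is_phishing"] for r in results)
    return votes.most_common(1)[0][0]

results = [
    {"weight": 0.5, "confidence": 0.9, "is_phishing": True},
    {"weight": 0.5, "confidence": 0.4, "is_phishing": False},
]
print(round(weighted_average(results), 2))  # 0.65
print(max_confidence(results))              # 0.9
print(majority_vote(results))               # True
```

How ties in majority_vote are broken (here: first-seen wins) is a detail the real aggregator may handle differently.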

Analyzer Configuration (in Workflow)

Each analyzer within a stage:

analyzers:
  - id: "analyzer-nlp"          # Required: Analyzer identifier
    weight: 0.5                 # Score weight (0.0-1.0)
    timeout_ms: 30000           # Per-analyzer timeout
    required: false             # If true, stage fails if analyzer fails
    failure_strategy: "skip"    # skip | retry | fail
    retry_config:               # Optional retry settings
      max_attempts: 3
      backoff_ms: 1000
      max_backoff_ms: 10000
      exponential_backoff: true
    condition: null             # Optional: When to run this analyzer

Failure Strategies:

  • skip - Continue workflow without this analyzer's result
  • retry - Retry according to retry_config, then skip/fail
  • fail - Immediately fail the entire stage
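For the retry strategy, the retry_config fields above imply a delay schedule like the sketch below. Interpreting max_attempts as total attempts (so max_attempts - 1 retries) is an assumption; the platform may count differently.

```python
def backoff_schedule(max_attempts, backoff_ms, max_backoff_ms, exponential_backoff):
    """Delay (ms) before each retry, capped at max_backoff_ms.

    Assumes max_attempts counts total attempts, so there are
    max_attempts - 1 retries after the first failure.
    """
    delays = []
    for retry in range(max_attempts - 1):
        delay = backoff_ms * (2 ** retry) if exponential_backoff else backoff_ms
        delays.append(min(delay, max_backoff_ms))
    return delays

print(backoff_schedule(3, 1000, 10000, True))  # [1000, 2000]
print(backoff_schedule(4, 1000, 3000, True))   # [1000, 2000, 3000] (capped)
```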

Accessing Previous Stage Results (analysis_context)

Every analyzer automatically receives the results of all previously-completed stages via Email.analysis_context. No configuration is required -- the orchestrator builds the context before calling each analyzer.

Structure:

email.analysis_context = {
    "<stage-name>": {
        "providers": {
            "<provider-id>": { ...free-form data... }
        },
        "analyzers": {
            "<analyzer-id>": { ...AnalyzerResult fields... }
        }
    },
    ...
}
  • Top-level keys are stage names
  • Each stage contains a providers dict (empty if no context providers ran in that stage) and an analyzers dict
  • Analyzer results expose all AnalyzerResult fields: is_phishing, confidence, threat_level, indicators, analyzer_details, etc.
  • Only stages that have already completed at the time the analyzer runs are included

Example access in an analyzer:

@app.post("/internal/analyze")
async def analyze_email(email: Email) -> AnalysisResult:
    ctx = email.analysis_context
    if "parallel-analysis" in ctx:
        nlp = ctx["parallel-analysis"]["analyzers"].get("analyzer-nlp")
        ml = ctx["parallel-analysis"]["analyzers"].get("analyzer-ml")
        if nlp and ml:
            combined_confidence = (nlp["confidence"] + ml["confidence"]) / 2
            # Use combined results to inform this analyzer's decision
    ...

Analyzers that don't need previous results can ignore analysis_context entirely -- it defaults to an empty dict.

Complete Workflow Example

workflows:
  - id: "default-workflow"
    name: "Default Phishing Detection Workflow"
    version: 1
    description: "Parallel NLP + LLM analysis followed by decision"
    stages:
      - name: "parallel-analysis"
        type: "parallel"
        depends_on: []
        continue_on_failure: true
        min_successful_analyzers: 2
        aggregation_method: "weighted_average"
        analyzers:
          - id: "analyzer-nlp"
            weight: 0.5
            timeout_ms: 30000
            required: false
            failure_strategy: "skip"
            retry_config:
              max_attempts: 2
              backoff_ms: 1000
              max_backoff_ms: 5000
              exponential_backoff: false
          - id: "analyzer-llm"
            weight: 0.5
            timeout_ms: 45000
            required: false
            failure_strategy: "skip"

      - name: "decision"
        type: "sequential"
        depends_on: ["parallel-analysis"]
        continue_on_failure: false
        analyzers:
          - id: "analyzer-llm"
            weight: 1.0
            timeout_ms: 60000
            required: true
            failure_strategy: "retry"

    timeout_ms: 300000
    max_parallel_analyzers: 10
    final_stage_name: "decision"
    confidence_threshold: 0.7
    created_by: "system"

Analyzer Settings (spot.yaml)

The config/spot.yaml file configures analyzer behavior centrally. Analyzers fetch their configuration from the API Gateway on startup.

Structure

version: "1.0"

platform:
  log_level: INFO    # Global log level
  debug: false       # Enable debug mode

analyzers:
  analyzer-id:
    enabled: true    # Enable/disable analyzer
    settings:        # Analyzer-specific settings (override defaults)
      key: value
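The "override defaults" behaviour amounts to a shallow merge, which can be sketched as below. The default values shown are illustrative, not the platform's actual defaults.

```python
# Illustrative code defaults for an analyzer (assumed values).
DEFAULTS = {"host": "0.0.0.0", "port": 8000, "log_level": "INFO"}

def effective_settings(defaults: dict, overrides: dict) -> dict:
    """Analyzer-specific settings from spot.yaml win over code defaults."""
    return {**defaults, **overrides}

settings = effective_settings(
    DEFAULTS, {"log_level": "DEBUG", "sentiment_threshold": 0.7}
)
print(settings["log_level"])            # DEBUG (overridden)
print(settings["port"])                 # 8000 (default kept)
print(settings["sentiment_threshold"])  # 0.7 (analyzer-specific)
```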

Analyzer Settings Example

analyzers:
  analyzer-nlp:
    enabled: true
    settings:
      host: "0.0.0.0"
      port: 8000
      log_level: INFO
      sentiment_threshold: 0.7      # NLP-specific threshold
      ner_confidence_threshold: 0.8
      phishing_score_threshold: 0.6

  analyzer-llm:
    enabled: true
    settings:
      host: "0.0.0.0"
      port: 8000
      ollama_host: "http://ollama:11434"
      ollama_model: "llama2:7b-chat"
      ollama_timeout: 60
      max_tokens: 500
      temperature: 0.1
      confidence_threshold: 0.6

  analyzer-context:
    enabled: false   # Disabled by default
    settings:
      rule_file: "/app/rules/context_rules.yaml"
      cache_ttl_seconds: 300

Config Reload

Configuration can be reloaded without restart:

# Reload config via API
curl -X POST http://localhost:8001/api/v1/config/reload \
  -H "Authorization: Bearer $TOKEN"

# Response shows what changed
{
  "old_version": "abc123@20251204",
  "new_version": "def456@20251204",
  "changed": {
    "platform": false,
    "workflows": true,
    "analyzers": ["analyzer-nlp"]
  }
}

Reload behavior:

  • Invalid YAML/schema returns 400, previous config preserved
  • Version only bumps if content actually changed
  • Concurrent reloads are serialized (one at a time)

Development Configuration

When APP_ENV=dev, additional variables are available:

Source Code Mounting

| Variable | Description |
|---|---|
| API_GATEWAY_SRC_MOUNT | Path to API Gateway source |
| API_GATEWAY_TEST_MOUNT | Path to API Gateway tests |
| ANALYZER_ORCHESTRATOR_SRC_MOUNT | Path to Analyzer Orchestrator source |
| ANALYZER_ORCHESTRATOR_TEST_MOUNT | Path to Analyzer Orchestrator tests |
| MAIL_ORCHESTRATOR_SRC_MOUNT | Path to Mail Orchestrator source |
| MAIL_ORCHESTRATOR_TEST_MOUNT | Path to Mail Orchestrator tests |
| SHARED_MOUNT | Path to shared modules |
| CONFIG_MOUNT | Path to config directory |
| MOUNT_MODE | Mount mode (rw or ro) |

Debug Ports

| Variable | Default | Description |
|---|---|---|
| ANALYZER_ORCHESTRATOR_DEBUG_PORT | 8091 | Debug port for analyzer orchestrator |
| MAIL_ORCHESTRATOR_DEBUG_PORT | 8092 | Debug port for mail orchestrator |

Development Tools

| Variable | Default | Description |
|---|---|---|
| MAILHOG_SMTP_PORT | 1025 | Mailhog SMTP port |
| MAILHOG_WEB_PORT | 8025 | Mailhog web UI port |
| ADMINER_PORT | 8080 | Adminer database UI port |
| HOST_UID | 1000 | Host user ID for devtools container |
| HOST_GID | 1000 | Host group ID for devtools container |

Production Configuration

When APP_ENV=prod:

Docker Registry

| Variable | Default | Description |
|---|---|---|
| REGISTRY_PORT | 5000 | Docker registry port |
| CI_REGISTRY_IMAGE | (CI only) | Full registry path for platform services (e.g., localhost:5000/spot/platform) |
| CI_REGISTRY | (CI only) | Registry host for external analyzers (e.g., localhost:5000) |
| VERSION | latest | Image version tag |
| BASE_IMAGE | base:latest | Base image name and tag |

Note: CI_REGISTRY_IMAGE and CI_REGISTRY are NOT set for local development. They are only set in .gitlab-ci-local-env for CI context.

CI/CD Configuration

GitLab-specific variables (only needed for CI/CD):

| Variable | Description |
|---|---|
| GITLAB_HOST | GitLab hostname |
| GITLAB_TOKEN | GitLab access token (uses CI_JOB_TOKEN if available) |
| CI_REGISTRY | Container registry URL |
| GITLAB_GROUP | GitLab group name |

Configuration Precedence

Configuration is loaded in this order (highest to lowest priority):

  1. Command-line environment variables
  2. .env file in project root
  3. Code defaults (in Pydantic Settings classes)

Example:

# .env file has: LOG_LEVEL=INFO
# Command-line override:
LOG_LEVEL=DEBUG make service:start    # Uses DEBUG
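This precedence can be expressed as a small resolver. The sketch below assumes the .env file has already been parsed into a dict (Pydantic Settings handles all of this internally; this only illustrates the lookup order).

```python
import os

def resolve(name, dotenv: dict, default=None):
    """Look up a setting: process environment > .env file > code default."""
    if name in os.environ:
        return os.environ[name]
    if name in dotenv:
        return dotenv[name]
    return default

dotenv = {"LOG_LEVEL": "INFO"}                  # as if parsed from .env
os.environ["LOG_LEVEL"] = "DEBUG"               # simulates the CLI override
print(resolve("LOG_LEVEL", dotenv, "WARNING"))  # DEBUG
del os.environ["LOG_LEVEL"]
print(resolve("LOG_LEVEL", dotenv, "WARNING"))  # INFO
```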

Configuration Format

Environment Variable Prefixes

SPOT uses standard environment variable names without a global prefix:

  • Infrastructure: POSTGRES_*, REDIS_*, RABBITMQ_*
  • Application: APP_ENV, LOG_LEVEL, SECRET_KEY
  • Services: SPOT_MAIL_* (analyzers configured in config/spot.yaml)

JSON Configuration

Some variables accept JSON objects:

# Mail Retrievers (JSON object)
SPOT_MAIL_RETRIEVERS='{"imap":{"url":"http://mail-retriever:8000","priority":1}}'

Note: Analyzer configuration has moved from environment variables to config/spot.yaml.

Quick Reference Examples

Minimal Production .env

# Environment
APP_ENV=prod

# Security (REQUIRED - generate secure values)
SECRET_KEY=generate-with-python-secrets-module

# Database
POSTGRES_DB=spot
POSTGRES_USER=spot
POSTGRES_PASSWORD=secure_db_password

# Redis (optional password)
REDIS_PASSWORD=secure_redis_password

# RabbitMQ
RABBITMQ_DEFAULT_USER=spot
RABBITMQ_DEFAULT_PASS=secure_rabbitmq_password

# Trusted hosts (your domains)
TRUSTED_HOSTS=spot.example.com,api.example.com

Minimal Development .env

# Environment
APP_ENV=dev

# Security
SECRET_KEY=dev-secret-key-for-testing-only

# Database (dev defaults)
POSTGRES_DB=spot
POSTGRES_USER=spot
POSTGRES_PASSWORD=spot123

# Redis (no password in dev)
REDIS_PASSWORD=

# RabbitMQ (dev defaults)
RABBITMQ_DEFAULT_USER=guest
RABBITMQ_DEFAULT_PASS=guest

Full Example with Analyzers

.env:

# Environment
APP_ENV=prod

# Security
SECRET_KEY=your-32-character-random-key-here
TRUSTED_HOSTS=spot.example.com

# Infrastructure
POSTGRES_PASSWORD=secure_password
RABBITMQ_DEFAULT_PASS=secure_password

config/spot.yaml:

analyzers:
  analyzer-nlp:
    enabled: true
    url: "http://10.0.1.10:8000"
    settings: {}
  analyzer-llm:
    enabled: true
    url: "http://10.0.1.11:8000"
    settings: {}

Validation

The platform validates configuration at startup:

  • Database URL format
  • RabbitMQ URL format
  • Analyzer URL formats
  • Port ranges (1-65535)

Invalid configuration will cause startup to fail with a descriptive error message.
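The checks above could be approximated as below. This is a hedged sketch, not the platform's validators (which are likely Pydantic-based and stricter); the accepted URL schemes are assumptions.

```python
from urllib.parse import urlparse

def valid_port(port: int) -> bool:
    """Ports must fall in the 1-65535 range."""
    return 1 <= port <= 65535

def valid_url(url: str, schemes=("http", "https", "amqp", "postgresql")) -> bool:
    """A URL must have an accepted scheme and a non-empty host part."""
    parsed = urlparse(url)
    return parsed.scheme in schemes and bool(parsed.netloc)

print(valid_port(8001))                       # True
print(valid_port(70000))                      # False
print(valid_url("http://analyzer-nlp:8000"))  # True
print(valid_url("not-a-url"))                 # False
```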