Configuration Reference

SPOT Platform uses environment variables for configuration. All settings can be set via a .env file or directly as environment variables.

Quick Start

Copy .env.example to .env and customize:

cp .env.example .env

See .env.example for comprehensive documentation of all available variables.

Environment Control

# Set environment mode
APP_ENV=prod|dev|test

# Default: prod (for safety)

See Environment Configuration for details on environment management.

Required Variables

These variables MUST be set in production:

| Variable | Description | Example |
|---|---|---|
| APP_ENV | Environment mode | prod, dev, or test |
| SECRET_KEY | JWT token signing key | Generate with: python -c "import secrets; print(secrets.token_urlsafe(32))" |
| POSTGRES_DB | PostgreSQL database name | spot |
| POSTGRES_USER | PostgreSQL username | spot |
| POSTGRES_PASSWORD | PostgreSQL password | secure_password |
| RABBITMQ_DEFAULT_USER | RabbitMQ username | guest |
| RABBITMQ_DEFAULT_PASS | RabbitMQ password | secure_password |
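A minimal startup check for these variables could look like the sketch below. The function name and the fail-fast approach are illustrative, not part of the platform; only the variable names come from the table above.

```python
import os

# Required production variables, as listed in the table above.
REQUIRED_VARS = [
    "APP_ENV",
    "SECRET_KEY",
    "POSTGRES_DB",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "RABBITMQ_DEFAULT_USER",
    "RABBITMQ_DEFAULT_PASS",
]

def missing_required_vars(env: dict) -> list:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example: an environment where only SECRET_KEY is missing.
env = {name: "x" for name in REQUIRED_VARS}
env.pop("SECRET_KEY")
print(missing_required_vars(env))  # ['SECRET_KEY']
```

In production you would call `missing_required_vars(dict(os.environ))` before starting services and abort if the list is non-empty.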

Core Configuration

Docker Compose

| Variable | Default | Description |
|---|---|---|
| COMPOSE_PROJECT_NAME | spot | Project name for consistent container/network naming |

Application Settings

| Variable | Default | Description |
|---|---|---|
| LOG_LEVEL | INFO | Logging level: DEBUG, INFO, WARNING, ERROR |
| DEBUG | false | Enable debug mode |
| TZ | UTC | Timezone |

Security

| Variable | Default | Description |
|---|---|---|
| SECRET_KEY | (required) | Secret key for JWT token signing - MUST be set |
| INTERNAL_API_KEY | (none) | Internal API key for service-to-service auth (optional) |
| TRUSTED_HOSTS | localhost,127.0.0.1 | Comma-separated list of trusted hosts |

Database (PostgreSQL)

| Variable | Default | Description |
|---|---|---|
| POSTGRES_DB | spot | Database name |
| POSTGRES_USER | spot | Database username |
| POSTGRES_PASSWORD | spot123 | Database password |
| POSTGRES_PORT | 5432 | Database port |
| DATABASE_URL | (auto-constructed) | Full connection URL |
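When DATABASE_URL is not set explicitly, it is assembled from the component variables. A sketch of how such a URL could be built is shown below; the scheme (`postgresql://`) and the default host name (`postgres`) are assumptions, since the platform may use a driver-specific scheme or a different service name.

```python
from urllib.parse import quote

def build_database_url(user, password, db, host="postgres", port=5432):
    """Assemble a PostgreSQL connection URL from component settings.

    Scheme and host are illustrative assumptions; credentials are
    percent-encoded so special characters survive URL parsing.
    """
    return f"postgresql://{quote(user)}:{quote(password)}@{host}:{port}/{db}"

print(build_database_url("spot", "spot123", "spot"))
# postgresql://spot:spot123@postgres:5432/spot
```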

Redis Cache

| Variable | Default | Description |
|---|---|---|
| REDIS_PASSWORD | (empty) | Redis password (empty = no auth) |
| REDIS_PORT | 6379 | Redis port |
| REDIS_URL | (auto-constructed) | Full connection URL |
| REDIS_MAXMEMORY | 256mb | Maximum memory |
| REDIS_MAXMEMORY_POLICY | allkeys-lru | Eviction policy |

RabbitMQ Message Queue

| Variable | Default | Description |
|---|---|---|
| RABBITMQ_DEFAULT_USER | guest | RabbitMQ username |
| RABBITMQ_DEFAULT_PASS | guest | RabbitMQ password |
| RABBITMQ_PORT | 5672 | AMQP port |
| RABBITMQ_MGMT_PORT | 15672 | Management UI port |
| RABBITMQ_URL | (auto-constructed) | Full connection URL |

Service Configuration

API Gateway

| Variable | Default | Description |
|---|---|---|
| API_GATEWAY_PORT | 8001 | External API port |

Plugin Configuration

"Plugin" is the umbrella vocabulary for anything pluggable into SPOT. Two kinds exist today:

  • analyzers handle POST /internal/analyze and produce a phishing verdict (AnalysisResult).
  • context providers handle POST /internal/enrich and enrich emails with organisational data (EnrichmentResult) before analyzers run.

Both are configured under the plugins: section of config/spot.yaml:

plugins:
  analyzers:
    analyzer-nlp:
      enabled: true
      url: "http://analyzer-nlp:8000"
      settings: {}
    analyzer-llm:
      enabled: true
      url: "http://analyzer-llm:8000"
      settings: {}
  context_providers:
    employee-dir:
      enabled: true
      url: "http://provider-employee-dir:8000"
      settings:
        LDAP_HOST: ldap.example.com

Each entry needs at minimum url and enabled. Installed plugins also carry image, version, container_id, container_name, installed_at (set automatically by the installer).

Context providers expose POST /internal/enrich and return an EnrichmentResult. See the Context Providers guide for the full contract and implementation examples.

Context providers are referenced from workflow stages via a context_providers list:

stages:
  - name: enrichment
    type: parallel
    context_providers:
      - id: employee-dir
        timeout_ms: 5000
        required: true
    analyzers: [...]

Custom Plugin Configuration

Plugin behaviour (model paths, API tokens, feature flags, ...) is managed differently from platform orchestration config:

| Aspect | Platform Configuration | Plugin Configuration |
|---|---|---|
| Purpose | Register and connect to plugins | Configure plugin behaviour |
| Location | config/spot.yaml | Plugin repository .env |
| Format | plugins.{analyzers,context_providers}: section | Plugin-specific settings |
| Scope | Platform-wide orchestration | Single plugin instance |

Platform side (config/spot.yaml):

plugins:
  analyzers:
    my-analyzer:
      enabled: true
      url: "http://my-analyzer:8000"
      settings: {}

Analyzer side (your analyzer/.env):

# Analyzer-specific configuration
MODEL_PATH=/models/my-model.bin
CONFIDENCE_THRESHOLD=0.75
MAX_EMAIL_SIZE=10MB

Mail Retriever Configuration

| Variable | Default | Description |
|---|---|---|
| SPOT_MAIL_RETRIEVERS | {} | JSON object with retriever configs |

Example SPOT_MAIL_RETRIEVERS:

SPOT_MAIL_RETRIEVERS='{"imap":{"url":"http://mail-retriever:8000","priority":1,"enabled":true}}'
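A consumer of this variable could parse and order retrievers like the sketch below. The priority-ordering logic is an assumption for illustration; only the variable name and JSON shape come from the example above.

```python
import json
import os

# Parse SPOT_MAIL_RETRIEVERS; the default mirrors the example above.
raw = os.environ.get(
    "SPOT_MAIL_RETRIEVERS",
    '{"imap":{"url":"http://mail-retriever:8000","priority":1,"enabled":true}}',
)
retrievers = json.loads(raw)

# Enabled retrievers, lowest priority number first (assumed semantics).
enabled = sorted(
    (name for name, cfg in retrievers.items() if cfg.get("enabled", True)),
    key=lambda name: retrievers[name].get("priority", 0),
)
print(enabled)  # ['imap']
```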

Workflow and Analyzer Configuration Files

| Variable | Default | Description |
|---|---|---|
| SPOT_CONFIG_FILE | config/spot.yaml | Path to analyzer configuration (falls back to config/defaults/spot.yaml if not found) |
| SPOT_WORKFLOWS_FILE | config/workflows.yaml | Path to workflow configuration (falls back to config/defaults/workflows.yaml if not found) |

See Workflow YAML Schema and Analyzer Settings below for detailed schema documentation.

Workflow YAML Schema

Workflows define how analyzers are orchestrated to detect spear-phishing emails. Configuration is in config/workflows.yaml.

Workflow Structure

workflows:
  - id: "workflow-id"           # Required: Unique identifier
    name: "Human Readable Name" # Required: Display name
    version: 1                  # Schema version (integer)
    description: "Description"  # Optional description
    stages: [...]               # Required: List of stages
    timeout_ms: 300000          # Total workflow timeout (default: 5 min)
    max_parallel_analyzers: 10  # Max concurrent analyzers
    final_stage_name: "stage"   # Stage that produces final result
    confidence_threshold: 0.7   # Min confidence for detection (0.0-1.0)
    created_by: "system"        # Creator identifier

Stage Configuration

Each stage groups analyzers that run together:

stages:
  - name: "stage-name"          # Required: Unique within workflow
    type: "parallel"            # parallel | sequential | conditional
    depends_on: []              # List of stage names this depends on
    continue_on_failure: true   # Continue if some analyzers fail
    min_successful_analyzers: 2 # Minimum analyzers that must succeed
    aggregation_method: "weighted_average"  # How to combine results
    analyzers: [...]            # List of analyzer configs
    condition: null             # Optional: Expression for conditional stages

Stage Types:

  • parallel - Run all analyzers concurrently
  • sequential - Run analyzers one after another
  • conditional - Run based on condition expression

Aggregation Methods:

  • weighted_average - Combine scores using analyzer weights
  • max_confidence - Take highest confidence score
  • majority_vote - Use most common classification
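The three aggregation methods can be sketched as follows. The field names (`weight`, `confidence`, `is_phishing`) mirror the schemas in this document, but the functions themselves are illustrative, not the platform's actual implementation.

```python
from collections import Counter

def weighted_average(results):
    """Combine confidence scores using analyzer weights."""
    total_weight = sum(r["weight"] for r in results)
    return sum(r["confidence"] * r["weight"] for r in results) / total_weight

def max_confidence(results):
    """Take the highest confidence score."""
    return max(r["confidence"] for r in results)

def majority_vote(results):
    """Use the most common is_phishing classification."""
    votes = Counter(r["is_phishing"] for r in results)
    return votes.most_common(1)[0][0]

results = [
    {"weight": 0.5, "confidence": 0.9, "is_phishing": True},
    {"weight": 0.5, "confidence": 0.4, "is_phishing": False},
]
print(round(weighted_average(results), 2))  # 0.65
print(max_confidence(results))              # 0.9
print(majority_vote(results))               # True
```

How ties in majority_vote are broken (here: first-seen wins) is a detail the real aggregator may handle differently.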

Analyzer Configuration (in Workflow)

Each analyzer within a stage:

analyzers:
  - id: "analyzer-nlp"          # Required: Analyzer identifier
    weight: 0.5                 # Score weight (0.0-1.0)
    timeout_ms: 30000           # Per-analyzer timeout
    required: false             # If true, stage fails if analyzer fails
    failure_strategy: "skip"    # skip | retry | fail
    retry_config:               # Optional retry settings
      max_attempts: 3
      backoff_ms: 1000
      max_backoff_ms: 10000
      exponential_backoff: true
    condition: null             # Optional: When to run this analyzer

Failure Strategies:

  • skip - Continue workflow without this analyzer's result
  • retry - Retry according to retry_config, then skip/fail
  • fail - Immediately fail the entire stage
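For the retry strategy, the retry_config fields above imply a delay schedule like the sketch below. Interpreting max_attempts as total attempts (so max_attempts - 1 retries) is an assumption; the platform may count differently.

```python
def backoff_schedule(max_attempts, backoff_ms, max_backoff_ms, exponential_backoff):
    """Delay (ms) before each retry, capped at max_backoff_ms.

    Assumes max_attempts counts total attempts, so there are
    max_attempts - 1 retries after the first failure.
    """
    delays = []
    for retry in range(max_attempts - 1):
        delay = backoff_ms * (2 ** retry) if exponential_backoff else backoff_ms
        delays.append(min(delay, max_backoff_ms))
    return delays

print(backoff_schedule(3, 1000, 10000, True))  # [1000, 2000]
print(backoff_schedule(4, 1000, 3000, True))   # [1000, 2000, 3000] (capped)
```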

Accessing Previous Stage Results (analysis_context)

Every analyzer automatically receives the results of all previously-completed stages via Email.analysis_context. No configuration is required -- the orchestrator builds the context before calling each analyzer.

Structure:

email.analysis_context = {
    "<stage-name>": {
        "providers": {
            "<provider-id>": { ...free-form data... }
        },
        "analyzers": {
            "<analyzer-id>": { ...AnalyzerResult fields... }
        }
    },
    ...
}
  • Top-level keys are stage names
  • Each stage contains a providers dict (empty if no context providers ran in that stage) and an analyzers dict
  • Analyzer results expose all AnalyzerResult fields: is_phishing, confidence, threat_level, indicators, analyzer_details, etc.
  • Only stages that have already completed at the time the analyzer runs are included

Example access in an analyzer:

@app.post("/internal/analyze")
async def analyze_email(email: Email) -> AnalysisResult:
    ctx = email.analysis_context
    if "parallel-analysis" in ctx:
        nlp = ctx["parallel-analysis"]["analyzers"].get("analyzer-nlp")
        ml = ctx["parallel-analysis"]["analyzers"].get("analyzer-ml")
        if nlp and ml:
            combined_confidence = (nlp["confidence"] + ml["confidence"]) / 2
            # Use combined results to inform this analyzer's decision
    ...

Analyzers that don't need previous results can ignore analysis_context entirely -- it defaults to an empty dict.

Complete Workflow Example

workflows:
  - id: "default-workflow"
    name: "Default Phishing Detection Workflow"
    version: 1
    description: "Parallel NLP + LLM analysis followed by decision"
    stages:
      - name: "parallel-analysis"
        type: "parallel"
        depends_on: []
        continue_on_failure: true
        min_successful_analyzers: 2
        aggregation_method: "weighted_average"
        analyzers:
          - id: "analyzer-nlp"
            weight: 0.5
            timeout_ms: 30000
            required: false
            failure_strategy: "skip"
            retry_config:
              max_attempts: 2
              backoff_ms: 1000
              max_backoff_ms: 5000
              exponential_backoff: false
          - id: "analyzer-llm"
            weight: 0.5
            timeout_ms: 45000
            required: false
            failure_strategy: "skip"

      - name: "decision"
        type: "sequential"
        depends_on: ["parallel-analysis"]
        continue_on_failure: false
        analyzers:
          - id: "analyzer-llm"
            weight: 1.0
            timeout_ms: 60000
            required: true
            failure_strategy: "retry"

    timeout_ms: 300000
    max_parallel_analyzers: 10
    final_stage_name: "decision"
    confidence_threshold: 0.7
    created_by: "system"

Analyzer Settings (spot.yaml)

The config/spot.yaml file configures analyzer behavior centrally. Analyzers fetch their configuration from the API Gateway on startup.

Structure

version: "1.0"

platform:
  log_level: INFO    # Global log level
  debug: false       # Enable debug mode

analyzers:
  analyzer-id:
    enabled: true    # Enable/disable analyzer
    settings:        # Analyzer-specific settings (override defaults)
      key: value
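The "override defaults" behaviour amounts to a shallow merge, which can be sketched as below. The default values shown are illustrative, not the platform's actual defaults.

```python
# Illustrative code defaults for an analyzer (assumed values).
DEFAULTS = {"host": "0.0.0.0", "port": 8000, "log_level": "INFO"}

def effective_settings(defaults: dict, overrides: dict) -> dict:
    """Analyzer-specific settings from spot.yaml win over code defaults."""
    return {**defaults, **overrides}

settings = effective_settings(
    DEFAULTS, {"log_level": "DEBUG", "sentiment_threshold": 0.7}
)
print(settings["log_level"])            # DEBUG (overridden)
print(settings["port"])                 # 8000 (default kept)
print(settings["sentiment_threshold"])  # 0.7 (analyzer-specific)
```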

Analyzer Settings Example

analyzers:
  analyzer-nlp:
    enabled: true
    settings:
      host: "0.0.0.0"
      port: 8000
      log_level: INFO
      sentiment_threshold: 0.7      # NLP-specific threshold
      ner_confidence_threshold: 0.8
      phishing_score_threshold: 0.6

  analyzer-llm:
    enabled: true
    settings:
      host: "0.0.0.0"
      port: 8000
      ollama_host: "http://ollama:11434"
      ollama_model: "llama2:7b-chat"
      ollama_timeout: 60
      max_tokens: 500
      temperature: 0.1
      confidence_threshold: 0.6

  analyzer-context:
    enabled: false   # Disabled by default
    settings:
      rule_file: "/app/rules/context_rules.yaml"
      cache_ttl_seconds: 300

Config Reload

Configuration can be reloaded without restart:

# Reload config via API
curl -X POST http://localhost:8001/api/v1/config/reload \
  -H "Authorization: Bearer $TOKEN"

# Response shows what changed
{
  "old_version": "abc123@20251204",
  "new_version": "def456@20251204",
  "changed": {
    "platform": false,
    "workflows": true,
    "analyzers": ["analyzer-nlp"]
  }
}

Reload behavior:

  • Invalid YAML/schema returns 400, previous config preserved
  • Version only bumps if content actually changed
  • Concurrent reloads are serialized (one at a time)

Development Configuration

When APP_ENV=dev, additional variables are available:

Source Code Mounting

| Variable | Description |
|---|---|
| API_GATEWAY_SRC_MOUNT | Path to API Gateway source |
| API_GATEWAY_TEST_MOUNT | Path to API Gateway tests |
| ANALYZER_ORCHESTRATOR_SRC_MOUNT | Path to Analyzer Orchestrator source |
| ANALYZER_ORCHESTRATOR_TEST_MOUNT | Path to Analyzer Orchestrator tests |
| MAIL_ORCHESTRATOR_SRC_MOUNT | Path to Mail Orchestrator source |
| MAIL_ORCHESTRATOR_TEST_MOUNT | Path to Mail Orchestrator tests |
| SHARED_MOUNT | Path to shared modules |
| CONFIG_MOUNT | Path to config directory |
| MOUNT_MODE | Mount mode (rw or ro) |

Debug Ports

| Variable | Default | Description |
|---|---|---|
| ANALYZER_ORCHESTRATOR_DEBUG_PORT | 8091 | Debug port for analyzer orchestrator |
| MAIL_ORCHESTRATOR_DEBUG_PORT | 8092 | Debug port for mail orchestrator |

Development Tools

| Variable | Default | Description |
|---|---|---|
| MAILHOG_SMTP_PORT | 1025 | Mailhog SMTP port |
| MAILHOG_WEB_PORT | 8025 | Mailhog web UI port |
| ADMINER_PORT | 8080 | Adminer database UI port |
| HOST_UID | 1000 | Host user ID for devtools container |
| HOST_GID | 1000 | Host group ID for devtools container |

Production Configuration

When APP_ENV=prod:

Docker Registry

| Variable | Default | Description |
|---|---|---|
| REGISTRY_PORT | 5000 | Docker registry port |
| CI_REGISTRY_IMAGE | (CI only) | Full registry path for platform services (e.g., localhost:5000/spot/platform) |
| CI_REGISTRY | (CI only) | Registry host for external analyzers (e.g., localhost:5000) |
| VERSION | latest | Image version tag |
| BASE_IMAGE | base:latest | Base image name and tag |

Note: CI_REGISTRY_IMAGE and CI_REGISTRY are NOT set for local development. They are only set in .gitlab-ci-local-env for CI context.

CI/CD Configuration

GitLab-specific variables (only needed for CI/CD):

| Variable | Description |
|---|---|
| GITLAB_HOST | GitLab hostname |
| GITLAB_TOKEN | GitLab access token (uses CI_JOB_TOKEN if available) |
| CI_REGISTRY | Container registry URL |
| GITLAB_GROUP | GitLab group name |

Configuration Precedence

Configuration is loaded in this order (highest to lowest priority):

  1. Command-line environment variables
  2. .env file in project root
  3. Code defaults (in Pydantic Settings classes)

Example:

# .env file has: LOG_LEVEL=INFO
# Command-line override:
LOG_LEVEL=DEBUG make service:start    # Uses DEBUG
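This precedence can be expressed as a small resolver. The sketch below assumes the .env file has already been parsed into a dict (Pydantic Settings handles all of this internally; this only illustrates the lookup order).

```python
import os

def resolve(name, dotenv: dict, default=None):
    """Look up a setting: process environment > .env file > code default."""
    if name in os.environ:
        return os.environ[name]
    if name in dotenv:
        return dotenv[name]
    return default

dotenv = {"LOG_LEVEL": "INFO"}                  # as if parsed from .env
os.environ["LOG_LEVEL"] = "DEBUG"               # simulates the CLI override
print(resolve("LOG_LEVEL", dotenv, "WARNING"))  # DEBUG
del os.environ["LOG_LEVEL"]
print(resolve("LOG_LEVEL", dotenv, "WARNING"))  # INFO
```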

Configuration Format

Environment Variable Prefixes

SPOT uses standard environment variable names without a global prefix:

  • Infrastructure: POSTGRES_*, REDIS_*, RABBITMQ_*
  • Application: APP_ENV, LOG_LEVEL, SECRET_KEY
  • Services: SPOT_MAIL_* (analyzers configured in config/spot.yaml)

JSON Configuration

Some variables accept JSON objects:

# Mail Retrievers (JSON object)
SPOT_MAIL_RETRIEVERS='{"imap":{"url":"http://mail-retriever:8000","priority":1}}'

Note: Analyzer configuration has moved from environment variables to config/spot.yaml.

Quick Reference Examples

Minimal Production .env

# Environment
APP_ENV=prod

# Security (REQUIRED - generate secure values)
SECRET_KEY=generate-with-python-secrets-module

# Database
POSTGRES_DB=spot
POSTGRES_USER=spot
POSTGRES_PASSWORD=secure_db_password

# Redis (optional password)
REDIS_PASSWORD=secure_redis_password

# RabbitMQ
RABBITMQ_DEFAULT_USER=spot
RABBITMQ_DEFAULT_PASS=secure_rabbitmq_password

# Trusted hosts (your domains)
TRUSTED_HOSTS=spot.example.com,api.example.com

Minimal Development .env

# Environment
APP_ENV=dev

# Security
SECRET_KEY=dev-secret-key-for-testing-only

# Database (dev defaults)
POSTGRES_DB=spot
POSTGRES_USER=spot
POSTGRES_PASSWORD=spot123

# Redis (no password in dev)
REDIS_PASSWORD=

# RabbitMQ (dev defaults)
RABBITMQ_DEFAULT_USER=guest
RABBITMQ_DEFAULT_PASS=guest

Full Example with Analyzers

.env:

# Environment
APP_ENV=prod

# Security
SECRET_KEY=your-32-character-random-key-here
TRUSTED_HOSTS=spot.example.com

# Infrastructure
POSTGRES_PASSWORD=secure_password
RABBITMQ_DEFAULT_PASS=secure_password

config/spot.yaml:

analyzers:
  analyzer-nlp:
    enabled: true
    url: "http://10.0.1.10:8000"
    settings: {}
  analyzer-llm:
    enabled: true
    url: "http://10.0.1.11:8000"
    settings: {}

Validation

The platform validates configuration at startup:

  • Database URL format
  • RabbitMQ URL format
  • Analyzer URL formats
  • Port ranges (1-65535)

Invalid configuration will cause startup to fail with a descriptive error message.
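The checks above could be approximated as below. This is a hedged sketch, not the platform's validators (which are likely Pydantic-based and stricter); the accepted URL schemes are assumptions.

```python
from urllib.parse import urlparse

def valid_port(port: int) -> bool:
    """Ports must fall in the 1-65535 range."""
    return 1 <= port <= 65535

def valid_url(url: str, schemes=("http", "https", "amqp", "postgresql")) -> bool:
    """A URL must have an accepted scheme and a non-empty host part."""
    parsed = urlparse(url)
    return parsed.scheme in schemes and bool(parsed.netloc)

print(valid_port(8001))                       # True
print(valid_port(70000))                      # False
print(valid_url("http://analyzer-nlp:8000"))  # True
print(valid_url("not-a-url"))                 # False
```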