Configuration Reference¶
SPOT Platform is configured through environment variables. All settings can be set in a `.env` file or directly in the environment.
Quick Start¶
Copy `.env.example` to `.env` and customize it for your deployment. See `.env.example` for comprehensive documentation of all available variables.
Environment Control¶
See Environment Configuration for details on environment management.
Required Variables¶
These variables MUST be set in production:
| Variable | Description | Example |
|---|---|---|
| `APP_ENV` | Environment mode | `prod`, `dev`, or `test` |
| `SECRET_KEY` | JWT token signing key | Generate with `python -c "import secrets; print(secrets.token_urlsafe(32))"` |
| `POSTGRES_DB` | PostgreSQL database name | `spot` |
| `POSTGRES_USER` | PostgreSQL username | `spot` |
| `POSTGRES_PASSWORD` | PostgreSQL password | `secure_password` |
| `RABBITMQ_DEFAULT_USER` | RabbitMQ username | `guest` |
| `RABBITMQ_DEFAULT_PASS` | RabbitMQ password | `secure_password` |
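The `SECRET_KEY` command from the table can also be run as a short script; `token_urlsafe(32)` draws 32 bytes of entropy and encodes them as a URL-safe string:

```python
import secrets

# 32 bytes of entropy, base64url-encoded (43 characters, no padding)
key = secrets.token_urlsafe(32)
print(key)
```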
Core Configuration¶
Docker Compose¶
| Variable | Default | Description |
|---|---|---|
| `COMPOSE_PROJECT_NAME` | `spot` | Project name for consistent container/network naming |
Application Settings¶
| Variable | Default | Description |
|---|---|---|
| `LOG_LEVEL` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR` |
| `DEBUG` | `false` | Enable debug mode |
| `TZ` | `UTC` | Timezone |
Security¶
| Variable | Default | Description |
|---|---|---|
| `SECRET_KEY` | (required) | Secret key for JWT token signing; MUST be set |
| `INTERNAL_API_KEY` | (none) | Internal API key for service-to-service auth (optional) |
| `TRUSTED_HOSTS` | `localhost,127.0.0.1` | Comma-separated list of trusted hosts |
Database (PostgreSQL)¶
| Variable | Default | Description |
|---|---|---|
| `POSTGRES_DB` | `spot` | Database name |
| `POSTGRES_USER` | `spot` | Database username |
| `POSTGRES_PASSWORD` | `spot123` | Database password |
| `POSTGRES_PORT` | `5432` | Database port |
| `DATABASE_URL` | (auto-constructed) | Full connection URL |
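If you need to construct `DATABASE_URL` by hand, it typically follows the standard PostgreSQL DSN form; the host name `postgres` below is an assumption based on typical Compose service naming, not a documented value:

```python
# Hypothetical illustration of how the pieces combine into a DSN
user, password, host, port, db = "spot", "spot123", "postgres", 5432, "spot"
database_url = f"postgresql://{user}:{password}@{host}:{port}/{db}"
print(database_url)  # postgresql://spot:spot123@postgres:5432/spot
```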
Redis Cache¶
| Variable | Default | Description |
|---|---|---|
| `REDIS_PASSWORD` | (empty) | Redis password (empty = no auth) |
| `REDIS_PORT` | `6379` | Redis port |
| `REDIS_URL` | (auto-constructed) | Full connection URL |
| `REDIS_MAXMEMORY` | `256mb` | Maximum memory |
| `REDIS_MAXMEMORY_POLICY` | `allkeys-lru` | Eviction policy |
RabbitMQ Message Queue¶
| Variable | Default | Description |
|---|---|---|
| `RABBITMQ_DEFAULT_USER` | `guest` | RabbitMQ username |
| `RABBITMQ_DEFAULT_PASS` | `guest` | RabbitMQ password |
| `RABBITMQ_PORT` | `5672` | AMQP port |
| `RABBITMQ_MGMT_PORT` | `15672` | Management UI port |
| `RABBITMQ_URL` | (auto-constructed) | Full connection URL |
Service Configuration¶
API Gateway¶
| Variable | Default | Description |
|---|---|---|
| `API_GATEWAY_PORT` | `8001` | External API port |
Plugin Configuration¶
"Plugin" is the umbrella term for anything pluggable into SPOT. Two kinds exist today:

- Analyzers expose `POST /internal/analyze` and produce a phishing verdict (`AnalysisResult`).
- Context providers expose `POST /internal/enrich` and enrich emails with organisational data (`EnrichmentResult`) before analyzers run.
Both are configured under the `plugins:` section of `config/spot.yaml`:

```yaml
plugins:
  analyzers:
    analyzer-nlp:
      enabled: true
      url: "http://analyzer-nlp:8000"
      settings: {}
    analyzer-llm:
      enabled: true
      url: "http://analyzer-llm:8000"
      settings: {}
  context_providers:
    employee-dir:
      enabled: true
      url: "http://provider-employee-dir:8000"
      settings:
        LDAP_HOST: ldap.example.com
```
Each entry needs at minimum `url` and `enabled`. Installed plugins also carry `image`, `version`, `container_id`, `container_name`, and `installed_at` (set automatically by the installer).

Context providers expose `POST /internal/enrich` and return an `EnrichmentResult`. See the Context Providers guide for the full contract and implementation examples.

Context providers are referenced from workflow stages via a `context_providers` list:
```yaml
stages:
  - name: enrichment
    type: parallel
    context_providers:
      - id: employee-dir
        timeout_ms: 5000
        required: true
    analyzers: [...]
```
Custom Plugin Configuration¶
Plugin behaviour (model paths, API tokens, feature flags, ...) is managed differently from platform orchestration config:
| Aspect | Platform Configuration | Plugin Configuration |
|---|---|---|
| Purpose | Register and connect to plugins | Configure plugin behaviour |
| Location | `config/spot.yaml` | Plugin repository `.env` |
| Format | `plugins.{analyzers,context_providers}:` section | Plugin-specific settings |
| Scope | Platform-wide orchestration | Single plugin instance |
Platform side (config/spot.yaml):
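The registration entry mirrors the `plugins:` schema shown earlier; the analyzer id and URL below are placeholders:

```yaml
plugins:
  analyzers:
    my-analyzer:
      enabled: true
      url: "http://my-analyzer:8000"
      settings: {}
```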
Analyzer side (your analyzer/.env):
```bash
# Analyzer-specific configuration
MODEL_PATH=/models/my-model.bin
CONFIDENCE_THRESHOLD=0.75
MAX_EMAIL_SIZE=10MB
```
Mail Retriever Configuration¶
| Variable | Default | Description |
|---|---|---|
| `SPOT_MAIL_RETRIEVERS` | `{}` | JSON object with retriever configs |
Example SPOT_MAIL_RETRIEVERS:
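For example, a single IMAP retriever (id and URL illustrative; the same form appears under JSON Configuration below):

```bash
SPOT_MAIL_RETRIEVERS='{"imap":{"url":"http://mail-retriever:8000","priority":1}}'
```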
Workflow and Analyzer Configuration Files¶
| Variable | Default | Description |
|---|---|---|
| `SPOT_CONFIG_FILE` | `config/spot.yaml` | Path to analyzer configuration (falls back to `config/defaults/spot.yaml` if not found) |
| `SPOT_WORKFLOWS_FILE` | `config/workflows.yaml` | Path to workflow configuration (falls back to `config/defaults/workflows.yaml` if not found) |
See Workflow YAML Schema and Analyzer Settings below for detailed schema documentation.
Workflow YAML Schema¶
Workflows define how analyzers are orchestrated to detect spear-phishing emails. Configuration is in config/workflows.yaml.
Workflow Structure¶
```yaml
workflows:
  - id: "workflow-id"              # Required: Unique identifier
    name: "Human Readable Name"    # Required: Display name
    version: 1                     # Schema version (integer)
    description: "Description"     # Optional description
    stages: [...]                  # Required: List of stages
    timeout_ms: 300000             # Total workflow timeout (default: 5 min)
    max_parallel_analyzers: 10     # Max concurrent analyzers
    final_stage_name: "stage"      # Stage that produces final result
    confidence_threshold: 0.7      # Min confidence for detection (0.0-1.0)
    created_by: "system"           # Creator identifier
```
Stage Configuration¶
Each stage groups analyzers that run together:
```yaml
stages:
  - name: "stage-name"             # Required: Unique within workflow
    type: "parallel"               # parallel | sequential | conditional
    depends_on: []                 # List of stage names this depends on
    continue_on_failure: true      # Continue if some analyzers fail
    min_successful_analyzers: 2    # Minimum analyzers that must succeed
    aggregation_method: "weighted_average"  # How to combine results
    analyzers: [...]               # List of analyzer configs
    condition: null                # Optional: Expression for conditional stages
```
Stage Types:

- `parallel` - Run all analyzers concurrently
- `sequential` - Run analyzers one after another
- `conditional` - Run based on condition expression
Aggregation Methods:

- `weighted_average` - Combine scores using analyzer weights
- `max_confidence` - Take highest confidence score
- `majority_vote` - Use most common classification
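A simplified sketch of the three methods (illustrative only, not the orchestrator's actual implementation; each entry in `results` assumes the fields shown):

```python
from collections import Counter

def weighted_average(results):
    """Combine confidence scores using per-analyzer weights."""
    total_weight = sum(r["weight"] for r in results)
    return sum(r["confidence"] * r["weight"] for r in results) / total_weight

def max_confidence(results):
    """Take the highest confidence score."""
    return max(r["confidence"] for r in results)

def majority_vote(results):
    """Use the most common classification."""
    votes = Counter(r["is_phishing"] for r in results)
    return votes.most_common(1)[0][0]

results = [
    {"confidence": 0.9, "weight": 0.5, "is_phishing": True},
    {"confidence": 0.4, "weight": 0.5, "is_phishing": False},
    {"confidence": 0.7, "weight": 1.0, "is_phishing": True},
]
print(weighted_average(results))  # (0.45 + 0.2 + 0.7) / 2.0 = 0.675
print(max_confidence(results))    # 0.9
print(majority_vote(results))     # True
```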
Analyzer Configuration (in Workflow)¶
Each analyzer within a stage:
```yaml
analyzers:
  - id: "analyzer-nlp"          # Required: Analyzer identifier
    weight: 0.5                 # Score weight (0.0-1.0)
    timeout_ms: 30000           # Per-analyzer timeout
    required: false             # If true, stage fails if analyzer fails
    failure_strategy: "skip"    # skip | retry | fail
    retry_config:               # Optional retry settings
      max_attempts: 3
      backoff_ms: 1000
      max_backoff_ms: 10000
      exponential_backoff: true
    condition: null             # Optional: When to run this analyzer
```
Failure Strategies:

- `skip` - Continue workflow without this analyzer's result
- `retry` - Retry according to `retry_config`, then skip/fail
- `fail` - Immediately fail the entire stage
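The `retry_config` fields above translate into a delay schedule; a sketch under assumed semantics (doubling per retry, capped at `max_backoff_ms`), not the orchestrator's exact code:

```python
def backoff_delays(max_attempts, backoff_ms, max_backoff_ms, exponential_backoff):
    """Delay (ms) before each retry; attempt 1 is the initial call."""
    delays = []
    for retry in range(max_attempts - 1):
        delay = backoff_ms * (2 ** retry) if exponential_backoff else backoff_ms
        delays.append(min(delay, max_backoff_ms))
    return delays

print(backoff_delays(3, 1000, 10000, True))  # [1000, 2000]
print(backoff_delays(4, 1000, 3000, True))   # [1000, 2000, 3000]
```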
Accessing Previous Stage Results (analysis_context)¶
Every analyzer automatically receives the results of all previously completed stages via `Email.analysis_context`. No configuration is required -- the orchestrator builds the context before calling each analyzer.
Structure:
```
email.analysis_context = {
    "<stage-name>": {
        "providers": {
            "<provider-id>": { ...free-form data... }
        },
        "analyzers": {
            "<analyzer-id>": { ...AnalyzerResult fields... }
        }
    },
    ...
}
```
- Top-level keys are stage names
- Each stage contains a `providers` dict (empty until Context Providers land) and an `analyzers` dict
- Analyzer results expose all `AnalyzerResult` fields: `is_phishing`, `confidence`, `threat_level`, `indicators`, `analyzer_details`, etc.
- Only stages that have already completed at the time the analyzer runs are included
Example access in an analyzer:
```python
@app.post("/internal/analyze")
async def analyze_email(email: Email) -> AnalysisResult:
    ctx = email.analysis_context
    if "parallel-analysis" in ctx:
        nlp = ctx["parallel-analysis"]["analyzers"].get("analyzer-nlp")
        ml = ctx["parallel-analysis"]["analyzers"].get("analyzer-ml")
        if nlp and ml:
            combined_confidence = (nlp["confidence"] + ml["confidence"]) / 2
            # Use combined results to inform this analyzer's decision
    ...
```
Analyzers that don't need previous results can ignore `analysis_context` entirely -- it defaults to an empty dict.
Complete Workflow Example¶
```yaml
workflows:
  - id: "default-workflow"
    name: "Default Phishing Detection Workflow"
    version: 1
    description: "Parallel NLP + LLM analysis followed by decision"
    stages:
      - name: "parallel-analysis"
        type: "parallel"
        depends_on: []
        continue_on_failure: true
        min_successful_analyzers: 2
        aggregation_method: "weighted_average"
        analyzers:
          - id: "analyzer-nlp"
            weight: 0.5
            timeout_ms: 30000
            required: false
            failure_strategy: "skip"
            retry_config:
              max_attempts: 2
              backoff_ms: 1000
              max_backoff_ms: 5000
              exponential_backoff: false
          - id: "analyzer-llm"
            weight: 0.5
            timeout_ms: 45000
            required: false
            failure_strategy: "skip"
      - name: "decision"
        type: "sequential"
        depends_on: ["parallel-analysis"]
        continue_on_failure: false
        analyzers:
          - id: "analyzer-llm"
            weight: 1.0
            timeout_ms: 60000
            required: true
            failure_strategy: "retry"
    timeout_ms: 300000
    max_parallel_analyzers: 10
    final_stage_name: "decision"
    confidence_threshold: 0.7
    created_by: "system"
```
Analyzer Settings (spot.yaml)¶
The config/spot.yaml file configures analyzer behavior centrally. Analyzers fetch their configuration from the API Gateway on startup.
Structure¶
```yaml
version: "1.0"
platform:
  log_level: INFO        # Global log level
  debug: false           # Enable debug mode
analyzers:
  analyzer-id:
    enabled: true        # Enable/disable analyzer
    settings:            # Analyzer-specific settings (override defaults)
      key: value
```
Analyzer Settings Example¶
```yaml
analyzers:
  analyzer-nlp:
    enabled: true
    settings:
      host: "0.0.0.0"
      port: 8000
      log_level: INFO
      sentiment_threshold: 0.7      # NLP-specific threshold
      ner_confidence_threshold: 0.8
      phishing_score_threshold: 0.6
  analyzer-llm:
    enabled: true
    settings:
      host: "0.0.0.0"
      port: 8000
      ollama_host: "http://ollama:11434"
      ollama_model: "llama2:7b-chat"
      ollama_timeout: 60
      max_tokens: 500
      temperature: 0.1
      confidence_threshold: 0.6
  analyzer-context:
    enabled: false                  # Disabled by default
    settings:
      rule_file: "/app/rules/context_rules.yaml"
      cache_ttl_seconds: 300
```
Config Reload¶
Configuration can be reloaded without restart:
```bash
# Reload config via API
curl -X POST http://localhost:8001/api/v1/config/reload \
  -H "Authorization: Bearer $TOKEN"

# Response shows what changed
{
  "old_version": "abc123@20251204",
  "new_version": "def456@20251204",
  "changed": {
    "platform": false,
    "workflows": true,
    "analyzers": ["analyzer-nlp"]
  }
}
```
Reload behavior:

- Invalid YAML/schema returns 400; the previous config is preserved
- The version only bumps if content actually changed
- Concurrent reloads are serialized (one at a time)
Development Configuration¶
When `APP_ENV=dev`, additional variables are available:
Source Code Mounting¶
| Variable | Description |
|---|---|
| `API_GATEWAY_SRC_MOUNT` | Path to API Gateway source |
| `API_GATEWAY_TEST_MOUNT` | Path to API Gateway tests |
| `ANALYZER_ORCHESTRATOR_SRC_MOUNT` | Path to Analyzer Orchestrator source |
| `ANALYZER_ORCHESTRATOR_TEST_MOUNT` | Path to Analyzer Orchestrator tests |
| `MAIL_ORCHESTRATOR_SRC_MOUNT` | Path to Mail Orchestrator source |
| `MAIL_ORCHESTRATOR_TEST_MOUNT` | Path to Mail Orchestrator tests |
| `SHARED_MOUNT` | Path to shared modules |
| `CONFIG_MOUNT` | Path to config directory |
| `MOUNT_MODE` | Mount mode (`rw` or `ro`) |
Debug Ports¶
| Variable | Default | Description |
|---|---|---|
| `ANALYZER_ORCHESTRATOR_DEBUG_PORT` | `8091` | Debug port for analyzer orchestrator |
| `MAIL_ORCHESTRATOR_DEBUG_PORT` | `8092` | Debug port for mail orchestrator |
Development Tools¶
| Variable | Default | Description |
|---|---|---|
| `MAILHOG_SMTP_PORT` | `1025` | Mailhog SMTP port |
| `MAILHOG_WEB_PORT` | `8025` | Mailhog web UI port |
| `ADMINER_PORT` | `8080` | Adminer database UI port |
| `HOST_UID` | `1000` | Host user ID for devtools container |
| `HOST_GID` | `1000` | Host group ID for devtools container |
Production Configuration¶
When `APP_ENV=prod`:
Docker Registry¶
| Variable | Default | Description |
|---|---|---|
| `REGISTRY_PORT` | `5000` | Docker registry port |
| `CI_REGISTRY_IMAGE` | (CI only) | Full registry path for platform services (e.g., `localhost:5000/spot/platform`) |
| `CI_REGISTRY` | (CI only) | Registry host for external analyzers (e.g., `localhost:5000`) |
| `VERSION` | `latest` | Image version tag |
| `BASE_IMAGE` | `base:latest` | Base image name and tag |

Note: `CI_REGISTRY_IMAGE` and `CI_REGISTRY` are NOT set for local development. They are only set in `.gitlab-ci-local-env` for CI context.
CI/CD Configuration¶
GitLab-specific variables (only needed for CI/CD):
| Variable | Description |
|---|---|
| `GITLAB_HOST` | GitLab hostname |
| `GITLAB_TOKEN` | GitLab access token (uses `CI_JOB_TOKEN` if available) |
| `CI_REGISTRY` | Container registry URL |
| `GITLAB_GROUP` | GitLab group name |
Configuration Precedence¶
Configuration is loaded in this order (highest to lowest priority):
1. Command-line environment variables
2. `.env` file in project root
3. Code defaults (in Pydantic Settings classes)
Example:
```bash
# .env file has: LOG_LEVEL=INFO
# Command-line override:
LOG_LEVEL=DEBUG make service:start  # Uses DEBUG
```
Configuration Format¶
Environment Variable Prefixes¶
SPOT uses standard environment variable names without a global prefix:

- Infrastructure: `POSTGRES_*`, `REDIS_*`, `RABBITMQ_*`
- Application: `APP_ENV`, `LOG_LEVEL`, `SECRET_KEY`
- Services: `SPOT_MAIL_*` (analyzers are configured in `config/spot.yaml`)
JSON Configuration¶
Some variables accept JSON objects:
```bash
# Mail Retrievers (JSON object)
SPOT_MAIL_RETRIEVERS='{"imap":{"url":"http://mail-retriever:8000","priority":1}}'
```
Note: Analyzer configuration has moved from environment variables to `config/spot.yaml`.
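A service consuming such a JSON-valued variable can parse it with the standard `json` module; a minimal sketch:

```python
import json
import os

# Value as it would appear in the environment
os.environ["SPOT_MAIL_RETRIEVERS"] = (
    '{"imap":{"url":"http://mail-retriever:8000","priority":1}}'
)

retrievers = json.loads(os.environ.get("SPOT_MAIL_RETRIEVERS", "{}"))
print(retrievers["imap"]["url"])  # http://mail-retriever:8000
```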
Quick Reference Examples¶
Minimal Production .env¶
```bash
# Environment
APP_ENV=prod

# Security (REQUIRED - generate secure values)
SECRET_KEY=generate-with-python-secrets-module

# Database
POSTGRES_DB=spot
POSTGRES_USER=spot
POSTGRES_PASSWORD=secure_db_password

# Redis (optional password)
REDIS_PASSWORD=secure_redis_password

# RabbitMQ
RABBITMQ_DEFAULT_USER=spot
RABBITMQ_DEFAULT_PASS=secure_rabbitmq_password

# Trusted hosts (your domains)
TRUSTED_HOSTS=spot.example.com,api.example.com
```
Minimal Development .env¶
```bash
# Environment
APP_ENV=dev

# Security
SECRET_KEY=dev-secret-key-for-testing-only

# Database (dev defaults)
POSTGRES_DB=spot
POSTGRES_USER=spot
POSTGRES_PASSWORD=spot123

# Redis (no password in dev)
REDIS_PASSWORD=

# RabbitMQ (dev defaults)
RABBITMQ_DEFAULT_USER=guest
RABBITMQ_DEFAULT_PASS=guest
```
Full Example with Analyzers¶
.env:
```bash
# Environment
APP_ENV=prod

# Security
SECRET_KEY=your-32-character-random-key-here
TRUSTED_HOSTS=spot.example.com

# Infrastructure
POSTGRES_PASSWORD=secure_password
RABBITMQ_DEFAULT_PASS=secure_password
```
config/spot.yaml:
```yaml
analyzers:
  analyzer-nlp:
    enabled: true
    url: "http://10.0.1.10:8000"
    settings: {}
  analyzer-llm:
    enabled: true
    url: "http://10.0.1.11:8000"
    settings: {}
```
Validation¶
The platform validates configuration at startup:
- Database URL format
- RabbitMQ URL format
- Analyzer URL formats
- Port ranges (1-65535)
Invalid configuration will cause startup to fail with a descriptive error message.
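Illustrative-only sketches of such checks (the platform's actual validators may differ):

```python
from urllib.parse import urlparse

def validate_port(port: int) -> None:
    """Reject ports outside the valid TCP range 1-65535."""
    if not 1 <= port <= 65535:
        raise ValueError(f"Port out of range: {port}")

def validate_url(url: str) -> None:
    """Require an http(s) scheme and a host component."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid URL: {url}")

validate_port(5432)                       # ok
validate_url("http://analyzer-nlp:8000")  # ok
try:
    validate_port(70000)
except ValueError as e:
    print(e)  # Port out of range: 70000
```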
Related Documentation¶
- Environment Management - Environment switching guide
- Admin Guide - Deployment configuration
- Developer Guide - Development setup