Data Flow

How emails are processed through the SPOT Platform.

Overview

SPOT uses asynchronous job processing with RabbitMQ for scalability and reliability.

Email Analysis Flow

1. Job Submission

Client -> API Gateway -> RabbitMQ -> Analyzer Orchestrator

Flow:

  1. Client sends POST /api/v1/analyze request
  2. API Gateway validates email format (spot-sdk Email model)
  3. API Gateway generates job ID
  4. API Gateway publishes to spot.analysis exchange with email.analyze routing key
  5. API Gateway returns job_id to client immediately
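The submission steps above can be sketched as a small helper that builds the message the gateway publishes. The exchange and routing key come from the flow; the envelope fields and function name are illustrative assumptions, not the gateway's actual implementation.

```python
import json
import uuid

def build_analysis_job(email: dict) -> tuple[str, str, str, bytes]:
    """Build the message published for a new analysis job.

    Returns (job_id, exchange, routing_key, body). The envelope
    shape here is an assumed sketch, not the real wire format.
    """
    job_id = str(uuid.uuid4())  # step 3: gateway generates the job ID
    body = json.dumps({"job_id": job_id, "email": email}).encode()
    return job_id, "spot.analysis", "email.analyze", body

# Steps 4-5: the gateway would publish (exchange, routing_key, body)
# to RabbitMQ and immediately return job_id to the client.
job_id, exchange, routing_key, body = build_analysis_job(
    {"headers": {"From": "a@example.com"}, "body_text": "hi"}
)
```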

2. Job Processing

Analyzer Orchestrator -> Workflow Engine -> Analyzers -> PostgreSQL

Flow:

  1. Orchestrator consumes from orchestrator.analysis.requests queue
  2. Loads the workflow definition
  3. Executes the parallel analysis stage:
     - Sends POST /internal/analyze to each analyzer
     - Waits for responses (with timeout and retries)
  4. Executes the decision stage:
     - Aggregates analyzer results
     - Calculates the threat level
     - Determines the recommended action
  5. Stores the result in PostgreSQL
  6. Publishes completion to the spot.results exchange
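The fan-out step of the flow above can be sketched with a thread pool standing in for the concurrent HTTP calls. The analyzer callables, timeout value, and function name are illustrative assumptions; only the min_success rule comes from the workflow definition.

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel_stage(analyzers: dict, email: dict, min_success: int) -> dict:
    """Run all analyzers concurrently and enforce the min_success threshold.

    `analyzers` maps analyzer name -> callable standing in for the
    HTTP POST /internal/analyze call (an illustrative stand-in).
    """
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, email) for name, fn in analyzers.items()}
        for name, fut in futures.items():
            try:
                results[name] = fut.result(timeout=30)  # assumed per-analyzer timeout
            except Exception:
                pass  # a failed analyzer is simply absent from the results
    if len(results) < min_success:
        raise RuntimeError("too few analyzers succeeded; job fails")
    return results
```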

3. Status Query

Client -> API Gateway -> RabbitMQ RPC -> Analyzer Orchestrator

Flow:

  1. Client sends GET /api/v1/analyze/{job_id} request
  2. API Gateway publishes RPC request to spot.status exchange
  3. Orchestrator consumes from job.status.requests queue
  4. Orchestrator checks JobManager (in-memory cache)
  5. Falls back to PostgreSQL if not in memory
  6. Response sent via RPC reply queue
  7. API Gateway returns status to client
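Steps 4-5 of the status query (in-memory check, then database fallback) amount to a two-tier lookup. A minimal sketch, in which the dict cache and `db_lookup` callable are stand-ins for the JobManager and a PostgreSQL query:

```python
def get_job_status(job_id: str, cache: dict, db_lookup) -> dict:
    """Resolve a status query: in-memory JobManager first, PostgreSQL second.

    `cache` and `db_lookup` are illustrative stand-ins; the
    "not_found" fallback shape is an assumption.
    """
    status = cache.get(job_id)      # step 4: check the in-memory cache
    if status is None:
        status = db_lookup(job_id)  # step 5: fall back to the database
    return status or {"job_id": job_id, "state": "not_found"}
```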

Sequence Diagram

Client          API Gateway        RabbitMQ         Orchestrator     Analyzers        PostgreSQL
  |                 |                 |                 |                |                |
  |--Submit Email-->|                 |                 |                |                |
  |                 |--Publish------->|                 |                |                |
  |<--Return ID-----|                 |                 |                |                |
  |                 |                 |--Consume------->|                |                |
  |                 |                 |                 |--HTTP POST---->|                |
  |                 |                 |                 |<--Response-----|                |
  |                 |                 |                 |--Store Result----------------->|
  |                 |                 |<--Publish-------|                |                |
  |                 |                 |                 |                |                |
  |--Check Status-->|                 |                 |                |                |
  |                 |--RPC Request--->|                 |                |                |
  |                 |                 |--Consume------->|                |                |
  |                 |                 |<--RPC Reply-----|                |                |
  |<--Return Result-|                 |                 |                |                |

Workflow Execution

Default Workflow

id: default-workflow
name: Default Analysis
stages:
  - id: parallel-analysis
    type: parallel
    analyzers:
      - nlp-analyzer
      - llm-analyzer
      - misp-analyzer
    min_success: 2

  - id: decision
    type: decision
    aggregator: weighted-average

Execution:

  1. Parallel Analysis Stage
     - Runs all analyzers simultaneously via HTTP
     - Waits for at least min_success analyzers to succeed
     - Continues if the threshold is met
     - Fails the job if too few succeed
  2. Decision Stage
     - Aggregates results using a weighted average
     - Determines the overall threat level
     - Recommends an action based on confidence
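One plausible reading of the weighted-average decision stage: map each analyzer's threat level onto a 0-1 scale, average with per-analyzer weights, and map back. The weight values, the 0.5 action threshold, and the action names are assumptions for illustration; only "weighted average over threat levels" comes from the workflow.

```python
THREAT_LEVELS = ["none", "low", "medium", "high", "critical"]

def decide(results: dict, weights: dict) -> dict:
    """Aggregate analyzer results with a weighted average (sketch).

    `results` maps analyzer name -> response dict with a "threat_level"
    key; `weights` maps analyzer name -> weight. Thresholds are assumed.
    """
    total = sum(weights[name] for name in results)
    score = sum(
        weights[name] * THREAT_LEVELS.index(r["threat_level"]) / (len(THREAT_LEVELS) - 1)
        for name, r in results.items()
    ) / total
    level = THREAT_LEVELS[round(score * (len(THREAT_LEVELS) - 1))]
    action = "quarantine" if score >= 0.5 else "deliver"
    return {"threat_level": level, "recommended_action": action, "score": score}
```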

Analyzer Communication

Each analyzer receives:

{
  "email": {
    "headers": {...},
    "body_text": "...",
    "body_html": "..."
  }
}

Each analyzer returns:

{
  "is_phishing": boolean,
  "threat_level": "none|low|medium|high|critical",
  "confidence": 0.0-1.0,
  "indicators": [
    {
      "type": "domain_spoofing",
      "value": "microsft.com vs microsoft.com",
      "confidence": 0.95
    }
  ]
}
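The orchestrator presumably validates responses against this contract before aggregating. A sketch of such a check; the specific rules are an illustrative reading of the contract above, not a normative schema.

```python
ALLOWED_LEVELS = {"none", "low", "medium", "high", "critical"}

def validate_analyzer_response(resp: dict) -> dict:
    """Check an analyzer response against the documented contract.

    Raises ValueError on violation. Field names come from the
    contract; the validation logic itself is an assumption.
    """
    if not isinstance(resp.get("is_phishing"), bool):
        raise ValueError("is_phishing must be a boolean")
    if resp.get("threat_level") not in ALLOWED_LEVELS:
        raise ValueError("unknown threat_level")
    if not 0.0 <= resp.get("confidence", -1) <= 1.0:
        raise ValueError("confidence must be in [0.0, 1.0]")
    for ind in resp.get("indicators", []):
        if "type" not in ind or not 0.0 <= ind.get("confidence", -1) <= 1.0:
            raise ValueError("malformed indicator")
    return resp
```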

Error Handling

Analyzer Failures

If an analyzer fails:

  1. Orchestrator logs error
  2. Retries with exponential backoff (max 3 attempts)
  3. Continues with remaining analyzers
  4. Checks if min_success requirement met
  5. Fails job if threshold not met
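The retry behavior above (exponential backoff, max 3 attempts) can be sketched as a small wrapper around the analyzer call; the base delay value and function name are assumptions.

```python
import time

def call_with_retries(fn, max_attempts: int = 3, base_delay: float = 0.5):
    """Retry a failing analyzer call with exponential backoff.

    Gives up after max_attempts (3, per the policy above), re-raising
    so the orchestrator can continue with the remaining analyzers.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # final attempt failed; propagate to the caller
            time.sleep(base_delay * 2 ** attempt)  # e.g. 0.5s, then 1s
```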

RabbitMQ Failures

If RabbitMQ is unavailable:

  1. API Gateway returns 503 Service Unavailable
  2. Client should retry with exponential backoff

Database Failures

If PostgreSQL is unavailable:

  1. New jobs still accepted (queued in RabbitMQ)
  2. Status queries fail for completed jobs
  3. Jobs remain in queue until database recovers

Monitoring Points

Queue Metrics

  • orchestrator.analysis.requests queue depth
  • Consumer lag time
  • Message processing rate

Processing Metrics

  • Job completion time
  • Analyzer success rate
  • Analyzer response time

Database Metrics

  • Query execution time
  • Connection pool usage

Data Retention

Analysis results:

  • Kept for 90 days by default
  • Configurable via RETENTION_DAYS

Queue messages:

  • TTL: 24 hours
  • Dead letter queue for failed messages
  • Retry limit: 3 attempts
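In standard RabbitMQ terms, the message TTL and dead-lettering above map onto queue declaration arguments. A sketch of those arguments; the dead-letter exchange name spot.dlx is an assumption, not taken from the platform's configuration.

```python
# Queue arguments implementing the retention policy above, expressed as
# standard RabbitMQ x-arguments (dead-letter exchange name is assumed):
ANALYSIS_QUEUE_ARGS = {
    "x-message-ttl": 24 * 60 * 60 * 1000,  # 24-hour TTL, in milliseconds
    "x-dead-letter-exchange": "spot.dlx",  # where expired/rejected messages go
}
```

A client library such as pika would pass this dict as the `arguments` parameter when declaring the queue.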

Security Considerations

Data in Transit

  • API: HTTPS with TLS 1.2+
  • RabbitMQ: Internal network only
  • PostgreSQL: Internal network only

Data at Rest

  • Database: Encrypted volumes recommended
  • Queue: Persistent messages on disk

Data Privacy

  • Email content not logged
  • PII redacted from logs
  • Analysis results include only metadata