Data Flow¶
How emails are processed through the SPOT Platform.
Overview¶
SPOT uses asynchronous job processing with RabbitMQ for scalability and reliability.
Email Analysis Flow¶
1. Job Submission¶
Flow:
- Client sends `POST /api/v1/analyze` request
- API Gateway validates email format (spot-sdk Email model)
- API Gateway generates job ID
- API Gateway publishes to `spot.analysis` exchange with `email.analyze` routing key
- API Gateway returns `job_id` to client immediately
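The submission path above can be sketched in Python. This is a minimal illustration, not the platform's actual code: the message field names and the validation check are assumptions, and `publish` stands in for the real RabbitMQ client call (e.g. pika's `channel.basic_publish`).

```python
import json
import uuid

EXCHANGE = "spot.analysis"
ROUTING_KEY = "email.analyze"

def submit_analysis_job(email: dict, publish) -> str:
    """Validate the request, assign a job ID, and publish it for processing.

    `publish(exchange, routing_key, body)` is a placeholder for the real
    RabbitMQ publish call.
    """
    # Stand-in for spot-sdk Email model validation (hypothetical fields)
    if not email.get("subject") or not email.get("body"):
        raise ValueError("invalid email payload")

    job_id = str(uuid.uuid4())  # generated by the API Gateway
    message = json.dumps({"job_id": job_id, "email": email})
    publish(EXCHANGE, ROUTING_KEY, message)
    return job_id  # returned to the client immediately

# Usage: capture the published message instead of talking to RabbitMQ
published = []
job_id = submit_analysis_job(
    {"subject": "Hello", "body": "Click here"},
    publish=lambda ex, rk, body: published.append((ex, rk, body)),
)
```

Because the gateway returns as soon as the message is published, the client gets its `job_id` without waiting for any analyzer to run.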
2. Job Processing¶
Flow:
- Orchestrator consumes from `orchestrator.analysis.requests` queue
- Loads workflow definition
- Executes parallel analysis stage:
  - Sends `POST /internal/analyze` to each analyzer
  - Waits for responses (with timeout and retries)
- Executes decision stage:
- Aggregates analyzer results
- Calculates threat level
- Determines recommended action
- Stores result in PostgreSQL
- Publishes completion to `spot.results` exchange
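The parallel analysis stage can be sketched with `concurrent.futures`. The analyzer callables below are hypothetical stand-ins for the `POST /internal/analyze` HTTP calls, and the timeout value is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel_stage(analyzers, email, min_success=2):
    """Run all analyzers concurrently; fail the job if too few succeed."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(analyzers)) as pool:
        futures = {name: pool.submit(fn, email) for name, fn in analyzers.items()}
        for name, fut in futures.items():
            try:
                results[name] = fut.result(timeout=30)  # per-analyzer timeout
            except Exception:
                pass  # the real orchestrator logs and retries here
    if len(results) < min_success:
        raise RuntimeError("job failed: min_success not met")
    return results

# Hypothetical analyzers standing in for HTTP calls
analyzers = {
    "nlp-analyzer": lambda e: {"is_phishing": True, "confidence": 0.9},
    "llm-analyzer": lambda e: {"is_phishing": True, "confidence": 0.8},
}
results = run_parallel_stage(analyzers, {"subject": "hi"}, min_success=2)
```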
3. Status Query¶
Flow:
- Client sends `GET /api/v1/analyze/{job_id}` request
- API Gateway publishes RPC request to `spot.status` exchange
- Orchestrator consumes from `job.status.requests` queue
- Orchestrator checks JobManager (in-memory cache)
- Falls back to PostgreSQL if not in memory
- Response sent via RPC reply queue
- API Gateway returns status to client
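The cache-then-database lookup on the orchestrator side reduces to a simple fallback chain. A sketch, with plain dicts standing in for the JobManager cache and a PostgreSQL query:

```python
def get_job_status(job_id, cache, db):
    """Resolve job status from the in-memory JobManager, falling back to PostgreSQL."""
    status = cache.get(job_id)   # JobManager in-memory cache (recent jobs)
    if status is None:
        status = db.get(job_id)  # stand-in for a SELECT on the results table
    return status or {"state": "not_found"}

cache = {"job-1": {"state": "running"}}
db = {"job-2": {"state": "completed"}}
running = get_job_status("job-1", cache, db)
completed = get_job_status("job-2", cache, db)
```

In-flight jobs resolve from memory; completed jobs that have aged out of the cache still resolve from the database.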
Sequence Diagram¶
```
Client            API Gateway       RabbitMQ          Orchestrator      Analyzers        PostgreSQL
  |                    |                |                  |                 |                |
  |--Submit Email----->|                |                  |                 |                |
  |                    |--Publish------>|                  |                 |                |
  |<--Return ID--------|                |                  |                 |                |
  |                    |                |--Consume-------->|                 |                |
  |                    |                |                  |--HTTP POST----->|                |
  |                    |                |                  |<--Response------|                |
  |                    |                |                  |--Store Result------------------>|
  |                    |                |<--Publish--------|                 |                |
  |                    |                |                  |                 |                |
  |--Check Status----->|                |                  |                 |                |
  |                    |--RPC Request-->|                  |                 |                |
  |                    |                |--Consume-------->|                 |                |
  |                    |                |<--RPC Reply------|                 |                |
  |<--Return Result----|                |                  |                 |                |
```
Workflow Execution¶
Default Workflow¶
```yaml
id: default-workflow
name: Default Analysis
stages:
  - id: parallel-analysis
    type: parallel
    analyzers:
      - nlp-analyzer
      - llm-analyzer
      - misp-analyzer
    min_success: 2
  - id: decision
    type: decision
    aggregator: weighted-average
```
Execution:
- Parallel Analysis Stage
  - Runs all analyzers simultaneously via HTTP
  - Waits for at least `min_success` to succeed
  - Continues if threshold met
  - Fails job if too few succeed
- Decision Stage
  - Aggregates results using weighted average
  - Determines overall threat level
  - Recommends action based on confidence
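The decision stage's weighted-average aggregation can be sketched as below. The weights, the score thresholds, and the action names are illustrative assumptions, not SPOT's actual configuration:

```python
def decide(results, weights):
    """Aggregate analyzer confidences into a (threat_level, action) pair.

    Confidence from analyzers that flagged phishing is weight-averaged
    over the total weight of all analyzers that responded.
    """
    total = sum(weights[name] for name in results)
    score = sum(r["confidence"] * weights[name]
                for name, r in results.items() if r["is_phishing"]) / total
    if score >= 0.8:
        return "critical", "quarantine"
    if score >= 0.5:
        return "high", "flag"
    if score >= 0.2:
        return "medium", "warn"
    return ("low", "allow") if score > 0 else ("none", "allow")

results = {
    "nlp-analyzer": {"is_phishing": True, "confidence": 0.9},
    "llm-analyzer": {"is_phishing": True, "confidence": 0.7},
    "misp-analyzer": {"is_phishing": False, "confidence": 0.6},
}
weights = {"nlp-analyzer": 1.0, "llm-analyzer": 2.0, "misp-analyzer": 1.0}
level, action = decide(results, weights)  # score = (0.9 + 1.4) / 4.0 = 0.575
```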
Analyzer Communication¶
Each analyzer receives the analysis request via `POST /internal/analyze` and returns a result of the form:
```json
{
  "is_phishing": boolean,
  "threat_level": "none|low|medium|high|critical",
  "confidence": 0.0-1.0,
  "indicators": [
    {
      "type": "domain_spoofing",
      "value": "microsft.com vs microsoft.com",
      "confidence": 0.95
    }
  ]
}
```
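A response of that shape maps naturally onto a small validating dataclass. A sketch of how a consumer might enforce the field constraints (this is not spot-sdk's actual model):

```python
from dataclasses import dataclass, field

THREAT_LEVELS = ("none", "low", "medium", "high", "critical")

@dataclass
class AnalyzerResult:
    is_phishing: bool
    threat_level: str
    confidence: float
    indicators: list = field(default_factory=list)

    def __post_init__(self):
        if self.threat_level not in THREAT_LEVELS:
            raise ValueError(f"unknown threat_level: {self.threat_level}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0.0, 1.0]")

result = AnalyzerResult(
    is_phishing=True,
    threat_level="high",
    confidence=0.95,
    indicators=[{"type": "domain_spoofing",
                 "value": "microsft.com vs microsoft.com",
                 "confidence": 0.95}],
)
```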
Error Handling¶
Analyzer Failures¶
If an analyzer fails:
- Orchestrator logs error
- Retries with exponential backoff (max 3 attempts)
- Continues with remaining analyzers
- Checks if `min_success` requirement met
- Fails job if threshold not met
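The retry behavior described above is a standard exponential-backoff loop. A sketch, with the base delay chosen arbitrarily for illustration:

```python
import time

def call_with_retry(fn, max_attempts=3, base_delay=0.1):
    """Retry a failing analyzer call with exponential backoff (max 3 attempts)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted; caller then checks min_success across analyzers
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, ...

# Simulated analyzer that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("analyzer unavailable")
    return "ok"

outcome = call_with_retry(flaky)
```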
RabbitMQ Failures¶
If RabbitMQ unavailable:
- API Gateway returns 503 Service Unavailable
- Client should retry with exponential backoff
Database Failures¶
If PostgreSQL unavailable:
- New jobs still accepted (queued in RabbitMQ)
- Status queries fail for completed jobs
- Jobs remain in queue until database recovers
Monitoring Points¶
Queue Metrics¶
- `orchestrator.analysis.requests` queue depth
- Consumer lag time
- Message processing rate
Processing Metrics¶
- Job completion time
- Analyzer success rate
- Analyzer response time
Database Metrics¶
- Query execution time
- Connection pool usage
Data Retention¶
Analysis results:
- Kept for 90 days by default
- Configurable via `RETENTION_DAYS`
Queue messages:
- TTL: 24 hours
- Dead letter queue for failed messages
- Retry limit: 3 attempts
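The queue-side retention policy maps onto standard RabbitMQ queue arguments. A sketch of the declaration arguments, assuming a dead-letter exchange name (`spot.dlx` is hypothetical; only the TTL value comes from this page):

```python
# Arguments passed when declaring the analysis queue, e.g. via pika:
#   channel.queue_declare(queue="orchestrator.analysis.requests",
#                         durable=True, arguments=QUEUE_ARGS)
QUEUE_ARGS = {
    "x-message-ttl": 24 * 60 * 60 * 1000,  # 24-hour TTL, in milliseconds
    "x-dead-letter-exchange": "spot.dlx",  # hypothetical dead-letter exchange
}
```

Messages that expire or are rejected past the retry limit are routed to the dead-letter exchange rather than silently dropped.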
Security Considerations¶
Data in Transit¶
- API: HTTPS with TLS 1.2+
- RabbitMQ: Internal network only
- PostgreSQL: Internal network only
Data at Rest¶
- Database: Encrypted volumes recommended
- Queue: Persistent messages on disk
Data Privacy¶
- Email content not logged
- PII redacted from logs
- Analysis results include only metadata
Related Documentation¶
- System Overview - Architecture
- Services - Service details
- API reference - HTTP endpoints
- Operator guide - Dashboard playbook (Web UI section)