SPOT Analyzer Development Guide
This guide explains how to create custom analyzers for the SPOT (Spear-Phishing Overwatching Tool) platform.
Table of Contents
- Quick Start
- Required Endpoints
- Contract Implementation
- Response Formats
- Testing Your Analyzer
- Deployment
- Best Practices
- Examples
- Troubleshooting
- Support
Quick Start
Minimal Analyzer (5 minutes)
Create a FastAPI application with three required endpoints:
from fastapi import FastAPI
from spot_sdk.api_gateway import Email
from spot_sdk.api_gateway import AnalysisResult, AnalysisMetadata

app = FastAPI(title="My Analyzer")


@app.get("/health")
async def health():
    """Health check endpoint (required)."""
    return {"status": "healthy", "service": "my-analyzer", "version": "1.0.0"}


@app.post("/internal/analyze")
async def analyze(email: Email) -> AnalysisResult:
    """Main analysis endpoint (required)."""
    # Your analysis logic here
    return AnalysisResult(
        is_phishing=False,
        threat_level="safe",
        confidence=0.5,
        explanation="Analysis completed",
        indicators=[],
        metadata=AnalysisMetadata(
            analyzer_id="my-analyzer",
            analyzer_version="1.0.0",
            analysis_duration_ms=100
        )
    )


@app.get("/capabilities")
async def capabilities():
    """Capabilities endpoint (required)."""
    return ["custom_analysis", "pattern_detection"]
That's it! This is a valid SPOT analyzer.
Required Endpoints
All SPOT analyzers must implement these three HTTP endpoints:
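1. Health Check: GET /health
Purpose: Allows the orchestrator to verify that the analyzer is running and ready.
Response: a JSON object such as {"status": "healthy", "service": "my-analyzer", "version": "1.0.0"}, as returned by the Quick Start example.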
Requirements:
- Must return 200 OK when healthy
- Response time should be < 500ms
- Should check critical dependencies (models loaded, DB connected, etc.)
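The health requirements above can be sketched as a small, framework-independent helper that runs dependency checks and builds the response body. This is a minimal sketch under assumptions. The check names (model_loaded, db) are hypothetical callables you supply. Returning 503 when a check fails is our own choice, because the guide only specifies 200 OK for the healthy case.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical dependency checks: each returns True when the dependency is usable.
DependencyCheck = Callable[[], bool]


def health_status(
    checks: Dict[str, DependencyCheck],
    service: str = "my-analyzer",
    version: str = "1.0.0",
) -> Tuple[int, dict]:
    """Run dependency checks and return (HTTP status code, JSON body)."""
    started = time.perf_counter()
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = False
    healthy = all(results.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "service": service,
        "version": version,
        "checks": results,
        # Useful for watching the < 500 ms response-time guideline.
        "check_duration_ms": int((time.perf_counter() - started) * 1000),
    }
    # Assumption: 503 signals "not ready"; the guide only requires 200 when healthy.
    return (200 if healthy else 503), body
```

In a FastAPI handler you could return JSONResponse(status_code=code, content=body), using JSONResponse from fastapi.responses, so the orchestrator sees a non-200 status when a dependency is down.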
2. Analysis: POST /internal/analyze
Purpose: Analyze an email and return results.
Request: Email object (see Email Schema)
Response: AnalysisResult object (see AnalysisResult Schema)
Requirements:
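- Must accept a spot_sdk.email.Email object (the examples in this guide import it as spot_sdk.api_gateway.Email)
- Must return a spot_sdk.results.AnalysisResult object (imported from spot_sdk.api_gateway in the examples)
- Should complete within 30 seconds (configurable)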
- Must handle errors gracefully
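3. Capabilities: GET /capabilities
Purpose: Declare what types of analysis this analyzer provides.
Response: a JSON array of capability strings, such as ["custom_analysis", "pattern_detection"] in the Quick Start example.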
Common Capabilities:
- sentiment_analysis: Analyzes email sentiment/tone
- entity_recognition: Extracts entities (names, orgs, URLs)
- url_analysis: Analyzes links and domains
- attachment_scanning: Scans file attachments
- language_detection: Detects email language
- phishing_patterns: Pattern-based spear-phishing detection
- social_engineering: Social engineering detection
- domain_reputation: Domain reputation checking
Contract Implementation
Installing spot-sdk
Email Schema
The email object you receive:
from spot_sdk.api_gateway import Email, EmailHeader, Attachment

# Example email structure
email = Email(
    id="unique-email-id",
    headers=EmailHeader(
        subject="Important: Verify your account",
        sender="noreply@example.com",
        recipients=["user@company.com"],
        date="2024-01-15T10:30:00Z"
    ),
    body_text="Please verify your account...",
    body_html="<html>...</html>",
    attachments=[
        Attachment(
            filename="invoice.pdf",
            content_type="application/pdf",
            size=1024,
            content="base64-encoded-content"
        )
    ]
)
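Attachment content arrives base64-encoded, so an attachment_scanning analyzer has to decode it before inspecting it. The sketch below decodes strictly and compares the declared content_type with the file's leading "magic bytes". It assumes that size is the decoded length in bytes, which the schema above does not state. The two-entry magic-byte table is illustrative only, not an SDK feature.

```python
import base64
import binascii
from typing import Optional

# Leading "magic bytes" for a few content types (illustrative; extend as needed).
MAGIC_BYTES = {
    "application/pdf": b"%PDF-",
    "application/zip": b"PK\x03\x04",
}


def decode_attachment(content_b64: str, declared_size: Optional[int] = None) -> bytes:
    """Decode base64 attachment content, rejecting malformed input."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        data = base64.b64decode(content_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"attachment content is not valid base64: {exc}") from exc
    # Assumption: Attachment.size is the decoded length in bytes.
    if declared_size is not None and declared_size != len(data):
        raise ValueError(f"declared size {declared_size} != decoded size {len(data)}")
    return data


def content_matches_type(data: bytes, content_type: str) -> Optional[bool]:
    """True/False if the declared type can be checked here, None if it is unknown."""
    magic = MAGIC_BYTES.get(content_type.lower())
    if magic is None:
        return None
    return data.startswith(magic)
```

A False result (for example, an "invoice.pdf" that starts with Windows executable bytes) is a natural candidate for an IndicatorType.SUSPICIOUS_ATTACHMENTS indicator.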
AnalysisResult Schema
What your analyzer must return:
from spot_sdk.api_gateway import (
    AnalysisResult,
    AnalysisIndicator,
    AnalysisMetadata,
    IndicatorType
)

result = AnalysisResult(
    # Required fields
    is_phishing=True,  # bool: Is this phishing?
    threat_level="high",  # str: safe|low|medium|high|critical
    confidence=0.85,  # float: 0.0-1.0
    explanation="Found 3 phishing indicators...",

    # Indicators (list of suspicious findings)
    indicators=[
        AnalysisIndicator(
            type=IndicatorType.SUSPICIOUS_LINKS,
            description="Contains shortened URL: bit.ly/xyz123",
            severity="high",  # low|medium|high
            confidence=0.9,  # 0.0-1.0
            evidence="bit.ly/xyz123",  # Optional: what triggered this
            location="body"  # Optional: where in email
        )
    ],

    # Metadata (required)
    metadata=AnalysisMetadata(
        analyzer_id="my-analyzer",
        analyzer_version="1.0.0",
        analysis_duration_ms=245,
        model_version="distilbert-v1"  # Optional: ML model version
    ),

    # Optional: raw analysis data
    raw_output={
        "sentiment_scores": {"negative": 0.8, "positive": 0.1},
        "entities_found": 5,
        "custom_data": "anything you want"
    }
)
Indicator Types
Use these standard types for better integration:
from spot_sdk.api_gateway import IndicatorType
IndicatorType.SUSPICIOUS_LINKS # Malicious/shortened URLs
IndicatorType.DOMAIN_SPOOFING # Sender domain mismatch
IndicatorType.URGENT_LANGUAGE # Pressure tactics
IndicatorType.SPELLING_GRAMMAR_ERRORS # Poor grammar/typos
IndicatorType.SOCIAL_ENGINEERING # Social engineering tactics
IndicatorType.SUSPICIOUS_ATTACHMENTS # Dangerous file types
IndicatorType.CONTEXT_MISMATCH # Content doesn't match sender
IndicatorType.HEADER_ANOMALIES # Email header issues
Response Formats
Successful Analysis
{
  "is_phishing": true,
  "threat_level": "high",
  "confidence": 0.85,
  "explanation": "Email contains suspicious shortened URLs and urgent language",
  "indicators": [
    {
      "type": "suspicious_links",
      "description": "Found shortened URL: bit.ly/xyz123",
      "severity": "high",
      "confidence": 0.9,
      "evidence": "bit.ly/xyz123",
      "location": "body"
    },
    {
      "type": "urgent_language",
      "description": "Uses urgency tactics: 'act now', 'expires today'",
      "severity": "medium",
      "confidence": 0.8
    }
  ],
  "metadata": {
    "analyzer_id": "my-analyzer",
    "analyzer_version": "1.0.0",
    "analysis_duration_ms": 245,
    "model_version": "v1.2.0"
  },
  "raw_output": {
    "sentiment": {"negative": 0.8, "positive": 0.1, "neutral": 0.1},
    "entities": ["PayPal", "bit.ly"]
  }
}
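Before wiring an analyzer into the platform, it can help to check a response body against the constraints stated in the schema: required top-level fields, the threat_level and severity vocabularies, and confidences in [0.0, 1.0]. The checker below is a minimal sketch built only from those documented constraints. It also treats indicator severity and confidence as required, which is an assumption. It is not part of spot-sdk, and the SDK's own models remain the authority.

```python
REQUIRED_FIELDS = ("is_phishing", "threat_level", "confidence", "explanation", "indicators", "metadata")
REQUIRED_METADATA = ("analyzer_id", "analyzer_version", "analysis_duration_ms")
THREAT_LEVELS = {"safe", "low", "medium", "high", "critical"}
SEVERITIES = {"low", "medium", "high"}


def _is_unit_interval(value) -> bool:
    """True for a number in [0.0, 1.0]."""
    return isinstance(value, (int, float)) and 0.0 <= value <= 1.0


def check_response(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in payload]
    if "is_phishing" in payload and not isinstance(payload["is_phishing"], bool):
        problems.append("is_phishing must be a boolean")
    if "threat_level" in payload and payload["threat_level"] not in THREAT_LEVELS:
        problems.append(f"unknown threat_level: {payload['threat_level']!r}")
    if "confidence" in payload and not _is_unit_interval(payload["confidence"]):
        problems.append("confidence must be a number in [0.0, 1.0]")
    for i, indicator in enumerate(payload.get("indicators", [])):
        if indicator.get("severity") not in SEVERITIES:
            problems.append(f"indicators[{i}]: unknown severity {indicator.get('severity')!r}")
        if not _is_unit_interval(indicator.get("confidence")):
            problems.append(f"indicators[{i}]: confidence must be in [0.0, 1.0]")
    metadata = payload.get("metadata", {})
    problems += [f"metadata missing: {f}" for f in REQUIRED_METADATA if f not in metadata]
    return problems
```

For example, check_response(response.json()) in a test should return [] for the successful-analysis example above.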
Error Handling
Always return a valid AnalysisResult, even when analysis fails:
# Inside your /internal/analyze handler; perform_analysis() stands in for your own logic
try:
    result = perform_analysis(email)
except Exception as e:
    # Return a safe result with error info
    result = AnalysisResult(
        is_phishing=False,
        threat_level="safe",
        confidence=0.0,
        explanation=f"Analysis failed: {str(e)}",
        indicators=[],
        metadata=AnalysisMetadata(
            analyzer_id="my-analyzer",
            analyzer_version="1.0.0",
            analysis_duration_ms=0
        ),
        raw_output={"error": str(e), "error_type": type(e).__name__}
    )
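The same fallback pattern can also cover slow analyses. The sketch wraps an async analysis in asyncio.wait_for so the analyzer answers before the orchestrator gives up. It returns plain dicts with the AnalysisResult field names so it runs without spot-sdk. In a real handler you would build AnalysisResult objects instead. The 25-second default is an assumption chosen to stay under the 30-second timeout mentioned in this guide.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def analyze_with_fallback(
    analysis: Callable[[], Awaitable[dict]],
    timeout_s: float = 25.0,  # assumption: leave headroom under the orchestrator's 30 s
    analyzer_id: str = "my-analyzer",
    analyzer_version: str = "1.0.0",
) -> dict:
    """Run `analysis` with a deadline; on timeout or error return a fallback result."""
    start = time.perf_counter()
    try:
        return await asyncio.wait_for(analysis(), timeout=timeout_s)
    except Exception as exc:  # also catches the TimeoutError raised by wait_for
        duration_ms = int((time.perf_counter() - start) * 1000)
        timed_out = isinstance(exc, asyncio.TimeoutError)
        return {
            "is_phishing": False,
            "threat_level": "safe",
            "confidence": 0.0,
            "explanation": (
                f"Analysis timed out after {timeout_s}s" if timed_out
                else f"Analysis failed: {exc}"
            ),
            "indicators": [],
            "metadata": {
                "analyzer_id": analyzer_id,
                "analyzer_version": analyzer_version,
                "analysis_duration_ms": duration_ms,  # real elapsed time, not 0
            },
            "raw_output": {"error": str(exc), "error_type": type(exc).__name__},
        }
```

Because confidence is 0.0 in the fallback, downstream aggregation can tell "analyzer failed" apart from "analyzer is confident the email is safe".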
Testing Your Analyzer
Local Testing
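# Requires pytest-asyncio for @pytest.mark.asyncio
import pytest
from httpx import ASGITransport, AsyncClient

from src.main import app  # adjust to wherever your FastAPI app is defined
from spot_sdk.api_gateway import Email, EmailHeader

# Note: ASGITransport does not run lifespan startup/shutdown; if your app loads
# models there, use FastAPI's TestClient as a context manager instead.


@pytest.mark.asyncio
async def test_health_endpoint():
    """Test health check works."""
    transport = ASGITransport(app=app)  # AsyncClient(app=...) was removed in newer httpx
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/health")
    assert response.status_code == 200
    assert response.json()["status"] == "healthy"


@pytest.mark.asyncio
async def test_analyze_safe_email():
    """Test analysis of safe email."""
    email = Email(
        id="test-1",
        headers=EmailHeader(
            subject="Meeting tomorrow",
            sender="colleague@company.com",
            recipients=["me@company.com"]
        ),
        body_text="Let's meet at 2pm."
    )
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.post(
            "/internal/analyze",
            # mode="json" converts dates and other non-JSON types to JSON-safe values
            json=email.model_dump(mode="json")
        )
    assert response.status_code == 200
    result = response.json()
    assert result["is_phishing"] is False
    assert result["threat_level"] == "safe"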
Test with SPOT Platform
Once your analyzer passes local tests, integrate with the full platform:
- Register your analyzer with the platform (see Registration with SPOT Platform under Deployment).
- Add to docker-compose (if running locally):
# docker-compose.yml or docker-compose.override.yml
services:
  my-analyzer:
    build: ../my-analyzer  # Path to your analyzer
    ports:
      - "8003:8000"
    networks:
      - spot-network
    environment:
      - DEBUG=true
- Start platform with your analyzer:
# In spot-platform
cd /path/to/spot-platform
make start
# Verify analyzer is registered
curl http://localhost:8001/api/v1/analyzers
- Run end-to-end test:
# Submit analysis request
curl -X POST http://localhost:8001/api/v1/analyze \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_TOKEN" \
-d @test-email.json
# Check if your analyzer was called
docker compose logs my-analyzer
- Run platform integration tests:
# In spot-platform
make test-integration
# Tests verify:
# - Analyzer registration
# - Orchestrator communication
# - Result aggregation
# - Error handling
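The end-to-end step above posts @test-email.json, but the guide does not show that file. The sketch below generates one using the field names from the Email schema in this guide. Whether /api/v1/analyze accepts exactly this shape is an assumption to verify against the platform API reference. The sender address and body text are made-up sample data.

```python
import json
from pathlib import Path

# Field names mirror the Email schema in this guide; the exact request shape
# expected by the platform's /api/v1/analyze endpoint is an assumption.
test_email = {
    "id": "e2e-test-1",
    "headers": {
        "subject": "Urgent: verify your account",
        "sender": "security@examp1e-bank.com",  # made-up lookalike domain
        "recipients": ["user@company.com"],
        "date": "2024-01-15T10:30:00Z",
    },
    "body_text": "Your account expires today. Act now: https://bit.ly/xyz123",
    "attachments": [],
}


def write_test_email(path: str = "test-email.json") -> Path:
    """Write the sample email as JSON and return the file path."""
    out = Path(path)
    out.write_text(json.dumps(test_email, indent=2), encoding="utf-8")
    return out
```

Calling write_test_email() in the directory where you run curl produces the file that the -d @test-email.json argument reads.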
Platform Documentation:
- Platform Testing Guide - Platform testing strategies
- Platform Developer Guide - Platform development
- Configuration Reference - Environment variables
Deployment
Docker Deployment (Recommended)
Dockerfile:
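FROM python:3.11-slim
WORKDIR /app
# Install dependencies only: --only main replaces --no-dev (deprecated since Poetry 1.2),
# and --no-root skips installing the project itself, whose code is copied in the next step
COPY pyproject.toml poetry.lock ./
RUN pip install poetry && poetry install --only main --no-root
# Copy application
COPY . .
# Run analyzer
CMD ["poetry", "run", "uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]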
pyproject.toml:
[tool.poetry]
name = "my-analyzer"
version = "1.0.0"
description = "Custom SPOT analyzer"
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.11"
fastapi = "^0.104"
uvicorn = "^0.24"
spot-sdk-python = "^2.0.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
Registration with SPOT Platform
Analyzers are registered via configuration:
# In spot-platform config
analyzers:
  - id: my-analyzer
    name: "My Custom Analyzer"
    url: "http://my-analyzer:8000"
    enabled: true
    priority: 10
    timeout_ms: 30000
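Best Practices
Performance
- Keep analysis < 30 seconds: the orchestrator enforces a per-analyzer timeout (timeout_ms, 30000 in the registration example above)
- Use async operations: all endpoints should be async, and blocking or CPU-bound work (such as model inference) should be offloaded to a worker thread so it does not stall the event loop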
- Cache models: Load ML models once at startup, not per request
- Batch where possible: Process multiple indicators efficiently
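The first three practices combine naturally. Load the model once, run inference off the event loop, and let independent requests proceed concurrently. In the minimal sketch below, load_model is a hypothetical stand-in for an expensive model load, and functools.lru_cache ensures it runs once per process. asyncio.to_thread moves the blocking call into a worker thread.

```python
import asyncio
import functools
import time


@functools.lru_cache(maxsize=1)
def load_model():
    """Hypothetical stand-in for an expensive model load; cached after the first call."""
    time.sleep(0.1)  # simulate loading weights
    return lambda text: {"label": "NEGATIVE" if "urgent" in text.lower() else "POSITIVE"}


async def classify(text: str) -> dict:
    """Run (blocking) inference in a worker thread so the event loop stays responsive."""
    model = load_model()  # returns the cached model after the first call
    return await asyncio.to_thread(model, text)
```

Call load_model() once at startup (for example in a FastAPI lifespan handler) so the first request does not pay the loading cost. Independent calls can then be awaited together with asyncio.gather.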
Reliability
- Always return valid response: Never raise unhandled exceptions
- Implement health checks properly: Check all critical dependencies
- Log errors with context: Include email ID for debugging
- Handle timeouts gracefully: Return partial results if needed
Security
- Validate input: Check email object is well-formed
- Sanitize data: Be careful with user-provided content
- Limit resource usage: Set memory/CPU limits
- Don't store emails: Process and discard, respect privacy
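For body_html, "sanitize data" can start with never rendering the HTML. Extract its visible text with a parser and cap how much you analyze. The sketch uses the standard library's html.parser, skips script and style content, and bounds the work on very large inputs. The MAX_CHARS value is an arbitrary assumption; tune it to your model and memory limits.

```python
from html.parser import HTMLParser

MAX_CHARS = 100_000  # assumption: cap on analyzed text length


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str, max_chars: int = MAX_CHARS) -> str:
    """Convert untrusted HTML to plain text without rendering or executing it."""
    parser = _TextExtractor()
    parser.feed(html[: max_chars * 2])  # rough bound on parser work for huge inputs
    parser.close()
    return " ".join(parser.parts)[:max_chars]
```

The extracted text can then go through the same checks as body_text, and the raw HTML is never stored.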
Indicator Quality
- Be specific: "Contains bit.ly URL" not "suspicious URL"
- Provide evidence: Show what triggered the indicator
- Set appropriate confidence: Be honest about certainty
- Use standard types: Prefer IndicatorType enums over custom strings
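Put together, these practices mean each indicator should name what was found and carry the matching text as evidence. The sketch extracts full shortened URLs, not just the shortener domain, and emits indicator dicts with the field names and "suspicious_links" type string from the response format above. The shortener list comes from the rule-based example below. The regex is a simple heuristic, not a full URL parser.

```python
import re

SHORTENER_DOMAINS = ("bit.ly", "tinyurl.com", "t.co")  # from the rule-based example

# Shortener host (optionally with scheme/www) followed by a path. The lookbehind stops
# "t.co" from matching inside hosts such as "microsoft.com".
_SHORT_URL = re.compile(
    r"(?<![\w-])(?:https?://)?(?:www\.)?("
    + "|".join(map(re.escape, SHORTENER_DOMAINS))
    + r")/[\w\-./?=&%]*",
    re.IGNORECASE,
)


def shortened_url_indicators(text: str) -> list[dict]:
    """Return one specific, evidence-bearing indicator per shortened URL found."""
    return [
        {
            "type": "suspicious_links",
            "description": f"Contains {m.group(1).lower()} shortened URL",
            "severity": "high",
            "confidence": 0.9,
            "evidence": m.group(0),  # the exact URL that triggered the indicator
            "location": "body",
        }
        for m in _SHORT_URL.finditer(text)
    ]
```

For "Act now: https://bit.ly/xyz123", the evidence is the full URL, which an analyst can act on, while a mention of microsoft.com/security produces no indicator.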
Examples
Example 1: Rule-Based Analyzer
import re
import time

from fastapi import FastAPI
from spot_sdk.api_gateway import Email
from spot_sdk.api_gateway import AnalysisResult, AnalysisIndicator, AnalysisMetadata, IndicatorType

app = FastAPI(title="Simple Rule Analyzer")

# Detection rules
SUSPICIOUS_DOMAINS = ["bit.ly", "tinyurl.com", "t.co"]
URGENT_KEYWORDS = ["urgent", "immediate", "expires", "act now", "verify"]


def has_short_link(domain: str, text: str) -> bool:
    """Match the domain as a standalone host followed by a path.

    A plain substring test would also flag "t.co" inside "microsoft.com".
    """
    return re.search(rf"(?<![\w-]){re.escape(domain)}/", text) is not None


@app.post("/internal/analyze")
async def analyze(email: Email) -> AnalysisResult:
    """Analyze using simple rules."""
    start = time.perf_counter()
    indicators = []
    score = 0.0

    # Get text (the subject may be missing)
    subject = (email.headers.subject or "") if email.headers else ""
    body = email.body_text or ""
    full_text = f"{subject} {body}".lower()

    # Check for shortened URLs
    for domain in SUSPICIOUS_DOMAINS:
        if has_short_link(domain, full_text):
            indicators.append(AnalysisIndicator(
                type=IndicatorType.SUSPICIOUS_LINKS,
                description=f"Contains shortened URL: {domain}",
                severity="high",
                confidence=0.9
            ))
            score += 0.4

    # Check for urgent language
    urgent_count = sum(1 for keyword in URGENT_KEYWORDS if keyword in full_text)
    if urgent_count >= 2:
        indicators.append(AnalysisIndicator(
            type=IndicatorType.URGENT_LANGUAGE,
            description=f"Contains {urgent_count} urgency keywords",
            severity="medium",
            confidence=0.7
        ))
        score += 0.3

    # Determine result (cap the score so it is a valid confidence)
    score = min(score, 1.0)
    is_phishing = score >= 0.5
    threat_level = "high" if score >= 0.7 else "medium" if score >= 0.5 else "low" if score >= 0.2 else "safe"

    return AnalysisResult(
        is_phishing=is_phishing,
        threat_level=threat_level,
        confidence=score,
        explanation=f"Found {len(indicators)} indicators with score {score:.2f}",
        indicators=indicators,
        metadata=AnalysisMetadata(
            analyzer_id="simple-rules",
            analyzer_version="1.0.0",
            analysis_duration_ms=int((time.perf_counter() - start) * 1000)
        )
    )
Example 2: ML-Based Analyzer
import asyncio
import time
from contextlib import asynccontextmanager

from fastapi import FastAPI
from transformers import pipeline
from spot_sdk.api_gateway import Email
from spot_sdk.api_gateway import AnalysisResult, AnalysisIndicator, AnalysisMetadata, IndicatorType

# Use a checkpoint fine-tuned for sentiment: the base "distilbert-base-uncased" model
# has an untrained classification head and does not emit POSITIVE/NEGATIVE labels.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
sentiment_analyzer = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Load ML models once at startup (lifespan replaces the deprecated on_event hooks)."""
    global sentiment_analyzer
    sentiment_analyzer = pipeline("sentiment-analysis", model=MODEL_NAME)
    yield


app = FastAPI(title="ML Sentiment Analyzer", lifespan=lifespan)


@app.post("/internal/analyze")
async def analyze(email: Email) -> AnalysisResult:
    """Analyze using ML model."""
    start = time.perf_counter()

    # Extract text
    subject = (email.headers.subject or "") if email.headers else ""
    text = f"{subject} {email.body_text or ''}"

    # Run inference in a worker thread so it does not block the event loop;
    # truncation=True keeps the input within the model's 512-token limit
    result = (await asyncio.to_thread(sentiment_analyzer, text, truncation=True))[0]

    indicators = []
    score = 0.0

    # Check for negative sentiment
    if result["label"] == "NEGATIVE" and result["score"] > 0.8:
        indicators.append(AnalysisIndicator(
            type=IndicatorType.SOCIAL_ENGINEERING,
            description=f"High negative sentiment detected ({result['score']:.2f})",
            severity="medium",
            confidence=result["score"]
        ))
        score = result["score"] * 0.5

    is_phishing = score >= 0.4
    threat_level = "medium" if is_phishing else "safe"

    # Calculate duration
    duration_ms = int((time.perf_counter() - start) * 1000)

    return AnalysisResult(
        is_phishing=is_phishing,
        threat_level=threat_level,
        confidence=score,
        explanation=f"Sentiment analysis: {result['label']} ({result['score']:.2f})",
        indicators=indicators,
        metadata=AnalysisMetadata(
            analyzer_id="ml-sentiment",
            analyzer_version="1.0.0",
            analysis_duration_ms=duration_ms,
            model_version=MODEL_NAME
        ),
        raw_output={"sentiment": result}
    )
Troubleshooting
Common Issues
Q: My analyzer isn't being called
- Check health endpoint returns 200 OK
- Verify analyzer is registered in platform config
- Check orchestrator logs for connection errors

Q: Timeout errors
- Reduce analysis time (< 30 seconds)
- Use async operations
- Consider returning partial results

Q: Invalid response format
- Ensure you return AnalysisResult object
- Check all required fields are present
- Validate with spot-sdk types

Q: Indicators not showing up
- Use standard IndicatorType enums
- Provide clear descriptions
- Set appropriate severity levels
Support
SDK Documentation
- SDK Documentation: https://spot-project.codeberg.page/documentation/sdk
- API Reference: https://spot-project.codeberg.page/documentation/sdk/api-specs
- Python SDK: https://spot-project.codeberg.page/documentation/sdk/python-sdk
- TypeScript SDK: https://spot-project.codeberg.page/documentation/sdk/typescript-sdk
Platform Documentation
- Platform Documentation: https://spot-project.codeberg.page/documentation/platform
- Developer Guide: https://spot-project.codeberg.page/documentation/platform/guides/developer-guide
- Testing Guide: https://spot-project.codeberg.page/documentation/platform/TESTING
- Configuration Reference: https://spot-project.codeberg.page/documentation/platform/reference/configuration
Issues and Support
- SDK Issues: https://codeberg.org/SPOT_Project/sdk/-/issues
- Platform Issues: https://codeberg.org/SPOT_Project/core/-/issues
Changelog
- 2025-11-03: Initial version
  - Created minimal analyzer guide
  - Added contract specifications
  - Added examples and best practices