
SPOT Analyzer Development Guide

This guide explains how to create custom analyzers for the SPOT (Spear-Phishing Overwatching Tool) platform.

Table of Contents

  1. Quick Start
  2. Required Endpoints
  3. Contract Implementation
  4. Response Formats
  5. Testing Your Analyzer
  6. Deployment
  7. Best Practices
  8. Examples

Quick Start

Minimal Analyzer (5 minutes)

Create a FastAPI application with three required endpoints:

from fastapi import FastAPI
from spot_sdk.api_gateway import Email
from spot_sdk.api_gateway import AnalysisResult, AnalysisMetadata

app = FastAPI(title="My Analyzer")

@app.get("/health")
async def health():
    """Health check endpoint (required)."""
    return {"status": "healthy", "service": "my-analyzer", "version": "1.0.0"}

@app.post("/internal/analyze")
async def analyze(email: Email) -> AnalysisResult:
    """Main analysis endpoint (required)."""
    # Your analysis logic here
    return AnalysisResult(
        is_phishing=False,
        threat_level="safe",
        confidence=0.5,
        explanation="Analysis completed",
        indicators=[],
        metadata=AnalysisMetadata(
            analyzer_id="my-analyzer",
            analyzer_version="1.0.0",
            analysis_duration_ms=100
        )
    )

@app.get("/capabilities")
async def capabilities():
    """Capabilities endpoint (required)."""
    return ["custom_analysis", "pattern_detection"]

That's it! This is a valid SPOT analyzer.


Required Endpoints

All SPOT analyzers must implement these three HTTP endpoints:

1. Health Check: GET /health

Purpose: Allow the orchestrator to verify that the analyzer is running and ready.

Response:

{
  "status": "healthy",
  "service": "analyzer-name",
  "version": "1.0.0"
}

Requirements:

  • Must return 200 OK when healthy
  • Response time should be < 500ms
  • Should check critical dependencies (models loaded, DB connected, etc.)
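The dependency checks above can be sketched as a small helper that maps check results to an HTTP status. Note this is illustrative: the `checks` field and the 503-on-unhealthy behavior are assumptions, not part of the SPOT contract, which only requires 200 OK when healthy.

```python
# Sketch of a dependency-aware health check. The extra "checks" field
# and the 503-on-unhealthy status are assumptions, not SPOT requirements.
def build_health_response(checks, service="my-analyzer", version="1.0.0"):
    """Return (http_status, body) based on dependency check results."""
    healthy = all(checks.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "service": service,
        "version": version,
        "checks": checks,
    }
    return (200 if healthy else 503), body
```

In a FastAPI handler you would return `body` and set the response status code from the first tuple element.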

2. Analysis: POST /internal/analyze

Purpose: Analyze an email and return results.

Request: Email object (see Email Schema)

Response: AnalysisResult object (see AnalysisResult Schema)

Requirements:

  • Must accept a spot_sdk.api_gateway.Email object
  • Must return a spot_sdk.api_gateway.AnalysisResult object
  • Should complete within 30 seconds (configurable)
  • Must handle errors gracefully

3. Capabilities: GET /capabilities

Purpose: Declare what types of analysis this analyzer provides.

Response:

[
  "sentiment_analysis",
  "url_analysis",
  "domain_verification"
]

Common Capabilities:

  • sentiment_analysis - Analyzes email sentiment/tone
  • entity_recognition - Extracts entities (names, orgs, URLs)
  • url_analysis - Analyzes links and domains
  • attachment_scanning - Scans file attachments
  • language_detection - Detects email language
  • phishing_patterns - Pattern-based spear-phishing detection
  • social_engineering - Social engineering detection
  • domain_reputation - Domain reputation checking

Contract Implementation

Installing spot-sdk

# Via pip
pip install spot-sdk-python

# Via poetry
poetry add spot-sdk-python

Email Schema

The email object you receive:

from spot_sdk.api_gateway import Email, EmailHeader, Attachment

# Example email structure
email = Email(
    id="unique-email-id",
    headers=EmailHeader(
        subject="Important: Verify your account",
        sender="noreply@example.com",
        recipients=["user@company.com"],
        date="2024-01-15T10:30:00Z"
    ),
    body_text="Please verify your account...",
    body_html="<html>...</html>",
    attachments=[
        Attachment(
            filename="invoice.pdf",
            content_type="application/pdf",
            size=1024,
            content="base64-encoded-content"
        )
    ]
)

AnalysisResult Schema

What your analyzer must return:

from spot_sdk.api_gateway import (
    AnalysisResult,
    AnalysisIndicator,
    AnalysisMetadata,
    IndicatorType
)

result = AnalysisResult(
    # Required fields
    is_phishing=True,                    # bool: Is this phishing?
    threat_level="high",                  # str: safe|low|medium|high|critical
    confidence=0.85,                      # float: 0.0-1.0
    explanation="Found 3 phishing indicators...",

    # Indicators (list of suspicious findings)
    indicators=[
        AnalysisIndicator(
            type=IndicatorType.SUSPICIOUS_LINKS,
            description="Contains shortened URL: bit.ly/xyz123",
            severity="high",              # low|medium|high
            confidence=0.9,               # 0.0-1.0
            evidence="bit.ly/xyz123",     # Optional: what triggered this
            location="body"               # Optional: where in email
        )
    ],

    # Metadata (required)
    metadata=AnalysisMetadata(
        analyzer_id="my-analyzer",
        analyzer_version="1.0.0",
        analysis_duration_ms=245,
        model_version="distilbert-v1"    # Optional: ML model version
    ),

    # Optional: raw analysis data
    raw_output={
        "sentiment_scores": {"negative": 0.8, "positive": 0.1},
        "entities_found": 5,
        "custom_data": "anything you want"
    }
)

Indicator Types

Use these standard types for better integration:

from spot_sdk.api_gateway import IndicatorType

IndicatorType.SUSPICIOUS_LINKS          # Malicious/shortened URLs
IndicatorType.DOMAIN_SPOOFING           # Sender domain mismatch
IndicatorType.URGENT_LANGUAGE           # Pressure tactics
IndicatorType.SPELLING_GRAMMAR_ERRORS   # Poor grammar/typos
IndicatorType.SOCIAL_ENGINEERING        # Social engineering tactics
IndicatorType.SUSPICIOUS_ATTACHMENTS    # Dangerous file types
IndicatorType.CONTEXT_MISMATCH          # Content doesn't match sender
IndicatorType.HEADER_ANOMALIES          # Email header issues

Response Formats

Successful Analysis

{
  "is_phishing": true,
  "threat_level": "high",
  "confidence": 0.85,
  "explanation": "Email contains suspicious shortened URLs and urgent language",
  "indicators": [
    {
      "type": "suspicious_links",
      "description": "Found shortened URL: bit.ly/xyz123",
      "severity": "high",
      "confidence": 0.9,
      "evidence": "bit.ly/xyz123",
      "location": "body"
    },
    {
      "type": "urgent_language",
      "description": "Uses urgency tactics: 'act now', 'expires today'",
      "severity": "medium",
      "confidence": 0.8
    }
  ],
  "metadata": {
    "analyzer_id": "my-analyzer",
    "analyzer_version": "1.0.0",
    "analysis_duration_ms": 245,
    "model_version": "v1.2.0"
  },
  "raw_output": {
    "sentiment": {"negative": 0.8, "positive": 0.1, "neutral": 0.1},
    "entities": ["PayPal", "bit.ly"]
  }
}

Error Handling

Always return valid AnalysisResult even on errors:

try:
    # Your analysis logic
    result = perform_analysis(email)
except Exception as e:
    # Return safe result with error info
    result = AnalysisResult(
        is_phishing=False,
        threat_level="safe",
        confidence=0.0,
        explanation=f"Analysis failed: {str(e)}",
        indicators=[],
        metadata=AnalysisMetadata(
            analyzer_id="my-analyzer",
            analyzer_version="1.0.0",
            analysis_duration_ms=0
        ),
        raw_output={"error": str(e), "error_type": type(e).__name__}
    )

Testing Your Analyzer

Local Testing

import pytest
from httpx import ASGITransport, AsyncClient

from src.main import app  # the analyzer app under test

@pytest.mark.asyncio
async def test_health_endpoint():
    """Test health check works."""
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/health")
        assert response.status_code == 200
        assert response.json()["status"] == "healthy"

@pytest.mark.asyncio
async def test_analyze_safe_email():
    """Test analysis of safe email."""
    from spot_sdk.api_gateway import Email, EmailHeader

    email = Email(
        id="test-1",
        headers=EmailHeader(
            subject="Meeting tomorrow",
            sender="colleague@company.com",
            recipients=["me@company.com"]
        ),
        body_text="Let's meet at 2pm."
    )

    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.post(
            "/internal/analyze",
            json=email.model_dump()
        )
        assert response.status_code == 200
        result = response.json()
        assert result["is_phishing"] is False
        assert result["threat_level"] == "safe"

Test with SPOT Platform

Once your analyzer passes local tests, integrate with the full platform:

  1. Register analyzer with platform:

# In spot-platform/.env
SPOT_ANALYZER_URLS='{"my-analyzer":"http://my-analyzer:8000"}'

  2. Add to docker-compose (if running locally):

# docker-compose.yml or docker-compose.override.yml
services:
  my-analyzer:
    build: ../my-analyzer  # Path to your analyzer
    ports:
      - "8003:8000"
    networks:
      - spot-network
    environment:
      - DEBUG=true

  3. Start platform with your analyzer:

# In spot-platform
cd /path/to/spot-platform
make start

# Verify analyzer is registered
curl http://localhost:8001/api/v1/analyzers

  4. Run end-to-end test:

# Submit analysis request
curl -X POST http://localhost:8001/api/v1/analyze \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d @test-email.json

# Check if your analyzer was called
docker compose logs my-analyzer

  5. Run platform integration tests:

# In spot-platform
make test-integration

# Tests verify:
# - Analyzer registration
# - Orchestrator communication
# - Result aggregation
# - Error handling


Deployment

Dockerfile:

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY pyproject.toml poetry.lock ./
RUN pip install poetry && poetry install --no-root --only main

# Copy application
COPY . .

# Run analyzer
CMD ["poetry", "run", "uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]

pyproject.toml:

[tool.poetry]
name = "my-analyzer"
version = "1.0.0"

[tool.poetry.dependencies]
python = "^3.11"
fastapi = "^0.104"
uvicorn = "^0.24"
spot-sdk-python = "^2.0.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

Registration with SPOT Platform

Analyzers are registered via configuration:

# In spot-platform config
analyzers:
  - id: my-analyzer
    name: "My Custom Analyzer"
    url: "http://my-analyzer:8000"
    enabled: true
    priority: 10
    timeout_ms: 30000

Best Practices

Performance

  • Keep analysis < 30 seconds: Orchestrator has timeout
  • Use async operations: All endpoints should be async
  • Cache models: Load ML models once at startup, not per request
  • Batch where possible: Process multiple indicators efficiently
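The "cache models" advice above can be sketched with a once-per-process loader; the `get_model` helper and its return value are illustrative stand-ins for a real model loader:

```python
from functools import lru_cache

# Sketch: load an expensive model exactly once per process instead of
# once per request. The loader body is a hypothetical stand-in.
LOAD_COUNT = 0

@lru_cache(maxsize=1)
def get_model():
    """First call does the expensive load; later calls reuse the result."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": "demo-model", "loaded": True}

# Simulate three incoming requests; the load runs only once.
models = [get_model() for _ in range(3)]
```

The same effect can be achieved by loading the model in a startup hook and storing it on the app, as Example 2 below does.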

Reliability

  • Always return valid response: Never raise unhandled exceptions
  • Implement health checks properly: Check all critical dependencies
  • Log errors with context: Include email ID for debugging
  • Handle timeouts gracefully: Return partial results if needed
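Graceful timeout handling can be sketched with `asyncio.wait_for`; `slow_analysis` is a hypothetical stand-in for real analysis logic, and the tiny 0.05s budget is only for demonstration:

```python
import asyncio

# Sketch: bound analysis time and degrade to a conservative partial
# result instead of raising to the orchestrator.
async def slow_analysis():
    await asyncio.sleep(10)  # pretend this is an expensive model call
    return {"threat_level": "high", "partial": False}

async def analyze_with_timeout(budget_s=0.05):
    try:
        return await asyncio.wait_for(slow_analysis(), timeout=budget_s)
    except asyncio.TimeoutError:
        # Fall back to a safe, clearly-marked partial result
        return {"threat_level": "safe", "partial": True}

result = asyncio.run(analyze_with_timeout())
```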

Security

  • Validate input: Check email object is well-formed
  • Sanitize data: Be careful with user-provided content
  • Limit resource usage: Set memory/CPU limits
  • Don't store emails: Process and discard, respect privacy
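The input-validation advice can be sketched as a pre-analysis check; the size limit and field names mirror the Email schema above but are illustrative, not part of the SPOT contract:

```python
# Sketch: defensive checks before running analysis. MAX_BODY_BYTES is
# an illustrative limit, not a platform requirement.
MAX_BODY_BYTES = 1_000_000

def validate_email_payload(payload):
    """Return a list of problems; an empty list means the payload is OK."""
    problems = []
    if not payload.get("id"):
        problems.append("missing id")
    body = payload.get("body_text") or ""
    if len(body.encode("utf-8")) > MAX_BODY_BYTES:
        problems.append("body_text too large")
    if "\x00" in body:
        problems.append("body_text contains NUL bytes")
    return problems
```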

Indicator Quality

  • Be specific: "Contains bit.ly URL" not "suspicious URL"
  • Provide evidence: Show what triggered the indicator
  • Set appropriate confidence: Be honest about certainty
  • Use standard types: Prefer IndicatorType enums over custom strings
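To make the contrast concrete, here is the same finding expressed vaguely and then specifically; plain dicts are used to mirror the AnalysisIndicator fields shown earlier:

```python
# Sketch: a vague indicator vs. a specific, evidence-backed one.
vague = {
    "type": "suspicious_links",
    "description": "Suspicious URL found",  # no evidence, no location
    "severity": "high",
    "confidence": 1.0,                       # overstated certainty
}

specific = {
    "type": "suspicious_links",
    "description": "Contains shortened URL: bit.ly/xyz123",
    "severity": "high",
    "confidence": 0.9,            # honest, calibrated confidence
    "evidence": "bit.ly/xyz123",  # exactly what triggered the indicator
    "location": "body",
}
```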

Examples

Example 1: Rule-Based Analyzer

from fastapi import FastAPI
from spot_sdk.api_gateway import Email
from spot_sdk.api_gateway import AnalysisResult, AnalysisIndicator, AnalysisMetadata, IndicatorType
import re

app = FastAPI(title="Simple Rule Analyzer")

# Detection rules
SUSPICIOUS_DOMAINS = ["bit.ly", "tinyurl.com", "t.co"]
URGENT_KEYWORDS = ["urgent", "immediate", "expires", "act now", "verify"]

@app.post("/internal/analyze")
async def analyze(email: Email) -> AnalysisResult:
    """Analyze using simple rules."""
    indicators = []
    score = 0.0

    # Get text
    subject = email.headers.subject if email.headers else ""
    body = email.body_text or ""
    full_text = f"{subject} {body}".lower()

    # Check for shortened URLs
    for domain in SUSPICIOUS_DOMAINS:
        if domain in full_text:
            indicators.append(AnalysisIndicator(
                type=IndicatorType.SUSPICIOUS_LINKS,
                description=f"Contains shortened URL: {domain}",
                severity="high",
                confidence=0.9
            ))
            score += 0.4

    # Check for urgent language
    urgent_count = sum(1 for keyword in URGENT_KEYWORDS if keyword in full_text)
    if urgent_count >= 2:
        indicators.append(AnalysisIndicator(
            type=IndicatorType.URGENT_LANGUAGE,
            description=f"Contains {urgent_count} urgency keywords",
            severity="medium",
            confidence=0.7
        ))
        score += 0.3

    # Determine result
    is_phishing = score >= 0.5
    threat_level = "high" if score >= 0.7 else "medium" if score >= 0.5 else "low" if score >= 0.2 else "safe"

    return AnalysisResult(
        is_phishing=is_phishing,
        threat_level=threat_level,
        confidence=min(score, 1.0),
        explanation=f"Found {len(indicators)} indicators with score {score:.2f}",
        indicators=indicators,
        metadata=AnalysisMetadata(
            analyzer_id="simple-rules",
            analyzer_version="1.0.0",
            analysis_duration_ms=50
        )
    )

Example 2: ML-Based Analyzer

from fastapi import FastAPI
from transformers import pipeline
from spot_sdk.api_gateway import Email
from spot_sdk.api_gateway import AnalysisResult, AnalysisIndicator, AnalysisMetadata, IndicatorType
from datetime import datetime

app = FastAPI(title="ML Sentiment Analyzer")

# Load model once at startup
sentiment_analyzer = None

@app.on_event("startup")
async def load_models():
    """Load ML models at startup."""
    global sentiment_analyzer
    sentiment_analyzer = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

@app.post("/internal/analyze")
async def analyze(email: Email) -> AnalysisResult:
    """Analyze using ML model."""
    start = datetime.utcnow()

    # Extract text
    text = f"{email.headers.subject if email.headers else ''} {email.body_text or ''}"

    # Run sentiment analysis
    result = sentiment_analyzer(text[:512])[0]  # Limit to 512 chars

    indicators = []
    score = 0.0

    # Check for negative sentiment
    if result['label'] == 'NEGATIVE' and result['score'] > 0.8:
        indicators.append(AnalysisIndicator(
            type=IndicatorType.SOCIAL_ENGINEERING,
            description=f"High negative sentiment detected ({result['score']:.2f})",
            severity="medium",
            confidence=result['score']
        ))
        score = result['score'] * 0.5

    is_phishing = score >= 0.4
    threat_level = "medium" if is_phishing else "safe"

    # Calculate duration
    duration_ms = int((datetime.utcnow() - start).total_seconds() * 1000)

    return AnalysisResult(
        is_phishing=is_phishing,
        threat_level=threat_level,
        confidence=score,
        explanation=f"Sentiment analysis: {result['label']} ({result['score']:.2f})",
        indicators=indicators,
        metadata=AnalysisMetadata(
            analyzer_id="ml-sentiment",
            analyzer_version="1.0.0",
            analysis_duration_ms=duration_ms,
            model_version="distilbert-base-uncased-finetuned-sst-2-english"
        ),
        raw_output={"sentiment": result}
    )

Troubleshooting

Common Issues

Q: My analyzer isn't being called

  • Check that the health endpoint returns 200 OK
  • Verify the analyzer is registered in the platform config
  • Check orchestrator logs for connection errors

Q: Timeout errors

  • Reduce analysis time (< 30 seconds)
  • Use async operations
  • Consider returning partial results

Q: Invalid response format

  • Ensure you return an AnalysisResult object
  • Check that all required fields are present
  • Validate with spot-sdk types

Q: Indicators not showing up

  • Use standard IndicatorType enums
  • Provide clear descriptions
  • Set appropriate severity levels


Support

SDK Documentation

  • SDK Documentation: https://spot-project.codeberg.page/documentation/sdk
  • API Reference: https://spot-project.codeberg.page/documentation/sdk/api-specs
  • Python SDK: https://spot-project.codeberg.page/documentation/sdk/python-sdk
  • TypeScript SDK: https://spot-project.codeberg.page/documentation/sdk/typescript-sdk

Platform Documentation

  • Platform Documentation: https://spot-project.codeberg.page/documentation/platform
  • Developer Guide: https://spot-project.codeberg.page/documentation/platform/guides/developer-guide
  • Testing Guide: https://spot-project.codeberg.page/documentation/platform/TESTING
  • Configuration Reference: https://spot-project.codeberg.page/documentation/platform/reference/configuration

Issues and Support

  • SDK Issues: https://codeberg.org/SPOT_Project/sdk/-/issues
  • Platform Issues: https://codeberg.org/SPOT_Project/core/-/issues

Changelog

  • 2025-11-03: Initial version
  • Created minimal analyzer guide
  • Added contract specifications
  • Added examples and best practices