
SPOT Analyzer Development Guide

This guide explains how to create custom analyzers for the SPOT (Spear-Phishing Overwatching Tool) platform.

Table of Contents

  1. Quick Start
  2. Required Endpoints
  3. Contract Implementation
  4. Response Formats
  5. Testing Your Analyzer
  6. Deployment
  7. Best Practices
  8. Examples

Quick Start

Minimal Analyzer (5 minutes)

Create a FastAPI application with three required endpoints:

from fastapi import FastAPI
from spot_sdk.api_gateway import Email
from spot_sdk.api_gateway import AnalysisResult, AnalysisMetadata

app = FastAPI(title="My Analyzer")

@app.get("/health")
async def health():
    """Health check endpoint (required)."""
    return {"status": "healthy", "service": "my-analyzer", "version": "1.0.0"}

@app.post("/internal/analyze")
async def analyze(email: Email) -> AnalysisResult:
    """Main analysis endpoint (required)."""
    # Your analysis logic here
    return AnalysisResult(
        is_phishing=False,
        threat_level="safe",
        confidence=0.5,
        explanation="Analysis completed",
        indicators=[],
        metadata=AnalysisMetadata(
            analyzer_id="my-analyzer",
            analyzer_version="1.0.0",
            analysis_duration_ms=100
        )
    )

@app.get("/capabilities")
async def capabilities():
    """Capabilities endpoint (required)."""
    return ["custom_analysis", "pattern_detection"]

That's it! This is a valid SPOT analyzer.


Required Endpoints

All SPOT analyzers must implement these three HTTP endpoints:

1. Health Check: GET /health

Purpose: Allow the orchestrator to verify that the analyzer is running and ready.

Response:

{
  "status": "healthy",
  "service": "analyzer-name",
  "version": "1.0.0"
}

Requirements:

  • Must return 200 OK when healthy
  • Response time should be < 500ms
  • Should check critical dependencies (models loaded, DB connected, etc.)
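The dependency checks above can be sketched as a small helper that maps check results to an HTTP status. Note this is illustrative: the `checks` field and the 503-on-unhealthy behavior are assumptions, not part of the SPOT contract, which only requires 200 OK when healthy.

```python
# Sketch of a dependency-aware health check. The extra "checks" field
# and the 503-on-unhealthy status are assumptions, not SPOT requirements.
def build_health_response(checks, service="my-analyzer", version="1.0.0"):
    """Return (http_status, body) based on dependency check results."""
    healthy = all(checks.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "service": service,
        "version": version,
        "checks": checks,
    }
    return (200 if healthy else 503), body
```

In a FastAPI handler you would return `body` and set the response status code from the first tuple element.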

2. Analysis: POST /internal/analyze

Purpose: Analyze an email and return results.

Request: Email object (see Email Schema)

Response: AnalysisResult object (see AnalysisResult Schema)

Requirements:

  • Must accept a spot_sdk.api_gateway.Email object
  • Must return a spot_sdk.api_gateway.AnalysisResult object
  • Should complete within 30 seconds (configurable)
  • Must handle errors gracefully

3. Capabilities: GET /capabilities

Purpose: Declare what types of analysis this analyzer provides.

Response:

[
  "sentiment_analysis",
  "url_analysis",
  "domain_verification"
]

Common Capabilities:

  • sentiment_analysis - Analyzes email sentiment/tone
  • entity_recognition - Extracts entities (names, orgs, URLs)
  • url_analysis - Analyzes links and domains
  • attachment_scanning - Scans file attachments
  • language_detection - Detects email language
  • phishing_patterns - Pattern-based spear-phishing detection
  • social_engineering - Social engineering detection
  • domain_reputation - Domain reputation checking

Contract Implementation

Installing spot-sdk

# Via pip
pip install spot-sdk-python

# Via poetry
poetry add spot-sdk-python

Email Schema

The email object you receive:

from spot_sdk.api_gateway import Email, EmailHeader, Attachment

# Example email structure
email = Email(
    id="unique-email-id",
    headers=EmailHeader(
        subject="Important: Verify your account",
        sender="noreply@example.com",
        recipients=["user@company.com"],
        date="2024-01-15T10:30:00Z"
    ),
    body_text="Please verify your account...",
    body_html="<html>...</html>",
    attachments=[
        Attachment(
            filename="invoice.pdf",
            content_type="application/pdf",
            size=1024,
            content="base64-encoded-content"
        )
    ]
)

AnalysisResult Schema

What your analyzer must return:

from spot_sdk.api_gateway import (
    AnalysisResult,
    AnalysisIndicator,
    AnalysisMetadata,
    IndicatorType
)

result = AnalysisResult(
    # Required fields
    is_phishing=True,                    # bool: Is this phishing?
    threat_level="high",                  # str: safe|low|medium|high|critical
    confidence=0.85,                      # float: 0.0-1.0
    explanation="Found 3 phishing indicators...",

    # Indicators (list of suspicious findings)
    indicators=[
        AnalysisIndicator(
            type=IndicatorType.SUSPICIOUS_LINKS,
            description="Contains shortened URL: bit.ly/xyz123",
            severity="high",              # low|medium|high
            confidence=0.9,               # 0.0-1.0
            evidence="bit.ly/xyz123",     # Optional: what triggered this
            location="body"               # Optional: where in email
        )
    ],

    # Metadata (required)
    metadata=AnalysisMetadata(
        analyzer_id="my-analyzer",
        analyzer_version="1.0.0",
        analysis_duration_ms=245,
        model_version="distilbert-v1"    # Optional: ML model version
    ),

    # Optional: raw analysis data
    raw_output={
        "sentiment_scores": {"negative": 0.8, "positive": 0.1},
        "entities_found": 5,
        "custom_data": "anything you want"
    }
)

Indicator Types

Use these standard types for better integration:

from spot_sdk.api_gateway import IndicatorType

IndicatorType.SUSPICIOUS_LINKS          # Malicious/shortened URLs
IndicatorType.DOMAIN_SPOOFING           # Sender domain mismatch
IndicatorType.URGENT_LANGUAGE           # Pressure tactics
IndicatorType.SPELLING_GRAMMAR_ERRORS   # Poor grammar/typos
IndicatorType.SOCIAL_ENGINEERING        # Social engineering tactics
IndicatorType.SUSPICIOUS_ATTACHMENTS    # Dangerous file types
IndicatorType.CONTEXT_MISMATCH          # Content doesn't match sender
IndicatorType.HEADER_ANOMALIES          # Email header issues

Response Formats

Successful Analysis

{
  "is_phishing": true,
  "threat_level": "high",
  "confidence": 0.85,
  "explanation": "Email contains suspicious shortened URLs and urgent language",
  "indicators": [
    {
      "type": "suspicious_links",
      "description": "Found shortened URL: bit.ly/xyz123",
      "severity": "high",
      "confidence": 0.9,
      "evidence": "bit.ly/xyz123",
      "location": "body"
    },
    {
      "type": "urgent_language",
      "description": "Uses urgency tactics: 'act now', 'expires today'",
      "severity": "medium",
      "confidence": 0.8
    }
  ],
  "metadata": {
    "analyzer_id": "my-analyzer",
    "analyzer_version": "1.0.0",
    "analysis_duration_ms": 245,
    "model_version": "v1.2.0"
  },
  "raw_output": {
    "sentiment": {"negative": 0.8, "positive": 0.1, "neutral": 0.1},
    "entities": ["PayPal", "bit.ly"]
  }
}

Error Handling

Always return valid AnalysisResult even on errors:

try:
    # Your analysis logic
    result = perform_analysis(email)
except Exception as e:
    # Return safe result with error info
    result = AnalysisResult(
        is_phishing=False,
        threat_level="safe",
        confidence=0.0,
        explanation=f"Analysis failed: {str(e)}",
        indicators=[],
        metadata=AnalysisMetadata(
            analyzer_id="my-analyzer",
            analyzer_version="1.0.0",
            analysis_duration_ms=0
        ),
        raw_output={"error": str(e), "error_type": type(e).__name__}
    )

Testing Your Analyzer

Local Testing

import pytest
from httpx import ASGITransport, AsyncClient

from src.main import app  # the analyzer app under test

@pytest.mark.asyncio
async def test_health_endpoint():
    """Test health check works."""
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/health")
        assert response.status_code == 200
        assert response.json()["status"] == "healthy"

@pytest.mark.asyncio
async def test_analyze_safe_email():
    """Test analysis of safe email."""
    from spot_sdk.api_gateway import Email, EmailHeader

    email = Email(
        id="test-1",
        headers=EmailHeader(
            subject="Meeting tomorrow",
            sender="colleague@company.com",
            recipients=["me@company.com"]
        ),
        body_text="Let's meet at 2pm."
    )

    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.post(
            "/internal/analyze",
            json=email.model_dump()
        )
        assert response.status_code == 200
        result = response.json()
        assert result["is_phishing"] is False
        assert result["threat_level"] == "safe"

Test with SPOT Platform

Once your analyzer passes local tests, integrate with the full platform:

  1. Register analyzer with platform:

# In spot-platform/.env
SPOT_ANALYZER_URLS='{"my-analyzer":"http://my-analyzer:8000"}'

  2. Add to docker-compose (if running locally):

# docker-compose.yml or docker-compose.override.yml
services:
  my-analyzer:
    build: ../my-analyzer  # Path to your analyzer
    ports:
      - "8003:8000"
    networks:
      - spot-network
    environment:
      - DEBUG=true

  3. Start platform with your analyzer:

# In spot-platform
cd /path/to/spot-platform
make start

# Verify analyzer is registered
curl http://localhost:8001/api/v1/analyzers

  4. Run end-to-end test:

# Submit analysis request
curl -X POST http://localhost:8001/api/v1/analyze \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d @test-email.json

# Check if your analyzer was called
docker compose logs my-analyzer

  5. Run platform integration tests:

# In spot-platform
make test-integration

# Tests verify:
# - Analyzer registration
# - Orchestrator communication
# - Result aggregation
# - Error handling


Deployment

Dockerfile:

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY pyproject.toml poetry.lock ./
RUN pip install poetry && poetry install --no-root --only main

# Copy application
COPY . .

# Run analyzer
CMD ["poetry", "run", "uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]

pyproject.toml:

[tool.poetry]
name = "my-analyzer"
version = "1.0.0"

[tool.poetry.dependencies]
python = "^3.11"
fastapi = "^0.104"
uvicorn = "^0.24"
spot-sdk-python = "^2.0.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

Registration with SPOT Platform

Analyzers are registered via configuration:

# In spot-platform config
analyzers:
  - id: my-analyzer
    name: "My Custom Analyzer"
    url: "http://my-analyzer:8000"
    enabled: true
    priority: 10
    timeout_ms: 30000

Best Practices

Performance

  • Keep analysis < 30 seconds: Orchestrator has timeout
  • Use async operations: All endpoints should be async
  • Cache models: Load ML models once at startup, not per request
  • Batch where possible: Process multiple indicators efficiently
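The "cache models" advice above can be sketched with a once-per-process loader; the `get_model` helper and its return value are illustrative stand-ins for a real model loader:

```python
from functools import lru_cache

# Sketch: load an expensive model exactly once per process instead of
# once per request. The loader body is a hypothetical stand-in.
LOAD_COUNT = 0

@lru_cache(maxsize=1)
def get_model():
    """First call does the expensive load; later calls reuse the result."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": "demo-model", "loaded": True}

# Simulate three incoming requests; the load runs only once.
models = [get_model() for _ in range(3)]
```

The same effect can be achieved by loading the model in a startup hook and storing it on the app, as Example 2 below does.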

Reliability

  • Always return valid response: Never raise unhandled exceptions
  • Implement health checks properly: Check all critical dependencies
  • Log errors with context: Include email ID for debugging
  • Handle timeouts gracefully: Return partial results if needed
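Graceful timeout handling can be sketched with `asyncio.wait_for`; `slow_analysis` is a hypothetical stand-in for real analysis logic, and the tiny 0.05s budget is only for demonstration:

```python
import asyncio

# Sketch: bound analysis time and degrade to a conservative partial
# result instead of raising to the orchestrator.
async def slow_analysis():
    await asyncio.sleep(10)  # pretend this is an expensive model call
    return {"threat_level": "high", "partial": False}

async def analyze_with_timeout(budget_s=0.05):
    try:
        return await asyncio.wait_for(slow_analysis(), timeout=budget_s)
    except asyncio.TimeoutError:
        # Fall back to a safe, clearly-marked partial result
        return {"threat_level": "safe", "partial": True}

result = asyncio.run(analyze_with_timeout())
```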

Security

  • Validate input: Check email object is well-formed
  • Sanitize data: Be careful with user-provided content
  • Limit resource usage: Set memory/CPU limits
  • Don't store emails: Process and discard, respect privacy
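The input-validation advice can be sketched as a pre-analysis check; the size limit and field names mirror the Email schema above but are illustrative, not part of the SPOT contract:

```python
# Sketch: defensive checks before running analysis. MAX_BODY_BYTES is
# an illustrative limit, not a platform requirement.
MAX_BODY_BYTES = 1_000_000

def validate_email_payload(payload):
    """Return a list of problems; an empty list means the payload is OK."""
    problems = []
    if not payload.get("id"):
        problems.append("missing id")
    body = payload.get("body_text") or ""
    if len(body.encode("utf-8")) > MAX_BODY_BYTES:
        problems.append("body_text too large")
    if "\x00" in body:
        problems.append("body_text contains NUL bytes")
    return problems
```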

Indicator Quality

  • Be specific: "Contains bit.ly URL" not "suspicious URL"
  • Provide evidence: Show what triggered the indicator
  • Set appropriate confidence: Be honest about certainty
  • Use standard types: Prefer IndicatorType enums over custom strings
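To make the contrast concrete, here is the same finding expressed vaguely and then specifically; plain dicts are used to mirror the AnalysisIndicator fields shown earlier:

```python
# Sketch: a vague indicator vs. a specific, evidence-backed one.
vague = {
    "type": "suspicious_links",
    "description": "Suspicious URL found",  # no evidence, no location
    "severity": "high",
    "confidence": 1.0,                       # overstated certainty
}

specific = {
    "type": "suspicious_links",
    "description": "Contains shortened URL: bit.ly/xyz123",
    "severity": "high",
    "confidence": 0.9,            # honest, calibrated confidence
    "evidence": "bit.ly/xyz123",  # exactly what triggered the indicator
    "location": "body",
}
```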

Examples

Example 1: Rule-Based Analyzer

from fastapi import FastAPI
from spot_sdk.api_gateway import Email
from spot_sdk.api_gateway import AnalysisResult, AnalysisIndicator, AnalysisMetadata, IndicatorType
import re

app = FastAPI(title="Simple Rule Analyzer")

# Detection rules
SUSPICIOUS_DOMAINS = ["bit.ly", "tinyurl.com", "t.co"]
URGENT_KEYWORDS = ["urgent", "immediate", "expires", "act now", "verify"]

@app.post("/internal/analyze")
async def analyze(email: Email) -> AnalysisResult:
    """Analyze using simple rules."""
    indicators = []
    score = 0.0

    # Get text
    subject = email.headers.subject if email.headers else ""
    body = email.body_text or ""
    full_text = f"{subject} {body}".lower()

    # Check for shortened URLs
    for domain in SUSPICIOUS_DOMAINS:
        if domain in full_text:
            indicators.append(AnalysisIndicator(
                type=IndicatorType.SUSPICIOUS_LINKS,
                description=f"Contains shortened URL: {domain}",
                severity="high",
                confidence=0.9
            ))
            score += 0.4

    # Check for urgent language
    urgent_count = sum(1 for keyword in URGENT_KEYWORDS if keyword in full_text)
    if urgent_count >= 2:
        indicators.append(AnalysisIndicator(
            type=IndicatorType.URGENT_LANGUAGE,
            description=f"Contains {urgent_count} urgency keywords",
            severity="medium",
            confidence=0.7
        ))
        score += 0.3

    # Determine result
    is_phishing = score >= 0.5
    threat_level = "high" if score >= 0.7 else "medium" if score >= 0.5 else "low" if score >= 0.2 else "safe"

    return AnalysisResult(
        is_phishing=is_phishing,
        threat_level=threat_level,
        confidence=min(score, 1.0),
        explanation=f"Found {len(indicators)} indicators with score {score:.2f}",
        indicators=indicators,
        metadata=AnalysisMetadata(
            analyzer_id="simple-rules",
            analyzer_version="1.0.0",
            analysis_duration_ms=50
        )
    )

Example 2: ML-Based Analyzer

from fastapi import FastAPI
from transformers import pipeline
from spot_sdk.api_gateway import Email
from spot_sdk.api_gateway import AnalysisResult, AnalysisIndicator, AnalysisMetadata, IndicatorType
from datetime import datetime

app = FastAPI(title="ML Sentiment Analyzer")

# Load model once at startup
sentiment_analyzer = None

@app.on_event("startup")
async def load_models():
    """Load ML models at startup."""
    global sentiment_analyzer
    sentiment_analyzer = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

@app.post("/internal/analyze")
async def analyze(email: Email) -> AnalysisResult:
    """Analyze using ML model."""
    start = datetime.utcnow()

    # Extract text
    text = f"{email.headers.subject if email.headers else ''} {email.body_text or ''}"

    # Run sentiment analysis
    result = sentiment_analyzer(text[:512])[0]  # Limit to 512 chars

    indicators = []
    score = 0.0

    # Check for negative sentiment
    if result['label'] == 'NEGATIVE' and result['score'] > 0.8:
        indicators.append(AnalysisIndicator(
            type=IndicatorType.SOCIAL_ENGINEERING,
            description=f"High negative sentiment detected ({result['score']:.2f})",
            severity="medium",
            confidence=result['score']
        ))
        score = result['score'] * 0.5

    is_phishing = score >= 0.4
    threat_level = "medium" if is_phishing else "safe"

    # Calculate duration
    duration_ms = int((datetime.utcnow() - start).total_seconds() * 1000)

    return AnalysisResult(
        is_phishing=is_phishing,
        threat_level=threat_level,
        confidence=score,
        explanation=f"Sentiment analysis: {result['label']} ({result['score']:.2f})",
        indicators=indicators,
        metadata=AnalysisMetadata(
            analyzer_id="ml-sentiment",
            analyzer_version="1.0.0",
            analysis_duration_ms=duration_ms,
            model_version="distilbert-base-uncased-finetuned-sst-2-english"
        ),
        raw_output={"sentiment": result}
    )

Troubleshooting

Common Issues

Q: My analyzer isn't being called

  • Check that the health endpoint returns 200 OK
  • Verify the analyzer is registered in the platform config
  • Check orchestrator logs for connection errors

Q: Timeout errors

  • Reduce analysis time (< 30 seconds)
  • Use async operations
  • Consider returning partial results

Q: Invalid response format

  • Ensure you return an AnalysisResult object
  • Check that all required fields are present
  • Validate with spot-sdk types

Q: Indicators not showing up

  • Use standard IndicatorType enums
  • Provide clear descriptions
  • Set appropriate severity levels


Support

SDK Documentation

  • SDK Documentation: https://spot-project.codeberg.page/documentation/sdk
  • API Reference: https://spot-project.codeberg.page/documentation/sdk/api-specs
  • Python SDK: https://spot-project.codeberg.page/documentation/sdk/python-sdk
  • TypeScript SDK: https://spot-project.codeberg.page/documentation/sdk/typescript-sdk

Platform Documentation

  • Platform Documentation: https://spot-project.codeberg.page/documentation/platform
  • Developer Guide: https://spot-project.codeberg.page/documentation/platform/guides/developer-guide
  • Testing Guide: https://spot-project.codeberg.page/documentation/platform/TESTING
  • Configuration Reference: https://spot-project.codeberg.page/documentation/platform/reference/configuration

Issues and Support

  • SDK Issues: https://codeberg.org/SPOT_Project/sdk/-/issues
  • Platform Issues: https://codeberg.org/SPOT_Project/core/-/issues

Changelog

  • 2025-11-03: Initial version
  • Created minimal analyzer guide
  • Added contract specifications
  • Added examples and best practices