Context Providers Guide

Context providers ingest organisational data (employee directory, threat intelligence, wiki pages, policies, ...) into the SPOT Knowledge Store. They run as standalone HTTP services. They do not see emails and do not return verdicts; they sync documents on a schedule, and analyzers retrieve those documents on demand by tag + semantic similarity.

When to write a context provider

Use a context provider when:

  • You have an external source of organisational facts (LDAP, HRIS, threat feed, wiki, ticketing system) that analyzers can benefit from.
  • The data refreshes on a schedule, not per-email.
  • Analyzers can find what they need by tag expression + free-text query.

HTTP contract

A context provider exposes one platform-facing endpoint plus the usual plugin endpoints.

POST /internal/sync

Called by the api-gateway scheduler on the cadence declared in spot.yaml, and on manual admin trigger. The provider:

  1. Fetches the latest snapshot from its source system.
  2. Builds KnowledgeDocument objects (one per record, or one per chunk for long text).
  3. Calls await KnowledgeClient.bulk_upsert(docs).
  4. Returns a JSON summary like {"upserted": <int>}.

The endpoint is protected by the X-Internal-API-Key header (same key used elsewhere; SPOT_INTERNAL_API_KEY env var, injected by the installer).

GET /health

Returns {"status": "ok", "service": "<name>", "version": "<v>"}.

GET /settings/schema

Returns the JSON Schema for the provider's user-editable settings — must match the spot.plugin.settings Docker label so the dashboard can render an edit form.
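As a sketch, a schema for the LDAP-backed settings used later in this guide might look like the following (SOURCE_BACKEND and LDAP_URL are taken from the spot.yaml example below; everything else about the shape is up to the provider):

```json
{
  "type": "object",
  "properties": {
    "SOURCE_BACKEND": {
      "type": "string",
      "enum": ["ldap"],
      "description": "Backend to pull employee records from"
    },
    "LDAP_URL": {
      "type": "string",
      "description": "LDAP server URL, e.g. ldap://ldap.example.com"
    }
  },
  "required": ["SOURCE_BACKEND", "LDAP_URL"]
}
```

The same JSON (serialised to a string) goes into the spot.plugin.settings Docker label so the dashboard form and the endpoint never drift apart.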

Implementing a provider (Python / FastAPI)

import os

from fastapi import FastAPI
from spot_sdk import KnowledgeClient, KnowledgeDocument, KnowledgeTag

app = FastAPI()


@app.post("/internal/sync")
async def sync() -> dict[str, int]:
    employees = await fetch_from_ldap()  # provider-specific fetch helper (not shown)
    client = KnowledgeClient(
        url=os.environ["SPOT_KNOWLEDGE_URL"],
        api_key=os.environ["SPOT_INTERNAL_API_KEY"],
    )
    docs = [
        KnowledgeDocument(
            id=f"employee:{e.email}",
            content=f"{e.name}, {e.title}, {e.department}. Email: {e.email}.",
            tags=[
                KnowledgeTag.EMPLOYEE,
                *([KnowledgeTag.EXECUTIVE] if e.is_executive else []),
                e.department.lower(),
            ],
            metadata={"email": e.email, "title": e.title},
            source="provider-employee-dir",
        )
        for e in employees
    ]
    await client.bulk_upsert(docs)
    return {"upserted": len(docs)}


@app.get("/health")
async def health() -> dict[str, str]:
    return {"status": "ok", "service": "provider-employee-dir"}

For long documents (wiki pages, policy PDFs, ...), split with spot_sdk.chunk_text(text, max_chars=2000, overlap=200) and upsert each chunk with metadata["parent_id"] pointing back at the source — embeddings are better per-chunk and retrieval is finer-grained.
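A minimal sketch of that pattern, with a hand-rolled chunker standing in for spot_sdk.chunk_text (same windowing parameters as the call above; the dict shape below mirrors KnowledgeDocument fields for illustration only):

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    # Fixed-size windows with overlap, so a sentence cut at a boundary
    # still appears whole in one of the two neighbouring chunks.
    step = max_chars - overlap
    return [text[i:i + max_chars] for i in range(0, len(text), step)]


def chunk_docs(page_id: str, page_text: str) -> list[dict]:
    # One document per chunk; metadata["parent_id"] points back at the page
    # so analyzers can regroup hits from the same source.
    return [
        {
            "id": f"wiki:{page_id}#chunk-{n}",
            "content": chunk,
            "metadata": {"parent_id": f"wiki:{page_id}"},
        }
        for n, chunk in enumerate(chunk_text(page_text))
    ]
```

Each chunk gets a deterministic id derived from the parent, so re-syncing the same page upserts in place rather than piling up duplicates.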

provider-employee-dir/ is the reference implementation.

Tags and the shared vocabulary

Tags are the only categorisation axis the Knowledge Store understands. Use the shared constants from spot_sdk.knowledge_tags (EMPLOYEE, EXECUTIVE, WIKI_PAGE, POLICY, FINANCE, ...) so analyzers and providers stay aligned. Custom tags like acme:jira_issue are fine where the shared vocabulary doesn't fit.

Tag expressions (analyzer-side) support + (AND) and | (OR), e.g. employee+executive|director. Tokens must match [a-z0-9][a-z0-9_\-:]*. Full grammar in KNOWLEDGE-STORE.md.
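The token rule is easy to enforce in provider code before upserting. A small sketch (the parse below assumes | binds looser than +, i.e. employee+executive|director reads as (employee AND executive) OR director; KNOWLEDGE-STORE.md is authoritative on the grammar):

```python
import re

# Token grammar from this guide: [a-z0-9][a-z0-9_\-:]*
TOKEN_RE = re.compile(r"[a-z0-9][a-z0-9_\-:]*")


def parse_tag_expr(expr: str) -> list[list[str]]:
    # Split into OR-groups of AND-ed tokens, validating each token
    # against the shared grammar as we go.
    groups = [alt.split("+") for alt in expr.split("|")]
    for tokens in groups:
        for token in tokens:
            if not TOKEN_RE.fullmatch(token):
                raise ValueError(f"invalid tag token: {token!r}")
    return groups
```

Validating at upsert time keeps bad tokens (uppercase, leading punctuation) out of the store, where they would silently never match any analyzer expression.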

Registering a provider

Context providers live under plugins.context_providers in spot.yaml:

plugins:
  context_providers:
    employee-dir:
      enabled: true
      url: http://provider-employee-dir:8000
      sync_schedule: "0 */6 * * *"   # every 6 hours
      sync_timeout_ms: 120000
      settings:
        SOURCE_BACKEND: ldap
        LDAP_URL: ldap://ldap.example.com
Field            Description
enabled          Whether the scheduler runs syncs for this provider
url              Base URL the orchestrator and scheduler call
sync_schedule    Cron expression (or empty for manual-only)
sync_timeout_ms  Max time for a single /internal/sync call
settings         Provider-specific config (see /settings/schema)

Supported cron syntax: fixed values, *, comma lists (0,15,30,45), ranges (3-6), steps (*/10), and @hourly / @daily / @weekly / @monthly aliases. Absent or blank sync_schedule means "manual-only"; the scheduler skips the provider but the manual trigger still works.
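A few sync_schedule values under that grammar (the @-aliases follow the usual cron convention of firing at the start of their period):

```yaml
# Example sync_schedule values
- "0 */6 * * *"    # minute 0 of every 6th hour (as in the example above)
- "0,30 * * * *"   # on the hour and half hour (comma list)
- "0 3-6 * * *"    # 03:00-06:00 daily (range)
- "@daily"         # alias: once a day
- ""               # manual-only: scheduler skips, "Sync now" still works
```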

You can also manage providers via the API or the Plugins page in the web dashboard, which writes to the same spot.yaml:

Method  Path                                                  Action
GET     /api/v1/config/plugin/context_provider                List IDs
GET     /api/v1/config/plugin/context_provider/{id}           Get one
POST    /api/v1/config/plugin/context_provider                Create
PUT     /api/v1/config/plugin/context_provider/{id}           Update
DELETE  /api/v1/config/plugin/context_provider/{id}           Delete
POST    /api/v1/config/plugin/context_provider/{id}/enable    Enable
POST    /api/v1/config/plugin/context_provider/{id}/disable   Disable

The same URL shape with kind=analyzer and kind=mail_retriever covers the other two plugin kinds.

Triggering a sync manually

Admin-only endpoints; surfaced in the dashboard as a "Sync now" button on the provider's detail page:

POST /api/v1/plugins/context_provider/{id}/sync   # run once now
GET  /api/v1/plugins/context_provider/{id}/sync   # last-run state

The status payload includes the last run's outcome (success | failure), the document count, the next scheduled fire time, and any cron-validation errors.
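An illustrative status payload (the field names here are assumptions for illustration; the guide guarantees only the four pieces of information listed above):

```json
{
  "outcome": "success",
  "documents": 1882,
  "next_fire_time": "2025-06-01T06:00:00Z",
  "cron_errors": []
}
```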

Failure handling

  • Sync timeout: the scheduler aborts after sync_timeout_ms, records failure for that run, and tries again at the next scheduled fire time.
  • Sync error: any non-2xx response (or unhandled exception) is recorded; previously-ingested documents stay in the store so analyzers continue to retrieve the last good snapshot.
  • Knowledge Store unreachable: KnowledgeClient.bulk_upsert raises; return 502 Bad Gateway from /internal/sync so the scheduler records a clean failure.
  • Embedding backend down (Ollama): the Knowledge Store returns 503 to upsert calls; treat the same as an unreachable store.
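The timeout and failure semantics above can be sketched scheduler-side with a plain asyncio wrapper (illustrative only; run_sync and the outcome dict shape are not part of the platform API):

```python
import asyncio
from typing import Awaitable, Callable


async def run_sync(call_sync: Callable[[], Awaitable[dict]], timeout_ms: int) -> dict:
    # Run one /internal/sync call; on timeout or error, record a failure
    # for this run only. Previously ingested documents are untouched, so
    # analyzers keep retrieving the last good snapshot.
    try:
        result = await asyncio.wait_for(call_sync(), timeout=timeout_ms / 1000)
        return {"outcome": "success", **result}
    except asyncio.TimeoutError:
        return {"outcome": "failure", "reason": "timeout"}
    except Exception as exc:  # non-2xx response, unreachable store, 503 embeddings
        return {"outcome": "failure", "reason": str(exc)}
```

The next scheduled fire time is unaffected by a failure: the scheduler simply tries again at the next cron tick.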

The dashboard's Knowledge readiness banner aggregates these signals so operators see a single status line.

Packaging

Context providers follow the same packaging pattern as the other plugin kinds:

  1. Package as a Docker image exposing port 8000.
  2. Implement POST /internal/sync, GET /health, GET /settings/schema.
  3. Add the standard OCI labels:
       • spot.plugin.kind=context_provider
       • spot.plugin.id=<unique-id>
       • spot.plugin.settings=<JSON Schema matching /settings/schema>
  4. Deploy alongside the SPOT platform (docker-compose, Kubernetes, ...).
  5. Register via spot.yaml or the dashboard.

The SPOT_KNOWLEDGE_URL and SPOT_INTERNAL_API_KEY env vars are injected automatically when the installer-managed lifecycle starts the container; for manual deployments, set both yourself.
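A minimal docker-compose sketch for such a manual deployment (the image name, Knowledge Store service name, and schema value are illustrative; the labels mirror the OCI labels from step 3):

```yaml
services:
  provider-employee-dir:
    image: example/provider-employee-dir:latest   # illustrative image name
    labels:
      spot.plugin.kind: context_provider
      spot.plugin.id: employee-dir
      spot.plugin.settings: '{"type": "object", "properties": {"LDAP_URL": {"type": "string"}}}'
    environment:
      SPOT_KNOWLEDGE_URL: http://knowledge-store:8000   # assumed service name/port
      SPOT_INTERNAL_API_KEY: ${SPOT_INTERNAL_API_KEY}
```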

Providers can be written in any language as long as they honour the HTTP contract above and the Knowledge Store HTTP API (POST /bulk-upsert, see KNOWLEDGE-STORE.md). The Python SDK is provided as a convenience wrapper.