Context Providers Guide¶
Context providers ingest organisational data (employee directory, threat intelligence, wiki pages, policies, ...) into the SPOT Knowledge Store. They run as standalone HTTP services. They do not see emails and do not return verdicts; they sync documents on a schedule, and analyzers retrieve those documents on demand by tag + semantic similarity.
When to write a context provider¶
Use a context provider when:
- You have an external source of organisational facts (LDAP, HRIS, threat feed, wiki, ticketing system) from which analyzers can benefit.
- The data refreshes on a schedule, not per-email.
- Analyzers can find what they need by tag expression + free-text query.
HTTP contract¶
A context provider exposes one platform-facing endpoint plus the usual plugin endpoints.
POST /internal/sync¶
Called by the api-gateway scheduler on the cadence declared in spot.yaml, and on manual admin trigger. The provider:
- Fetches the latest snapshot from its source system.
- Builds `KnowledgeDocument` objects (one per record, or one per chunk for long text).
- Calls `await KnowledgeClient.bulk_upsert(docs)`.
- Returns a JSON summary like `{"upserted": <int>}`.
The endpoint is protected by the `X-Internal-API-Key` header (same key used elsewhere; `SPOT_INTERNAL_API_KEY` env var, injected by the installer).
GET /health¶
Returns `{"status": "ok", "service": "<name>", "version": "<v>"}`.
GET /settings/schema¶
Returns the JSON Schema for the provider's user-editable settings — must match the spot.plugin.settings Docker label so the dashboard can render an edit form.
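As an illustration, a schema for a provider configured with the two settings used in the registration example further down might look like this (the `enum` values and `description` texts are illustrative, not part of the contract):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "SOURCE_BACKEND": {
      "type": "string",
      "enum": ["ldap"],
      "description": "Where employee records are pulled from"
    },
    "LDAP_URL": {
      "type": "string",
      "description": "LDAP server URL, e.g. ldap://ldap.example.com"
    }
  },
  "required": ["SOURCE_BACKEND"]
}
```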
Implementing a provider (Python / FastAPI)¶
```python
import os

from fastapi import FastAPI, HTTPException
from spot_sdk import KnowledgeClient, KnowledgeDocument, KnowledgeTag

app = FastAPI()


@app.post("/internal/sync")
async def sync() -> dict[str, int]:
    employees = await fetch_from_ldap()
    client = KnowledgeClient(
        url=os.environ["SPOT_KNOWLEDGE_URL"],
        api_key=os.environ["SPOT_INTERNAL_API_KEY"],
    )
    docs = [
        KnowledgeDocument(
            id=f"employee:{e.email}",
            content=f"{e.name}, {e.title}, {e.department}. Email: {e.email}.",
            tags=[
                KnowledgeTag.EMPLOYEE,
                *([KnowledgeTag.EXECUTIVE] if e.is_executive else []),
                e.department.lower(),
            ],
            metadata={"email": e.email, "title": e.title},
            source="provider-employee-dir",
        )
        for e in employees
    ]
    await client.bulk_upsert(docs)
    return {"upserted": len(docs)}


@app.get("/health")
async def health() -> dict[str, str]:
    return {"status": "ok", "service": "provider-employee-dir", "version": "0.1.0"}
```
For long documents (wiki pages, policy PDFs, ...), split with `spot_sdk.chunk_text(text, max_chars=2000, overlap=200)` and upsert each chunk with `metadata["parent_id"]` pointing back at the source; embeddings are better per-chunk and retrieval is finer-grained.
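If you are not using the SDK helper, the chunking contract can be approximated in a few lines. The sketch below is a minimal stand-in for `chunk_text`, assuming a fixed character window with overlap (the SDK's exact boundary handling may differ), followed by per-chunk documents that carry `parent_id` back to the source page; plain dicts stand in for `KnowledgeDocument`:

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Minimal stand-in for spot_sdk.chunk_text: fixed windows with overlap."""
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap
    return chunks


# Upsert each chunk with a pointer back to the source page.
page_id, page_text = "wiki:runbooks/phishing", "..." * 3000
docs = [
    {
        "id": f"{page_id}#chunk{i}",
        "content": chunk,
        "metadata": {"parent_id": page_id},
    }
    for i, chunk in enumerate(chunk_text(page_text))
]
```

Keeping the chunk id derived from the parent id makes re-syncs idempotent: a re-ingested page overwrites its own chunks instead of accumulating duplicates.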
provider-employee-dir/ is the reference implementation.
Tags and the shared vocabulary¶
Tags are the only categorisation axis the Knowledge Store understands. Use the shared constants from `spot_sdk.knowledge_tags` (`EMPLOYEE`, `EXECUTIVE`, `WIKI_PAGE`, `POLICY`, `FINANCE`, ...) so analyzers and providers stay aligned. Custom tags like `acme:jira_issue` are fine where the shared vocabulary doesn't fit.
Tag expressions (analyzer-side) support `+` (AND) and `|` (OR), e.g. `employee+executive|director`. Tokens must match `[a-z0-9][a-z0-9_\-:]*`. Full grammar in KNOWLEDGE-STORE.md.
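The expression semantics can be sketched as an OR over AND-groups. The matcher below is illustrative only, and assumes `+` binds tighter than `|`, so `employee+executive|director` reads as "(employee AND executive) OR director"; consult KNOWLEDGE-STORE.md for the authoritative grammar:

```python
import re

_TOKEN = re.compile(r"[a-z0-9][a-z0-9_\-:]*\Z")


def matches(expr: str, tags: set[str]) -> bool:
    """Evaluate a tag expression against a document's tag set (sketch)."""
    for group in expr.split("|"):      # '|' = OR across groups
        tokens = group.split("+")      # '+' = AND within a group
        if not all(_TOKEN.match(t) for t in tokens):
            raise ValueError(f"invalid token in {group!r}")
        if all(t in tags for t in tokens):
            return True
    return False
```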
Registering a provider¶
Context providers live under plugins.context_providers in spot.yaml:
```yaml
plugins:
  context_providers:
    employee-dir:
      enabled: true
      url: http://provider-employee-dir:8000
      sync_schedule: "0 */6 * * *"  # every 6 hours
      sync_timeout_ms: 120000
      settings:
        SOURCE_BACKEND: ldap
        LDAP_URL: ldap://ldap.example.com
```
| Field | Description |
|---|---|
| `enabled` | Whether the scheduler runs syncs for this provider |
| `url` | Base URL the orchestrator and scheduler call |
| `sync_schedule` | Cron expression (or empty for manual-only) |
| `sync_timeout_ms` | Max time for a single `/internal/sync` call |
| `settings` | Provider-specific config (see `/settings/schema`) |
Supported cron syntax: fixed values, `*`, comma lists (`0,15,30,45`), ranges (`3-6`), steps (`*/10`), and `@hourly` / `@daily` / `@weekly` / `@monthly` aliases. Absent or blank `sync_schedule` means "manual-only"; the scheduler skips the provider but the manual trigger still works.
You can also manage providers via the API or the Plugins page in the web dashboard, which writes to the same spot.yaml:
| Method | Path | Action |
|---|---|---|
| GET | /api/v1/config/plugin/context_provider | List IDs |
| GET | /api/v1/config/plugin/context_provider/{id} | Get one |
| POST | /api/v1/config/plugin/context_provider | Create |
| PUT | /api/v1/config/plugin/context_provider/{id} | Update |
| DELETE | /api/v1/config/plugin/context_provider/{id} | Delete |
| POST | /api/v1/config/plugin/context_provider/{id}/enable | Enable |
| POST | /api/v1/config/plugin/context_provider/{id}/disable | Disable |
The same URL shape with kind=analyzer and kind=mail_retriever covers the other two plugin kinds.
Triggering a sync manually¶
Admin-only endpoints; surfaced in the dashboard as a "Sync now" button on the provider's detail page:
```
POST /api/v1/plugins/context_provider/{id}/sync   # run once now
GET  /api/v1/plugins/context_provider/{id}/sync   # last-run state
```
The status payload includes the last run's outcome (success | failure), the document count, the next scheduled fire time, and any cron-validation errors.
Failure handling¶
- Sync timeout: the scheduler aborts after `sync_timeout_ms`, records `failure` for that run, and tries again at the next scheduled fire time.
- Sync error: any non-2xx response (or unhandled exception) is recorded; previously-ingested documents stay in the store so analyzers continue to retrieve the last good snapshot.
- Knowledge Store unreachable: `KnowledgeClient.bulk_upsert` raises; return `502 Bad Gateway` from `/internal/sync` so the scheduler records a clean failure.
- Embedding backend down (Ollama): the Knowledge Store returns `503` to upsert calls; treat the same as an unreachable store.
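The last two cases can share one code path. A sketch of that mapping, with hypothetical exception names standing in for whatever `KnowledgeClient.bulk_upsert` actually raises:

```python
class KnowledgeStoreUnreachable(Exception):
    """Hypothetical: bulk_upsert could not reach the Knowledge Store."""


class EmbeddingBackendDown(Exception):
    """Hypothetical: the store returned 503 because Ollama is down."""


def sync_response(run_upsert) -> tuple[int, dict]:
    """Map upsert outcomes to the status /internal/sync should return."""
    try:
        count = run_upsert()
    except (KnowledgeStoreUnreachable, EmbeddingBackendDown):
        # Both cases surface as 502 so the scheduler records a clean failure;
        # the last good snapshot remains retrievable by analyzers.
        return 502, {"error": "knowledge store unavailable"}
    return 200, {"upserted": count}
```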
The dashboard's Knowledge readiness banner aggregates these signals so operators see a single status line.
Packaging¶
Context providers follow the same packaging pattern as the other plugin kinds:
- Package as a Docker image exposing port `8000`.
- Implement `POST /internal/sync`, `GET /health`, `GET /settings/schema`.
- Add the standard OCI labels:
  - `spot.plugin.kind=context_provider`
  - `spot.plugin.id=<unique-id>`
  - `spot.plugin.settings=<JSON Schema matching /settings/schema>`
- Deploy alongside the SPOT platform (docker-compose, Kubernetes, ...).
- Register via `spot.yaml` or the dashboard.
The `SPOT_KNOWLEDGE_URL` and `SPOT_INTERNAL_API_KEY` env vars are injected automatically when the installer-managed lifecycle starts the container; for manual deployments, set both yourself.
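For a manual docker-compose deployment, that means wiring both variables yourself. A sketch, in which the image name and the knowledge-store service name are placeholders for your own deployment:

```yaml
services:
  provider-employee-dir:
    image: acme/provider-employee-dir:latest
    ports:
      - "8000:8000"
    environment:
      SPOT_KNOWLEDGE_URL: http://knowledge-store:8000
      SPOT_INTERNAL_API_KEY: ${SPOT_INTERNAL_API_KEY}
```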
Providers can be written in any language as long as they honour the HTTP contract above and the Knowledge Store HTTP API (POST /bulk-upsert, see KNOWLEDGE-STORE.md). The Python SDK is provided as a convenience wrapper.
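For non-Python providers, a bulk upsert is a single authenticated request. The request below is a sketch; the payload field names mirror the SDK's `KnowledgeDocument` fields and are an assumption, so check KNOWLEDGE-STORE.md for the exact wire format:

```
POST /bulk-upsert HTTP/1.1
Host: knowledge-store:8000
X-Internal-API-Key: <SPOT_INTERNAL_API_KEY>
Content-Type: application/json

{"documents": [{"id": "employee:ada@example.com", "content": "...",
                "tags": ["employee"], "source": "provider-employee-dir"}]}
```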