4. Separate Analyzer Repositories

Date: 2025-11-04
Status: Accepted
Deciders: Core Team
Related: ADR-001 (Microservices Architecture), ADR-005 (spot-sdk Package)

Context

Analyzers are specialized services that use different technologies:

spot-analyzer-nlp: Uses DistilBERT and NER models (Python ML libraries)
spot-analyzer-llm: Uses Ollama for LLM-based analysis (large model downloads)
spot-analyzer-context: Rule-based analysis (lightweight logic)

Each analyzer has:

Different dependencies and ML frameworks
Different resource requirements (CPU, memory, GPU)
Different development teams and expertise
Different release cycles
Different testing requirements

The question is whether the analyzers should live in a monorepo or in separate repositories.

Decision

Use a separate Git repository for each analyzer, alongside dedicated repositories for the platform and the shared SDK:

spot-platform/ - Core orchestration platform
spot-sdk/ - Shared contracts and interfaces
spot-analyzer-nlp/ - NLP-based analyzer
spot-analyzer-llm/ - LLM-based analyzer
spot-analyzer-context/ - Rule-based analyzer
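Because the repositories share code only through spot-sdk, each analyzer depends on the SDK's contract rather than on the platform or on other analyzers. The sketch below illustrates that split under stated assumptions: `AnalysisResult`, `Analyzer`, `analyze`, and the keyword rule in `ContextAnalyzer` are hypothetical stand-ins, not the actual spot-sdk API (see ADR-005 and the SDK for the real contracts).

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


# --- Would live in spot-sdk: hypothetical contract shared by all analyzers ---
@dataclass(frozen=True)
class AnalysisResult:
    """Result every analyzer returns, whatever technology it uses internally."""
    analyzer: str
    score: float
    labels: list[str] = field(default_factory=list)


@runtime_checkable
class Analyzer(Protocol):
    """Interface the platform calls; each analyzer repository implements it."""
    name: str

    def analyze(self, text: str) -> AnalysisResult: ...


# --- Would live in spot-analyzer-context: hypothetical rule-based analyzer ---
class ContextAnalyzer:
    name = "context"

    def __init__(self, keywords: set[str]) -> None:
        # Lowercase once so that matching is case-insensitive.
        self.keywords = {k.lower() for k in keywords}

    def analyze(self, text: str) -> AnalysisResult:
        # Tokenize on whitespace and strip common punctuation.
        words = {w.strip(".,:;!?").lower() for w in text.split()}
        hits = sorted(words & self.keywords)
        # Toy rule: score is the fraction of configured keywords found.
        score = len(hits) / len(self.keywords) if self.keywords else 0.0
        return AnalysisResult(analyzer=self.name, score=score, labels=hits)


analyzer = ContextAnalyzer({"urgent", "refund"})
assert isinstance(analyzer, Analyzer)  # structural check against the contract
result = analyzer.analyze("Urgent: please process my refund.")
print(result)
```

The platform only needs `Analyzer` and `AnalysisResult`, so the NLP and LLM analyzers can bring entirely different dependencies behind the same interface without the platform repository knowing about them.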

Rationale

Clear Ownership: Each repository has a single owning team
Independent Releases: Analyzers can be released independently of one another
Focused Dependencies: Each analyzer includes only the dependencies it needs
Repository Size: Repositories stay small and fast to clone
CI/CD Simplicity: Each analyzer has its own pipeline
Team Autonomy: Teams can work without affecting other analyzers
Technology Isolation: The NLP team does not need to understand the LLM code

Consequences

Positive

Smaller repositories are faster to clone and easier to navigate
Independent release cycles and version numbers
Focused CI/CD pipelines (a change to one analyzer runs only that analyzer's pipeline)
Clear ownership and responsibility boundaries
Easier onboarding (new developers only need to learn the repositories relevant to their work)
Each analyzer can use a different CI tool if needed
Dependency conflicts are isolated per analyzer

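Negative

More repositories to maintain
Cross-repo changes require coordination between teams
No atomic commits across analyzers; a cross-cutting change becomes several coordinated commits
CI/CD configuration is duplicated across repositories
Versions (notably of spot-sdk) must be managed across repositories
No single repository to search; code search and refactoring must span several repositories
A process is needed to keep the contracts in sync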

Alternatives Considered

Alternative 1: Monorepo

Pros:
Single place for all code
Atomic commits across services
Easier refactoring across services
Single CI/CD pipeline
Repo-wide search and refactoring tools
Cons:
Large repository that is slow to clone
All dependencies in one place (huge node_modules/venv)
Changes to one analyzer trigger CI for all analyzers
More frequent merge conflicts
Harder to enforce ownership
Mixed ML frameworks in a single repository
Why rejected: The analyzers are too different from one another for the monorepo benefits to outweigh the costs

Alternative 2: Monorepo with build tools (Nx, Turborepo)

Pros:
Monorepo benefits with selective builds
Incremental testing
Dependency graph management
Cons:
Requires additional tooling and learning
Complexity overhead for a small team
Still a large repository
Build tool lock-in
Why rejected: Over-engineered for our team size and structure

Implementation Notes

Repository structure:

GitHub/GitLab Organization: spot-platform
├── spot-platform/          (main platform)
├── spot-sdk/               (shared contracts)
├── spot-analyzer-nlp/      (NLP analyzer)
├── spot-analyzer-llm/      (LLM analyzer)
└── spot-analyzer-context/  (context analyzer)

Coordination mechanisms:

Contracts: The spot-sdk package is versioned and published; analyzers declare it as a dependency
Communication: Proposed contract changes are discussed in spot-platform repository issues
Documentation: Central documentation lives in the spot-platform repository
Testing: Integration tests in spot-platform verify analyzer compatibility
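One way the spot-platform integration tests could catch contract drift early is to compare each analyzer's declared spot-sdk requirement with the SDK version under test. The sketch below assumes a hypothetical policy: spot-sdk follows semantic versioning, and an analyzer requiring `X.Y.Z` works with any SDK of the same major version that is not older. The requirement table is illustrative, not taken from the real repositories.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH'; missing parts default to 0."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    major, minor, patch = parts[:3]
    return major, minor, patch


def is_compatible(required: str, installed: str) -> bool:
    """Assumed semver rule: same major version, installed not older than required."""
    req, inst = parse_version(required), parse_version(installed)
    return inst[0] == req[0] and inst >= req


# Hypothetical minimum spot-sdk versions declared by each analyzer repository.
ANALYZER_SDK_REQUIREMENTS = {
    "spot-analyzer-nlp": "1.2.0",
    "spot-analyzer-llm": "1.4.0",
    "spot-analyzer-context": "1.0.3",
}


def incompatible_analyzers(sdk_version: str) -> list[str]:
    """Return analyzers whose declared requirement the given SDK does not meet."""
    return sorted(
        name
        for name, required in ANALYZER_SDK_REQUIREMENTS.items()
        if not is_compatible(required, sdk_version)
    )


print(incompatible_analyzers("1.3.1"))  # ['spot-analyzer-llm']
print(incompatible_analyzers("2.0.0"))  # a major bump breaks all three
```

The hand-rolled rule keeps the example dependency-free; a real check would more likely read the requirement from each analyzer's pyproject.toml and evaluate it with the `packaging` library's `SpecifierSet`, which handles full PEP 440 specifiers such as `>=1.2,<2`.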

Analyzer repository template:

spot-analyzer-xxx/
├── src/              # Analyzer implementation
├── tests/            # Unit and integration tests
├── Dockerfile        # Container definition
├── pyproject.toml    # Dependencies (references spot-sdk)
└── README.md         # Analyzer-specific docs
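A small check could run in each analyzer's CI, or when a new analyzer repository is created, to confirm the repository still matches this template. This is a sketch: the required paths mirror the template, while `check_analyzer_repo` and the plain-text `spot-sdk` check on pyproject.toml are hypothetical, not an existing tool.

```python
import tempfile
from pathlib import Path

# Paths every analyzer repository is expected to contain (from the template).
REQUIRED_DIRS = ("src", "tests")
REQUIRED_FILES = ("Dockerfile", "pyproject.toml", "README.md")


def check_analyzer_repo(root: Path) -> list[str]:
    """Return a list of problems; an empty list means the repo matches the template."""
    problems = [f"missing directory: {d}/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    problems += [f"missing file: {f}" for f in REQUIRED_FILES if not (root / f).is_file()]

    pyproject = root / "pyproject.toml"
    # Crude text check (assumption): the shared contract package must be referenced.
    if pyproject.is_file() and "spot-sdk" not in pyproject.read_text(encoding="utf-8"):
        problems.append("pyproject.toml does not reference spot-sdk")
    return problems


# Demo on a throwaway, deliberately incomplete repository layout.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "src").mkdir()
    (repo / "pyproject.toml").write_text('dependencies = ["spot-sdk>=1.2,<2"]\n', encoding="utf-8")
    report = check_analyzer_repo(repo)

print(report)
```

Wiring this into a pipeline amounts to failing the job whenever the returned list is non-empty.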

References

Monorepo vs Polyrepo
SPOT Analyzer Development Guide: spot-sdk/docs/ANALYZER_DEVELOPMENT.md
Repository structure example: spot-analyzer-nlp/