2. RabbitMQ for Inter-Service Messaging

Date: 2025-11-04
Status: Accepted
Deciders: Core Team
Related: ADR-001 (Microservices Architecture)

Context

With a microservices architecture, services need reliable asynchronous communication for:

  • Distributing email analysis jobs to multiple analyzers
  • Collecting results from analyzers back to orchestrator
  • Sending status updates and notifications
  • Implementing workflow orchestration patterns
  • Decoupling services to allow independent scaling

Requirements:

  • Reliable message delivery with acknowledgments
  • Message persistence to survive restarts
  • Support for both publish/subscribe and RPC patterns
  • Python async support (asyncio)
  • Battle-tested in production
  • Good monitoring and operational tools
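
The first two requirements (acknowledgments and redelivery) can be illustrated with a broker-free asyncio sketch. This is an in-memory toy, not aio-pika or RabbitMQ behavior; all names are illustrative. A message is only removed for good once the handler succeeds; a failed handler causes the message to be requeued for redelivery:

```python
import asyncio

class InMemoryQueue:
    """Toy queue approximating broker ack/nack semantics (illustrative only)."""
    def __init__(self):
        self._messages = asyncio.Queue()

    async def publish(self, body: str) -> None:
        await self._messages.put(body)

    async def consume(self, handler) -> None:
        body = await self._messages.get()
        try:
            await handler(body)             # handler success acts as the ack
        except Exception:
            await self._messages.put(body)  # failure acts as a nack: requeue

async def main():
    q = InMemoryQueue()
    await q.publish("analyze email-42")
    attempts = []

    async def flaky_handler(body):
        attempts.append(body)
        if len(attempts) == 1:
            raise RuntimeError("transient failure")

    await q.consume(flaky_handler)  # first delivery fails, message requeued
    await q.consume(flaky_handler)  # redelivered and processed successfully
    return attempts

attempts = asyncio.run(main())
print(attempts)
```

With RabbitMQ, the same behavior comes from consumer acknowledgments (`ack`/`nack` with requeue) rather than hand-rolled code.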

Decision

Use RabbitMQ as the message broker with aio-pika client library for Python services.

Rationale

  1. Reliability: RabbitMQ provides message acknowledgments, persistence, and guaranteed delivery
  2. Pattern Support: Natively supports both pub/sub and RPC patterns we need for workflow orchestration
  3. Async Python: aio-pika library provides excellent asyncio support for Python services
  4. Battle-Tested: Proven in production at scale across many organizations
  5. Features: Dead letter queues, message TTL, priority queues, exchanges, routing
  6. Operations: Excellent management UI, monitoring plugins, and operational tools
  7. Community: Large community, extensive documentation, active development
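
The RPC pattern from point 2 pairs each request with its reply using a correlation id and a per-caller reply queue. The sketch below simulates that flow with plain asyncio queues instead of a broker; the function names are illustrative, not an aio-pika API:

```python
import asyncio
import uuid

async def rpc_server(requests: asyncio.Queue) -> None:
    """Consume one request and post the result to the caller's reply queue."""
    corr_id, payload, reply_q = await requests.get()
    await reply_q.put((corr_id, f"analyzed:{payload}"))

async def rpc_call(requests: asyncio.Queue, payload: str) -> str:
    reply_q: asyncio.Queue = asyncio.Queue()  # per-call reply queue
    corr_id = str(uuid.uuid4())               # ties the reply to this request
    await requests.put((corr_id, payload, reply_q))
    got_id, result = await reply_q.get()
    assert got_id == corr_id  # real code would discard mismatched replies
    return result

async def main():
    requests: asyncio.Queue = asyncio.Queue()
    server = asyncio.create_task(rpc_server(requests))
    result = await rpc_call(requests, "email-42")
    await server
    return result

result = asyncio.run(main())
print(result)
```

RabbitMQ supports the same pattern natively via the `reply_to` and `correlation_id` message properties.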

Consequences

Positive

  • Reliable async communication between services
  • Services can be deployed and scaled independently
  • Built-in retry mechanisms via dead letter queues
  • Message persistence survives broker restarts
  • Flexible routing with exchanges and bindings
  • Good debugging with management UI
  • Proven reliability and performance

Negative

  • Additional infrastructure component to deploy and maintain
  • Learning curve for team members unfamiliar with message brokers
  • Potential single point of failure (mitigated with clustering in production)
  • More complex local development setup
  • Network latency compared to in-process calls
  • Need monitoring for queue depths and throughput

Alternatives Considered

Alternative 1: Redis Pub/Sub

  • Pros:
      • Simple to set up and operate
      • Already using Redis for caching
      • Very low latency
      • Familiar to developers
  • Cons:
      • No message persistence (fire-and-forget)
      • No message acknowledgments
      • No built-in retry mechanisms
      • Messages lost if no subscriber connected
      • No dead letter queues
  • Why rejected: Reliability requirements need persistence and acknowledgments

Alternative 2: Apache Kafka

  • Pros:
      • Extremely high throughput
      • Built-in partitioning and replication
      • Message replay capability
      • Event sourcing patterns
  • Cons:
      • Overkill for our message volumes (hundreds/thousands per day, not millions)
      • More complex to operate and configure
      • Heavier resource usage (memory, disk, network)
      • Steeper learning curve
      • RPC pattern more complex to implement
  • Why rejected: Over-engineered for our scale and needs

Alternative 3: Direct HTTP calls

  • Pros:
      • Simple to implement
      • No additional infrastructure needed
      • Familiar to all developers
      • Easy debugging with standard HTTP tools
  • Cons:
      • Tight coupling between services
      • No async processing (blocking request/response calls)
      • Retry logic must be hand-rolled in every caller
      • No built-in load balancing
      • Risk of cascading failures
      • Cannot scale message handling independently
  • Why rejected: Doesn't provide the async processing and decoupling we need

Implementation Notes

  • Use separate exchanges for different message types (analysis, notifications, admin)
  • Implement dead letter queues for failed message handling
  • Set appropriate message TTLs to prevent queue buildup
  • Configure connection pooling in client libraries
  • Use persistent messages for critical analysis jobs
  • Use transient messages for non-critical status updates
  • Implement RPC pattern for request/reply workflows
  • Create message broker abstraction layer for future flexibility
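
The abstraction-layer note can be sketched as a minimal interface plus an in-memory test double. All names here are hypothetical; the production implementation of `MessageBroker` would wrap aio-pika, while the in-memory version serves unit tests and local development:

```python
import asyncio
from abc import ABC, abstractmethod
from collections import defaultdict

class MessageBroker(ABC):
    """Hypothetical abstraction so services never import aio-pika directly."""
    @abstractmethod
    async def publish(self, routing_key: str, body: bytes) -> None: ...
    @abstractmethod
    async def subscribe(self, routing_key: str, handler) -> None: ...

class InMemoryBroker(MessageBroker):
    """Test double: delivers messages synchronously to local handlers."""
    def __init__(self):
        self._handlers = defaultdict(list)

    async def subscribe(self, routing_key: str, handler) -> None:
        self._handlers[routing_key].append(handler)

    async def publish(self, routing_key: str, body: bytes) -> None:
        for handler in self._handlers[routing_key]:
            await handler(body)

async def main():
    broker = InMemoryBroker()
    seen = []

    async def on_result(body: bytes):
        seen.append(body)

    await broker.subscribe("analyze.result", on_result)
    await broker.publish("analyze.result", b"spam-score=0.93")
    return seen

seen = asyncio.run(main())
print(seen)
```

Swapping brokers later (or running tests without RabbitMQ) then only requires a new `MessageBroker` implementation.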

Architecture:

spot.analysis (exchange)
  ├─> analyze.request (routing key) → analyzer-orchestrator queue
  └─> analyze.result (routing key) → result-processor queue

spot.notifications (exchange)
  └─> notify.* (routing keys) → notification queues
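
The `notify.*` binding above uses AMQP topic-exchange semantics, where `*` matches exactly one dot-separated word (`#`, which matches zero or more words, is omitted here for brevity). A small matcher illustrating the `*` rule, not the broker's actual implementation:

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP topic rule sketch: '*' matches exactly one dot-separated word."""
    p_words = pattern.split(".")
    k_words = routing_key.split(".")
    if len(p_words) != len(k_words):
        return False
    return all(p == "*" or p == k for p, k in zip(p_words, k_words))

print(topic_matches("notify.*", "notify.email"))        # matches: one word after the dot
print(topic_matches("notify.*", "notify.email.sent"))   # no match: '*' is one word only
print(topic_matches("analyze.request", "analyze.request"))
```

This is why `notify.email` and `notify.admin` both land in notification queues, while a deeper key such as `notify.email.sent` would need its own binding (or a `#` pattern).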
