2. RabbitMQ for Inter-Service Messaging
Date: 2025-11-04
Status: Accepted
Deciders: Core Team
Related: ADR-001 (Microservices Architecture)
Context
With a microservices architecture (ADR-001), services need reliable asynchronous communication for:
- Distributing email analysis jobs to multiple analyzers
- Collecting results from analyzers back to orchestrator
- Sending status updates and notifications
- Implementing workflow orchestration patterns
- Decoupling services to allow independent scaling
Requirements:
- Reliable message delivery with acknowledgments
- Message persistence to survive restarts
- Support for both publish/subscribe and RPC patterns
- Python async support (asyncio)
- Battle-tested in production
- Good monitoring and operational tools
Decision
Use RabbitMQ as the message broker, with the aio-pika client library for Python services.
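As a rough illustration of what this decision means for a Python service, the sketch below publishes one analysis job through aio-pika. The exchange name and routing key come from the Architecture section of this ADR; the topic exchange type, the connection URL, the `email_id` payload field, and the function names are assumptions made for the example, not the project's actual code.

```python
import json


def encode_job(email_id: str) -> bytes:
    """Serialize a hypothetical analysis job payload as UTF-8 JSON."""
    return json.dumps({"email_id": email_id}).encode("utf-8")


async def publish_job(
    email_id: str,
    amqp_url: str = "amqp://guest:guest@localhost/",  # assumed local broker
) -> None:
    """Publish one persistent analysis job to the spot.analysis exchange."""
    # Imported lazily so this module still loads where aio-pika is absent.
    import aio_pika

    connection = await aio_pika.connect_robust(amqp_url)
    async with connection:
        channel = await connection.channel()
        # Assumption: a durable topic exchange; the ADR does not state the type.
        exchange = await channel.declare_exchange(
            "spot.analysis", aio_pika.ExchangeType.TOPIC, durable=True
        )
        message = aio_pika.Message(
            body=encode_job(email_id),
            content_type="application/json",
            # Persistent delivery, matching the "critical analysis jobs" note.
            delivery_mode=aio_pika.DeliveryMode.PERSISTENT,
        )
        await exchange.publish(message, routing_key="analyze.request")
```

With a broker running locally, `asyncio.run(publish_job("example-123"))` would send a single job.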
Rationale
- Reliability: RabbitMQ provides consumer acknowledgments, publisher confirms, and durable queues with persistent messages, which together give at-least-once delivery
- Pattern Support: Supports both the pub/sub and RPC (request/reply) patterns we need for workflow orchestration
- Async Python: The aio-pika library is built on asyncio, so it fits our async Python services
- Battle-Tested: Proven in production at scale across many organizations
- Features: Dead letter queues, message TTL, priority queues, exchanges, routing
- Operations: Excellent management UI, monitoring plugins, and operational tools
- Community: Large community, extensive documentation, active development
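To make the RPC point concrete: request/reply over a broker is usually built by tagging each request with a correlation ID and a reply queue, then matching replies back to waiting callers. The sketch below shows only the caller-side bookkeeping, with no broker involved; the `PendingReplies` class and its method names are hypothetical. aio-pika also provides a higher-level helper in `aio_pika.patterns`, which may be preferable in practice.

```python
import asyncio
import uuid


class PendingReplies:
    """Track in-flight RPC requests by correlation ID (illustrative only)."""

    def __init__(self) -> None:
        self._futures: dict[str, asyncio.Future] = {}

    def register(self) -> tuple[str, asyncio.Future]:
        """Create a correlation ID and a future the caller can await."""
        correlation_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self._futures[correlation_id] = future
        return correlation_id, future

    def resolve(self, correlation_id: str, body: bytes) -> bool:
        """Complete the matching future; return False for unknown replies."""
        future = self._futures.pop(correlation_id, None)
        if future is None or future.done():
            return False
        future.set_result(body)
        return True
```

The caller would publish the request with the returned correlation ID and a `reply_to` queue, then await the future; the consumer on the reply queue calls `resolve` for each incoming reply.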
Consequences
Positive
- Reliable async communication between services
- Services can be deployed and scaled independently
- Retry and failure handling can be built on dead letter exchanges and queues
- Persistent messages in durable queues survive broker restarts
- Flexible routing with exchanges and bindings
- Good debugging with management UI
- Proven reliability and performance
Negative
- Additional infrastructure component to deploy and maintain
- Learning curve for team members unfamiliar with message brokers
- Potential single point of failure (mitigated with clustering in production)
- More complex local development setup
- Network latency compared to in-process calls
- Need monitoring for queue depths and throughput
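One way to approach the queue-depth monitoring mentioned above is to poll the management plugin's HTTP API, whose `/api/queues` endpoint returns one JSON object per queue including a `messages` count. The following is a minimal sketch under that assumption; the function names, the threshold, and the base URL are hypothetical.

```python
import base64
import json
import urllib.request


def deep_queues(queues: list[dict], max_messages: int) -> list[str]:
    """Return names of queues whose message count exceeds max_messages."""
    return [q["name"] for q in queues if q.get("messages", 0) > max_messages]


def fetch_queues(base_url: str, user: str, password: str) -> list[dict]:
    """Fetch per-queue stats from the management API using HTTP basic auth."""
    request = urllib.request.Request(f"{base_url}/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

For example, `deep_queues(fetch_queues("http://localhost:15672", "guest", "guest"), 1000)` would list queues holding more than 1000 messages on a default local install.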
Alternatives Considered
Alternative 1: Redis Pub/Sub
- Pros:
  - Simple to set up and operate
  - Already using Redis for caching
  - Very low latency
  - Familiar to developers
- Cons:
  - No message persistence (fire-and-forget)
  - No message acknowledgments
  - No built-in retry mechanisms
  - Messages lost if no subscriber connected
  - No dead letter queues
- Why rejected: Reliability requirements need persistence and acknowledgments
Alternative 2: Apache Kafka
- Pros:
  - Extremely high throughput
  - Built-in partitioning and replication
  - Message replay capability
  - Event sourcing patterns
- Cons:
  - Overkill for our message volumes (hundreds to thousands per day, not millions)
  - More complex to operate and configure
  - Heavier resource usage (memory, disk, network)
  - Steeper learning curve
  - RPC pattern more complex to implement
- Why rejected: Over-engineered for our scale and needs
Alternative 3: Direct HTTP calls
- Pros:
  - Simple to implement
  - No additional infrastructure needed
  - Familiar to all developers
  - Easy debugging with browser tools
- Cons:
  - Tight coupling between services
  - No async processing (the caller blocks until each call returns)
  - Difficult to implement retry logic
  - No built-in load balancing
  - Cascading failures possible
  - Cannot scale message handling independently
- Why rejected: Does not support the async patterns and decoupling we need
Implementation Notes
- Use separate exchanges for different message types (analysis, notifications, admin)
- Implement dead letter queues for failed message handling
- Set appropriate message TTLs to prevent queue buildup
- Configure connection pooling in client libraries
- Use persistent messages for critical analysis jobs
- Use transient messages for non-critical status updates
- Implement RPC pattern for request/reply workflows
- Create message broker abstraction layer for future flexibility
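The dead letter and TTL notes above could be wired up roughly as follows. RabbitMQ reads `x-dead-letter-exchange` and `x-message-ttl` from a queue's optional arguments at declaration time; everything else here (the `spot.analysis.dlx` exchange, the `.dead` queue suffix, the 60-second TTL, the function names, and the topic exchange type) is an assumption for illustration.

```python
def work_queue_arguments(dead_letter_exchange: str, ttl_ms: int) -> dict:
    """Build queue arguments for dead-lettering and a per-queue message TTL."""
    return {
        "x-dead-letter-exchange": dead_letter_exchange,
        "x-message-ttl": ttl_ms,  # milliseconds
    }


async def declare_work_queue(channel, name: str, routing_key: str):
    """Declare a durable work queue plus a dead letter queue behind it."""
    # Imported lazily so this module still loads where aio-pika is absent.
    import aio_pika

    # Hypothetical dead letter exchange and queue names.
    dlx = await channel.declare_exchange(
        "spot.analysis.dlx", aio_pika.ExchangeType.FANOUT, durable=True
    )
    dead_queue = await channel.declare_queue(f"{name}.dead", durable=True)
    await dead_queue.bind(dlx)

    exchange = await channel.declare_exchange(
        "spot.analysis", aio_pika.ExchangeType.TOPIC, durable=True
    )
    queue = await channel.declare_queue(
        name,
        durable=True,
        # Rejected or expired messages are re-routed to the DLX by the broker.
        arguments=work_queue_arguments(dlx.name, 60_000),
    )
    await queue.bind(exchange, routing_key=routing_key)
    return queue
```

A consumer that rejects a message without requeueing it, or lets it sit past the TTL, would then find that message in the `.dead` queue for inspection or a delayed retry.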
Architecture:
spot.analysis (exchange)
├─> analyze.request (routing key) → analyzer-orchestrator queue
└─> analyze.result (routing key) → result-processor queue
spot.notifications (exchange)
└─> notify.* (routing keys) → notification queues
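The routing in this diagram can be checked with a small broker-free model. The sketch assumes both exchanges are topic exchanges (the `notify.*` wildcard suggests this, but the ADR does not say so), where `*` matches exactly one dot-separated word and `#` matches zero or more. The `notifications` queue name is hypothetical, since the diagram leaves the notification queues unnamed.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP topic matching: '*' is exactly one word, '#' is zero or more."""

    def match(p: list[str], k: list[str]) -> bool:
        if not p:
            return not k
        if p[0] == "#":
            # '#' either absorbs nothing, or absorbs one word and stays active.
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return p[0] in ("*", k[0]) and match(p[1:], k[1:])

    return match(pattern.split("."), routing_key.split("."))


# Bindings from the diagram: exchange -> [(binding pattern, queue name)].
BINDINGS = {
    "spot.analysis": [
        ("analyze.request", "analyzer-orchestrator"),
        ("analyze.result", "result-processor"),
    ],
    "spot.notifications": [("notify.*", "notifications")],  # hypothetical name
}


def route(exchange: str, routing_key: str) -> list[str]:
    """Return the queues that would receive a message with this routing key."""
    return [
        queue
        for pattern, queue in BINDINGS.get(exchange, [])
        if topic_matches(pattern, routing_key)
    ]
```

Under these assumptions, a key such as `notify.email` reaches the notification queue, while a three-word key such as `notify.email.sent` does not, because `*` covers only one word; a `notify.#` binding would be needed for that.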
References
- RabbitMQ Documentation
- aio-pika GitHub
- RabbitMQ Patterns
- Implementation: spot-platform/shared/messaging/