2. RabbitMQ for Inter-Service Messaging

Date: 2025-11-04
Status: Accepted
Deciders: Core Team
Related: ADR-001 (Microservices Architecture)

Context

With a microservices architecture, services need reliable asynchronous communication for:

  • Distributing email analysis jobs to multiple analyzers
  • Collecting results from analyzers back to orchestrator
  • Sending status updates and notifications
  • Implementing workflow orchestration patterns
  • Decoupling services to allow independent scaling

Requirements:

  • Reliable message delivery with acknowledgments
  • Message persistence to survive restarts
  • Support for both publish/subscribe and RPC patterns
  • Python async support (asyncio)
  • Battle-tested in production
  • Good monitoring and operational tools
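
The first two requirements (acknowledgments and redelivery) can be illustrated with a broker-free asyncio sketch. This is an in-memory toy, not aio-pika or RabbitMQ behavior; all names are illustrative. A message is only removed for good once the handler succeeds; a failed handler causes the message to be requeued for redelivery:

```python
import asyncio

class InMemoryQueue:
    """Toy queue approximating broker ack/nack semantics (illustrative only)."""
    def __init__(self):
        self._messages = asyncio.Queue()

    async def publish(self, body: str) -> None:
        await self._messages.put(body)

    async def consume(self, handler) -> None:
        body = await self._messages.get()
        try:
            await handler(body)             # handler success acts as the ack
        except Exception:
            await self._messages.put(body)  # failure acts as a nack: requeue

async def main():
    q = InMemoryQueue()
    await q.publish("analyze email-42")
    attempts = []

    async def flaky_handler(body):
        attempts.append(body)
        if len(attempts) == 1:
            raise RuntimeError("transient failure")

    await q.consume(flaky_handler)  # first delivery fails, message requeued
    await q.consume(flaky_handler)  # redelivered and processed successfully
    return attempts

attempts = asyncio.run(main())
print(attempts)
```

With RabbitMQ, the same behavior comes from consumer acknowledgments (`ack`/`nack` with requeue) rather than hand-rolled code.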

Decision

Use RabbitMQ as the message broker with aio-pika client library for Python services.

Rationale

  1. Reliability: RabbitMQ provides message acknowledgments, persistence, and guaranteed delivery
  2. Pattern Support: Natively supports both pub/sub and RPC patterns we need for workflow orchestration
  3. Async Python: aio-pika library provides excellent asyncio support for Python services
  4. Battle-Tested: Proven in production at scale across many organizations
  5. Features: Dead letter queues, message TTL, priority queues, exchanges, routing
  6. Operations: Excellent management UI, monitoring plugins, and operational tools
  7. Community: Large community, extensive documentation, active development
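
The RPC pattern from point 2 pairs each request with its reply using a correlation id and a per-caller reply queue. The sketch below simulates that flow with plain asyncio queues instead of a broker; the function names are illustrative, not an aio-pika API:

```python
import asyncio
import uuid

async def rpc_server(requests: asyncio.Queue) -> None:
    """Consume one request and post the result to the caller's reply queue."""
    corr_id, payload, reply_q = await requests.get()
    await reply_q.put((corr_id, f"analyzed:{payload}"))

async def rpc_call(requests: asyncio.Queue, payload: str) -> str:
    reply_q: asyncio.Queue = asyncio.Queue()  # per-call reply queue
    corr_id = str(uuid.uuid4())               # ties the reply to this request
    await requests.put((corr_id, payload, reply_q))
    got_id, result = await reply_q.get()
    assert got_id == corr_id  # real code would discard mismatched replies
    return result

async def main():
    requests: asyncio.Queue = asyncio.Queue()
    server = asyncio.create_task(rpc_server(requests))
    result = await rpc_call(requests, "email-42")
    await server
    return result

result = asyncio.run(main())
print(result)
```

RabbitMQ supports the same pattern natively via the `reply_to` and `correlation_id` message properties.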

Consequences

Positive

  • Reliable async communication between services
  • Services can be deployed and scaled independently
  • Built-in retry mechanisms via dead letter queues
  • Message persistence survives broker restarts
  • Flexible routing with exchanges and bindings
  • Good debugging with management UI
  • Proven reliability and performance

Negative

  • Additional infrastructure component to deploy and maintain
  • Learning curve for team members unfamiliar with message brokers
  • Potential single point of failure (mitigated with clustering in production)
  • More complex local development setup
  • Network latency compared to in-process calls
  • Need monitoring for queue depths and throughput

Alternatives Considered

Alternative 1: Redis Pub/Sub

  • Pros:
      • Simple to set up and operate
      • Already using Redis for caching
      • Very low latency
      • Familiar to developers
  • Cons:
      • No message persistence (fire-and-forget)
      • No message acknowledgments
      • No built-in retry mechanisms
      • Messages lost if no subscriber connected
      • No dead letter queues
  • Why rejected: Reliability requirements need persistence and acknowledgments

Alternative 2: Apache Kafka

  • Pros:
      • Extremely high throughput
      • Built-in partitioning and replication
      • Message replay capability
      • Event sourcing patterns
  • Cons:
      • Overkill for our message volumes (hundreds/thousands per day, not millions)
      • More complex to operate and configure
      • Heavier resource usage (memory, disk, network)
      • Steeper learning curve
      • RPC pattern more complex to implement
  • Why rejected: Over-engineered for our scale and needs

Alternative 3: Direct HTTP calls

  • Pros:
      • Simple to implement
      • No additional infrastructure needed
      • Familiar to all developers
      • Easy debugging with standard HTTP tools
  • Cons:
      • Tight coupling between services
      • No async processing (blocking request/response calls)
      • Retry logic must be hand-rolled in every caller
      • No built-in load balancing
      • Risk of cascading failures
      • Cannot scale message handling independently
  • Why rejected: Doesn't provide the async processing and decoupling we need

Implementation Notes

  • Use separate exchanges for different message types (analysis, notifications, admin)
  • Implement dead letter queues for failed message handling
  • Set appropriate message TTLs to prevent queue buildup
  • Configure connection pooling in client libraries
  • Use persistent messages for critical analysis jobs
  • Use transient messages for non-critical status updates
  • Implement RPC pattern for request/reply workflows
  • Create message broker abstraction layer for future flexibility
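
The abstraction-layer note can be sketched as a minimal interface plus an in-memory test double. All names here are hypothetical; the production implementation of `MessageBroker` would wrap aio-pika, while the in-memory version serves unit tests and local development:

```python
import asyncio
from abc import ABC, abstractmethod
from collections import defaultdict

class MessageBroker(ABC):
    """Hypothetical abstraction so services never import aio-pika directly."""
    @abstractmethod
    async def publish(self, routing_key: str, body: bytes) -> None: ...
    @abstractmethod
    async def subscribe(self, routing_key: str, handler) -> None: ...

class InMemoryBroker(MessageBroker):
    """Test double: delivers messages synchronously to local handlers."""
    def __init__(self):
        self._handlers = defaultdict(list)

    async def subscribe(self, routing_key: str, handler) -> None:
        self._handlers[routing_key].append(handler)

    async def publish(self, routing_key: str, body: bytes) -> None:
        for handler in self._handlers[routing_key]:
            await handler(body)

async def main():
    broker = InMemoryBroker()
    seen = []

    async def on_result(body: bytes):
        seen.append(body)

    await broker.subscribe("analyze.result", on_result)
    await broker.publish("analyze.result", b"spam-score=0.93")
    return seen

seen = asyncio.run(main())
print(seen)
```

Swapping brokers later (or running tests without RabbitMQ) then only requires a new `MessageBroker` implementation.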

Architecture:

spot.analysis (exchange)
  ├─> analyze.request (routing key) → analyzer-orchestrator queue
  └─> analyze.result (routing key) → result-processor queue

spot.notifications (exchange)
  └─> notify.* (routing keys) → notification queues
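
The `notify.*` binding above uses AMQP topic-exchange semantics, where `*` matches exactly one dot-separated word (`#`, which matches zero or more words, is omitted here for brevity). A small matcher illustrating the `*` rule, not the broker's actual implementation:

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP topic rule sketch: '*' matches exactly one dot-separated word."""
    p_words = pattern.split(".")
    k_words = routing_key.split(".")
    if len(p_words) != len(k_words):
        return False
    return all(p == "*" or p == k for p, k in zip(p_words, k_words))

print(topic_matches("notify.*", "notify.email"))        # matches: one word after the dot
print(topic_matches("notify.*", "notify.email.sent"))   # no match: '*' is one word only
print(topic_matches("analyze.request", "analyze.request"))
```

This is why `notify.email` and `notify.admin` both land in notification queues, while a deeper key such as `notify.email.sent` would need its own binding (or a `#` pattern).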
