Dead-Letter Topics¶
Dead-letter topics (DLT) provide bounded failure handling for subscriptions that cannot process some messages successfully. Instead of retrying indefinitely, Pub/Sub redirects repeatedly failing messages to a separate topic after a configured number of delivery attempts.
In production systems, this mechanism protects healthy traffic from poison messages and creates an explicit queue for operational triage.
Conceptual Model¶
A message follows this lifecycle:
- It is delivered to a subscription.
- The handler succeeds and acknowledges the message, or fails and triggers a retry.
- Retry continues until `max_delivery_attempts` is reached.
- Pub/Sub moves the message to `dead_letter_topic`.
This means dead-letter routing is not a replacement for handler quality. It is a containment mechanism that prevents one pathological message class from degrading the full subscription.
flowchart LR
A[Message delivered] --> B{Handler success?}
B -->|Yes| C[Acked and removed]
B -->|No| D{Attempts < max?}
D -->|Yes| E[Retry with backoff]
E --> A
D -->|No| F[Move to dead-letter topic]
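The lifecycle above can be sketched as a small in-memory simulation. This is only a toy model for reasoning about the flow: `SimulatedSubscription` and `reject_malformed` are hypothetical names, not part of Pub/Sub or the broker library, and backoff between attempts is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SimulatedSubscription:
    """Toy in-memory model of the lifecycle; not the Pub/Sub client API."""

    handler: Callable[[bytes], None]
    max_delivery_attempts: int = 5
    dead_letters: list = field(default_factory=list)

    def deliver(self, data: bytes) -> int:
        """Deliver until acked or dead-lettered; return the attempts used."""
        for attempt in range(1, self.max_delivery_attempts + 1):
            try:
                self.handler(data)
                return attempt  # handler succeeded: the message is acked
            except Exception:
                continue  # handler failed: the message is redelivered
        # Attempts exhausted: route the message to the dead-letter topic.
        self.dead_letters.append(data)
        return self.max_delivery_attempts


def reject_malformed(data: bytes) -> None:
    """Hypothetical handler that fails deterministically on bad payloads."""
    if not data.startswith(b"{"):
        raise ValueError("invalid payload")


sub = SimulatedSubscription(handler=reject_malformed)
ok_attempts = sub.deliver(b'{"order_id": "ord-1"}')
bad_attempts = sub.deliver(b"not-json")
```

The well-formed message is acknowledged on its first delivery, while the malformed one uses all five attempts and ends up in `sub.dead_letters` without blocking later deliveries.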
Baseline Configuration¶
Configure dead-letter routing directly in the subscriber declaration:
@broker.subscriber(
    alias="order-processor",
    topic_name="orders",
    subscription_name="orders-subscription",
    dead_letter_topic="orders-dlq",
    max_delivery_attempts=5,
    autocreate=True,
)
async def process_order(message: Message):
    await process_payment(message.data)
Parameters and Their Roles¶
| Parameter | Role | Typical Decision Rule |
|---|---|---|
| `dead_letter_topic` | Destination for failed messages | Use a {topic}-dlq or {topic}-dlt naming convention. |
| `max_delivery_attempts` | Retry ceiling before reroute | Start with 5 (the minimum); increase only if transient failures are common. |
| `autocreate` | Creates resources at startup | Keep True in local and dev environments; in production, follow platform policy. |
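A minimal sketch of how the decision rules in the table could be checked before deployment. The helper `check_dead_letter_config` is hypothetical and not part of the broker library. The 5 to 100 range is the one Pub/Sub documents for dead-letter policies, and the suffix check only encodes the naming convention suggested in the table.

```python
MIN_ATTEMPTS = 5    # lowest value Pub/Sub accepts for a dead-letter policy
MAX_ATTEMPTS = 100  # highest value Pub/Sub accepts
DLQ_SUFFIXES = ("-dlq", "-dlt")  # convention from the table, not a Pub/Sub rule


def check_dead_letter_config(
    topic_name: str, dead_letter_topic: str, max_delivery_attempts: int
) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if not MIN_ATTEMPTS <= max_delivery_attempts <= MAX_ATTEMPTS:
        problems.append(
            f"max_delivery_attempts={max_delivery_attempts} is outside "
            f"{MIN_ATTEMPTS}..{MAX_ATTEMPTS}"
        )
    expected = [topic_name + suffix for suffix in DLQ_SUFFIXES]
    if dead_letter_topic not in expected:
        problems.append(
            f"dead_letter_topic={dead_letter_topic!r} does not follow "
            f"the naming convention; expected one of {expected}"
        )
    return problems


ok = check_dead_letter_config("orders", "orders-dlq", 5)
bad = check_dead_letter_config("orders", "failed-stuff", 3)
```

A check like this fits naturally in a unit test or CI step, so that a misconfigured subscriber is caught before it reaches an environment where `autocreate` is disabled.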
Retry Dynamics and Backoff¶
Dead-letter topics are most effective when paired with deliberate retry pacing.
@broker.subscriber(
    alias="api-caller",
    topic_name="api-requests",
    subscription_name="api-requests-subscription",
    dead_letter_topic="api-requests-dlq",
    max_delivery_attempts=10,
    min_backoff_delay_secs=10,
    max_backoff_delay_secs=600,
    autocreate=True,
)
async def call_api(message: Message):
    await call_external_api(message.data)
With exponential backoff, transient outages receive time to recover while permanent failures are eventually quarantined.
| Attempt | Approximate Wait |
|---|---|
| 1 | Immediate |
| 2 | ~10 seconds |
| 3 | ~20 seconds |
| 4 | ~40 seconds |
| 5 | ~80 seconds |
| 6+ | Keeps growing until capped at 600 seconds (the configured maximum backoff) |
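The approximate waits in the table can be reproduced with a few lines of arithmetic. This sketch assumes the delay doubles on every retry, which matches the table but is not a growth factor Pub/Sub guarantees; `approximate_waits` is a hypothetical helper, and real delays will also vary with service-side scheduling.

```python
def approximate_waits(attempts: int, min_backoff: float, max_backoff: float) -> list:
    """Wait before each delivery attempt, assuming the delay doubles per retry."""
    if attempts < 1:
        return []
    waits = [0.0]  # the first delivery is immediate
    delay = min_backoff
    for _ in range(attempts - 1):
        waits.append(min(delay, max_backoff))  # never exceed the configured cap
        delay *= 2  # assumption: plain doubling between retries
    return waits


# Same settings as the subscriber above: 10 attempts, 10 s minimum, 600 s maximum.
schedule = approximate_waits(attempts=10, min_backoff=10, max_backoff=600)
```

Under the doubling assumption, the cap is first reached at the eighth attempt, so the last few retries are each spaced ten minutes apart.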
Handling Dead-Letter Traffic¶
A dead-letter topic should always have a dedicated consumer path. If not, failures become invisible operational debt.
@broker.subscriber(
    alias="dlq-handler",
    topic_name="orders-dlq",
    subscription_name="orders-dlq-subscription",
    autocreate=True,
)
async def handle_failed_orders(message: Message):
    # Log the failure with details
    logger.error(
        f"Message {message.id} failed permanently",
        extra={
            "message_data": message.data.decode("utf-8"),
            "attributes": message.attributes,
            "delivery_attempt": message.delivery_attempt,
        },
    )
    # Alert your operations team
    await send_alert_to_ops_team(message)
    # Store for later analysis
    await store_failed_message(message)
The handler should implement at least one of the following:
- Alerting for immediate operator awareness.
- Persistence for forensic analysis and replay workflows.
- Enrichment with diagnostic context (correlation IDs, tenant, source service).
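As an illustration of the persistence and enrichment items above, the following sketch flattens a dead-lettered message into a JSON-serializable record. `FailedMessage` is a stand-in that mirrors only the fields read by the handler shown earlier, not the library's `Message` type, and the attribute keys (`correlation_id`, `tenant`, `source_service`) are assumed conventions that publishers would have to set.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FailedMessage:
    """Stand-in for the message fields used here; not the library's Message."""

    id: str
    data: bytes
    attributes: dict = field(default_factory=dict)
    delivery_attempt: Optional[int] = None


def build_failure_record(message: FailedMessage) -> dict:
    """Flatten a dead-lettered message into a JSON-serializable record."""
    return {
        "message_id": message.id,
        # errors="replace" keeps an undecodable payload from failing this step
        "payload": message.data.decode("utf-8", errors="replace"),
        # Attribute keys below are assumed conventions, absent keys become None
        "correlation_id": message.attributes.get("correlation_id"),
        "tenant": message.attributes.get("tenant"),
        "source_service": message.attributes.get("source_service"),
        "delivery_attempt": message.delivery_attempt,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


record = build_failure_record(
    FailedMessage(
        id="m-1",
        data=b'{"order_id": "ord-1"}',
        attributes={"correlation_id": "c-42", "tenant": "acme"},
        delivery_attempt=5,
    )
)
serialized = json.dumps(record)
```

Decoding with `errors="replace"` is a deliberate choice: the dead-letter path is the last line of defense, so it should tolerate exactly the kind of malformed payload that may have caused the original failure.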
Operational Patterns¶
Alert + Persist Pattern¶
from datetime import UTC, datetime  # UTC alias requires Python 3.11+

@broker.subscriber(
    alias="dlq-alert-store",
    topic_name="events-dlq",
    subscription_name="events-dlq-subscription",
)
async def handle_dlq_alert_store(message: Message):
    await slack_webhook.send(f"Failed message: {message.id}")
    await database.insert(
        "failed_messages",
        {
            "message_id": message.id,
            "data": message.data,
            "failed_at": datetime.now(UTC),
        },
    )
Fallback Execution Pattern¶
@broker.subscriber(
    alias="dlq-retry",
    topic_name="payments-dlq",
    subscription_name="payments-dlq-subscription",
)
async def retry_with_fallback(message: Message):
    # Try a fallback payment processor
    await fallback_payment_service.process(message.data)
Manual Review Queue Pattern¶
@broker.subscriber(
    alias="dlq-review",
    topic_name="tickets-dlq",
    subscription_name="tickets-dlq-subscription",
)
async def queue_for_review(message: Message):
    await admin_dashboard.create_ticket(
        title=f"Failed order: {message.id}",
        data=message.data,
        priority="high",
    )
Validation with PubSubTestClient¶
PubSubTestClient is useful for verifying failure behavior locally (for example, that handler failures are observable in test results) without depending on real infrastructure.
@pytest.mark.asyncio
async def test_failed_message_reaches_error_result_stream() -> None:
    test_broker = PubSubBroker(project_id="test-project")
    @test_broker.subscriber(
        alias="always-fails",
        topic_name="orders",
        subscription_name="orders-subscription",
        dead_letter_topic="orders-dlq",
        max_delivery_attempts=5,
    )
    async def always_fails(_: Message) -> None:
        raise ValueError("invalid payload")
    async with PubSubTestClient(test_broker) as client:
        await client.publish(topic="orders", data={"order_id": "ord-1"})
        results = client.get_results()
        assert len(results) == 1
        assert isinstance(results[0].error, ValueError)
Note that in-memory tests validate application behavior and error surfaces. Final validation of managed dead-letter routing itself should still be exercised in an integration environment.
Design Recommendations¶
Choosing max_delivery_attempts¶
- Keep values low when failures are deterministic (schema errors, impossible states).
- Increase values when downstream dependencies are known to recover quickly.
- Prefer explicit tuning over high defaults; large values delay incident visibility.
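To make the visibility cost of a large `max_delivery_attempts` concrete, the sketch below estimates how long a permanently failing message waits before it is dead-lettered. It assumes the backoff doubles per retry and ignores handler run time, so the result is a rough lower bound; `seconds_until_dead_letter` is a hypothetical helper, not a library function.

```python
def seconds_until_dead_letter(
    max_delivery_attempts: int, min_backoff: float, max_backoff: float
) -> float:
    """Sum of the backoff waits between the first and the last delivery attempt."""
    total = 0.0
    delay = min_backoff
    for _ in range(max_delivery_attempts - 1):
        total += min(delay, max_backoff)  # each wait is capped at max_backoff
        delay *= 2  # assumption: plain doubling between retries
    return total


low = seconds_until_dead_letter(5, min_backoff=10, max_backoff=600)
mid = seconds_until_dead_letter(10, min_backoff=10, max_backoff=600)
high = seconds_until_dead_letter(20, min_backoff=10, max_backoff=600)
```

With a 10-second minimum and a 600-second cap, five attempts surface a poison message after about 150 seconds, ten attempts after about 40 minutes, and twenty attempts only after more than two hours.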
Naming Strategy¶
Use a consistent suffix and include bounded domain context:
- `orders-dlq`
- `payments-dlq`
- `inventory-dlq`
Consistent names simplify dashboards, alerts, and runbook lookup.
Monitoring Signals¶
Track:
- Dead-letter message ingress rate.
- Most frequent failure class.
- Time-to-resolution per dead-letter message.
- Replay success rate after remediation.
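One way to derive these four signals from persisted dead-letter records is sketched below. The `DeadLetterRecord` shape and the `summarize` helper are assumptions made for illustration; a real system would more likely compute the same figures in its metrics backend or with a database query.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DeadLetterRecord:
    """Hypothetical stored record for one dead-lettered message."""

    failure_class: str
    dead_lettered_at: datetime
    resolved_at: Optional[datetime] = None    # None while still open
    replay_succeeded: Optional[bool] = None   # None if never replayed


def summarize(records: list, window: timedelta) -> dict:
    """Compute the four monitoring signals over one observation window."""
    resolved = [r for r in records if r.resolved_at is not None]
    replayed = [r for r in records if r.replay_succeeded is not None]
    classes = Counter(r.failure_class for r in records)
    return {
        "ingress_per_hour": len(records) / (window.total_seconds() / 3600),
        "top_failure_class": classes.most_common(1)[0][0] if classes else None,
        "mean_resolution_secs": (
            sum((r.resolved_at - r.dead_lettered_at).total_seconds() for r in resolved)
            / len(resolved)
            if resolved
            else None
        ),
        "replay_success_rate": (
            sum(r.replay_succeeded for r in replayed) / len(replayed)
            if replayed
            else None
        ),
    }


t0 = datetime(2024, 1, 1, 12, 0)
records = [
    DeadLetterRecord("ValueError", t0, t0 + timedelta(minutes=30), True),
    DeadLetterRecord("ValueError", t0 + timedelta(minutes=10), t0 + timedelta(minutes=20), False),
    DeadLetterRecord("TimeoutError", t0 + timedelta(minutes=20)),
]
summary = summarize(records, window=timedelta(hours=1))
```

Unresolved and never-replayed records are excluded from the averages rather than counted as zero, so an open backlog does not make resolution time or replay success look better than it is.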
Common Failure Modes¶
- Configuring a dead-letter topic but never subscribing to it.
- Setting `max_delivery_attempts` too high and delaying diagnosis.
- Ignoring retry backoff, causing rapid failure loops.
- Mixing naming conventions and losing traceability.
Recap¶
- Dead-letter topics isolate persistent failures from healthy traffic.
- Configure `dead_letter_topic` and `max_delivery_attempts` per subscriber.
- Pair dead-letter routing with retry backoff for controlled failure pacing.
- Always consume and monitor the dead-letter topic.
- Validate handler failure behavior early with `PubSubTestClient`, then verify managed routing in integration.