Production Deployment Guide¶
FastPubSub applications are designed to be simple to run and scale. The fastpubsub run command uses Uvicorn under the hood and is production-capable when configured correctly.
Note
FastPubSub integrates its consumer lifecycle directly into the CLI. You must use fastpubsub run to start your application. Running with Gunicorn or the uvicorn CLI directly is not supported.
Deployment Concepts¶
Replication¶
Scale by running multiple instances of the same fastpubsub run command. All instances connect to the same Pub/Sub subscription, and Google Cloud automatically load-balances messages between them.
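For instance, launching the identical command on several machines is enough (the module path `my_project.main:app` is illustrative):

```shell
# Run the same command on each instance; every process attaches to the
# same subscription, and Google Cloud load-balances messages between them.
fastpubsub run my_project.main:app --host 0.0.0.0 --port 8000
```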
Statelessness¶
Design your consumers to be stateless. Message processing state should be managed by Pub/Sub (acknowledgments) or an external database. Stateless applications can be:
- Shut down, restarted, or moved without data loss.
- Automatically recovered by your orchestrator.
- Scaled based on demand.
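As a sketch of the idea in plain Python (the in-memory `store` dict stands in for an external database such as Redis or Postgres; the function name and payload format are illustrative):

```python
# Keep processing state outside the worker so any replica can handle
# (or re-handle) a redelivered message without duplicating work.
store: dict[str, str] = {}

def handle_order(message_id: str, payload: str) -> bool:
    """Process a message idempotently; returns True if work was done."""
    if message_id in store:      # already processed by some replica
        return False             # safe to ack the redelivery and move on
    store[message_id] = payload  # commit the result to the external store
    return True

# Any replica observes the same external state:
assert handle_order("msg-1", "order:42") is True
assert handle_order("msg-1", "order:42") is False  # duplicate delivery is a no-op
```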
Multiple Workers¶
When you use --workers N, the CLI starts one master process managing N worker processes. Each worker:
- Is a separate Python process with its own memory.
- Loads your entire application independently.
- Runs on its own Python interpreter and GIL, so workers achieve true parallel CPU utilization.
Resource Scaling
Be careful when increasing the number of workers. If one worker uses 100MB RAM, 4 workers use ~400MB total.
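A sketch of starting multiple workers with the flag described above (module path is illustrative):

```shell
# 1 master process + 4 worker processes, each with its own memory
# (so budget roughly 4x a single worker's RAM).
fastpubsub run my_project.main:app --workers 4
```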
Deployment Checklist¶
- Start with `fastpubsub run` (CLI-only).
- Set `GOOGLE_APPLICATION_CREDENTIALS` or `PUBSUB_EMULATOR_HOST`.
- Choose workers based on CPU and workload type.
- Configure health probes (`/consumers/alive`, `/consumers/ready`).
- Set `shutdown_timeout` and `terminationGracePeriodSeconds`.
Kubernetes Deployment¶
Strategy: 1 Worker per Pod¶
For container-based orchestration, use 1 worker per container and let Kubernetes handle scaling:
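For instance, the container's command can pin a single worker (module path as in the Dockerfile example; adjust for your project):

```shell
# One worker per container; Kubernetes replicas provide the parallelism.
fastpubsub run my_project.main:app --host 0.0.0.0 --port 8000 --workers 1
```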
Scale by increasing replicas in your Deployment. This provides:
- Finer-grained scaling (one Pod at a time).
- Better resource management.
- Isolation (one crash only affects one Pod).
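Scaling then becomes a one-liner against the Deployment defined in this guide's manifest (`fastpubsub-worker`):

```shell
# Add Pods (each running one worker) instead of adding workers per Pod.
kubectl scale deployment/fastpubsub-worker --replicas=5
```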
Dockerfile¶
Use a multi-stage build for a minimal production image:
```dockerfile
# --- Build Stage ---
FROM python:3.12 AS builder

WORKDIR /usr/src/app

RUN pip install --upgrade pip && pip install poetry

COPY poetry.lock pyproject.toml ./
RUN poetry config virtualenvs.in-project true && \
    poetry install --no-root --without dev --no-interaction --no-ansi

# --- Production Stage ---
FROM python:3.12-slim

WORKDIR /app

RUN groupadd -r appuser && useradd -r -g appuser appuser
USER appuser

COPY --from=builder --chown=appuser:appuser /usr/src/app/.venv ./.venv
COPY --chown=appuser:appuser . .

CMD [".venv/bin/fastpubsub", "run", "my_project.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "1"]
```
Deployment Manifest¶
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastpubsub-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fastpubsub-worker
  template:
    metadata:
      labels:
        app: fastpubsub-worker
    spec:
      containers:
        - name: app
          image: my-registry/my-fastpubsub-app:latest
          ports:
            - containerPort: 8000
          env:
            - name: GCP_PROJECT_ID
              value: "your-project-id"
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: "/path/to/creds/in/container"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1"
          livenessProbe:
            httpGet:
              path: /consumers/alive
              port: 8000
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /consumers/ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
      terminationGracePeriodSeconds: 60
```
Health Probes Explained¶
| Probe | Endpoint | Checks |
|---|---|---|
| Liveness | `/consumers/alive` | Web server is running, StreamingPull requests are active |
| Readiness | `/consumers/ready` | Broker has started all consumer tasks and is actively polling |
If readiness returns 503, the app is starting or has an error. Kubernetes waits before routing traffic.
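You can check the probe endpoints by hand (host and port as configured in the manifest):

```shell
# 200 = ready to receive traffic; 503 = still starting or in an error state.
curl -i http://localhost:8000/consumers/ready
curl -i http://localhost:8000/consumers/alive
```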
Graceful Shutdown¶
When Kubernetes sends SIGTERM, FastPubSub:
- Signals all consumer tasks to stop polling.
- Waits for in-flight message handlers to complete or be cancelled.
- Exits cleanly.
Set terminationGracePeriodSeconds higher than your longest message processing time:
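For example, if your slowest handler needs up to ~45 seconds, a sketch of the Pod spec field:

```yaml
spec:
  # Must exceed the longest expected message processing time,
  # with some headroom before Kubernetes sends SIGKILL.
  terminationGracePeriodSeconds: 60
```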
Virtual Machine Deployment¶
Multiple Workers¶
On a single VM, scale with the --workers flag:
Recommendation: (2 * CPU_CORES) + 1 workers. For a 4-core VM: (2 * 4) + 1 = 9.
systemd Service¶
Run as a persistent service with systemd:
```ini
# /etc/systemd/system/fastpubsub.service
[Unit]
Description=FastPubSub Application
After=network.target

[Service]
Type=simple
User=appuser
Group=appuser
WorkingDirectory=/opt/my-app
Environment="GCP_PROJECT_ID=your-project-id"
Environment="GOOGLE_APPLICATION_CREDENTIALS=/opt/my-app/credentials.json"
ExecStart=/opt/my-app/.venv/bin/fastpubsub run my_project.main:app --host 0.0.0.0 --port 8000 --workers 4
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```
Commands:
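Typical commands to install and manage the unit (standard systemd tooling):

```shell
sudo systemctl daemon-reload             # pick up the new unit file
sudo systemctl enable --now fastpubsub   # start now and on every boot
sudo systemctl status fastpubsub         # verify it is running
journalctl -u fastpubsub -f              # follow the service logs
```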
Configuration Best Practices¶
Load from Environment¶
Never hardcode production values:
```python
import os

from fastpubsub import FastPubSub, PubSubBroker

PROJECT_ID = os.environ.get("GCP_PROJECT_ID")
if not PROJECT_ID:
    raise RuntimeError("GCP_PROJECT_ID environment variable not set.")

broker = PubSubBroker(project_id=PROJECT_ID)
app = FastPubSub(broker)
```
In Kubernetes, inject via ConfigMaps or Secrets.
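A sketch of sourcing `GCP_PROJECT_ID` from a Secret (the Secret name and key here are hypothetical):

```yaml
env:
  - name: GCP_PROJECT_ID
    valueFrom:
      secretKeyRef:
        name: fastpubsub-config   # hypothetical Secret name
        key: gcp-project-id       # hypothetical key
```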
Logging to stdout¶
FastPubSub logs to stdout and stderr by default. In Kubernetes, the container runtime captures these streams, and log aggregators (e.g., fluentd or the Datadog agent) forward them to your logging solution.
Enable JSON logging for production:
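For example, via the CLI's `--log-serialize` flag (module path is illustrative):

```shell
# Emit each log record as a single JSON line for fluentd/Datadog to parse.
fastpubsub run my_project.main:app --log-serialize
```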
Shutdown Timeout¶
Configure adequate shutdown time for in-flight messages:
```python
broker = PubSubBroker(
    project_id=PROJECT_ID,
    shutdown_timeout=30.0,  # Wait 30s for in-flight messages
)
```
Hybrid vs. Standalone Apps¶
| Type | Description | Health Checks |
|---|---|---|
| Hybrid | Has both `@broker.subscriber` handlers and FastAPI endpoints | Built-in subscriber endpoints + your API endpoints |
| Standalone | Only `@broker.subscriber` handlers | Built-in subscriber endpoints only |
Both types use the same deployment strategy.
Common Pitfalls¶
- Running with the `uvicorn` CLI directly (not supported).
- Setting too many workers for available memory.
- Using a shutdown timeout lower than typical processing time.
Recap¶
- **Entrypoint**: Always use `fastpubsub run` to start your application.
- **Configuration**: Load sensitive values from environment variables.
- **Logging**: Log to stdout/stderr; use `--log-serialize` for JSON.
- **Kubernetes**:
    - Use `--workers 1` in your container.
    - Scale by increasing `replicas`.
    - Use health check endpoints for probes.
    - Set appropriate `terminationGracePeriodSeconds`.
- **Virtual Machine**:
    - Use `--workers N` based on CPU cores.
    - Manage with systemd as a long-running service.