Production Deployment Guide

FastPubSub applications are designed to be simple to run and scale. The fastpubsub run command uses Uvicorn under the hood and is production-capable when configured correctly.

fastpubsub run my_app.main:app --host 0.0.0.0 --port 8000 --workers 4

Note

FastPubSub integrates its consumer lifecycle directly into the CLI. You must use fastpubsub run to start your application. Running with Gunicorn or the uvicorn CLI directly is not supported.


Deployment Concepts

Replication

Scale by running multiple instances of the same fastpubsub run command. All instances connect to the same Pub/Sub subscription, and Google Cloud automatically load-balances messages between them.
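
The handler itself does not change between instances; every replica attaches to the same subscription and Pub/Sub distributes messages across them. A minimal sketch (the subscription name and the exact subscriber signature are illustrative assumptions):

import os

from fastpubsub import FastPubSub, PubSubBroker

broker = PubSubBroker(project_id=os.environ["GCP_PROJECT_ID"])
app = FastPubSub(broker)

# Every replica runs this same handler against the same subscription;
# Pub/Sub load-balances messages across the connected instances.
@broker.subscriber("orders-subscription")  # subscription name is a placeholder
async def handle_order(message):
    ...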

Statelessness

Design your consumers to be stateless. Message processing state should be managed by Pub/Sub (acknowledgments) or an external database, as in the sketch after this list. Stateless applications can be:

  • Shut down, restarted, or moved without data loss.
  • Automatically recovered by your orchestrator.
  • Scaled based on demand.
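
A minimal sketch of a stateless handler, assuming a subscription name, a message_id attribute on the incoming message, and a save_result helper standing in for your database client:

from fastpubsub import FastPubSub, PubSubBroker

broker = PubSubBroker(project_id="your-project-id")
app = FastPubSub(broker)

async def save_result(key: str, value: dict) -> None:
    """Hypothetical helper: persist results in your external database."""
    ...

@broker.subscriber("events-subscription")  # name and signature are placeholders
async def handle_event(message):
    # No module-level counters or caches: everything the handler produces
    # is persisted externally, so any replica can process any message.
    await save_result(message.message_id, {"status": "processed"})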

Multiple Workers

When you use --workers N, the CLI starts one master process managing N worker processes (see the sketch after this list). Each worker:

  • Is a separate Python process with its own memory.
  • Loads your entire application independently.
  • Bypasses the Python GIL for true parallel CPU utilization.
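
You can see this separation by logging the process ID when the module is imported; with --workers 4, four different PIDs appear because each worker loads the application in its own process:

import logging
import os

logger = logging.getLogger(__name__)

# Runs once per worker process: with --workers 4 you will see four
# distinct PIDs, one per independently loaded copy of the application.
logger.info("Worker started with PID %s", os.getpid())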

Resource Scaling

Be careful when increasing the number of workers: memory usage scales roughly linearly because each worker holds its own copy of the application. If one worker uses 100MB RAM, 4 workers use ~400MB total.


Deployment Checklist

  1. Start the app with fastpubsub run (the only supported entrypoint).
  2. Set GOOGLE_APPLICATION_CREDENTIALS or PUBSUB_EMULATOR_HOST.
  3. Choose workers based on CPU and workload type.
  4. Configure health probes (/consumers/alive, /consumers/ready).
  5. Set shutdown_timeout and terminationGracePeriodSeconds.

Kubernetes Deployment

Strategy: 1 Worker per Pod

For container-based orchestration, use 1 worker per container and let Kubernetes handle scaling:

# In your Dockerfile CMD
fastpubsub run app:app --host 0.0.0.0 --port 8000 --workers 1

Scale by increasing replicas in your Deployment. This provides:

  • Finer-grained scaling (one Pod at a time).
  • Better resource management.
  • Isolation (one crash only affects one Pod).

Dockerfile

Use a multi-stage build for a minimal production image:

# --- Build Stage ---
FROM python:3.12 AS builder

# Use the same path as the production stage so the virtualenv's scripts keep valid interpreter paths.
WORKDIR /app

RUN pip install --upgrade pip && pip install poetry

COPY poetry.lock pyproject.toml ./

RUN poetry config virtualenvs.in-project true && \
    poetry install --only main --no-root --no-interaction --no-ansi


# --- Production Stage ---
FROM python:3.12-slim

WORKDIR /app

RUN groupadd -r appuser && useradd -r -g appuser appuser
USER appuser

COPY --from=builder --chown=appuser:appuser /app/.venv ./.venv
COPY --chown=appuser:appuser . .

CMD [".venv/bin/fastpubsub", "run", "my_project.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "1"]

Deployment Manifest

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastpubsub-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fastpubsub-worker
  template:
    metadata:
      labels:
        app: fastpubsub-worker
    spec:
      containers:
      - name: app
        image: my-registry/my-fastpubsub-app:latest
        ports:
        - containerPort: 8000
        env:
        - name: "GCP_PROJECT_ID"
          value: "your-project-id"
        - name: "GOOGLE_APPLICATION_CREDENTIALS"
          value: "/path/to/creds/in/container"
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "1"
        livenessProbe:
          httpGet:
            path: /consumers/alive
            port: 8000
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          httpGet:
            path: /consumers/ready
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 10
      terminationGracePeriodSeconds: 60

Health Probes Explained

Probe       Endpoint            Checks
Liveness    /consumers/alive    Web server is running and StreamingPull requests are active
Readiness   /consumers/ready    Broker has started all consumer tasks and is actively polling

If the readiness endpoint returns 503, the app is still starting or has hit an error; Kubernetes holds traffic back until it returns 200.
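
You can exercise the same endpoints locally before deploying; a quick check with the standard library, assuming the host and port from the examples above:

import urllib.error
import urllib.request

URL = "http://localhost:8000/consumers/ready"

try:
    with urllib.request.urlopen(URL) as resp:
        print("ready:", resp.status)      # 200 once consumers are polling
except urllib.error.HTTPError as exc:
    print("not ready:", exc.code)         # 503 while starting or on error
except urllib.error.URLError as exc:
    print("unreachable:", exc.reason)     # app is not listening yet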

Graceful Shutdown

When Kubernetes sends SIGTERM, FastPubSub:

  1. Signals all consumer tasks to stop polling.
  2. Waits, up to shutdown_timeout, for in-flight messages to complete or be cancelled.
  3. Exits cleanly.

Set terminationGracePeriodSeconds higher than your longest message processing time:

terminationGracePeriodSeconds: 60  # Wait up to 60s
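
As a sketch of how the two timeouts relate, assume your slowest handler needs about 45 seconds: give the broker at least that long to drain, and give Kubernetes a little more so the pod is not killed mid-drain.

from fastpubsub import PubSubBroker

LONGEST_HANDLER_SECONDS = 45.0  # assumed worst-case processing time

broker = PubSubBroker(
    project_id="your-project-id",
    # Allow in-flight messages to finish before the process exits.
    shutdown_timeout=LONGEST_HANDLER_SECONDS,
)

# In the Deployment manifest, keep the pod alive longer than the drain:
#   terminationGracePeriodSeconds: 60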

Virtual Machine Deployment

Multiple Workers

On a single VM, scale with the --workers flag:

fastpubsub run my_app.main:app --host 0.0.0.0 --port 8000 --workers 9

Recommendation: (2 * CPU_CORES) + 1 workers. For a 4-core VM: (2 * 4) + 1 = 9.
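
A small sketch for deriving the worker count on the machine it runs on:

import os

# (2 * CPU_CORES) + 1, e.g. 9 workers on a 4-core VM.
workers = (2 * (os.cpu_count() or 1)) + 1
print(f"fastpubsub run my_app.main:app --host 0.0.0.0 --port 8000 --workers {workers}")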

systemd Service

Run as a persistent service with systemd:

# /etc/systemd/system/fastpubsub.service
[Unit]
Description=FastPubSub Application
After=network.target

[Service]
Type=simple
User=appuser
Group=appuser
WorkingDirectory=/opt/my-app
Environment="GCP_PROJECT_ID=your-project-id"
Environment="GOOGLE_APPLICATION_CREDENTIALS=/opt/my-app/credentials.json"
ExecStart=/opt/my-app/.venv/bin/fastpubsub run my_project.main:app --host 0.0.0.0 --port 8000 --workers 4
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Commands:

sudo systemctl enable fastpubsub
sudo systemctl start fastpubsub
sudo systemctl status fastpubsub

Configuration Best Practices

Load from Environment

Never hardcode production values:

import os

from fastpubsub import FastPubSub, PubSubBroker

PROJECT_ID = os.environ.get("GCP_PROJECT_ID")
if not PROJECT_ID:
    raise RuntimeError("GCP_PROJECT_ID environment variable not set.")

broker = PubSubBroker(project_id=PROJECT_ID)
app = FastPubSub(broker)

In Kubernetes, inject via ConfigMaps or Secrets.

Logging to stdout

FastPubSub logs to stdout and stderr by default. In Kubernetes, the container runtime captures these streams, and log aggregators (e.g., Fluentd or Datadog) forward them to your logging backend.

Enable JSON logging for production:

fastpubsub run app:app --log-serialize
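
Your own application logs can follow the same pattern: send them to stdout with the standard logging module and let the platform collect them (this is plain Python logging, not a FastPubSub-specific API):

import logging
import sys

# Emit application logs to stdout so the container runtime captures them
# alongside FastPubSub's own output.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logger = logging.getLogger("my_project")

logger.info("processing started")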

Shutdown Timeout

Configure adequate shutdown time for in-flight messages:

broker = PubSubBroker(
    project_id=PROJECT_ID,
    shutdown_timeout=30.0,  # Wait 30s for in-flight messages
)

Hybrid vs. Standalone Apps

Type         Description                                                   Health Checks
Hybrid       Has both @broker.subscriber handlers and FastAPI endpoints    Built-in subscriber endpoints + your API endpoints
Standalone   Only @broker.subscriber handlers                              Built-in subscriber endpoints only

Both types use the same deployment strategy.


Common Pitfalls

  • Running with the uvicorn CLI directly (not supported).
  • Setting too many workers for available memory.
  • Using a shutdown timeout lower than typical processing time.

Recap

  • Entrypoint: Always use fastpubsub run to start your application
  • Configuration: Load sensitive values from environment variables
  • Logging: Log to stdout/stderr, use --log-serialize for JSON
  • Kubernetes:
    • Use --workers 1 in your container
    • Scale by increasing replicas
    • Use health check endpoints for probes
    • Set appropriate terminationGracePeriodSeconds
  • Virtual Machine:
    • Use --workers N based on CPU cores
    • Manage with systemd as a long-running service