Skip to content

Postgres Read Replicas

paas's managed Postgres addon (CNPG) supports asynchronous read replicas to offload read-heavy traffic from the writable primary. Cycle 2 ships the API + K8s wiring; the dashboard's "Read replicas" pane and the connection-pooling guides land in this page.

At a glance

Capability Default Where to configure
Backing operator CNPG (postgresql.cnpg.io/v1::Cluster) inherited from AD/39 postgres addon
Replica topology 1 primary + N standbys (async streaming) read_replicas field
Connection routing DATABASE_URL (writes) / DATABASE_URL_RO (reads) dual env vars injected via Secrets
Replication mode Async streaming via WAL CNPG default
Plans supporting replicas business (1 standby) / enterprise (≥1 standby) enforced by validate_plan_supports_read_replicas
Plans NOT supporting replicas starter (single-instance) rejected with HTTP 400

CNPG palette is starter / business / enterprise

NOT free / standard / pro. The latter is the AddonRequest display palette (consumed by the create endpoint validator). The CNPG-side palette is what app_addons.plan carries forward into validate_plan_supports_read_replicas. Only business and enterprise ship at least one standby instance — the plans where read replicas are physically possible.

Lifecycle

flowchart LR
    A["PATCH /v1/apps/{id}/addons/{addon_id}<br/>{read_replicas: 2}"]
    A --> B["addons.rs::patch_addon"]
    B --> V{"plan ∈ business/enterprise?"}
    V -- no --> X["400 Bad Request"]
    V -- yes --> C["cnpg::ensure_cnpg_cluster_with_replicas<br/>instances = max(plan.instances, 1+N)"]
    C --> D["CNPG Operator scales StatefulSet<br/>+ emits {cluster}-ro Service"]
    D --> E["cnpg::ensure_database_url_ro_secret<br/>app-{app}-database-url-ro"]
    E --> F["state.config.set_config_vars<br/>DATABASE_URL_RO_SECRET = ..."]
    F --> G["Next deploy: pod envFrom both<br/>DATABASE_URL + DATABASE_URL_RO"]

Plans

CNPG plan Instances Standby slots Replica capacity
starter 1 0 0 (rejected)
business 2 1 1 (no scale-up; same pod doubles as standby + replica target)
enterprise 3 2 up to 2 (or more — ensure_cnpg_cluster_with_replicas scales out: instances = max(plan.instances, 1+N))

The max(...) invariant means the plan's HA floor wins: an enterprise cluster (3 instances) with read_replicas: 0 keeps its 3 instances. No downscale below the plan's HA floor — it would silently break the plan's availability guarantees.

API surface

PATCH /v1/apps/{app_id}/addons/{addon_id}
Body: { "read_replicas": 2 }

→ 200 OK
{
  "data": {
    "app_id": "...",
    "addon_id": "...",
    "read_replicas": 2,
    "total_instances": 3,
    "status": "updating"
  }
}

→ 400 Bad Request
{ "error": { "code": "bad_request",
             "message": "read_replicas requires plan=business or enterprise (CNPG palette, got plan=starter)" } }
GET /v1/apps/{app_id}/addons/{addon_id}/metrics

→ 200 OK
{
  "data": {
    "app_id": "...",
    "addon_id": "...",
    "replication_lag_seconds": null,
    "note": "cycle-2-stub: real Prometheus scrape wired in cycle 3 via parse_replication_lag"
  }
}

The replication_lag_seconds is null in cycle 2 — the endpoint shape is final but the live Prometheus scrape lands in cycle 3 via paas_database::cnpg::parse_replication_lag.

Connection routing in your application

The Postgres addon now injects two environment variables into the app pod:

  • DATABASE_URL — points at {cluster}-rw (writable primary). Use for INSERT / UPDATE / DELETE.
  • DATABASE_URL_RO — points at {cluster}-ro (CNPG-managed round-robin Service across the standbys). Use for SELECT on data that tolerates async replication lag.

Standby write attempts are rejected at the Postgres protocol level — even if your client misconfigures the URL, write queries get an explicit error rather than silent corruption.

Django

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'OPTIONS': dict(host=parsed_url.hostname, ...),
        ...
    },
    'replica': {
        'ENGINE': 'django.db.backends.postgresql',
        'OPTIONS': dict(host=parsed_ro.hostname, ...),
        ...
    },
}

DATABASE_ROUTERS = ['myapp.routers.PrimaryReplicaRouter']

# myapp/routers.py
class PrimaryReplicaRouter:
    def db_for_read(self, model, **hints):
        return 'replica'
    def db_for_write(self, model, **hints):
        return 'default'

SQLAlchemy

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

write_engine = create_engine(os.environ['DATABASE_URL'])
read_engine = create_engine(os.environ['DATABASE_URL_RO'])

WriteSession = sessionmaker(bind=write_engine)
ReadSession = sessionmaker(bind=read_engine)

# In your view / handler:
def list_users():
    with ReadSession() as s:
        return s.query(User).all()

Node.js (pg-pool)

const { Pool } = require('pg');

const writePool = new Pool({ connectionString: process.env.DATABASE_URL });
const readPool  = new Pool({ connectionString: process.env.DATABASE_URL_RO });

// Use readPool for SELECT-only paths, writePool for everything else.

Rust (sqlx)

let write_pool = sqlx::PgPool::connect(&std::env::var("DATABASE_URL")?).await?;
let read_pool  = sqlx::PgPool::connect(&std::env::var("DATABASE_URL_RO")?).await?;

Replication lag — what it means

CNPG runs asynchronous streaming replication. Standbys apply WAL records emitted by the primary with some delay — typically sub-second on a healthy network, but a slow standby or a write spike can push lag into seconds. Read-only queries hitting DATABASE_URL_RO may see slightly stale data.

If your code path requires read-your-own-writes consistency (e.g. "user just submitted a form, immediately load the created row"), use DATABASE_URL (the writable primary) for that path. Async replication is not a replacement for strong consistency — it's a load-spreading optimisation for read-heavy paths.

Limits (cycle 2)

  • Downscale: setting read_replicas below the current count is out of scope cycle 2 — the wire accepts the request but the K8s Patch::Apply will trigger a rolling-down of standbys without a pre-flight safety check (no warning if the standby is the only one). Cycle 3 adds a --force flag plus a confirmation prompt in the dashboard.

  • replication_lag_seconds: null in cycle 2 (the metrics endpoint shape is final; the scrape wire lands cycle 3 via parse_replication_lag).

  • Cross-AZ routing: the -ro Service is round-robin across all standbys in the cluster (no AZ affinity). Cycle 3+ may add a closest-standby mode if cross-AZ latency becomes a real issue.

  • Connection pooling (PgBouncer): out of scope. Tenants who need tight pooling can run their own sidecar.

FAQ

Q: Does enterprise automatically have read replicas? A: Yes — enterprise ships 3 instances out of the box (1 primary + 2 standbys). The standbys are queryable via DATABASE_URL_RO from day one without a PATCH. The PATCH read_replicas flow is for scaling beyond the plan's default count (e.g. enterprise + read_replicas: 5 → 6 instances total).

Q: Can business have 2 read replicas? A: Yes — PATCH read_replicas: 2 on business sets instances = max(2, 1+2) = 3, so the cluster scales from 2 to 3 instances (1 primary + 2 standbys).

Q: What happens to DATABASE_URL_RO if I disable read replicas (read_replicas: 0)? A: The Secret stays in place (the -ro Service still exists on business/enterprise because the plan ships ≥ 1 standby). The URL keeps working but with reduced redundancy.

Q: How do I detect replication lag programmatically? A: Cycle 2 ships the endpoint shape only. Cycle 3 will return the live replication_lag_seconds from the CNPG sidecar metrics. Until then, use pg_stat_replication.lag_seconds directly via a psql connection to the primary if you need the value urgently.

Implementation pointers

Concern File
AddonConfig read_replicas field crates/build/src/paas_toml.rs::AddonConfig
CNPG palette predicate crates/database/src/cnpg.rs::plan_supports_read_replicas
Instance count computation crates/database/src/cnpg.rs::cnpg_instances_for_replicas
Cluster Patch::Apply wrapper crates/database/src/cnpg.rs::ensure_cnpg_cluster_with_replicas
RO URL formatter crates/database/src/cnpg.rs::database_url_ro
RO URL derive from RW crates/database/src/cnpg.rs::database_url_ro_from_rw
RO Secret materialiser crates/database/src/cnpg.rs::ensure_database_url_ro_secret
Replication lag parser crates/database/src/cnpg.rs::parse_replication_lag
PATCH route handler crates/control-plane/src/routes/addons.rs::patch_addon
Validator crates/control-plane/src/routes/addons.rs::validate_plan_supports_read_replicas
Metrics endpoint crates/control-plane/src/routes/addons.rs::get_addon_replication_metrics