Postgres Read Replicas¶
paas's managed Postgres addon (CNPG) supports asynchronous read replicas to offload read-heavy traffic from the writable primary. Cycle 2 ships the API + K8s wiring; the dashboard's "Read replicas" pane and the connection-pooling guides land in this page.
At a glance¶
| Capability | Default | Where to configure |
|---|---|---|
| Backing operator | CNPG (postgresql.cnpg.io/v1::Cluster) |
inherited from AD/39 postgres addon |
| Replica topology | 1 primary + N standbys (async streaming) | read_replicas field |
| Connection routing | DATABASE_URL (writes) / DATABASE_URL_RO (reads) |
dual env vars injected via Secrets |
| Replication mode | Async streaming via WAL | CNPG default |
| Plans supporting replicas | business (1 standby) / enterprise (≥1 standby) |
enforced by validate_plan_supports_read_replicas |
| Plans NOT supporting replicas | starter (single-instance) |
rejected with HTTP 400 |
CNPG palette is starter / business / enterprise
NOT free / standard / pro. The latter is the
AddonRequest display palette (consumed by the create
endpoint validator). The CNPG-side palette is what
app_addons.plan carries forward into
validate_plan_supports_read_replicas. Only business
and enterprise ship at least one standby instance — the
plans where read replicas are physically possible.
Lifecycle¶
flowchart LR
A["PATCH /v1/apps/{id}/addons/{addon_id}<br/>{read_replicas: 2}"]
A --> B["addons.rs::patch_addon"]
B --> V{"plan ∈ business/enterprise?"}
V -- no --> X["400 Bad Request"]
V -- yes --> C["cnpg::ensure_cnpg_cluster_with_replicas<br/>instances = max(plan.instances, 1+N)"]
C --> D["CNPG Operator scales StatefulSet<br/>+ emits {cluster}-ro Service"]
D --> E["cnpg::ensure_database_url_ro_secret<br/>app-{app}-database-url-ro"]
E --> F["state.config.set_config_vars<br/>DATABASE_URL_RO_SECRET = ..."]
F --> G["Next deploy: pod envFrom both<br/>DATABASE_URL + DATABASE_URL_RO"]
Plans¶
| CNPG plan | Instances | Standby slots | Replica capacity |
|---|---|---|---|
starter |
1 | 0 | 0 (rejected) |
business |
2 | 1 | 1 (no scale-up; same pod doubles as standby + replica target) |
enterprise |
3 | 2 | up to 2 (or more — ensure_cnpg_cluster_with_replicas scales out: instances = max(plan.instances, 1+N)) |
The max(...) invariant means the plan's HA floor wins:
an enterprise cluster (3 instances) with read_replicas: 0
keeps its 3 instances. No downscale below the plan's HA floor —
it would silently break the plan's availability guarantees.
API surface¶
PATCH /v1/apps/{app_id}/addons/{addon_id}
Body: { "read_replicas": 2 }
→ 200 OK
{
"data": {
"app_id": "...",
"addon_id": "...",
"read_replicas": 2,
"total_instances": 3,
"status": "updating"
}
}
→ 400 Bad Request
{ "error": { "code": "bad_request",
"message": "read_replicas requires plan=business or enterprise (CNPG palette, got plan=starter)" } }
GET /v1/apps/{app_id}/addons/{addon_id}/metrics
→ 200 OK
{
"data": {
"app_id": "...",
"addon_id": "...",
"replication_lag_seconds": null,
"note": "cycle-2-stub: real Prometheus scrape wired in cycle 3 via parse_replication_lag"
}
}
The replication_lag_seconds is null in cycle 2 — the
endpoint shape is final but the live Prometheus scrape lands
in cycle 3 via paas_database::cnpg::parse_replication_lag.
Connection routing in your application¶
The Postgres addon now injects two environment variables into the app pod:
DATABASE_URL— points at{cluster}-rw(writable primary). Use forINSERT/UPDATE/DELETE.DATABASE_URL_RO— points at{cluster}-ro(CNPG-managed round-robin Service across the standbys). Use forSELECTon data that tolerates async replication lag.
Standby write attempts are rejected at the Postgres protocol level — even if your client misconfigures the URL, write queries get an explicit error rather than silent corruption.
Django¶
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'OPTIONS': dict(host=parsed_url.hostname, ...),
...
},
'replica': {
'ENGINE': 'django.db.backends.postgresql',
'OPTIONS': dict(host=parsed_ro.hostname, ...),
...
},
}
DATABASE_ROUTERS = ['myapp.routers.PrimaryReplicaRouter']
# myapp/routers.py
class PrimaryReplicaRouter:
def db_for_read(self, model, **hints):
return 'replica'
def db_for_write(self, model, **hints):
return 'default'
SQLAlchemy¶
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
write_engine = create_engine(os.environ['DATABASE_URL'])
read_engine = create_engine(os.environ['DATABASE_URL_RO'])
WriteSession = sessionmaker(bind=write_engine)
ReadSession = sessionmaker(bind=read_engine)
# In your view / handler:
def list_users():
with ReadSession() as s:
return s.query(User).all()
Node.js (pg-pool)¶
const { Pool } = require('pg');
const writePool = new Pool({ connectionString: process.env.DATABASE_URL });
const readPool = new Pool({ connectionString: process.env.DATABASE_URL_RO });
// Use readPool for SELECT-only paths, writePool for everything else.
Rust (sqlx)¶
let write_pool = sqlx::PgPool::connect(&std::env::var("DATABASE_URL")?).await?;
let read_pool = sqlx::PgPool::connect(&std::env::var("DATABASE_URL_RO")?).await?;
Replication lag — what it means¶
CNPG runs asynchronous streaming replication. Standbys
apply WAL records emitted by the primary with some delay —
typically sub-second on a healthy network, but a slow standby
or a write spike can push lag into seconds. Read-only queries
hitting DATABASE_URL_RO may see slightly stale data.
If your code path requires read-your-own-writes consistency
(e.g. "user just submitted a form, immediately load the
created row"), use DATABASE_URL (the writable primary) for
that path. Async replication is not a replacement for strong
consistency — it's a load-spreading optimisation for
read-heavy paths.
Limits (cycle 2)¶
-
Downscale: setting
read_replicasbelow the current count is out of scope cycle 2 — the wire accepts the request but the K8sPatch::Applywill trigger a rolling-down of standbys without a pre-flight safety check (no warning if the standby is the only one). Cycle 3 adds a--forceflag plus a confirmation prompt in the dashboard. -
replication_lag_seconds:nullin cycle 2 (the metrics endpoint shape is final; the scrape wire lands cycle 3 viaparse_replication_lag). -
Cross-AZ routing: the
-roService is round-robin across all standbys in the cluster (no AZ affinity). Cycle 3+ may add aclosest-standbymode if cross-AZ latency becomes a real issue. -
Connection pooling (PgBouncer): out of scope. Tenants who need tight pooling can run their own sidecar.
FAQ¶
Q: Does enterprise automatically have read replicas?
A: Yes — enterprise ships 3 instances out of the box (1
primary + 2 standbys). The standbys are queryable via
DATABASE_URL_RO from day one without a PATCH. The
PATCH read_replicas flow is for scaling beyond the
plan's default count (e.g. enterprise + read_replicas: 5 →
6 instances total).
Q: Can business have 2 read replicas?
A: Yes — PATCH read_replicas: 2 on business sets
instances = max(2, 1+2) = 3, so the cluster scales from 2
to 3 instances (1 primary + 2 standbys).
Q: What happens to DATABASE_URL_RO if I disable read
replicas (read_replicas: 0)?
A: The Secret stays in place (the -ro Service still exists
on business/enterprise because the plan ships ≥ 1
standby). The URL keeps working but with reduced redundancy.
Q: How do I detect replication lag programmatically?
A: Cycle 2 ships the endpoint shape only. Cycle 3 will return
the live replication_lag_seconds from the CNPG sidecar
metrics. Until then, use pg_stat_replication.lag_seconds
directly via a psql connection to the primary if you need
the value urgently.
Implementation pointers¶
| Concern | File |
|---|---|
AddonConfig read_replicas field |
crates/build/src/paas_toml.rs::AddonConfig |
| CNPG palette predicate | crates/database/src/cnpg.rs::plan_supports_read_replicas |
| Instance count computation | crates/database/src/cnpg.rs::cnpg_instances_for_replicas |
| Cluster Patch::Apply wrapper | crates/database/src/cnpg.rs::ensure_cnpg_cluster_with_replicas |
| RO URL formatter | crates/database/src/cnpg.rs::database_url_ro |
| RO URL derive from RW | crates/database/src/cnpg.rs::database_url_ro_from_rw |
| RO Secret materialiser | crates/database/src/cnpg.rs::ensure_database_url_ro_secret |
| Replication lag parser | crates/database/src/cnpg.rs::parse_replication_lag |
| PATCH route handler | crates/control-plane/src/routes/addons.rs::patch_addon |
| Validator | crates/control-plane/src/routes/addons.rs::validate_plan_supports_read_replicas |
| Metrics endpoint | crates/control-plane/src/routes/addons.rs::get_addon_replication_metrics |
Related concepts¶
- Postgres Addon — base CNPG addon flow this extends.
- PITR Backup — sister CNPG capability (continuous WAL archiving).
- Add-ons — umbrella addon flow.