Skip to content

PITR Backup

CloudNative-PG (CNPG) handles automatic backups for the Postgres addon. Two tiers ship today, mapped from the addon plan:

Plan Backup tier Retention WAL streaming
free none
standard daily 7 days scheduled base backup only
pro pitr 35 days continuous (point-in-time recovery)

The tier is rendered as a spec.backup stanza on the CNPG Cluster CR — barman handles the base/WAL upload to the operator-provided S3 bucket; cert-manager-style operator involvement is zero per tenant.

How it works

sequenceDiagram
  participant U as Operator
  participant CP as Control Plane
  participant K as Kubernetes
  participant CNPG as CNPG operator
  participant Barman as barman-cloud
  participant S3 as OVH S3

  Note over CP: cycle 3 wire-up — for now the spec is built but<br/>create_addon_generic doesn't yet pass it through.
  CP->>CP: build_cluster_spec_full(version, plan, "pitr", tenant, app)
  CP->>K: create Cluster CR with spec.backup stanza
  K->>CNPG: reconciles Cluster
  loop continuous WAL
    CNPG->>Barman: stream WAL segment
    Barman->>S3: upload to s3://paas-pg-backups/{tenant}/{app}/wals/
  end
  loop daily / scheduled
    CNPG->>Barman: trigger base backup
    Barman->>S3: upload to s3://paas-pg-backups/{tenant}/{app}/base/
  end

  U->>CP: POST /v1/apps/{id}/addons/{addon}/restore { timestamp }
  CP->>CP: validate ISO 8601 timestamp
  CP->>CP: audit_repository::log_event("addon.restore_pitr", …)
  CP-->>U: { restore_id, status: "restore_triggered", target: "new_cluster" }
  Note over CP,K: cycle 3 wire-up — calls create_recovery_cluster<br/>which spawns a NEW Cluster CR with spec.bootstrap.recovery

API endpoints

Verb Path Body Notes
GET /v1/apps/{app_id}/addons/{addon_id}/backups barman base + WAL backups, projected from CNPG Backup CRs
POST /v1/apps/{app_id}/addons/{addon_id}/restore { "timestamp": "ISO 8601", "target"?: "new_cluster" } fires addon.restore_pitr audit event, returns restore_id

The pre-AE/44 flat stubs at /v1/addons/{addon_id}/backups and /v1/addons/{addon_id}/restore stay wired for backwards-compat but the dashboard targets the nested paths.

Restore example

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"timestamp":"2026-05-04T15:30:00Z"}' \
  https://runtime.di2amp.com/api/v1/apps/$APP/addons/$ADDON/restore

Response:

{
  "data": {
    "restore_id": "0bd91…",
    "app_id":     "…",
    "addon_id":   "db-acme-app-1",
    "status":     "restore_triggered",
    "timestamp":  "2026-05-04T15:30:00Z",
    "target":     "new_cluster"
  }
}
target Effect
new_cluster (default) spawns a new CNPG Cluster alongside the source via spec.bootstrap.recovery; the operator promotes manually after verification
in_place reserved — not implemented in cycle 2; the validate path accepts it for forward-compat

Backup spec on the Cluster CR

The spec.backup stanza emitted by paas_database::cnpg::backup_config_for_plan(tier, tenant, app):

spec:
  backup:
    barmanObjectStore:
      destinationPath: "s3://paas-pg-backups/{tenant}/{app}"
      endpointURL:     "https://s3.gra.io.cloud.ovh.net"
      s3Credentials:
        accessKeyId:     { name: ovh-s3-creds, key: AWS_ACCESS_KEY_ID }
        secretAccessKey: { name: ovh-s3-creds, key: AWS_SECRET_ACCESS_KEY }
      wal:
        compression:    gzip
        archivingMode:  continuous   # only on `pitr`
    retentionPolicy: 35d              # 7d on daily, 35d on pitr

Naming pinned by tests: - destination path is per-tenant per-app (no shared bucket prefix across tenants), - credentials live in a single ovh-s3-creds Secret managed by the platform operator (NOT auto-provisioned per tenant), - compression is always gzip.

Recovery cluster spec

The recovery path emits a Cluster with spec.bootstrap.recovery.recoveryTarget.targetTime and an externalClusters[] entry pointing at the source's S3 archive:

spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage:   { size: 50Gi }
  bootstrap:
    recovery:
      source: db-acme-app-1
      recoveryTarget:
        targetTime: "2026-05-04T15:30:00Z"
  externalClusters:
    - name: db-acme-app-1
      barmanObjectStore:
        destinationPath: "s3://paas-pg-backups/db-acme-app-1"
        endpointURL:     "https://s3.gra.io.cloud.ovh.net"
        s3Credentials:   

The new Cluster runs alongside the source — destructive in-place restore is intentionally not supported in cycle 2 (a botched recovery should never destroy live data).

Audit trail

Every POST /restore writes a row to paas_audit_events (action addon.restore_pitr) with details_jsonb carrying the app_id, addon_id, and timestamp. The fire-and-forget call doesn't block the response — a transient DB hiccup never fails the restore start.

SELECT created_at, target_id, details_jsonb
FROM paas_audit_events
WHERE action = 'addon.restore_pitr'
ORDER BY created_at DESC
LIMIT 10;

Operator setup

Backups need an S3 bucket and a Kubernetes Secret with the OVH credentials. One-time runbook (the platform doesn't auto-provision the bucket — operator-owned):

# 1. Create the OVH S3 bucket (or any S3-compatible endpoint).
ovhai bucket create paas-pg-backups

# 2. Generate AWS-flavored credentials.
ovhai credentials create --bucket paas-pg-backups

# 3. Plant the Secret in the addon namespace (paas-apps).
kubectl create secret generic ovh-s3-creds \
  --namespace paas-apps \
  --from-literal=AWS_ACCESS_KEY_ID= \
  --from-literal=AWS_SECRET_ACCESS_KEY=

Until step 3 is done, CNPG will emit BackupFailed events on every scheduled run — the dashboard's backup list shows status: "failed" and the operator gets an actionable error.

Failure modes

Symptom Likely cause Recovery
status: "failed" on every backup ovh-s3-creds Secret missing or wrong re-run the operator runbook above
Restore returns restore_id but no Cluster appears wire-up not landed yet (cycle 3) expected — the audit row is the proof the request was logged
BackupFailed: cannot reach endpoint network policy denies egress to S3 add an allow-https-egress exception for the S3 endpoint CIDR

Phase 2

  • Wire build_cluster_spec_full into create_addon_generic — cycle 3 (44f) routes the addon plan's backup field through to the Cluster CR so a pro plan actually emits the backup stanza.
  • Wire create_recovery_cluster into restore_addon_pitr — cycle 3 calls the kube helper after the audit log fires.
  • In-place restore (target: "in_place") — destructive but faster; lands when the platform has a "blue-green" deploy story for addons.
  • Postgres Addon — the lifecycle this hangs off
  • Add-ons — the unified catalogue
  • crates/database/src/cnpg.rs::backup_config_for_plan — the pure builder pinned by 4 tests