PITR Backup¶

CloudNative-PG (CNPG) handles automatic backups for the Postgres addon. Two tiers ship today, mapped from the addon plan:

Plan	Backup tier	Retention	WAL streaming
`free`	`none`	—	—
`standard`	`daily`	7 days	scheduled base backup only
`pro`	`pitr`	35 days	continuous (point-in-time recovery)

The tier is rendered as a spec.backup stanza on the CNPG Cluster CR — barman handles the base/WAL upload to the operator-provided S3 bucket; cert-manager-style operator involvement is zero per tenant.

How it works¶

sequenceDiagram
  participant U as Operator
  participant CP as Control Plane
  participant K as Kubernetes
  participant CNPG as CNPG operator
  participant Barman as barman-cloud
  participant S3 as OVH S3

  Note over CP: cycle 3 wire-up — for now the spec is built but<br/>create_addon_generic doesn't yet pass it through.
  CP->>CP: build_cluster_spec_full(version, plan, "pitr", tenant, app)
  CP->>K: create Cluster CR with spec.backup stanza
  K->>CNPG: reconciles Cluster
  loop continuous WAL
    CNPG->>Barman: stream WAL segment
    Barman->>S3: upload to s3://paas-pg-backups/{tenant}/{app}/wals/
  end
  loop daily / scheduled
    CNPG->>Barman: trigger base backup
    Barman->>S3: upload to s3://paas-pg-backups/{tenant}/{app}/base/
  end

  U->>CP: POST /v1/apps/{id}/addons/{addon}/restore { timestamp }
  CP->>CP: validate ISO 8601 timestamp
  CP->>CP: audit_repository::log_event("addon.restore_pitr", …)
  CP-->>U: { restore_id, status: "restore_triggered", target: "new_cluster" }
  Note over CP,K: cycle 3 wire-up — calls create_recovery_cluster<br/>which spawns a NEW Cluster CR with spec.bootstrap.recovery

API endpoints¶

Verb	Path	Body	Notes
GET	`/v1/apps/{app_id}/addons/{addon_id}/backups`	—	barman base + WAL backups, projected from CNPG `Backup` CRs
POST	`/v1/apps/{app_id}/addons/{addon_id}/restore`	`{ "timestamp": "ISO 8601", "target"?: "new_cluster" }`	fires `addon.restore_pitr` audit event, returns `restore_id`

The pre-AE/44 flat stubs at /v1/addons/{addon_id}/backups and /v1/addons/{addon_id}/restore stay wired for backwards-compat but the dashboard targets the nested paths.

Restore example¶

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"timestamp":"2026-05-04T15:30:00Z"}' \
  https://runtime.di2amp.com/api/v1/apps/$APP/addons/$ADDON/restore

Response:

{
  "data": {
    "restore_id": "0bd91…",
    "app_id":     "…",
    "addon_id":   "db-acme-app-1",
    "status":     "restore_triggered",
    "timestamp":  "2026-05-04T15:30:00Z",
    "target":     "new_cluster"
  }
}

target	Effect
`new_cluster` (default)	spawns a new CNPG Cluster alongside the source via `spec.bootstrap.recovery`; the operator promotes manually after verification
`in_place`	reserved — not implemented in cycle 2; the validate path accepts it for forward-compat

Backup spec on the Cluster CR¶

The spec.backup stanza emitted by paas_database::cnpg::backup_config_for_plan(tier, tenant, app):

spec:
  backup:
    barmanObjectStore:
      destinationPath: "s3://paas-pg-backups/{tenant}/{app}"
      endpointURL:     "https://s3.gra.io.cloud.ovh.net"
      s3Credentials:
        accessKeyId:     { name: ovh-s3-creds, key: AWS_ACCESS_KEY_ID }
        secretAccessKey: { name: ovh-s3-creds, key: AWS_SECRET_ACCESS_KEY }
      wal:
        compression:    gzip
        archivingMode:  continuous   # only on `pitr`
    retentionPolicy: 35d              # 7d on daily, 35d on pitr

Naming pinned by tests: - destination path is per-tenant per-app (no shared bucket prefix across tenants), - credentials live in a single ovh-s3-creds Secret managed by the platform operator (NOT auto-provisioned per tenant), - compression is always gzip.

Recovery cluster spec¶

The recovery path emits a Cluster with spec.bootstrap.recovery.recoveryTarget.targetTime and an externalClusters[] entry pointing at the source's S3 archive:

spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage:   { size: 50Gi }
  bootstrap:
    recovery:
      source: db-acme-app-1
      recoveryTarget:
        targetTime: "2026-05-04T15:30:00Z"
  externalClusters:
    - name: db-acme-app-1
      barmanObjectStore:
        destinationPath: "s3://paas-pg-backups/db-acme-app-1"
        endpointURL:     "https://s3.gra.io.cloud.ovh.net"
        s3Credentials:   …

The new Cluster runs alongside the source — destructive in-place restore is intentionally not supported in cycle 2 (a botched recovery should never destroy live data).

Audit trail¶

Every POST /restore writes a row to paas_audit_events (action addon.restore_pitr) with details_jsonb carrying the app_id, addon_id, and timestamp. The fire-and-forget call doesn't block the response — a transient DB hiccup never fails the restore start.

SELECT created_at, target_id, details_jsonb
FROM paas_audit_events
WHERE action = 'addon.restore_pitr'
ORDER BY created_at DESC
LIMIT 10;

Operator setup¶

Backups need an S3 bucket and a Kubernetes Secret with the OVH credentials. One-time runbook (the platform doesn't auto-provision the bucket — operator-owned):

# 1. Create the OVH S3 bucket (or any S3-compatible endpoint).
ovhai bucket create paas-pg-backups

# 2. Generate AWS-flavored credentials.
ovhai credentials create --bucket paas-pg-backups

# 3. Plant the Secret in the addon namespace (paas-apps).
kubectl create secret generic ovh-s3-creds \
  --namespace paas-apps \
  --from-literal=AWS_ACCESS_KEY_ID=… \
  --from-literal=AWS_SECRET_ACCESS_KEY=…

Until step 3 is done, CNPG will emit BackupFailed events on every scheduled run — the dashboard's backup list shows status: "failed" and the operator gets an actionable error.

Failure modes¶

Symptom	Likely cause	Recovery
`status: "failed"` on every backup	`ovh-s3-creds` Secret missing or wrong	re-run the operator runbook above
Restore returns `restore_id` but no Cluster appears	wire-up not landed yet (cycle 3)	expected — the audit row is the proof the request was logged
`BackupFailed: cannot reach endpoint`	network policy denies egress to S3	add an `allow-https-egress` exception for the S3 endpoint CIDR

Phase 2¶

Wire build_cluster_spec_full into create_addon_generic — cycle 3 (44f) routes the addon plan's backup field through to the Cluster CR so a pro plan actually emits the backup stanza.
Wire create_recovery_cluster into restore_addon_pitr — cycle 3 calls the kube helper after the audit log fires.
In-place restore (target: "in_place") — destructive but faster; lands when the platform has a "blue-green" deploy story for addons.

Postgres Addon — the lifecycle this hangs off
Add-ons — the unified catalogue
crates/database/src/cnpg.rs::backup_config_for_plan — the pure builder pinned by 4 tests