Rolling Deploy

PaaS Runtime ships every deploy as a Kubernetes RollingUpdate with maxSurge: 1 and maxUnavailable: 0 — one extra pod comes up before the old one is torn down, so single-replica apps keep serving traffic through the roll.

How it works

sequenceDiagram
  participant CP as Control Plane
  participant K as Kubernetes
  participant LB as Service / Traefik
  participant U as User

  CP->>K: apply Deployment (image vN)
  K->>K: maxSurge +1 → spawn pod vN
  K->>K: readinessProbe GET / on :8080
  Note over K: pod vN ready → endpoints add
  LB->>U: route traffic to vN AND vN-1
  K->>K: pod vN-1 marked Terminating, endpoints remove
  K->>K: preStop sleep 5s (drain LB)
  K->>K: SIGTERM pod vN-1
  Note over K: terminationGracePeriod 30s (preStop included)
  K->>K: pod vN-1 terminated
  LB->>U: route traffic to vN only

A paas.io/deploy-id={uuid} label is stamped on the pod template at roll time, so operators can grep logs for the exact roll under investigation:

kubectl logs -n paas-system -l paas.io/deploy-id=abc-123-def --tail=200

Zero-downtime guarantee

| Knob | Value | Effect |
| --- | --- | --- |
| maxSurge | 1 | Spawn one extra pod before tearing down the old one |
| maxUnavailable | 0 | Never drop below the desired replica count |
| readinessProbe | GET / :8080 (5s/10s/3 fails) | LB only sends traffic when the new pod actually serves |
| preStop | sleep 5 | Existing connections drain before SIGTERM |
| terminationGracePeriodSeconds | 30 | App has 25s of graceful shutdown after the preStop sleep returns |

Together these make a single-replica deploy survive a roll without dropping a single in-flight request — the LB sees the new pod ready before it removes the old one from its endpoint set.
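
The interaction of maxSurge and maxUnavailable can be checked with a small simulation. This is a simplified sketch, not the Kubernetes controller's actual algorithm: it assumes every new pod becomes ready immediately, and the names `simulate_roll`, `RollState`, and `graceful_shutdown_budget` are hypothetical helpers introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class RollState:
    old_ready: int  # pods still running image vN-1
    new_ready: int  # pods running image vN

    @property
    def total(self) -> int:
        return self.old_ready + self.new_ready


def simulate_roll(replicas: int, max_surge: int = 1, max_unavailable: int = 0) -> list[RollState]:
    """Return the (old, new) ready-pod counts after each step of a roll.

    Simplified model: a new pod is ready as soon as it is created.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    states = [RollState(old, new)]
    while old > 0 or new < replicas:
        # Scale up: total pods may not exceed replicas + max_surge.
        add = min(replicas - new, replicas + max_surge - (old + new))
        if add > 0:
            new += add
            states.append(RollState(old, new))
        # Scale down: ready pods may not fall below replicas - max_unavailable.
        remove = min(old, (old + new) - (replicas - max_unavailable))
        if remove > 0:
            old -= remove
            states.append(RollState(old, new))
        if add <= 0 and remove <= 0:
            raise RuntimeError("roll cannot make progress")
    return states


def graceful_shutdown_budget(grace_period_s: int = 30, pre_stop_s: int = 5) -> int:
    """Seconds left for the app once the preStop sleep has returned."""
    return grace_period_s - pre_stop_s


if __name__ == "__main__":
    for state in simulate_roll(replicas=1):
        print(state)
    print("shutdown budget:", graceful_shutdown_budget(), "s")
```

With the defaults from the table, a single-replica app goes from one old pod, to one old plus one new, to one new pod, so the ready count never drops below 1.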

Configure via paas.toml

Default values match the podspec.rs hard-coded strategy, so an empty [deploy] section is a no-op:

[deploy]
surge = 1
max_unavailable = 0
timeout = 600                   # progress deadline, seconds
readiness_probe_path = "/healthz"   # overrides the default "/"

| Field | Default | Meaning |
| --- | --- | --- |
| surge | 1 | maxSurge for the RollingUpdate |
| max_unavailable | 0 | maxUnavailable |
| timeout | 600 | Seconds before the rollout is marked Timeout |
| readiness_probe_path | "/" | HTTP path the readiness probe hits |

A higher surge (2 or more) speeds up multi-replica rolls but needs more spare capacity on the node, so keep it at 1 unless that headroom exists.

Rollout status API

The dashboard's RolloutProgress component polls this endpoint every 3s while a roll is in flight and stops as soon as the status is terminal:

curl -sf -H "Authorization: Bearer $TOKEN" \
  https://runtime.di2amp.com/api/v1/apps/$APP_ID/deploys/latest \
  | jq '{ id, status, rollout_status, replicas }'

Sample response while a roll is in progress:

{
  "id": "deploy-7b2c…",
  "status": "deploying",
  "rollout_status": "InProgress",
  "replicas": { "desired": 3, "ready": 1, "available": 1 }
}

| status | Meaning |
| --- | --- |
| queued | Deploy created, waiting for build to finish |
| building | Tekton PipelineRun in progress |
| deploying | Image pushed, RollingUpdate in flight |
| ready | All pods ready, rollout complete |
| failed | Build failed, image push failed, or rollout failed |
| timeout | Rollout still in progress after timeout seconds |
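
A script can follow the same poll-until-terminal pattern as the dashboard. The sketch below is a minimal illustration, not the dashboard's code: `fetch_latest_deploy` and `wait_for_rollout` are hypothetical names, and treating `ready`, `failed`, and `timeout` as the terminal statuses is an inference from the table above.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Assumption: these three statuses are terminal, inferred from the status table.
TERMINAL_STATUSES = {"ready", "failed", "timeout"}


def fetch_latest_deploy(base_url: str, app_id: str, token: str) -> dict:
    """GET the latest deploy of an app, mirroring the curl call above."""
    request = urllib.request.Request(
        f"{base_url}/api/v1/apps/{app_id}/deploys/latest",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def wait_for_rollout(
    fetch: Callable[[], dict],
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> dict:
    """Call fetch() every `interval` seconds until the status is terminal."""
    polls = 0
    while True:
        deploy = fetch()
        polls += 1
        if deploy["status"] in TERMINAL_STATUSES:
            return deploy
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"still {deploy['status']} after {polls} polls")
        sleep(interval)
```

Passing the fetcher in as a callable keeps the loop testable without network access; a real call might look like `wait_for_rollout(lambda: fetch_latest_deploy("https://runtime.di2amp.com", app_id, token))`.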

Rollout timeout

When the rollout doesn't reach Completed within the [deploy] timeout set in paas.toml (default 600s), the control plane marks the deploy timeout and stops polling. It does not touch the pods, so an operator can inspect the Kubernetes events (kubectl describe deployment …) to see whether the cause was a stuck image pull, a failing readiness probe, or resource pressure.
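
The three causes named above usually show up as distinct event reasons, so a first-pass triage can be scripted over the output of `kubectl get events -o json`. This is a rough sketch under assumptions: the reason and message patterns are the ones stock Kubernetes components commonly emit, the mapping to causes is illustrative rather than exhaustive, and `classify_event` and `summarize_events` are hypothetical helpers.

```python
import json
from collections import Counter
from typing import Optional


def classify_event(reason: str, message: str) -> Optional[str]:
    """Map one Kubernetes event to a likely rollout-timeout cause, if any."""
    text = message.lower()
    # Kubelet reports pull problems as Failed / BackOff with "pull" in the message.
    if reason in {"Failed", "BackOff"} and "pull" in text:
        return "stuck image pull"
    # Probe failures are reported with reason Unhealthy.
    if reason == "Unhealthy" and "readiness probe" in text:
        return "failing readiness probe"
    # Unschedulable pods and node-pressure evictions.
    if reason in {"FailedScheduling", "Evicted"}:
        return "resource pressure"
    return None


def summarize_events(events_json: str) -> Counter:
    """Count likely causes across an event list in `kubectl -o json` shape."""
    causes = Counter()
    for event in json.loads(events_json).get("items", []):
        cause = classify_event(event.get("reason", ""), event.get("message", ""))
        if cause:
            # `count` may be missing or null on some events; treat that as 1.
            causes[cause] += event.get("count") or 1
    return causes
```

The result is only a hint about where to look first; the event messages themselves remain the source of truth.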

A future Phase 2 enhancement may auto-rollback on timeout; today the operator runs the rollback explicitly.

Filter pods by deploy

The paas.io/deploy-id label is the cleanest cross-section for diagnostics:

# All pods of a specific roll
kubectl get pod -l paas.io/deploy-id=$DEPLOY_ID

# Logs of every container of every pod in this roll
kubectl logs -l paas.io/deploy-id=$DEPLOY_ID --all-containers --tail=500

# Compare CPU between the new and old roll mid-flight
kubectl top pod -l paas.io/app=$APP_ID --no-headers \
  | sort -k2 -h
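
The `kubectl top` output above does not show which roll each pod belongs to. One way to get a per-roll comparison is to join it with the pod labels, as in the sketch below. It assumes the pod list comes from `kubectl get pod -l paas.io/app=$APP_ID -o json` and the usage lines from `kubectl top pod -l paas.io/app=$APP_ID --no-headers`; `parse_millicores` and `cpu_by_deploy` are hypothetical helper names.

```python
import json
from collections import defaultdict

DEPLOY_LABEL = "paas.io/deploy-id"


def parse_millicores(cpu: str) -> int:
    """Convert a CPU value such as '250m' or '2' to millicores."""
    if cpu.endswith("m"):
        return int(cpu[:-1])
    return int(float(cpu) * 1000)


def cpu_by_deploy(pods_json: str, top_output: str) -> dict[str, int]:
    """Sum CPU usage (millicores) per deploy-id label value."""
    pods = json.loads(pods_json)["items"]
    # Map each pod name to the roll it belongs to.
    deploy_of = {
        pod["metadata"]["name"]: pod["metadata"].get("labels", {}).get(DEPLOY_LABEL, "unlabelled")
        for pod in pods
    }
    totals: dict[str, int] = defaultdict(int)
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, cpu = fields[0], fields[1]
        totals[deploy_of.get(name, "unknown")] += parse_millicores(cpu)
    return dict(totals)
```

Pods that appear in the usage output but not in the pod list are grouped under "unknown", which can happen when a pod terminates between the two kubectl calls.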