TLS Auto
Every custom domain attached to an app gets an HTTPS certificate
automatically. There's no operator step beyond pointing DNS at the
platform — cert-manager runs the ACME HTTP-01 challenge against
Let's Encrypt, traefik picks up the issued secret, and the next
TLS handshake on https://your-domain serves the new certificate.
How it works
```mermaid
sequenceDiagram
    participant CP as Control Plane
    participant T as Traffic Service
    participant CM as cert-manager
    participant LE as Let's Encrypt (ACMEv2)
    participant K as Kubernetes
    participant Tr as Traefik
    CP->>T: add_custom_domain(app, domain)
    T->>K: patch IngressRoute (add host)
    T->>K: ensure_certificate(name, dnsNames=[domain], secret=tls-…)
    K->>CM: Certificate CR (issuerRef=letsencrypt-prod)
    CM->>LE: order, HTTP-01 challenge
    LE->>Tr: GET /.well-known/acme-challenge/<token>
    Tr-->>LE: <validation>
    LE-->>CM: certificate issued
    CM->>K: Secret tls-… (cert + key)
    Note over Tr: traefik watches Secret, hot-reloads TLS
    CP->>CM: get Certificate.status.conditions[Ready]
    CP->>CP: parse_cert_status → "issued"
    CP-->>operator: GET /domains/{domain} → tls_status="issued"
```
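The LE → traefik round trip in the diagram is the ACME HTTP-01 exchange from RFC 8555 §8.3: Let's Encrypt fetches http://<domain>/.well-known/acme-challenge/<token> and expects the key authorization, which is the token joined by "." to the ACME account key's thumbprint. In practice cert-manager answers from a short-lived solver pod that traefik routes to. The sketch below only illustrates the lookup; Http01Solver, add and respond are hypothetical names, and the thumbprint is assumed to be precomputed.

```rust
use std::collections::HashMap;

/// Path prefix fixed by RFC 8555 §8.3 for HTTP-01 challenges.
const CHALLENGE_PREFIX: &str = "/.well-known/acme-challenge/";

/// Hypothetical in-memory solver: maps ACME tokens to key authorizations.
struct Http01Solver {
    // token -> key authorization ("<token>.<account-key thumbprint>")
    pending: HashMap<String, String>,
}

impl Http01Solver {
    fn new() -> Self {
        Self { pending: HashMap::new() }
    }

    /// Register a challenge. The thumbprint (base64url SHA-256 JWK thumbprint
    /// of the account key, RFC 7638) is assumed to be computed elsewhere.
    fn add(&mut self, token: &str, thumbprint: &str) {
        self.pending
            .insert(token.to_string(), format!("{token}.{thumbprint}"));
    }

    /// Answer `GET <path>`: Some(body) for a known token, None (a 404) otherwise.
    fn respond(&self, path: &str) -> Option<&str> {
        let token = path.strip_prefix(CHALLENGE_PREFIX)?;
        self.pending.get(token).map(String::as_str)
    }
}
```

Paths outside the challenge prefix, and tokens the solver never registered, both fall through to None, which is what lets the CA distinguish a correctly routed domain from one that is not pointed at the platform.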
ClusterIssuer
deploy/k8s/cert-manager-clusterissuer-prod.yaml registers a
ClusterIssuer named letsencrypt-prod, and the platform code
hard-codes that name in
crates/traffic/src/tls.rs::CertificateSpec. Switching to a
different issuer (staging, internal CA, DNS-01 for wildcards)
means applying a new ClusterIssuer and tweaking that one constant.
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@di2amp.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: traefik
```
Live check after kubectl apply:
```console
$ kubectl get clusterissuer letsencrypt-prod \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
True
```
Certificate CR
tls::ensure_certificate(client, ns, name, dns_names, secret)
is idempotent: the custom-domain flow can call it on every
POST /domains without hitting 409 AlreadyExists. Existing
Certificates get their dnsNames updated via server-side apply
with field manager paas-control-plane, so ownership is visible
in managedFields:
```console
$ kubectl get certificate -n paas-apps cert-app-123-blog-example-com -o yaml
…
metadata:
  managedFields:
  - manager: paas-control-plane
    operation: Apply
  …
spec:
  secretName: tls-app-123-blog-example-com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - blog.example.com
```
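A minimal sketch of why repeated calls are safe, assuming an in-memory FakeApiServer in place of the real Kubernetes client: an apply-style upsert either creates the Certificate or overwrites its spec, so there is no AlreadyExists path to handle. Only letsencrypt-prod and paas-control-plane come from this page; the types, the (namespace, name) key, and the overwrite semantics for dnsNames are illustrative assumptions, not the tls.rs implementation.

```rust
use std::collections::BTreeMap;

// Values taken from this page; everything else below is hypothetical.
const CLUSTER_ISSUER: &str = "letsencrypt-prod";
const FIELD_MANAGER: &str = "paas-control-plane";

#[derive(Clone, Debug, PartialEq)]
struct CertificateSpec {
    secret_name: String,
    issuer_name: String, // issuerRef.name; issuerRef.kind is ClusterIssuer
    dns_names: Vec<String>,
}

/// Stand-in for the API server: (namespace, name) -> (field manager, spec).
#[derive(Default)]
struct FakeApiServer {
    certs: BTreeMap<(String, String), (String, CertificateSpec)>,
}

impl FakeApiServer {
    /// Apply-style upsert: creates the object if absent, replaces the spec
    /// if present. There is no AlreadyExists error, so retries are harmless.
    fn apply(&mut self, ns: &str, name: &str, manager: &str, spec: CertificateSpec) {
        self.certs
            .insert((ns.to_string(), name.to_string()), (manager.to_string(), spec));
    }
}

/// Hypothetical mirror of tls::ensure_certificate(client, ns, name, dns_names, secret).
fn ensure_certificate(
    api: &mut FakeApiServer,
    ns: &str,
    name: &str,
    dns_names: &[&str],
    secret: &str,
) {
    let spec = CertificateSpec {
        secret_name: secret.to_string(),
        issuer_name: CLUSTER_ISSUER.to_string(),
        dns_names: dns_names.iter().map(|d| d.to_string()).collect(),
    };
    api.apply(ns, name, FIELD_MANAGER, spec);
}
```

Calling ensure_certificate twice with the same arguments leaves exactly one object behind, tagged with the paas-control-plane manager, which is the property the custom-domain flow relies on.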
Naming convention (pinned by tests in tls.rs):
| Resource | Pattern |
|---|---|
| Certificate name | cert-{app_id}-{domain.replace('.', '-')} |
| Secret name | tls-{app_id}-{domain.replace('.', '-')} |
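The two patterns in the table can be expressed as a pair of helpers. This is a sketch assuming app_id is already the app-123-style string that appears in the kubectl example; the function names are hypothetical, and the pinned originals live in tls.rs.

```rust
/// Dots become dashes: "blog.example.com" -> "blog-example-com".
fn domain_slug(domain: &str) -> String {
    domain.replace('.', "-")
}

/// Certificate name: cert-{app_id}-{domain with '.' replaced by '-'}.
fn certificate_name(app_id: &str, domain: &str) -> String {
    format!("cert-{}-{}", app_id, domain_slug(domain))
}

/// Secret name: tls-{app_id}-{domain with '.' replaced by '-'}.
fn secret_name(app_id: &str, domain: &str) -> String {
    format!("tls-{}-{}", app_id, domain_slug(domain))
}
```

certificate_name("app-123", "blog.example.com") gives cert-app-123-blog-example-com, matching the object above. One side effect of the scheme is that it is not injective: a-b.example.com and a.b-example.com map to the same names within one app.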
Status vocabulary
tls::parse_cert_status projects cert-manager's
status.conditions[type=Ready] into a small operator-facing
vocabulary the dashboard renders:
| cert-manager condition | tls_status |
|---|---|
| Ready=True | issued |
| Ready=False | failed |
| missing / not yet | pending |
The projected value is surfaced through the domains API:
```console
$ curl -H "Authorization: Bearer $TOKEN" \
    https://runtime.di2amp.com/api/v1/apps/$APP/domains/$DOMAIN
{ "data": { "domain": "...", "tls_status": "issued", "dns_configured": true, … } }
```
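The projection in the status table can be sketched as a match on the Ready condition. This is an illustration rather than the tls.rs code: the Condition struct is a hypothetical slice of the real status object, and treating status "Unknown" as pending is an assumption the table does not spell out.

```rust
/// One entry of Certificate.status.conditions (only the fields used here).
struct Condition<'a> {
    type_: &'a str,  // e.g. "Ready", "Issuing"
    status: &'a str, // "True" | "False" | "Unknown"
}

/// Sketch of the Ready -> tls_status projection from the table.
fn parse_cert_status(conditions: &[Condition]) -> &'static str {
    match conditions.iter().find(|c| c.type_ == "Ready") {
        Some(c) if c.status == "True" => "issued",
        Some(c) if c.status == "False" => "failed",
        // No Ready condition yet, or status "Unknown": still in flight.
        _ => "pending",
    }
}
```

cert-manager can also report Ready=False while a brand-new certificate is still being issued, so a projection that needs to separate "still trying" from "gave up" may have to consult the condition's reason or the Issuing condition as well; this sketch does not.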
Renewal
cert-manager renews automatically 30 days before expiration
(its default for Let's Encrypt's 90-day certificates). Operators
don't touch this: traefik hot-reloads the new Secret, so the next
handshake picks up the fresh certificate without restarting any
pod. There is no distinct tls_status: "renewing" value today.
cert-manager does set an Issuing condition while it reissues, but
parse_cert_status only reads Ready, which stays True as long as the
old certificate is valid; the row just stays issued while a fresh
cert quietly replaces the old one.
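The 30-day figure is the arithmetic of cert-manager's default applied to a Let's Encrypt certificate: assuming the default documented for recent cert-manager releases (renew once one third of the lifetime remains, unless spec.renewBefore is set), a third of 90 days is 30. A sketch using plain Unix timestamps; renewal_time is a hypothetical helper, not a cert-manager API.

```rust
const DAY: u64 = 24 * 60 * 60;

/// When renewal is expected to start, as Unix seconds.
/// Assumes cert-manager's default of renewing once 1/3 of the
/// lifetime remains, unless an explicit renewBefore (seconds) is given.
fn renewal_time(not_before: u64, not_after: u64, renew_before: Option<u64>) -> u64 {
    let lifetime = not_after.saturating_sub(not_before);
    let before = renew_before.unwrap_or(lifetime / 3);
    not_after.saturating_sub(before)
}
```

For a certificate valid for exactly 90 days, renewal_time lands 30 days before not_after; passing Some(10 * DAY) moves it to 10 days before.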
Wildcard / DNS-01 (Phase 2)
HTTP-01 only validates exact hostnames — wildcard certs
(*.example.com) need DNS-01. deploy/k8s/cert-manager-clusterissuer-prod-dns.yaml
ships as a placeholder for an OVH-DNS DNS-01 ClusterIssuer; activation
needs the OVH API credentials in a Kubernetes Secret. See the
operator runbook in
bilans/ad32-summary.md for the bring-up steps.
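The split between the two challenge types, and where a DNS-01 ClusterIssuer would have to publish its TXT record, can be sketched as follows. Per RFC 8555 §8.4 the record is named _acme-challenge.<domain>, and for a wildcard the validated name is the base domain without the *. prefix. pick_challenge and dns01_record_name are hypothetical helpers; the OVH integration itself is not modelled.

```rust
/// ACME challenge types relevant to this page.
#[derive(Debug, PartialEq)]
enum Challenge {
    Http01,
    Dns01,
}

/// HTTP-01 (today's ClusterIssuer) for exact hostnames; wildcards can only
/// be validated with DNS-01.
fn pick_challenge(dns_name: &str) -> Challenge {
    if dns_name.starts_with("*.") {
        Challenge::Dns01
    } else {
        Challenge::Http01
    }
}

/// TXT record name for DNS-01 (RFC 8555 §8.4), with any "*." prefix removed.
fn dns01_record_name(dns_name: &str) -> String {
    let base = dns_name.strip_prefix("*.").unwrap_or(dns_name);
    format!("_acme-challenge.{base}")
}
```

Both *.example.com and example.com validate through the same _acme-challenge.example.com record, which is why a single DNS-01 issuer can cover an apex and its wildcard together.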
Failures and recovery
| Symptom | Likely cause | Recovery |
|---|---|---|
| tls_status: "pending" for >5 min | DNS not propagated | check the CNAME with dig, then re-call POST /verify |
| tls_status: "failed" | rate limit, invalid domain, or the HTTP-01 challenge couldn't reach traefik | kubectl describe certificate ..., fix the underlying issue, then kubectl delete certificate ... and re-call POST /domains so the idempotent ensure_certificate re-creates it |
| tls_status: "issued" but the browser still shows the old cert | traefik hasn't reloaded the Secret yet, or the browser is reusing an existing TLS connection | give traefik ~30 s to pick up the new Secret, then open a fresh connection |
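For scripts that wait on a new domain, the first two rows of the table suggest a simple polling loop: stop on issued or failed, and give up once pending has lasted past a threshold such as the 5 minutes used above. A std-only sketch, where fetch is a hypothetical closure standing in for the domains API call shown earlier and returning its tls_status field; wait_for_tls is likewise an illustrative name.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `fetch` until tls_status is "issued" or "failed", or until `timeout`
/// elapses. `fetch` is a hypothetical stand-in for the domains API call.
fn wait_for_tls<F>(mut fetch: F, timeout: Duration, interval: Duration) -> Result<String, String>
where
    F: FnMut() -> String,
{
    let start = Instant::now();
    loop {
        let status = fetch();
        if status == "issued" {
            return Ok(status);
        }
        if status == "failed" {
            return Err("tls_status=failed: run kubectl describe certificate on it".to_string());
        }
        // Anything else (normally "pending") keeps us waiting until the timeout.
        if start.elapsed() >= timeout {
            return Err(format!(
                "still {status} after {timeout:?}: check the CNAME with dig, then re-call POST /verify"
            ));
        }
        sleep(interval);
    }
}
```

A typical call would pass Duration::from_secs(300) as the timeout and a polling interval of a few seconds.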
Related
- Custom Domain: the lifecycle the TLS path hangs off (POST /domains → ensure_certificate → verify)
- Apps: the auto-subdomain app-{id}.runtime.di2amp.com uses the same ClusterIssuer
- crates/traffic/src/tls.rs: Rust implementation of all of the above, with the parse_cert_status projection contract pinned by 6 tests (cycle 1)