Production Guides
Graceful Shutdown
Summary
GoFr listens for SIGINT and SIGTERM and, on either signal, runs App.Shutdown, which calls Shutdown on the HTTP, gRPC, and metrics servers and Close on the container's datasource connections. The shutdown is bounded by SHUTDOWN_GRACE_PERIOD (default 30s); if it expires, the process exits regardless, abandoning any connections that remain. Pair this with Kubernetes' terminationGracePeriodSeconds and a small preStop sleep to avoid losing in-flight requests during rolling restarts.
When to use
Every production GoFr deployment on Kubernetes should be configured for graceful shutdown. Without it, rolling updates and node drains return 502/504s for any request that is mid-flight when a pod is terminated, and Pub/Sub consumers can lose un-committed messages.
How GoFr handles signals
App.Run sets up a signal-aware context:
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGINT, syscall.SIGTERM)
When that context is canceled, a goroutine creates a timeout context using SHUTDOWN_GRACE_PERIOD (default 30s) and calls App.Shutdown. The order is fixed by the framework — see pkg/gofr/gofr.go:96-114 — and Shutdown joins errors from each step:
- `httpServer.Shutdown(ctx)` — stops accepting new connections, waits for in-flight handlers
- `grpcServer.Shutdown(ctx)` — drains active streams
- `container.Close()` — closes SQL pools, Redis clients, Pub/Sub consumers, and other registered datasources
- `metricServer.Shutdown(ctx)` — stops `/metrics`
- Logger close — if the logger implements `io.Closer`, its `Close()` is called last
The container's Close is what commits Pub/Sub offsets and lets SQL drivers finish in-progress queries. Application code does not need to coordinate this order.
OnStart hooks vs shutdown hooks
GoFr exposes OnStart hooks for synchronous startup work (cache warmup, seeding). There is no public OnShutdown hook today; App.Shutdown is what gets called and it operates on the framework's own resources. If you need cleanup on exit for resources you own (custom goroutines, file handles, third-party clients), use context-cancellation: pass a context.Context derived from signal.NotifyContext(...) into your goroutines and have each goroutine defer its own cleanup when that context is cancelled. The framework's App.Shutdown runs concurrently with this, so total wind-down stays within SHUTDOWN_GRACE_PERIOD.
The Kubernetes termination flow
When kubelet decides to evict a pod, it executes this sequence:
1. Pod's status flips to `Terminating`; the endpoints controller begins removing the pod from Service endpoints.
2. `preStop` hook runs (if configured).
3. `SIGTERM` is sent to PID 1.
4. After `terminationGracePeriodSeconds` (default 30s), `SIGKILL` is sent.
Steps 1 and 3 race: kube-proxy on every node needs time to update iptables/IPVS rules. A pod can still receive new traffic for a second or two after SIGTERM. The fix is a preStop sleep that delays shutdown long enough for endpoint removal to propagate.
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: api
      image: ghcr.io/example/orders-api:1.4.2
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 5"]
      env:
        - name: SHUTDOWN_GRACE_PERIOD
          value: "45s"
      readinessProbe:
        httpGet:
          path: /.well-known/health
          port: 8000
      livenessProbe:
        httpGet:
          path: /.well-known/alive
          port: 8000
Sizing the grace period
Set the values so preStop + SHUTDOWN_GRACE_PERIOD is comfortably less than terminationGracePeriodSeconds. A useful starting point:
- `preStop`: 5s (covers endpoint propagation on most clusters)
- `SHUTDOWN_GRACE_PERIOD`: P99 request latency × 2, plus headroom for Pub/Sub commits
- `terminationGracePeriodSeconds`: `preStop` + `SHUTDOWN_GRACE_PERIOD` + 10s buffer
For a service with 2s P99, that's 5s + 30s + 10s = 45s; rounding `terminationGracePeriodSeconds` up to 60s leaves extra slack.
Per-datasource behavior
- SQL. `database/sql` waits for active queries to finish on `Close()`. Long-running transactions can extend shutdown — keep request timeouts shorter than `SHUTDOWN_GRACE_PERIOD`.
- Redis / NoSQL. Clients close idle connections immediately and wait for in-flight commands.
- Pub/Sub. GoFr's subscription manager respects the shutdown context — consumers stop polling and commit current offsets where the broker supports it (Kafka, NATS JetStream).
- Cron jobs. GoFr's `App.Shutdown` drains HTTP, gRPC, and metrics servers and closes datasource connections — it does not stop the cron scheduler or wait for in-flight cron tasks. Cron jobs run with `context.Background()`, so they continue past SIGTERM and may be cut off when the container is killed at `terminationGracePeriodSeconds`. If you have long-running cron work that must finish, run it as a separate Kubernetes `Job` triggered by a `CronJob` resource instead of inside the same pod, so the pod's lifecycle doesn't interrupt it.
Verification
Trigger a rolling restart and watch the logs:
kubectl rollout restart deployment/orders-api -n prod
kubectl logs -f -l app=orders-api -n prod
(A rollout replaces pods entirely, so `--previous` on the new pods finds nothing; follow the old pods' logs while they drain instead.)
You should see `Shutting down server with a timeout of 30s` followed by `Application shutdown complete` on each terminating pod, with no connection-reset errors on the client side. From a load-test client running during the restart, the error rate should stay below 0.1%.