Last November we processed 1.1 million database requests per second during our peak Black Friday window. Here's what we changed to get there.

Connection pooling is not optional

The first thing that breaks at scale is connection overhead. We moved all services behind PgBouncer in transaction mode, which reduced our pg_stat_activity count from 3,200 to 180 at peak.

Read replicas and application-level routing

We added 4 read replicas and rewrote our ORM layer to route all non-mutating queries to replicas. This alone cut primary load by 70%.

Aggressive caching with stale-while-revalidate

For product catalog queries (90% of traffic) we applied a cache-aside pattern with Redis. TTL is 60 seconds with stale-while-revalidate. Cache hit rate sits at 94%.

The lesson: most database scaling problems are actually caching problems in disguise.