What Is Nginx Load Balancing and Why Does It Matter
Load balancing distributes incoming traffic across multiple backend servers so that no single server carries the full request load. This improves both performance, because more servers can handle more requests simultaneously, and reliability, because traffic continues flowing to healthy servers when one fails. For web applications that have outgrown what a single server can handle, or for services where uptime matters, load balancing is often the first significant architectural step beyond a standalone setup.
Nginx is commonly used for this purpose because it handles high concurrency efficiently, has a straightforward configuration syntax, and is well-suited to acting as a reverse proxy in front of application servers. This guide covers how Nginx load balancing works, the different balancing algorithms available, how to configure health checks, and the practical considerations that determine whether a load-balanced setup actually performs reliably in production.
How Nginx Load Balancing Works
Nginx acts as a reverse proxy and load balancer. Incoming HTTP requests arrive at Nginx, which forwards them to one of a pool of backend servers according to the configured balancing algorithm. The response from the backend server returns to Nginx, which then sends it to the original client. The client has no knowledge of the load-balanced architecture behind the scenes.
This setup also creates a natural security boundary. Backend servers do not need direct internet access. Only the Nginx load balancer requires a public IP address. The backends can operate on private IP addresses behind the load balancer, which reduces their exposure to direct attack.
Terminating HTTPS connections at the Nginx load balancer means traffic is decrypted before being forwarded to backend servers over your internal network. This keeps cryptographic overhead off the application servers and simplifies certificate management, because you only need to manage certificates in one place rather than on every backend instance.
The Upstream Configuration
In Nginx, the group of backend servers is defined using an upstream block. This block names the pool and lists the servers within it:
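# Pool of backend servers, referenced by name in proxy_pass
upstream backend {
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
    server 10.0.0.3:8000;
}

server {
    listen 80;
    server_name yourdomain.com;

    location / {
        # Forward every request to a server chosen from the pool
        proxy_pass http://backend;
    }
}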
Requests forwarded to http://backend resolve to one of the listed servers based on the balancing algorithm in use. Proxy-related configuration, including headers, timeouts, and buffering settings, typically goes in the location block that handles the request, although most proxy directives can also be set at the server or http level and inherited from there.
The proxy_pass directive tells Nginx where to forward requests. By pointing it at the upstream name rather than a specific server IP, you delegate the server selection to the load balancing logic. This separation between the upstream definition and the proxy configuration is what makes it straightforward to adjust the backend pool without changing the main server block.
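As a sketch of the proxy-related settings mentioned above, the location block below forwards the original host and client address to the backends and sets explicit timeouts. The header names are common conventions rather than anything Nginx requires, and the timeout values are illustrative assumptions to adjust for your application.

```nginx
location / {
    proxy_pass http://backend;

    # Pass the original Host header and client address to the backend,
    # which otherwise sees only the load balancer's own address.
    proxy_set_header Host              $host;
    proxy_set_header X-Real-IP         $remote_addr;
    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;

    # Illustrative timeouts (assumed values, not recommendations).
    proxy_connect_timeout 5s;   # time allowed to establish the backend connection
    proxy_send_timeout    30s;  # maximum gap between two successive writes to the backend
    proxy_read_timeout    30s;  # maximum gap between two successive reads from the backend
}
```

Backends should trust these headers only when the request comes from the load balancer, since a client can set them to arbitrary values on a direct connection.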
Load Balancing Algorithms
Nginx supports several load balancing algorithms, each suited to different traffic patterns and server capacities. Choosing the right algorithm depends on your backend setup and the nature of your application.
Round Robin
Round Robin is the default algorithm in Nginx. The load balancer distributes requests evenly across backend servers in the order they appear in the configuration. Over time, each server receives approximately the same number of requests. This approach works well when all backend servers have similar hardware capacity and are handling similar types of requests.
upstream backend {
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
}
Round Robin requires no special configuration because it is the default. If your servers differ significantly in capacity, you can use weighted round robin to account for this, which is covered in the weighted load balancing section below.
Least Connections
Least Connections forwards each new request to the backend server with the fewest active connections at that moment. This algorithm is useful when request processing times vary considerably. A server handling long-lived connections, such as those used by streaming services or WebSocket applications, holds far more concurrent connections than one handling brief, synchronous requests, even when both have received the same number of requests. Least Connections helps balance actual load rather than just the raw count of requests sent to each server.
upstream backend {
    least_conn;
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
}
IP Hash
IP Hash uses a hash of the client IP address to determine which backend server handles each request. This means a particular client address consistently connects to the same backend server. For applications that store session data locally on a specific server, this consistency prevents users from losing their session when their next request routes to a different backend.
upstream backend {
    ip_hash;
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
    server 10.0.0.3:8000;
}
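IP Hash does have limitations worth understanding. For IPv4 clients, Nginx hashes only the first three octets of the address, so all clients in the same /24 network land on the same backend. Adding or removing a server changes the hash mapping, so many existing clients are rerouted to a different backend and lose any session stored locally on the old one. If a server only needs to come out of rotation temporarily, marking it with the down parameter instead of deleting it preserves the current mapping for the remaining clients. Additionally, if a large proportion of your traffic comes from behind shared IP addresses, such as corporate NAT gateways or mobile carrier proxies, the distribution across backends may become uneven. For most modern applications, storing session data in a shared external store like Redis is preferable to relying on server-side session affinity.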
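One way to limit the disruption when the pool changes is the generic hash directive with the consistent parameter, which is part of open-source Nginx. The sketch below assumes the same three backends as the example above. It keys on the full client address and uses consistent (ketama) hashing, so adding or removing a server remaps only a fraction of the clients.

```nginx
upstream backend {
    # Consistent hashing keyed on the full client address.
    hash $remote_addr consistent;

    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
    server 10.0.0.3:8000;
}
```

This does not address the skew caused by many users sharing one address behind NAT, since those users still hash to the same backend.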
Weighted Load Balancing
Both Round Robin and Least Connections support weighting, which lets you direct a disproportionate share of traffic to servers with greater capacity. A server with a weight of 3 receives approximately three times as many requests as a server with a weight of 1:
upstream backend {
    server 10.0.0.1:8000 weight=3;
    server 10.0.0.2:8000 weight=2;
    server 10.0.0.3:8000 weight=1;
}
Weighting is useful when you are gradually scaling your backend infrastructure, when servers have different hardware specifications, or when you want to test new server configurations with a smaller portion of live traffic before routing everything through them.
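Weights combine with least_conn in the same way. A minimal sketch, assuming the first server has roughly twice the capacity of the second (the ratio is illustrative):

```nginx
upstream backend {
    least_conn;                      # choose by active connection count...
    server 10.0.0.1:8000 weight=2;   # ...taking server weights into account
    server 10.0.0.2:8000;            # weight defaults to 1
}
```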
Health Checks
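If Nginx keeps forwarding requests to a backend that has stopped responding correctly, affected users see errors. Passive health checks let Nginx detect failing backends from the outcome of real requests and route traffic away from them automatically. They are enabled by default (max_fails=1, fail_timeout=10s), and the max_fails and fail_timeout parameters let you tune how quickly a server is taken out of rotation and for how long.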
upstream backend {
    server 10.0.0.1:8000 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8000 max_fails=3 fail_timeout=30s;
    server 10.0.0.3:8000 max_fails=3 fail_timeout=30s;
}
The max_fails parameter sets how many failed attempts Nginx tolerates within the fail_timeout window before marking a server as unavailable; the failures do not have to be consecutive. The fail_timeout parameter defines both that window and how long Nginx keeps the server marked as unavailable. In this example, if three attempts to reach a backend fail within 30 seconds, Nginx stops sending traffic to that server for 30 seconds. After that interval, Nginx tries the server again with live client requests; if they succeed, the backend returns to the pool.
What Nginx counts as a failure depends on the proxy_next_upstream directive, which defines which HTTP status codes, timeouts, and other conditions trigger failover to the next server. Review this directive to ensure it matches your application behaviour and expected error codes.
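As an illustration, the location block below treats connection errors, timeouts, and a few gateway-style status codes as failed attempts and bounds how much retrying a single request can cause. The choice of status codes and the limits are assumptions for the sketch; the default, if the directive is omitted, is error timeout.

```nginx
location / {
    proxy_pass http://backend;

    # Conditions that count as a failed attempt and trigger a retry
    # on another server in the pool.
    proxy_next_upstream error timeout http_502 http_503 http_504;

    # Bound the retries so one request cannot cycle through the whole pool.
    proxy_next_upstream_tries   2;    # limit on the number of tries
    proxy_next_upstream_timeout 10s;  # limit on the total time spent trying
}
```

By default Nginx does not retry requests with non-idempotent methods such as POST once they have been sent to a backend. The non_idempotent parameter changes that, and is only safe if the application can tolerate a request being processed twice.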
Active health checks, where Nginx periodically probes backends regardless of request failures, require Nginx Plus or a third-party module such as nginx_upstream_check_module. For most open-source deployments, passive health checks provide adequate resilience without additional tooling.
Session Persistence and Sticky Sessions
Some applications store session data locally on the server that first handled the request. In a load-balanced setup without session persistence, subsequent requests from the same user could route to a different backend, causing that session data to be unavailable. IP Hash provides basic session affinity, but it has the limitations described earlier.
For open-source Nginx, the nginx-sticky-module third-party module adds sticky cookie support, where Nginx sets a tracking cookie on the client and uses it to route subsequent requests to the same backend. Alternatively, the most scalable approach is to externalise session storage entirely, using a shared session store like Redis or Memcached that all backend servers can access. This decouples session management from server affinity and makes horizontal scaling straightforward.
If you are building an application that will run behind a load balancer, designing it to use shared external session storage from the start avoids this problem entirely and removes the need for sticky sessions at the load balancer level.
SSL Termination on the Load Balancer
When HTTPS traffic reaches the load balancer, it can be decrypted there and forwarded to backends over plain HTTP. This approach, known as SSL termination, reduces the computational overhead on backend servers because they no longer need to perform cryptographic operations for every request. Because backends typically sit on a private network, the lack of encryption between the load balancer and backends is an acceptable trade-off in most configurations.
server {
    listen 443 ssl;
    server_name yourdomain.com;
    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://backend;
    }
}
Managing certificates at the load balancer level means you only need to renew and rotate certificates in one location. Tools like Certbot integrate well with Nginx and can automate certificate renewal for your domain.
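Two additions commonly accompany a termination setup like this. Both are assumptions about a typical deployment rather than requirements. A second server block redirects plain HTTP to HTTPS, and an X-Forwarded-Proto header tells the backends that the original request arrived over HTTPS even though they receive it as plain HTTP.

```nginx
# Redirect all plain-HTTP requests to HTTPS.
server {
    listen 80;
    server_name yourdomain.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name yourdomain.com;
    ssl_certificate     /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        # Let the application know the client connection used HTTPS.
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

Applications that generate absolute URLs or set secure cookies usually need this header to behave correctly behind a terminating proxy.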
Logging and Monitoring in Load Balanced Setups
Load balancing introduces complexity to logging and monitoring because a single user request may pass through Nginx and multiple backend servers. When diagnosing issues or analysing traffic patterns, you need to correlate logs across all components to build a complete picture.
Use a centralised log management solution such as the ELK stack (Elasticsearch, Logstash, Kibana), Loki, or a commercial service. Configure Nginx to log in a structured format, such as JSON, that your log aggregation tool can parse reliably. Include a unique request identifier in every request so you can trace a single transaction across Nginx logs and individual backend application logs.
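A minimal sketch of structured logging with a request identifier follows. It relies on the built-in $request_id variable and the escape=json option of log_format, both of which require a reasonably recent Nginx (1.11.8 or later). The format name lb_json, the field names, the log path, and the X-Request-ID header name are illustrative choices.

```nginx
# Goes in the http context.
log_format lb_json escape=json
    '{'
        '"time":"$time_iso8601",'
        '"request_id":"$request_id",'
        '"remote_addr":"$remote_addr",'
        '"request":"$request",'
        '"status":"$status",'
        '"upstream_addr":"$upstream_addr",'
        '"upstream_status":"$upstream_status",'
        '"upstream_response_time":"$upstream_response_time"'
    '}';

server {
    listen 80;
    server_name yourdomain.com;
    access_log /var/log/nginx/access.json lb_json;

    location / {
        proxy_pass http://backend;
        # Forward the same identifier so backend logs can record it.
        proxy_set_header X-Request-ID $request_id;
    }
}
```

The upstream_addr field records which backend handled the request. When a request was retried on another server, it lists each address that was tried, which makes failover visible in the logs.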
Each backend server in a load-balanced setup should be monitored individually as well as through the overall service endpoint. Monitoring helps you understand whether traffic is distributing as expected and whether any single backend is experiencing degraded performance before it causes user-visible errors.
Connection Pooling and Keep-Alive
Every new TCP connection from Nginx to a backend server incurs overhead: the handshake, TLS negotiation if applicable, and the time to send the request headers. HTTP keep-alive allows Nginx to reuse the same TCP connection for multiple requests to the same backend server, reducing this overhead significantly.
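upstream backend {
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
    keepalive 32;
}

# Inside the server block that proxies to the pool:
location / {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    # Clear the Connection header so "close" is not sent to the backend
    proxy_set_header Connection "";
}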
The keepalive directive in the upstream block sets the maximum number of idle connections to the backends that each worker process keeps open for reuse. It does not limit the total number of connections Nginx can open to the pool. The proxy_http_version 1.1 and proxy_set_header Connection "" directives enable HTTP/1.1 keep-alive for the upstream connection. Setting the pool size appropriately for your traffic volume prevents either underutilisation of connections or tying up connections that sit idle.
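Beyond the pool size, the upstream block accepts two related directives for controlling connection lifetime. The sketch below assumes Nginx 1.15.3 or later, where these directives became available in the upstream context, and the values shown are illustrative rather than tuned recommendations.

```nginx
upstream backend {
    least_conn;               # a non-default balancing method must come before keepalive
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;

    keepalive 32;             # idle connections kept open per worker process
    keepalive_requests 1000;  # requests served over one connection before it is closed
    keepalive_timeout 60s;    # how long an idle backend connection stays open
}
```

The backend's own keep-alive timeout should generally be at least as long as the value set here. Otherwise the backend may close a connection just as Nginx tries to reuse it.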
Graceful Upstream Changes
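When you need to remove a backend server from the pool for maintenance, an upgrade, or because it has failed, you can mark it as inactive by adding the down parameter to its server line and reloading the configuration, for example with nginx -s reload. In open-source Nginx the change takes effect only after a reload, but the reload is graceful: new worker processes start with the updated pool while the old workers finish the requests they are already handling. No new traffic reaches the server marked down, and existing requests are allowed to complete.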
upstream backend {
    server 10.0.0.1:8000;
    server 10.0.0.2:8000 down;
    server 10.0.0.3:8000;
}
This approach enables zero-downtime upgrades of backend servers. Mark one server down and reload Nginx, perform the upgrade, remove the down flag and reload again, then move to the next server. The max_fails and fail_timeout parameters handle automatic removal of failed servers, but the down flag is the correct tool for planned maintenance because it gives you explicit control over when traffic stops flowing.
Common Load Balancing Pitfalls
Several configuration mistakes cause load-balanced setups to behave unexpectedly in production. Being aware of these helps you avoid them during initial setup and troubleshoot them if they arise.
- Misconfigured health checks: If proxy_next_upstream is too restrictive or too permissive, Nginx may remove healthy servers from the pool or keep routing traffic to failing ones. Match the health check settings to your application's expected error responses.
- Session loss after backend changes: Removing or adding a server to an IP Hash upstream disrupts the hash mapping for some clients, potentially causing session loss. Plan backend changes during low-traffic windows and consider whether your application can tolerate brief session disruption.
- Unbalanced server capacities: Using round robin without weighting when servers have different capacities leads to some servers being overloaded while others sit underutilised. Assess relative capacities and adjust weights accordingly.
- Missing request logging context: Without a unique request identifier passed through to backend logs, correlating a user-visible error with the specific server that handled the request becomes difficult and time-consuming.
- Insufficient keep-alive connections: If the keepalive pool is too small, Nginx repeatedly closes and reopens backend connections instead of reusing them, which adds latency to requests. Monitor backend connection metrics to tune the pool size.
Next Steps
Setting up Nginx load balancing requires careful consideration of your backend architecture, traffic patterns, and availability requirements. If your current setup uses a single server and you are experiencing reliability issues or performance limits, moving to a load-balanced architecture with at least two backend servers is a practical step worth evaluating.
If you want a practical review of your server setup and load balancing configuration, you can get in touch with details of your current architecture, the platform you use, and the availability or performance concerns you are facing.