Nginx Load Balancing Configuration & Algorithms Guide

How Nginx Load Balancing Distributes Traffic Across Multiple Servers

When a web application outgrows what a single server can handle, the practical solution is to run multiple application servers and use a load balancer to distribute incoming requests between them. Nginx handles this role by sitting in front of your backend servers, accepting all HTTP and HTTPS traffic, and forwarding each request according to rules you define.

Getting the load balancing configuration right matters because the distribution algorithm affects response times, session reliability, and how well your infrastructure handles server failures. This guide covers the main algorithms available in Nginx, how to configure them, and what to watch for when traffic starts flowing across multiple backend instances.

Basic Nginx Load Balancer Configuration

The core of Nginx load balancing uses an upstream block to define your backend servers and a server block to accept incoming requests and forward them. Here is a basic configuration that distributes traffic across three application instances:

upstream backend {
    server 192.168.1.10:8000;
    server 192.168.1.11:8000;
    server 192.168.1.12:8000;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Each server in the upstream block represents a separate application instance. These can run on the same machine using different ports, or on different physical or virtual servers entirely. The proxy_set_header directives pass the original Host header and the client IP address through to the backend. Without them, the application sees every request as coming from the load balancer, which makes its logs misleading and can break anything keyed on the client address, such as rate limiting or IP-bound sessions.
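As a sketch of the same-machine case, the upstream block below points at two instances listening on different local ports. The loopback address and the port numbers are assumptions for illustration, not values taken from the configuration above.

```nginx
upstream backend {
    # Two instances of the application on this host (hypothetical ports)
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}
```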

If you are setting up the Nginx configuration from scratch on an Ubuntu server, the reverse proxy configuration guide covers certificate installation and HTTPS proxying alongside the basic setup. That process pairs well with load balancing when you need both secure connections and traffic distribution across multiple servers.

Round Robin Distribution

Round robin is the default algorithm in Nginx and the simplest to understand. Each incoming request goes to the next server in the list, cycling back to the first server after reaching the end. No state is maintained between requests, which keeps the processing overhead very low.

upstream backend {
    server 192.168.1.10:8000;
    server 192.168.1.11:8000;
    server 192.168.1.12:8000;
}

Round robin works well when your backend servers have identical hardware specifications and your application processes each request in roughly the same amount of time. If one server takes significantly longer to respond than the others, it will accumulate requests while the faster servers sit underutilised.

This algorithm suits stateless applications where no client needs to return to the same server between requests. API backends, content delivery services, and static file servers typically fit this profile. For a deeper look at how traffic distribution algorithms work in principle, the load balancing explained guide covers the conceptual foundations alongside the configuration details.

Least Connections Distribution

Least connections forwards each new request to the backend server that currently has the fewest active connections. This prevents a server handling slow or long-running requests from building up a queue while faster servers complete their work.

upstream backend {
    least_conn;
    server 192.168.1.10:8000;
    server 192.168.1.11:8000;
    server 192.168.1.12:8000;
}

Applications with variable request processing times benefit most from this approach. A simple page that generates in 20 milliseconds should not wait behind a complex report that takes 30 seconds to build. Least connections ensures the fast request goes to a server that is not already occupied with something time-consuming.

PHP applications using PHP-FPM often see significant improvements with least connections because script execution times vary considerably depending on what the code does. Database queries, file operations, and external API calls all introduce unpredictability into response times.
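Because PHP-FPM speaks FastCGI rather than HTTP, the upstream is referenced with fastcgi_pass instead of proxy_pass. The following is a minimal sketch under assumptions: the upstream name php_backend, port 9000, and the /var/www/app document root are illustrative, and the PHP files are assumed to exist at the same path on every PHP-FPM host.

```nginx
upstream php_backend {
    least_conn;
    # Hypothetical PHP-FPM pools listening on TCP port 9000
    server 192.168.1.10:9000;
    server 192.168.1.11:9000;
    server 192.168.1.12:9000;
}

server {
    listen 80;
    server_name example.com;
    root /var/www/app;  # assumed path; must match on the PHP-FPM hosts

    location ~ \.php$ {
        include fastcgi_params;
        # Tell PHP-FPM which script to execute
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass php_backend;
    }
}
```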

IP Hash Distribution

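IP hash assigns each client to a specific backend server by hashing the client IP address. For IPv4, Nginx uses only the first three octets of the address as the hash key, so all clients in the same /24 network land on the same backend; for IPv6 it uses the entire address. The same client IP always connects to the same backend server, provided that server remains available. This matters for applications that store session data locally on the server rather than in a shared location.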

upstream backend {
    ip_hash;
    server 192.168.1.10:8000;
    server 192.168.1.11:8000;
    server 192.168.1.12:8000;
}

When a backend server fails or its line is deleted from the upstream block, the hash mapping redistributes clients to the remaining servers. Some clients will be assigned to different servers and will lose their local session data unless the application stores sessions elsewhere. For planned maintenance, mark the server with the down parameter instead of deleting its line, which preserves the current hashing for clients of the other servers. This redistribution behaviour is worth knowing before relying on IP hash for session persistence.
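Two ways of limiting that redistribution are sketched here. The first keeps ip_hash and marks a server as down during maintenance. The second swaps ip_hash for the generic hash directive with the consistent parameter, so that adding or removing a server remaps only a small portion of clients. The upstream name backend_consistent is made up for this example, and neither option protects session data held on a server that actually fails.

```nginx
# Option 1: keep ip_hash and mark the server under maintenance as down
upstream backend {
    ip_hash;
    server 192.168.1.10:8000;
    server 192.168.1.11:8000 down;  # temporarily out of rotation
    server 192.168.1.12:8000;
}

# Option 2: consistent (ketama) hashing on the full client address
upstream backend_consistent {
    hash $remote_addr consistent;
    server 192.168.1.10:8000;
    server 192.168.1.11:8000;
    server 192.168.1.12:8000;
}
```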

Most modern web applications store sessions in a shared session store such as a database or Redis. When sessions are shared, IP hash becomes unnecessary because any backend server can handle any client request. If your application still uses local file-based sessions, consider moving to shared session storage as a more reliable long-term approach.

Weighted Distribution

All the algorithms above can be combined with server weights to control how traffic is divided. A server with a higher weight receives a proportionally larger share of requests. This is useful when your backend servers have different hardware specifications or capacities.

upstream backend {
    least_conn;
    server 192.168.1.10:8000 weight=3;
    server 192.168.1.11:8000 weight=2;
    server 192.168.1.12:8000;
}

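In this example the weights are 3, 2, and 1 (a server with no explicit weight defaults to 1), so the first server is given three times the share of the third server, and the second server twice the share. Because the block uses least_conn, Nginx applies the weights to active connection counts rather than to a strict request rotation, so the split of requests follows these proportions only approximately. Use weighted distribution to align traffic allocation with the actual capacity of each backend when they are not identical in performance.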
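To make the proportions concrete, here is the same weighting under plain round robin, with the expected share of requests worked out in comments. The percentages are simple arithmetic on the weights and assume all three servers stay available.

```nginx
upstream backend {
    # Total weight = 3 + 2 + 1 = 6
    server 192.168.1.10:8000 weight=3;  # 3/6 = 50% of requests
    server 192.168.1.11:8000 weight=2;  # 2/6, about 33%
    server 192.168.1.12:8000;           # default weight=1: 1/6, about 17%
}
```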

Weights also help during gradual capacity upgrades. If you add a newer, faster server alongside older hardware, assigning a higher weight to the newer machine eases the load off the older servers while you migrate workloads. This approach allows you to upgrade infrastructure incrementally without downtime.

Health Checks and Automatic Failover

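Nginx can detect when a backend server is failing and take it out of the rotation automatically. In open-source Nginx this detection is passive: it is based on the outcome of real client requests, and it is already active with the defaults max_fails=1 and fail_timeout=10s even if you never set those parameters. Tuning them lets you control how quickly a failing server is removed and how soon it is tried again.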

Active health checks are a feature of Nginx Plus, the commercial version. In open-source Nginx, the max_fails and fail_timeout parameters provide passive health checking based on connection failures:

upstream backend {
    server 192.168.1.10:8000 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:8000 max_fails=3 fail_timeout=30s;
    server 192.168.1.12:8000 max_fails=3 fail_timeout=30s;
}

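The max_fails parameter sets how many failed attempts within the fail_timeout window cause Nginx to consider a server unavailable. The fail_timeout parameter does double duty: it is both that counting window and the length of time the server stays out of rotation before Nginx tries it again. What counts as a failed attempt is determined by proxy_next_upstream; connection errors and timeouts always count. While a server is marked unavailable, Nginx distributes requests to the remaining healthy servers.

This passive approach catches servers that are unreachable or that time out, but it does not notice a server that responds slowly yet within the timeout, and it counts HTTP error responses only if the corresponding status codes are listed in proxy_next_upstream. It also needs a real client request to fail before it reacts. For deeper health monitoring, you may need external monitoring tools or the commercial Nginx Plus features.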

When configuring max_fails, consider your typical request failure rate. A value of 3 is a reasonable starting point, but busy applications with occasional transient failures might benefit from a slightly higher threshold to avoid removing servers unnecessarily during brief network hiccups.
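A sketch of a more tolerant setup, assuming a busy application with occasional transient errors: the threshold is raised to 5 failures within a 15 second window, and a standby server is added with the backup parameter so that it only receives traffic when the primary servers are unavailable. The specific numbers and the 192.168.1.13 address are illustrative assumptions, not recommendations.

```nginx
upstream backend {
    server 192.168.1.10:8000 max_fails=5 fail_timeout=15s;
    server 192.168.1.11:8000 max_fails=5 fail_timeout=15s;
    server 192.168.1.12:8000 max_fails=5 fail_timeout=15s;

    # Hypothetical standby, used only when the servers above are unavailable
    server 192.168.1.13:8000 backup;
}
```

Note that the backup parameter cannot be combined with the ip_hash, hash, or random balancing methods.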

Sticky Sessions for Local Session Storage

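When an application stores session data in the local server process and you cannot change that behaviour immediately, sticky sessions ensure a client consistently connects to the same backend server. Cookie-based sticky sessions are provided by the sticky directive, which is part of the commercial Nginx Plus. Open-source Nginx does not include this directive and will reject the configuration below; the built-in alternatives there are ip_hash or the generic hash directive. With Nginx Plus, the configuration looks like this: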

upstream backend {
    least_conn;
    server 192.168.1.10:8000;
    server 192.168.1.11:8000;
    server 192.168.1.12:8000;

    sticky cookie srv_id expires=1h domain=example.com path=/;
}

When a client makes its first request, Nginx sets a cookie identifying which backend server handled it. Subsequent requests from that client include the cookie, and Nginx forwards the request to the same server.

Sticky sessions are useful as a temporary measure while you migrate to shared session storage. The cleaner long-term solution is to store session data in a shared location such as a database or Redis, which allows any backend server to handle any client request without relying on cookies or IP affinity.

Logging Backend Performance and Request Distribution

Monitoring which backend server handles each request helps you spot distribution problems and performance issues early. Configure a custom log format that includes the upstream server address:

log_format upstream '$remote_addr - $upstream_addr '
                    '"$request" $status $body_bytes_sent';

access_log /var/log/nginx/upstream.access.log upstream;

The $upstream_addr variable records the IP address and port of the backend server that handled the request. If Nginx retried the request on another server, the variable lists every address it tried, separated by commas. Review these logs regularly to check that traffic is distributed as expected. If one server consistently receives more or fewer requests than intended, review your algorithm configuration and server weights.

Adding response time tracking helps identify servers that are overloaded or experiencing hardware or database issues:

log_format upstream '$remote_addr - $upstream_addr '
                    '"$request" $status $body_bytes_sent '
                    'rt=$upstream_response_time uct=$upstream_connect_time';

A server with significantly higher average upstream response times than the others usually indicates it is either overloaded or has a problem connecting to a database or external service. Investigate those servers first when troubleshooting performance issues.

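In this format, rt and uct are just labels. The rt value comes from $upstream_response_time, the time spent receiving the response from the backend, and uct comes from $upstream_connect_time, the time spent establishing the connection (including the TLS handshake when one is used). Both are logged in seconds with millisecond resolution. High uct values suggest network latency between Nginx and the backend or a backend that is slow to accept connections, which might indicate the servers are geographically distant from each other or experiencing network congestion.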
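If you also want to see which status the backend returned and how much of the total request time was spent outside the backend, the format can be extended along these lines. The format name upstream_timing and the us= and req= labels are assumptions for this sketch; the variables themselves are standard Nginx variables.

```nginx
# log_format must be declared in the http context
log_format upstream_timing '$remote_addr - $upstream_addr '
                           '"$request" $status us=$upstream_status '
                           'req=$request_time rt=$upstream_response_time '
                           'uct=$upstream_connect_time';

access_log /var/log/nginx/upstream.access.log upstream_timing;
```

A large gap between req and rt points at time spent on the Nginx side or in transfer to a slow client rather than in the backend.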

Common Load Balancing Mistakes to Avoid

Several configuration errors cause problems in load balanced environments that are difficult to diagnose without knowing what to look for.

Missing proxy headers: Without X-Real-IP and X-Forwarded-For headers, your application logs all requests from the load balancer IP, making troubleshooting very difficult.
No weights on heterogeneous servers: If your backend servers have different capacities and you do not use weighted distribution, weaker servers will overload while stronger ones sit idle.
Ignoring response time monitoring: A backend server that is up and accepting connections but responding slowly keeps receiving its share of requests, because passive health checks only react to failures. Monitoring upstream response times is how you notice it.
Relying on IP hash for session persistence without a backup plan: When a server fails and clients redistribute, session loss occurs. Have a plan for session recovery or migration to shared storage.
No failover testing: Test what happens when a backend server goes offline. Know how quickly Nginx detects the failure and how traffic redistributes.
Forgetting connection timeouts: proxy_connect_timeout, proxy_read_timeout, and proxy_send_timeout each default to 60 seconds, which is a long time for a client to wait on a stalled backend. Set them explicitly to match your application.

Connection Handling and Timeouts

Beyond choosing a distribution algorithm, proper timeout configuration prevents slow or failed backends from affecting overall system performance. Nginx provides several timeout directives that control how long it waits for the backend to respond.

location / {
    proxy_pass http://backend;
    proxy_connect_timeout 5s;
    proxy_read_timeout 30s;
    proxy_send_timeout 30s;
    proxy_next_upstream error timeout http_500 http_502 http_503;
}

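The proxy_next_upstream directive tells Nginx to retry a request on another backend server when the first attempt fails under one of the listed conditions. This automatic retry handles transient failures without returning an error to the client, provided another healthy server is available and nothing has been sent to the client yet. By default, Nginx does not retry requests with non-idempotent methods (POST, LOCK, PATCH) once they have been sent to a backend, which avoids running the same operation twice.

Set these timeouts based on your application characteristics. API services that respond quickly should have shorter timeouts, while applications handling long-running operations need longer windows. Keep in mind that proxy_read_timeout and proxy_send_timeout limit the gap between two successive read or write operations, not the total duration of the request. Review timeout values during load testing to ensure they match your actual response patterns.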
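One way to apply different windows to different kinds of traffic is to split them by location. This is a sketch under assumptions: the /api/ and /reports/ paths and all of the values are hypothetical and should come from your own measurements.

```nginx
# Fast API endpoints: fail quickly and allow a limited retry
location /api/ {
    proxy_pass http://backend;
    proxy_connect_timeout 2s;
    proxy_read_timeout 10s;
    proxy_send_timeout 10s;
    proxy_next_upstream error timeout;
    proxy_next_upstream_tries 2;      # cap the number of attempts
    proxy_next_upstream_timeout 15s;  # stop retrying after this much time
}

# Long-running reports: wait longer and do not retry on another server
location /reports/ {
    proxy_pass http://backend;
    proxy_connect_timeout 5s;
    proxy_read_timeout 120s;
    proxy_next_upstream off;
}
```

Disabling retries for the slow path avoids starting an expensive operation a second time on another backend after the first attempt times out.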

When to Consider a CDN Alongside Load Balancing

Load balancing handles traffic distribution between your backend servers, but it does not reduce the distance between users and your servers. For UK-based businesses serving a global audience, adding a content delivery network in front of your load balancer reduces latency by serving cached content from edge locations closer to users.

A CDN setup for business websites can cache static assets and reduce the load on your backend servers significantly. This works well alongside Nginx load balancing rather than replacing it. The CDN handles static content delivery while Nginx distributes dynamic requests across your application instances.

The combination works particularly well when your application serves a mix of static and dynamic content. Images, stylesheets, JavaScript files, and downloadable documents can all be cached at the CDN edge, leaving your load balanced application servers to focus on processing personalised or database-driven requests.

Operational Considerations Before Going Live

Before deploying a load balancer to production, there are operational factors beyond the configuration itself that affect long-term reliability.

Configuration management matters. Store your Nginx configuration in version control and have a deployment process for applying changes. This gives you a rollback path if a configuration update causes problems and helps you track what changed when issues arise.

Consider the backup and recovery implications of your load balancing setup. If a load balancer node fails, you need either a standby instance or a plan for quickly provisioning a replacement. For smaller setups, a single Nginx instance handling both load balancing and reverse proxy duties is manageable, but larger environments benefit from clustering or high-availability configurations.

Document your backend server inventory and the logic behind your distribution algorithm and weights. When you add new servers or adjust capacity, update this documentation. Future troubleshooting goes much more smoothly when the configuration intent is recorded alongside the actual settings.

Getting Started With Your Configuration

If you are evaluating load balancing for your current setup, begin with a simple round robin configuration and monitor the distribution over a few days. Check whether your application handles requests from different IP addresses correctly and whether your logs show the expected backend server activity.

For applications with varying response times, switch to least connections and compare the response time distributions across your backend servers. If servers have different capacities, add weights to balance the load according to their actual performance.

Review your application session handling before relying on IP hash or sticky sessions. Moving to shared session storage removes a significant source of fragility from your load balanced architecture.

If you want a practical review of your setup, you can get in touch with details of your current infrastructure, the load balancing goals you want to achieve, and any performance issues you have observed. Having your server count, hardware specifications, and current traffic patterns noted helps the conversation move directly to practical solutions.