How Nginx Load Balancing Distributes Traffic Across Multiple Servers

When a web application outgrows what a single server can handle, the practical solution is to run multiple application servers and use a load balancer to distribute incoming requests between them. Nginx handles this role by sitting in front of your backend servers, accepting all HTTP and HTTPS traffic, and forwarding each request according to rules you define.

Getting the load balancing configuration right matters because the distribution algorithm affects response times, session reliability, and how well your infrastructure handles server failures. This guide covers the main algorithms available in Nginx, how to configure them, and what to watch for when traffic starts flowing across multiple backend instances.

Basic Nginx Load Balancer Configuration

The core of Nginx load balancing uses an upstream block to define your backend servers and a server block to accept incoming requests and forward them. Here is a basic configuration that distributes traffic across three application instances:

upstream backend {
    server 192.168.1.10:8000;
    server 192.168.1.11:8000;
    server 192.168.1.12:8000;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Each server in the upstream block represents a separate application instance. These can run on the same machine using different ports, or on separate physical or virtual servers. The proxy_set_header directives pass the original Host header and the client's IP address through to the backend. Without them, the backend sees every request as coming from the load balancer, so your application logs stay accurate and any IP-dependent logic, such as session checks tied to the client address, sees the real client.
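To make the header behaviour concrete, here is a small Python model. The first function mirrors what Nginx documents for $proxy_add_x_forwarded_for: the incoming X-Forwarded-For value with $remote_addr appended, or just $remote_addr if the client sent no such header. The second, client_ip, is a hypothetical backend helper showing one common way to read the header safely. The trusted_proxies set and the example addresses are assumptions for illustration.

```python
from typing import Optional


def proxy_add_x_forwarded_for(incoming: Optional[str], remote_addr: str) -> str:
    """Model the value Nginx builds for $proxy_add_x_forwarded_for.

    With no incoming X-Forwarded-For header, the result is just the address
    that connected to Nginx; otherwise that address is appended to the
    existing header, separated by a comma.
    """
    if not incoming:
        return remote_addr
    return f"{incoming}, {remote_addr}"


def client_ip(xff: str, trusted_proxies: set) -> str:
    """Hypothetical backend helper: choose the client address from X-Forwarded-For.

    Walks the list from the right, skipping proxies you control (for example
    a CDN edge in front of Nginx). The first untrusted entry is the nearest
    address you can vouch for; entries further left came from the client and
    may be forged.
    """
    hops = [h.strip() for h in xff.split(",") if h.strip()]
    if not hops:
        raise ValueError("empty X-Forwarded-For header")
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop
    # Every entry belongs to a trusted proxy, so the leftmost is the best guess.
    return hops[0]


# A client that sends a forged header cannot hide its real address,
# because Nginx appends the address it actually saw.
header = proxy_add_x_forwarded_for("1.2.3.4", "203.0.113.7")
print(header)                    # 1.2.3.4, 203.0.113.7
print(client_ip(header, set()))  # 203.0.113.7
```

If Nginx is the only proxy in front of the application, the backend can simply read X-Real-IP, which the configuration above sets to $remote_addr.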

If you are setting up the Nginx configuration from scratch on an Ubuntu server, the reverse proxy guide covers certificate installation and HTTPS proxying alongside the basic setup. That process pairs well with load balancing when you need both secure connections and traffic distribution.

Round Robin Distribution

Round robin is the default algorithm in Nginx and the simplest to understand. Each incoming request goes to the next server in the list, cycling back to the first server after reaching the end. Nginx tracks only its position in the list, not any per-client state, which keeps the processing overhead very low.

upstream backend {
    server 192.168.1.10:8000;
    server 192.168.1.11:8000;
    server 192.168.1.12:8000;
}

Round robin works well when your backend servers have identical hardware specifications and your application processes each request in roughly the same amount of time. If one server takes significantly longer to respond than the others, it will accumulate requests while the faster servers sit underutilised.

This algorithm suits stateless applications where no client needs to return to the same server between requests. API backends, content delivery services, and static file servers typically fit this profile.
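The rotation can be sketched in a few lines of Python. This is an illustrative model of the behaviour described above, not Nginx code; the function name is hypothetical and the addresses come from the upstream block.

```python
from itertools import cycle
from typing import Iterator

# The three backends from the upstream block above.
BACKENDS = ["192.168.1.10:8000", "192.168.1.11:8000", "192.168.1.12:8000"]


def round_robin(backends: list) -> Iterator[str]:
    """Yield backends in list order, wrapping back to the first one forever.

    The only state is the position in the list; nothing about clients or
    server load is tracked.
    """
    return cycle(backends)


picker = round_robin(BACKENDS)
assignments = [next(picker) for _ in range(6)]
for n, backend in enumerate(assignments, start=1):
    print(f"request {n} -> {backend}")
```

Because the picker never consults server load, a backend stuck on slow requests keeps receiving its share. One caveat when reading real logs: each Nginx worker process keeps its own position in the rotation unless the upstream block declares a shared memory zone, so the order you observe may be interleaved rather than a strict .10, .11, .12 sequence.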

Least Connections Distribution

Least connections forwards each new request to the backend server that currently has the fewest active connections. This prevents a server handling slow or long-running requests from building up a queue while faster servers complete their work.

upstream backend {
    least_conn;
    server 192.168.1.10:8000;
    server 192.168.1.11:8000;
    server 192.168.1.12:8000;
}

Applications with variable request processing times benefit most from this approach. A simple page that generates in 20 milliseconds should not wait behind a complex report that takes 30 seconds to build. Least connections ensures the fast request goes to a server that is not already occupied with something time-consuming.

PHP applications running on PHP-FPM are a good candidate for least connections because script execution times vary considerably depending on what the code does. Database queries, file operations, and external API calls all introduce unpredictability into response times.
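A minimal Python sketch of the selection rule, assuming hypothetical backend names app1 to app3, shows why the long-running report does not block quick pages. It simplifies tie-breaking to list order, whereas Nginx breaks ties between equally loaded servers with weighted round robin.

```python
from dataclasses import dataclass, field


@dataclass
class LeastConnBalancer:
    """Send each new request to the backend with the fewest active connections.

    Simplification: ties go to the earliest backend in the list, whereas
    Nginx breaks ties with weighted round robin.
    """

    backends: list
    active: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        for backend in self.backends:
            self.active.setdefault(backend, 0)

    def acquire(self) -> str:
        # min() returns the first backend with the lowest active count.
        chosen = min(self.backends, key=lambda b: self.active[b])
        self.active[chosen] += 1
        return chosen

    def release(self, backend: str) -> None:
        # Called when the response has been sent and the connection closes.
        self.active[backend] -= 1


lb = LeastConnBalancer(["app1", "app2", "app3"])
report = lb.acquire()   # a 30-second report lands on app1 and stays open
page = lb.acquire()     # a quick page goes to app2 ...
lb.release(page)        # ... and finishes almost immediately
print(lb.acquire())     # app2: app1 is still busy with the report
```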

IP Hash Distribution

IP hash assigns each client to a specific backend server by hashing the client's IP address. For IPv4, Nginx hashes only the first three octets, so all clients in the same /24 network share a backend; for IPv6, it uses the whole address. The same client always connects to the same backend server, provided that server remains available. This matters for applications that store session data locally on the server rather than in a shared location.

upstream backend {
    ip_hash;
    server 192.168.1.10:8000;
    server 192.168.1.11:8000;
    server 192.168.1.12:8000;
}

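When a backend server fails, or is marked with the down parameter for maintenance, the clients that hashed to it are redistributed to the remaining servers and will lose their local session data unless the application stores sessions elsewhere. Clients mapped to the other servers keep their assignment. Deleting a server's line from the upstream block is different: it changes the hash mapping for many clients at once, so mark the server as down (for example, server 192.168.1.12:8000 down;) when taking it out temporarily. This redistribution behaviour is worth knowing before relying on IP hash for session persistence.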
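The effect of marking a server down versus deleting it can be seen in a small Python model. It follows the documented hashing key (first three IPv4 octets, or the full IPv6 address), but zlib.crc32 is only a stand-in for Nginx's own hash function, and stepping to the next listed server is a simplified stand-in for how Nginx retries when the hashed server is unavailable. The function names are hypothetical.

```python
import ipaddress
import zlib


def hash_key(client_ip: str) -> bytes:
    """Key used for hashing: first three octets of IPv4, or the full IPv6 address."""
    addr = ipaddress.ip_address(client_ip)
    return addr.packed[:3] if addr.version == 4 else addr.packed


def pick_backend(client_ip: str, backends: list, down: set) -> str:
    """Map a client to a backend, stepping past servers that are down.

    Illustrative only: zlib.crc32 stands in for Nginx's own hash function,
    and stepping to the next listed server simplifies Nginx's retry logic.
    """
    start = zlib.crc32(hash_key(client_ip)) % len(backends)
    for offset in range(len(backends)):
        candidate = backends[(start + offset) % len(backends)]
        if candidate not in down:
            return candidate
    raise RuntimeError("no backend available")


backends = ["192.168.1.10:8000", "192.168.1.11:8000", "192.168.1.12:8000"]
home = pick_backend("203.0.113.25", backends, down=set())
# Same /24 network, same key, same backend.
assert pick_backend("203.0.113.199", backends, down=set()) == home
# Taking the client's own server down moves it elsewhere.
moved = pick_backend("203.0.113.25", backends, down={home})
assert moved != home
```

Marking a server down leaves the list length (the modulus) unchanged, so every client whose server is still up keeps its mapping. Deleting a line changes the modulus and can move clients whose servers never failed.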

Many modern web applications store sessions in a shared session store such as a database or Redis. When sessions are shared, IP hash becomes unnecessary because any backend server can handle any client request. If your application still uses local file-based sessions, consider moving to shared session storage as a more reliable long-term approach.

Weighted Distribution

All the algorithms above can be combined with server weights to control how traffic is divided. A server with a higher weight receives a proportionally larger share of requests. This is useful when your backend servers have different hardware specifications or capacities.

upstream backend {
    least_conn;
    server 192.168.1.10:8000 weight=3;
    server 192.168.1.11:8000 weight=2;
    server 192.168.1.12:8000;
}

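In this example, the first server carries three times the weight of the third server and the second carries twice that weight, so they should take roughly three and two times as much traffic as the third. Because least_conn compares each server's active connections relative to its weight, the split follows these ratios approximately rather than request for request. Use weights to match each backend's share of traffic to its actual capacity when the servers are not identical in performance.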
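Without least_conn, weights apply to plain round robin. The Python sketch below models that case using the "smooth" weighted round-robin idea Nginx applies to its default balancer, which spreads a heavy server's turns out instead of sending them in a burst. The Peer class and function name are hypothetical, and this is a model of the technique rather than Nginx's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    name: str
    weight: int
    current: int = 0  # running score, updated on every pick


def smooth_weighted_pick(peers: list) -> Peer:
    """Choose one peer with smooth weighted round robin.

    Each peer's score rises by its weight; the highest score wins and is then
    lowered by the total weight. Heavier peers win more often, but their turns
    are spread out instead of arriving in a burst. Ties go to the earlier peer.
    """
    if not peers:
        raise ValueError("no peers configured")
    total = 0
    best: Optional[Peer] = None
    for peer in peers:
        peer.current += peer.weight
        total += peer.weight
        if best is None or peer.current > best.current:
            best = peer
    best.current -= total
    return best


peers = [Peer("192.168.1.10", 3), Peer("192.168.1.11", 2), Peer("192.168.1.12", 1)]
order = [smooth_weighted_pick(peers).name for _ in range(6)]
print(order)
# ['192.168.1.10', '192.168.1.11', '192.168.1.10', '192.168.1.12', '192.168.1.11', '192.168.1.10']
```

Over each cycle of six requests the split is exactly 3:2:1, and every score returns to zero, so the pattern repeats.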

Health Checks and Automatic Failover

Nginx can detect when a backend server is failing and take it out of the rotation temporarily, so a dead server does not keep receiving requests until someone edits the configuration by hand.

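Active health checks, which probe servers on a schedule, are a feature of Nginx Plus, the commercial version. Open-source Nginx provides passive health checking through the max_fails and fail_timeout parameters, which react to failed attempts to reach a server (by default, connection errors and timeouts). They default to max_fails=1 and fail_timeout=10s and can be set per server: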

upstream backend {
    server 192.168.1.10:8000 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:8000 max_fails=3 fail_timeout=30s;
    server 192.168.1.12:8000 max_fails=3 fail_timeout=30s;
}

The max_fails parameter sets how many failed attempts within the fail_timeout period cause Nginx to consider a server unavailable. The fail_timeout parameter does double duty: it is both that counting window and how long the server stays marked unavailable before Nginx tries it again. While a server is marked unavailable, Nginx sends requests to the remaining healthy servers.

This passive approach catches servers that become unreachable or time out, but it does not detect servers that respond slowly within the timeout, and it counts HTTP error responses only if you list those status codes in proxy_next_upstream. For deeper health monitoring, you may need external monitoring tools or the active health checks in Nginx Plus.
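The sketch below is a simplified Python model of that passive logic for a single backend. It follows the documented meaning and defaults of the two parameters (count failures within fail_timeout, then treat the server as unavailable for fail_timeout), but the class name and the sliding-window bookkeeping are assumptions, not Nginx's internal implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PassiveHealth:
    """Simplified model of max_fails / fail_timeout for one backend.

    Failures are recorded with timestamps. If max_fails of them fall within
    the last fail_timeout seconds, the backend is treated as unavailable for
    fail_timeout seconds. max_fails=0 disables the accounting, as in Nginx.
    """

    max_fails: int = 1          # Nginx default
    fail_timeout: float = 10.0  # seconds, Nginx default
    failures: deque = field(default_factory=deque)
    down_until: float = 0.0

    def record_failure(self, now: float) -> None:
        self.failures.append(now)
        # Forget failures that fall outside the counting window.
        while self.failures and now - self.failures[0] > self.fail_timeout:
            self.failures.popleft()
        if self.max_fails and len(self.failures) >= self.max_fails:
            self.down_until = now + self.fail_timeout
            self.failures.clear()

    def available(self, now: float) -> bool:
        return now >= self.down_until


h = PassiveHealth(max_fails=3, fail_timeout=30)
for t in (0, 5, 10):          # three failures inside 30 seconds
    h.record_failure(now=t)
print(h.available(now=20))    # False: marked unavailable until t=40
print(h.available(now=40))    # True: the server is tried again
```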

Sticky Sessions for Local Session Storage

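When an application stores session data in the local server process and you cannot change that behaviour immediately, sticky sessions ensure a client consistently connects to the same backend server. Cookie-based sticky sessions use the sticky directive, which is available only in Nginx Plus; open-source Nginx does not include it, so ip_hash is the nearest built-in alternative there: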

upstream backend {
    least_conn;
    server 192.168.1.10:8000;
    server 192.168.1.11:8000;
    server 192.168.1.12:8000;

    sticky cookie srv_id expires=1h domain=example.com path=/;
}

When a client makes its first request, Nginx sets a cookie identifying which backend server handled it. Subsequent requests from that client include the cookie, and Nginx forwards the request to the same server.
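Here is a hypothetical Python model of that routing decision. The cookie name srv_id comes from the example above; the server ids, the healthy set, and the choose_new callback (standing in for the normal balancing method) are assumptions, and real Nginx Plus generates the cookie value itself. When the pinned server cannot take the request, the model picks a new server as though the client had never been bound, so any local session data on the old server is still lost.

```python
from typing import Callable, Dict, Optional, Set, Tuple

COOKIE_NAME = "srv_id"  # matches the cookie name in the sticky example above


def route_sticky(
    cookies: Dict[str, str],
    backends: Dict[str, str],
    healthy: Set[str],
    choose_new: Callable[[], str],
) -> Tuple[str, Optional[str]]:
    """Return (backend address, cookie value to set, or None).

    backends maps an opaque server id (the cookie value) to an address.
    If the cookie names a healthy server, reuse it; otherwise pick a new
    server with the normal balancing method and issue a fresh cookie.
    """
    server_id = cookies.get(COOKIE_NAME)
    if server_id in backends and server_id in healthy:
        return backends[server_id], None
    new_id = choose_new()
    return backends[new_id], new_id


backends = {"s1": "192.168.1.10:8000", "s2": "192.168.1.11:8000"}
# First visit: no cookie, so the balancer chooses and a cookie is issued.
print(route_sticky({}, backends, {"s1", "s2"}, lambda: "s2"))
# ('192.168.1.11:8000', 's2')
```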

Sticky sessions are useful as a temporary measure while you migrate to shared session storage. The cleaner long-term solution is to store session data in a shared location such as a database or Redis, which allows any backend server to handle any client request without relying on cookies or IP affinity.

Logging Backend Performance and Request Distribution

Monitoring which backend server handles each request helps you spot distribution problems and performance issues early. Define a custom log format in the http context that includes the upstream server address:

log_format upstream '$remote_addr - $upstream_addr '
                    '"$request" $status $body_bytes_sent';

access_log /var/log/nginx/upstream.access.log upstream;

The $upstream_addr variable records the address and port of the backend server that handled the request; if Nginx retried the request on another server, it logs a comma-separated list. Review these logs regularly to check that traffic is distributed as expected. If one server consistently receives more or fewer requests than intended, review your algorithm configuration and server weights.

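Adding response times to the format helps identify servers that are overloaded or experiencing hardware or database issues. Replace the earlier definition rather than adding a second one, because Nginx rejects two log_format directives with the same name: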

log_format upstream '$remote_addr - $upstream_addr '
                    '"$request" $status $body_bytes_sent '
                    'rt=$upstream_response_time uct=$upstream_connect_time';

A server with significantly higher average upstream response times than the others usually indicates it is either overloaded or has a problem connecting to a database or external service. Investigate those servers first when troubleshooting performance issues.
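To turn the log into per-server figures, a short Python script is enough. This sketch assumes the extended upstream format above; the function name and the sample lines are made up for illustration. When Nginx retried a request, it keeps only the last server and time, which is a simplification.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches the extended "upstream" log_format shown above.
LINE_RE = re.compile(
    r'^(?P<client>\S+) - (?P<upstream>.+?) "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) rt=(?P<rt>.+?) uct=(?P<uct>.+)$'
)


def summarise(lines):
    """Return {backend: (request_count, mean_response_seconds or None)}."""
    counts = defaultdict(int)
    times = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match["upstream"] == "-":
            continue  # unparseable, or a request Nginx answered itself
        # After a retry Nginx logs comma-separated values; the last entry
        # belongs to the server that produced the final response.
        backend = match["upstream"].split(",")[-1].strip()
        counts[backend] += 1
        rt = match["rt"].split(",")[-1].strip()
        if rt != "-":
            times[backend].append(float(rt))
    return {
        backend: (count, mean(times[backend]) if times[backend] else None)
        for backend, count in counts.items()
    }


# Made-up sample lines in the format above.
SAMPLE = [
    '203.0.113.7 - 192.168.1.10:8000 "GET / HTTP/1.1" 200 512 rt=0.020 uct=0.001',
    '203.0.113.8 - 192.168.1.11:8000 "GET /report HTTP/1.1" 200 2048 rt=1.500 uct=0.002',
    '203.0.113.9 - 192.168.1.10:8000 "GET / HTTP/1.1" 200 512 rt=0.040 uct=0.001',
    '203.0.113.9 - 192.168.1.12:8000, 192.168.1.10:8000 "GET /x HTTP/1.1" 200 100 '
    'rt=0.003, 0.030 uct=0.002, 0.001',
    '203.0.113.10 - - "GET /static HTTP/1.1" 200 10 rt=- uct=-',
]
print(summarise(SAMPLE))
```

Because summarise accepts any iterable of lines, an open file object for /var/log/nginx/upstream.access.log can be passed in directly.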

Common Load Balancing Mistakes to Avoid

Several configuration errors cause problems in load balanced environments that are difficult to diagnose without knowing what to look for.

  • Missing proxy headers: Without X-Real-IP and X-Forwarded-For headers, your application records every request as coming from the load balancer's IP address, which makes troubleshooting very difficult.
  • Equal weights on unequal servers: If your backend servers have different capacities and you do not use weighted distribution, weaker servers will overload while stronger ones sit idle.
  • Ignoring response time monitoring: A backend server that is up and accepting connections but responding slowly will keep receiving requests, because passive health checks only react to errors and timeouts. Monitoring upstream response times is how you notice it.
  • Relying on IP hash for session persistence without a backup plan: When a server fails and clients redistribute, session loss occurs. Have a plan for session recovery or migration to shared storage.
  • No failover testing: Test what happens when a backend server goes offline. Know how quickly Nginx detects the failure and how traffic redistributes.

When to Consider a CDN Alongside Load Balancing

Load balancing handles traffic distribution between your backend servers, but it does not reduce the distance between users and your servers. For UK-based businesses serving a global audience, adding a content delivery network in front of your load balancer reduces latency by serving cached content from edge locations closer to users.

A CDN setup for business websites can cache static assets and reduce the load on your backend servers significantly. This works well alongside Nginx load balancing rather than replacing it. The CDN handles static content delivery while Nginx distributes dynamic requests across your application instances.

Practical Steps for Getting Started

If you are evaluating load balancing for your current setup, begin with a simple round robin configuration and monitor the distribution over a few days. Check whether your application handles requests from different IP addresses correctly and whether your logs show the expected backend server activity.

For applications with varying response times, switch to least connections and compare the response time distributions across your backend servers. If servers have different capacities, add weights to balance the load according to their actual performance.

Review your application session handling before relying on IP hash or sticky sessions. Moving to shared session storage removes a significant source of fragility from your load balanced architecture.

If you need help reviewing your current setup, prepare a short note with your website URL, hosting details, current load balancer configuration, and any performance issues you have observed before getting in touch.