Siaw Young - Core Nginx Configuration and HTTP Load Testing with Autobench

The Core module controls essential Nginx features, some of which will have a direct impact on performance, such as the number of worker processes. It also includes some directives that are useful for debugging.

Official Nginx docs on the Core module

Below, I cover the directives that I think have the greatest impact on performance or are critical to change from default values for security purposes.

Core directives

`error_log`

Syntax: error_log file_path log_level
Accepted values for log_level: debug, info, notice, warn, error, crit, alert, and emerg
Default: logs/error.log error

This directive can be placed in main, http, server, and location blocks to indicate specific rules for logging.
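As a rough illustration of logging rules at different levels, the sketch below sets a global error log and a more verbose one for a single server block. The host name and the second log path are made up for the example.

```nginx
# Main context: used wherever no narrower error_log is set
error_log logs/error.log error;

events {}

http {
    server {
        listen 80;
        server_name example.com;  # hypothetical host

        # Overrides the main-context setting for this server only
        error_log logs/example.error.log warn;
    }
}
```

Note that the debug level only produces debug output when Nginx was built with --with-debug.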

`include`

Includes another file, or every file matching a given mask, into the configuration. See Nginx Configuration Syntax.

`pid`

Syntax: pid pid/nginx.pid
Default: Defined at compile time

Defines the path of the pid file.

`multi_accept`

Syntax: multi_accept on
Default: off

Defines whether worker processes will accept all new connections (on), or one new connection at a time (off).

`use`

Syntax: use epoll
Default: Nginx will automatically choose the fastest one

Specifies the connection processing method to use. The available methods are select, poll, kqueue, epoll, rtsig, /dev/poll, and eventport.

For Linux systems, epoll seems to yield the best performance. Here is an interesting post comparing epoll and kqueue.
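Both multi_accept and use are set inside the events block. A minimal sketch for a Linux host, with values chosen only for illustration, might look like this:

```nginx
events {
    use epoll;        # Linux only; leave this line out to let Nginx choose
    multi_accept on;  # accept all pending connections at once
}
```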

`user`

Syntax: user username groupname

Defines the user that will be used to start the worker processes. It’s dangerous to set the user and group of worker processes to root. Instead, create a new user specifically for Nginx worker processes (www-data is canonical).

user root root;
# change to
user www-data www-data;

`worker_connections`

Syntax: worker_connections 1024
Default: 512

This sets the maximum number of simultaneous connections that each worker process can open. If you have 4 worker processes that can open 1024 connections each, your system can handle a total of 4096 simultaneous connections. Related to worker_rlimit_nofile below.
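The arithmetic above corresponds to a configuration along these lines. It is only a sketch, and the numbers are the ones from the example rather than recommendations.

```nginx
worker_processes 4;

events {
    worker_connections 1024;  # per worker process
}

# Upper bound: 4 workers x 1024 connections = 4096 simultaneous connections
```

The limit counts every connection a worker holds, including connections to proxied servers, so a reverse proxy will usually serve fewer clients than the raw total suggests.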

`worker_cpu_affinity`

Syntax: worker_cpu_affinity 1000 0100 0010 0001

Allows you to assign worker processes to CPU cores. Each worker gets one bitmask, and each mask is read from right to left: the rightmost digit stands for the first CPU core. For example, if you’re running 3 worker processes on a dual-core CPU (which you shouldn’t, see below), you can configure the directive to assign 2 worker processes to the first CPU core and 1 to the second CPU core:

worker_cpu_affinity 01 10 01;

There are 3 blocks for 3 worker processes, and each block has 2 digits for 2 CPU cores.
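For the more usual case of one worker per core, a sketch for a quad-core machine (the core count is an assumption for this example) could be:

```nginx
worker_processes 4;

# One mask per worker; the rightmost bit is the first core (CPU0)
worker_cpu_affinity 0001 0010 0100 1000;
```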

`worker_priority`

Syntax: worker_priority 5
Default: 0

Adjusts the priority level of worker processes. Lower values mean higher priority, so decrease this number if your system is running other processes simultaneously and you want the Nginx workers scheduled ahead of them.¹

`worker_processes`

Syntax: worker_processes 1
Default: 1

This number should match the number of physical CPU cores on your system.

worker_processes 1;
# set to match number of CPU cores
worker_processes 4; # assuming a quad-core system
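If the same configuration is deployed to machines with different core counts, the auto parameter is an alternative to a hard-coded number. This is a sketch that assumes a reasonably recent Nginx (auto is available from 1.3.8 and 1.2.5):

```nginx
# Let Nginx detect the number of available CPU cores
worker_processes auto;
```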

`worker_rlimit_nofile`

Syntax: worker_rlimit_nofile 200000
Default: None, system determined

worker_rlimit_nofile sets the limit on the number of file descriptors that each Nginx worker process can open. You can see the OS limit by using the ulimit command.

Check out this excellent post for more on worker_rlimit_nofile. An excerpt:

When any program opens a file, the operating system (OS) returns a file descriptor (FD) that corresponds to that file. The program will refer to that FD in order to process the file. The limit for the maximum FDs on the server is usually set by the OS. To determine what the FD limits are on your server use the commands ulimit -Hn and ulimit -Sn which will give you the per user hard and soft file limits.

If you don’t set the worker_rlimit_nofile directive, then the settings of your OS will determine how many FDs can be used by NGINX (sic).

If the worker_rlimit_nofile directive is specified, then NGINX asks the OS to change the settings to the value specified in worker_rlimit_nofile.

In some configurations I’ve seen, the value is set at 2 times worker_connections to account for the two files per connection. So if worker_connections is set to 512, then a value of 1024 for worker_rlimit_nofile is used.
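The "2 times worker_connections" rule of thumb from the excerpt would look roughly like this. The values are the ones quoted above, not a tuned recommendation.

```nginx
# 2 x worker_connections, allowing two file descriptors per connection
worker_rlimit_nofile 1024;

events {
    worker_connections 512;
}
```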

HTTP Load Testing with Autobench

The basic idea behind testing would be to test multiple load scenarios against different Nginx configuration settings to identify the ideal configuration settings for your expected load.
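One way to organise this is to keep a small baseline configuration and change a single value between test runs. The sketch below is only an assumed starting point: the values to try are arbitrary, and the document root is assumed to contain an index.html to request.

```nginx
# Baseline for one test run; change one value at a time between runs
worker_processes 4;           # e.g. try 1, 2, 4

events {
    worker_connections 1024;  # e.g. try 512, 1024, 2048
    multi_accept off;         # e.g. try on
}

http {
    server {
        listen 80;
        root html;            # assumed to contain index.html
    }
}
```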

Autobench

Autobench is a Perl wrapper around httperf that automatically runs httperf at increasing loads until saturation is reached. It can also write its results to a .tsv file that can be opened and graphed in Excel.

Download it from here, then run make and make install.

A test command looks like this:

$ autobench --single_host --host1 192.168.1.10 --uri1 /index.html --quiet --low_rate 20 --high_rate 200 --rate_step 20 --num_call 10 --num_conn 5000 --timeout 5 --file results.tsv

Some Autobench options:

• --host1: The website host name you wish to test
• --uri1: The path of the file that will be downloaded
• --quiet: Does not display httperf information on the screen
• --low_rate: Connections per second at the beginning of the test
• --high_rate: Connections per second at the end of the test
• --rate_step: The number of connections to increase the rate by after each test
• --num_call: How many requests should be sent per connection
• --num_conn: Total number of connections
• --timeout: The number of seconds elapsed before a request is considered lost
• --file: Export results as specified (.tsv file)

TODO: Find out more about Apache JMeter and locust.io

Graceful Upgrading

To upgrade Nginx gracefully, we need to:

Replace the binary
Send USR2 to the old master process, which renames the old pid file and starts a new master process from the new binary²

$ ps aux | grep nginx | grep master
root   19377  0.0  0.0  85868   152 ? Ss  Jul07  0:00 nginx: master process /usr/sbin/nginx
$ kill -USR2 19377

Gracefully shut down all old worker processes

$ kill -WINCH 19377

Kill the old master process

$ kill -QUIT 19377

Footnotes

1. Priority levels range from -20 (most important) to 19 (least important). 0 is the default priority level for processes. Do not adjust this number past -5 as that is the priority level for kernel processes.
2. See this Stack Overflow answer for what USR2 and WINCH mean.