Benchmarking Nginx in the cloud

Summary

11,969 http requests handled @ 84 nanoseconds across 100 concurrent connections? Yeah. Here’s what happened:


root@ip-10-161-82-11:/var/www/nginx-default# ab -n 1000000 -c100 http://localhost:80/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100000 requests
Completed 200000 requests
Completed 300000 requests
Completed 400000 requests
Completed 500000 requests
Completed 600000 requests
Completed 700000 requests
Completed 800000 requests
Completed 900000 requests
Completed 1000000 requests
Finished 1000000 requests


Server Software:        nginx/0.7.65
Server Hostname:        localhost
Server Port:            80

Document Path:          /
Document Length:        34989 bytes

Concurrency Level:      100
Time taken for tests:   83.544 seconds
Complete requests:      1000000
Failed requests:        0
Write errors:           0
Total transferred:      35202867880 bytes
HTML transferred:       34989862555 bytes
Requests per second:    11969.72 [#/sec] (mean)
Time per request:       8.354 [ms] (mean)
Time per request:       0.084 [ms] (mean, across all concurrent requests)
Transfer rate:          411492.58 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   0.2      2       5
Processing:     2    7   0.7      6      15
Waiting:        1    2   0.5      2      12
Total:          5    8   0.7      8      17
WARNING: The median and mean for the processing time are not within a normal deviation
        These results are probably not that reliable.

Percentage of the requests served within a certain time (ms)
  50%      8
  66%      9
  75%      9
  80%      9
  90%      9
  95%      9
  98%      9
  99%     10
 100%     17 (longest request)
root@ip-10-161-82-11:/var/www/nginx-default# w
 08:28:26 up 25 min,  1 user,  load average: 0.63, 0.23, 0.08
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    c-69-181-58-125. 08:21    0.00s  0.01s  0.00s w
root@ip-10-161-82-11:/var/www/nginx-default# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
stepping	: 10
cpu MHz		: 2659.998
cache size	: 6144 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips	: 5322.20
clflush size	: 64
cache_alignment	: 64
address sizes	: 38 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
stepping	: 10
cpu MHz		: 2659.998
cache size	: 6144 KB
physical id	: 1
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 1
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips	: 5322.20
clflush size	: 64
cache_alignment	: 64
address sizes	: 38 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
stepping	: 10
cpu MHz		: 2659.998
cache size	: 6144 KB
physical id	: 2
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips	: 5322.20
clflush size	: 64
cache_alignment	: 64
address sizes	: 38 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
stepping	: 10
cpu MHz		: 2659.998
cache size	: 6144 KB
physical id	: 3
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 3
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips	: 5322.20
clflush size	: 64
cache_alignment	: 64
address sizes	: 38 bits physical, 48 bits virtual
power management:

root@ip-10-161-82-11:/var/www/nginx-default# cat /proc/meminfo 
MemTotal:       15752364 kB
MemFree:        14964352 kB
Buffers:           22708 kB
Cached:           216504 kB
SwapCached:            0 kB
Active:           134052 kB
Inactive:         110996 kB
Active(anon):       6000 kB
Inactive(anon):        0 kB
Active(file):     128052 kB
Inactive(file):   110996 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                36 kB
Writeback:             0 kB
AnonPages:          5860 kB
Mapped:             5052 kB
Shmem:               164 kB
Slab:              28876 kB
SReclaimable:      12480 kB
SUnreclaim:        16396 kB
KernelStack:         872 kB
PageTables:            0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     7876180 kB
Committed_AS:      47812 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        5988 kB
VmallocChunk:   34359732359 kB
DirectMap4k:    15728640 kB
DirectMap2M:           0 kB

Here’s a larger configuration running the latest stable version of nginx:


root@ip-10-166-162-224:/var/www# cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz
stepping	: 5
cpu MHz		: 2666.760
cache size	: 8192 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 17
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5335.92
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz
stepping	: 5
cpu MHz		: 2666.760
cache size	: 8192 KB
physical id	: 1
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5335.92
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz
stepping	: 5
cpu MHz		: 2666.760
cache size	: 8192 KB
physical id	: 2
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 2
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5335.92
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz
stepping	: 5
cpu MHz		: 2666.760
cache size	: 8192 KB
physical id	: 3
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 3
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5335.92
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

processor	: 4
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz
stepping	: 5
cpu MHz		: 2666.760
cache size	: 8192 KB
physical id	: 4
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 4
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5335.92
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

processor	: 5
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz
stepping	: 5
cpu MHz		: 2666.760
cache size	: 8192 KB
physical id	: 5
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 5
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5335.92
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

processor	: 6
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz
stepping	: 5
cpu MHz		: 2666.760
cache size	: 8192 KB
physical id	: 6
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 6
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5335.92
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz
stepping	: 5
cpu MHz		: 2666.760
cache size	: 8192 KB
physical id	: 7
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 7
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5335.92
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

root@ip-10-166-162-224:/var/www# cat /proc/meminfo 
MemTotal:       71700024 kB
MemFree:        69213656 kB
Buffers:            9736 kB
Cached:           214992 kB
SwapCached:            0 kB
Active:           116788 kB
Inactive:         116540 kB
Active(anon):       8628 kB
Inactive(anon):      152 kB
Active(file):     108160 kB
Inactive(file):   116388 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          8628 kB
Mapped:             5756 kB
Shmem:               172 kB
Slab:              31148 kB
SReclaimable:      21044 kB
SUnreclaim:        10104 kB
KernelStack:        1480 kB
PageTables:            0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    35850012 kB
Committed_AS:      62120 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        6100 kB
VmallocChunk:   34359732247 kB
DirectMap4k:    71680000 kB
DirectMap2M:           0 kB

root@ip-10-166-162-224:/var/www# ls -la index.html 
-rw-r--r-- 1 root root 281180 2010-10-16 09:02 index.html

root@ip-10-166-162-224:/var/www# ab -n 1000000 -c100 http://localhost:80/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100000 requests
Completed 200000 requests
Completed 300000 requests
Completed 400000 requests
Completed 500000 requests
Completed 600000 requests
Completed 700000 requests
Completed 800000 requests
Completed 900000 requests
Completed 1000000 requests
Finished 1000000 requests


Server Software:        nginx/0.8.52
Server Hostname:        localhost
Server Port:            80

Document Path:          /
Document Length:        281180 bytes

Concurrency Level:      100
Time taken for tests:   232.069 seconds
Complete requests:      1000000
Failed requests:        0
Write errors:           0
Total transferred:      281395406970 bytes
HTML transferred:       281181405900 bytes
Requests per second:    4309.07 [#/sec] (mean)
Time per request:       23.207 [ms] (mean)
Time per request:       0.232 [ms] (mean, across all concurrent requests)
Transfer rate:          1184132.86 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.2      1       3
Processing:     8   22   0.6     22      65
Waiting:        0    1   0.5      1      46
Total:          9   23   0.6     23      65

Percentage of the requests served within a certain time (ms)
  50%     23
  66%     23
  75%     24
  80%     24
  90%     24
  95%     24
  98%     24
  99%     24
 100%     65 (longest request)


Here’s my smallest cloud instance running apache (tested from an m1.xlarge running in the same availability zone). The results are different because this is a network test involving two nodes. More latency is expected. Actually, there are a lot of differences in this next sample. 2 concurrent connections is much different than 100. The html page being distributed by the Apache http server here is similar to the one from the last sample.


This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking asherbond.com (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        Apache
Server Hostname:        asherbond.com
Server Port:            80

Document Path:          /blog
Document Length:        234 bytes

Concurrency Level:      2
Time taken for tests:   135.299 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Non-2xx responses:      100000
Total transferred:      46700000 bytes
HTML transferred:       23400000 bytes
Requests per second:    739.11 [#/sec] (mean)
Time per request:       2.706 [ms] (mean)
Time per request:       1.353 [ms] (mean, across all concurrent requests)
Transfer rate:          337.07 [Kbytes/sec] received

Connection Times (ms)
             min  mean[+/-sd] median   max
Connect:        1    1   9.5      1    3002
Processing:     1    1   0.4      1      33
Waiting:        1    1   0.4      1      32
Total:          2    3   9.5      3    3003

Percentage of the requests served within a certain time (ms)
 50%      3
 66%      3
 75%      3
 80%      3
 90%      3
 95%      4
 98%      4
 99%      5
100%   3003 (longest request)

Considerations

  1. This is not a comparative analysis, but rather a generally uncontrolled experiment to collect system performance data from the cloud.
  2. Service-oriented Architecture is volatile when the supporting service layers are volatile.
  3. Compute infrastructure services (even EC2 m1.* and especially t1.micro) may be volatile depending on network health and demands at a given time.
  4. Benchmarking a local loop-back may give understated performance results on computers with lower IO bandwidth.
  5. Benchmarking a local loop-back may give overstated performance when service traverses networks suffering from high latency between client and server nodes.
  6. Some networks, virtual, and paravirutal compute environments limit the amount of concurrent connections during high (or even moderate) utilization.
  7. 100 concurrent connections isn’t very many, especially for Amazon Web Services.
  8. It would be interesting to see how many requests could be handled with 1000 concurrent connections.

Conclusions

  1. Bigger may have the potential of being better, but requires additional performance tuning for a specific application in order to take advantage of the compute capabilities of an 8 processor configuration.
  2. Sometimes the purpose of data collection reveals itself after such data becomes information.
  3. Sometimes it’s fun to show what a machine is capable of, whether you’re revving the engine on a dyno or just riding through some neighborhoods…

Tagged with: , , , , , , , , , , , ,
Posted in cloud computing, performance analysis
One comment on “Benchmarking Nginx in the cloud
  1. Bailey Senosk says:

    Hi there, I found your blog via Google while searching for a related topic, your site came up, it looks good.