[vbox-dev] SMP & degraded network performance
jim at jameslittle.me.uk
Sat Jan 7 14:16:08 PST 2012
I have posted about the following on the VBox forums, but there has
been no insight yet, and I think it may be a question for the devs
rather than users.
I decided to run some network throughput benchmarks (mainly interested
in max packets per second), and have noticed a large discrepancy when
assigning one CPU to a VM vs 2 or more CPUs. My setup (all 64-bit):
Host: Ubuntu 11.04 2.6.38-13-generic #53-Ubuntu SMP, 3 GHz Intel i7
950 (4 phys cores + HT), 12GB RAM, Ethernet: Intel 82574L Gigabit
Guest: Ubuntu 11.10 3.0.0-12-server kernel, 1GB RAM assigned
VirtualBox version: 4.1.8
Following are the results of the basic testing I performed using
netserver/netperf against a bridged network interface (bridged to
above Intel device). The following commands were run on the Guest
against its local interface (not the loopback):
netserver -4 (starts an IPv4 TCP/UDP server)
netperf -H <IP_address_of_eth0> -t TCP_CRR (runs a TCP
connect/request/response transaction benchmark)
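For anyone unfamiliar with TCP_CRR: each transaction opens a new TCP connection, sends one request, reads one response, and closes. A minimal Python sketch of that pattern (illustrative only — it uses the loopback for self-containment, whereas my test above deliberately targeted the eth0 address):

```python
# Rough illustration of the transaction pattern netperf's TCP_CRR
# measures: connect, one request, one response, close, repeat.
import socket
import threading
import time

def echo_server(sock):
    # Accept one connection at a time and echo a single message back,
    # until the listening socket is closed.
    while True:
        try:
            conn, _ = sock.accept()
        except OSError:
            return  # listening socket closed; stop serving
        with conn:
            data = conn.recv(64)
            conn.sendall(data)

def tcp_crr(host, port, transactions):
    # Run `transactions` connect/request/response cycles and return
    # the achieved transactions-per-second rate.
    start = time.monotonic()
    for _ in range(transactions):
        with socket.create_connection((host, port)) as c:
            c.sendall(b"ping")
            assert c.recv(64) == b"ping"
    elapsed = time.monotonic() - start
    return transactions / elapsed

if __name__ == "__main__":
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))   # ephemeral port on loopback
    srv.listen(128)
    port = srv.getsockname()[1]
    threading.Thread(target=echo_server, args=(srv,), daemon=True).start()
    tps = tcp_crr("127.0.0.1", port, 200)
    print(f"{tps:.0f} connect/request/response transactions per second")
    srv.close()
```

The per-transaction connection setup/teardown is the point of the test — it stresses interrupt and context-switch paths rather than bulk throughput, which is why the vCPU count shows up here and not in the TCP_STREAM-style numbers.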
1-CPU VM:
  ~17-18k transactions per second

2-CPU VM, with eth0 interrupts and netserver/netperf all pinned to the same core:
  Confirmed very low scheduling interrupts during the benchmark
  (watching /proc/interrupts).

2-CPU VM, with the 2nd core disabled via hotplugging:
  Disabled the second CPU with: echo '0' >
  /sys/devices/system/cpu/cpu1/online and confirmed via
  /proc/interrupts and other system tools.
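In case anyone wants to reproduce the pinned scenario, this is roughly how the pinning can be set up on Linux (the core number here is illustrative, not necessarily the one I used):

```python
# Hedged sketch of pinning everything to one core on Linux.
import os

PINNED_CPU = 0  # hypothetical choice; same core as the eth0 IRQ below

# Pin the current process (pid 0 = self) to a single core, as
# `taskset -c 0 netserver -4` would do for the benchmark processes.
os.sched_setaffinity(0, {PINNED_CPU})
print("running on CPUs:", sorted(os.sched_getaffinity(0)))

# The eth0 interrupt is pinned separately by writing a CPU bitmask to
# /proc/irq/<irq_number>/smp_affinity (root only), e.g. from a shell:
#   grep eth0 /proc/interrupts            # find the IRQ number
#   echo 1 > /proc/irq/<N>/smp_affinity   # mask 0x1 = CPU 0
```

With both the IRQ and the processes on one core there should be essentially no cross-CPU wakeups for the benchmark traffic, which is what the /proc/interrupts check above was confirming.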
Also worth noting that on the host system, the same test yields around
26k TPS. netfilter/conntrack is disabled on both host and guest.
So even with the second CPU disabled I'm seeing around a 50%
performance degradation vs the single-CPU VM. The results with more
than 2 CPUs were very similar to the 2-CPU scenario. I have repeated
the test with HT disabled, to rule out any possible issues there.
I repeated the same test on an OS X 10.6 host on similar hardware
(quad-core Intel i7), also running VBox, and the results were the same.
I decided to extend the test to something CPU-bound and ran a Linpack
benchmark (single thread), but the results were unaffected by the
number of vCPUs (which is good). I also ran a disk read benchmark
using hdparm, and this was likewise unaffected, so the problem seems
confined to network performance for now.
Does anyone have any insight into why this could be happening, or
whether it is a known issue?