VirtualBox

Ticket #6936 (closed defect: fixed)

Opened 4 years ago

Last modified 2 years ago

High CPU consumption for multi-processor Windows guests

Reported by: MaxZinal
Owned by:
Priority: major
Component: guest smp
Version: VirtualBox 3.2.4
Keywords:
Cc:
Guest type: Windows
Host type: Linux

Description

Our install is:

  • 8-core 2-processor x86-based server with Xeon E5335 CPUs, 32 Gbytes of RAM.
  • Debian/GNU Linux 5 (Lenny) as a host system
  • Windows XP Pro SP3 as a guest system

We tried the following VirtualBox releases, with the same results: 3.1.8, 3.2.2, 3.2.4

We get high CPU consumption on the host (up to 80-120%, measured by top) even when our guest system is idle. The effect goes away when we switch the guest to a single CPU and turn off IO APIC (and move to a single-processor kernel, of course).

We do not see the same effect with exactly the same virtual machine on a notebook with an Intel Core i7 processor, running Windows 7. So this problem might be processor- or even operating system-specific.

One interesting side effect is that even when CPU usage on the guest is near maximum (we use a 4-core guest, and all cores are pretty busy at boot time), the CPU usage on the host is only about 170-180%.

The whole problem makes it pretty hard to use VirtualBox on that server, so we are abandoning our plans to buy a VBox license for that host (at least until we can find a workaround).

Attachments

VBox.log Download (63.4 KB) - added by MaxZinal 4 years ago.
VirtualBox logfile
cpuinfo Download (5.2 KB) - added by MaxZinal 4 years ago.
CPU information for the server
1-2010-06-09-16-16-09.log Download (64.0 KB) - added by MaxZinal 4 years ago.
VirtualBox log from Windows host system (on the same server)
CpuTests.zip Download (37.2 KB) - added by MaxZinal 4 years ago.
VirtualBox logfiles from 3 test runs (1,2 and 4 cpus)
VBox-3.2.4-debug.log Download (56.0 KB) - added by MaxZinal 4 years ago.
VBox.log from VirtualBox OSE 3.2.4 built in debug mode

Change History

Changed 4 years ago by MaxZinal

VirtualBox logfile

Changed 4 years ago by MaxZinal

CPU information for the server

comment:1 Changed 4 years ago by MaxZinal

This seems pretty much like tickets #6928, #6583, #6814, #6204, and, specifically, #4392.

It seems pretty obvious that there is some sort of design or implementation defect in VirtualBox multi-core guest support.

I should also add that we tried both 64-bit and 32-bit host systems (the latter using a kernel compiled with large memory support), with no difference at all.

comment:2 follow-ups: ↓ 5 ↓ 9 Changed 4 years ago by sandervl73

Does it happen with a 1 CPU Windows XP guest with IO-APIC turned on?

Your CPU supports the extension required for properly dealing with IO-APIC overhead, so that can't be the problem (rules out #4392).

comment:3 Changed 4 years ago by ToddAndMargo

Just adding myself to the Cc: list

comment:4 Changed 4 years ago by sandervl73

  • Component changed from VMM to guest smp

comment:5 in reply to: ↑ 2 Changed 4 years ago by MaxZinal

Replying to sandervl73:

Does it happen with a 1 CPU Windows XP guest with IO-APIC turned on?

Your CPU supports the extension required for properly dealing with IO-APIC overhead, so that can't be the problem (rules out #4392).

I will check that on Friday. I promise :)

For now I can add that VirtualBox 3.2.4 worked just fine on that server when we installed Windows 2003 x64 Server Standard Edition on it as the host system (temporarily, just for a test). We saw no load at all on the host system when our guest system was idle (as expected, of course).

At the same time there are some strange benchmarking results: we have a relatively large RAR archive, and we tried to unpack it in several configurations:

  • inside the VM on a notebook with Core i7 and Windows 7
  • inside the VM on that server
  • inside the host system on a notebook
  • inside the host system on the server (Linux OS)
  • inside the host system on the server (Windows OS)

Here are the approximate timings, which look pretty strange to me:

  • VM / notebook: 95 seconds
  • VM / server: 81 seconds
  • notebook: 63 seconds
  • server (Linux): 16 seconds
  • server (Windows): 15 seconds

Perhaps the performance difference between unpacking the archive on the server itself and in a VM on that server is somehow connected with the high resource usage on the Linux host?
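To make the gap in the timings above easier to see, here is a minimal sketch that turns them into slowdown ratios. The numbers are copied from this comment; the `slowdown` helper is a hypothetical name, and pairing "VM / server" with the Linux host run is an assumption, since the comment does not say which host OS the server VM ran under.

```python
# Timings in seconds, copied from the comment above.
timings = {
    "VM / notebook": 95,
    "VM / server": 81,
    "notebook": 63,
    "server (Linux)": 16,
    "server (Windows)": 15,
}


def slowdown(vm_seconds, host_seconds):
    """Return how many times slower the VM run is than the bare-host run."""
    return vm_seconds / host_seconds


notebook_ratio = slowdown(timings["VM / notebook"], timings["notebook"])
# Assumption: the server VM was measured under the Linux host.
server_ratio = slowdown(timings["VM / server"], timings["server (Linux)"])

print(f"notebook: VM is {notebook_ratio:.1f}x slower than the host")
print(f"server:   VM is {server_ratio:.1f}x slower than the Linux host")
```

Under that assumption the VM costs roughly 1.5x on the notebook but roughly 5x on the server, which is the discrepancy being asked about.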

comment:6 follow-up: ↓ 8 Changed 4 years ago by sandervl73

I very much doubt that. I hope you realize benchmarking in a VM is a lot more complicated than on a real machine. Did you run the unrar once or several times? Keep in mind that the dynamic disk image of the VM might be expanded during heavy file io. File expansion can be very expensive.

SATA vs IDE can make a difference as well as host cache on/off.

comment:7 Changed 4 years ago by now

The same happens with a 64-bit Ubuntu host and a 32-bit XP guest. Running strace on the VirtualBox process shows an endless loop of:

    read(17, 0x116a0a4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
    read(17, 0x116a0a4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
    read(30, 0x11fdbb4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
    poll([{fd=18, events=POLLIN}, {fd=25, events=POLLIN|POLLPRI}, {fd=27, events=POLLIN|POLLPRI}, {fd=28, events=POLLIN|POLLPRI}, {fd=29, events=POLLIN|POLLPRI}, {fd=30, events=POLLIN}, {fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}, {fd=17, events=POLLIN}, {fd=34, events=POLLIN}], 11, 0) = 0 (Timeout)
    read(17, 0x116a0a4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
    read(17, 0x116a0a4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
    read(30, 0x11fdbb4, 4096) = -1 EAGAIN (Resource temporarily unavailable)

Maybe this helps.
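For anyone who wants to quantify this pattern in a longer capture, here is a rough sketch that counts failed non-blocking reads per file descriptor in saved strace output. The `count_eagain_reads` helper and the regular expression are illustrative assumptions, not part of VirtualBox or strace, and the poll line in the sample is abbreviated. Note that the poll call in the trace above has a timeout of 0, so it returns immediately instead of sleeping.

```python
import re
from collections import Counter

# Matches lines such as:
#   read(17, 0x116a0a4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
EAGAIN_READ = re.compile(r"^read\((\d+),.*=\s*-1\s+EAGAIN\b")


def count_eagain_reads(strace_lines):
    """Count reads that failed with EAGAIN, grouped by file descriptor."""
    counts = Counter()
    for line in strace_lines:
        match = EAGAIN_READ.match(line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Sample lines taken from the trace above (poll line shortened).
sample = [
    "read(17, 0x116a0a4, 4096) = -1 EAGAIN (Resource temporarily unavailable)",
    "read(17, 0x116a0a4, 4096) = -1 EAGAIN (Resource temporarily unavailable)",
    "read(30, 0x11fdbb4, 4096) = -1 EAGAIN (Resource temporarily unavailable)",
    "poll([{fd=18, events=POLLIN}], 1, 0) = 0 (Timeout)",
]
counts = count_eagain_reads(sample)
print(counts)
```

A large count on one or two descriptors relative to the capture duration would suggest busy polling rather than blocking waits.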

comment:8 in reply to: ↑ 6 Changed 4 years ago by MaxZinal

Replying to sandervl73:

I very much doubt that. I hope you realize benchmarking in a VM is a lot more complicated than on a real machine.

I know that.

Did you run the unrar once or several times? Keep in mind that the dynamic disk image of the VM might be expanded during heavy file io. File expansion can be very expensive.

I know that. Unpacking was done multiple times with approximately the same results. Disk image size was stable during tests.

SATA vs IDE can make a difference as well as host cache on/off.

Host cache is turned on.

We use IDE drives for compatibility with other virtualization systems. If SATA/IDE matters so much in this case (a powerful host running a single, relatively small guest), then I vote for a defect in IDE emulation.

Changed 4 years ago by MaxZinal

VirtualBox log from Windows host system (on the same server)

comment:9 in reply to: ↑ 2 Changed 4 years ago by MaxZinal

Replying to sandervl73:

Does it happen with a 1 CPU Windows XP guest with IO-APIC turned on?

Here are the results from 3 tests with the same virtual machine:

  • with one virtual CPU
  • with two virtual CPUs
  • with four virtual CPUs

IO APIC was turned on for all tests.

All tests were performed while the guest system was nearly idle (1-2% CPU load according to Windows XP Task Manager). I changed the CPU count in the VM settings, restarted the VM, waited several minutes for things to settle down, and then captured the CPU usage counters.

Host CPU usage was measured using the 'top' utility for the corresponding VBoxHeadless process, with a 5-second update interval.

In general, host CPU usage fluctuated around a typical range, with rare, relatively high peaks.

VCPU count | Host CPU usage, % | Host CPU peaks, %
1          | 4-8               | 30
2          | 10-20             | 35
4          | 55-95             | 120
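For anyone wanting to reproduce these measurements without watching top by hand, here is a minimal sketch that samples a process's CPU usage from /proc on a Linux host. The function names are hypothetical, and it assumes the same convention top uses by default, where 100% means one fully busy core. It is an approximation of the method described above, not the exact procedure used for the table.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesized and may contain spaces,
    # so split only the text after the last ')'.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU usage over the interval; 100% equals one fully busy core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def sample_process(pid, interval_s=5.0):
    """Measure one interval of CPU usage for a running process (Linux only)."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

Calling `sample_process(pid)` with the PID of the VBoxHeadless process in a loop, and logging the results, would give the average range and peaks for each VCPU count.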

I don't know how to properly interpret these numbers, but I know very well that I can't start more than 3-5 *idle* 4-CPU VMs on that server. It makes me pretty sad :(

Changed 4 years ago by MaxZinal

VirtualBox logfiles from 3 test runs (1,2 and 4 cpus)

comment:10 follow-up: ↓ 11 Changed 4 years ago by sandervl73

Looks like the VT-x feature to reduce APIC overhead isn't working for some reason. The capability bit is set, but your measurements suggest it doesn't have the expected result. Quite strange. Do you have the latest BIOS installed for your server?

comment:11 in reply to: ↑ 10 Changed 4 years ago by MaxZinal

Replying to sandervl73:

Looks like the VT-x feature to reduce APIC overhead isn't working for some reason. The capability bit is set, but your measurements suggest it doesn't have the expected result. Quite strange. Do you have the latest BIOS installed for your server?

Installed the latest firmware updates package from Intel (2010-03-06). Still the same picture: idle guest, busy host.

Just for the record, here is the system description:

  • Motherboard: Intel S5000PAL
  • System: Intel SR2500 based
  • CPUs: 2 x Intel Xeon E5335, 2.0 GHz
  • Firmware revisions: BIOS: 99, BMC: 66, FRUSDR: 48

comment:12 Changed 4 years ago by MaxZinal

Perhaps I can collect some additional data for the analysis?

For now we have installed VMware Server 2.0.2, and its SMP support just works (although limited to two virtual CPUs).

comment:13 Changed 4 years ago by sandervl73

I can send you a debug build (.run installer only though) or you could build OSE yourself. The debug statistics will show the cause of the performance problem.

Changed 4 years ago by MaxZinal

VBox.log from VirtualBox OSE 3.2.4 built in debug mode

comment:14 Changed 4 years ago by MaxZinal

We have built a debug version of VirtualBox OSE 3.2.4, and I attached a log file from its run. I'm not sure it's useful for anything; IMHO there's nothing new in it compared to the old (non-debug) log.

It seems that the high CPU usage on the host system is mostly caused by even a small amount of IO inside the virtual machine. Pure CPU-bound programs (like SuperPI or other benchmarks, both integer and floating point) seem to perform well, with almost no slowdown.

On the other hand, if a program in the virtual machine performs IO (e.g. WinRAR, file copy operations, etc.), then we see very slow performance inside the guest (file copy speed of about 2-3 MBytes/sec) and pretty high CPU load on the host.

Pretty strange.

comment:15 follow-up: ↓ 16 Changed 4 years ago by ToddAndMargo

I am not seeing anything in 3.2.8 that addresses this. Am I correct?

comment:16 in reply to: ↑ 15 Changed 4 years ago by MaxZinal

Replying to ToddAndMargo:

I am not seeing anything in 3.2.8 that addresses this. Am I correct?

Yes, we see the same picture with 3.2.8.

comment:17 Changed 4 years ago by sandervl73

You've added the wrong log file. The debug build creates another file in the directory where you launch the VM. That file contains the statistics that can shed some light on your problem.

comment:18 Changed 2 years ago by frank

Still relevant with VBox 4.1.6?

comment:19 Changed 2 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

No response, closing.
