[vbox-dev] questions about number of CPUs and host scheduling behaviour

Bayard Bell buffer.g.overflow at googlemail.com
Wed May 11 10:44:35 GMT 2011

On 10 May 2011, at 10:39, Nikolay Igotti wrote:

> Hi Bayard,
> On 06.05.2011 19:56, Bayard Bell wrote:
>> I've got 8 cores on my system, so I can hand it all over to the guests without sweating it. I'm looking at top, and I don't see any indication that other system load is contending. When I stop other apps running, it's only the amount of CPU idle time in the host that goes down, while the guest maintains the same level of CPU utilisation.
> CPU isn't as easily "given" to the VM as RAM pages are, for example. VirtualBox internally needs to run a few threads doing disk/network I/O, and the host OS is in the same situation, so some experimentation is essentially the best way to figure out how many vCPUs it is reasonable to give the guest for best performance.

Any suggestions as to how to go about that methodically? What I know is that the run queue backs up to the point of crushing the host if I provide only two vCPUs, while with 4 vCPUs I only seem to get consumption of 2 physical CPUs. There's a slight further wrinkle: by default the build environment looks at the number of CPUs and the amount of memory and decides for itself what the appropriate level of parallelism is, although I can work around this by setting a fixed value before experimenting with the vCPU count. To give this a bottom line, in case I haven't mentioned it previously: I've got a compile job that normally takes at most a few hours on comparable bare metal, and it's taking several days under VBox. Resolving this is the difference between getting acceptably slower performance under VBox and needing to sort myself out with a separate system.
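One way to run that experiment methodically is a host-side sweep: power the VM down, set the vCPU count with `VBoxManage modifyvm --cpus`, restart, and time an identically configured build at each count. A minimal sketch follows; the VM name, the build command, and the injected `run` callable are placeholders for your setup, not anything VBox-specific:

```python
import time

def sweep_vcpus(run, vm, build_cmd, cpu_counts=(1, 2, 4, 6, 8)):
    """Time the same fixed-parallelism build at each vCPU count.

    `run` executes a command list (e.g. via subprocess.run); it is a
    parameter so the sweep can be exercised without a real VM.
    """
    results = {}
    for n in cpu_counts:
        run(["VBoxManage", "controlvm", vm, "poweroff"])
        run(["VBoxManage", "modifyvm", vm, "--cpus", str(n)])
        run(["VBoxManage", "startvm", vm, "--type", "headless"])
        start = time.monotonic()
        run(build_cmd)  # e.g. ["ssh", "build@guest", "dmake -j 4 all"]
        results[n] = time.monotonic() - start
    return results
```

Pinning the guest build's parallelism to a fixed value across runs keeps the build system's own CPU/memory heuristic from varying along with the vCPU count, so the sweep measures only one thing.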

>> The load I'm running is compilation. There shouldn't be a lot of system time, but the build system I'm using schedules more parallel jobs than there are CPUs, using both CPU count and memory size to determine the maximum number of jobs. What nevertheless seems odd is that when the Solaris guest thinks it's got 3 or 4 threads on CPU, utilisation is half what I'd expect.
> With compilation, especially if you compile a lot of small files, a significant part of the load is fork/exec performance (and thus the VMM in the guest), and of course I/O matters too.

The I/O is trivial, but what I'm gathering is that the CPU overhead of system calls increases considerably. I don't see a lot of fork and exec load, but what I'm wondering is whether time spent in the kernel is proportionally longer, such that system calls that are relatively lightweight on a bare-metal host add up to a considerably higher percentage of CPU time in a virtual environment.
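The fork/exec part of that question can be put to a crude number, assuming Python is available both in the guest and on comparable bare metal: spawn a trivial process in a tight loop and compare the spawn rates. The ratio approximates the relative process-creation overhead under virtualisation; `/bin/true` is an assumption about the guest's paths:

```python
import subprocess
import time

def fork_exec_rate(n=100, cmd=("/bin/true",)):
    """Spawn `cmd` n times and return spawns per second.

    Run the same call on bare metal and inside the guest; the ratio of
    the two rates approximates the fork/exec cost of virtualisation.
    """
    start = time.monotonic()
    for _ in range(n):
        subprocess.run(cmd, check=True)
    return n / (time.monotonic() - start)
```

This isolates process creation from compiler and I/O time, which a whole-build comparison cannot.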

>> Now, I can imagine a variety of reasons for this, many of which I understand poorly or not at all, but looking at CPUPalette.app (I'm not aware of anything on OS X that approximates the functionality of mpstat), it looks like the load on the system is being spread evenly across CPUs.
> That's pretty much expected.
>>  My very naive reaction is that this isn't quite right: VirtualBox should be trying to maintain processor affinity, pushing the CPUs flat-out rather than subjecting itself to unnecessary additional SMP overhead, which is cumulative with the guest's own overhead.
> It's up to the host OS scheduler to maintain (soft) affinity of threads in the way it thinks most reasonable. SMP overhead, such as the need for TLB shootdowns, can't be cured by forcing affinity; affinity would only help with reuse of CPU cache entries, if some form of address-space ID is used (or if switches happen within the same address space).
>>  (My understanding is that the ability to create CPU affinity in OS X is a bit weak compared to Linux or Solaris [i.e. affinity is between threads and is meant to be defined by applications based on hw.cacheconfig and friends, whereas in Linux and Solaris it can be defined more strictly in terms of processors and processes].)
> I don't think you really need that. Since VBox doesn't do explicit gang scheduling, some assistance from the host scheduler would be helpful here, rather than explicit assignment of CPU affinity. In theory, a good scheduler should gang-schedule threads sharing the same address space even without additional hints, as this will likely increase performance. I'm not sure whether OS X does that, though.

Thanks for that info. I'll see if there's any documentation or source to satisfy my curiosity on this point. It might also be useful to see what DTrace can tell me. Does VBox have its own DTrace probes to help with these kinds of problems?


