<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Nikolay,</div><div><br></div><div>What I've got at this point is: if I try to do the compilation with 2 virtual CPUs, the system falls over. If I try to do the compilation with 4-6 virtual CPUs, I never get significantly above 200% CPU utilisation. After 6 CPUs or so, I start to see stability problems in the host, possibly because Solaris tries to sync clocks across CPUs, which shows up more clearly if you build a debug kernel off the below. Here's the info on the workload I've been trying most frequently:</div><div><br></div><div><a href="https://www.illumos.org/projects/illumos-gate/wiki/How_To_Build_Illumos">https://www.illumos.org/projects/illumos-gate/wiki/How_To_Build_Illumos</a></div><div><br></div><div>Talking about it with other developers, the feedback is that VBox has problems where VMWare doesn't. Could you give it a go and let me know what kind of results you see? Automating as you suggest is certainly possible and reasonable if you have a local clone of the source gate. Clobbering before a build is the default behaviour, and it does add some time before compilation gets underway to give you an idea of performance. My impression is that there are, relatively speaking, plenty of Illumos developers out there who have VBox and would prefer to use it, but some are using VMWare because it works.</div><div><br></div><div>Cheers,</div><div>Bayard</div><br><div><div>On 11 May 2011, at 12:29, Nikolay Igotti wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>Bayard Bell wrote:<br><blockquote type="cite"><blockquote type="cite">CPU isn't that easily "given" to the VM, as RAM pages, for example. VirtualBox internally need to run few threads doing disk/network IO. 
Same situation with host OS too, so essentially some experiments is the best way to figure out how many vCPUs is reasonable to give to the guest to get best performance.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">    <br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Any suggestions as to how to go about that methodically? <br></blockquote>Well, just try some representative subset (10-20 mins of compilation) with 1,2,3,4... vCPUs and see the result :).<br>Could be easily automated with vboxshell and guest commands execution facility.<br><br><br><blockquote type="cite">What I know is that the run queue seems to back up to the point of crushing the host if I provide only two vCPUs, while with 4 vCPUs, I only seem to get consumption of 2 actual CPUs. I've got a slight further wrinkle, insofar as the default behaviour of the build environment is to look at the number of CPUs and amount of memory and decide for itself what the appropriate level of parallelism is, although I can work around this by setting a fixed value before experimenting with CPU count. Just to give this a bottom line, if I haven't mentioned this previously: I've got a compile job that normally takes at most few hours on comparable bare metal, and it's taking several days under VBox. Resolving this is the difference between being able to get acceptably slower performance under VBox and needing to sort myself out with a separate system.<br></blockquote><blockquote type="cite">  <br></blockquote>Is project you're compiling open source? 
This could make analysis simpler.<br><br><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">With compilation, especially if you compile a lot of small files, significant part of load is fork/exec performance (and so, VMM in the guest), and of course, IO does matter too.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">    <br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">The I/O is trivial, but what I'm gathering is that the CPU overhead of the system calls is increased considerably. I don't see a lot of fork and exec load, but what I'm wondering is whether time spent in the kernel would actually be relatively longer, such that relatively lightweight system calls on a normal host would add up to a considerably higher percentage of CPU time in a virtual environment.<br></blockquote><blockquote type="cite">  <br></blockquote>Syscalls per se aren't affected much by virtualization, but privileged operations they perform sometimes are.<br>Generally, this need deeper analysis, and you may want to try running same guest on different host OS (ideally with<br>the same hardware), to see if some host specifics presented.<br><br>Also no sure if OSX is best OS to run SMP load in general.<br><br><blockquote type="cite"><blockquote type="cite">Don't think you really need that. As VBox doesn't do explicit gang scheduling, some assistance from host scheduler on that would be helpful, not explicit assignment of CPU affinity. In theory, good scheduler shall gang schedule threads with the same address space even without additional hints, as this will likely increase performance. Not sure if OSX does that, although.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">    <br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Thanks for that info. 
I'll see if there's any documentation or source to satisfy my curiosity on this point. It might also be useful to see what DTrace can tell me. Does VBox have its own DTrace probes to help with these kinds of problems?<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">  <br></blockquote>  Don't think VBox has much of probes on its own, but even OS traces could be sufficiently useful.<br><br>   <br>  Nikolay<br><br></div></blockquote></div><br></body></html>