VirtualBox

Opened 10 years ago

Closed 8 years ago

#12912 closed defect (obsolete)

VBoxSVC locks up with more than 63 VMs running on the same host

Reported by: Alexandre Felipe Salgado Figueiredo
Owned by:
Component: other
Version: VirtualBox 4.3.10
Keywords: VBoxSVC
Cc:
Guest type: Windows
Host type: Windows

Description

Hi,

I have been experiencing problems with VBoxSVC when more than 63 VMs are running (the issue begins when the 64th VM is started). VBoxSVC locks up at 13 to 15% of system resource utilization, and if I try to stop or save a VM, it keeps the files locked, which prevents changing the configuration or restarting the VM. The only way to unlock the files is to stop all VMs and reboot the host (killing VBoxSVC after all VMs are down just messes everything up). I use this environment to test our multiuser application, and I need 70-80 VMs running (heterogeneous, with Windows XP, 2003 and 7 as guests). Memory is not an issue: with all VMs running I still have 15 GB free on the host.

The host is a Core i7 4870K with 64 GB of memory, running Windows 2012 R2 Enterprise. The VHD/VDI files are stored on a RAID 5 array.

For example, when I created a new VM after the issue had been triggered, the VBox Manager froze after I attached the VHD. I killed it and opened a new one, and when I tried to change the VM settings I got this message:

Failed to open a session for the virtual machine Microsoft2.

The machine 'Microsoft2' is already locked for a session (or being unlocked).

Result Code: VBOX_E_INVALID_OBJECT_STATE (0x80BB0007)
Component: Machine
Interface: IMachine {480cf695-2d8d-4256-9c7c-cce4184fa048}

In this case, I saved all running VMs, but all the VirtualBox.exe processes were still running (although no program shows in the Task Manager). I have attached both screenshots.

The only solution I have found so far is to reboot the host after saving all running VMs. When the host was rebooted, a VBox error popup showed up many times (it seems one per "lost" running VirtualBox.exe process); the screenshot is attached.

No core dumps were generated.

This issue has been present since version 4.0 (I have tested every version up to 4.3.11).

Let me know which steps or information are needed to pinpoint the root cause and help you fix this issue.

Attachments (5)

Host Resources utilization.JPG (70.6 KB ) - added by Alexandre Felipe Salgado Figueiredo 10 years ago.
VBoxSVC resource consuption.JPG (89.5 KB ) - added by Alexandre Felipe Salgado Figueiredo 10 years ago.
No VB VM´s running.JPG (28.1 KB ) - added by Alexandre Felipe Salgado Figueiredo 10 years ago.
Many VM´s Process running.JPG (94.2 KB ) - added by Alexandre Felipe Salgado Figueiredo 10 years ago.
Vbox Error on Reboot.JPG (20.6 KB ) - added by Alexandre Felipe Salgado Figueiredo 10 years ago.


Change History (9)

comment:1 by Ramshankar Venkataraman, 10 years ago

Component: VMM/HWACCM → other

comment:2 by Klaus Espenlaub, 10 years ago

I like the offer to help fixing this... the reason for this limit is extremely simple: VirtualBox uses the Windows API function WaitForMultipleObjects to monitor the active API clients (each VM process is a client, but also each VM manager GUI or VBoxManage while it's doing its job changing the settings of a VM).

Microsoft in their unlimited wisdom decided when they released Windows NT that no one ever will need to wait for more than 64 handles, and haven't bothered to change their mind in over 21 years, in which the problem became more and more pressing (bigger systems, more complex apps, more processes and threads)...

In our case one handle is needed for an event semaphore, to be able to poke the client watcher thread out of its wait.

The code is at https://www.virtualbox.org/browser/vbox/trunk/src/VBox/Main/src-server/ClientWatcher.cpp#L192

Microsoft's advice for getting past this limit is to set up a hierarchy of helper threads, where each would wait for up to 63 objects (one also needs to have some way to terminate the whole mess), signalling some semaphore if the waiting completed, which is picked up by the next higher level of threads aggregating 63 of the results from the lower level threads. It's essentially a "base 63 tree" design. Two levels (64 threads total) can handle up to 63*63=3969 API clients, which I think is quite OK for now (not maxing out the theoretical limits, as there can be up to approx. 8000 VMs on a 64 bit host, and many more API clients which take up slots, too). Creating the threads should of course only be done on demand, and the updates to the handle arrays should be as incremental as possible, to limit the overhead.
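The arithmetic behind the "base 63 tree" can be checked with a few lines of portable C++. This is only an illustrative sketch: the names `treeCapacity`, `leafThreadsNeeded` and `totalThreadsNeeded` are hypothetical and do not exist in VirtualBox, and the constant 64 mirrors the Windows `MAXIMUM_WAIT_OBJECTS` limit so that the code compiles on any platform.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the Windows MAXIMUM_WAIT_OBJECTS limit (64 handles per wait call).
constexpr std::size_t kMaxWaitObjects = 64;
// One slot per wait call is reserved for the wake-up event semaphore.
constexpr std::size_t kClientsPerWait = kMaxWaitObjects - 1;

// Number of clients a tree with the given number of levels can watch (63^levels).
constexpr std::size_t treeCapacity(unsigned levels) {
    std::size_t capacity = 1;
    for (unsigned i = 0; i < levels; ++i) capacity *= kClientsPerWait;
    return capacity;
}

// Leaf threads needed to watch nClients, 63 clients per leaf (ceiling division).
constexpr std::size_t leafThreadsNeeded(std::size_t nClients) {
    return (nClients + kClientsPerWait - 1) / kClientsPerWait;
}

// Threads in a two-level tree created on demand: a single watcher is enough
// up to 63 clients, otherwise the leaves plus one aggregating thread.
inline std::size_t totalThreadsNeeded(std::size_t nClients) {
    assert(nClients <= treeCapacity(2));  // two levels cannot watch more
    const std::size_t leaves = leafThreadsNeeded(nClients);
    return leaves <= 1 ? 1 : leaves + 1;
}

static_assert(treeCapacity(1) == 63, "one level: 63 clients");
static_assert(treeCapacity(2) == 3969, "two levels: 63 * 63 clients");
```

For the 70-80 VMs mentioned in this ticket, `totalThreadsNeeded(80)` gives 3: two leaf threads and one aggregating thread.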

This is so much effort for something which is the job of the OS that obviously so far no one in the VirtualBox team could collect enough motivation to go there. Which implies that no customer ever asked for it.

All other VirtualBox platforms (Linux, Mac OS X, Solaris) actually use the "generic" implementation, see https://www.virtualbox.org/browser/vbox/trunk/src/VBox/Main/src-server/ClientWatcher.cpp#L692 which would in principle work for Windows as well. There's just one hitch: This code assumes that the death of a process which holds the last reference to a COM object is signaled immediately to the server where the object lives, triggering its destruction. Alas, with COM this doesn't work when the client process crashes, it takes 6 minutes (!) which is a totally unacceptable delay. No one would be able to do anything with the VM config which is blocked this way before this timeout elapsed.

So while you went straight for the hornet's nest, it's actually a programming task which doesn't need turning VirtualBox upside down. Anyone with sufficient Windows programming knowledge, multithreading experience and some time at hand could write a more or less drop-in replacement for WaitForMultipleObjects (let's call it WaitForManyObjects)... and we wouldn't insist on a drop-in replacement, it could also be done as a modification to the current Windows specific client watcher code (using some helper functions, otherwise the already complex code would become totally unreadable).
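One piece of bookkeeping that any such replacement would need, independent of the actual wait call, is splitting the caller's handle array into groups that fit the 64-handle limit and translating a group-local wait result back into an index in the caller's array. The following is a minimal, platform-neutral sketch of just that part; it does not call `WaitForMultipleObjects` and does not create threads. `Handle`, `WaitGroup`, `splitIntoGroups` and `toCallerIndex` are hypothetical names introduced for illustration, and reserving slot 0 of each group for a shared stop event is an assumption about how the helper threads would be terminated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Handle = void*;  // stand-in for the Windows HANDLE type (assumption)

constexpr std::size_t kMaxWaitObjects = 64;                     // per-call limit
constexpr std::size_t kClientsPerGroup = kMaxWaitObjects - 1;   // slot 0 is reserved

// The handles one helper thread would wait on.
struct WaitGroup {
    std::size_t base;             // index of the group's first client in the caller's array
    std::vector<Handle> handles;  // [0] = shared stop event, [1..] = up to 63 client handles
};

// Split the caller's handles into groups of at most 63, each prefixed by the
// shared stop event so that every group stays within the 64-handle limit.
std::vector<WaitGroup> splitIntoGroups(const std::vector<Handle>& clients,
                                       Handle stopEvent) {
    std::vector<WaitGroup> groups;
    for (std::size_t base = 0; base < clients.size(); base += kClientsPerGroup) {
        const std::size_t count = std::min(kClientsPerGroup, clients.size() - base);
        WaitGroup group{base, {stopEvent}};
        const auto first = clients.begin() + static_cast<std::ptrdiff_t>(base);
        group.handles.insert(group.handles.end(), first,
                             first + static_cast<std::ptrdiff_t>(count));
        groups.push_back(std::move(group));
    }
    return groups;
}

// Translate the array index reported by a group's wait call into an index in
// the caller's original array. Returns -1 if the stop event was signalled.
long toCallerIndex(const WaitGroup& group, std::size_t localIndex) {
    assert(localIndex < group.handles.size());
    if (localIndex == 0) return -1;
    return static_cast<long>(group.base + localIndex - 1);
}
```

In a real implementation, each helper thread would pass its group's `handles` array to the OS wait call and feed the returned array index to `toCallerIndex`, so the caller sees the same index it would have received from a single wait over the whole array.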

comment:3 by Alexandre Felipe Salgado Figueiredo, 10 years ago

Hi Klaus,

Thank you very much for the detailed explanation. Unfortunately, I lack the knowledge needed to make such a change (although I understood the issue very well; I am an "old school" programmer who has been away from coding for a long time).

The question is: would changing the client watcher so that more than 63 VMs can run on Windows be an "interesting enough cause" for one of the senior developers with the necessary knowledge to make this change? VB has been around for many years, and from your description this design has been there since the beginning, so, as you rightly said, no one has complained about it yet.

In my specific case, I really need to run more than 63 VMs on one host. I have run some tests with nested VMs (such as 10 VMs inside a VM), and this worked on versions prior to 4.3.8. I followed in the ticket's thread that the VERR_CPUM_TOO_MANY_CPUID_SUBLEAVES problem has been fixed in 4.3.10, so for now I will resume the nested VMs idea and keep track of the VB tickets; if this changes one day, I will certainly give it a try.

Again, thank you Klaus for taking the necessary time to explain the issue.

comment:4 by aeichner, 8 years ago

Resolution: obsolete
Status: new → closed

Please reopen if still relevant with a recent VirtualBox release.
