VirtualBox

Opened 15 years ago

Closed 15 years ago

#3156 closed defect (fixed)

Linux (Debian) host and conflicting UUIDs (VBoxSVC sync issue)

Reported by: corvus
Owned by:
Component: other
Version: VirtualBox 2.1.2
Keywords: semaphore Linux VBoxSVC sync syncronization poweroff
Cc:
Guest type: other
Host type: other

Description (last modified by Frank Mehnert)

There seems to be a bug in the synchronization objects on Linux hosts. We have been seeing it in VBox 1.6, 2.0, 2.1.0 and 2.1.2, and reproduced it on Debian Lenny with kernels 2.6.25-amd64, 2.6.26-amd64 and 2.6.25-i686.

I held off reporting this bug for a few months because I had found a workaround (patching the UUID, see below) and it was hard to describe where the bug was. Now I have more information, so I'm posting the full details.

To reproduce:

  1. Create at least 10 machines by cloning the same VDI file and giving them identical settings.
  2. Start the machines in ascending order (order of creation).
  3. Shut the machines down one by one in descending order, and check whether shutting down one machine changes the state of other (running) machines.

For instance, powering off machine number 8 may change the state of machine number 5 to "Aborted". The pairs of conflicting machines stay the same each time you start and stop the machines. Once you have found a conflicting pair, you can reproduce the bug by starting and shutting down just those two machines (keeping the same order).

Experimenting with the bug showed that the problem is connected to the machine UUIDs and to the semaphores used to synchronize the VirtualBox process with VBoxSVC (details below). Just changing the UUID of one of the conflicting machines makes the problem seem to disappear. However, once the UUID is changed, a new conflict may appear with another machine in the set.

It looks like there is a semaphore whose ID is generated from the machine's UUID. The hashing function that creates the semaphore ID seems to be the key problem. I believe it is inside the VBoxSVC module, but I haven't found it yet.

Example:

I started machine N5, then machine N8. When I powered off machine N8, machine N5 went into the 'Aborted' state at the same moment. The VirtualBox window for machine N8 disappeared, but its process was still running in the background.

I attached gdb to the VirtualBox process for machine N8 and took a stack backtrace; you can see it in the attachment. It refers to the source file src/VBox/Main/SessionImpl.cpp (line 860). Machine N8 seems to be stuck at this point:

progress->WaitForCompletion (-1);

I hope this helps!

Attachments (1)

Screenshot.png (48.2 KB) - added by corvus 15 years ago.
GDB stack backtrace of the hanging VirtualBox process


Change History (7)

by corvus, 15 years ago

Attachment: Screenshot.png added

GDB stack backtrace of the hanging VirtualBox process

comment:1 by Frank Mehnert, 15 years ago

Description: modified (diff)

comment:2 by Dmitry A. Kuminov, 15 years ago

I tried what you suggested on a 2.6.27-11-amd64 system (Ubuntu). I started the VMs with

bash -c "for ((a=1;a<=10;++a)) do ./VBoxManage startvm test\$a; done"

and then stopped them with

bash -c "for ((a=10;a>=1;--a)) do ./VBoxManage controlvm test\$a poweroff; done"

and didn't observe the behavior you describe.

A clash of semaphore names on the VirtualBox side is impossible, because the VM's full XML file path is used as the SysV IPC semaphore name to guarantee its uniqueness. My guess is that the problem is specific to your installation.

I recommend building a debug version of VirtualBox and collecting the relevant logs to better understand what's going on. To do this, run the clients with the following environment:

export VBOX_LOG=main.e.l.f+gui.e.l.f
export VBOX_LOG_FLAGS="time tid thread"
export VBOX_LOG_DEST=dir=/path/to/all/logs

Once you have the logs from the crash, please zip them all and attach the archive here.

comment:3 by corvus (in reply to comment:2), 15 years ago

Replying to dmik:

> I tried what you suggested on a 2.6.27-11-amd64 system (Ubuntu):

I ran additional tests on Ubuntu Gutsy (2.6.22-14-amd64) and Gentoo (2.6.27-8-i686), and the bug doesn't show up there. We have also updated the kernel (to 2.6.26-1-amd64) on the Debian machine that reproduces this bug, and it is still there.

> A clash of semaphore names on the VirtualBox side is impossible, because the VM's full XML file path is used as the SysV IPC semaphore name to guarantee its uniqueness. My guess is that the problem is specific to your installation.

I know, I learned that from the source code. :) But it somehow happens on Debian. During my analysis I found a small misuse of the libc function ftok(): according to the man page, the second parameter, proj_id, must be nonzero, but the sources pass 0 (see src/VBox/Main/SessionImpl.cpp: line 961).

This misuse isn't critical (I checked the libc implementation of ftok) and shouldn't affect this bug; setting a nonzero value didn't help. Still, I would recommend changing the code for future compatibility with libc.

> I recommend building a debug version of VirtualBox and collecting the relevant logs to better understand what's going on.

I have tried that already, but the debug build of 2.1.2 can't restore a machine from a saved state: VirtualBox crashes right after restoring the machine state.

> To do this, run the clients with the following environment:
>
> export VBOX_LOG=main.e.l.f+gui.e.l.f
> export VBOX_LOG_FLAGS="time tid thread"
> export VBOX_LOG_DEST=dir=/path/to/all/logs

Thank you for the hints! Where can I find the full list of components/groups and the available flags for debug logging? Is there a common list somewhere?

> Once you have the logs from the crash, please zip them all and attach the archive here.

I will do so as soon as I resolve the crash of the debug version when restoring a machine from a saved state.

comment:4 by Dmitry A. Kuminov, 15 years ago

Thank you for noticing the ftok() misuse.

All predefined logging groups are defined in include/iprt/log.h and include/VBox/log.h.

comment:5 by corvus, 15 years ago

FYI: we have finally moved to new hardware and the bug no longer reproduces. I assume we had a filesystem problem that somehow affected VBox stability.

I guess you can close the bug. Sorry for the trouble.

comment:6 by Sander van Leeuwen, 15 years ago

Resolution: fixed
Status: new → closed

© 2023 Oracle