VirtualBox

Opened 10 years ago

Closed 8 years ago

#13518 closed defect (fixed)

XPCOM IPC deadlock

Reported by: gim Owned by:
Component: other Version: VirtualBox 4.3.18
Keywords: XPCOM, IPC, DEADLOCK Cc:
Guest type: all Host type: Linux

Description

So, get this deadlock is small chance, but it will increase when you use a lot VirtualMachines (over 50) and doing a lot of VBoxManage operation at the same time.

Here is callstack when VBoxManage stuck:

Thread 2 (Thread 0x7f879d91f700 (LWP 8336)):
#0  0x00007f879d24e89c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f879d24a09b in _L_lock_1006 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f879d24a01c in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f879c9f4237 in VBoxNsprPR_Lock (lock=0x127b8e8)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptsynch.c:193
#4  0x00007f879c9f4fd5 in VBoxNsprPR_EnterMonitor (mon=0x127b8e0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptsynch.c:518
#5  0x00007f879c99ec59 in nsAutoMonitor::nsAutoMonitor (this=0x7f879d91e670, mon=0x127b8e0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/threads/nsAutoLock.h:250
#6  0x00007f879c9c44e6 in nsComponentManagerImpl::GetService (this=0x126f180, aClass=..., aIID=..., 
    result=0x7f879d91e7b0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/components/nsComponentManager.cpp:2049
#7  0x00007f879678008e in CallGetService<nsIEventQueueService> (aClass=..., aDestination=0x7f879d91e7b0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/components/nsIServiceManagerUtils.h:160
#8  0x00007f879677f13b in NS_GetEventQueueService (result=0x7f879d91e7b0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/threads/nsEventQueueUtils.h:49
#9  0x00007f879677f267 in NS_GetMainEventQ (result=0x7f879d91e800, serv=0x0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/threads/nsEventQueueUtils.h:71
#10 0x00007f879677c64e in PostEventToMainThread (ev=0x7f8790001840)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/ipc/ipcd/client/src/ipcdclient.cpp:521
#11 0x00007f879677e883 in IPC_OnMessageAvailable (msg=0x7f87900013b0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/ipc/ipcd/client/src/ipcdclient.cpp:1453
#12 0x00007f8796797c2a in ConnRead (s=0x1292840)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/ipc/ipcd/client/src/ipcConnectionUnix.cpp:257
#13 0x00007f8796797f4d in ConnThread (arg=0x1292840)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/ipc/ipcd/client/src/ipcConnectionUnix.cpp:361
#14 0x00007f879c9f1f0c in _pt_root (arg=0x1292b50)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptthread.c:224
#15 0x00007f879c9f2061 in _pt_iprt_root (Thread=0x1292cc0, pvUser=0x1292b50)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptthread.c:272
#16 0x00007f879ce31a7a in rtThreadMain (pThread=0x1292cc0, NativeThread=140220440901376, 
    pszThreadName=0x12935a0 "nspr-1")
    at /home/user/VirtualBox-4.3.18/src/VBox/Runtime/common/misc/thread.cpp:712
#17 0x00007f879cf06177 in rtThreadNativeMain (pvArgs=0x1292cc0)
    at /home/user/VirtualBox-4.3.18/src/VBox/Runtime/r3/posix/thread-posix.cpp:324
#18 0x00007f879d247e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#19 0x00007f879c11331d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#20 0x0000000000000000 in ?? ()


Thread 1 (Thread 0x7f879d921740 (LWP 8332)):
#0  0x00007f879d24bd84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f879cf0e020 in rtSemEventMultiPosixWaitIndefinite (pThis=0x1293690, fFlags=80, pSrcPos=0x7fffaf187c60)
    at /home/user/VirtualBox-4.3.18/src/VBox/Runtime/r3/linux/../posix/semeventmulti-posix.cpp:432
#2  0x00007f879cf0e84c in rtSemEventMultiPosixWait (hEventMultiSem=0x1293690, fFlags=80, uTimeout=0, 
    pSrcPos=0x7fffaf187c60)
    at /home/user/VirtualBox-4.3.18/src/VBox/Runtime/r3/linux/../posix/semeventmulti-posix.cpp:605
#3  0x00007f879cf0e8d6 in RTSemEventMultiWaitEx (hEventMultiSem=0x1293690, fFlags=80, uTimeout=0)
    at /home/user/VirtualBox-4.3.18/src/VBox/Runtime/r3/linux/../posix/semeventmulti-posix.cpp:617
#4  0x00007f879cee9b67 in RTSemEventMultiWait (hEventMultiSem=0x1293690, cMillies=4294967295)
    at /home/user/VirtualBox-4.3.18/src/VBox/Runtime/generic/RTSemEventMultiWait-2-ex-generic.cpp:44
#5  0x00007f879ce32520 in rtThreadWait (Thread=0x1292cc0, cMillies=4294967295, prc=0x7fffaf187d88, fAutoResume=true)
    at /home/user/VirtualBox-4.3.18/src/VBox/Runtime/common/misc/thread.cpp:1164
#6  0x00007f879ce3265a in RTThreadWait (Thread=0x1292cc0, cMillies=4294967295, prc=0x7fffaf187d88)
    at /home/user/VirtualBox-4.3.18/src/VBox/Runtime/common/misc/thread.cpp:1205
#7  0x00007f879c9f26fa in VBoxNsprPR_JoinThread (thred=0x1292b50)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptthread.c:675
#8  0x00007f879679843a in IPC_Disconnect ()
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/ipc/ipcd/client/src/ipcConnectionUnix.cpp:534
#9  0x00007f879677d25d in IPC_Shutdown ()
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/ipc/ipcd/client/src/ipcdclient.cpp:878
#10 0x00007f8796782b32 in ipcdclient_shutdown (module=0x128fcc0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/ipc/ipcd/client/src/ipcModuleFactory.cpp:187
#11 0x00007f879ca26038 in nsGenericModule::Shutdown (this=0x128fcc0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/glue/nsGenericFactory.cpp:344
#12 0x00007f879ca25a86 in nsGenericModule::~nsGenericModule (this=0x128fcc0, __in_chrg=<optimized out>)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/glue/nsGenericFactory.cpp:241
#13 0x00007f879ca25b9b in nsGenericModule::Release (this=0x128fcc0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/glue/nsGenericFactory.cpp:249
#14 0x00007f879c9b9a23 in nsDll::Shutdown (this=0x128f210)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/components/xcDll.cpp:392
#15 0x00007f879c9bb03a in nsFreeLibrary (dll=0x128f210, serviceMgr=0x0, when=3)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/components/nsNativeComponentLoader.cpp:296
#16 0x00007f879c9bb2c7 in nsFreeLibraryEnum (aKey=0x128f260, aData=0x128f210, closure=0x7fffaf1880b0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/components/nsNativeComponentLoader.cpp:369
#17 0x00007f879c9806a7 in hashEnumerate (table=0x127ba30, hdr=0x127ca60, i=2, arg=0x7fffaf188070)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/ds/nsHashtable.cpp:131
#18 0x00007f879c98a765 in VBoxNsplPL_DHashTableEnumerate (table=0x127ba30, 
    etor=0x7f879c980658 <hashEnumerate(PLDHashTable*, PLDHashEntryHdr*, PRUint32, void*)>, arg=0x7fffaf188070)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/ds/pldhash.c:675
#19 0x00007f879c980d9a in nsHashtable::Enumerate (this=0x127ba20, 
    aEnumFunc=0x7f879c9bb26a <nsFreeLibraryEnum(nsHashKey*, void*, void*)>, aClosure=0x7fffaf1880b0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/ds/nsHashtable.cpp:319
#20 0x00007f879c9bd5ca in nsNativeComponentLoader::UnloadAll (this=0x127b9b0, aWhen=3)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/components/nsNativeComponentLoader.cpp:1035
#21 0x00007f879c9c7362 in nsComponentManagerImpl::UnloadLibraries (this=0x126f180, serviceMgr=0x0, aWhen=3)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/components/nsComponentManager.cpp:3091
#22 0x00007f879c9c10e9 in nsComponentManagerImpl::Shutdown (this=0x126f180)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/components/nsComponentManager.cpp:888
#23 0x00007f879ca2b449 in VBoxNsxpNS_ShutdownXPCOM (servMgr=0x0)
    at /home/user/VirtualBox-4.3.18/src/libs/xpcom18a4/xpcom/build/nsXPComInit.cpp:933
#24 0x00000000004bf0ca in com::Shutdown ()
    at /home/user/VirtualBox-4.3.18/src/VBox/Main/glue/initterm.cpp:572
#25 0x0000000000407618 in main (argc=4, argv=0x125b560)
    at /home/user/VirtualBox-4.3.18/src/VBox/Frontends/VBoxManage/VBoxManage.cpp:593

We have two threads. Thread 1 - trying to shutdown XPCOM (UnloadLibraries and call IPC_Disconnect) Thread 2 - this is the main thread (ConnThread), which wait some data and when need read/write some data

Basically the problem is that "gXPCOMShuttingDown and nsAutoMonitor mon(mMon);" it's not atomic operation inside nsComponentManagerImpl::GetService (nsComponentManager.cpp:866) and possible situation when thread check аlag "gXPCOMShuttingDown" and it's not true yet, after it we have context change and settings it true inside nsXPComInit.cpp:888

Thread action sequence which cause deadlock:

Thread		Action
-----------------------------
2		Want to read data by ConnRead 
2		Call GetService at nsComponentManager.cpp:866
2		Check the gXPCOMShuttingDown flag inside GetService at nsComponentManager.cpp:866
1		Set gXPCOMShuttingDown to TRUE at nsXPComInit.cpp:888
1		Call UnloadLibraries at nsComponentManager.cpp:888
1		Inside of UnloadLibraries acquire lock monitor (mMon) at nsComponentManager.cpp:3074
1		For shutdown reason - wait ending thread 2 to finish main loop by "PR_JoinThread(gConnThread)" at ipcConnectionUnix.cpp:534
2		But here is! Wait the lock mMon can be acquired at nsComponentManager.cpp:2049. So we can't finish our thread before get same lock

Here I found new XPCOM source code with some logic changes: https://hg.mozilla.org/mozilla-central/file/77f3ca1fe052/xpcom/build/XPCOMInit.cpp Probably they fix this bug. But you using another old version.

I don't include VBox.log because it's not necessary with this situation

Change History (10)

comment:1 by gim, 10 years ago

Probably same bugs: https://www.virtualbox.org/ticket/12428, https://www.virtualbox.org/ticket/10264

Tomorrow I can post our solution...

comment:2 by gim, 10 years ago

So, our fix doesn't work well, because we don't know all details of internal architecture XPCOM. And all XPCOM code looks like govnokod =(

Waiting for your solution...

comment:3 by gim, 9 years ago

Also I found git repository synchronized with old CVS Mozilla's repository. XPCOM and IPC source loks like almost compatible with your source. I think it's a good idea try to merge your source with this source (https://github.com/ehsan/mozilla-cvs-history/tree/master/xpcom) to fix a lot of bugs in XPCOM code.

comment:4 by Frank Mehnert, 9 years ago

I fear that's almost impossible as we fixed a lot of bugs in the meantime making the code drift away.

comment:5 by Klaus Espenlaub, 9 years ago

Exactly, the state of the XPCOM code in VirtualBox (in particular the IPC variant which the Mozilla project has ditched completely years ago) is far more stable and functional as the original sources ever were. Most of the changes in the git repo you're pointing to are totally irrelevant for what we use. There's actually rather little chance that the fix for the problem you're seeing hiding in this repository, because the Mozilla project never truly used XPCOM for the kind of cross-process communication we're using it exclusively for. This doesn't encourage spending the necessary resources for digging through the approximately 1300 changes since we forked off our copy of XPCOM in April 2005.

We'll see how fast we can free up some dev time to look into this... what we see extremely rarely are crashes in this code area due to shutdown object destruction races, and maybe these share the root cause, just need a different timing than your use case triggers.

comment:6 by gim, 9 years ago

actually rather little chance that the fix for the problem you're seeing hiding in this repository

In spite of this, for example they have some new shutdown actions in NS_ShutdownXPCOM routine: http://i58.tinypic.com/23i7c4m.jpg

Problem that is described in bug happens due to the fact that some thread still working, so probably this new stages can help:

// - Notify "xpcom-shutdown" for modules to release primary (root) references
// - Notify "xpcom-shutdown-threads" for thread joins
// - Shutdown the event queues


We'll see how fast we can free up some dev time to look into this...

That would be very nice, otherwise it makes VirtualBox not suitable for use in a multi-threaded-vm linux managment environment =(

comment:7 by gim, 9 years ago

That's it?

comment:8 by Klaus Espenlaub, 9 years ago

No, that's not it, but it meant that other issues were keeping us busy. Dev time is always a very scarce resource when working on new features, ironing out bugs reported by customers and users and so on...

comment:9 by Klaus Espenlaub, 9 years ago

Actually I forgot to ask - does the problem still happen with 4.3.22? There's a fix included which helps with other deadlocks at termination (https://www.virtualbox.org/changeset/53891/vbox).

comment:10 by Frank Mehnert, 8 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.

© 2023 Oracle
ContactPrivacy policyTerms of Use