VirtualBox

Ticket #15568 (closed defect: fixed)

Opened 11 months ago

Last modified 10 months ago

VBoxHeadless crash ipcDConnectService Shutdown

Reported by: gim Owned by:
Priority: major Component: other
Version: VirtualBox 5.0.20 Keywords:
Cc: Guest type: other
Host type: Linux

Description

Some times crash happens:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
66      ../nptl/pthread_mutex_lock.c: No such file or directory.

(gdb) bt
#0  __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
#1  0x00007fbdcfdce1d0 in VBoxNsprPR_Lock (lock=0x0) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptsynch.c:193
#2  0x00007fbdcfd8296c in nsExceptionService::DropThread (thread=0x7fbcf40008c0) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/base/nsExceptionService.cpp:368
#3  0x00007fbdcfd82995 in nsExceptionService::ThreadDestruct (data=<optimized out>) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/base/nsExceptionService.cpp:221
#4  0x00007fbdcfdd6db5 in _PR_DestroyThreadPrivate (self=self@entry=0x7fbcf4000910) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/nsprpub/pr/src/threads/prtpd.c:265
#5  0x00007fbdcfdcd2d2 in _pt_thread_death (arg=0x7fbcf4000910) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptthread.c:882
#6  0x00007fbdd04a3f82 in __nptl_deallocate_tsd () at pthread_create.c:158
#7  0x00007fbdd04a4197 in start_thread (arg=0x7fbdd0868700) at pthread_create.c:325
#8  0x00007fbdcf53737d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

(gdb) list
188
189     PR_IMPLEMENT(void) PR_Lock(PRLock *lock)
190     {
191         PRIntn rv;
192         PR_ASSERT(lock != NULL);
193         rv = pthread_mutex_lock(&lock->mutex);
194         PR_ASSERT(0 == rv);
195         PR_ASSERT(0 == lock->notified.length);
196         PR_ASSERT(NULL == lock->notified.link);
197         PR_ASSERT(PR_FALSE == lock->locked);

(gdb) print lock
$1 = (PRLock *) 0x0

(gdb) thread apply all bt

Thread 4 (Thread 0x7fbdc97ea700 (LWP 28972)):
#0  0x00007fbdcf52ec53 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fbdc234c2fc in ?? () from /usr/lib/x86_64-linux-gnu/libvncserver.so.0
#2  0x00007fbdd04a4184 in start_thread (arg=0x7fbdc97ea700) at pthread_create.c:312
#3  0x00007fbdcf53737d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x7fbdd08aa700 (LWP 28922)):
#0  0x00007fbdcf529fdd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fbdcfdd041b in poll (__timeout=-1, __nfds=2, __fds=0x7fbdd08a9b90) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  _pr_poll_with_poll (pds=0xc37ea8, npds=2, timeout=4294967295) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptio.c:3906
#3  0x00007fbdcfdd2064 in VBoxNsprPR_Poll (pds=<optimized out>, npds=<optimized out>, timeout=<optimized out>)
    at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptio.c:4308
#4  0x00007fbdc8de17df in ConnThread (arg=0xc37ea0) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/ipc/ipcd/client/src/ipcConnectionUnix.cpp:336
#5  0x00007fbdcfdcd19b in _pt_root (arg=0xc381b0) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptthread.c:224
#6  _pt_iprt_root (Thread=<optimized out>, pvUser=0xc381b0) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptthread.c:272
#7  0x00007fbdd013f644 in rtThreadMain (pThread=0xc38320, NativeThread=140453224294144, pszThreadName=<optimized out>)
    at /.../VirtualBox-5.0.20/src/VBox/Runtime/common/misc/thread.cpp:717
#8  0x00007fbdd01beb4c in rtThreadNativeMain (pvArgs=0xc38320) at /.../VirtualBox-5.0.20/src/VBox/Runtime/r3/posix/thread-posix.cpp:324
#9  0x00007fbdd04a4184 in start_thread (arg=0x7fbdd08aa700) at pthread_create.c:312
#10 0x00007fbdcf53737d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7fbdd08ac780 (LWP 28908)):
#0  0x00007fbdd04abb9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fbdd01bf0ad in RTThreadSleep (cMillies=10) at /.../VirtualBox-5.0.20/src/VBox/Runtime/r3/posix/thread2-posix.cpp:82
#2  0x00007fbdc8dda1a9 in ipcDConnectService::Shutdown (this=this@entry=0xc39000) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/ipc/ipcd/extensions/dconnect/src/ipcDConnectService.cpp:3236
#3  0x00007fbdc8ddd4b6 in ipcDConnectService::OnClientStateChange (this=0xc39000, aClientID=4294967295, aClientState=<optimized out>)
    at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/ipc/ipcd/extensions/dconnect/src/ipcDConnectService.cpp:3646
#4  0x00007fbdc8dd58b1 in HandleEvent (ev=0xc2ea50) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/ipc/ipcd/client/src/ipcdclient.cpp:553
#5  IPC_Shutdown () at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/ipc/ipcd/client/src/ipcdclient.cpp:875
#6  0x00007fbdc8dd7e3e in ipcdclient_shutdown (module=<optimized out>) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/ipc/ipcd/client/src/ipcModuleFactory.cpp:187
#7  0x00007fbdcfdeb774 in nsGenericModule::Shutdown (this=this@entry=0xc35460) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/glue/nsGenericFactory.cpp:344
#8  0x00007fbdcfdeb78d in nsGenericModule::~nsGenericModule (this=this@entry=0xc35460, __in_chrg=<optimized out>)
    at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/glue/nsGenericFactory.cpp:241
#9  0x00007fbdcfdeb7be in nsGenericModule::Release (this=0xc35460) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/glue/nsGenericFactory.cpp:249
#10 0x00007fbdcfdafb76 in nsDll::Shutdown (this=this@entry=0xc349f0) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/components/xcDll.cpp:392
#11 0x00007fbdcfdb099f in nsFreeLibrary (dll=0xc349f0, serviceMgr=<optimized out>, when=3) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/components/nsNativeComponentLoader.cpp:296
#12 0x00007fbdcfdb0a09 in nsFreeLibraryEnum (aKey=<optimized out>, aData=<optimized out>, closure=<optimized out>)
    at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/components/nsNativeComponentLoader.cpp:369
#13 0x00007fbdcfd9414b in hashEnumerate (table=table@entry=0xc217c8, hdr=hdr@entry=0xc22800, i=i@entry=2, arg=arg@entry=0x7ffdf4e1ccc0)
    at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/ds/nsHashtable.cpp:131
#14 0x00007fbdcfd98f7d in VBoxNsplPL_DHashTableEnumerate (table=0xc217c8, etor=0x7fbdcfd94136 <hashEnumerate(PLDHashTable*, PLDHashEntryHdr*, PRUint32, void*)>, arg=0x7ffdf4e1ccc0)
    at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/ds/pldhash.c:675
#15 0x00007fbdcfd94796 in nsHashtable::Enumerate (this=0xc217b8, aEnumFunc=aEnumFunc@entry=0x7fbdcfdb09e3 <nsFreeLibraryEnum(nsHashKey*, void*, void*)>, aClosure=aClosure@entry=0x7ffdf4e1ccf0)
    at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/ds/nsHashtable.cpp:319
#16 0x00007fbdcfdb0a37 in nsNativeComponentLoader::UnloadAll (this=<optimized out>, aWhen=<optimized out>)
    at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/components/nsNativeComponentLoader.cpp:1035
#17 0x00007fbdcfdb71a1 in nsComponentManagerImpl::UnloadLibraries (this=this@entry=0xc14ed0, serviceMgr=serviceMgr@entry=0x0, aWhen=aWhen@entry=3)
    at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/components/nsComponentManager.cpp:3091
#18 0x00007fbdcfdb7298 in nsComponentManagerImpl::Shutdown (this=0xc14ed0) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/components/nsComponentManager.cpp:888
#19 0x00007fbdcfdedf11 in VBoxNsxpNS_ShutdownXPCOM (servMgr=<optimized out>) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/build/nsXPComInit.cpp:933
#20 0x000000000040ee44 in com::Shutdown () at /.../VirtualBox-5.0.20/src/VBox/Main/glue/initterm.cpp:571
#21 0x0000000000404e52 in TrustedMain (argc=argc@entry=7, argv=<optimized out>, envp=envp@entry=0x7ffdf4e1f4a8)
    at /.../VirtualBox-5.0.20/src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp:1340
#22 0x0000000000403419 in main (argc=7, argv=0xbf19f0, envp=0x7ffdf4e1f4a8) at /.../VirtualBox-5.0.20/src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp:1372

Thread 1 (Thread 0x7fbdd0868700 (LWP 28924)):
#0  __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
#1  0x00007fbdcfdce1d0 in VBoxNsprPR_Lock (lock=0x0) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptsynch.c:193
#2  0x00007fbdcfd8296c in nsExceptionService::DropThread (thread=0x7fbcf40008c0) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/base/nsExceptionService.cpp:368
#3  0x00007fbdcfd82995 in nsExceptionService::ThreadDestruct (data=<optimized out>) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/base/nsExceptionService.cpp:221
#4  0x00007fbdcfdd6db5 in _PR_DestroyThreadPrivate (self=self@entry=0x7fbcf4000910) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/nsprpub/pr/src/threads/prtpd.c:265
#5  0x00007fbdcfdcd2d2 in _pt_thread_death (arg=0x7fbcf4000910) at /.../VirtualBox-5.0.20/src/libs/xpcom18a4/nsprpub/pr/src/pthreads/ptthread.c:882
#6  0x00007fbdd04a3f82 in __nptl_deallocate_tsd () at pthread_create.c:158
#7  0x00007fbdd04a4197 in start_thread (arg=0x7fbdd0868700) at pthread_create.c:325
#8  0x00007fbdcf53737d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Attachments

exceptionservice.diff Download (2.9 KB) - added by klaus 11 months ago.

Change History

comment:1 Changed 11 months ago by gim

So it's race condition on XPCOM (Thread 2) and nsExceptionService (Thread 1) shutdown actions.

The problem is

VBoxNsxpNS_ShutdownXPCOM notify observers about shutdown, one of observers is nsExceptionService (nsExceptionService::Observe) that calls nsExceptionService::Shutdown() which destoying lock:

if (lock) {
    DropAllThreads();
    PR_DestroyLock(lock);
    lock = nsnull;
  }

On other side thread desctuction happens via nsExceptionService::ThreadDestruct which checks that lock stil alive "if (!lock)" and trying call nsExceptionService::DropThread which use this lock:

void nsExceptionService::ThreadDestruct( void *data )
{
  if (!lock) {
    // a typical situation is when IPC worker threads that have instantiated
    // exception managers are stopped after nsExceptionService is shut down,
    // which will result into this warning.  disable it.
#if 0    
    NS_WARNING("nsExceptionService ignoring thread destruction after shutdown");
#endif    
    return;
  }
  DropThread( (nsExceptionManager *)data );
}

void nsExceptionService::DropThread(nsExceptionManager *thread)
{
    PR_Lock(lock);
    DoDropThread(thread);
    PR_Unlock(lock);
}

(!!!) But if lock will be destroyed after checking "if (!lock)" and before using it inside DropThread, crash will happen.

Last edited 11 months ago by gim (previous) (diff)

comment:2 Changed 11 months ago by frank

Can you provide a reproduction scenario?

comment:3 Changed 11 months ago by gim

It's not so simple to reproduce this bug. System must working with high load multi threaded environment with n-days. I will try reproduce once more again with logs. But probably more productive way is to understand what happend by looking backtrace in crash dump... How possible that "lock = nsnull" was executed before DropThread was called.

comment:4 Changed 11 months ago by gim

Is it possible to enable some logging at VirtualBox-5.0.20/src/libs/xpcom18a4/xpcom/base/* files?

comment:5 Changed 11 months ago by frank

I don't see any logging statements in nsExceptionService.cpp. Of course you could add logging yourself (recompilation required). I'm asking for the scenario to have a test case which we could use to verify if a fix would work. It's easier to work on a fix having a reproduction scenario ready.

comment:6 Changed 11 months ago by klaus

Looks to me like yet another programming goof in XPCOM. Whoever wrote the comment in nsExceptionService::ThreadDestruct should have realized that he bumped into a serious issue with multithreading, and not just a harmless issue with an annoying warning. The bug was present before (checking a pointer to be non-NULL is nice, but it can be set to NULL a clock cycle later, leading to the crashes). Silencing the code which warns about screwup is not clever. The entire code structure is pushing things, because no one waits for thread termination (probably out of fear to cause hangs, replacing that danger with sporadic crashes).

Yes, a clear race, with most likely a pretty small window and needing very specific thread scheduling.

I'm currently thinking about postponing the lock destruction to the destructor, but I have no proof that this is guaranteed to fix the issue (it likely is, because nsExceptionService is a regular XPCOM object, i.e. has refcounting).

comment:7 Changed 11 months ago by gim

frank, I was trying write some code with more simple logic to reproduce this bug and give you this code, but still no luck to catch this bug with simple code.

klaus, If you can provide some patch I can recompile and test in my high load environment and will let you know if everything works fine.

comment:8 Changed 11 months ago by klaus

Pretty certain that this crash is rather difficult to provoke. Otherwise this bug wouldn't have survived for well over 10 years.

The fix attempt is in VirtualBox 5.1.0, but I wouldn't throw it immediately at a production setup.

The last days of source code changes didn't make it to the public repository yet, otherwise I would point you at the corresponding changeset. Will attach the patch.

Last edited 11 months ago by klaus (previous) (diff)

Changed 11 months ago by klaus

comment:9 Changed 11 months ago by klaus

Sigh... looks like we crippled trac a bit too much, removing the easy way to download attachments. Append ?format=raw to the URL.

comment:10 Changed 10 months ago by klaus

The public repo is up to date. The changeset is r62170.

comment:11 Changed 10 months ago by gim

Great! I will try run for next weekend...

comment:12 Changed 10 months ago by gim

No crash anymore at this place for 3-4 days. Probably fixed. But new crash happens (see https://www.virtualbox.org/ticket/15652)

comment:13 Changed 10 months ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Fix is part of 5.1.2.

Note: See TracTickets for help on using tickets.

www.oracle.com
ContactPrivacy policyTerms of Use