VirtualBox

Opened 13 years ago

Closed 13 years ago

#9305 closed defect (fixed)

VBox modules randomly cause kernel panic on computer shutdown -> fixed as of 28-Jul 2011

Reported by: Artem S. Tashkinov
Owned by:
Component: other
Version: VirtualBox 4.1.0
Keywords:
Cc:
Guest type: other
Host type: Linux

Description

!!Assertion Failed!!
Expression idCpu == RTMpCpuId()
Location   :  /tmp/vbox.0/r0drv/linux/mpnotification-r0drv-linux.c(85) rtMpNotificationLinuxOnCurrentCpu
int3: 0000 [#1] PREEMPT SMP

The rest of it is in the attached screenshot.

Attachments (6)

rec.jpeg (248.0 KB ) - added by Artem S. Tashkinov 13 years ago.
A kernel panic screenshot
patched.jpeg (215.2 KB ) - added by Artem S. Tashkinov 13 years ago.
Panic with patched sources
.config (63.7 KB ) - added by Artem S. Tashkinov 13 years ago.
My 3.0 .configuration
vboxdrv.ko.gz (93.9 KB ) - added by Artem S. Tashkinov 13 years ago.
vboxdrv.ko compiled with vanilla GCC 4.5.3
vboxdrv.ko_gcc4.6.1.tar.gz (98.1 KB ) - added by Ionut Biru 13 years ago.
vboxdrv.ko compiled with gcc 4.6.1
savetemps.7z (359.7 KB ) - added by Artem S. Tashkinov 13 years ago.
GCC's preprocessed and assembler output

Change History (31)

by Artem S. Tashkinov, 13 years ago

Attachment: rec.jpeg added

A kernel panic screenshot

comment:1 by Artem S. Tashkinov, 13 years ago

!!Assertion Failed!!
Expression idCpu == RTMpCpuId()
Location   :  /tmp/vbox.0/r0drv/linux/mpnotification-r0drv-linux.c(85) rtMpNotificationLinuxOnCurrentCpu
int3: 0000 [#1] PREEMPT SMP

The rest of it is in the attached screenshot.

I'm running Linux 3.0 i686 vanilla kernel. I observed the same problems on Linux kernel 2.6.39. I don't remember experiencing such problems with VirtualBox 4.0.x, so this issue is probably new to VirtualBox 4.1.x.

comment:2 by Artem S. Tashkinov, 13 years ago

One thing I've forgotten to mention - it's the host OS which panics.

comment:3 by Felix Möller, 13 years ago

Is this maybe related to #9282?

in reply to:  3 comment:4 by Artem S. Tashkinov, 13 years ago

Replying to fm:

Is this maybe related to #9282?

I'm not sure they are related since the existing bug reports don't have a kernel backtrace attached - so it's really hard to judge.

comment:5 by Artem S. Tashkinov, 13 years ago

If anyone has the same problem, here's a temporary solution (until the VBox developers identify and solve this issue). Put these lines into your halt/shutdown script just before the halt invocation:

rmmod `lsmod | grep ^vb | awk '{print $1}'` &> /dev/null
rmmod `lsmod | grep ^vb | awk '{print $1}'` &> /dev/null

comment:6 by Artem S. Tashkinov, 13 years ago

It's most likely a dupe of bug #9253 - but at least my bug report contains full panic information (I run framebuffer at 1600x1200).

comment:7 by Ramshankar Venkataraman, 13 years ago

Many thanks for giving us the actual assertion. It seems our notification callback is not firing on the CPU we expect it to fire on. It works fine on my x64 2.6.38-8-generic kernel but I still can't find anything in our sources that restricts this to 32-bit only. Maybe 64-bit dual-core setups are just lucky to not hit the problem.

We noticed a slight difference in the linux sources in smp_processor_id() between 32 and 64-bit, but nothing really concrete to identify the real cause.

@birdie / anyone who can see the Assertion before the trace:

Could you try patching the sources and trying again to trigger the assertion? It would be good if we can get more information out of it.

Index: src/VBox/Runtime/r0drv/linux/mpnotification-r0drv-linux.c
===================================================================
--- src/VBox/Runtime/r0drv/linux/mpnotification-r0drv-linux.c	(revision 73165)
+++ src/VBox/Runtime/r0drv/linux/mpnotification-r0drv-linux.c	(revision 73166)
@@ -32,6 +32,7 @@
 #include "internal/iprt.h"
 
 #include <iprt/mp.h>
+#include <iprt/asm-amd64-x86.h>
 #include <iprt/err.h>
 #include <iprt/cpuset.h>
 #include <iprt/thread.h>
@@ -82,7 +83,8 @@
     NOREF(pvUser1);
 
     AssertRelease(!RTThreadPreemptIsEnabled(NIL_RTTHREAD));
-    AssertRelease(idCpu == RTMpCpuId());   /* ASSUMES iCpu == RTCPUID */
+    AssertReleaseMsg(idCpu == RTMpCpuId(),  /* ASSUMES iCpu == RTCPUID */
+                     ("idCpu=%u RTMpCpuId=%d ApicId=%d\n", idCpu, RTMpCpuId(), ASMGetApicId() ));
 
     switch (ulNativeEvent)
     {
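
As a side note on the patch above: it swaps a bare assertion for one that carries a formatted message, so the panic output shows the mismatching values and not only the failed expression. The following is a minimal user-space sketch of that idiom, not the IPRT implementation. `CHECK_MSG`, `record_failure`, `check_cpu` and `g_last_failure` are hypothetical names made up for this illustration. The double parentheses forward a whole printf-style argument list through a single macro parameter, which is the same calling shape as `AssertReleaseMsg` in the patch.

```c
#include <stdarg.h>
#include <stdio.h>

/* Last recorded failure text (hypothetical, for illustration only). */
char g_last_failure[256];

/* Format the diagnostic into g_last_failure, printf-style. */
void record_failure(const char *fmt, ...)
{
    va_list va;
    va_start(va, fmt);
    vsnprintf(g_last_failure, sizeof(g_last_failure), fmt, va);
    va_end(va);
}

/*
 * 'args' is a parenthesised printf argument list, e.g. ("x=%u\n", x).
 * Pasting it after the function name turns it into an ordinary call.
 */
#define CHECK_MSG(expr, args) \
    do { \
        if (!(expr)) { \
            record_failure args; \
            fprintf(stderr, "Check failed: %s\n%s", #expr, g_last_failure); \
        } \
    } while (0)

/* Returns 1 if the two CPU ids agree, 0 otherwise (and records why). */
int check_cpu(unsigned expected_cpu, unsigned actual_cpu)
{
    g_last_failure[0] = '\0';
    CHECK_MSG(expected_cpu == actual_cpu,
              ("expected=%u actual=%u\n", expected_cpu, actual_cpu));
    return g_last_failure[0] == '\0';
}
```

Unlike a release assertion in a kernel module, this sketch only records and prints the message and does not stop execution.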

in reply to:  7 comment:8 by Artem S. Tashkinov, 13 years ago

Replying to ramshankar:

I've applied the patch and I will post the results as soon as I hit this problem again.

by Artem S. Tashkinov, 13 years ago

Attachment: patched.jpeg added

Panic with patched sources

by Artem S. Tashkinov, 13 years ago

Attachment: .config added

My 3.0 .configuration

comment:9 by Artem S. Tashkinov, 13 years ago

In fact the host crashes every time if I run any VM - so it must be easily reproducible.

I have a quad core CPU, 4GB of RAM and I run PAE enabled kernel in x86 mode.

in reply to:  9 comment:10 by Ramshankar Venkataraman, 13 years ago

Replying to birdie:

In fact the host crashes every time if I run any VM - so it must be easily reproducible.

I have a quad core CPU, 4GB of RAM and I run PAE enabled kernel in x86 mode.

Thanks for the revised assertion!

Could you provide us with the gcc version you're using to compile the vboxdrv sources as well as provide us with the vboxdrv.ko binary compiled with it?

Our linux expert suggests this is a calling convention bug, so the gcc version and the vboxdrv.ko binary would help us in solving this issue. This also would explain why it only happens on 32-bit.

Please compress the binary before uploading (.zip or .tar.gz)

comment:11 by Felix Möller, 13 years ago

I have reported this issue in #9282

fm@thinkpad:~ $ LANG=C gcc --version
gcc (GCC) 4.6.0 20110603 (Red Hat 4.6.0-10)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
fm@thinkpad:~ $ uname -a
Linux thinkpad 2.6.38.8-35.fc15.i686.PAE #1 SMP Wed Jul 6 14:29:06 UTC 2011 i686 i686 i386 GNU/Linux

by Artem S. Tashkinov, 13 years ago

Attachment: vboxdrv.ko.gz added

vboxdrv.ko compiled with vanilla GCC 4.5.3

in reply to:  10 comment:12 by Artem S. Tashkinov, 13 years ago

Replying to ramshankar:

Could you provide us with the gcc version you're using to compile the vboxdrv sources as well as provide us with the vboxdrv.ko binary compiled with it?

GCC 4.5.3 vanilla, i.e. with no patches applied ( ftp://gcc.gnu.org/pub/gcc/releases/gcc-4.5.3/gcc-4.5.3.tar.bz2 ):

$ gcc -v 
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/gcc4/bin/../libexec/gcc/i686-pc-linux-gnu/4.5.3/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: /usr/src/gcc-4.5.3/configure --enable-shared --enable-threads=posix --disable-stage1-checking --with-system-zlib --enable-__cxa_atexit --enable-multilib --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --without-x --prefix=/opt/gcc4 --disable-libunwind-exceptions --with-gmp=/usr
Thread model: posix
gcc version 4.5.3 (GCC)


Our linux expert suggests this is a calling convention bug, so the gcc version and the vboxdrv.ko binary would help us in solving this issue. This also would explain why it only happens on 32-bit.

Please compress the binary before uploading (.zip or .tar.gz)

I have attached the required module.

comment:13 by Artem S. Tashkinov, 13 years ago

GCC locally uses these flags during compilation:

-DKERNEL -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m32 -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 -march=i686 -mtune=core2 -maccumulate-outgoing-args -Wa,-mtune=generic32 -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wframe-larger-than=1024 -fno-stack-protector -fno-omit-frame-pointer -fno-optimize-sibling-calls -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO

comment:14 by Ionut Biru, 13 years ago

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-pc-linux-gnu/4.6.1/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: /build/src/gcc-4.6.1/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --with-system-zlib --enable-cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --enable-gnu-unique-object --enable-linker-build-id --with-ppl --enable-cloog-backend=isl --enable-lto --enable-gold --enable-ld=default --enable-plugin --with-plugin-ld=ld.gold --disable-multilib --disable-libstdcxx-pch --enable-checking=release
Thread model: posix
gcc version 4.6.1 (GCC)

by Ionut Biru, 13 years ago

Attachment: vboxdrv.ko_gcc4.6.1.tar.gz added

vboxdrv.ko compiled with gcc 4.6.1

by Artem S. Tashkinov, 13 years ago

Attachment: savetemps.7z added

GCC's preprocessed and assembler output

comment:15 by Michael Thayer, 13 years ago

We hope that the following patch may fix this issue, if anyone would like to give it a shot:

--- src/VBox/Runtime/r0drv/linux/mpnotification-r0drv-linux.c	(revision 73209)
+++ src/VBox/Runtime/r0drv/linux/mpnotification-r0drv-linux.c	(revision 73210)
@@ -77,7 +77,7 @@
  * @param pvUser2           The notification event.
  * @remarks This can be invoked in interrupt context.
  */
-static void rtMpNotificationLinuxOnCurrentCpu(RTCPUID idCpu, void *pvUser1, void *pvUser2)
+static DECLCALLBACK(void) rtMpNotificationLinuxOnCurrentCpu(RTCPUID idCpu, void *pvUser1, void *pvUser2)
 {
     unsigned long ulNativeEvent = *(unsigned long *)pvUser2;
     NOREF(pvUser1);
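
For readers wondering why a one-word change could matter: the 32-bit kernel build flags listed in comment 13 include -mregparm=3, which makes GCC pass the first three integer arguments in registers (EAX, EDX, ECX) instead of on the stack. If the code that invokes a callback through a function pointer assumes a different convention than the one the callback was compiled with, the callback reads its arguments from the wrong place, which would be consistent with idCpu arriving as garbage. The sketch below illustrates the general technique of pinning the convention on both the pointer type and the function. It is not the VirtualBox source: `MY_CALLBACK`, `worker_fn`, `run_on_cpu`, `on_current_cpu` and `dispatch_and_report` are hypothetical names, and the assumption that DECLCALLBACK expands to something like the attribute shown here on 32-bit x86 GCC is mine, not stated in this ticket.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a callback-convention macro. On 32-bit x86
 * with GCC it forces stack-based cdecl with no register parameters,
 * whatever -mregparm value the file is compiled with. Elsewhere it
 * expands to nothing because there is only one convention to choose.
 */
#if defined(__GNUC__) && defined(__i386__)
# define MY_CALLBACK __attribute__((cdecl, regparm(0)))
#else
# define MY_CALLBACK
#endif

typedef uint32_t cpu_id_t;

/* The caller's view: every worker must use the pinned convention. */
typedef void MY_CALLBACK worker_fn(cpu_id_t id_cpu, void *user1, void *user2);

/* The callback is declared with the same macro, so both sides agree. */
static void MY_CALLBACK on_current_cpu(cpu_id_t id_cpu, void *user1, void *user2)
{
    (void)user1;
    *(cpu_id_t *)user2 = id_cpu;   /* report which id we were handed */
}

/* Stand-in for a dispatcher that runs a worker for a given CPU id. */
static void run_on_cpu(cpu_id_t id_cpu, worker_fn *worker, void *user1, void *user2)
{
    worker(id_cpu, user1, user2);
}

/* Returns the id the callback actually received. */
cpu_id_t dispatch_and_report(cpu_id_t id_cpu)
{
    cpu_id_t seen = (cpu_id_t)-1;
    run_on_cpu(id_cpu, on_current_cpu, NULL, &seen);
    return seen;
}
```

With the convention stated on both the typedef and the definition, the compiler can also diagnose a callback that was declared without the macro on targets where the two conventions differ.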

in reply to:  15 comment:16 by Ionut Biru, 13 years ago

Replying to michael:

We hope that the following patch may fix this issue, if anyone would like to give it a shot:

patch works for me

comment:17 by Peter Pletcher, 13 years ago

michael, the DECLCALLBACK patch works for me as well.

Before patch, system would always panic on suspend when vboxdrv module loaded.

Using Fedora 15, kernel 2.6.38.8-35.fc15.i686.PAE, gcc-4.6.0, VirtualBox-4.1-4.1.0_73009_fedora15-1.i686 on a Lenovo T420s. Thanks!

in reply to:  15 comment:18 by Artem S. Tashkinov, 13 years ago

Replying to michael:

We hope that the following patch may fix this issue, if anyone would like to give it a shot:

This patch fixes the issue for me.

This bug report may now be closed as FIXED.

comment:19 by Eugene San, 13 years ago

Patch provided by michael also solves suspend/hibernate issues described in #9260.

Are there any plans for fixed packages?

comment:20 by Felix Möller, 13 years ago

#9260 , #9286 and #9282 should be marked as a duplicate.

comment:21 by Michael Thayer, 13 years ago

Summary: VBox modules randomly cause kernel panic on computer shutdown → VBox modules randomly cause kernel panic on computer shutdown -> fixed as of 28-Jul 2011

The patch above was committed on 28 July and will be contained in any future releases.

comment:22 by roxyland, 13 years ago

#9407 was marked a duplicate of this. But the symptoms described here are different ( happy to be corrected ) to that in #9407, which is about the host crashing when it's suspended. Shutdown goes through without any problem whatsoever. Regardless of whether a VM is running or not, the host crashes on suspend. Uninstall VirtualBox and suspend/resume work normally.

comment:23 by Frank Mehnert, 13 years ago

Did you check whether the fix from above (2011-07-28 21:15:00 by michael) helps?

comment:24 by roxyland, 13 years ago

Sorry, yes the fix above - (2011-07-28 21:15:00 by michael) seems to have fixed the problem. Thanks !!

comment:25 by Frank Mehnert, 13 years ago

Resolution: fixed
Status: new → closed

This is fixed in VBox 4.1.2.
