VirtualBox

Ticket #9305 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

VBox modules randomly cause kernel panic on computer shutdown -> fixed as of 28-Jul 2011

Reported by: birdie Owned by:
Priority: critical Component: other
Version: VirtualBox 4.1.0 Keywords:
Cc: Guest type: other
Host type: Linux

Description

!!Assertion Failed!!
Expression idCpu == RTMpCpuId()
Location   :  /tmp/vbox.0/r0drv/linux/mpnotification-r0drv-linux.c(85) rtMpNotificationLinuxOnCurrentCpu
int3: 0000 [#1] PREEMPT SMP

The rest of it is in the attached screenshot.

Attachments

rec.jpeg Download (248.0 KB) - added by birdie 3 years ago.
A kernel panic screenshot
patched.jpeg Download (215.2 KB) - added by birdie 3 years ago.
Panic with patched sources
.config Download (63.7 KB) - added by birdie 3 years ago.
My 3.0 .configuration
vboxdrv.ko.gz Download (93.9 KB) - added by birdie 3 years ago.
vboxdrv.ko compiled with vanilla GCC 4.5.3
vboxdrv.ko_gcc4.6.1.tar.gz Download (98.1 KB) - added by wonder 3 years ago.
vboxdrv.ko compiled with gcc 4.6.1
savetemps.7z Download (359.7 KB) - added by birdie 3 years ago.
GCC's preprocessed and assembler output

Change History

Changed 3 years ago by birdie

A kernel panic screenshot

comment:1 Changed 3 years ago by birdie

!!Assertion Failed!!
Expression idCpu == RTMpCpuId()
Location   :  /tmp/vbox.0/r0drv/linux/mpnotification-r0drv-linux.c(85) rtMpNotificationLinuxOnCurrentCpu
int3: 0000 [#1] PREEMPT SMP

The rest of it is in the attached screenshot.

I'm running a vanilla Linux 3.0 i686 kernel. I observed the same problem on Linux kernel 2.6.39. I don't remember experiencing it with VirtualBox 4.0.x, so this issue is probably new in VirtualBox 4.1.x.

comment:2 Changed 3 years ago by birdie

One thing I've forgotten to mention - it's the host OS which panics.

comment:3 follow-up: ↓ 4 Changed 3 years ago by fm

Is this maybe related to #9282?

comment:4 in reply to: ↑ 3 Changed 3 years ago by birdie

Replying to fm:

Is this maybe related to #9282?

I'm not sure they are related since the existing bug reports don't have a kernel backtrace attached - so it's really hard to judge.

comment:5 Changed 3 years ago by birdie

If anyone has the same problem, here's a temporary workaround (until the VBox developers identify and solve this issue). Put these lines into your halt/shutdown script just before the halt invocation:

rmmod `lsmod | grep ^vb | awk '{print $1}'` &> /dev/null
rmmod `lsmod | grep ^vb | awk '{print $1}'` &> /dev/null

comment:6 Changed 3 years ago by birdie

It's most likely a dupe of bug #9253 - but at least my bug report contains full panic information (I run framebuffer at 1600x1200).

comment:7 follow-up: ↓ 8 Changed 3 years ago by ramshankar

Many thanks for giving us the actual assertion. It seems our notification callback is not firing on the CPU we expect it to fire on. It works fine on my x64 2.6.38-8-generic kernel but I still can't find anything in our sources that restricts this to 32-bit only. Maybe 64-bit dual-core setups are just lucky to not hit the problem.

We noticed a slight difference in the linux sources in smp_processor_id() between 32 and 64-bit, but nothing really concrete to identify the real cause.

@birdie / anyone who can see the Assertion before the trace:

Could you try patching the sources and trying again to trigger the assertion? It would be good if we can get more information out of it.

Index: src/VBox/Runtime/r0drv/linux/mpnotification-r0drv-linux.c
===================================================================
--- src/VBox/Runtime/r0drv/linux/mpnotification-r0drv-linux.c	(revision 73165)
+++ src/VBox/Runtime/r0drv/linux/mpnotification-r0drv-linux.c	(revision 73166)
@@ -32,6 +32,7 @@
 #include "internal/iprt.h"
 
 #include <iprt/mp.h>
+#include <iprt/asm-amd64-x86.h>
 #include <iprt/err.h>
 #include <iprt/cpuset.h>
 #include <iprt/thread.h>
@@ -82,7 +83,8 @@
     NOREF(pvUser1);
 
     AssertRelease(!RTThreadPreemptIsEnabled(NIL_RTTHREAD));
-    AssertRelease(idCpu == RTMpCpuId());   /* ASSUMES iCpu == RTCPUID */
+    AssertReleaseMsg(idCpu == RTMpCpuId(),  /* ASSUMES iCpu == RTCPUID */
+                     ("idCpu=%u RTMpCpuId=%d ApicId=%d\n", idCpu, RTMpCpuId(), ASMGetApicId() ));
 
     switch (ulNativeEvent)
     {

comment:8 in reply to: ↑ 7 Changed 3 years ago by birdie

Replying to ramshankar:

I've applied the patch and I will post the results as soon as I hit this problem again.

Changed 3 years ago by birdie

Panic with patched sources

Changed 3 years ago by birdie

My 3.0 .configuration

comment:9 follow-up: ↓ 10 Changed 3 years ago by birdie

In fact the host crashes every time if I run any VM - so it must be easily reproducible.

I have a quad core CPU, 4GB of RAM and I run PAE enabled kernel in x86 mode.

comment:10 in reply to: ↑ 9 ; follow-up: ↓ 12 Changed 3 years ago by ramshankar

Replying to birdie:

In fact the host crashes every time if I run any VM - so it must be easily reproducible.

I have a quad core CPU, 4GB of RAM and I run PAE enabled kernel in x86 mode.

Thanks for the revised assertion!

Could you provide us with the gcc version you're using to compile the vboxdrv sources, as well as the vboxdrv.ko binary compiled with it?

Our linux expert suggests this is a calling convention bug, so the gcc version and the vboxdrv.ko binary would help us in solving this issue. This also would explain why it only happens on 32-bit.

Please compress the binary before uploading (.zip or .tar.gz)

comment:11 Changed 3 years ago by fm

I have reported this issue in #9282

fm@thinkpad:~ $ LANG=C gcc --version
gcc (GCC) 4.6.0 20110603 (Red Hat 4.6.0-10)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
fm@thinkpad:~ $ uname -a
Linux thinkpad 2.6.38.8-35.fc15.i686.PAE #1 SMP Wed Jul 6 14:29:06 UTC 2011 i686 i686 i386 GNU/Linux

Changed 3 years ago by birdie

vboxdrv.ko compiled with vanilla GCC 4.5.3

comment:12 in reply to: ↑ 10 Changed 3 years ago by birdie

Replying to ramshankar:

Could you provide us with the gcc version you're using to compile the vboxdrv sources, as well as the vboxdrv.ko binary compiled with it?

GCC 4.5.3 vanilla, i.e. with no patches applied (ftp://gcc.gnu.org/pub/gcc/releases/gcc-4.5.3/gcc-4.5.3.tar.bz2):

$ gcc -v 
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/gcc4/bin/../libexec/gcc/i686-pc-linux-gnu/4.5.3/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: /usr/src/gcc-4.5.3/configure --enable-shared --enable-threads=posix --disable-stage1-checking --with-system-zlib --enable-__cxa_atexit --enable-multilib --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --without-x --prefix=/opt/gcc4 --disable-libunwind-exceptions --with-gmp=/usr
Thread model: posix
gcc version 4.5.3 (GCC)


Our linux expert suggests this is a calling convention bug, so the gcc version and the vboxdrv.ko binary would help us in solving this issue. This also would explain why it only happens on 32-bit.

Please compress the binary before uploading (.zip or .tar.gz)

I have attached the required module.

comment:13 Changed 3 years ago by birdie

GCC locally uses these flags during compilation:

-DKERNEL -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m32 -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 -march=i686 -mtune=core2 -maccumulate-outgoing-args -Wa,-mtune=generic32 -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wframe-larger-than=1024 -fno-stack-protector -fno-omit-frame-pointer -fno-optimize-sibling-calls -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO

comment:14 Changed 3 years ago by wonder

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-pc-linux-gnu/4.6.1/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: /build/src/gcc-4.6.1/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --enable-gnu-unique-object --enable-linker-build-id --with-ppl --enable-cloog-backend=isl --enable-lto --enable-gold --enable-ld=default --enable-plugin --with-plugin-ld=ld.gold --disable-multilib --disable-libstdcxx-pch --enable-checking=release
Thread model: posix
gcc version 4.6.1 (GCC)

Changed 3 years ago by wonder

vboxdrv.ko compiled with gcc 4.6.1

Changed 3 years ago by birdie

GCC's preprocessed and assembler output

comment:15 follow-ups: ↓ 16 ↓ 18 Changed 3 years ago by michael

We hope that the following patch may fix this issue, if anyone would like to give it a shot:

--- src/VBox/Runtime/r0drv/linux/mpnotification-r0drv-linux.c	(revision 73209)
+++ src/VBox/Runtime/r0drv/linux/mpnotification-r0drv-linux.c	(revision 73210)
@@ -77,7 +77,7 @@
  * @param pvUser2           The notification event.
  * @remarks This can be invoked in interrupt context.
  */
-static void rtMpNotificationLinuxOnCurrentCpu(RTCPUID idCpu, void *pvUser1, void *pvUser2)
+static DECLCALLBACK(void) rtMpNotificationLinuxOnCurrentCpu(RTCPUID idCpu, void *pvUser1, void *pvUser2)
 {
     unsigned long ulNativeEvent = *(unsigned long *)pvUser2;
     NOREF(pvUser1);
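
A minimal sketch of why the one-line change above can matter. It assumes (this is not confirmed in the thread) that DECLCALLBACK expands on 32-bit x86 to a fixed, stack-based convention such as cdecl with regparm(0). With the kernel's -mregparm=3 flag listed in comment 13, an unannotated function expects its first three arguments in registers, so a caller going through a pointer type that uses the stack-based convention would hand it a garbage idCpu. All SKETCH* and sketch* names below are hypothetical stand-ins, not IPRT identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DECLCALLBACK (assumption): on 32-bit x86 with
 * GCC or Clang it pins cdecl with all arguments on the stack; on any other
 * target it expands to the plain return type. */
#if defined(__i386__) && (defined(__GNUC__) || defined(__clang__))
# define SKETCH_CALLBACK(type) type __attribute__((cdecl, regparm(0)))
#else
# define SKETCH_CALLBACK(type) type
#endif

typedef unsigned int SKETCHCPUID;

/* Function type the dispatcher calls through. It carries the convention. */
typedef SKETCH_CALLBACK(void) FNSKETCHWORKER(SKETCHCPUID idCpu, void *pvUser1, void *pvUser2);

/* Record what the callback saw so that a caller can check it afterwards. */
static SKETCHCPUID   g_idCpuSeen;
static unsigned long g_ulEventSeen;

/* The callback is defined with the same macro as the pointer type, so the
 * caller and the callee agree on where the arguments live. Dropping the
 * macro here while building with -mregparm=3 would recreate the mismatch. */
static SKETCH_CALLBACK(void) sketchOnCurrentCpu(SKETCHCPUID idCpu, void *pvUser1, void *pvUser2)
{
    (void)pvUser1;
    assert(pvUser2 != NULL);
    g_idCpuSeen   = idCpu;
    g_ulEventSeen = *(unsigned long *)pvUser2;
}

/* Hypothetical dispatcher: invokes the worker through the typed pointer. */
static void sketchDispatch(FNSKETCHWORKER *pfnWorker, SKETCHCPUID idCpu, unsigned long ulEvent)
{
    pfnWorker(idCpu, NULL, &ulEvent);
}
```

Calling sketchDispatch(sketchOnCurrentCpu, 2, 7) leaves g_idCpuSeen at 2 and g_ulEventSeen at 7 because both sides use the same convention. On a 64-bit build the macro is a no-op, which fits the observation earlier in the ticket that only 32-bit hosts were affected.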

comment:16 in reply to: ↑ 15 Changed 3 years ago by wonder

Replying to michael:

We hope that the following patch may fix this issue, if anyone would like to give it a shot:

patch works for me

comment:17 Changed 3 years ago by peterp

michael, the DECLCALLBACK patch works for me as well.

Before the patch, the system would always panic on suspend when the vboxdrv module was loaded.

Using Fedora 15, kernel 2.6.38.8-35.fc15.i686.PAE, gcc-4.6.0, VirtualBox-4.1-4.1.0_73009_fedora15-1.i686 on a Lenovo T420s. Thanks!

comment:18 in reply to: ↑ 15 Changed 3 years ago by birdie

Replying to michael:

We hope that the following patch may fix this issue, if anyone would like to give it a shot:

This patch fixes the issue for me.

This bug report may now be closed as FIXED.

comment:19 Changed 3 years ago by eugenesan

Patch provided by michael also solves suspend/hibernate issues described in  #9260.

Are there any plans for fixed packages?

comment:20 Changed 3 years ago by fm

#9260, #9286 and #9282 should be marked as duplicates.

comment:21 Changed 3 years ago by michael

  • Summary changed from VBox modules randomly cause kernel panic on computer shutdown to VBox modules randomly cause kernel panic on computer shutdown -> fixed as of 28-Jul 2011

The patch above was committed on 28 July and will be contained in any future releases.

comment:22 Changed 3 years ago by roxyland

#9407 was marked as a duplicate of this. But the symptoms described here are different (happy to be corrected) from those in #9407, which is about the host crashing when it's suspended. Shutdown goes through without any problem whatsoever. Regardless of whether a VM is running or not, the host crashes on suspend. Uninstall VirtualBox and suspend/resume work normally.

comment:23 Changed 3 years ago by frank

Did you try if the fix from above (2011-07-28 21:15:00 by michael) helps?

comment:24 Changed 3 years ago by roxyland

Sorry, yes the fix above - (2011-07-28 21:15:00 by michael) seems to have fixed the problem. Thanks !!

comment:25 Changed 3 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

This is fixed in VBox 4.1.2.
