VirtualBox

Opened 8 years ago

Last modified 7 years ago

#15090 new defect

Linux host running "perf top" kills Virtualbox vm's (VERR_PGM_INVALID_CR3_ADDR)

Reported by: dimaqq Owned by:
Component: other Version: VirtualBox 5.0.14
Keywords: Cc:
Guest type: all Host type: Linux

Description (last modified by Frank Mehnert)

If I run Linux tool "perf top" on host and Virtualbox VM at the same time, VM crashes. It doesn't matter which is started first.

Host versions tested:

  • Linux 4.3.3-3-ARCH #1 SMP PREEMPT x86_64
  • perf version 4.4.gafd2ff
  • VirtualBox 5.0.14_OSE r105127

Guest versions tested:

  • Linux 4.4.0-4-ARCH #1 SMP PREEMPT x86_64
  • Linux 2.6.18-164el5PAE #1 SMP i686
  • Windows10 64-bit modern.ie without guest additions

Host dmesg:

[  313.192046] capability: warning: `VirtualBox' uses 32-bit capabilities (legacy support in use)
[  371.880286] SUPR0GipMap: fGetGipCpu=0x3
[  372.327223] vboxdrv: ffffffffa08b4020 VMMR0.r0
[  372.522422] vboxdrv: ffffffffa09af020 VBoxDDR0.r0
[  372.529397] vboxdrv: ffffffffa00c6020 VBoxDD2R0.r0
[  372.617800] VBoxNetFlt: attached to 'eth0' / 74:d0:2b:95:1b:92
[  372.624009] device eth0 entered promiscuous mode
[  374.602722] HPET: Using timer above configured range: 2
[  374.603501] HPET: Using timer above configured range: 2
[  374.603507] HPET: Using timer above configured range: 2
[  374.603511] HPET: Using timer above configured range: 2
[  374.603515] HPET: Using timer above configured range: 2
[  374.603519] HPET: Using timer above configured range: 2
[  374.603523] HPET: Using timer above configured range: 2
[  374.603527] HPET: Using timer above configured range: 2
[  374.603531] HPET: Using timer above configured range: 2
[  374.603535] HPET: Using timer above configured range: 2
[  374.603539] HPET: Using timer above configured range: 2
[  374.603543] HPET: Using timer above configured range: 2
[  389.380249] device eth0 left promiscuous mode
[  390.099756] vboxnetflt: 0 out of 4 packets were not sent (directed to host)

Attaching VBox.log's in a second.

Attachments (4)

VBox-guest-linux-32.log (205.2 KB ) - added by dimaqq 8 years ago.
VBox-guest-windows-64.log (192.6 KB ) - added by dimaqq 8 years ago.
VBox-guest-linux-64.log (113.9 KB ) - added by dimaqq 8 years ago.
15090-Logs.tar.gz (138.3 KB ) - added by rklehm 8 years ago.
VBox.log; VBoxSVC.log

Download all attachments as: .zip

Change History (17)

by dimaqq, 8 years ago

Attachment: VBox-guest-linux-32.log added

by dimaqq, 8 years ago

Attachment: VBox-guest-windows-64.log added

by dimaqq, 8 years ago

Attachment: VBox-guest-linux-64.log added

comment:1 by Frank Mehnert, 8 years ago

Description: modified (diff)
Summary: Linux host running "perf top" kills Virtualbox vm'sLinux host running "perf top" kills Virtualbox vm's (VERR_PGM_INVALID_CR3_ADDR)

comment:2 by Frank Mehnert, 8 years ago

Doing a first quick test I'm not able to reproduce this problem. Do your guest VMs crash immediately when you run perf top on the host?

comment:3 by Frank Mehnert, 8 years ago

We did further tests and so far we are not able to reproduce this problem.

comment:4 by rklehm, 8 years ago

Similar issue here. Running several VMs in parallel they are all killed unmotivated. The issue is not always reproducible and seems to be related to special unknown network conditions. When occurred I find these log information:

Host syslog:

Apr 22 09:01:41 kernel: VBoxNetFlt: attached to 'br0' 
...
Apr 22 09:05:03 avahi-daemon: server.c: Packet too short or invalid while reading response record. (Maybe a UTF-8 problem?)
...
Apr 22 09:07:24 kernel: device br0 left promiscuous mode
Apr 22 09:07:24 kernel: vboxnetflt: 0 out of 4324 packets were not sent (directed to host)

This VM VBoxManage command seems to cause the crash:

Fr 22. Apr 09:03:28 CEST 2016
+ VBoxManage guestcontrol QA-k-W7 run --no-wait-stdout --no-wait-stderr --username xxx --password xxx-- 'C:\Windows\system32\cmd.exe' /c 'D: & cd D: & C:\xxx.exe '
VBoxManage: error: Code NS_ERROR_ABORT (0x80004004) - Operation aborted (extended info not available)
VBoxManage: error: Context: "WaitForArray(ComSafeArrayAsInParam(aWaitFlags), RT_MIN(500 , RT_MAX(cMsTimeLeft, 1 )), &waitResult)" at line 1549 of file VBoxManageGuestCtrl.cpp
VBoxManage: error: Code NS_ERROR_ABORT (0x80004004) - Operation aborted (extended info not available)
VBoxManage: error: Context: "Close()" at line 1067 of file VBoxManageGuestCtrl.cpp
+ exit code is 20 

This output is observable at two VMs running the command in simultanously.

Next VBoxManage commands are ending up with this output:

VBoxManage: error: Machine "QA-j-W7" is not running (currently aborted)!

Host

  • openSUSE 13.2
  • Linux 3.16.7-7-desktop #1 SMP PREEMPT Wed Dec 17 18:00:44 UTC 2014 (762f27a) x86_64 x86_64 x86_64 GNU/Linux
  • VirtualBox 5.0.8r103449 (issue observed also with VirtualBox-4.3-4.3.34_104062_openSUSE123-1.x86_64)
Last edited 8 years ago by Frank Mehnert (previous) (diff)

comment:5 by Frank Mehnert, 8 years ago

rklehm, please attach a VBox.log file of such a failed session.

by rklehm, 8 years ago

Attachment: 15090-Logs.tar.gz added

VBox.log; VBoxSVC.log

comment:6 by rklehm, 8 years ago

frank, sorry for the late answer. The failure does not occur often, but today again. I attached logs from two affected VMs. Additionally you find Logs from today 09:19 where the failure can not be reproduced for unknown reason.

As a try I already set the "promiscuous mode" option to "all VMs and host" at all VMs on this host since 2016-04-22. But as I have to accept, this does not help.

May be worth to know, that I run another VM at another user at the same host. Thus two VBoxSVCs are running here.

comment:7 by rklehm, 8 years ago

After repeated occurrences I bound all VMs to the second NIC. Some of them I found bound to br0.

Now the messages like "vboxnetflt: 0 out of xxx packets were not sent" are gone.

But last night several runs are aborted and I always see the following message with exit code 20.

VBoxManage: error: Code NS_ERROR_ABORT (0x80004004) - Operation aborted (extended info not available)
VBoxManage: error: Context: "WaitForArray(ComSafeArrayAsInParam(aWaitFlags), RT_MIN(500 , RT_MAX(cMsTimeLeft, 1 )), &waitResult)" at line 1549 of file VBoxManageGuestCtrl.cpp
VBoxManage: error: Code NS_ERROR_ABORT (0x80004004) - Operation aborted (extended info not available)
VBoxManage: error: Context: "Close()" at line 1067 of file VBoxManageGuestCtrl.cpp

All running VMs are then aborted.

However sometimes they are doing well again. Is there any idea how to get closer to this issue?

comment:8 by Frank Mehnert, 8 years ago

I did some more tests and it seems that a fairly new Linux kernel is required. I was not able to reproduce the problem with Ubuntu 15.10 / Linux 4.2 but I could reproduce it with Ubuntu 16.04 / Linux 4.4. FWIW, reproduced the problem on Intel Haswell, Sandy Bridge and Nehalem. This could have something to do with triggering NMI interrupts.

comment:9 by rklehm, 8 years ago

I found some NMI related messages at syslog. To avoid a hardware related issue I changed it and run the same software (comment 5) now on Intel Haswell-E DMI2, 32x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, MemTotal: 132185496 kB, Ethernet controller: Intel I350 Gigabit.

After some successful daily runs at night all runs failed again with messages like at comment 8.

comment:10 by Frank Mehnert, 8 years ago

We did more investigations. It seems that all Intel processors are affected. For me this happens only if I use Linux 4.4 and the corresponding Perf tool 4.4 (or later). For instance, if I use Linux 4.4 and Perf 4.3 I don't see this problem.

It looks like the Intel DS Area is responsible for this behaivor. This area is used to store information about code execution, for example branch recording. So we have an idea and working on a fix.

comment:11 by rklehm, 8 years ago

Just to let you know, I upgraded to VirtualBox-5.0-5.0.20_106931_openSUSE132-1.x86_64. The issue is still reproducible:

VBoxManage guestcontrol ... run --no-wait-stdout ...
VBoxManage: error: Code NS_ERROR_ABORT (0x80004004) - Operation aborted (extended info not available)
VBoxManage: error: Context: "WaitForArray(ComSafeArrayAsInParam(aWaitFlags), RT_MIN(500 , RT_MAX(cMsTimeLeft, 1 )), &waitResult)" at line 1551 of file VBoxManageGuestCtrl.cpp
VBoxManage: error: Code NS_ERROR_ABORT (0x80004004) - Operation aborted (extended info not available)
VBoxManage: error: Context: "Close()" at line 1069 of file VBoxManageGuestCtrl.cpp
exit code is 20 and in much cases it returns with 1

I wonder why this environment works unchanged at day time without issues, but every night from 11pm to 05am the runs are affected by this problem. Is there any information I have to consider additionally? Is there any workaround option?

comment:12 by coyotte, 8 years ago

I have the same issues

device enp1s0 left promiscuous mode
vboxnetflt: 81340 out of 252005 packets were not sent (directed to host)

At this time period, I'm doing VM Backup and running stop / clone / start

04:44:33 = start of a VM --> All other running VM Crashes

Host : Linux version 4.4.0-34-generic (buildd@lgw01-20) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2.1) ) #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016

VirtualBox : 5.0.24_Ubuntur108355

Aug 12 04:31:22 coyotte kernel: [125987.117236] intel_powerclamp: Start idle injection to reduce power
Aug 12 04:31:26 coyotte kernel: [125991.137399] intel_powerclamp: Stop forced idle injection
Aug 12 04:44:33 coyotte kernel: [126778.233704] device enp1s0 left promiscuous mode
Aug 12 04:44:33 coyotte kernel: [126778.379954] vboxnetflt: 81340 out of 252005 packets were not sent (directed to host)
Aug 12 04:45:36 coyotte kernel: [126841.862203] vboxdrv: ffffffffc095f020 VMMR0.r0
Aug 12 04:45:37 coyotte kernel: [126841.999067] vboxdrv: ffffffffc0740020 VBoxDDR0.r0
Aug 12 04:45:37 coyotte kernel: [126842.000476] vboxdrv: ffffffffc00db020 VBoxDD2R0.r0
Aug 12 04:45:37 coyotte kernel: [126842.028914] VBoxNetFlt: attached to 'enp1s0' / 8c:89:a5:34:b3:2d
Aug 12 04:45:37 coyotte kernel: [126842.030035] vboxdrv: ffffffffc00d3020 VBoxEhciR0.r0
Aug 12 04:45:40 coyotte kernel: [126845.076460] device enp1s0 entered promiscuous mode
Aug 12 04:51:31 coyotte kernel: [127196.033875] HPET: Using timer above configured range: 3
Aug 12 04:51:31 coyotte kernel: [127196.033883] HPET: Using timer above configured range: 3
Aug 12 04:51:31 coyotte kernel: [127196.369009] HPET: Using timer above configured range: 3
Aug 12 04:56:52 coyotte kernel: [127517.327395] device enp1s0 left promiscuous mode
Aug 12 04:56:52 coyotte kernel: [127517.514947] vboxnetflt: 0 out of 29 packets were not sent (directed to host)

comment:13 by rklehm, 7 years ago

Follow-up and workaround

As this issue is still not solved running:

  • VirtualBox 5.1.14r112924 on
  • openSUSE 13.2 "Harlequin" Linux r08792 3.16.7-7-desktop x86_64 GNU/Linux

I like to add our workaround that sounds funny, but we can reproduce that all VMs running on the same user are aborted:

VBoxManage: error: Machine is not running (currently aborted)!

At this time there are 4 VBoxHeadless VMs running simultanously and at each VM commands like

VBoxManage guestcontrol <VM> run --no-wait-stdout --no-wait-stderr --username <user> --password xxx -- 'C:\Windows\system32\cmd.exe' /c ...

are to be executed. The VMs are working fine only when our workaround is running. To ensure that there is always at least one VirtualBox running we start this:

Xvfb -screen 0 1920x1080x24 -fbdir ~/log/tmp/Xvfb6000 :6000 &
DISPLAY=:6000 VirtualBox > 2&>1 /dev/null &
# xwud -in ~/log/tmp/Xvfb6000/Xvfb_screen0 # to check that the workaround is running

If no VirtualBox process is running at the user executing the VMs we can reproducible see that all VMs are aborted.

I hope this gives a glue to a fix for this issue.

Note: See TracTickets for help on using tickets.

© 2023 Oracle
ContactPrivacy policyTerms of Use