VirtualBox

Ticket #11610 (closed defect: fixed)

Opened 13 months ago

Last modified 3 months ago

BUG: unable to handle kernel paging request

Reported by: csreynolds
Owned by:
Priority: major
Component: other
Version: VirtualBox 4.2.10
Keywords: kernel
Cc:
Guest type: all
Host type: Linux

Description

VM will start up, function for a random amount of time and then freeze. Has to be killed from command line.

Attachments

messages Download (7.2 KB) - added by csreynolds 13 months ago.
VBox.log Download (62.6 KB) - added by csreynolds 13 months ago.
virtualbox-4.2.10-2-linux-3.8.5-1-oops.log Download (4.8 KB) - added by sl4mmy 13 months ago.
VBox.log.1 Download (62.2 KB) - added by sl4mmy 13 months ago.
This log is from when I couldn't boot a Windows virtual machine after upgrading my ArchLinux system to virtualbox-4.2.10-2 and linux-3.8.5-1.
VBox.2.log Download (56.3 KB) - added by sl4mmy 13 months ago.
This log is from successfully booting the same Windows virtual machine after downgrading back to virtualbox-4.2.8-1 and linux-3.7.10-1
fedora-18-oops.txt Download (2.8 KB) - added by rsalmon 13 months ago.
kernel oops starting a VM on a up-to-date fedora 18 Linux bureau 3.8.5-201.fc18.i686.PAE #1 SMP Thu Mar 28 21:50:08 UTC 2013 i686 i686 i386 GNU/Linux VirtualBox-4.2-4.2.10_84104_fedora18-1.i686
fc-18-oops.txt Download (4.7 KB) - added by csreynolds 12 months ago.
still happening with kernel 3.8.6-203.fc18.x86_64 and VirtualBox 4.2.12 I have tried re-installing header/devel rpms and re-running vboxdrv setup to see if it cleared up like frank, same issue.
cpuinfo Download (13.8 KB) - added by timemaster 11 months ago.
cpuinfo of affected system
cpuinfo.2 Download (20.8 KB) - added by sl4mmy 11 months ago.
Output of /proc/cpuinfo from my host machine (Archlinux, kernel v3.9.2, VirtualBox 4.2.12)
kernel_vboxissue.log Download (3.4 KB) - added by dboy 11 months ago.
Kernel Oops, VirtualBox 4.2.12
virtualbox-4.2.12-3-linux-3.9.3-1-oops.log Download (5.3 KB) - added by sl4mmy 11 months ago.
Here is another kernel log from my system running Linux 3.9.3 and VirtualBox 4.2.12.
VirtualBox-dies.log Download (84.1 KB) - added by rmflight 11 months ago.
rmflight dmesg output
oops.txt Download (4.8 KB) - added by p5n 11 months ago.
virtualbox-4.2.51-linux-3.9.3-oops.txt Download (90.8 KB) - added by sl4mmy 11 months ago.
Complete dmesg of kernel oops produced using test build 4.2.51.
host_uname Download (111 bytes) - added by wenns 11 months ago.
host_dmesg Download (93.5 KB) - added by wenns 11 months ago.
host_cpuinfo Download (20.8 KB) - added by wenns 11 months ago.
host_lsmod Download (2.2 KB) - added by wenns 11 months ago.
host_meminfo Download (1.2 KB) - added by wenns 11 months ago.
host_vb_version Download (1.9 KB) - added by wenns 11 months ago.
vb_crash_dataset.tar.gz Download (29.9 KB) - added by wenns 11 months ago.
virtualbox-4.2.51-linux-3.9.4-oops.txt Download (91.9 KB) - added by sl4mmy 11 months ago.
Full system log of crash with VirtualBox-4.2.51-85953 and Linux 3.9.4
VBoxSVC.log Download (2.2 KB) - added by sl4mmy 11 months ago.
VirtualBox service log from crash with VirtualBox 4.2.51-85953 and Linux 3.9.4
sl4mmy-virtualbox-4.2.51-linux-3.9.4-vbox.log Download (51.0 KB) - added by sl4mmy 11 months ago.
VBox.log from crash with VirtualBox 4.2.51-85953 and Linux 3.9.4

Change History

Changed 13 months ago by csreynolds

Changed 13 months ago by csreynolds

comment:1 Changed 13 months ago by csreynolds

This problem started after I upgraded to kernel 3.8.x. 3.7.x functions properly.

comment:2 Changed 13 months ago by csreynolds

I can boot into an older kernel and I have no problems. Is there a way I can get more detailed info on why the crash is happening? I'd like to help resolve this issue if I can.

[creynolds@localhost trunk]$ uname -a
Linux localhost.localdomain 3.6.10-4.fc18.x86_64 #1 SMP Tue Dec 11 18:01:27 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

The kernel listed above shows no problems at all. 3.7 also worked before I upgraded to 3.8.

comment:3 Changed 13 months ago by srondeau

I had a similar experience with upgrading Fedora 17 from 3.6.10 to 3.8.3 -- VirtualBox 4.2.10 issued "unable to handle kernel paging" while writing to the virtual disk (was installing Fedora 17 Live CD to hard drive). So I tried VirtualBox 4.1.24 -- same problem. I changed to another computer with 3.8.3 kernel -- same problem. Reverted kernel to 3.6.10, and no problems were encountered.

Reported problem to Red Hat (Bug 929339), who said it was VirtualBox's problem.

comment:4 Changed 13 months ago by sl4mmy

I encountered the same problem today on ArchLinux. I was running virtualbox-4.2.10-2 and linux-3.8.5-1. I had to downgrade back to virtualbox-4.2.8-1 and linux-3.7.10-1 in order to use my virtual machines again. I will upload the relevant snippet from /var/log/messages.log.

Changed 13 months ago by sl4mmy

Changed 13 months ago by sl4mmy

This log is from when I couldn't boot a Windows virtual machine after upgrading my ArchLinux system to virtualbox-4.2.10-2 and linux-3.8.5-1.

Changed 13 months ago by sl4mmy

This log is from successfully booting the same Windows virtual machine after downgrading back to virtualbox-4.2.8-1 and linux-3.7.10-1

Changed 13 months ago by rsalmon

kernel oops starting a VM on a up-to-date fedora 18 Linux bureau 3.8.5-201.fc18.i686.PAE #1 SMP Thu Mar 28 21:50:08 UTC 2013 i686 i686 i386 GNU/Linux VirtualBox-4.2-4.2.10_84104_fedora18-1.i686

comment:5 follow-up: ↓ 6 Changed 13 months ago by rsalmon

actually, as I read again the description of this ticket, I realized that my issue is probably not the same as this one since VMs don't start. I get a kernel Oops trying to start them.

comment:6 in reply to: ↑ 5 Changed 13 months ago by sl4mmy

Replying to rsalmon:

actually, as I read again the description of this ticket, I realized that my issue is probably not the same as this one since VMs don't start. I get a kernel Oops trying to start them.

Actually, the same is true in my case as well. The VM starts but the Oops happens at some random point while the guest is booting. I tried with both Windows XP and RHEL 6.3 guests. None ever booted into a usable state before the Oops occurred.

comment:7 follow-ups: ↓ 8 ↓ 9 Changed 13 months ago by frank

Trying to find a pattern. It seems that Fedora 18 hosts are affected with Linux 3.8. I have a 64-bit Fedora 18 system running with a Linux kernel 3.8.5-201 installed. I have no problems starting 64-bit guests (e.g. Debian 6.0) or 32-bit guests (e.g. Windows XP).

You don't run KVM in parallel by any chance?

comment:8 in reply to: ↑ 7 Changed 13 months ago by srondeau

Replying to frank:

Trying to find a pattern. It seems that Fedora 18 hosts are affected with Linux 3.8. I have a 64-bit Fedora 18 system running with a Linux kernel 3.8.5-201 installed. I have no problems starting 64-bit guests (e.g. Debian 6.0) or 32-bit guests (e.g. Windows XP).

You don't run KVM in parallel by any chance?

In my case, no.

The host was running Fedora 17 x86_64 3.8.3-103 (10GB RAM).

I had created a Fedora VM with 2GB RAM, a 15GB virtual disk and configured networking to be bridged to em2. I connected the Fedora 17 Live CD (x86) .iso file, and clicked on "install to hard drive". It was during the last step -- the installation of packages to the virtual disk -- that I would encounter the kernel paging error. The point at which it was encountered varied -- one time it was fairly early in the package installation process, while at another time it was near the end.

I don't believe there were any other VMs active at the time.

When I changed the host's kernel back to 3.6.10, I didn't encounter any problems.

I have many other hosts running 3.8.3-103, but with an existing Windows 7 VM, and I haven't seen any problems running them. It seemed to be tied to writing a lot to the virtual disk.

comment:9 in reply to: ↑ 7 Changed 13 months ago by rsalmon

Replying to frank:

Trying to find a pattern. It seems that Fedora 18 hosts are affected with Linux 3.8. I have a 64-bit Fedora 18 system running with a Linux kernel 3.8.5-201 installed. I have no problems starting 64-bit guests (e.g. Debian 6.0) or 32-bit guests (e.g. Windows XP).

You don't run KVM in parallel by any chance?

I don't run KVM. I'm not sure what I did, but I no longer get a kernel Oops when starting a VM. I forced a reinstall of the kernel and the kernel's header/devel files, then reran vboxdrv setup. Maybe I had a problem with the devel files.

The kernel is 3.8.5-201.fc18.i686.PAE and I was able to start a 32-bit Debian guest.

Changed 12 months ago by csreynolds

Still happening with kernel 3.8.6-203.fc18.x86_64 and VirtualBox 4.2.12. I have tried reinstalling the header/devel RPMs and re-running vboxdrv setup to see if it cleared up like frank; same issue.

comment:10 Changed 12 months ago by sl4mmy

I noticed at least one difference in the /var/log/messages between the last version of VirtualBox that worked on my machine and all of the versions that failed: just before the kernel Oops message there is a line logging that a network device entered promiscuous mode.

In the working versions of VirtualBox the device is vboxnet0 or vboxnet1, but in the versions that don't work the device is eth0. You can see an example of this at line 1 in the log snippet I originally posted: https://www.virtualbox.org/attachment/ticket/11610/virtualbox-4.2.10-2-linux-3.8.5-1-oops.log#L1

The same can be seen here in the original attachment posted by csreynolds: https://www.virtualbox.org/attachment/ticket/11610/messages#L17

Unfortunately, the other snippets of /var/log/messages posted in this thread trimmed the "device XYZ entered promiscuous mode" lines.

I wonder if this is consistent for others experiencing this issue. Do the versions of VirtualBox that fail always log the name of the physical interface before the kernel Oops, and the versions of VirtualBox that work fine always log the name of one of the vboxnet interfaces?

Does anyone know of any changes in VirtualBox 4.2.10+ or Linux 3.8+ that would affect which device the vboxdrv, vboxnetadp or vboxnetflt kernel modules try to switch into promiscuous mode?
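
The comparison asked for above can be scripted against a saved copy of /var/log/messages. This is only a minimal sketch: it assumes the log contains the kernel's "device <name> entered promiscuous mode" message and the "BUG: unable to handle kernel paging request" line from the ticket title. The function name is hypothetical and not part of VirtualBox.

```python
import re

# Kernel message logged when an interface is switched into promiscuous mode.
PROMISC_RE = re.compile(r"device (\S+) entered promiscuous mode")
# Marker taken from the ticket title.
OOPS_MARKER = "BUG: unable to handle kernel paging request"


def promisc_devices_before_oops(log_text):
    """Return (devices, oops_found).

    devices lists, in order, every interface that entered promiscuous
    mode before the first oops line; if no oops line is present, all
    matching interfaces in the log are returned and oops_found is False.
    """
    devices = []
    for line in log_text.splitlines():
        if OOPS_MARKER in line:
            return devices, True
        match = PROMISC_RE.search(line)
        if match:
            devices.append(match.group(1))
    return devices, False
```

Running it over logs from a working and a failing setup would show whether the physical interface (eth0) only ever appears in the failing case.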

comment:11 follow-up: ↓ 13 Changed 12 months ago by sergiomb

Hi, the kernel on the host or the kernel on the guest?

Changed 11 months ago by timemaster

cpuinfo of affected system

comment:12 follow-up: ↓ 14 Changed 11 months ago by timemaster

Hi all, I think I was able to narrow down the problem. I played with different configurations for many virtual machines and found some working configurations.

It all comes down to three settings:

  • System settings, "Acceleration" tab: nested paging (AMD-V) or EPT (Intel VT-x)
  • System settings, "Processor" tab: Enable PAE/NX
  • System settings, "Acceleration" tab: hardware virtualization (AMD-V / Intel VT-x), the first checkbox

Generally, the problem arises when you have multiple CPUs assigned to a guest and nested paging is activated.

I ran the tests with the System Rescue CD ISO file and an installed ChromeBook OS.

Working without problems: System Rescue CD and Chrome OS boot, run, and wait for user input at the prompt or interface.

  • PAE/NX on, 1 processor, VT-x off, nested paging off

System Rescue CD works as above; Chrome OS does not, because it needs a PAE kernel anyway.

  • PAE/NX off, 1 processor, VT-x off, nested paging off

Not working: System Rescue CD shows a fully working boot menu, but starting the default kernel hangs after the third line, "Probing EDD (edd=off to disable)... ok". Chrome OS fails silently.

  • PAE/NX on, 1 processor, VT-x on, nested paging off

System Rescue CD behaves the same as above. Chrome OS fails silently.

  • PAE/NX on, 1 processor, VT-x on, nested paging on

System Rescue CD behaves the same as above. Chrome OS fails silently.

  • PAE/NX on, 1 processor, VT-x on, nested paging on

System Rescue CD boots but fails before reaching the user login; repeated runs crash at different places. Chrome OS fails silently.

  • PAE/NX off, 2 processors, VT-x on, nested paging off

System Rescue CD boots but fails before reaching the user login; repeated runs crash at different places. Chrome OS fails silently.

  • PAE/NX on, 2 processors, VT-x on, nested paging on

I could not see any difference between NAT and bridged networking (bridged networking puts the interface into promiscuous mode).

Under Arch Linux, the affected kernel is tracked at https://bugs.archlinux.org/task/34399; the problem also occurs with 3.9.2-1-ARCH and virtualbox 4.2.12_OSE_r84980.

My cpuinfo is attached; I have an Intel Westmere CPU.
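
For anyone who wants to repeat this matrix without clicking through the GUI, the same settings can be applied with VBoxManage modifyvm. The sketch below only builds the command lines; it assumes the VM is powered off when they are run, and the VM name "rescue-test" and the helper name are hypothetical. The --pae, --cpus, --hwvirtex and --nestedpaging options are the command-line counterparts of the checkboxes named above.

```python
import shlex


def modifyvm_command(vm_name, pae, cpus, hwvirtex, nested_paging):
    """Build a VBoxManage argument list for one test configuration."""
    def on_off(flag):
        return "on" if flag else "off"

    return [
        "VBoxManage", "modifyvm", vm_name,
        "--pae", on_off(pae),
        "--cpus", str(cpus),
        "--hwvirtex", on_off(hwvirtex),
        "--nestedpaging", on_off(nested_paging),
    ]


# The first (working) and last (crashing) configurations from the list above.
# "rescue-test" is a placeholder VM name.
for config in [(True, 1, False, False), (True, 2, True, True)]:
    argv = modifyvm_command("rescue-test", *config)
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Each printed line can be pasted into a shell, or the list can be passed to subprocess.run once the VM name matches a real machine.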

comment:13 in reply to: ↑ 11 ; follow-up: ↓ 15 Changed 11 months ago by sl4mmy

Replying to sergiomb:

hi , the kernel on host or kernel on guest ?

On the host, but I believe that is a red herring. timemaster's workaround works for me, too.

comment:14 in reply to: ↑ 12 Changed 11 months ago by sl4mmy

Replying to timemaster:

Generally, the problem arise when you have activated hardware virtualization or have multiple cpu (which automatically activate it)

I was doing some tests with System Rescue CD iso file and an installed CromeBook OS

Disabling VT-x/AMD-V worked around the problem with my 32-bit Windows XP virtual machine, unfortunately that won't work for any 64-bit virtual machines. Disabling PAE/NX for a 64-bit vm seems to help it run a little longer before the kernel Oops occurs (for example, with PAE/NX enabled my 64-bit vm trips over consistently while the vm is booting, but with PAE/NX disabled it boots fine and is usable), but it does eventually happen every time for me.

This is with my host system running Linux 3.9.2 and VirtualBox 4.2.12. I'll post the output of /proc/cpuinfo as well.

Changed 11 months ago by sl4mmy

Output of /proc/cpuinfo from my host machine (Archlinux, kernel v3.9.2, VirtualBox 4.2.12)

comment:15 in reply to: ↑ 13 ; follow-up: ↓ 17 Changed 11 months ago by sergiomb

Replying to sl4mmy:

Replying to sergiomb:

hi , the kernel on host or kernel on guest ?

On the host, but I believe that is a red herring. timemaster's workaround works for me, too.

what or where is timemaster's workaround ?

comment:16 follow-up: ↓ 18 Changed 11 months ago by frank

We still cannot reproduce this problem. I would be interested to see more kernel logs after a VM crashed like described above.

Changed 11 months ago by dboy

Kernel Oops, VirtualBox 4.2.12

comment:17 in reply to: ↑ 15 Changed 11 months ago by sl4mmy

Replying to sergiomb:

what or where is timemaster's workaround ?

https://www.virtualbox.org/ticket/11610#comment:12

comment:18 in reply to: ↑ 16 ; follow-up: ↓ 19 Changed 11 months ago by sl4mmy

Hi, frank-

Replying to frank:

We still cannot reproduce this problem. I would be interested to see more kernel logs after a VM crashed like described above.

What kernel version are you running on the host? What kind of processor does the host have?

I exchanged some private emails with the maintainer of the Archlinux package. He says he can't reproduce the problem, either. I'm not sure what the common trigger is between all of the affected systems...

Changed 11 months ago by sl4mmy

Here is another kernel log from my system running Linux 3.9.3 and VirtualBox 4.2.12.

comment:19 in reply to: ↑ 18 ; follow-up: ↓ 25 Changed 11 months ago by quickbooks

@sl4mmy: Care to post the whole dmesg log instead of just the oops portion? And the number of guests running simultaneously when you got the oops?

Also you might want to see if you can reproduce this issue with the below test build (major rewrite of the VT-x code including many bug fixes and performance improvements)

 http://www.virtualbox.org/download/testcase/VirtualBox-4.2.51-85607-Linux_amd64.run

 http://www.virtualbox.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85607.vbox-extpack

comment:20 follow-up: ↓ 26 Changed 11 months ago by wenns

Hi all,

I'm experiencing the same behavior: VMs boot up and freeze later, which can be easily reproduced by writing large amounts of data to the virtual disk. In my case I do a big "svn co" and the hang usually happens after svn has written ~1 GB.

Interestingly enough, it happens only on my server hardware: an HP ProLiant with 64 GB of memory and 24 Intel Xeon cores. A quite similar setup (same VirtualBox version, same kernel, same VM) on my workstation (Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz) works fine.

I can provide more details if necessary.

Changed 11 months ago by rmflight

rmflight dmesg output

comment:21 Changed 11 months ago by wenns

I've done some testing and am beginning to see patterns. On our side, this issue arises under the following circumstances:

  • 64-bit guest (tested with Linux 3.8 and Windows 7)
  • 64-bit Linux host (Ubuntu Server in our case). Other host OSes not tested.
  • Intel Xeon hardware. Yes: it doesn't trigger on an Intel Core i7, with host OS, guest OS and virtualizer all being equal.

Switching the following settings on and off doesn't matter:

  • PAE/NX
  • Nested Paging
  • VT-x/AMD-V (This one cannot be switched off on 64-bit guests, of course)

Also, the behavior under the posted test build (4.2.51) is still the same.

Hope that helps.

comment:22 follow-ups: ↓ 24 ↓ 27 Changed 11 months ago by wenns

Oops. The above is not quite right: I just experienced the same issue with a 32-bit guest (Windows 7), hardware acceleration enabled. Will disable and recheck now.

Changed 11 months ago by p5n

comment:23 Changed 11 months ago by p5n

one more oops from kernel 3.9.3-1-ARCH and virtualbox 4.2.12-3

comment:24 in reply to: ↑ 22 ; follow-up: ↓ 31 Changed 11 months ago by sergiomb

Replying to wenns:

Oops. The above is not quite right: I just experienced the same issue with an 32 bit guest (Windows 7), hardware acceleration enabled. Will disable and recheck now.

Can you specify what you disabled? Graphics? FYI, on one machine with a Linux host I found that a Linux guest crashes and X won't start when it loads /usr/lib/modules/*/extra/VirtualBox/vboxvideo.ko. If I remove it before launching X, everything works; it just disables "seamless mode".

comment:25 in reply to: ↑ 19 ; follow-up: ↓ 30 Changed 11 months ago by sl4mmy

Hi, quickbooks-

Replying to quickbooks:

@sl4mmy: Care to post the whole dmesg log instead of just the oops portion? And the number of guests running simultaneously when you got the oops?

Also you might want to see if you can reproduce this issue with the below test build (major rewrite of the VT-x code including many bug fixes and performance improvements)

I can reliably reproduce the issue with only a single guest running. Also, the test build you linked to still suffers from the same problem. I will post a complete dmesg from my test using that test build (4.2.51).

Changed 11 months ago by sl4mmy

Complete dmesg of kernel oops produced using test build 4.2.51.

comment:26 in reply to: ↑ 20 Changed 11 months ago by sl4mmy

Hi, wenns-

Replying to wenns:

Interestingly enough, It happens only on my server hardware: HP Prolient with 64 GB Memory and 24 Intel Xeon Cores. Quite similar setup (same VirtualBox version, same kernel, save VM) on my workstation (Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz) works fine.

That is interesting. My desktop suffering from this problem has an Intel Xeon X5675 @ 3.07GHz.

comment:27 in reply to: ↑ 22 Changed 11 months ago by sl4mmy

Hi, wenns-

Replying to wenns:

Oops. The above is not quite right: I just experienced the same issue with an 32 bit guest (Windows 7), hardware acceleration enabled. Will disable and recheck now.

Did disabling hardware acceleration work around the problem with your 32-bit guests? With hardware acceleration enabled my 32-bit guests reliably trigger the kernel oops while they are booting, but with hardware acceleration disabled my 32-bit guests are usable (well... they're noticeably slower ;)). Again, this is with my desktop machine with a Xeon X5675 @ 3.07GHz.

comment:28 follow-up: ↓ 34 Changed 11 months ago by timemaster

So, to summarize the CPUs involved:

csreynolds: VBox.log shows a Xeon X5675 @ 3.07GHz
sl4mmy: VBox.log.1 and cpuinfo.2 show a Xeon X5675 @ 3.07GHz
timemaster (me): cpuinfo shows a Xeon E5620 @ 2.40GHz
wenns: says he is using a Xeon processor, and that his Core i7 machine does not have this problem
rmflight: VirtualBox-dies.log shows a Xeon X5650 @ 2.67GHz

p5n?
wenns? Details please.

That's a lot of Xeon processors in the ?56?? range, plus wenns says that i7 processors are not affected. Something worth checking.
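
To make CPU reports like the ones above easier to compare, the model string can be pulled out of a saved /proc/cpuinfo. A minimal sketch, assuming the usual x86 layout with one "model name : ..." line per logical CPU; the function name is hypothetical.

```python
from collections import Counter


def cpu_models(cpuinfo_text):
    """Count logical CPUs per model name in /proc/cpuinfo-style text."""
    models = Counter()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            # Collapse the padding the kernel puts inside the model string.
            models[" ".join(value.split())] += 1
    return dict(models)


if __name__ == "__main__":
    # Reads the local machine's CPU list; on a saved attachment, open that file instead.
    with open("/proc/cpuinfo") as handle:
        for model, count in cpu_models(handle.read()).items():
            print(f"{count} x {model}")
```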

comment:29 Changed 11 months ago by quickbooks

Not sure which guest caused this, as I had 3+ guests running: 1 linux, 2 windows. 1 was installing a new copy of Win 7 64bit.

I have an Intel i3-3225.

May 24 19:09:19 localhost kernel: [11647.766537] EMT-0: page allocation failure: order:9, mode:0x344d2
May 24 19:09:19 localhost kernel: [11647.766541] Pid: 5366, comm: EMT-0 Tainted: PF        C O 3.9.3-201.fc18.x86_64 #1
May 24 19:09:19 localhost kernel: [11647.766542] Call Trace:
May 24 19:09:19 localhost kernel: [11647.766547]  [<ffffffff81139509>] warn_alloc_failed+0xe9/0x150
May 24 19:09:19 localhost kernel: [11647.766551]  [<ffffffff81658ae4>] ? __alloc_pages_direct_compact+0x182/0x194
May 24 19:09:19 localhost kernel: [11647.766553]  [<ffffffff8113d806>] __alloc_pages_nodemask+0x856/0xae0
May 24 19:09:19 localhost kernel: [11647.766557]  [<ffffffff8117c0c8>] alloc_pages_current+0xb8/0x190
May 24 19:09:19 localhost kernel: [11647.766570]  [<ffffffffa02bbd60>] rtR0MemObjLinuxAllocPages+0xc0/0x260 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766577]  [<ffffffffa02bbf3a>] rtR0MemObjLinuxAllocPhysSub2+0x3a/0xe0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766583]  [<ffffffffa02bc0aa>] rtR0MemObjLinuxAllocPhysSub+0xca/0xd0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766589]  [<ffffffffa02bc479>] rtR0MemObjNativeAllocPhys+0x19/0x20 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766595]  [<ffffffffa02ba314>] VBoxHost_RTR0MemObjAllocPhysExTag+0x64/0xb0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766608]  [<ffffffffa02bb89d>] ? rtR0MemAllocEx+0x17d/0x250 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766613]  [<ffffffffa02bb89d>] ? rtR0MemAllocEx+0x17d/0x250 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766618]  [<ffffffffa02b2db4>] ? supdrvIOCtl+0x1664/0x2be0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766623]  [<ffffffffa02bb89d>] ? rtR0MemAllocEx+0x17d/0x250 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766628]  [<ffffffffa02ad47c>] ? VBoxDrvLinuxIOCtl_4_2_51+0x10c/0x1f0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766631]  [<ffffffff811b17e7>] ? do_vfs_ioctl+0x97/0x580
May 24 19:09:19 localhost kernel: [11647.766634]  [<ffffffff812a157a>] ? inode_has_perm.isra.32.constprop.62+0x2a/0x30
May 24 19:09:19 localhost kernel: [11647.766635]  [<ffffffff812a2c07>] ? file_has_perm+0x97/0xb0
May 24 19:09:19 localhost kernel: [11647.766637]  [<ffffffff811b1d61>] ? sys_ioctl+0x91/0xb0
May 24 19:09:19 localhost kernel: [11647.766640]  [<ffffffff81669f59>] ? system_call_fastpath+0x16/0x1b
May 24 19:09:19 localhost kernel: [11647.766641] Mem-Info:
May 24 19:09:19 localhost kernel: [11647.766642] Node 0 DMA per-cpu:
May 24 19:09:19 localhost kernel: [11647.766643] CPU    0: hi:    0, btch:   1 usd:   0
May 24 19:09:19 localhost kernel: [11647.766644] CPU    1: hi:    0, btch:   1 usd:   0
May 24 19:09:19 localhost kernel: [11647.766645] CPU    2: hi:    0, btch:   1 usd:   0
May 24 19:09:19 localhost kernel: [11647.766646] CPU    3: hi:    0, btch:   1 usd:   0
May 24 19:09:19 localhost kernel: [11647.766646] Node 0 DMA32 per-cpu:
May 24 19:09:19 localhost kernel: [11647.766648] CPU    0: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766648] CPU    1: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766649] CPU    2: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766650] CPU    3: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766650] Node 0 Normal per-cpu:
May 24 19:09:19 localhost kernel: [11647.766651] CPU    0: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766652] CPU    1: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766653] CPU    2: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766653] CPU    3: hi:  186, btch:  31 usd:   0
May 24 19:09:19 localhost kernel: [11647.766656] active_anon:194178 inactive_anon:4307 isolated_anon:0
May 24 19:09:19 localhost kernel: [11647.766656]  active_file:378144 inactive_file:835082 isolated_file:0
May 24 19:09:19 localhost kernel: [11647.766656]  unevictable:879 dirty:29 writeback:0 unstable:0
May 24 19:09:19 localhost kernel: [11647.766656]  free:58654 slab_reclaimable:32788 slab_unreclaimable:31894
May 24 19:09:19 localhost kernel: [11647.766656]  mapped:1056803 shmem:6914 pagetables:11294 bounce:0
May 24 19:09:19 localhost kernel: [11647.766656]  free_cma:0
May 24 19:09:19 localhost kernel: [11647.766658] Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
May 24 19:09:19 localhost kernel: [11647.766660] lowmem_reserve[]: 0 3436 15947 15947
May 24 19:09:19 localhost kernel: [11647.766662] Node 0 DMA32 free:64824kB min:14548kB low:18184kB high:21820kB active_anon:7812kB inactive_anon:0kB active_file:116kB inactive_file:96kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3631648kB managed:3518864kB mlocked:0kB dirty:0kB writeback:0kB mapped:8356kB shmem:4kB slab_reclaimable:376kB slab_unreclaimable:3972kB kernel_stack:48kB pagetables:1200kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
May 24 19:09:19 localhost kernel: [11647.766665] lowmem_reserve[]: 0 0 12510 12510
May 24 19:09:19 localhost kernel: [11647.766667] Node 0 Normal free:153900kB min:52968kB low:66208kB high:79452kB active_anon:768900kB inactive_anon:17228kB active_file:1512460kB inactive_file:3340232kB unevictable:3516kB isolated(anon):0kB isolated(file):0kB present:13074432kB managed:12811044kB mlocked:3516kB dirty:116kB writeback:0kB mapped:4218856kB shmem:27652kB slab_reclaimable:130776kB slab_unreclaimable:123596kB kernel_stack:2992kB pagetables:43976kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
May 24 19:09:19 localhost kernel: [11647.766670] lowmem_reserve[]: 0 0 0 0
May 24 19:09:19 localhost kernel: [11647.766671] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15892kB
May 24 19:09:19 localhost kernel: [11647.766677] Node 0 DMA32: 109*4kB (UEM) 87*8kB (UEM) 131*16kB (UEM) 56*32kB (UEM) 83*64kB (UEM) 60*128kB (UM) 33*256kB (UM) 13*512kB (UM) 15*1024kB (UEM) 8*2048kB (UM) 0*4096kB = 64860kB
May 24 19:09:19 localhost kernel: [11647.766684] Node 0 Normal: 5990*4kB (UEM) 2991*8kB (UEM) 1510*16kB (UEM) 783*32kB (EM) 341*64kB (EM) 74*128kB (UM) 30*256kB (UEM) 15*512kB (UEM) 10*1024kB (UM) 0*2048kB 0*4096kB = 154000kB
May 24 19:09:19 localhost kernel: [11647.766691] 1220696 total pagecache pages
May 24 19:09:19 localhost kernel: [11647.766692] 0 pages in swap cache
May 24 19:09:19 localhost kernel: [11647.766693] Swap cache stats: add 0, delete 0, find 0/0
May 24 19:09:19 localhost kernel: [11647.766693] Free swap  = 0kB
May 24 19:09:19 localhost kernel: [11647.766694] Total swap = 0kB
May 24 19:09:19 localhost kernel: [11647.795919] 4186111 pages RAM
May 24 19:09:19 localhost kernel: [11647.795922] 2599506 pages reserved
May 24 19:09:19 localhost kernel: [11647.795923] 1370459 pages shared
May 24 19:09:19 localhost kernel: [11647.795923] 1307461 pages non-shared
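
A note on reading this trace: it is a page allocation failure warning rather than the "unable to handle kernel paging request" oops, so it may or may not be the same problem. "order:9" means the driver asked for 2^9 = 512 physically contiguous pages, which is 2 MiB with 4 KiB pages. The per-zone lines near the end of the dump list how many free blocks of each size were left. The sketch below, with hypothetical helper names, counts the free blocks that are large enough, using two lines copied from the log above.

```python
import re

# Matches entries such as "5990*4kB" in the per-zone free-block lines.
BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB")


def order_size_kib(order, page_kib=4):
    """Size in KiB of one allocation of the given order (4 KiB pages assumed)."""
    return page_kib << order


def free_blocks_at_least(zone_line, order, page_kib=4):
    """Count free blocks in a zone line that can satisfy the given order."""
    needed = order_size_kib(order, page_kib)
    return sum(
        int(count)
        for count, size in BLOCK_RE.findall(zone_line)
        if int(size) >= needed
    )


NORMAL = ("Node 0 Normal: 5990*4kB (UEM) 2991*8kB (UEM) 1510*16kB (UEM) "
          "783*32kB (EM) 341*64kB (EM) 74*128kB (UM) 30*256kB (UEM) "
          "15*512kB (UEM) 10*1024kB (UM) 0*2048kB 0*4096kB = 154000kB")
DMA32 = ("Node 0 DMA32: 109*4kB (UEM) 87*8kB (UEM) 131*16kB (UEM) 56*32kB (UEM) "
         "83*64kB (UEM) 60*128kB (UM) 33*256kB (UM) 13*512kB (UM) "
         "15*1024kB (UEM) 8*2048kB (UM) 0*4096kB = 64860kB")

print(order_size_kib(9))                # 2048
print(free_blocks_at_least(NORMAL, 9))  # 0
print(free_blocks_at_least(DMA32, 9))   # 8
```

So at the time of the warning the Normal zone had no free 2 MiB block at all, even though about 150 MB of it was free in smaller pieces.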

comment:30 in reply to: ↑ 25 ; follow-ups: ↓ 33 ↓ 43 Changed 11 months ago by quickbooks

Replying to sl4mmy:

I can reliably reproduce the issue with only a single guest running. Also, the test build you linked to still suffers from the same problem. I will post a complete dmesg from my test using that test build (4.2.51).

Test Build (May 22)

Linux 64 Host:  http://www.virtualbox.org/download/testcase/VirtualBox-4.2.51-85953-Linux_amd64.run

Extension pack:  http://www.virtualbox.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85953.vbox-extpack

Can you upload a coredump of the guest + guest log file: https://www.virtualbox.org/wiki/Core_dump

comment:31 in reply to: ↑ 24 Changed 11 months ago by wenns

Replying to sergiomb:

Replying to wenns:

Oops. The above is not quite right: I just experienced the same issue with an 32 bit guest (Windows 7), hardware acceleration enabled. Will disable and recheck now.

can you precise what you disable ? graphics ?

I disabled VT-x/AMD-V, and now it works reliably. I'll post an overview in a couple of minutes.

FYI, I found in one machine on linux host that one linux guest crash and X won't start when loads /usr/lib/modules/*/extra/VirtualBox/vboxvideo.ko , if I remove it before lunch X , everything works , just disable "seamless mode"

Changed 11 months ago by wenns

comment:32 Changed 11 months ago by wenns

I'm glad there are people caring about the issue and interested in details. So here they are. In short: I'm able to trigger this issue reliably under the following conditions:

  1. A guest (OS doesn't seem to matter) with VT-x/AMD-V enabled is running on
  2. Intel Xeon X5670@2.93GHz, with Linux Ubuntu Server 64 Bit on top.

I tried a couple of Linux systems and Windows 7 (64 and 32 bit) as guests; all behave the same. I *didn't* try another host OS.

See attachments for further details on the host platform.

comment:33 in reply to: ↑ 30 ; follow-ups: ↓ 35 ↓ 39 Changed 11 months ago by wenns

Replying to quickbooks:

Replying to sl4mmy:

I can reliably reproduce the issue with only a single guest running. Also, the test build you linked to still suffers from the same problem. I will post a complete dmesg from my test using that test build (4.2.51).

Test Build (May 22)

Linux 64 Host:  http://www.virtualbox.org/download/testcase/VirtualBox-4.2.51-85953-Linux_amd64.run

Extension pack:  http://www.virtualbox.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85953.vbox-extpack

Can you upload a coredump of the guest + guest log file: https://www.virtualbox.org/wiki/Core_dump

I have a core dump now but it's quite big (350 MB gzipped). How can I pass it to you? A file that big cannot be attached to this thread.

comment:34 in reply to: ↑ 28 Changed 11 months ago by p5n

Replying to timemaster:

p5n ?
wenns ? detail please

CPU: Dual Xeon E5506

MB: Intel S5500BC

OS: ArchLinux

comment:35 in reply to: ↑ 33 Changed 11 months ago by frank

Replying to wenns:

I have a core dump now but its quite big (350 M gzipped). How can I pass it to you? I file that big cannot be attached to this thread.

Please look here for instructions how / where to upload the core dump.

comment:36 follow-ups: ↓ 41 ↓ 44 Changed 11 months ago by frank

wenns and others, I'm also interested in another set of data: When this happens, please attach the VBox.log file from the VM session you are currently running together with the output of 'dmesg' from the host. I need both files from the same time for investigation. Thank you!
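
One way to make sure both files come from the same moment is to grab them in a single step right after the freeze. This is a rough sketch, not an official procedure: the helper name is hypothetical, and the VBox.log path in the usage comment is an assumption (the usual Logs folder of a VM named "test-vm") that needs adjusting to the real VM.

```python
import shutil
import subprocess
import time
from pathlib import Path


def collect_crash_logs(vbox_log, out_dir, dmesg_cmd=("dmesg",)):
    """Copy VBox.log and save the current dmesg output side by side.

    Returns the timestamped directory holding both files.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(out_dir) / f"vbox-crash-{stamp}"
    target.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the modification time of the VM log.
    shutil.copy2(vbox_log, target / "VBox.log")
    # Reading the kernel log may require root on some hosts.
    result = subprocess.run(list(dmesg_cmd), capture_output=True,
                            text=True, check=True)
    (target / "dmesg.txt").write_text(result.stdout)
    return target


# Example call (path is an assumption, adjust to the affected VM):
# collect_crash_logs(Path.home() / "VirtualBox VMs/test-vm/Logs/VBox.log", "/tmp")
```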

comment:37 Changed 11 months ago by p5n

Ugly workaround:

ls -1d /sys/devices/system/cpu/cpu?/online | while read a; do echo 0 >$a; done

Yes, it dramatically slows down your host )

comment:38 follow-up: ↓ 48 Changed 11 months ago by p5n

Actually switching off one of two CPUs helped me.

(I switched off all odd cores: 1,3,5,7)
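The two comments above can be combined into a small helper. This is only a minimal sketch, assuming the standard Linux sysfs CPU hotplug files (/sys/devices/system/cpu/cpuN/online); the function name offline_odd_cpus and its optional directory argument are made up for illustration. Unlike the `cpu?` glob in the one-liner, it also matches CPU numbers with more than one digit. Whether the odd-numbered CPUs really belong to the second socket depends on the machine, so check the topology (for example with numactl --hardware) before relying on it.

```shell
#!/bin/sh
# Sketch: take every odd-numbered CPU offline via sysfs CPU hotplug.
# offline_odd_cpus is a hypothetical helper; the optional first argument
# overrides the sysfs CPU directory so the function can be tried on a copy.
offline_odd_cpus() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_root"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue      # no match, or this CPU has no 'online' file
        n="${f%/online}"             # e.g. /sys/devices/system/cpu/cpu13
        n="${n##*/cpu}"              # e.g. 13
        if [ $((n % 2)) -eq 1 ]; then
            echo 0 > "$f"            # needs root on a real system
        fi
    done
}

# Usage (as root): offline_odd_cpus
# To bring the CPUs back, write 1 to the same files.
```

As with the original one-liner, this slows the host down considerably, so it is a diagnostic aid rather than a fix.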

comment:39 in reply to: ↑ 33 Changed 11 months ago by quickbooks

Replying to wenns:

Replying to quickbooks:

Replying to sl4mmy:

I can reliably reproduce the issue with only a single guest running. Also, the test build you linked to still suffers from the same problem. I will post a complete dmesg from my test using that test build (4.2.51).

Test Build (May 22)

Linux 64 Host:  http://www.virtualbox.org/download/testcase/VirtualBox-4.2.51-85953-Linux_amd64.run

Extension pack:  http://www.virtualbox.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85953.vbox-extpack

Can you upload a coredump of the guest + guest log file: https://www.virtualbox.org/wiki/Core_dump

I have a core dump now but it's quite big (350 MB gzipped). How can I pass it to you? A file that big cannot be attached to this thread.

Upload it to ftp://ftp.oracle.com/appsdev/incoming together with the log file, and then just post the file name here.

That way only Oracle developers can take a look at the core dump, as core dumps sometimes contain sensitive information.

You will probably need an FTP client such as FileZilla or gFTP.

comment:40 Changed 11 months ago by sergiomb

Hi, I just checked and this is not my bug. I disabled all CPU acceleration and my laptop still hangs when resuming a VM; sometimes it seems my 6 GB of swap is not enough. If you know of other tickets that may address my problem, I would be grateful if you could point me to them.

Thanks,

comment:41 in reply to: ↑ 36 Changed 11 months ago by wenns

Replying to frank:

wenns and others, I'm also interested in another set of data: When this happens, please attach the VBox.log file from the VM session you are currently running together with the output of 'dmesg' from the host. I need both files from the same time for investigation. Thank you!

Here they are: see attachments, file vb_crash_dataset.tar.gz

Changed 11 months ago by wenns

comment:42 follow-up: ↓ 45 Changed 11 months ago by frank

Thanks wenns! We now see where it crashes but don't know yet why it crashes. Did you ever run an older kernel on your Xeon box with the same setup, so that you can confirm this is a Linux 3.8 regression? Or did you see the same crashes with older Linux kernels?

comment:43 in reply to: ↑ 30 Changed 11 months ago by sl4mmy

Hi, quickbooks-

Replying to quickbooks:

Test Build (May 22)

Linux 64 Host:  http://www.virtualbox.org/download/testcase/VirtualBox-4.2.51-85953-Linux_amd64.run

Extension pack:  http://www.virtualbox.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85953.vbox-extpack

Can you upload a coredump of the guest + guest log file: https://www.virtualbox.org/wiki/Core_dump

I was able to reproduce the problem with the 85953 build. I uploaded a tarball with logs and coredumps named sl4mmy-virtualbox-4.2.51-linux-3.9.4-oops.tar.gz to the FTP site.

Changed 11 months ago by sl4mmy

Full system log of crash with VirtualBox-4.2.51-85953 and Linux 3.9.4

Changed 11 months ago by sl4mmy

VirtualBox service log from crash with VirtualBox 4.2.51-85953 and Linux 3.9.4

comment:44 in reply to: ↑ 36 Changed 11 months ago by sl4mmy

Hi, frank-

Replying to frank:

wenns and others, I'm also interested in another set of data: When this happens, please attach the VBox.log file from the VM session you are currently running together with the output of 'dmesg' from the host. I need both files from the same time for investigation. Thank you!

I uploaded a tarball named sl4mmy-virtualbox-4.2.51-linux-3.9.4-oops.tar.gz to the FTP site that includes both logs plus coredumps of VirtualBox, VBoxSVC and VBoxXPCOMIPCD. I also attached both log files separately to this ticket.

comment:45 in reply to: ↑ 42 Changed 11 months ago by sl4mmy

Hi, frank-

Replying to frank:

Thanks wenns! We now see where it crashes but don't know yet why it crashes. Did you ever run an older kernel on your Xeon box with the same setup, so that you can confirm this is a Linux 3.8 regression? Or did you see the same crashes with older Linux kernels?

I ran VirtualBox on this workstation without problems since October 2012. The last working version for me was VirtualBox 4.2.8 with Linux 3.7.10.

Unfortunately, the official VirtualBox 4.2.10+ packages for Arch require Linux 3.8+ so I can't easily test VirtualBox 4.2.12 with Linux 3.7.10. It also makes it difficult to identify the regression: was the problem introduced in VirtualBox 4.2.10 or Linux 3.8?

comment:46 follow-up: ↓ 47 Changed 11 months ago by frank

sl4mmy, thanks for the logs. But one log is missing: The VBox.log file from the VM. You provided VBoxSVC.log which is from the VBoxSVC server. The VBox.log file can be found either from the VM selector window / Machine / Show Log ... or can also be found in the VM configuration directory under Logs.

Changed 11 months ago by sl4mmy

VBox.log from crash with VirtualBox 4.2.51-85953 and Linux 3.9.4

comment:47 in reply to: ↑ 46 Changed 11 months ago by sl4mmy

Hi, frank-

Replying to frank:

sl4mmy, thanks for the logs. But one log is missing: The VBox.log file from the VM. You provided VBoxSVC.log which is from the VBoxSVC server. The VBox.log file can be found either from the VM selector window / Machine / Show Log ... or can also be found in the VM configuration directory under Logs.

D'oh! Sorry... I've just attached the VBox.log from the same session yesterday as the other log files.

comment:48 in reply to: ↑ 38 Changed 11 months ago by sl4mmy

Hi, p5n-

Replying to p5n:

Actually switching off one of two CPUs helped me.

(I switched off all odd cores: 1,3,5,7)

Wow, that's a really interesting observation! I've been able to work-around the issue on my machine by doing the same, thanks!

comment:49 Changed 11 months ago by sl4mmy

Howdy-

Thanks to p5n's observations (https://www.virtualbox.org/ticket/11610#comment:37 and https://www.virtualbox.org/ticket/11610#comment:38) I came up with a work-around that doesn't require disabling hardware virtualization acceleration:

$ numactl --cpunodebind=0 --localalloc -- /opt/VirtualBox/VirtualBox

First of all, here is what the numa topology of my workstation looks like:

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 6143 MB
node 0 free: 340 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 6127 MB
node 1 free: 154 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10

So with this work-around VirtualBox can only run on the CPUs of node 0, and all of the memory used by VirtualBox should be allocated on the same node running the process.
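The work-around only matters on hosts with more than one NUMA node, so it can be wrapped in a small launcher. The following is a minimal sketch, assuming the kernel exposes one nodeN directory per NUMA node under /sys/devices/system/node; the names numa_node_count and run_on_node0 are hypothetical, and the numactl options are exactly the ones from the command above.

```shell
#!/bin/sh
# Sketch: only wrap a command with numactl when the host has several NUMA nodes.
# numa_node_count and run_on_node0 are hypothetical helper names.
numa_node_count() {
    node_root="${1:-/sys/devices/system/node}"
    count=0
    for d in "$node_root"/node[0-9]*; do
        [ -d "$d" ] && count=$((count + 1))
    done
    echo "$count"
}

run_on_node0() {
    if [ "$(numa_node_count)" -gt 1 ]; then
        # Same options as the work-around: CPUs of node 0, node-local memory.
        exec numactl --cpunodebind=0 --localalloc -- "$@"
    else
        exec "$@"
    fi
}

# Usage: run_on_node0 /opt/VirtualBox/VirtualBox
```

On a single-node host the launcher simply runs the command unchanged.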

By the way, this is with the VirtualBox 4.2.51-85953 test build frank and others linked to, and Linux 3.9.4.

Interestingly, when I first tried playing with numactl after reading p5n's comments I tried binding to the CPUs on node 1, not node 0, but I encountered the same kernel oops. I tried putzing with a few more options to numactl but to no avail. Before giving up, however, I decided to try binding to node 0 instead, and sure enough it appears to work!

What is it about node 0 that is special? Is it in any way related to the fact that node 0 is the initial boot node?

I tested with a 32-bit Windows guest with 2 CPUs and a 64-bit RHEL 6.3 guest with 2 CPUs. I even tested with both running simultaneously, watching YouTube videos in the Windows guest while running some builds in the RHEL guest. :) Zero kernel Oops so far...

Yay! Big thanks to p5n!

comment:50 follow-up: ↓ 51 Changed 11 months ago by frank

Thanks sl4mmy, also for your additional log. This helps further...

comment:51 in reply to: ↑ 50 Changed 11 months ago by sl4mmy

Hi, frank-

Replying to frank:

Thanks sl4mmy, also for your additional log. This helps further...

Sure, no problem!

Also, I can confirm that the work-around also works with the official Arch packages for VirtualBox 4.2.12 (virtualbox-4.2.12-3 and virtualbox-host-modules-4.2.12-6) on Linux 3.9.4.

comment:52 Changed 11 months ago by frank

I think the reason for this problem is CONFIG_NUMA_BALANCING, which was introduced in Linux 3.8. I am currently looking for a way to prevent pages from being migrated between NUMA nodes, probably by setting a VM area flag...
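To see whether a given host kernel was built with this option, its build configuration can be searched. This is a minimal sketch under stated assumptions: the helper name numa_balancing_configured is made up, and the default path /boot/config-$(uname -r) is only where many distributions install the kernel config; other systems expose it as /proc/config.gz or not at all.

```shell
#!/bin/sh
# Sketch: report whether a kernel config file enables CONFIG_NUMA_BALANCING.
# numa_balancing_configured is a hypothetical helper; the default path is an
# assumption that holds on many, but not all, distributions.
numa_balancing_configured() {
    config="${1:-/boot/config-$(uname -r)}"
    # Exit status 0 if the option is built in, non-zero otherwise.
    grep -q '^CONFIG_NUMA_BALANCING=y' "$config"
}

# Usage:
#   if numa_balancing_configured; then echo "NUMA balancing compiled in"; fi
# With a compressed config:
#   zcat /proc/config.gz > /tmp/config && numa_balancing_configured /tmp/config
```

A kernel without the option, or a host with a single NUMA node, should not be affected by this particular problem.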

comment:53 Changed 10 months ago by Romain Buquet

Dummy comment, just to be notified when this ticket is modified.

comment:54 Changed 10 months ago by frank

Just an update: We know what's wrong but it will be difficult to fix. Actually we are over-stretching the Linux kernel API a bit. We plan a workaround for 4.2.x and a better fix for the next major release. As written above, this problem affects only people who have more than one NUMA node in their system (see the output of numactl --hardware).

comment:55 follow-up: ↓ 56 Changed 10 months ago by frank

The following patch will be included in the next maintenance release (expected very soon). To fix the problem, please go to /usr/src/vboxhost-4.2.12/vboxdrv/r0drv/linux and apply these two lines manually. Then make sure that all VMs are terminated, recompile the host kernel driver (/etc/init.d/vboxdrv setup) and that's it. Or just wait a bit for the release.

This is actually a workaround but we cannot do a more fundamental fix. The simple fix will require a Linux kernel change, the difficult fix will require many many code changes in VBox so this will have to wait.

--- memobj-r0drv-linux.c        (revision 86600)
+++ memobj-r0drv-linux.c        (revision 86601)
@@ -1527,6 +1527,21 @@
                 }
             }
 
+#ifdef CONFIG_NUMA_BALANCING
+            if (RT_SUCCESS(rc))
+            {
+                /** @todo Ugly hack! But right now we have no other means to disable
+                 *        automatic NUMA page balancing. */
+# ifdef RT_OS_X86
+                pTask->mm->numa_next_reset = jiffies + 0x7fffffffUL;
+                pTask->mm->numa_next_scan  = jiffies + 0x7fffffffUL;
+# else
+                pTask->mm->numa_next_reset = jiffies + 0x7fffffffffffffffUL;
+                pTask->mm->numa_next_scan  = jiffies + 0x7fffffffffffffffUL;
+# endif
+            }
+#endif
+
             up_write(&pTask->mm->mmap_sem);
 
             if (RT_SUCCESS(rc))
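For a rough sense of scale, the 32-bit offset 0x7fffffff can be converted from jiffies to days. This is only back-of-the-envelope arithmetic, not part of the patch: the helper name scan_delay_days is made up, and the tick rate (CONFIG_HZ) is an assumption that varies between kernel builds.

```shell
#!/bin/sh
# Sketch: how many whole days 0x7fffffff jiffies correspond to
# for an assumed tick rate. scan_delay_days is a hypothetical helper.
scan_delay_days() {
    hz="$1"                                # ticks per second (CONFIG_HZ)
    echo $(( 0x7fffffff / hz / 86400 ))    # jiffies -> seconds -> days
}

scan_delay_days 1000   # prints 24
scan_delay_days 250    # prints 99
```

So on a 32-bit host the hack pushes the next NUMA scan weeks to months into the future, depending on the tick rate; the 64-bit constant is far larger still.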

comment:56 in reply to: ↑ 55 Changed 10 months ago by quickbooks

Replying to frank:

This is actually a workaround but we cannot do a more fundamental fix. The simple fix will require a Linux kernel change, the difficult fix will require many many code changes in VBox so this will have to wait.

Can you post a trunk build for 64 bit linux, plz. thnx.

comment:57 Changed 10 months ago by frank

Sure,  here it is.

comment:58 Changed 10 months ago by frank

The workaround is included in VBox 4.2.14.

comment:59 Changed 10 months ago by sl4mmy

Hi, Frank-

I can confirm that the problem no longer occurs on my host system with VirtualBox 4.2.16. Thanks!

comment:60 Changed 10 months ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Hi, sl4mmy, thanks for the feedback and thanks again for helping to debug this problem. I will close this ticket. A better fix is required but this one will do for the moment.

comment:61 Changed 3 months ago by frank

See also #11171 for page allocation warnings on Linux hosts.
