VirtualBox

Opened 9 years ago

Closed 9 years ago

#13961 closed defect (fixed)

Unable to handle kernel paging request (SMAP with 4.3.26)

Reported by: Christian Hesse
Owned by:
Component: other
Version: VirtualBox 4.3.26
Keywords:
Cc:
Guest type: other
Host type: Linux

Description

My device is a Lenovo Thinkpad X250 (Broadwell CPU) running Linux 4.0rc4. The system crashes as soon as a VirtualBox guest is started. This happens with 4.3.24 and 4.3.26.

I am pretty sure this is not related to the CR4 changes. I compiled Linux 4.0rc4 with CR4 changes reverted and this still happens.

Kernel logs are attached.

Attachments (31)

vbox1.log (5.8 KB ) - added by Christian Hesse 9 years ago.
kernel log virtualbox 4.3.24
vbox2.log (5.8 KB ) - added by Christian Hesse 9 years ago.
kernel log virtualbox 4.3.26
vboxdrv.ko.gz (141.2 KB ) - added by Christian Hesse 9 years ago.
vboxdrv.ko.gz (virtualbox 4.3.26, linux 4.0rc4.r0.g06e5801)
diff_smap_2 (2.1 KB ) - added by Frank Mehnert 9 years ago.
leda-vbox3.log (5.3 KB ) - added by Christian Hesse 9 years ago.
kernel log virtualbox 4.3.26 + diff_smap_2
vboxdrv.ko.2.gz (141.2 KB ) - added by Christian Hesse 9 years ago.
vboxdrv.ko.gz (virtualbox 4.3.26 + diff_smap_2, linux 4.0rc4.r0.g06e5801)
kernel-20150317-211216.log (5.5 KB ) - added by Christian Hesse 9 years ago.
kernel log virtualbox 4.3.26 + diff_smap_2
VBox-20150317-211216.log (47.3 KB ) - added by Christian Hesse 9 years ago.
VBox.log virtualbox 4.3.26 + diff_smap_2
VMMR0.r0.gz (393.8 KB ) - added by Christian Hesse 9 years ago.
VMMR0.r0 from virtualbox 4.3.26
diff_smap_3 (3.3 KB ) - added by Frank Mehnert 9 years ago.
leda-kernel.log (4.8 KB ) - added by Christian Hesse 9 years ago.
leda-vbox.log (48.5 KB ) - added by Christian Hesse 9 years ago.
vboxdrv.ko.3.gz (141.2 KB ) - added by Christian Hesse 9 years ago.
diff_smap_4 (4.0 KB ) - added by Frank Mehnert 9 years ago.
leda-journal.log (5.0 KB ) - added by Christian Hesse 9 years ago.
syslog
leda-vbox.2.log (47.0 KB ) - added by Christian Hesse 9 years ago.
vbox.log
vboxdrv.ko.4.gz (141.2 KB ) - added by Christian Hesse 9 years ago.
vboxdrv.ko
journal.log (5.0 KB ) - added by Christian Hesse 9 years ago.
VBox.log (47.3 KB ) - added by Christian Hesse 9 years ago.
vboxdrv.ko.5.gz (141.2 KB ) - added by Christian Hesse 9 years ago.
system.log (4.1 KB ) - added by fardog 9 years ago.
journalctl log output
Vbox.log (53.9 KB ) - added by fardog 9 years ago.
vboxdrv.ko.6.gz (140.6 KB ) - added by fardog 9 years ago.
config.gz (39.2 KB ) - added by Christian Hesse 9 years ago.
kernel config 4.0.4 (default Arch Linux)
config.2.gz (39.2 KB ) - added by fardog 9 years ago.
0001-x86_64-smap-call-stac-before-touching-user-memory.patch (791 bytes ) - added by Christian Hesse 9 years ago.
x86_64, smap: call stac() before touching user memory
system.2.log (3.9 KB ) - added by enioarda 9 years ago.
journalctl of the crash.
VBox_nocrash.log (97.3 KB ) - added by enioarda 9 years ago.
NO crash VBox.log
virtualbox-crash-on-startup (4.2 KB ) - added by tekstryder 9 years ago.
virtualbox-crash-on-startup-2015-06-24 (4.2 KB ) - added by tekstryder 9 years ago.
Latest crash log info added
panic.log (26.1 KB ) - added by rugubara 9 years ago.
adding nosmap to kernel parameters didn't help me. I still got panic tonight


Change History (79)

by Christian Hesse, 9 years ago

Attachment: vbox1.log added

kernel log virtualbox 4.3.24

by Christian Hesse, 9 years ago

Attachment: vbox2.log added

kernel log virtualbox 4.3.26

comment:1 by Christian Hesse, 9 years ago

I added 'nosmap' and 'nosmep' to host boot parameters. Virtualbox guest now starts without issues.

comment:2 by Frank Mehnert, 9 years ago

Could you attach the compiled vboxdrv.ko module from VirtualBox 4.3.26? Also, could you check adding 'nosmep' and 'nosmap' exclusively? Thanks!

comment:3 by Christian Hesse, 9 years ago

You should increase the upload limit... My vboxdrv.ko exceeds the limit by about 20kB.

I uploaded it to my webserver for now: http://www.eworm.de/tmp/vboxdrv.ko

This is virtualbox 4.3.26 and linux 4.0rc4.r0.g06e5801.

comment:4 by Frank Mehnert, 9 years ago

Downloaded, thanks. If you compress this binary it will not exceed the size limit.

comment:5 by Christian Hesse, 9 years ago

Having SMEP (supervisor mode execution prevention) enabled is just fine. It's sufficient to have 'nosmap' (to disable supervisor mode access prevention) in boot parameters.
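For readers who want to check which of the two features is active on their own host, here is a minimal sketch. It assumes the usual Linux conventions that CPU features appear on the `flags` line of `/proc/cpuinfo` and that boot parameters appear in `/proc/cmdline`; the function names are illustrative and not part of VirtualBox or the kernel.

```python
def smap_status(cpu_flags: str, cmdline: str) -> dict:
    """Report whether SMEP/SMAP look active, given the CPU flags line
    and the kernel command line as plain strings."""
    flags = set(cpu_flags.split())
    params = set(cmdline.split())
    return {
        # A feature counts as active if the CPU reports it and the
        # matching "no..." boot parameter is absent.
        "smep": "smep" in flags and "nosmep" not in params,
        "smap": "smap" in flags and "nosmap" not in params,
    }


def read_host_state() -> dict:
    """Read the two files on a Linux host (assumed paths) and summarise."""
    cpu_flags = ""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                cpu_flags = line.split(":", 1)[1]
                break
    with open("/proc/cmdline") as f:
        cmdline = f.read()
    return smap_status(cpu_flags, cmdline)
```

With only 'nosmap' on the command line, `smap_status` reports SMEP as active and SMAP as inactive, which matches the configuration described in this comment.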

comment:6 by Christian Hesse, 9 years ago

You are right... I expected it to be compressed, but looks like dkms does not compress.

Uploading compressed vboxdrv.ko for reference.

by Christian Hesse, 9 years ago

Attachment: vboxdrv.ko.gz added

vboxdrv.ko.gz (virtualbox 4.3.26, linux 4.0rc4.r0.g06e5801)

comment:7 by Frank Mehnert, 9 years ago

Thanks. Actually I think I know where the problem is and I might have a patch available during the next few hours.

comment:8 by Frank Mehnert, 9 years ago

Summary: unable to handle kernel paging request → Unable to handle kernel paging request (SMAP with 4.3.26)

by Frank Mehnert, 9 years ago

Attachment: diff_smap_2 added

comment:9 by Frank Mehnert, 9 years ago

Attached a diff for the kernel driver which should fix the problem. After you have applied the diff to the VirtualBox kernel driver sources (which are located at /usr/src/vboxhost-4.3.26) please recompile the host kernel modules by running

/etc/init.d/vboxdrv setup

and start your VM. Please make sure to run this on a Linux kernel with 'nosmap' and 'nosmep' removed.

comment:10 by Christian Hesse, 9 years ago

This still crashes. Will attach a new log.

by Christian Hesse, 9 years ago

Attachment: leda-vbox3.log added

kernel log virtualbox 4.3.26 + diff_smap_2

comment:11 by Frank Mehnert, 9 years ago

Unexpected. Could you attach the new vboxdrv.ko module please? Thanks!

by Christian Hesse, 9 years ago

Attachment: vboxdrv.ko.2.gz added

vboxdrv.ko.gz (virtualbox 4.3.26 + diff_smap_2, linux 4.0rc4.r0.g06e5801)

comment:12 by Frank Mehnert, 9 years ago

Thanks. Unfortunately I don't understand why EFLAGS.AC is still not set. Could you repeat the experiment and attach all corresponding items from the same VM session:

  • The VBox.log file
  • The Linux kernel log
  • The vboxdrv.ko file if different than vboxdrv.ko.2.gz

This will help me to debug the problem because the VBox.log file contains the load addresses of the VMM modules. Unfortunately we cannot reproduce the problem as we still don't have Broadwell hardware.
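The reason load addresses matter can be sketched as follows: a faulting instruction pointer from the kernel log only becomes meaningful once it is translated into a module name plus an offset into that module. The snippet below is a rough illustration under stated assumptions; the module table is hypothetical, and the sizes and addresses are made up rather than taken from any attached log.

```python
def resolve_address(addr: int, modules: dict):
    """Map an absolute address to (module_name, offset).

    `modules` maps a module name to a (load_address, size) pair.
    Returns None if the address falls outside every module."""
    for name, (base, size) in modules.items():
        if base <= addr < base + size:
            return name, addr - base
    return None


# Hypothetical load addresses, for illustration only.
example_modules = {
    "VMMR0.r0": (0xFFFFFFFFA0100000, 0x80000),
    "vboxdrv": (0xFFFFFFFFA0000000, 0x60000),
}
```

An offset obtained this way can then be compared against a disassembly of the matching binary, which is why the log and the module files need to come from the same session.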

comment:13 by Christian Hesse, 9 years ago

Ok, here we go...

It's not easy to capture VBox.log from a dying machine, but inotify, tail and ssh did the trick. ;)

vboxdrv.ko is unchanged, logs will follow.

by Christian Hesse, 9 years ago

Attachment: kernel-20150317-211216.log added

kernel log virtualbox 4.3.26 + diff_smap_2

by Christian Hesse, 9 years ago

Attachment: VBox-20150317-211216.log added

VBox.log virtualbox 4.3.26 + diff_smap_2

comment:14 by Frank Mehnert, 9 years ago

Thanks again. As you used a non-official package, could you also provide the VMMR0.r0 module?

comment:15 by Christian Hesse, 9 years ago

That is what's found at /usr/lib/virtualbox/VMMR0.r0?

I will attach it in a few seconds. Though I am not sure if this is identical to what I used... The logs were made with a package I compiled myself, I now have installed my distribution's packages. Both were built in a clean chroot.

Let me know if I should repeat my tests.

by Christian Hesse, 9 years ago

Attachment: VMMR0.r0.gz added

VMMR0.r0 from virtualbox 4.3.26

by Frank Mehnert, 9 years ago

Attachment: diff_smap_3 added

comment:16 by Frank Mehnert, 9 years ago

Ok. Could you try diff_smap_3 instead of diff_smap_2 and see if you would now be able to start VMs with SMAP enabled? Thanks!

comment:17 by Christian Hesse, 9 years ago

Still crashes. This was with linux 4.0rc4.r199.gb314aca, virtualbox 4.3.26 + diff_smap_3.

by Christian Hesse, 9 years ago

Attachment: leda-kernel.log added

by Christian Hesse, 9 years ago

Attachment: leda-vbox.log added

by Christian Hesse, 9 years ago

Attachment: vboxdrv.ko.3.gz added

by Frank Mehnert, 9 years ago

Attachment: diff_smap_4 added

comment:18 by Frank Mehnert, 9 years ago

Next try. We just saw this changeset which would explain why the other patches did not work. Could you try again? Thank you!

comment:19 by Christian Hesse, 9 years ago

Looks like that did the trick! Guest is up and running, host is still alive. ;)

Thanks a lot!

comment:20 by Frank Mehnert, 9 years ago

And thanks for your patience during testing!

comment:21 by Christian Hesse, 9 years ago

It's very seldom, but still happens from time to time... Looks like we have a corner case that still crashes the machine. Any ideas? Sadly I can not reproduce, happens about once a week for me.

comment:22 by Frank Mehnert, 9 years ago

eworm, that's important. I'm running VBox on a Linux 4.0.0 host and never saw such problems for many weeks now. It would be nice if you could provide at least a VBox.log file together with the output of 'dmesg' and the corresponding vboxdrv.ko as you provided before.

by Christian Hesse, 9 years ago

Attachment: leda-journal.log added

syslog

by Christian Hesse, 9 years ago

Attachment: leda-vbox.2.log added

vbox.log

by Christian Hesse, 9 years ago

Attachment: vboxdrv.ko.4.gz added

vboxdrv.ko

comment:23 by Christian Hesse, 9 years ago

I think I did about a hundred reboot cycles... Finally it crashed. :D Have fun!

comment:24 by Frank Mehnert, 9 years ago

Thanks eworm. One more request: Could you also attach the VMMR0.r0 file from your installation? You are using a distribution-specific package therefore I don't have a reference. Thanks!

comment:25 by Frank Mehnert, 9 years ago

Actually attaching of VMMR0.r0 shouldn't be necessary. I just downloaded the original ArchLinux package and got a VMMR0.r0 which seems to fit the other files. Investigating...

comment:26 by tannerjfco, 9 years ago

I've applied the latest patch in this thread and it has resolved the issue I was encountering with Virtualbox freezing the host. I have yet to encounter any further trouble but I will keep an eye out for the issue eworm mentions and will follow-up if I encounter it. Thanks!

by Christian Hesse, 9 years ago

Attachment: journal.log added

by Christian Hesse, 9 years ago

Attachment: VBox.log added

by Christian Hesse, 9 years ago

Attachment: vboxdrv.ko.5.gz added

comment:27 by Christian Hesse, 9 years ago

Just had another crash... Uploaded the logs and kernel module.

Any news on this? This is really annoying. Would be great to have a stable workstation any time soon.

comment:28 by Frank Mehnert, 9 years ago

Thanks for the new dump. Looks like the fault was triggered at the exact same place as before. We still don't know how this can happen and try to reproduce the problem.

comment:29 by Christian Hesse, 9 years ago

Let me know if I can help in one way or another.

Does it help to upload more logs if a crash occurs?

comment:30 by Frank Mehnert, 9 years ago

VBox 4.3.28 contains the last code including diff_smap_4. I guess this will still not fix eworm's problems but I would like to know if other users have any SMAP problems with VBox 4.3.28.

by fardog, 9 years ago

Attachment: system.log added

journalctl log output

comment:31 by fardog, 9 years ago

Hi frank; having the same issue as eworm on my Thinkpad T450s, using the latest virtualbox 4.3.28, so can confirm the issue isn't fixed. I've uploaded my system.log, and am trying to find the other information such as my virtualbox log, will upload it as I find it.

by fardog, 9 years ago

Attachment: Vbox.log added

by fardog, 9 years ago

Attachment: vboxdrv.ko.6.gz added

comment:32 by fardog, 9 years ago

I think that gives you everything you need, but let me know if there's anything else that'll help. Thanks!

comment:33 by Frank Mehnert, 9 years ago

fardog and eworm, your log files indicated that your VM processes still crash at the very same position. At the moment we cannot explain this. It's also interesting that only ArchLinux users seem to be affected, at least I'm not aware of users having 4.3.28 installed and having problems with SMAP. One developer installed ArchLinux on a Broadwell laptop and still was not able to reproduce the problem.

Could you attach the Linux kernel configuration?

(removed the last paragraph. I will prepare another test build soon)

Last edited 9 years ago by Frank Mehnert

comment:34 by Frank Mehnert, 9 years ago

Could you install this 4.3 test build and try to reproduce the crash? In that case, please attach the VBox.log file, the output of 'dmesg' and the vboxdrv.ko module as you already did before. Thank you!

comment:35 by Frank Mehnert, 9 years ago

Hrmpf. Sorry, that test build might fail to compile the kernel modules. Please use this test build instead. Not my day :-/

Last edited 9 years ago by Frank Mehnert

by Christian Hesse, 9 years ago

Attachment: config.gz added

kernel config 4.0.4 (default Arch Linux)

comment:36 by Christian Hesse, 9 years ago

Attached the Linux kernel configuration. It's the default from Arch Linux linux package version 4.0.4-1.

Last time my system crashed with linux 4.0.2-1 and Virtualbox 4.3.26 (+ patches). Given the fact that it happens really seldom I can not tell whether or not latest versions are still affected. Configuration did not change since then, though.

I am not sure how to reliably test this... Even rebooting the guest twenty times and more in a row without issues does not indicate it is fixed. I will think about it...
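The difficulty can be made concrete with a little arithmetic. The sketch below assumes that each guest boot crashes independently with a fixed probability, which is a simplification; the figure of one crash in about a hundred boots is only a rough reading of the "about a hundred reboot cycles" reported earlier in this ticket.

```python
import math


def p_all_clean(p_crash: float, boots: int) -> float:
    """Probability that `boots` consecutive boots all succeed even
    though the bug is still present (independent boots assumed)."""
    return (1.0 - p_crash) ** boots


def boots_needed(p_crash: float, confidence: float = 0.95) -> int:
    """Smallest number of clean boots after which a still-present bug
    would have shown up with at least the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_crash))


if __name__ == "__main__":
    print(p_all_clean(0.01, 20))   # chance of 20 clean boots with the bug present
    print(boots_needed(0.01))      # clean boots needed for 95% confidence
```

Under these assumptions, twenty clean boots in a row would be seen about 82% of the time even with the bug still present, and roughly 300 clean boots would be needed before a fix could be trusted at the 95% level.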

Wondering what influence the guest setup has... Does it matter? I took a look at the last crash logs available and saw that the BUG follows:

kernel: device bridge entered promiscuous mode

Where bridge is a bridge interface with static IP and dhcp daemon. Anything else that could have an effect?

by fardog, 9 years ago

Attachment: config.2.gz added

comment:37 by Christian Hesse, 9 years ago

Over and over again Google brings me to an old ticket about a similar issue:

BUG: unable to handle kernel paging request

Is this related? Possibly we have to disable automatic NUMA page balancing by setting pTask->mm->numa_next_scan (src/VBox/Runtime/r0drv/linux/memobj-r0drv-linux.c, line 1551) for every CPU?

comment:38 by fardog, 9 years ago

Hi frank; I won't be able to test that build until later tonight or tomorrow, but will give it a go. For the time being, I've uploaded my kernel config (for version linux 4.0.2-1, I haven't upgraded to the latest 4.0.3-1 yet, although it looks like 4.0.4-1 is imminent in Arch's repos). This is the version that the crash logs above were from.

Please note: this config.gz was from the running system, which has nosmap set as a boot parameter since that's how I can get virtualbox to run (I depend on it heavily for work); I'm not sure if that shows up in the config file, but I didn't want it to confuse you. The crash logs above are from a different boot, when I was NOT running the nosmap flag.

Thanks!

in reply to:  37 comment:39 by Frank Mehnert, 9 years ago

Replying to eworm:

Over and over again Google brings me to an old ticket about a similar issue:

BUG: unable to handle kernel paging request

Is this related? Possibly we have to disable automatic NUMA page balancing by setting pTask->mm->numa_next_scan (src/VBox/Runtime/r0drv/linux/memobj-r0drv-linux.c, line 1551) for every CPU?

No, completely unrelated. Look at your kernel crash dump:

  1. CR4: 00000000003427e0, so bits 20 (SMEP) and 21 (SMAP) are set. That means that SMAP is activated.
  2. BUG: unable to handle kernel paging request at 00007f8460fcd000. That means that the kernel is accessing memory which is mapped into userland. This is considered hacky but for historical reasons, VirtualBox still works this way. For example, on 32-bit hosts it would not be possible to map the complete guest address space into the 1G kernel address space.
  3. EFLAGS: 00010202. That means that bit 18 of EFLAGS (AC) is clear. But with VBox 4.3.28 this bit is supposed to be set on SMAP-enabled hosts.

That means that the AC flag is cleared somewhere in the kernel code and currently we don't know where. We even installed ArchLinux on a SMAP-enabled laptop but unfortunately could not reproduce the problem.
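The three observations above can be checked mechanically. The following sketch decodes the register values quoted from the crash dump; the bit positions (CR4 bit 20 = SMEP, CR4 bit 21 = SMAP, EFLAGS bit 18 = AC) follow the x86 architecture, while the user-space address test assumes 4-level paging on x86_64, where the lower canonical half ends below 2^47.

```python
CR4_SMEP = 1 << 20   # supervisor mode execution prevention
CR4_SMAP = 1 << 21   # supervisor mode access prevention
EFLAGS_AC = 1 << 18  # alignment check flag, reused by SMAP as an override


def decode_crash(cr4_hex: str, eflags_hex: str, fault_addr_hex: str) -> dict:
    """Decode the SMAP-relevant bits from values printed in a kernel oops."""
    cr4 = int(cr4_hex, 16)
    eflags = int(eflags_hex, 16)
    fault_addr = int(fault_addr_hex, 16)
    return {
        "smep": bool(cr4 & CR4_SMEP),
        "smap": bool(cr4 & CR4_SMAP),
        "ac": bool(eflags & EFLAGS_AC),
        # Assumption: 4-level paging, so user space lies below 2**47.
        "user_address": fault_addr < (1 << 47),
    }


state = decode_crash("00000000003427e0", "00010202", "00007f8460fcd000")
```

For the values in this dump, `state` shows SMEP and SMAP enabled, AC clear, and a faulting address in user space, which is exactly the combination under which SMAP raises a fault.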

comment:40 by Christian Hesse, 9 years ago

Digging through kernel code I found a place where clac() is called, but there is no stac() before. Possibly that is the place where things go wrong?

by Christian Hesse, 9 years ago

Attachment: 0001-x86_64-smap-call-stac-before-touching-user-memory.patch added

x86_64, smap: call stac() before touching user memory

in reply to:  40 comment:41 by Frank Mehnert, 9 years ago

Replying to eworm:

Digging through kernel code I found a place where clac() is called, but there is no stac() before. Possibly that is the place where things go wrong?

No :-)

It works like this: stac() is for setting the AC flag. If the AC flag is set in R0 then the SMAP check (whether R0=kernel is allowed to access R3=userland memory) is disabled. clac() clears the AC flag and therefore enables the SMAP check. The latter is default in recent Linux on Broadwell CPUs.

The place you found is just the last part of an error handler. The code for copying data from user to kernel obviously needs to have the AC flag set to temporarily disable the SMAP check. That's done for instance in copy_user_generic_string (see copy_user_64.S). The copy_user_handle_tail() function is called if there was a normal page fault while accessing the provided user data from the kernel.
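The stac()/clac() behaviour described above can be modelled in a few lines. This is a toy model for illustration only, not kernel code: the class and method names are invented, and it covers just the one rule that a supervisor access to a user page faults when SMAP is enabled and AC is clear.

```python
class ToyCpu:
    """Minimal model of the SMAP check for ring-0 accesses to user pages."""

    def __init__(self, smap_enabled: bool):
        self.smap_enabled = smap_enabled
        self.ac = False  # AC starts out clear, so the SMAP check is armed

    def stac(self) -> None:
        """Set AC: temporarily allow kernel access to user memory."""
        self.ac = True

    def clac(self) -> None:
        """Clear AC: re-arm the SMAP check."""
        self.ac = False

    def kernel_touch_user_page(self) -> str:
        """Return "fault" if SMAP blocks the access, else "ok"."""
        if self.smap_enabled and not self.ac:
            return "fault"
        return "ok"


cpu = ToyCpu(smap_enabled=True)
before = cpu.kernel_touch_user_page()   # AC clear: access is blocked
cpu.stac()
during = cpu.kernel_touch_user_page()   # AC set: access is allowed
cpu.clac()
after = cpu.kernel_touch_user_page()    # AC clear again: blocked
```

In this model a clac() without a preceding stac() is harmless, since it only leaves the check armed. The crashes in this ticket correspond to the opposite case, where code that expects AC to be set finds it clear.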

comment:42 by enioarda, 9 years ago

I encounter the same issue on my Lenovo L450 on Arch linux:

  • Linux thinkpad 4.0.4-2-ARCH #1 SMP PREEMPT Fri May 22 03:05:23 UTC 2015 x86_64 GNU/Linux
  • Virtualbox 4.3.28
  • Windows 8.1 x64 guest

It happens on about 20% of starts with smap active very early in the boot process (Windows Logo showing with spinner).

Any hints on how to debug this are much appreciated.

by enioarda, 9 years ago

Attachment: system.2.log added

journalctl of the crash.

by enioarda, 9 years ago

Attachment: VBox_nocrash.log added

NO crash VBox.log

by tekstryder, 9 years ago

Attachment: virtualbox-crash-on-startup added

comment:43 by tekstryder, 9 years ago

Lenovo X1 Carbon 2015 model here (20BS). Arch Linux, VB 4.3.28. Win8.1 guest OS.

I've hit this bug regularly (1/4 virtual machine boots avg) since this report was filed. Also followed duplicate/similar reports regarding Broadwell, but this report seems to have the most relevant info.

I just crashed 3/3 times, and each requires a hard power-off of the host. This is a data-loss-potential bug. I'm surprised to see it unresolved.

Crash info attached.

by tekstryder, 9 years ago

Attachment: virtualbox-crash-on-startup-2015-06-24 added

Latest crash log info added

comment:44 by rugubara, 9 years ago

I confirm I have this issue as well on my Haswell Lenovo T540p. Kernel 4.1.1-r1, VB 4.3.28.

Last edited 9 years ago by rugubara

by rugubara, 9 years ago

Attachment: panic.log added

adding nosmap to kernel parameters didn't help me. I still got panic tonight

comment:45 by Christian Hesse, 9 years ago

Running Virtualbox 5.0.0 with KVM virtualization and SMAP enabled now. Looks like that does not suffer the issue. I will give it some more testing.

Virtualization "Default" is KVM as well, no?

comment:46 by Frank Mehnert, 9 years ago

VBox 5.0.2 contains more fixes and hopefully fixes all remaining problems with Linux and SMAP.

comment:47 by Christian Hesse, 9 years ago

Running VBox 5.0.0 / 5.0.2 for about four weeks now. No remaining issues with SMAP enabled and KVM virtualization in action.

comment:48 by Frank Mehnert, 9 years ago

Resolution: fixed
Status: new → closed

Thanks!
