VirtualBox

Ticket #13961 (closed defect: fixed)

Opened 3 years ago

Last modified 2 years ago

Unable to handle kernel paging request (SMAP with 4.3.26)

Reported by: eworm
Priority: major
Component: other
Version: VirtualBox 4.3.26
Guest type: other
Host type: Linux

Description

My device is a Lenovo Thinkpad X250 (Broadwell CPU) running Linux 4.0-rc4. The system crashes as soon as a VirtualBox guest is started. This happens with both 4.3.24 and 4.3.26.

I am pretty sure this is not related to the CR4 changes: I compiled Linux 4.0-rc4 with the CR4 changes reverted, and this still happens.

Kernel logs are attached.

Attachments

vbox1.log (5.8 KB) - added by eworm 3 years ago.
  kernel log virtualbox 4.3.24
vbox2.log (5.8 KB) - added by eworm 3 years ago.
  kernel log virtualbox 4.3.26
vboxdrv.ko.gz (141.2 KB) - added by eworm 3 years ago.
  vboxdrv.ko.gz (virtualbox 4.3.26, linux 4.0rc4.r0.g06e5801)
diff_smap_2 (2.1 KB) - added by frank 3 years ago.
leda-vbox3.log (5.3 KB) - added by eworm 3 years ago.
  kernel log virtualbox 4.3.26 + diff_smap_2
vboxdrv.ko.2.gz (141.2 KB) - added by eworm 3 years ago.
  vboxdrv.ko.gz (virtualbox 4.3.26 + diff_smap_2, linux 4.0rc4.r0.g06e5801)
kernel-20150317-211216.log (5.5 KB) - added by eworm 3 years ago.
  kernel log virtualbox 4.3.26 + diff_smap_2
VBox-20150317-211216.log (47.3 KB) - added by eworm 3 years ago.
  VBox.log virtualbox 4.3.26 + diff_smap_2
VMMR0.r0.gz (393.8 KB) - added by eworm 3 years ago.
  VMMR0.r0 from virtualbox 4.3.26
diff_smap_3 (3.3 KB) - added by frank 3 years ago.
leda-kernel.log (4.8 KB) - added by eworm 3 years ago.
leda-vbox.log (48.5 KB) - added by eworm 3 years ago.
vboxdrv.ko.3.gz (141.2 KB) - added by eworm 3 years ago.
diff_smap_4 (4.0 KB) - added by frank 3 years ago.
leda-journal.log (5.0 KB) - added by eworm 2 years ago.
  syslog
leda-vbox.2.log (47.0 KB) - added by eworm 2 years ago.
  vbox.log
vboxdrv.ko.4.gz (141.2 KB) - added by eworm 2 years ago.
  vboxdrv.ko
journal.log (5.0 KB) - added by eworm 2 years ago.
VBox.log (47.3 KB) - added by eworm 2 years ago.
vboxdrv.ko.5.gz (141.2 KB) - added by eworm 2 years ago.
system.log (4.1 KB) - added by fardog 2 years ago.
  journalctl log output
Vbox.log (53.9 KB) - added by fardog 2 years ago.
vboxdrv.ko.6.gz (140.6 KB) - added by fardog 2 years ago.
config.gz (39.2 KB) - added by eworm 2 years ago.
  kernel config 4.0.4 (default Arch Linux)
config.2.gz (39.2 KB) - added by fardog 2 years ago.
0001-x86_64-smap-call-stac-before-touching-user-memory.patch (791 bytes) - added by eworm 2 years ago.
  x86_64, smap: call stac() before touching user memory
system.2.log (3.9 KB) - added by enioarda 2 years ago.
  journalctl of the crash.
VBox_nocrash.log (97.3 KB) - added by enioarda 2 years ago.
  NO crash VBox.log
virtualbox-crash-on-startup (4.2 KB) - added by tekstryder 2 years ago.
virtualbox-crash-on-startup-2015-06-24 (4.2 KB) - added by tekstryder 2 years ago.
  Latest crash log info added
panic.log (26.1 KB) - added by rugubara 2 years ago.
  Adding nosmap to kernel parameters didn't help me. I still got a panic tonight

Change History

Changed 3 years ago by eworm

kernel log virtualbox 4.3.24

Changed 3 years ago by eworm

kernel log virtualbox 4.3.26

comment:1 Changed 3 years ago by eworm

I added 'nosmap' and 'nosmep' to the host boot parameters. The VirtualBox guest now starts without issues.

comment:2 Changed 3 years ago by frank

Could you attach the compiled vboxdrv.ko module from VirtualBox 4.3.26? Also, could you check adding 'nosmep' and 'nosmap' individually, one at a time? Thanks!

comment:3 Changed 3 years ago by eworm

You should increase the upload limit... My vboxdrv.ko exceeds the limit by about 20kB.

I uploaded it to my web server for now: http://www.eworm.de/tmp/vboxdrv.ko

This is virtualbox 4.3.26 and linux 4.0rc4.r0.g06e5801.

comment:4 Changed 3 years ago by frank

Downloaded, thanks. If you compress this binary it will not exceed the size limit.

comment:5 Changed 3 years ago by eworm

Having SMEP (supervisor mode execution prevention) enabled is just fine. It's sufficient to have 'nosmap' (to disable supervisor mode access prevention) in boot parameters.
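To confirm which configuration a host is actually running before and after such tests, one can check whether the CPU advertises SMAP in /proc/cpuinfo and whether 'nosmap' appears on the kernel command line in /proc/cmdline. The following is a minimal Python sketch under those assumptions; smap_expected_active and check_host are hypothetical helpers, not part of VirtualBox. Depending on the kernel, 'nosmap' may also hide the smap flag from /proc/cpuinfo, which is why both sources are consulted.

```python
def smap_expected_active(cpuinfo_text: str, cmdline_text: str) -> bool:
    """Return True if the CPU advertises SMAP and 'nosmap' is absent from the kernel command line."""
    cpu_has_smap = False
    for line in cpuinfo_text.splitlines():
        # /proc/cpuinfo lists CPU features on lines such as "flags : fpu vme ... smep ... smap ..."
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            cpu_has_smap = "smap" in flags.split()
            break  # the first CPU's flags line is enough for this check
    nosmap_given = "nosmap" in cmdline_text.split()
    return cpu_has_smap and not nosmap_given


def check_host() -> bool:
    """Hypothetical wrapper that reads the live values on a Linux host."""
    with open("/proc/cpuinfo") as f:
        cpuinfo = f.read()
    with open("/proc/cmdline") as f:
        cmdline = f.read()
    return smap_expected_active(cpuinfo, cmdline)
```

A host booted with 'nosmap' makes smap_expected_active() return False regardless of the flags line, so check_host() can be used to verify that a test boot really ran with SMAP enabled.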

comment:6 Changed 3 years ago by eworm

You are right... I expected it to be compressed, but it looks like DKMS does not compress it.

Uploading compressed vboxdrv.ko for reference.

Changed 3 years ago by eworm

vboxdrv.ko.gz (virtualbox 4.3.26, linux 4.0rc4.r0.g06e5801)

comment:7 Changed 3 years ago by frank

Thanks. Actually I think I know where the problem is and I might have a patch available during the next few hours.

comment:8 Changed 3 years ago by frank

  • Summary changed from "unable to handle kernel paging request" to "Unable to handle kernel paging request (SMAP with 4.3.26)"

Changed 3 years ago by frank

comment:9 Changed 3 years ago by frank

Attached a diff for the kernel driver which should fix the problem. After applying the diff to the VirtualBox kernel driver sources (located at /usr/src/vboxhost-4.3.26), please recompile the host kernel modules with

/etc/init.d/vboxdrv setup

and start your VM. Please make sure to run this on a Linux kernel booted without 'nosmap' and 'nosmep'.

comment:10 Changed 3 years ago by eworm

This still crashes. Will attach a new log.

Changed 3 years ago by eworm

kernel log virtualbox 4.3.26 + diff_smap_2

comment:11 Changed 3 years ago by frank

Unexpected. Could you attach the new vboxdrv.ko module please? Thanks!

Changed 3 years ago by eworm

vboxdrv.ko.gz (virtualbox 4.3.26 + diff_smap_2, linux 4.0rc4.r0.g06e5801)

comment:12 Changed 3 years ago by frank

Thanks. Unfortunately I don't understand why EFLAGS.AC is still not set. Could you repeat the experiment and attach all corresponding items from the same VM session:

  • The VBox.log file
  • The Linux kernel log
  • The vboxdrv.ko file, if it differs from vboxdrv.ko.2.gz

This will help me to debug the problem because the VBox.log file contains the load addresses of the VMM modules. Unfortunately we cannot reproduce the problem as we still don't have Broadwell hardware.
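Load addresses matter because a faulting instruction address from the kernel log is only meaningful as an offset into one specific module build, which can then be looked up in a disassembly of that exact binary. A minimal Python sketch of that lookup follows; locate and EXAMPLE_MODULES are hypothetical, the addresses are invented for illustration, and in practice the base addresses and sizes would have to be taken from VBox.log (whose exact format is not reproduced here).

```python
from typing import Optional, Sequence, Tuple

# (module name, load address, size in bytes)
Module = Tuple[str, int, int]


def locate(address: int, modules: Sequence[Module]) -> Optional[Tuple[str, int]]:
    """Return (module name, offset into module) for the module containing address, else None."""
    for name, base, size in modules:
        if base <= address < base + size:
            return name, address - base
    return None


# Hypothetical example values, not taken from the attached logs.
EXAMPLE_MODULES = [
    ("VMMR0.r0", 0xFFFFFFFFA0800000, 0x60000),
    ("VBoxDDR0.r0", 0xFFFFFFFFA0900000, 0x20000),
]
```

For example, locate(0xFFFFFFFFA0812345, EXAMPLE_MODULES) returns ("VMMR0.r0", 0x12345). Because offsets from one build do not carry over to another, the matching binary from the same build is needed to interpret them.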

comment:13 Changed 3 years ago by eworm

Ok, here we go...

It's not easy to capture VBox.log from a dying machine, but inotify, tail and ssh did the trick. ;)

vboxdrv.ko is unchanged, logs will follow.

Changed 3 years ago by eworm

kernel log virtualbox 4.3.26 + diff_smap_2

Changed 3 years ago by eworm

VBox.log virtualbox 4.3.26 + diff_smap_2

comment:14 Changed 3 years ago by frank

Thanks again. As you used an unofficial package, could you also provide the VMMR0.r0 module?

comment:15 Changed 3 years ago by eworm

Is that the file found at /usr/lib/virtualbox/VMMR0.r0?

I will attach it in a few seconds. I am not sure, though, whether it is identical to what I used: the logs were made with a package I compiled myself, and I have now installed my distribution's packages. Both were built in a clean chroot.

Let me know if I should repeat my tests.

Changed 3 years ago by eworm

VMMR0.r0 from virtualbox 4.3.26

Changed 3 years ago by frank

comment:16 Changed 3 years ago by frank

Ok. Could you try diff_smap_3 instead of diff_smap_2 and see if you would now be able to start VMs with SMAP enabled? Thanks!

comment:17 Changed 3 years ago by eworm

Still crashes. This was with linux 4.0rc4.r199.gb314aca, virtualbox 4.3.26 + diff_smap_3.

Changed 3 years ago by eworm

Changed 3 years ago by eworm

Changed 3 years ago by eworm

Changed 3 years ago by frank

comment:18 Changed 3 years ago by frank

Next try. We just saw this changeset, which would explain why the other patches did not work. Could you try again? Thank you!

comment:19 Changed 3 years ago by eworm

Looks like that did the trick! Guest is up and running, host is still alive. ;)

Thanks a lot!

comment:20 Changed 3 years ago by frank

And thanks for your patience during testing!

comment:21 Changed 2 years ago by eworm

It's very rare, but it still happens from time to time... It looks like there is a corner case that still crashes the machine. Any ideas? Sadly I cannot reproduce it; it happens about once a week for me.

comment:22 Changed 2 years ago by frank

eworm, that's important. I'm running VBox on a Linux 4.0.0 host and have not seen such problems for many weeks now. It would be nice if you could provide at least a VBox.log file together with the output of 'dmesg' and the corresponding vboxdrv.ko, as you provided before.

Changed 2 years ago by eworm

syslog

Changed 2 years ago by eworm

vbox.log

Changed 2 years ago by eworm

vboxdrv.ko

comment:23 Changed 2 years ago by eworm

I think I did about a hundred reboot cycles... Finally it crashed. :D Have fun!

comment:24 Changed 2 years ago by frank

Thanks eworm. One more request: could you also attach the VMMR0.r0 file from your installation? You are using a distribution-specific package, so I don't have a reference. Thanks!

comment:25 Changed 2 years ago by frank

Actually attaching of VMMR0.r0 shouldn't be necessary. I just downloaded the original ArchLinux package and got a VMMR0.r0 which seems to fit the other files. Investigating...

comment:26 Changed 2 years ago by tannerjfco

I've applied the latest patch in this thread and it has resolved the issue I was encountering with Virtualbox freezing the host. I have yet to encounter any further trouble but I will keep an eye out for the issue eworm mentions and will follow-up if I encounter it. Thanks!

Changed 2 years ago by eworm

Changed 2 years ago by eworm

Changed 2 years ago by eworm

comment:27 Changed 2 years ago by eworm

Just had another crash... Uploaded the logs and kernel module.

Any news on this? This is really annoying. It would be great to have a stable workstation again soon.

comment:28 Changed 2 years ago by frank

Thanks for the new dump. It looks like the fault was triggered at exactly the same place as before. We still don't know how this can happen and are trying to reproduce the problem.

comment:29 Changed 2 years ago by eworm

Let me know if I can help in one way or another.

Does it help to upload more logs if a crash occurs?

comment:30 Changed 2 years ago by frank

VBox 4.3.28 contains the latest code, including diff_smap_4. I guess this will still not fix eworm's problems, but I would like to know whether other users have any SMAP problems with VBox 4.3.28.

Changed 2 years ago by fardog

journalctl log output

comment:31 Changed 2 years ago by fardog

Hi frank; I'm having the same issue as eworm on my Thinkpad T450s using the latest VirtualBox 4.3.28, so I can confirm the issue isn't fixed. I've uploaded my system.log and am trying to find the other information, such as my VirtualBox log; I will upload it as I find it.

Changed 2 years ago by fardog

Changed 2 years ago by fardog

comment:32 Changed 2 years ago by fardog

I think that gives you everything you need, but let me know if there's anything else that'll help. Thanks!

comment:33 Changed 2 years ago by frank

fardog and eworm, your log files indicate that your VM processes still crash at the very same position. At the moment we cannot explain this. It's also interesting that only ArchLinux users seem to be affected; at least, I'm not aware of other users with 4.3.28 installed having problems with SMAP. One developer installed ArchLinux on a Broadwell laptop and still was not able to reproduce the problem.

Could you attach the Linux kernel configuration?

(removed the last paragraph. I will prepare another test build soon)

Last edited 2 years ago by frank

comment:34 Changed 2 years ago by frank

Could you install this 4.3 test build and try to reproduce the crash? If it crashes, please attach the VBox.log file, the output of 'dmesg' and the vboxdrv.ko module, as you did before. Thank you!

comment:35 Changed 2 years ago by frank

Hrmpf. Sorry, that test build might fail to compile the kernel modules. Please use this test build instead. Not my day :-/

Last edited 2 years ago by frank

Changed 2 years ago by eworm

kernel config 4.0.4 (default Arch Linux)

comment:36 Changed 2 years ago by eworm

Attached the Linux kernel configuration. It's the default from Arch Linux linux package version 4.0.4-1.

Last time, my system crashed with linux 4.0.2-1 and VirtualBox 4.3.26 (+ patches). Given that it happens so rarely, I cannot tell whether the latest versions are still affected. The configuration has not changed since then, though.

I am not sure how to test this reliably... Even rebooting the guest twenty or more times in a row without issues does not indicate that it is fixed. I will think about it...

I wonder what influence the guest setup has... Does it matter? I took a look at the last available crash logs and saw that the BUG follows:

kernel: device bridge entered promiscuous mode

Here, bridge is a bridge interface with a static IP address and a DHCP daemon. Is there anything else that could have an effect?
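eworm's point about twenty clean reboots can be made quantitative. Assuming each guest start crashes independently with a fixed probability p, n crash-free starts in a row still happen with probability (1 - p)^n while the bug is present. The Python sketch below computes that, plus the smallest n for which such a clean streak would be unlikely (at most 5%) if the bug were still there; the independence assumption and the example rates are illustrative, not measurements.

```python
import math


def chance_of_clean_streak(crash_rate: float, runs: int) -> float:
    """Probability that `runs` starts in a row are crash-free if each start
    crashes independently with probability crash_rate."""
    return (1.0 - crash_rate) ** runs


def clean_runs_needed(crash_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that n crash-free starts would happen with probability
    at most 1 - confidence if the bug were still present at crash_rate."""
    if not 0.0 < crash_rate < 1.0:
        raise ValueError("crash_rate must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - crash_rate))
```

For a rate of roughly one crash per hundred boots (p = 0.01, in line with the "about a hundred reboot cycles" in comment 23), chance_of_clean_streak(0.01, 20) is about 0.82, so twenty clean boots prove little, and clean_runs_needed(0.01) is 299. For the roughly 20% rate reported in comment 42, clean_runs_needed(0.2) is 14.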

Changed 2 years ago by fardog

comment:37 follow-up: ↓ 39 Changed 2 years ago by eworm

Over and over again Google brings me to an old ticket about a similar issue:

BUG: unable to handle kernel paging request

Is this related? Possibly we have to disable automatic NUMA page balancing by setting pTask->mm->numa_next_scan (src/VBox/Runtime/r0drv/linux/memobj-r0drv-linux.c, line 1551) for every CPU?

comment:38 Changed 2 years ago by fardog

Hi frank; I won't be able to test that build until later tonight or tomorrow, but I will give it a go. For the time being, I've uploaded my kernel config (for linux 4.0.2-1; I haven't upgraded to the latest 4.0.3-1 yet, although it looks like 4.0.4-1 is imminent in Arch's repos). This is the version that the crash logs above were from.

Please note: this config.gz was taken from the running system, which has nosmap set as a boot parameter, since that's how I can get VirtualBox to run (I depend on it heavily for work). I'm not sure whether that shows up in the config file, but I didn't want it to confuse you. The crash logs above are from a different boot, when I was NOT using the nosmap flag.

Thanks!

comment:39 in reply to: ↑ 37 Changed 2 years ago by frank

Replying to eworm:

Over and over again Google brings me to an old ticket about a similar issue:

BUG: unable to handle kernel paging request

Is this related? Possibly we have to disable automatic NUMA page balancing by setting pTask->mm->numa_next_scan (src/VBox/Runtime/r0drv/linux/memobj-r0drv-linux.c, line 1551) for every CPU?

No, completely unrelated. Look at your kernel crash dump:

  1. CR4: 00000000003427e0, so bits 20 (SMEP) and 21 (SMAP) are set. That means that SMAP is active.
  2. BUG: unable to handle kernel paging request at 00007f8460fcd000. That means that the kernel is accessing memory which is mapped into userland. This is considered hacky, but for historical reasons VirtualBox still works this way. For example, on 32-bit hosts it would not be possible to map the complete guest address space into the 1 GB kernel address space.
  3. EFLAGS: 00010202. That means that bit 18 of EFlags (AC) is clear. But with VBox 4.3.28 this bit is supposed to be set on SMAP-enabled hosts.

That means that the AC flag is cleared somewhere in the kernel code, and currently we don't know where. We even installed ArchLinux on a SMAP-enabled laptop; unfortunately, without success...
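The three observations in the crash dump can be checked mechanically. The following Python sketch decodes the register values using the architectural bit positions (CR4.SMEP = bit 20, CR4.SMAP = bit 21, EFLAGS.AC = bit 18). It simplifies one point: it treats any address below 0x0000800000000000 (the lower canonical half with 4-level paging) as user memory, whereas the hardware actually checks the page's user/supervisor bit. It also models only explicit supervisor-mode data accesses; is_smap_violation is a hypothetical name.

```python
CR4_SMEP = 1 << 20   # CR4 bit 20: Supervisor Mode Execution Prevention
CR4_SMAP = 1 << 21   # CR4 bit 21: Supervisor Mode Access Prevention
EFLAGS_AC = 1 << 18  # EFLAGS bit 18: AC flag (suspends SMAP for explicit accesses)

# Simplification: with 4-level paging, user space is the lower canonical half,
# i.e. addresses below 0x0000_8000_0000_0000. Real hardware checks the page's U/S bit.
USER_SPACE_END = 0x0000_8000_0000_0000


def is_smap_violation(cr4: int, eflags: int, fault_addr: int) -> bool:
    """True if an explicit kernel-mode data access to fault_addr would be blocked by SMAP."""
    smap_on = bool(cr4 & CR4_SMAP)
    ac_set = bool(eflags & EFLAGS_AC)
    user_addr = fault_addr < USER_SPACE_END
    return smap_on and user_addr and not ac_set


# Values copied from the crash dump quoted in this ticket.
DUMP_CR4 = 0x00000000003427E0
DUMP_EFLAGS = 0x00010202
DUMP_FAULT_ADDR = 0x00007F8460FCD000
```

With the dump values, is_smap_violation(DUMP_CR4, DUMP_EFLAGS, DUMP_FAULT_ADDR) returns True, while the same call with EFLAGS_AC added to the EFLAGS value returns False, which corresponds to the state VBox 4.3.28 is supposed to establish on SMAP-enabled hosts.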

comment:40 follow-up: ↓ 41 Changed 2 years ago by eworm

Digging through kernel code, I found a place where clac() is called, but there is no stac() before it. Possibly that is the place where things go wrong?

Changed 2 years ago by eworm

x86_64, smap: call stac() before touching user memory

comment:41 in reply to: ↑ 40 Changed 2 years ago by frank

Replying to eworm:

Digging through kernel code, I found a place where clac() is called, but there is no stac() before it. Possibly that is the place where things go wrong?

No :-)

It works like this: stac() sets the AC flag. If the AC flag is set in R0, then the SMAP check (whether R0 = kernel is allowed to access R3 = userland memory) is disabled. clac() clears the AC flag and therefore enables the SMAP check. The latter is the default in recent Linux on Broadwell CPUs.

The place you found is just the last part of an error handler. The code for copying data from user to kernel obviously needs to have the AC flag set to temporarily disable the SMAP check. That's done for instance in copy_user_generic_string (see copy_user_64.S). The copy_user_handle_tail() function is called if there was a normal page fault while accessing the provided user data from the kernel.
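The stac()/clac() bracket described above can be illustrated with a toy model. The Python sketch below is not kernel code: ToyCpu, kernel_read_user and toy_copy_from_user are hypothetical names, and the real stac()/clac() emit the single STAC/CLAC instructions used in the kernel's user-copy paths. It shows the rule from comment 39: with SMAP enabled, an explicit kernel read of user memory succeeds only while AC is set.

```python
class ToyCpu:
    """Tiny model of the SMAP rule for explicit supervisor-mode data accesses."""

    def __init__(self, smap_enabled: bool = True) -> None:
        self.smap_enabled = smap_enabled  # stands in for CR4.SMAP
        self.ac = False                   # stands in for EFLAGS.AC, clear by default

    def stac(self) -> None:
        """Set AC: SMAP checks are suspended for explicit accesses."""
        self.ac = True

    def clac(self) -> None:
        """Clear AC: SMAP checks apply again."""
        self.ac = False

    def kernel_read_user(self, user_mem: bytes, index: int) -> int:
        """Kernel-mode read of one byte from a user page."""
        if self.smap_enabled and not self.ac:
            raise PermissionError("SMAP: supervisor access to user page with AC clear")
        return user_mem[index]


def toy_copy_from_user(cpu: ToyCpu, user_mem: bytes) -> bytes:
    """Copy user bytes inside a stac()/clac() bracket, as user-copy routines do."""
    cpu.stac()
    try:
        return bytes(cpu.kernel_read_user(user_mem, i) for i in range(len(user_mem)))
    finally:
        cpu.clac()  # restore the default (checks enabled) even if the copy fails
```

Calling cpu.kernel_read_user() directly, outside the bracket, raises PermissionError. That corresponds to what the crash dumps in this ticket show: an access to user memory with CR4.SMAP set and EFLAGS.AC clear.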

comment:42 Changed 2 years ago by enioarda

I am encountering the same issue on my Lenovo L450 running Arch Linux:

  • Linux thinkpad 4.0.4-2-ARCH #1 SMP PREEMPT Fri May 22 03:05:23 UTC 2015 x86_64 GNU/Linux
  • Virtualbox 4.3.28
  • Windows 8.1 x64 guest

With SMAP active, it happens on about 20% of starts, very early in the guest boot process (while the Windows logo is showing with the spinner).

Any hints on how to debug this are much appreciated.

Changed 2 years ago by enioarda

journalctl of the crash.

Changed 2 years ago by enioarda

NO crash VBox.log

Changed 2 years ago by tekstryder

comment:43 Changed 2 years ago by tekstryder

Lenovo X1 Carbon 2015 model here (20BS). Arch Linux, VB 4.3.28. Win8.1 guest OS.

I've hit this bug regularly (on average 1 in 4 virtual machine boots) since this report was filed. I have also followed duplicate/similar reports regarding Broadwell, but this report seems to have the most relevant info.

I just crashed 3 out of 3 times, and each crash requires a hard power-off of the host. This bug has the potential for data loss. I'm surprised to see it unresolved.

Crash info attached.

Changed 2 years ago by tekstryder

Latest crash log info added

comment:44 Changed 2 years ago by rugubara

I confirm I have this issue as well on my Haswell Lenovo T540p. Kernel 4.1.1-r1, VB 4.3.28.

Last edited 2 years ago by rugubara

Changed 2 years ago by rugubara

Adding nosmap to kernel parameters didn't help me. I still got a panic tonight

comment:45 Changed 2 years ago by eworm

I am now running VirtualBox 5.0.0 with the KVM paravirtualization interface and SMAP enabled. It looks like this setup does not suffer from the issue. I will give it some more testing.

The "Default" paravirtualization setting selects KVM as well, doesn't it?

comment:46 Changed 2 years ago by frank

VBox 5.0.2 contains more fixes and hopefully fixes all remaining problems with Linux and SMAP.

comment:47 Changed 2 years ago by eworm

I have been running VBox 5.0.0 / 5.0.2 for about four weeks now. No remaining issues with SMAP enabled and KVM paravirtualization in use.

comment:48 Changed 2 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Thanks!
