Opened 9 years ago
Closed 8 years ago
#14965 closed defect (fixed)
CPU general protection fault when starting VM in newly released VirtualBox 5.0.12
| Reported by: | sdford | Owned by: | |
|---|---|---|---|
| Component: | other | Version: | VirtualBox 5.0.12 |
| Keywords: | | Cc: | |
| Guest type: | Linux | Host type: | Linux |
Description
After upgrading to VirtualBox 5.0.12, we are getting a general protection fault in the kernel when starting a VM:
Dec 21 20:26:15 HOSTNAME kernel: [2323400.623107] general protection fault: 0000 [#1] SMP
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] Modules linked in: vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c pci_stub iosf_mbi kvm_intel kvm crct10dif_pclmul ppdev crc32_pclmul ghash_clmulni_intel aesni_intel cirrus ttm aes_x86_64 lrw drm_kms_helper gf128mul glue_helper ablk_helper cryptd drm serio_raw syscopyarea sysfillrect pvpanic sysimgblt parport_pc 8250_fintek parport mac_hid i2c_piix4 nls_utf8 isofs 8139too floppy psmouse 8139cp mii pata_acpi [last unloaded: vboxdrv]
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] CPU: 0 PID: 23258 Comm: EMT Tainted: G OE 3.19.0-33-generic #38~14.04.1-Ubuntu
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] task: ffff880036b49d70 ti: ffff88003fc98000 task.ti: ffff88003fc98000
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] RIP: 0010:[<ffffffffc000b45e>] [<ffffffffc000b45e>] 0xffffffffc000b45e
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] RSP: 0018:ffff88003fc9bd48 EFLAGS: 00010206
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] RAX: 000000000000009b RBX: 00000000ffffffdb RCX: 000000000000009b
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003fc9bca0
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] RBP: ffff88003fc9bd88 R08: ffffffff81815108 R09: 000000008000000a
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] R10: 0000000000000009 R11: 0000000000000000 R12: 0000000001000020
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] R13: 0000000000000020 R14: ffff880014dbdf10 R15: 0000000000000000
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] FS: 00007fbb322bb700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] CR2: 00007fbb3205f000 CR3: 000000000a3d8000 CR4: 00000000000406f0
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] Stack:
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] 0000000000000000 000000000000003f ffff88003fc9bd88 ffffffff00000000
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] 0000000000000000 0000000000000000 0000000000000000 ffffc900008f1010
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] ffff88003fc9bda8 ffffffffc001fb3c 0000000000000000 ffffffffc07601c0
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] Call Trace:
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] [<ffffffffc0725600>] ? supdrvIOCtl+0x1fc0/0x3400 [vboxdrv]
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] [<ffffffffc071f541>] ? VBoxDrvLinuxIOCtl_5_0_12+0x121/0x210 [vboxdrv]
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] [<ffffffff811ffc58>] ? do_vfs_ioctl+0x2f8/0x510
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] [<ffffffff81062335>] ? trace_do_page_fault+0x45/0x100
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] [<ffffffff811ffef1>] ? SyS_ioctl+0x81/0xa0
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] [<ffffffff817b6dcd>] ? system_call_fastpath+0x16/0x1b
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] Code: 0e 00 00 00 01 00 89 c1 0f 32 48 c1 e2 20 89 c0 48 09 d0 48 89 05 a3 ac 0e 00 0f 20 e0 48 89 05 81 ac 0e 00 b8 9b 00 00 00 89 c1 <0f> 32 48 c1 e2 20 89 c0 48 09 d0 48 89 05 78 ac 0e 00 b8 80 00
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] RIP [<ffffffffc000b45e>] 0xffffffffc000b45e
Dec 21 20:26:15 HOSTNAME kernel: [2323400.624017] RSP <ffff88003fc9bd48>
Dec 21 20:26:15 HOSTNAME kernel: [2323400.849237] ---[ end trace 8dac6f21b9aec12d ]---
Here is VBox.log:
VirtualBox VM 5.0.12 r104815 linux.amd64 (Dec 18 2015 18:20:39) release log
00:00:00.322506 Log opened 2015-12-21T20:26:14.864896000Z
00:00:00.322507 Build Type: release
00:00:00.322510 OS Product: Linux
00:00:00.322511 OS Release: 3.19.0-33-generic
00:00:00.322512 OS Version: #38~14.04.1-Ubuntu SMP Fri Nov 6 18:17:28 UTC 2015
00:00:00.322584 DMI Product Name: Standard PC (i440FX + PIIX, 1996)
00:00:00.322599 DMI Product Version: pc-i440fx-trusty
00:00:00.322696 Host RAM: 2001MB total, 1458MB available
00:00:00.322700 Executable: /usr/lib/virtualbox/VBoxHeadless
00:00:00.322701 Process ID: 23249
00:00:00.322701 Package type: LINUX_64BITS_UBUNTU_12_04
00:00:00.495355 Installed Extension Packs:
00:00:00.495401 None installed!
00:00:00.497797 Console: Machine state changed to 'Starting'
The problem appears to be related to running VirtualBox 5.0.12 inside a QEMU KVM virtual machine. Our use case is to parallelize automated tests by running multiple Vagrant+VirtualBox instances, each isolated in its own QEMU KVM virtual machine. This prevents tests from conflicting with each other.
Workaround
VirtualBox 5.0.10 works perfectly, so we are currently working around this by downgrading to 5.0.10.
Attachments (2)
Change History (13)
comment:1 by , 9 years ago
by , 9 years ago
by , 9 years ago
Attachment: cpuid_r.txt added
comment:2 by , 9 years ago
Hi frank, thanks for the reply!
Yes, that is correct. The host kernel that is running VirtualBox is crashing.
That is actually the entire VBox.log file. I reproduced the issue to verify that there is nothing else in that file. Here are the crash and VBox.log (in case another sample is helpful):
[1464598.571364] general protection fault: 0000 [#1] SMP
[1464598.572009] Modules linked in: vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c pci_stub iosf_mbi kvm_intel kvm crct10dif_pclmul crc32_pclmul ppdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd cirrus ttm drm_kms_helper serio_raw drm syscopyarea sysfillrect sysimgblt i2c_piix4 pvpanic parport_pc 8250_fintek parport mac_hid nls_utf8 isofs 8139too psmouse floppy 8139cp mii pata_acpi [last unloaded: vboxdrv]
[1464598.572009] CPU: 1 PID: 6134 Comm: EMT Tainted: G OE 3.19.0-42-generic #48~14.04.1-Ubuntu
[1464598.572009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[1464598.572009] task: ffff880036a3ce80 ti: ffff8800369e8000 task.ti: ffff8800369e8000
[1464598.572009] RIP: 0010:[<ffffffffc000b173>] [<ffffffffc000b173>] 0xffffffffc000b173
[1464598.572009] RSP: 0018:ffff8800369ebd48 EFLAGS: 00010206
[1464598.572009] RAX: 00000000000406e0 RBX: 00000000ffffffdb RCX: 000000000000009b
[1464598.572009] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800369ebca0
[1464598.572009] RBP: ffff8800369ebd88 R08: ffffffff818150e8 R09: 0000000049656e69
[1464598.572009] R10: 000000008000000a R11: 0000000000000009 R12: 0000000001000020
[1464598.572009] R13: 0000000000000020 R14: ffff880036a73250 R15: 0000000000000000
[1464598.572009] FS: 00007fdd2a42e700(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[1464598.572009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1464598.572009] CR2: 00007fdd2a1cf000 CR3: 000000007963e000 CR4: 00000000000406e0
[1464598.572009] Stack:
[1464598.572009] 000000000000003f ffffc900004218c0 ffff8800369ebd88 ffffffff00000000
[1464598.572009] 0000000000000000 0000000000000000 0000000000000000 ffffc900010c2010
[1464598.572009] ffff8800369ebda8 ffffffffc001f657 0000000000000000 ffffffffc08271c0
[1464598.572009] Call Trace:
[1464598.572009] [<ffffffffc07ec600>] ? supdrvIOCtl+0x1fc0/0x3400 [vboxdrv]
[1464598.572009] [<ffffffff813b2ffc>] ? copy_user_generic_string+0x2c/0x40
[1464598.572009] [<ffffffffc07e6541>] ? VBoxDrvLinuxIOCtl_5_0_12+0x121/0x210 [vboxdrv]
[1464598.572009] [<ffffffff811ffd28>] ? do_vfs_ioctl+0x2f8/0x510
[1464598.572009] [<ffffffff81062325>] ? trace_do_page_fault+0x45/0x100
[1464598.572009] [<ffffffff811fffc1>] ? SyS_ioctl+0x81/0xa0
[1464598.572009] [<ffffffff817b770d>] ? system_call_fastpath+0x16/0x1b
[1464598.572009] Code: 85 c0 0f 88 56 fd ff ff b9 3a 00 00 00 0f 32 48 c1 e2 20 89 c0 48 09 d0 48 89 05 89 cf 0e 00 0f 20 e0 48 89 05 67 cf 0e 00 b1 9b <0f> 32 48 c1 e2 20 89 c0 b9 80 00 00 c0 48 09 d0 48 89 05 5e cf
[1464598.572009] RIP [<ffffffffc000b173>] 0xffffffffc000b173
[1464598.572009] RSP <ffff8800369ebd48>
[1464598.810770] ---[ end trace c77bda97ebb32443 ]---
VirtualBox VM 5.0.12 r104815 linux.amd64 (Dec 18 2015 17:08:04) release log
00:00:00.028046 Log opened 2016-01-07T23:02:05.486471000Z
00:00:00.028048 Build Type: release
00:00:00.028054 OS Product: Linux
00:00:00.028055 OS Release: 3.19.0-42-generic
00:00:00.028056 OS Version: #48~14.04.1-Ubuntu SMP Fri Dec 18 10:24:49 UTC 2015
00:00:00.028169 DMI Product Name: Standard PC (i440FX + PIIX, 1996)
00:00:00.028186 DMI Product Version: pc-i440fx-trusty
00:00:00.028280 Host RAM: 2001MB total, 1524MB available
00:00:00.028284 Executable: /usr/lib/virtualbox/VBoxHeadless
00:00:00.028285 Process ID: 6125
00:00:00.028285 Package type: LINUX_64BITS_UBUNTU_14_04
00:00:00.031935 Installed Extension Packs:
00:00:00.031974 None installed!
00:00:00.033118 Console: Machine state changed to 'Starting'
I attached text files including the output of cpuid and cpuid -r. I ran these commands right after the kernel crashed.
comment:3 by , 9 years ago
OK, we found the problem. There was indeed a change in VBox 5.0.12 which seems to trigger the crash on KVM guests. This looks like a KVM bug, but it might be possible to work around it.
comment:4 by , 9 years ago
Which version of KVM are you using, and which Linux kernels (KVM host and KVM guest)?
comment:5 by , 9 years ago
Hi frank,
The KVM host is running Ubuntu 14.04 with kernel 3.19.0-33-generic. Running kvm --version on the host returns:
# kvm --version
QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.21), Copyright (c) 2003-2008 Fabrice Bellard
That appears to show the QEMU version, so I am not sure what version of KVM is actually bundled with 3.19.0, but hopefully this helps.
As for the KVM guests, we experienced problems with the following KVM guest kernels:
- Ubuntu 14.04: 3.19.0-43-generic
- Ubuntu 12.04: 3.13.0-74-generic
- Ubuntu 12.04: 3.2.0-97-virtual
Please let me know if more information will help. Thank you!
comment:6 by , 9 years ago
As written above this is a KVM bug. The workaround would be to detect that VirtualBox runs as KVM guest and change the behavior of VirtualBox. After some internal discussion we decided against this approach.
The problem in more detail: As written above, VirtualBox tries to read MSR 0x9B (IA32_SMM_MONITOR_CTL). This is an architectural MSR which is present if CPUID.01 / ECX bit 5 or bit 6 is set (VMX or SMX). Because KVM has nested virtualization enabled and therefore reports VT-x support, this MSR must be accessible, and reading it must not raise a #GP. KVM/QEMU does not behave like real hardware in this case.
Switching to a newer version of KVM might help. But looking at the current QEMU Git code, it looks like they still don't implement this MSR. They implement MSR_IA32_SMBASE (0x9E) but not MSR 0x9B.
comment:7 by , 8 years ago
It's not a KVM bug; even though the SDM in some places says incorrectly that IA32_SMM_MONITOR_CTL is present if VMX=1, it also says:
The IA32_SMM_MONITOR_CTL MSR is supported only on processors that support the dual-monitor treatment.1
On other processors, accesses to the MSR using RDMSR or WRMSR generate a general-protection fault (#GP(0))
1 Software should consult the VMX capability MSR IA32_VMX_BASIC (see Appendix A.1) to determine whether the dual-monitor treatment is supported.
and KVM does not enable dual-monitor treatment (bit 49 in the MSR is 0). It's not clear to me why VirtualBox would care about the MSR, since it's not an SMM VMX monitor, but treating it as zero would be correct if bit 49 of IA32_VMX_BASIC is zero.
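The two readings of the SDM discussed in this ticket, and the "treat it as zero" behavior suggested above, can be sketched as follows. This is only an illustration in Python, not VirtualBox or KVM code: the function names are hypothetical, register values are passed in as plain integers, and `read_msr` stands in for a ring-0 RDMSR that a real driver would issue. The MSR indices (0x9B, 0x480) and bit positions are the architectural ones cited in the comments here.

```python
# Sketch only: all function names below are hypothetical. A real driver would
# obtain these values with CPUID and RDMSR in ring 0.

MSR_IA32_SMM_MONITOR_CTL = 0x9B   # the MSR whose read faulted in this ticket
MSR_IA32_VMX_BASIC = 0x480        # VMX capability MSR

CPUID1_ECX_VMX = 1 << 5           # CPUID.01H:ECX bit 5
CPUID1_ECX_SMX = 1 << 6           # CPUID.01H:ECX bit 6
VMX_BASIC_DUAL_MONITOR = 1 << 49  # IA32_VMX_BASIC bit 49


def present_per_msr_table(cpuid1_ecx):
    """Reading from comment 6: the MSR exists if VMX or SMX is reported."""
    return bool(cpuid1_ecx & (CPUID1_ECX_VMX | CPUID1_ECX_SMX))


def present_per_vmx_chapter(cpuid1_ecx, vmx_basic):
    """Reading from comment 7: the MSR exists only with dual-monitor support."""
    if not cpuid1_ecx & CPUID1_ECX_VMX:
        return False  # IA32_VMX_BASIC is not meaningful without VMX
    return bool(vmx_basic & VMX_BASIC_DUAL_MONITOR)


def read_smm_monitor_ctl(cpuid1_ecx, read_msr):
    """Return the MSR value, or 0 when dual-monitor treatment is absent."""
    if not cpuid1_ecx & CPUID1_ECX_VMX:
        return 0
    vmx_basic = read_msr(MSR_IA32_VMX_BASIC)
    if not vmx_basic & VMX_BASIC_DUAL_MONITOR:
        return 0  # skip the RDMSR that would raise #GP on such a CPU
    return read_msr(MSR_IA32_SMM_MONITOR_CTL)
```

On a CPU model that reports VMX but leaves bit 49 clear, as described for KVM above, the two predicates disagree, and the guarded read never touches MSR 0x9B.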
comment:8 by , 8 years ago
Here is what happened, as far as we can tell: For the initial VMX implementations, the SDM (revisions 018-022, January to November 2006) clearly stated about the dual-monitor treatment that "Bit 49 [...] is always read as 1." (Appendix G.1 in that SDM). The next SDM (revision 023, May 2007) said that the "dual-monitor treatment may not be supported by all processors" (24.16 in that SDM). Note the implication that there may be hypervisors written to the SDM which legitimately expect the IA32_SMM_MONITOR_CTL MSR to exist.
It would be reasonable to think that Intel simply forgot to update the MSR reference to include a note that the IA32_SMM_MONITOR_CTL only exists if the VMCS says so. But that's not the case. SDM revisions 023 and 024 (May and August 2007) did say that the MSR is only present if IA32_VMX_BASIC[bit 49] is set. But in revision 025 (November 2007) the MSR documentation was changed to say the MSR is present if CPUID.1.ECX[bit 5 or bit 6] (VMX or SMX) is set, without any further qualification.
The fact that Intel actively removed the IA32_VMX_BASIC dependency suggests that it is the MSR documentation which is correct and the VMX text is not when it claims that the IA32_SMM_MONITOR_CTL MSR is optional on VMX-capable systems.
We all agree that the SDM is untrustworthy, so we need to rely on actual Intel hardware to learn how the Intel architecture is really implemented. The question could be conclusively answered if we had an Intel CPU which implements VMX but not the dual-monitor treatment. Could you please point us to such a processor?
comment:9 by , 8 years ago
Independently of the above discussion, the next 5.1.x maintenance release will contain an additional check that the dual-monitor treatment feature is available before the MSR_IA32_SMM_MONITOR_CTL MSR is accessed. This should make VirtualBox 5.1.x work again within a KVM VM.
So the host kernel which runs VirtualBox (inside KVM) crashes when you try to start a VM, is this correct?
The kernel crash happens on rdmsr with RCX=000000000000009b. This seems to be the IA32_SMM_MONITOR_CTL MSR. Could you provide the complete VBox.log file and the output of cpuid as well as cpuid -r running on the Linux kernel which crashes?
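The reasoning above (the faulting instruction is rdmsr and RCX holds the MSR index) can be reproduced mechanically from an oops like the ones in this ticket. The sketch below is a hypothetical helper, not part of any kernel or VirtualBox tool. It assumes the oops format shown here, where the kernel wraps the byte at the faulting RIP in angle brackets on the `Code:` line, and it relies on `0f 32` being the x86 encoding of RDMSR.

```python
import re

RDMSR_OPCODE = (0x0F, 0x32)  # x86 encoding of RDMSR


def faulting_bytes(oops_text):
    """Return the code bytes starting at the faulting instruction."""
    code = re.search(r"Code:((?:\s+<?[0-9a-f]{2}>?(?=\s|$))+)", oops_text)
    if code is None:
        return []
    tokens = code.group(1).split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<"):  # the kernel brackets the byte at RIP
            return [int(t.strip("<>"), 16) for t in tokens[i:]]
    return []


def faulting_rdmsr_index(oops_text):
    """Return the MSR index if the oops faulted on RDMSR, else None."""
    if tuple(faulting_bytes(oops_text)[:2]) != RDMSR_OPCODE:
        return None
    rcx = re.search(r"RCX: ([0-9a-f]{16})", oops_text)
    if rcx is None:
        return None
    return int(rcx.group(1), 16) & 0xFFFFFFFF  # RDMSR takes its index from ECX
```

Fed either trace from this ticket, `faulting_rdmsr_index` should report 0x9b, which matches IA32_SMM_MONITOR_CTL.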