VirtualBox

Ticket #5263 (closed defect: fixed)

Opened 4 years ago

Last modified 3 years ago

Host crash with 64-bit Guest on 32-bit SLED10 SP2 on Intel Core 2 Duo

Reported by: AndersKOlsson
Owned by:
Priority: blocker
Component: VMM/HWACCM
Version: VirtualBox 3.0.8
Keywords: VT-x SLED10
Cc:
Guest type: Linux
Host type: Linux

Description

After upgrading from Novell SLED10 SP1 to SP2, VirtualBox stopped working on all our Intel Core 2 Duo machines. There is no problem on the upgraded AMD Opteron machines.

We've replicated this with VirtualBox OSE 2.1.4 and 3.0.8 as well as VirtualBox PUEL 3.0.8. The hardware we've tested with is an HP xw4400 (BIOS 2.06) and a Dell Latitude D630 (BIOS A16).

The symptom is that the guest boot halts after the GRUB menu. Whether one waits a few minutes or closes the VM, the host machine hangs. The screen stays frozen, and the only thing that works is holding down the power button.

See the forum thread about this issue: http://forums.virtualbox.org/viewtopic.php?f=7&t=22874

For testing, I've compiled and installed the vboxdrv module with BUILD_TYPE=debug, leaving out the vboxnetflt and vboxnetadp modules.

I've attached the vboxdrv messages from /var/log/messages and the VBox.log from a whole session, started right after the module was loaded and left running until the machine crashed.

The virtual machine's configuration:
HW Virtualization: on
OS type: openSUSE (64 bit)
Memory: 512 MB
Nested Paging: off
IO APIC: on
ACPI: on
PAE: off
CPUs: 1
NICs: 3 (Internal Network, PCnet-FAST III)
3D Acceleration: off
Primary IDE Master: one 30 GB VDI
COM1: enabled, disconnected

Guest OS: SLES10 SP2 64-bit. Ubuntu 9.04 64-bit has also been tested, with similar but not identical results: it boots all the way from the CD, but the host still crashes when the VM is shut down.

Attachments

VBox.log (39.2 KB) - added by AndersKOlsson 4 years ago.
messages (692 bytes) - added by AndersKOlsson 4 years ago.
/var/log/messages part when loading the vboxdrv module
debugbuild-VBox.log (34.2 KB) - added by AndersKOlsson 4 years ago.
VBox log from debug build
debugbuild-stdout-stderr (296 bytes) - added by AndersKOlsson 4 years ago.
VBox console output from debug build
debugbuild-messages (692 bytes) - added by AndersKOlsson 4 years ago.
vboxdrv syslog messages from debug build
VB-3.0.10-VBOX_HWVIRTEX_INIT-VBox.log (49.3 KB) - added by AndersKOlsson 4 years ago.
VBox log with VBOX_HWVIRTEX_INIT=local
VBox-local-node1.log (38.0 KB) - added by AndersKOlsson 4 years ago.
Log from the first VM; the host crashed when starting the second
VBox-local-node2.log (37.8 KB) - added by AndersKOlsson 4 years ago.
Log from the second VM, started alone, since otherwise no log was written
config-2.6.16.60-0.42.5-bigsmp (68.9 KB) - added by AndersKOlsson 4 years ago.
The kernel config of the affected host system
VBox-3.0.12-node1.log (37.9 KB) - added by AndersKOlsson 4 years ago.
Log from the 1st VM instance; it crashed the host when shutting down
VBox-3.0.12-node2.log (37.4 KB) - added by AndersKOlsson 4 years ago.
Log from the 2nd VM instance; it crashed along with the host when the 1st VM instance shut down
shuttingDownGuest.hostCrash (2.2 KB) - added by AndersFranzen 4 years ago.
VM crash in host when shutting down guest

Change History

Changed 4 years ago by AndersKOlsson

Changed 4 years ago by AndersKOlsson

/var/log/messages part when loading the vboxdrv module

comment:1 follow-up: ↓ 2 Changed 4 years ago by sandervl73

There are many disk errors in the log. E.g.: 00:00:07.982 PIIX3 ATA: LUN#0: disk read error (rc=VERR_EOF iSector=0xc8001 cSectors=0x1)

Is there a problem with the VDI file?
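The iSector and cSectors fields in that log line are hexadecimal sector numbers, so a little arithmetic shows where the failing read lands. The sketch below is a hypothetical helper (not part of VirtualBox) and assumes 512-byte sectors. For the quoted line, the read starts at byte 419,430,912, exactly one sector past the 400 MiB mark and far inside the 30 GB virtual disk. VERR_EOF (IPRT's end-of-file status) at that point is presumably a read running past the end of the image's backing data, which fits a damaged VDI.

```python
import re
from typing import Tuple

SECTOR_SIZE = 512  # assumption: 512-byte sectors, as on a classic ATA disk

# Matches the fields of e.g. "(rc=VERR_EOF iSector=0xc8001 cSectors=0x1)"
_READ_ERROR = re.compile(
    r"rc=(\w+) iSector=(0x[0-9a-fA-F]+) cSectors=(0x[0-9a-fA-F]+)"
)


def read_error_span(log_line: str) -> Tuple[str, int, int]:
    """Return (status code, byte offset, byte length) of a failed PIIX3 ATA read."""
    match = _READ_ERROR.search(log_line)
    if match is None:
        raise ValueError("no rc/iSector/cSectors fields in this line")
    rc = match.group(1)
    first_sector = int(match.group(2), 16)  # hex sector number -> int
    sector_count = int(match.group(3), 16)
    return rc, first_sector * SECTOR_SIZE, sector_count * SECTOR_SIZE


line = ("00:00:07.982 PIIX3 ATA: LUN#0: disk read error "
        "(rc=VERR_EOF iSector=0xc8001 cSectors=0x1)")
rc, offset, length = read_error_span(line)
print(rc, offset, length, offset - 400 * 2**20)  # prints: VERR_EOF 419430912 512 512
```

If the same offset shows up across many errors, that points at one bad region of the image file rather than at random host disk trouble.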

comment:2 in reply to: ↑ 1 Changed 4 years ago by AndersKOlsson

Replying to sandervl73:

There are many disk errors in the log. E.g.: 00:00:07.982 PIIX3 ATA: LUN#0: disk read error (rc=VERR_EOF iSector=0xc8001 cSectors=0x1)

Is there a problem with the VDI file?

I've now tried creating a new VDI, but installing to it fails. The previous VDI was created on one of the working Opteron machines. I've also tested with no hard drive attached at all, just booting from an ISO. When the VM is shut down, the same thing happens as with a hard disk: the host hangs.

To me, the disk read errors look like a symptom of the problem I'm facing rather than its cause.

comment:3 Changed 4 years ago by sandervl73

That depends. If the problem occurs on a single machine, then a bad disk can cause host hangs. If, however, you see this on many machines, then it's unlikely to be related to the disks.

comment:4 follow-up: ↓ 8 Changed 4 years ago by sandervl73

These problems don't occur when turning off VT-x? Maybe there's some kind of conflict between the updated SLED kernel and VT-x. It's a bit mysterious, though.

I assume SLED doesn't come with KVM built in and active. The Xen parts (referred to in your forum topic) shouldn't matter as long as you do *not* use the Xen kernel.

comment:5 follow-up: ↓ 6 Changed 4 years ago by sandervl73

You can run 32-bit guests properly with VT-x? Perhaps the problem is restricted to 64-bit guests on a 32-bit host. Do you have the option to switch to a 64-bit version of SLED10?

comment:6 in reply to: ↑ 5 Changed 4 years ago by AndersKOlsson

Replying to sandervl73:

You can run 32-bit guests properly with VT-x? Perhaps the problem is restricted to 64-bit guests on a 32-bit host. Do you have the option to switch to a 64-bit version of SLED10?

I've now tested a 32-bit Ubuntu Live CD. Without VT-x, it works well, and shutting it down doesn't affect the host. With VT-x enabled, however, the same problem as with 64-bit guests appears. I'll go ahead and update the description and header.

Sorry, I don't have ready access to SLED10 64-bit.

comment:7 follow-up: ↓ 9 Changed 4 years ago by sandervl73

OK, so it's unrelated to 64-bit guests. There's some kind of issue with VT-x on your SLED10 SP2. It's hard to say from here what's wrong, though.

You've tried the debug build and didn't see any assertions? Please build all of VirtualBox with BUILD_TYPE=debug.

comment:8 in reply to: ↑ 4 Changed 4 years ago by AndersKOlsson

Replying to sandervl73:

These problems don't occur when turning off VT-x? Maybe there's some kind of conflict between the updated SLED kernel and VT-x. It's a bit mysterious, though.

I assume SLED doesn't come with KVM built in and active. The Xen parts (referred to in your forum topic) shouldn't matter as long as you do *not* use the Xen kernel.

When starting 'virt-manager' on the host, there is one entry, "name: localhost, type=qemu, status=disconnected", with 0 assigned CPUs and no memory. Deleting this entry didn't change VirtualBox's behavior, and the entry comes back after a reboot, so I think it's a dummy entry. Still, I'll try uninstalling the Xen and QEMU packages.

Is there any reason other virtualization technologies would cause a problem with VT-x but not with AMD-V? Is VT-x more limited or buggy?

Hmm, it seems I can't change the bug description after creation. Can a moderator do it, please?

comment:9 in reply to: ↑ 7 Changed 4 years ago by AndersKOlsson

Replying to sandervl73:

OK, so it's unrelated to 64-bit guests. There's some kind of issue with VT-x on your SLED10 SP2. It's hard to say from here what's wrong, though.

You've tried the debug build and didn't see any assertions? Please build all of VirtualBox with BUILD_TYPE=debug.

OK, I made a debug build and also built the vboxdrv module as debug. I'm attaching the VBox log, syslog messages and console output. They come from booting a 32-bit Ubuntu ISO with no hard disk attached and one internal-network NIC.

Is there any flag I can try when loading vboxdrv? Might it be an Intel SpeedStep problem? Should I change the host's clocksource? I'm up for trying anything.

Changed 4 years ago by AndersKOlsson

VBox log from debug build

Changed 4 years ago by AndersKOlsson

VBox console output from debug build

Changed 4 years ago by AndersKOlsson

vboxdrv syslog messages from debug build

comment:10 follow-up: ↓ 11 Changed 4 years ago by sandervl73

VT-x is rather picky compared to AMD-V. In general, it's not possible to run two hypervisors simultaneously with either one (unless the host OS provides services to coordinate them).

Could you create a very basic VM (no audio, no network, no USB) and try to boot the 32-bit Ubuntu CD?

comment:11 in reply to: ↑ 10 Changed 4 years ago by AndersKOlsson

Replying to sandervl73:

VT-x is rather picky compared to AMD-V. In general, it's not possible to run two hypervisors simultaneously with either one (unless the host OS provides services to coordinate them).

Could you create a very basic VM (no audio, no network, no USB) and try to boot the 32-bit Ubuntu CD?

I've now created such a simple VM: basically everything except hardware acceleration is turned off, with 1024 MB RAM and 1 CPU. I've tried with IO APIC and ACPI both off and both on. The result is the same: the host hangs as soon as I shut the VM down.

comment:12 follow-up: ↓ 13 Changed 4 years ago by sandervl73

So the 32-bit Ubuntu VM also hangs after GRUB and kills the host after a few minutes or when closing the VM?

Note that I have to ask such questions, because I don't know where to begin looking otherwise. It's a tedious process of eliminating causes.

comment:13 in reply to: ↑ 12 Changed 4 years ago by AndersKOlsson

Replying to sandervl73:

So the 32-bit Ubuntu VM also hangs after GRUB and kills the host after a few minutes or when closing the VM?

Note that I have to ask such questions, because I don't know where to begin looking otherwise. It's a tedious process of eliminating causes.

I have no problem answering all your questions, I'm very grateful for your help on this.

To answer your question: no, not quite. The 32-bit Ubuntu ISO boots fine and generally behaves well, but it crashes the host after running for a few minutes or when the VM is shut down.

I've also looked further into the boot problem of the 64-bit SLES10 image I had, and it was as you suggested: the VDI image seems to have been corrupted, probably by the repeated hard resets. If I replace it with a fresh backup, it boots as well. After booting, however, it behaves like 32-bit Ubuntu.

comment:14 Changed 4 years ago by sandervl73

Could you try again with 3.0.10 and define the environment variable 'VBOX_HWVIRTEX_INIT=local'?

Please attach the VBox.log as well so I can see if the environment variable was seen.
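In a shell this amounts to starting the VM process with the variable in its environment, e.g. `VBOX_HWVIRTEX_INIT=local VBoxHeadless --startvm <name>`. The Python sketch below does the same while leaving the caller's own environment untouched. It assumes the VM is started directly as a VBoxHeadless process that inherits this environment; whether a VM launched from the GUI manager sees the variable is not covered here. The helper names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def launch_command(vm_name: str, init_mode: str = "local") -> Tuple[List[str], Dict[str, str]]:
    """Build argv and environment for starting one VM with the given init method."""
    if init_mode not in ("local", "global"):
        raise ValueError("init_mode must be 'local' or 'global'")
    env = dict(os.environ)                  # copy; the caller's environment is untouched
    env["VBOX_HWVIRTEX_INIT"] = init_mode   # the variable requested in this comment
    argv = ["VBoxHeadless", "--startvm", vm_name]  # assumption: headless frontend
    return argv, env


def start_vm(vm_name: str, init_mode: str = "local") -> subprocess.CompletedProcess:
    """Run the VM in the foreground; returns when the VM process exits."""
    argv, env = launch_command(vm_name, init_mode)
    return subprocess.run(argv, env=env, check=False)
```

Calling `start_vm("my-test-vm")` (a hypothetical VM name) blocks until the VM exits; the resulting VBox.log is what shows whether the variable was picked up.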

comment:15 Changed 4 years ago by AndersKOlsson

Oh my, that seems to have worked!

I tried 3.0.10 previously without any change, but setting that environment variable seems to have made a world of difference.

Is this something new in 3.0.10? Will it be default in newer versions? What does it actually do?

Changed 4 years ago by AndersKOlsson

VBox log with VBOX_HWVIRTEX_INIT=local

comment:16 follow-up: ↓ 17 Changed 4 years ago by AndersKOlsson

Sorry, but that joy didn't last long. As soon as I start a second VBox VM, the computer hangs.

comment:17 in reply to: ↑ 16 Changed 4 years ago by AndersKOlsson

Replying to AndersKOlsson:

Sorry, but that joy didn't last long. As soon as I start a second VBox VM, the computer hangs.

Do note that the host also hangs with multiple VMs when the HWACCM init method is set to 'global', so this might be a separate, not strictly related, problem.

comment:18 follow-up: ↓ 20 Changed 4 years ago by sandervl73

Could you attach the log files of both VMs in the local init method?

The default on Linux is global, which means all host CPUs are put into VMX root mode (a VT-x concept). This, however, places certain restrictions on what the host OS can do with the CPU. Perhaps Novell has added some code that violates these restrictions.
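To make the global/local distinction concrete, here is a toy model (not VirtualBox code) of the two policies as described above: "global" puts every host CPU into VMX root mode up front, while "local" enables VT-x on a CPU only around guest execution on that CPU. The per-slice enable/disable in the local branch is an assumption, chosen to match the VMXR0EnableCPU/VMXR0DisableCPU messages that 3.0.12 prints later in this ticket.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class HostCpus:
    count: int
    policy: str                                     # "global" or "local"
    in_vmx_root: Set[int] = field(default_factory=set)

    def init(self) -> None:
        """Global policy: every host CPU enters VMX root mode right away."""
        if self.policy == "global":
            self.in_vmx_root = set(range(self.count))

    def run_guest_on(self, cpu: int) -> Set[int]:
        """Simulate one guest time slice on `cpu`; return CPUs in VMX root mode meanwhile."""
        if self.policy == "local":
            self.in_vmx_root.add(cpu)       # like "VMXR0EnableCPU cpu N"
        during = set(self.in_vmx_root)
        if self.policy == "local":
            self.in_vmx_root.discard(cpu)   # like "VMXR0DisableCPU cpu N"
        return during
```

In the model, under "global" the host kernel itself runs in VMX root mode on all CPUs all the time, so any host code that conflicts with VMX operation can bite at any moment; under "local" the exposure is limited to the CPU currently running the guest.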

comment:19 Changed 4 years ago by frank

For the record, this is Linux 2.6.16.60-0.42.5-bigsmp, so not that old (in contrast, the 2.6.9 of CentOS 4 is really old), but this could still be a problem with the host Linux kernel.

comment:20 in reply to: ↑ 18 Changed 4 years ago by AndersKOlsson

Replying to sandervl73:

Could you attach the log files of both VMs in the local init method?

The default on Linux is global, which means all host CPUs are put into VMX root mode (a VT-x concept). This, however, places certain restrictions on what the host OS can do with the CPU. Perhaps Novell has added some code that violates these restrictions.

I don't get a log from the VM that is started last; the host crashes too quickly.

I'm attaching the log from the first VM, which crashed together with the rest of the system. The log for the second VM was captured when it was started on its own.

Changed 4 years ago by AndersKOlsson

Log from the first VM, host crashed when starting the second

Changed 4 years ago by AndersKOlsson

Log from the second VM, started alone, since otherwise no log was written

comment:21 follow-up: ↓ 22 Changed 4 years ago by frank

I'm running a self-compiled 2.6.16.62 kernel here on my Debian Lenny (32-bit) host, with two 64-bit guests (Ubuntu Hardy and Ubuntu Intrepid) running in parallel. Could you attach the configuration of your host kernel to this defect? You can find it in /boot/config-<current-kernel-version>. Thank you!
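A kernel config file is a plain list of `CONFIG_NAME=value` lines plus `# CONFIG_NAME is not set` comments, so comparing two hosts' configs is easy to script. A minimal sketch, assuming the file lives at /boot/config-<release> as described above; which options actually matter for this bug is not known, and CONFIG_KVM in the usage note is only an example.

```python
import os
from typing import Dict


def parse_kernel_config(text: str) -> Dict[str, str]:
    """Map CONFIG_* names to values; '# CONFIG_X is not set' lines map to 'n'."""
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value.strip('"')        # string options are quoted
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
    return options


def running_kernel_config_path() -> str:
    """Config path for the running kernel, e.g. /boot/config-2.6.16.60-0.42.5-bigsmp."""
    return "/boot/config-" + os.uname().release
```

For instance, `parse_kernel_config(open(running_kernel_config_path()).read()).get("CONFIG_KVM")` reports whether KVM was built in ("y"), modular ("m"), disabled ("n") or absent (None) on a given host.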

Changed 4 years ago by AndersKOlsson

The kernel config of the affected host system

comment:22 in reply to: ↑ 21 Changed 4 years ago by AndersKOlsson

Replying to frank:

I'm running a self-compiled 2.6.16.62 kernel here on my Debian Lenny (32-bit) host, with two 64-bit guests (Ubuntu Hardy and Ubuntu Intrepid) running in parallel. Could you attach the configuration of your host kernel to this defect? You can find it in /boot/config-<current-kernel-version>. Thank you!

I've added the kernel config file, but do note that the SLED kernel has quite a few patches applied to it.

comment:23 follow-up: ↓ 24 Changed 4 years ago by sandervl73

Could you make sure NMI support is disabled in this kernel? A badly timed NMI will kill the host in the 32-bit host/64-bit guest case.

comment:24 in reply to: ↑ 23 Changed 4 years ago by AndersKOlsson

Replying to sandervl73:

Could you make sure NMI support is disabled in this kernel? A badly timed NMI will kill the host in the 32-bit host/64-bit guest case.

Without any kernel parameters, the NMI watchdog seems to be inactive: /proc/interrupts shows 0 0 NMIs (zero on each CPU).
I tried passing different nmi_watchdog=<x> parameters to the host kernel. The results:
0: The same behavior as before, a host crash. /proc/interrupts lists 0 0 NMIs.
1: The vboxdrv kernel module refuses to load, complaining that the NMI watchdog is active. /proc/interrupts lists many thousands of NMIs.
2: The same behavior as with 0. /proc/interrupts lists 0 0 NMIs.

Or was it the guest system I should check?
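The checks above were done by hand; the sketch below automates them by reading the per-CPU counters from the `NMI:` row of /proc/interrupts and the `nmi_watchdog=` value from /proc/cmdline. It is a hypothetical helper, not part of VirtualBox. Counters stuck at zero, as reported for values 0 and 2, indicate no NMIs are being delivered; counts that keep climbing, as with nmi_watchdog=1, mean the watchdog is running.

```python
from typing import List, Optional, Tuple


def nmi_counts(interrupts_text: str) -> List[int]:
    """Return the per-CPU counts from the 'NMI:' row of /proc/interrupts."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for token in fields[1:]:
                if not token.isdigit():
                    break  # newer kernels append a description after the counts
                counts.append(int(token))
            return counts
    return []


def nmi_watchdog_param(cmdline: str) -> Optional[str]:
    """Return the value of nmi_watchdog= from /proc/cmdline text, or None if absent."""
    for arg in cmdline.split():
        if arg.startswith("nmi_watchdog="):
            return arg.split("=", 1)[1]
    return None


def read_host_nmi_state() -> Tuple[List[int], Optional[str]]:
    """Read both values from the running host (Linux only)."""
    with open("/proc/interrupts") as f_irq, open("/proc/cmdline") as f_cmd:
        return nmi_counts(f_irq.read()), nmi_watchdog_param(f_cmd.read())
```

Reading the counters twice a few seconds apart is what distinguishes a running watchdog from a few stray NMIs.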

comment:25 follow-up: ↓ 26 Changed 4 years ago by sandervl73

It's just the host setting that matters. So 32-bit guests also cause crashes with the local init method, right?

No assertions are triggered with the debug build in the two 32-bit guests case? (local init again)

comment:26 in reply to: ↑ 25 Changed 4 years ago by AndersKOlsson

Replying to sandervl73:

It's just the host setting that matters. So 32-bit guests also cause crashes with the local init method, right?

Yes, that's the case.

No assertions are triggered with the debug build in the two 32-bit guests case? (local init again)

No, I've not seen any.

An update: I've tested 3.0.12. With VBOX_HWVIRTEX_INIT=global, it crashed as before. With it set to local, though, it works a bit better than 3.0.10. First, I notice a difference when a VM is started: the vboxdrv kernel module prints

kernel: VMXR0EnableCPU cpu 0 page (<HEX>) <HEX>
kernel: VMXR0DisableCPU cpu 0

many times per second, alternating with the same enable/disable messages for cpu 1.
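To put a number on that churn, the messages can be tallied per CPU from a syslog excerpt. A minimal sketch that matches only the message names and CPU numbers shown above and ignores the elided hex values; the helper name is hypothetical. Balanced Enable/Disable counts per CPU are what the alternating pattern described here would produce.

```python
import re
from collections import Counter

# Matches "VMXR0EnableCPU cpu 0 ..." and "VMXR0DisableCPU cpu 0"
_VMX_MSG = re.compile(r"VMXR0(Enable|Disable)CPU cpu (\d+)")


def count_vmx_toggles(log_text: str) -> Counter:
    """Counter keyed by (action, cpu), e.g. ("Enable", 0) -> number of messages."""
    counts: Counter = Counter()
    for match in _VMX_MSG.finditer(log_text):
        counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Dividing the totals by the time span of the excerpt gives the per-second rate.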

Now I could actually start a second VM instance, and it ran fine; I could even shut it down without problems. The host finally crashed when I started the second VM instance a second time, though no kernel messages or assertions appeared.

I tried many times to repeat this, without success: the host crashed as soon as I started a second VM instance. Eventually I could get two VMs up simultaneously, provided the first instance had finished booting and was idle. In general it's not stable at all and can crash both when starting and when stopping a second instance. It seems to be timing-dependent, maybe some sort of race condition.

Changed 4 years ago by AndersKOlsson

Log from 1st VM Instance, crashed the host when shutting down

Changed 4 years ago by AndersKOlsson

Log from 2nd VM Instance, crashed with the host when 1st VM Instance shut down

comment:27 Changed 4 years ago by sandervl73

That logging kills performance, of course.

The problem is certainly timing-dependent, but it has nothing to do with races. This SLED10 update does something weird, but it's hard to say what.

comment:28 follow-up: ↓ 29 Changed 4 years ago by sandervl73

One thing you could try, in src/VBox/VMM/VMMR0/HWVMXR0.cpp:

- search for 'pVCpu->hwaccm.s.fContextUseFlags &= ~HWACCM_CHANGED_HOST_CONTEXT;' and comment out this line
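The manual edit can also be scripted. A minimal sketch, assuming it is run from the top of the VirtualBox source tree and that a backup is kept; it prefixes every line containing the statement with `//` and leaves already-commented lines alone. Going by the names, this stops the code from clearing HWACCM_CHANGED_HOST_CONTEXT, so the host context is presumably re-saved on every entry; that reading is an assumption. One C pitfall: if the statement is the sole body of an unbraced `if`, commenting it out silently makes the next statement the body, so check the context by eye.

```python
from pathlib import Path
from typing import Tuple

TARGET = "pVCpu->hwaccm.s.fContextUseFlags &= ~HWACCM_CHANGED_HOST_CONTEXT;"


def comment_out(source: str, statement: str = TARGET) -> Tuple[str, int]:
    """Prefix lines containing `statement` with '// ' (keeping indentation)."""
    out, changed = [], 0
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if statement in line and not stripped.startswith("//"):
            indent = line[: len(line) - len(stripped)]
            line = indent + "// " + stripped
            changed += 1
        out.append(line)
    return "".join(out), changed


def patch_file(path: str = "src/VBox/VMM/VMMR0/HWVMXR0.cpp") -> int:
    """Rewrite the file in place; return how many lines were commented out."""
    p = Path(path)
    new_text, changed = comment_out(p.read_text())
    p.write_text(new_text)
    return changed
```

A return value of 0 from `patch_file()` means the statement was not found (or was already commented out), which is worth checking before rebuilding.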

comment:29 in reply to: ↑ 28 Changed 4 years ago by AndersKOlsson

Replying to sandervl73:

One thing you could try, in src/VBox/VMM/VMMR0/HWVMXR0.cpp:

- search for 'pVCpu->hwaccm.s.fContextUseFlags &= ~HWACCM_CHANGED_HOST_CONTEXT;' and comment out this line

That change seems to have made stability worse: I now get host crashes with only one VM instance, using the local HW virtualization init.

comment:30 Changed 4 years ago by Technologov

This looks like a duplicate of bug #5563.

comment:31 Changed 4 years ago by AndersKOlsson

I've now tried VirtualBox 3.1.0. No change: the host still crashes with one VM under global HWVIRTEX_INIT and with two VMs under local.

With the debug build, an assertion failure stops me from starting any 64-bit VM. The expression is:

!((uintptr_t)pvSample & 7)

With the OS type set to "Linux 2.6", the machine starts without problems (except that it won't boot the 64-bit OS). When I change the type to "Linux 2.6 (64 bit)", the VM aborts immediately on start. I'm attaching the log with the complete assertion failure.
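The failed expression is an alignment test: `!((uintptr_t)pvSample & 7)` is true only when the address in pvSample is a multiple of 8, i.e. its low three bits are zero. The Python equivalent below just illustrates what the assertion checks; why this particular pointer must be 8-byte aligned is not stated in the ticket.

```python
def is_8_byte_aligned(address: int) -> bool:
    """Python equivalent of !((uintptr_t)p & 7) for a non-negative address."""
    return (address & 7) == 0


def misalignment(address: int) -> int:
    """Bytes past the previous 8-byte boundary (0 when aligned)."""
    return address & 7
```

An address that is only 4-byte aligned, such as 0x1004, fails the check with a misalignment of 4.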

comment:32 Changed 4 years ago by AndersKOlsson

A new development: this problem also occurs on Nehalem-architecture processors. We see the same problem on HP Z600 workstations with Xeon E5520 processors. I've been following the release notes of the latest releases but haven't seen any fix related to this. I'll still try the latest release to see if anything has changed.

Changed 4 years ago by AndersFranzen

VM crash in host when shutting down guest

comment:33 Changed 4 years ago by AndersFranzen

I have the same problem, running a vanilla kernel.org 2.6.32.11 64-bit kernel with SLED10 SP2 32-bit. I have attached a file showing a host crash at unmap_vm_area; this happens when I kill the guest window.

comment:34 Changed 4 years ago by AndersFranzen

By the way, I was using VirtualBox OSE 2.1.4.

comment:35 Changed 4 years ago by frank

Sorry but VBox 2.1.4 is quite old. Please try a newer version (preferably VBox 3.1.8).

comment:36 Changed 4 years ago by AndersFranzen

I looked a bit at the r0drv code for Linux, and it depends heavily on KERNEL_VERSION. I have a hunch this might be part of the problem: SLED10 reports kernel version 2.6.16, but it's patched so heavily that it's more like 2.6.22 in some areas.
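To illustrate the hunch: driver code typically selects a code path with `#if LINUX_VERSION_CODE >= KERNEL_VERSION(a,b,c)`, and the 2.6-era macro in <linux/version.h> packs the version as `((a) << 16) + ((b) << 8) + (c)`. The Python sketch below mimics that check; the "new-api"/"old-api" labels and the 2.6.22 threshold are hypothetical. A vendor kernel that reports 2.6.16 but has 2.6.22-style changes backported would still get the old path, which is the kind of mismatch suspected here (not something the ticket confirms).

```python
from typing import Tuple


def kernel_version(a: int, b: int, c: int) -> int:
    """Mirror of the 2.6-era KERNEL_VERSION(a,b,c) macro."""
    return (a << 16) + (b << 8) + c


def pick_code_path(linux_version_code: int, threshold: Tuple[int, int, int]) -> str:
    """Mimic '#if LINUX_VERSION_CODE >= KERNEL_VERSION(...)' in a driver."""
    if linux_version_code >= kernel_version(*threshold):
        return "new-api"   # hypothetical label for the newer code path
    return "old-api"       # hypothetical label for the older code path
```

With SLED10's version code, `pick_code_path(kernel_version(2, 6, 16), (2, 6, 22))` returns "old-api" no matter what was backported, which is why a clean compile does not by itself prove the right path was chosen.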

comment:37 Changed 4 years ago by frank

That might indeed be a problem, but if the module compiles fine, the probability is high that it works. Again, check a newer VBox release (3.1.8) if possible.

comment:38 Changed 4 years ago by Technologov

AndersFranzen: How does it work with the 3.1.8 or 3.2.6 VBox releases?

-Technologov

comment:39 Changed 3 years ago by frank

Guys, we recently fixed a problem which could be responsible for the host crashes you observed. If you would like to install a test build, please drop me a mail at frank _dot_ mehnert _at_ oracle _dot_ com. Don't forget to tell me your target distribution if different from SLES10.

comment:40 Changed 3 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Please reopen if still relevant with VBox 3.2.12.
