VirtualBox

Ticket #9555 (reopened defect)

Opened 3 years ago

Last modified 2 years ago

VB 4.1.x will not install on Intel Q67 chipset

Reported by: bqbauer
Owned by:
Priority: critical
Component: host support
Version: VirtualBox 4.1.2
Keywords:
Cc:
Guest type: other
Host type: Solaris

Description

I have a new Dell OptiPlex 990 system with the Intel Q67 chipset on which I've installed Solaris Express snv_151a. Installation of VB 4.1.0 or 4.1.2 causes the system to freeze during the VB kernel module installation process. Nothing is logged in messages and nothing is displayed on the console. The system must be powered off, and then upon reboot it freezes again when the kernel modules are loaded on boot. If I boot the system in verbose mode, no errors are displayed. The only way I've found to recover is by booting from CD, mounting the hard drive, and uninstalling VB 4.1.x. I've tested & duplicated the problem with three clean Solaris Express installs on different hard drives in this system.

VB 4.0.12 installs & functions normally & reliably.

Since there are no logs, I am attaching as much system info as I can think of. I've also tried disabling various BIOS features, including VT-x & VT-d. Details of the host hardware include:

Dell OptiPlex 990, BIOS v. A05, Intel i7-2600 CPU, Intel Q67 chipset, 16GB DDR3 dual channel memory at 1333 MHz, Dual SATA3 1TB hard drives (recognized as SATA2) as zfs mirror, NVidia GeForce 210 video, Intel 82574L add-in PCIe ethernet card

Attachments

prtconf.txt (258.6 KB) - added by bqbauer 3 years ago.
prtconf -v output
prtdiag.txt (1.1 KB) - added by bqbauer 3 years ago.
prtdiag -v output
psrinfo.txt (1.5 KB) - added by bqbauer 3 years ago.
psrinfo -v output

Change History

Changed 3 years ago by bqbauer

prtconf -v output

Changed 3 years ago by bqbauer

prtdiag -v output

Changed 3 years ago by bqbauer

psrinfo -v output

comment:1 Changed 3 years ago by ramshankar

This is a difficult problem to debug because I cannot replicate it here locally. I'm 99% sure this is related to the MP notification code that was introduced in 4.1.x. This code kicks in while loading the host driver (vboxdrv), and since the system hangs at this stage (vboxdrv is loaded during install), it's most likely the culprit.

You could try booting with kmdb enabled: add "-k" to the Solaris kernel command line in GRUB (at the GRUB menu, press 'e' instead of ENTER to edit the entry, then add it).

Also, set up the kernel deadman timer by adding the following to /etc/system:

    set pcplusmp:apic_kmdb_on_nmi=1
    set snooping=1
    set snoop_interval=60000000

snoop_interval is in microseconds; 60 seconds is sufficient. Reboot after adding these, boot with "-k" from GRUB, and see what happens when the system boots with the vbox drivers enabled (wait about 2 minutes after the hang).
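As a sanity check on the units, here is a minimal sketch of a hypothetical helper (`deadman_settings` is an illustrative name, not part of Solaris or VirtualBox). It converts a timeout in seconds into the microsecond value that snoop_interval expects and emits the same three /etc/system lines quoted above. It assumes nothing beyond the tunables given in this comment.

```python
# Hypothetical helper: builds the /etc/system lines for the kernel deadman
# timer described above. snoop_interval is expressed in microseconds.
MICROSECONDS_PER_SECOND = 1_000_000


def deadman_settings(timeout_seconds: int = 60) -> list[str]:
    """Return /etc/system 'set' lines that enable the deadman timer.

    Only the interval is computed; the other two tunables are copied
    verbatim from the ticket.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    interval_us = timeout_seconds * MICROSECONDS_PER_SECOND
    return [
        "set pcplusmp:apic_kmdb_on_nmi=1",
        "set snooping=1",
        f"set snoop_interval={interval_us}",
    ]


if __name__ == "__main__":
    # For the 60-second timeout suggested above, the last line printed is
    # "set snoop_interval=60000000", matching the value in the ticket.
    print("\n".join(deadman_settings(60)))
```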

If you don't have time for this or feel it's too complicated to set up, I can build an instrumented debug driver of sorts, with instructions on how to use it, but it will take a while, as I currently have other, higher-priority tasks at hand.

comment:2 Changed 3 years ago by bqbauer

These are easy enough tasks to accomplish. Will try them at the end of a work day ASAP (tomorrow?).

By the way, you probably assumed or know this, but VB 4.1.x installs fine on older i7 platforms running Solaris, such as the i7-920 on the X58 chipset. I don't have access to newer builds of Solaris 11, so is it possible that this problem has been addressed in the OS through better support for these newer chipsets?

comment:3 Changed 3 years ago by ramshankar

At this point I have no concrete information about what is going wrong or whether it depends on particular chipsets. I'd certainly hope not, and that it's a Solaris or VirtualBox kernel bug that just happens to be triggered under certain conditions.

comment:4 Changed 3 years ago by bqbauer

I did all you asked, but got nothing obviously helpful in the logs or on the console. The changes you requested did prevent the system from locking up both during install and at boot. The install process itself froze at the point that would normally lock up the entire system. On boot, the only new behavior I observed was what appears to be an indirect effect:

Sep 7 16:28:31 wopr svc.startd[9]: [ID 636263 daemon.warning] svc:/system/filesystem/usr:default: Method "/lib/svc/method/fs-usr" failed due to signal TERM.

This happened each time I booted the system. The above service DID start after this error, and no service was left in a maintenance or error state. After logging in, VB was there and would start, but when I tried to boot a VM, VB complained that the kernel modules were not loaded. After uninstalling, the fs-usr complaint did not recur.

One thing I didn't mention before is that in all my numerous tests (over a dozen), VB installed successfully once, and it ran fine until I rebooted. After the reboot attempt, the same problem manifested. Without changing anything on the host, I could not reinstall it successfully again, and, as originally described, I tried more than one clean install of the OS. When the OS was not a clean install, I used pkgrm each time to uninstall.

Glad to try something else or a test or beta build of VB. This issue concerns me because it leaves me stuck with 4.0.x. I'll even try a newer OS release or patches to Solaris Express if it can be arranged to assist in resolving this.

I installed VB 4.1.2 on the same host running 64-bit Windows 7 and it works fine. I realize it's a completely different animal, but I had to test it....

comment:5 Changed 3 years ago by ramshankar

The first thing we can try is to isolate the problem to the MP notification code. I'll prepare a driver with MP notification registration turned off. If this does not hang/panic your system, then at least we're one step closer to solving this problem. Would you be willing to test such a driver?

comment:6 Changed 3 years ago by ramshankar

The problem is that nobody else has reported this issue, or rather, we've not had any system here (in our labs, on my local boxes, etc.) that exhibits this issue, so it comes down to first isolating the problem, then figuring out what is going wrong in the isolated code.

comment:7 Changed 3 years ago by bqbauer

Glad to test whatever I can. I normally don't have any of these one-off problems, which is why I'm thinking it's because of this somewhat new chipset. Could be I'm barking up the wrong tree, but it's been 100% reproducible. I wonder if there are other Solaris systems with VB out there using any of the Intel Series 6 (Cougar Point) chipsets to test this with? I have a ThinkPad T410s laptop with the Series 5 mobile chipset, but it's not running Solaris. I'll try to test it this weekend with an eSATA drive so I don't mess up the internal disk.

Are the /etc/system changes worth leaving in, or might they cause other problems if I don't comment them out?

comment:8 Changed 3 years ago by ramshankar

You can leave them in; they shouldn't cause any harm.

comment:9 Changed 3 years ago by ramshankar

EDIT: Deleted. Wrong assumption here; this won't work. The notification code is triggered in VMMR0.r0, not vboxdrv.

comment:10 Changed 3 years ago by bqbauer

I would have to agree that your initial solution wasn't the right one. I tried your instructions, and it told me that the driver was already loaded. HOWEVER! I found the trigger.

For some reason, I decided to uncheck "Enable C State Control" in the BIOS. Everything works now, every time. Did some web searches and found that this particular BIOS option has a history of causing problems--sometimes Ubuntu won't boot, or Flash doesn't work correctly in Windows. I realize these aren't the same as my problem, but it is the same BIOS option. I found several other issues related to this BIOS setting. I wonder why VB loaded once in a blue moon--perhaps the CPU was in the right C state?

I ran powertop after rebooting with this setting disabled, and things seem pretty normal, so I don't know exactly what part of C states are affected. But VB 4.1.2 loads without complaint and now functions. If "C State Control" causes such random problems, I don't suppose an application developer can solve it. I think we can close this for now, unless I later find something more specific to relate it to VB. I greatly appreciate the time you invested. Perhaps a note in the release notes to disable this in the BIOS if you have problems installing or booting your host?

Now I wish I could figure out why Solaris can't reboot this HW platform (i.e., init 6). Gotta init 5, then power up....

comment:11 Changed 3 years ago by ramshankar

  • Status changed from new to closed
  • Resolution set to invalid

Interesting; it seems the C-state transitions are probably not handled correctly, and/or the chipset is buggy. Glad that you have this problem fixed for now.

comment:12 Changed 3 years ago by bqbauer

One thing I'm curious about: why would VB 4.1 trigger this, and not an earlier version or any other app I've found? My system also keeps dropping off the network (no logs, no errors) with VB 4.1 loaded. It seems otherwise fine. I have to reboot to recover. It acts like the old LSO problem with the e1000g driver, but the workaround for that doesn't fix this.

comment:13 Changed 2 years ago by bqbauer

Update. With the release of VB 4.1.4, even disabling C State Control no longer solved the problem. Workarounds were unsuccessful. Decided to wait for Solaris 11 to be released.

Using VB 4.1.6 running on Solaris 11 (official release v11/11, build 175), all problems relating to this bug appear to be solved. I could even re-enable C State Control in the BIOS without incident. It seems we had to wait for Solaris to catch up to the HW technology, or something in VB 4.1.6 resolved the issue. However, the computer in question quickly developed other problems running Solaris Express, none of which have manifested with Solaris 11.

comment:14 Changed 2 years ago by ramshankar

There were no related changes between VirtualBox 4.1.4 and 4.1.6 in this area.

comment:15 Changed 2 years ago by rameym

  • Status changed from closed to reopened
  • Resolution invalid deleted

I am getting this exact same issue (although on a Dell T1600 w/ the C206 chipset).

In addition to what has been already posted, here is a list of what I have tested:

VirtualBox 4.1.2
Solaris host hangs during boot with C-States turned On.
Solaris host boots normally with C-States turned Off.

VirtualBox 4.1.4
Solaris host hangs during boot with C-States turned On.
Solaris host hangs during boot with C-States turned Off.

VirtualBox 4.1.6
Solaris host hangs during boot with C-States turned On.
Solaris host hangs during boot with C-States turned Off.

ramshankar: I believe you confused versions in your last comment. In this environment, VirtualBox works with 4.1.2 and older and does not work with 4.1.4 and newer, so it seems a change between 4.1.2 and 4.1.4 is causing the problem.
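The inference above can be made explicit with a small sketch that encodes only the six results reported in this comment (True means the Solaris host booted normally) and derives where the C-State workaround last helped and where it stopped helping. The helper names are hypothetical, and the data covers a single host.

```python
# Results reported above for the Dell T1600 (C206 chipset) host.
# Key: (VirtualBox version, C-States setting); value: True if the host booted.
RESULTS = {
    ("4.1.2", "on"): False,
    ("4.1.2", "off"): True,
    ("4.1.4", "on"): False,
    ("4.1.4", "off"): False,
    ("4.1.6", "on"): False,
    ("4.1.6", "off"): False,
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '4.1.12' into (4, 1, 12) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def sorted_versions(results: dict) -> list[str]:
    return sorted({version for version, _ in results}, key=parse_version)


def workaround_versions(results: dict) -> list[str]:
    """Versions where turning C-States off lets a hanging host boot."""
    return [
        v for v in sorted_versions(results)
        if results[(v, "off")] and not results[(v, "on")]
    ]


def first_version_without_workaround(results: dict):
    """First version, after one where C-States off worked, that hangs either way."""
    seen_working = False
    for v in sorted_versions(results):
        if results[(v, "off")]:
            seen_working = True
        elif seen_working:
            return v
    return None


if __name__ == "__main__":
    print(workaround_versions(RESULTS))               # ['4.1.2']
    print(first_version_without_workaround(RESULTS))  # 4.1.4
```

This only restates the reported matrix: it narrows the suspect change to the 4.1.2 to 4.1.4 range, consistent with the earlier report that disabling C State Control stopped helping with 4.1.4.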

I tried booting in kmdb mode and didn't see anything obvious in the error log. Of note, Solaris boots correctly if I specify single-user mode (I'm guessing the VBOX kernel addition isn't loaded yet?).

