VirtualBox

Ticket #4486 (closed defect: fixed)

Opened 5 years ago

Last modified 4 years ago

VirtualBox 3.0.2 hard hangs Solaris 10u6 kernel when running HW assisted VMs -> fixed in SVN/3.0.6

Reported by: tallpaul Owned by:
Priority: blocker Component: VMM
Version: VirtualBox 3.0.2 Keywords:
Cc: Guest type: other
Host type: Solaris

Description

After installing the 3.0.2 update (and rebooting to make sure driver structures were consistent), attempting to run previously built and running (under 2.x) virtual machines (Windows 2008, Linux Ubuntu 8.10, Centos) will eventually hard hang the Solaris 10 host kernel it is running upon.

Environment: S10u6 patched to 138889-08 on AMD Opteron 2376, 4Gb memory. VirtualBox 3.0.2 r49928 for SunOS.

Started Win2k8 and Ubuntu guests headless (with HW virtualization enabled). System hard hangs after about 3 hours runtime. Could not break console to MDB (system was booted w/debugger enabled). System does not character echo nor respond to network (ping). No disk activity observed.

Same hang happened almost immediately after starting 3 guest VMs concurrently. All VMs had HVM enabled with one CPU allocated to each guest (IO Apic enabled).

This configuration had been running solidly under VB 2.2.2 for months uptime.

No entries were noted in either the VM logfile(s) or syslog. Guests could be resumed after the host was rebooted. Currently reverting to 2.4.x.

Change History

comment:1 follow-ups: ↓ 4 ↓ 5 Changed 5 years ago by sandervl73

  • Priority changed from major to blocker

comment:2 follow-up: ↓ 3 Changed 5 years ago by bauer40

I had the same problem (hard hangs) when running 3.0.0 on S10_u6. To resolve, I upgraded to 3.0.2 and S10_u7 + most recent recommended patches, ran a 48h stress test and nothing failed.

I have now updated our production server (S10u7 + patches) to 3.0.2 and am running production workloads on it. I will report my experience in about a week.

comment:3 in reply to: ↑ 2 Changed 5 years ago by bauer40

Replying to bauer40:

I forgot to say that I'm using software VMM.

comment:4 in reply to: ↑ 1 Changed 5 years ago by bauer40

Replying to sandervl73:

I had a hard hang of the entire physical machine using VBox 3.0.2 on S10_u7 with latest recommended patches cluster.

My system is now running with the kernel debugger enabled, so if it hangs again I can hopefully generate a kernel crash dump.

comment:5 in reply to: ↑ 1 Changed 5 years ago by bauer40

Replying to sandervl73:

OK, I had another hard hang of my physical server. But the hang was so hard that even F1-A (fall into kernel debugger) did not work.

So I'm unable to deliver a crash dump of the hanging system, sorry. I downgraded to 2.2.4 to have my production stable again.

comment:6 follow-up: ↓ 7 Changed 5 years ago by herf

Having the same issue with Linux (guest) under Solaris snv118 (host). Crashed hard twice during heavy I/O to Linux guest (heavy I/O over samba and NFS).

Turned off hardware virtualization support and will see if this helps.

comment:7 in reply to: ↑ 6 Changed 5 years ago by bauer40

Replying to herf:

Having the same issue with Linux (guest) under Solaris snv118 (host). Crashed hard twice during heavy I/O to Linux guest (heavy I/O over samba and NFS).

Turned off hardware virtualization support and will see if this helps.

My tests with 3.0.2 on S10_u7 were without hardware virtualisation, and it hung hard. You will probably see the same again.

comment:8 follow-up: ↓ 9 Changed 5 years ago by ramshankar

Could you provide info on what the guest was doing or roughly how long before you get a hang? It would help us in reproducing the problem.

comment:9 in reply to: ↑ 8 Changed 5 years ago by bauer40

Replying to ramshankar:

Could you provide info on what the guest was doing or roughly how long before you get a hang? It would help us in reproducing the problem.

My system was up for about four to five days, running the following VBoxes:

  • Debian 5.0, web Proxy, net and disk activity
  • Debian 5.0, Mail server, net and disk activity
  • Debian 5.0, plain OS, idle
  • Debian 5.0, ultra-low-volume Webserver, idle
  • Solaris 10, plain OS, idle
  • Windows XP, Database server, mostly idle (two users)

I had the same uptime with 3.0.0: I upgraded to 3.0.0 on Saturday, and on Wednesday the system froze, three or four times on that very day.

It was the same when I installed 3.0.2: installation on Saturday, first freeze on Thursday, second freeze on Friday, then I downgraded to the latest 2.x version.

I assume Ticket 4618 is the same, so you might want to combine them.

Peter

comment:10 in reply to: ↑ description Changed 5 years ago by pdurst481

Replying to tallpaul:

After installing the 3.0.2 update (and rebooting to make sure driver structures were consistent), attempting to run previously built and running (under 2.x) virtual machines (Windows 2008, Linux Ubuntu 8.10, Centos) will eventually hard hang the Solaris 10 host kernel it is running upon.

Environment: S10u6 patched to 138889-08 on AMD Opteron 2376, 4Gb memory. VirtualBox 3.0.2 r49928 for SunOS.

I've had some similar experiences, with 3.0.0 to 3.0.4, where the system will hard lock on me. I have tried this with both the Solaris 10u6 and 10u7 host systems on an Ultra 20 (8G ram) and a Dell p380 (6G ram). In all cases, the problem occurs when trying to jumpstart a Solaris guest system. I created a guest VM using the CLI interface (via a script) and then started it up after logging into the GUI (JDS). After about 340 MB of the flash archive (S10u6 or S10u7) has downloaded, the system locks up hard. I haven't been able to do a kmdb panic on it, as the systems are remote and I have to rely on other folks to reset them for me. The problem here is consistent and always happens this way. I did have these 2 systems working OK with 2.2.4, so I am suspicious that it has something to do with the 3.0.x versions.

Today, I made another discovery. I had ganged 2 switches together for the test lab I was working in and although the u20 wasn't attached to it, it seemed to be affected by it. Once I removed the switch, the s10 jumpstarts for the guest systems finished without problem. These were an 8 port GB switch and a 24 port GB switch. I moved the u20 to another lab and connected it to a 24 100baseT switch, which also had a couple of 8 port 100baseT switches attached to it. The same problem occurred on this network. I moved the system to another lab, with a clean 100baseT switch and it works fine there. At this point, I can't attest to how long they will stay alive, however for now at least, they are working. This would seem to indicate a network layer issue with VBox, at least to my thinking it does.

Hope that helps...

Pete

comment:11 Changed 5 years ago by sandervl73

  • Summary changed from VirtualBox 3.0.2 hard hangs Solaris 10u6 kernel when running HW assisted VMs to VirtualBox 3.0.2 hard hangs Solaris 10u6 kernel when running HW assisted VMs -> fixed in SVN/3.0.6

Try again with the 3.0.6 beta released yesterday.

comment:12 Changed 5 years ago by tallpaul

First impression of 3.0.6Beta1: (on snv121 - Nevada; Intel Q9650 8Gb)

Does not crash Nevada when running. Adversely affects scheduling of other Solaris processes; a Java applet running in the Firefox 3.5.1 browser became unresponsive and required killing Firefox to unwedge, and other processes such as the GNOME window manager were very sluggish when the guest was running. Back to normal after pausing the guest (WinXP, 1Gb mem).

Will try S10u7 tonight.

comment:13 Changed 5 years ago by tallpaul

Solaris 10 u7 still hangs under stress test with 3.0.6b1.

3 non-global zones, 3 VB 3.0.6b1 VMs in one NGZ (ubuntu, centos, win2k3), one VB 3.0.6b1 VM in global zone (Win7). All active I/O and CPU.

Hang is different in that interrupt code now appears to be working (character echo still works and the host still answers ICMP pings). Was unable to break to the kernel debugger though (or obtain a crash dump).

comment:14 Changed 5 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed