VirtualBox

Ticket #4501 (closed defect: fixed)

Opened 5 years ago

Last modified 4 years ago

Host hangs/freezes when zones are booted

Reported by: tjobbins
Owned by:
Priority: blocker
Component: other
Version: VirtualBox 3.0.2
Keywords: host freeze
Cc:
Guest type: Windows
Host type: Solaris

Description

I'm experiencing a show-stopping problem with running VirtualBox 3.0.2 on Solaris hosts. VBox 3 is the latest in a line of releases I've failed to get working reliably on Solaris - various versions of 2 would also hang or panic my host, though with different symptoms to the problem below.

The issue I am finding is that with a VBox guest running, my host will freeze/hang if zones are running on the Solaris host. The problem exists on both Solaris 10 U5 and Solaris 10 U7, and I have tried on two different servers - one running new Intel Nehalem X5570 processors (dual processor, quad core), and another running Intel E7320 processors (quad processor, quad core.)

Freeze/hang symptoms:

The symptoms of the freeze have varied slightly. On my Solaris 10 U5 box (4xE7320 processors), it will simply freeze completely including at the console. On the Solaris 10 U7 (X5570 processors), I lost remote network access but was then sometimes able to login at the console - however it would then freeze completely a couple of minutes later.

In all cases, the box will freeze and not panic. Nothing related to VBox is logged in /var/adm/messages or elsewhere, except for the following message:

Jul 12 14:20:49 host-us8 vboxdrv: [ID 937234 kern.notice] CPUMSetGuestCpuIdFeature: Disabled x2APIC

Note: The above message is always logged and is not related to the freeze, i.e. it also appears when the host does not freeze.

Nothing is logged in VirtualBox's log file after initial bootup. There are no log lines near the time of the host freeze/hang.

I've attached a sample VBox.log. The log shows messages no later than 2 minutes after the VM booted - the VM caused a host freeze about 15 minutes later. There are no loglines near the time of the freeze.

My Configuration:

Guest: Windows XP SP3 guest with 1.25GB of RAM, 12MB video ram, one network adapter in NAT mode, USB disabled, CD mounted from ISO, floppy disabled, Intel VT enabled. Installed on a fixed size VDI disk of 20GB.

Host: Both Solaris 10 U5 and Solaris 10 U7. Virtualbox installed onto ZFS filesystem. Virtualbox running in the Global zone. Virtualbox running as root.

Solaris 10 U5 hardware config: 4 x Quad Core Intel Xeon E7320 processors (16 total cores). 32GB ram. 2 x 500GB SATA disks in UFS root/boot mirror. 2 x 500GB SATA disks in ZFS filesystem (this is where VBox is installed.) 2 x Intel NIC using igb driver.

Solaris 10 U7 hardware config: 2 x Quad Core Intel Xeon X5570 processors (8 real cores + HyperThreading = 16 virtual cores.) 36GB ram. 2 x 73GB SAS drives in ZFS root/boot mirror. 2 x 250GB SATA drives in ZFS filesystem (this is where VBox is installed). 1 x Intel NIC using e1000g driver.

Solaris 10 U5 using kernel 138889-03.

Solaris 10 U7 using kernel 139556-08.

My Testing:

The following tests/situations describe the problem:

  1. Solaris 10 U7: Installed VBox 3.0. Box has 61 zones running. Created a new VM, and got 90% of the way through installing before the host froze.
  2. Solaris 10 U7: Rebooted, and then disabled all zones. Re-installed XP VM and used it successfully for 3 hours.
  3. Solaris 10 U7: Booted zones with XP VM still running. After about 30 zones were booted, the host box hung again.
  4. Solaris 10 U5: Transferred VirtualBox config and XP VM to Solaris 10 U5 box. Installed VirtualBox 3.0.2. Box has 29 zones running. Booted XP VM in Headless mode. Box hung within 1 minute of XP VM starting up.
  5. Solaris 10 U5: Disabled all zones. Booted XP VM in Headless mode. Used VM successfully for 1 hour.
  6. Solaris 10 U5: With XP VM still running in Headless mode, I started booting zones with a 1 minute pause between each boot. Confirmed that box hung after 19 zones were booted.
  7. Solaris 10 U5: Rebooted, booted XP VM again in Headless mode. Repeated zone boot test with 1 minute delay. This time managed to boot all 29 zones. Box continued running for a further 5 minutes before freezing.
  8. Solaris 10 U5: Networking test: it occurred to me that one side effect of booting zones was the addition of new virtual network interfaces (e1000g0:0, e1000g0:1, etc). So to isolate this, I did the following test: Disabled all zones. Booted XP VM in Headless mode. Ran a script to create a new NIC interface every minute, with ifconfig igb0:1 plumb up 192.168.10.1 .. ifconfig igb0:2 .. etc. Ran the test until 50 interfaces were created, without any crash. So the box hang is not related to virtual network interfaces.
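For anyone wanting to repeat the delayed zone-boot test and the networking test above, here is a minimal sketch of how the two command sequences could be driven. It is not the script that was actually used. The function names, the default of 50 interfaces, and the use of Python 3 are all assumptions for illustration; only the ifconfig form and the one-minute pause come from the description above, and `zoneadm -z <zone> boot` is the standard way to boot a zone. It defaults to a dry run that only prints the commands.

```python
import subprocess
import time


def interface_commands(nic="igb0", count=50, subnet="192.168.10"):
    """Build the ifconfig commands for logical interfaces nic:1 .. nic:count."""
    return [
        ["ifconfig", f"{nic}:{i}", "plumb", "up", f"{subnet}.{i}"]
        for i in range(1, count + 1)
    ]


def zone_boot_commands(zones):
    """Build one 'zoneadm -z <zone> boot' command per zone name."""
    return [["zoneadm", "-z", zone, "boot"] for zone in zones]


def run_with_delay(commands, delay_seconds=60, dry_run=True):
    """Run commands one at a time, pausing between them.

    With dry_run=True (the default) the commands are only printed and no
    pause is taken, so the sketch can be tried without touching the host.
    """
    for index, command in enumerate(commands):
        if dry_run:
            print(" ".join(command))
            continue
        subprocess.run(command, check=True)
        # Pause between commands, but not after the last one.
        if index < len(commands) - 1:
            time.sleep(delay_seconds)


if __name__ == "__main__":
    # Hypothetical zone names; substitute the zones configured on the host.
    run_with_delay(zone_boot_commands(["zone01", "zone02"]))
    run_with_delay(interface_commands(count=3))
```

Running it with dry_run=False as root in the global zone would issue the real commands, one per minute.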

Conclusion:

So I have seen that:

a) Problem exists equally and seemingly identically on both Solaris 10 U5 and Solaris 10 U7.

b) Problem occurs both when VBox is started when zones are already running, and if zones are booted after VBox is running.

c) The exact number of running zones required is not fixed; it has been between 19 and 30 in my tests.

d) I cannot say 100% if it is actually the process of booting/running a zone that causes the problem, or whether having zones booted causes some other activity that causes the problem. But there is a direct, reproducible connection between zones being booted and the host hanging.

Attachments

VBox.log Download (36.6 KB) - added by tjobbins 5 years ago.
VBox.log from one time the host froze. Log does not log anything after 2 minutes of VM booting - host froze about 15 minutes later (5 mins after all zones were booted.)
QA_XP_AUTOMATION1.xml Download (8.1 KB) - added by tjobbins 5 years ago.
My VBox guest configuration file

Change History

comment:1 Changed 5 years ago by tjobbins

Additional guest configuration notes:

As well as the guest configuration described above, I have also tried:

  • Bridged networking instead of NAT
  • Dynamic VDI disk instead of fixed.

With the same results.

comment:2 Changed 5 years ago by tjobbins

Also: Tried both GUI and Headless mode with same results

comment:3 Changed 5 years ago by ramshankar

Strange issue, but I understand that all of this is under Solaris 10 hosts. Not OpenSolaris/Nevada. Am I right?

comment:4 Changed 5 years ago by tjobbins

Yes, I have only run Virtualbox on Solaris 10 hosts. Solaris 10 U5 and Solaris 10 U7.

Incidentally, there is a big thread of people with lock up issues in *Solaris (All of Solaris, SXCE and OpenSolaris) on the virtualbox forums here:  http://forums.virtualbox.org/viewtopic.php?f=11&t=20015

In their cases it does not seem to be zone related. I wonder if my own issue is definitely related to zones, or if zones are a secondary effect - that is, maybe it is not to do with whether zones are running, but is just triggered by a certain level of host resource usage. When I load zones I consume more CPU and memory on the host, and perhaps that causes the lock-up, not zones specifically.

Anyway just a thought. I'm shortly about to try 3.0.4 to see if there is any difference there.

comment:5 Changed 5 years ago by ramshankar

The lock up issue is only for 3.0.x and above. You claim to have this issue right from 2.0.

The issue might actually be with ZFS. I heard from another engineer about ZFS locking fixes which might not yet be in Solaris 10. Are you sure you have patched your system with the latest Solaris 10 patches?

comment:6 Changed 5 years ago by tjobbins

Sorry, I have confused the issue by mentioning 2.x issues. I did have issues on 2.x, but different ones, not a freeze with the same symptoms as this.

So I do believe the issues in that thread could be related to my issue as well.

My boxes have been patched, but are not bang up to date. My Solaris 10 U7 install is still the same as the General Availability release, so it was about 2 months old when I first started testing this (U7 was released May 1st.)

I will try patching my U7 box with all Recommended and Security patches, and I will also try 3.0.4 and I'll report back.

Thanks

comment:7 Changed 5 years ago by tjobbins

Oh also, I guess I could test if it's ZFS related by putting my VDIs on UFS disks instead?

My Solaris 10 U5 box has a UFS root/boot SVM mirror, so I could try on that. (My U7 box is ZFS-only.)

I'll give that a go too.

comment:8 Changed 5 years ago by ramshankar

The 3.0.x host hang issue is something we are working on. If you are sure your issue occurs with VirtualBox >= 3.0.x, don't spend time trying under UFS, as that's irrelevant.

comment:9 Changed 5 years ago by ramshankar

Also on Solaris 10 hosts make sure you have as much swap as you have physical RAM. Refer to User Manual section 11.6.2. Though it's not related to a hang it's something you should be aware of when you're using a large setup and presumably deploying several VMs.
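As a rough way to check the swap-to-RAM advice above, the sketch below compares the figures reported by `prtconf` and `swap -l`. It assumes the usual Solaris 10 output: a "Memory size: N Megabytes" line from `prtconf`, and a `blocks` column in 512-byte units from `swap -l`. The function names and sample values are hypothetical, and the parsing should be checked against the real output on the host.

```python
import re


def physical_ram_mb(prtconf_output):
    """Extract the 'Memory size: N Megabytes' figure from prtconf output."""
    match = re.search(r"Memory size:\s*(\d+)\s*Megabytes", prtconf_output)
    if match is None:
        raise ValueError("no 'Memory size' line found")
    return int(match.group(1))


def swap_total_mb(swap_l_output):
    """Sum the 'blocks' column of 'swap -l' output (assumed 512-byte blocks)."""
    total_blocks = 0
    for line in swap_l_output.splitlines():
        fields = line.split()
        # Skip the header row, blank lines, and anything too short to parse.
        if len(fields) < 5 or fields[0] == "swapfile":
            continue
        total_blocks += int(fields[3])
    return total_blocks * 512 // (1024 * 1024)


def swap_covers_ram(prtconf_output, swap_l_output):
    """Return True if configured swap is at least as large as physical RAM."""
    return swap_total_mb(swap_l_output) >= physical_ram_mb(prtconf_output)


if __name__ == "__main__":
    # Illustrative sample text only, not captured from a real host.
    sample_prtconf = "Memory size: 16384 Megabytes\n"
    sample_swap = (
        "swapfile             dev  swaplo blocks   free\n"
        "/dev/zvol/dsk/rpool/swap 181,1 8 33554432 33554432\n"
    )
    print(swap_covers_ram(sample_prtconf, sample_swap))
```

On a real host the two strings would come from running `prtconf` and `swap -l`. Because swap devices usually report slightly fewer blocks than their nominal size, a strict comparison may come out just under 1:1.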

comment:10 Changed 5 years ago by tjobbins

Ok thanks.

So the question mark is whether my issue is the same as the one in that thread, or different. They do seem quite likely to be the same, though mine has the zone component. I can test whether this is relevant by running VBox on a host with no zones but still with activity on the host, e.g. an Oracle DB server. If it freezes then too, my issue must be the same as the one in the forum post; if it still only happens when zones are started, it is something different.

If I get time I'll also try 2.2.4 to confirm if that is any different.

Thanks for the swap:ram tip - I do generally have a 1:1 here as it's also required when running many JVMs and/or Oracle.

comment:11 Changed 5 years ago by dannyand

Hi

I think I've got this too - got about 5 zones on a Sun Fire X4150 with Sol 10 u7

It has crashed twice in the last 2 days while on 3.0.2

I am patching with latest 10_x86_Recommended cluster and have upgraded VBox to 3.0.4

I have also started Solaris under kmdb so I can send a system dump if this is any use?

Cheers

Danny

comment:12 follow-up: ↓ 14 Changed 5 years ago by dannyand

ps. by crashed I mean hung

comment:13 Changed 5 years ago by mister-x

Hi,

I've got the same problem, too.

As far as I can see, the system does not hang completely. It takes 10 to 20 minutes to get any output in response to input, and when you try to start a process, you mostly get output like:

fork: resource temporarily unavailable

So it seems that VirtualBox starts too many processes or claims too many system resources.

I've tried to collect an Explorer or sar file, but this was not possible...

Here is my configuration: System: Sun Fire X4150, 16 GB physical RAM, 16 GB swap, 2x Intel Xeon CPU E5450

Host: Solaris 10 U7 with Recommended patch cluster and ZFS root FS

VirtualBox 3.0.4

Guests: Ubuntu 9.04 64-bit, WinXP SP3 32-bit, OpenSolaris 32-bit, OpenSolaris 64-bit, network bridged

If I can help with any other information, please ask.

Cheers Denis

comment:14 in reply to: ↑ 12 Changed 4 years ago by dannyand

I am no longer experiencing this issue.

I can't speak for the OP of course but this seems to have been resolved at some point prior to the 3.1 release.

Replying to dannyand:

ps. by crashed I mean hung

comment:15 Changed 4 years ago by ramshankar

tjobbins, is this issue resolved for you as well?

comment:16 Changed 4 years ago by ramshankar

  • Status changed from new to closed
  • Resolution set to fixed

Please reopen this ticket if necessary.
