Opened 14 years ago
Closed 14 years ago
#8294 closed defect (fixed)
VBox 4.0.2 - VM's unresponsive (freeze) after 1-3 days on Solaris 11 Express host => Fixed in SVN
Reported by: | tomwaters | Owned by: | |
---|---|---|---|
Component: | other | Version: | VirtualBox 4.0.2 |
Keywords: | solaris 11 express freeze | Cc: | |
Guest type: | other | Host type: | Solaris |
Description
Hi, I have recently moved to Solaris 11 Express from OpenSolaris 2009.06. I have 3 VMs: one CentOS 5.5 and two Server 2008 R2.
The VMs are running in headless mode without VRDE (i.e. VBoxHeadless -startvm Server6 --vrde off), as I connect to the VMs using VNC.
Under opensolaris 2009.06 (111b) the VM's were stable. Under Solaris 11 express, the VM's freeze/become unresponsive. Not all will become unresponsive all the time - the most recent one was a Windows server 2008R2 machine, however the CentOS 5.5 machine has also frozen and another time all three machines froze at the same time.
I can not connect to the VM or ping it (or ssh etc)....it's frozen, however it still shows as being in the running state - both in the GUI and from the command line.
I had the preview window enabled (now disabled after reading a post on here - I'll see if that makes any difference).
Nested Paging is enabled.
Host system is a Xeon 3370 on an Intel S3210SHLC motherboard with 8 GB ECC RAM.
Let me know if you need additional information or need me to run some tests. I am keen to get this fixed as I'd like to stay on Solaris 11 Express (151), but may need to return to OpenSolaris (111b) to have a stable VBox.
Thanks.
Attachments (2)
Change History (20)
by , 14 years ago
Attachment: | vbox_ticket.txt added |
---|
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Host type: | other → Solaris |
---|
by , 14 years ago
Attachment: | vbox_poweroff_error.txt added |
---|
VMSetError: /export/home/vbox/tinderbox/sol-rel/src/VBox/VMM/VMMR3/VM.cpp(3268) int vmR3TrySetState(VM*, const char*, unsigned int, ...); rc=VERR_VM_INVALID_VM_STATE
comment:3 by , 14 years ago
Thanks for that frank. I left it as "Other" as the instructions said to unless I know it applied to just a specific host. Will post future issues against Solaris host.
fyi. The VM hung again after just a few hours this time...and when I tried to power it off it has "hung" the session on my Solaris host...
nas@nas:~$ VBoxManage controlvm Server5 poweroff
0%...10%...
The VBox gui says "Stopping"....and I see some odd errors in the log (attached as "vbox_poweroff_error.txt") saying "VMR3Suspend failed because the current VM state, SUSPENDING, was not found in the state transition table"...
The process is still running but not using any cpu (.01%). I need to kill the process to stop the machine.
fyi - I am running the latest guest additions and preview and Nested paging are disabled...not that it appears to make any difference.
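The symptom described above (the VM is reported as running while the guest no longer answers) can be checked from the host. The following is only a minimal sketch: it assumes the `VMState="..."` line printed by `VBoxManage showvminfo --machinereadable`, and the helper names `vm_state_from_info` and `classify_vm` as well as the guest address in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `VBoxManage showvminfo <vm> --machinereadable`
# output on stdin and print the value of the VMState line.
vm_state_from_info() {
    sed -n 's/^VMState="\(.*\)"$/\1/p'
}

# Hypothetical helper: combine the reported state with a reachability
# result (0 = guest answered, non-zero = no answer).
classify_vm() {
    state=$1
    reachable=$2
    if [ "$state" = "running" ] && [ "$reachable" -ne 0 ]; then
        # Reported as running but not answering: matches the freeze symptom.
        echo "suspect-frozen"
    else
        echo "$state"
    fi
}

# Usage sketch (not executed here; 192.168.1.50 is a placeholder guest address):
#   state=$(VBoxManage showvminfo Server5 --machinereadable | vm_state_from_info)
#   ping 192.168.1.50 5 >/dev/null 2>&1; reachable=$?
#   classify_vm "$state" "$reachable"
```

A "suspect-frozen" result only says that the host-side state and the guest's reachability disagree; it does not say why.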
comment:4 by , 14 years ago
And it's frozen again...
Guys, what do you need from me to help debug this? A core dump? if so tell me how and I'll provide it.
I really need to resolve this - please?
comment:5 by , 14 years ago
Please have a look at the user manual section 9.13. A core dump taken when the guest is frozen would indeed help. If that method does not work (perhaps because the guest is frozen), try the other method, see here.
comment:6 by , 14 years ago
Excellent...thanks for that frank...I just uninstalled 4.0.2 and was trying 3.12...but will reinstall 4.0.2 and wait for it to hang and do a core dump as per the manual. Thanks again for helping me out with this.
I'll email it to you as soon as I get a dump.
comment:7 by , 14 years ago
Hmmm...may be related to power management... "Since 4.0, there are some more ACPI options available and when you have the GA installed, they become enabled in the VM OS. Open the Guest OS power management and disable the actions for hibernate and suspend."
I disabled all power management features and screensaver and have not had a crash in the last 2 days...fingers crossed.
comment:8 by , 14 years ago
Summary: | VBox 4.0.2 - VM's unresponsive (freeze) after 1-3 days on Solaris 11 Express host → VBox 4.0.2 - VM's unresponsive (freeze) after 1-3 days on Solaris 11 Express host => Fixed in SVN |
---|
Thanks for the feedback! In that case, your bug should be fixed in the upcoming maintenance release.
comment:9 by , 14 years ago
Frank, Great to hear...still running solid...so yep, happy to attribute this as root cause.
Look forward to the next point release.
Thank you to you and the team for the support.
comment:10 by , 14 years ago
Frank, can you pls. re-open this ticket?
I spoke too soon... it crashed overnight.
VBoxHeadless: error: Code NS_ERROR_CALL_FAILED (0x800706BE) - Call to remote object failed (extended info not available)
Context: "COMGETTER(EventSource)(es.asOutParam())" at line 1244 of file VBoxHeadless.cpp
VBoxSVC became unavailable, exiting.
VBoxHeadless: error: Code NS_ERROR_CALL_FAILED (0x800706BE) - Call to remote object failed (extended info not available)
Context: "COMGETTER(EventSource)(es.asOutParam())" at line 1244 of file VBoxHeadless.cpp
[1] Illegal Instruction (core dumped) VBoxHeadless -startvm Server5 --vrde off
[2]- Done VBoxHeadless -startvm Server6 --vrde off
I have the core dump and will send it through to you frank.
as@nas:/cloud/coredump# ls -l
total 282650
-rw------- 1 root media 256401063 2011-02-16 07:49 core.VBoxHeadless.9776
-rw------- 1 root media 32757671 2011-02-16 07:49 core.VBoxSVC.9539
comment:11 by , 14 years ago
Could you upload the VBox.log from this session you took the core file in if you still have it? I know you attached one at the beginning of this ticket, but it'd be better if we had the appropriate log.
comment:12 by , 14 years ago
I suspect this might be an issue with asynchronous IO. Could you enable "Host IO Cache" in your VM storage settings for the controller and re-try?
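For reference, the same setting can also be changed from the command line with `VBoxManage storagectl` and its `--hostiocache` option. This is a minimal sketch only: the helper name `enable_host_io_cache` and the controller name "SATA Controller" are assumptions (the real controller names are listed by `VBoxManage showvminfo`), and the VM is assumed to be powered off when the setting is changed.

```shell
#!/bin/sh
# Hypothetical helper: print (default) or run the command that turns on the
# host I/O cache for one storage controller of a VM.
# Set RUN=1 in the environment to actually execute VBoxManage.
enable_host_io_cache() {
    vm=$1
    controller=$2
    if [ "${RUN:-0}" = "1" ]; then
        VBoxManage storagectl "$vm" --name "$controller" --hostiocache on
    else
        # Dry run: show the command that would be executed.
        printf 'VBoxManage storagectl "%s" --name "%s" --hostiocache on\n' "$vm" "$controller"
    fi
}

# Dry-run example; "SATA Controller" is a placeholder controller name.
enable_host_io_cache Server5 "SATA Controller"
```

Running it without RUN=1 only prints the command, so it can be reviewed for each guest before anything is changed.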
comment:13 by , 14 years ago
Thanks for getting back to me. I do not have the log as it seems to only keep 3 versions, and I recently deleted and recreated the zpool, so I can not get the old snapshots back...sorry.
I have ticked Host IO Cache for all the guests and restarted them. Will let you know how they go.
Note: I have updated to the latest VBox release, 4.0.4. I have also updated all the guests with the latest guest additions.
Can you outline the likely impact of having this enabled? I read the host IO caching section in chapter 5 and see comments that it may slow down the host immensely, waste memory, etc.
Ie. Is this a temporary test setting for me to identify the issue or a long term setting?
follow-up: 15 comment:14 by , 14 years ago
The impact will not be terrible on ZFS. The setting is for narrowing down the issue to find the root cause.
comment:15 by , 14 years ago
Sorry about the delays in updating - I needed to powercycle the server a few times as I was moving HBA cards etc...
Have now had it up for 7 days and all the vm's are running perfectly. Previously would have seen the issue within 1-3 days...so looking good.
Also, I can not really see any performance issues from running with Host IO cache ticked.
So, looks like you nailed it.
Is there anything else you need from me to debug this?
comment:16 by , 14 years ago
This has been fixed internally and should be available in the next release. Thank you for the report.
comment:17 by , 14 years ago
Brilliant. Thank you to you and the team for resolving this so promptly. Outstanding work.
Looking forward to the next release.
comment:18 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Anyone home at Oracle?
Meanwhile, another VM has crashed. I checked top, and note that it goes to 100% utilisation of the CPU (one cpu is allocated to this VM) when it fails...see process "1418".
Anyone want to suggest something, anything?