VirtualBox

Ticket #6505 (closed defect: worksforme)

Opened 4 years ago

Last modified 4 years ago

Hard hang on OpenSolaris

Reported by: RoyK
Owned by:
Priority: major
Component: other
Version: VirtualBox 3.1.6
Keywords: hang
Cc:
Guest type: Linux
Host type: Solaris

Description

Hi all

I have been using VirtualBox with OpenSolaris 111b (2009.06) for some months, and after first fixing an issue with insufficient memory on the host, I thought that had solved the problems. It seems, however, that memory was not the only cause. At seemingly random times, the whole host OS hangs. This may happen with one guest or with more (though no more than four concurrent guests have been tested on this system). If attached to the console (VGA, keyboard) I can somehow get in contact with the system, but it seems stuck at 100% load and unable to do anything but spin. Interestingly, this problem seems to have become much worse after upgrading to 3.1.6. I had some time lag on guests, so I wanted to try to upgrade. The time lag disappeared, but the hangs are now very common. All guests are Linux machines (32-bit Ubuntu 8.04 LTS Server) using bridged networking and local storage on ZFS. There are no entries as far as I can tell in the OpenSolaris logs, and I would be deeply grateful for a solution to this. It really makes vbox quite unusable for anything but playing, and annoying for that as well.

The box is a SuperMicro box with 4 gigs of RAM and a dual core Intel CPU sitting on two drives in a ZFS mirror. As mentioned, the problems got worse after upgrading to 3.1.6. However, they did not get any better after downgrading back to 3.1.4, so I'm quite lost about how to fix this. I'd deeply appreciate hints on how to solve this.

Best regards

roy

Attachments

VBox.log.2 (72.6 KB) - added by RoyK 4 years ago.
Log showing the hang

Change History

Changed 4 years ago by RoyK

Log showing the hang

comment:1 Changed 4 years ago by klaus

This looks somewhat like a problem with timekeeping in VirtualBox. The following is pure speculation: the more sophisticated power management code in OpenSolaris might expose an issue that leads to the almost 20-minute host freeze visible in the log.

comment:2 Changed 4 years ago by RoyK

Hi all

I can verify that this problem only occurs with VirtualBox. I've been running the server quite hard and it doesn't show instability. With vbox, it may die after only 15-30 minutes. Does anyone know if this might be better with a newer version of OpenSolaris, or should I ditch it for Linux?

roy

comment:3 Changed 4 years ago by ramshankar

Could you please try to reproduce the hang with NAT instead of bridged networking?

comment:4 Changed 4 years ago by RoyK

First of all, why should NAT work better? Also, since this is a server, NAT is not really a good solution.

Second, as the server is in a server farm some 50km from home, and I need to take time off work to get to it, I will try one thing only, which is upgrading to OpenSolaris snv134. If that doesn't work, I won't play around anymore and rather ditch OpenSolaris since Ubuntu Linux works. I really can't waste more time on this error.

PS: The machine is still stable without VMs

roy

comment:5 Changed 4 years ago by ramshankar

  • Priority changed from critical to trivial

If you're reporting an issue that you would like to get fixed, it usually involves eliminating sources of trouble. As for your "why should NAT work better?" question, it's not a question of working better. NAT is userland code while the bridged networking has a kernel driver, which I would first like to eliminate as a possibility of causing this hang that nobody so far has reported.

If you don't have the time or inclination for this, that's fully understandable, as we too have things to work on besides this ticket. Thank you for the report.

comment:6 Changed 4 years ago by RoyK

I understand that workarounds might be a good idea for certain things, but since this server is in production, I really can't spend more time testing. Is there a way to turn on more debugging somewhere? Even if it works with NAT, that's not really a good solution for this setup, and even if it were so, wouldn't it be a good idea to locate and fix bugs like this?

comment:7 Changed 4 years ago by RoyK

Oh, and changing this ticket to 'trivial' is probably the worst arrogance I've seen in a while. It makes the server hang, the whole host, not just a VM. I have a really hard time seeing anything trivial in that.

comment:8 Changed 4 years ago by ramshankar

  • Priority changed from trivial to major

We cannot make progress on this unless there is co-operation from you with providing information/trying a simple VM config change. Hence the change in priority.

There is no possibility of solving a report when you say "I won't play around anymore" or "I really can't waste more time on this error" just a little bit after opening the report.

Since this is a hang and not a panic/reboot, I assume there will be no kernel cores. But there might be something in "/var/adm/messages". Please upload "cat /var/adm/messages | grep -i vbox" output if any.
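For anyone who prefers to script the check, the grep above can be sketched in Python. This is only an illustration of the same case-insensitive filter: the helper names `filter_vbox_lines` and `read_vbox_messages` are made up for this example, and the default path is simply the one quoted in the command above.

```python
import re
from typing import Iterable, List


def filter_vbox_lines(lines: Iterable[str], pattern: str = "vbox") -> List[str]:
    """Return the lines that contain `pattern`, ignoring case (like `grep -i`)."""
    matcher = re.compile(re.escape(pattern), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if matcher.search(line)]


def read_vbox_messages(path: str = "/var/adm/messages") -> List[str]:
    """Read the system log at `path` and keep only the VirtualBox-related lines."""
    # errors="replace" keeps the scan going if the log holds undecodable bytes.
    with open(path, errors="replace") as handle:
        return filter_vbox_lines(handle)


if __name__ == "__main__":
    for entry in read_vbox_messages():
        print(entry)
```

Run on the host, this prints the same lines the grep would; an empty result means the log holds no entries mentioning vbox.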

Also I never said NAT is a solution. For a kernel hang we must begin by eliminating possible components since there are no other indicators here to go by. And as far as I know there is no major change in this area from 3.1.4 to 3.1.6.

comment:9 Changed 4 years ago by ramshankar

  • Status changed from new to closed
  • Resolution set to worksforme

comment:10 Changed 4 years ago by RoyK

This error is the same on all tested versions of VirtualBox on OpenSolaris. Closing it with worksforme is arrogant at best, since the problem still persists. I do however choose not to reopen the bug, since I've left VirtualBox for other solutions because of this error.

All the best

roy

comment:11 Changed 4 years ago by frank

You are very keen to accuse someone of being arrogant, but you did not provide any help in searching for the cause of the problem. There was zero response from your side for 4 months. You still have not clearly answered Ramshankar's question of whether you experience the same hang with NAT, despite the fact that he explained to you why he needs this information. Please be aware that we have only limited resources to provide free support for non-paying users. Thanks for your understanding.

comment:12 Changed 4 years ago by RoyK

I have given all the info I can give, but then, the server is in a server farm outside of town, so staying on the console for long will cost me a bunch of money. The final solution was to install Linux with kvm - that works, and is well supported.

PS: Last I checked, after Oracle took over, VirtualBox wasn't supported for anything 'mission critical', that is, not supported for anything but toys, but then, I may be wrong

roy

comment:13 Changed 4 years ago by frank

Yes, you are 'perhaps' wrong.

comment:14 Changed 4 years ago by RoyK

It'd be nice if you could point me to where this 'perhaps' is explained. I remember that the terms changed when Oracle took over and the click-through said something about 'not for production'.

comment:15 Changed 4 years ago by frank

VirtualBox is definitely used in production environments. I'm not sure if your question was meant sarcastically; I hope not.

comment:16 Changed 4 years ago by RoyK

I know it's used in production, but the click-through said it shouldn't be, which was my point.

comment:17 Changed 4 years ago by sandervl73

And which click-through might that be?
