VirtualBox

Opened 14 years ago

Closed 13 years ago

#6013 closed defect (fixed)

SLES 10 Linux guest hangs -> retry with 3.1.4

Reported by: Predrag
Owned by:
Component: guest smp
Version: VirtualBox 3.1.4
Keywords: SLES guest hangs
Cc:
Guest type: Linux
Host type: Linux

Description

I have a SLES 10 Linux host and two SLES 10 Linux guests. I installed an Oracle 11g DB (DB size 300 GB+) and an application on both (testing environment). During high I/O (e.g. some batch jobs), one of the guests hangs and becomes totally unresponsive. RAM size is 12 GB, and I am using dynamic disks. I also tried allocating all RAM (except 2 GB for the host) to one guest. When I used physical machines, everything worked OK with 4 GB RAM. I attached logs for both VMs. Any solution?

Attachments (6)

test5 logs.rar (37.0 KB ) - added by Predrag 14 years ago.
VB logs.rar (65.3 KB ) - added by Predrag 14 years ago.
VBox.log (40.6 KB ) - added by Predrag 14 years ago.
VBox.log.1 (41.3 KB ) - added by Predrag 14 years ago.
VBox.log.2 (41.3 KB ) - added by Predrag 14 years ago.
VBox.log.3 (41.0 KB ) - added by Predrag 14 years ago.


Change History (30)

by Predrag, 14 years ago

Attachment: test5 logs.rar added

by Predrag, 14 years ago

Attachment: VB logs.rar added

comment:1 by Predrag, 14 years ago

This is becoming very urgent...

in reply to:  1 comment:2 by Predrag, 14 years ago

I converted the dynamic disks to fixed, but the problem still occurs.

comment:3 by Frank Mehnert, 14 years ago

It would be helpful if you could tell us which of the two VMs hangs (which log file). Furthermore, you could check whether the same hang occurs if you decrease the number of guest CPUs to 1.

And what exactly did you do in the guest to provoke the hang? I/O from/to the virtual disk, network I/O and/or I/O over shared folders?

comment:4 by Frank Mehnert, 14 years ago

And: Does the whole VM process hang? If so, forcing a core dump of that VM as described here (Forcing VirtualBox to terminate with a core dump) and sending the core dump to us could help find the problem. Let me know if you have such a core dump and I can tell you a server for uploading the file.

by Predrag, 14 years ago

Attachment: VBox.log added

by Predrag, 14 years ago

Attachment: VBox.log.1 added

by Predrag, 14 years ago

Attachment: VBox.log.2 added

by Predrag, 14 years ago

Attachment: VBox.log.3 added

in reply to:  3 comment:5 by Predrag, 14 years ago

Replying to frank:

It would be helpful if you could tell us which of the two VMs hangs (which log file). Furthermore, you could check whether the same hang occurs if you decrease the number of guest CPUs to 1.

And what exactly did you do in the guest to provoke the hang? I/O from/to the virtual disk, network I/O and/or I/O over shared folders?

Both VMs have hung, and two kinds of high I/O provoke it.
I uploaded logs from the VM that hung this morning, around 9:30. It happened during a backup operation (network I/O). The hang also happened during some batch jobs (I/O from/to the virtual disk). There are no shared folders.
I decreased the number of CPUs from 4 to 1 and will inform you about the result.

in reply to:  4 comment:6 by Predrag, 14 years ago

Replying to frank:

> And: Does the whole VM process hang? If so, forcing a core dump of that VM like described here (Forcing VirtualBox to terminate with a core dump) and sending the core dump to us could help finding the problem. Give me a note if you have such a core dump and I can tell you a server for uploading the file.

When the hang occurs, it happens on one VM; the second one works OK.
I forced a core dump. It is a file of 2 GB+, around 450 MB compressed. Should I upload it, and where?

in reply to:  3 comment:7 by Predrag, 14 years ago

Replying to frank:

It would be helpful if you could tell us which of the two VMs hangs (which log file). Furthermore, you could check whether the same hang occurs if you decrease the number of guest CPUs to 1.

And what exactly did you do in the guest to provoke the hang? I/O from/to the virtual disk, network I/O and/or I/O over shared folders?

It seems that decreasing the number of guest CPUs to 1 works very well (multiprocessing doesn't work). VT-x is still enabled. We are still testing, but the hang didn't occur in situations where it happened with 4 processors. The processor is an Intel Xeon CPU E5405 @ 2.00GHz. Am I wrong, or does this mean that the VM can use only 25% of the CPU?
Maybe I made a mistake with one of the checkboxes in the CPU settings?

comment:8 by Frank Mehnert, 14 years ago

The core dump you sent me is useless because you had set one guest CPU for that VM. Reading your comments above, I assume that the hang only occurs on high I/O with more than one guest CPU enabled.

Such a core dump only makes sense if you take it from a hanging VM session! So if you really want to help debug this problem, set up 4 guest CPUs, make the VM hang with your I/O operations, and then send me the core dump the same way as you already did.

And regarding your last question: On a 4 core host I would never activate 4 guest cores as the virtualization needs some overhead and there are other applications on the host requiring CPU time as well. A better choice would be 3 or 2 cores but nevertheless the guest VM shouldn't hang.
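The rule of thumb above (leave at least one host core free for the host and the virtualization overhead) can be written down as a small helper. This is only a sketch: the function names and the VM name "test6" are placeholders invented here, and the reserve of one core is just the assumption taken from the advice above. `VBoxManage modifyvm <vm> --cpus <n>` is the command-line way to set the guest CPU count while the VM is powered off.

```python
import shlex


def suggested_guest_cpus(host_cores, reserved_for_host=1):
    """Keep `reserved_for_host` cores for the host, but never go below one guest CPU."""
    return max(1, host_cores - reserved_for_host)


def set_cpus_command(vm_name, cpus):
    """Build the VBoxManage argument list that sets the guest CPU count."""
    return ["VBoxManage", "modifyvm", vm_name, "--cpus", str(cpus)]


# Placeholder VM name; the host in this ticket has 4 cores (one Xeon E5405).
cmd = set_cpus_command("test6", suggested_guest_cpus(4))
print(shlex.join(cmd))
```

Reserving two cores instead (`suggested_guest_cpus(4, 2)`) gives the more conservative 2-CPU setup also mentioned above. The command list can be passed to `subprocess.run` on a host where VirtualBox is installed.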

in reply to:  8 ; comment:9 by Predrag, 14 years ago

Replying to frank:

I hope I uploaded a useful core dump file this time, for the 4-CPU VM (still uploading at the time of writing, 1.5 GB). Maybe I made one mistake with the VM log files. After forcing the dump of the hung VM I rebooted it, and I couldn't find the right VM log file for the hanging session, so I uploaded all 3 logs. I sent the file names by email.
It seems that another VM that is set to work with 2 CPUs doesn't work well either, but we are still testing and will also try to provoke a hang.

in reply to:  9 comment:10 by Predrag, 14 years ago

Tested and confirmed that VirtualBox hangs every time during higher I/O load on a VM with more than 1 CPU activated. I also noticed that the system time is inaccurate. On a VM with 1 CPU there is no such problem. Any solution from you?

comment:11 by Frank Mehnert, 14 years ago

Your last core dump was better, but currently there is no solution. The wrong guest time will most probably be fixed in the next VBox maintenance release. For now I suggest using only one guest CPU for that VM. VirtualBox will still benefit from multiple host cores, as the VMM itself and the virtual devices are multithreaded.

in reply to:  11 ; comment:12 by Predrag, 14 years ago

Is the issue related to guest OS (SLES 10.3 64bit)? SM on guest VM is very important to us.

in reply to:  12 comment:13 by Predrag, 14 years ago

SM - I meant on SMP

comment:14 by Sander van Leeuwen, 14 years ago

Component: other → guest smp
Summary: SLES 10 Linux guest hangs → SLES 10 Linux guest hangs -> retry with 3.1.4

Retry with 3.1.4. That version will include an important stability fix for SMP guests.

comment:15 by Sander van Leeuwen, 14 years ago

Please check if 3.1.4 beta 1 solves the problem: http://forums.virtualbox.org/viewtopic.php?f=15&t=27300

in reply to:  15 comment:16 by Predrag, 14 years ago

Replying to sandervl73:

Please check if 3.1.4 beta 1 solves the problem: http://forums.virtualbox.org/viewtopic.php?f=15&t=27300

VirtualBox 3.1.4 beta doesn't solve the problem. The VM hangs in the same way with SMP enabled (2 CPUs), and the system time is more inaccurate than with version 3.1.3.

comment:17 by Frank Mehnert, 14 years ago

In that case, did you really test an unofficial 3.1.3 test build and if yes, which exact build was it?

in reply to:  17 ; comment:18 by Predrag, 14 years ago

Replying to frank:

In that case, did you really test an unofficial 3.1.3 test build and if yes, which exact build was it?

I tested VirtualBox-3.1-3.1.2_56127_sles10.1-1.x86_64. After post from sandervl73 on 2010-01-29 I downloaded and installed
VirtualBox 3.1-3.1.4_BETA1_57050_sles10.1-1.x86_64

in reply to:  18 comment:19 by Predrag, 14 years ago

SMP doesn't work with version 3.1.4 r57640 either.
Also, the guest system time is not accurate (it runs about 1 sec per minute ahead before each sync).
I will upload a core dump file to ftp://ftp.innotek.de/incoming in a few minutes. The file name is core.1246.tar.gz.

comment:20 by Frank Mehnert, 14 years ago

Version: VirtualBox 3.1.2 → VirtualBox 3.1.4

Analyzing the core dump I saw that the E1000 ethernet card waits for the guest to free more network descriptors. One of the guest CPUs is currently executing code, the other is in halt state. This could be a problem with the E1000 network card emulation. Could you test if your guest works better if you change the network card to PCNet (VM network settings / advanced)?
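For reference, the adapter type can also be changed from the command line instead of the GUI. The sketch below only builds the `VBoxManage modifyvm` argument list; it assumes the VM is powered off, that the adapter in question is in slot 1, and that "test6" is the VM name (a placeholder, since the real VM name is not given in this ticket). `Am79C970A` and `Am79C973` are the VBoxManage identifiers for the PCnet-PCI II and PCnet-FAST III adapters.

```python
import shlex

# VBoxManage identifiers for the emulated PCnet adapters.
PCNET_TYPES = {
    "PCnet-PCI II": "Am79C970A",
    "PCnet-FAST III": "Am79C973",
}


def nic_type_command(vm_name, adapter_label, slot=1):
    """Build the VBoxManage argument list that sets the NIC type of one adapter slot."""
    return [
        "VBoxManage", "modifyvm", vm_name,
        f"--nictype{slot}", PCNET_TYPES[adapter_label],
    ]


# "test6" is a placeholder VM name, not taken from the VM configuration.
nic_cmd = nic_type_command("test6", "PCnet-FAST III")
print(shlex.join(nic_cmd))
```

Passing the resulting list to `subprocess.run` on the host applies the change; the guest needs a PCnet driver (`pcnet32` on Linux) to use the new adapter.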

in reply to:  20 ; comment:21 by Predrag, 14 years ago

Replying to frank:

It seems that setting the network card to PCnet_Fast_III solves the hanging problem. With 2 processors, the machine worked under load for 2 days without problems and with much better performance than with 1 CPU.
There's just one problem left: the system time.

Mar 5 09:19:02 test6 ntpdate[31432]: step time server 10.0.0.x offset -0.615278 sec
Mar 5 09:20:01 test6 ntpdate[31484]: step time server 10.0.0.x offset -0.856136 sec
Mar 5 09:21:01 test6 ntpdate[31541]: step time server 10.0.0.x offset -1.297398 sec
Mar 5 09:22:00 test6 ntpdate[31598]: step time server 10.0.0.x offset -2.188469 sec
Mar 5 09:23:02 test6 ntpdate[31654]: step time server 10.0.0.x offset -1.373936 sec
Mar 5 09:24:00 test6 ntpdate[31711]: step time server 10.0.0.x offset -1.646268 sec
Mar 5 09:25:00 test6 ntpdate[31758]: step time server 10.0.0.x offset -1.749338 sec
Mar 5 09:26:01 test6 ntpdate[31810]: step time server 10.0.0.x offset -0.745577 sec
Mar 5 09:27:02 test6 ntpdate[31917]: step time server 10.0.0.x offset -0.839320 sec
Mar 5 09:28:01 test6 ntpdate[31972]: step time server 10.0.0.x offset -0.545628 sec

Sync with the time server is set to 1 minute.
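Because the clock is stepped once a minute, each logged offset is roughly the drift the guest accumulated during the preceding minute. A minimal sketch for summarising the ntpdate lines quoted in this comment is shown below; the regular expression and variable names are illustrative only, and the reading of a negative offset as "guest clock ahead of the server" is an assumption based on the usual ntpdate convention.

```python
import re
from statistics import mean

# The ten ntpdate lines from this comment, copied verbatim.
LOG = """\
Mar 5 09:19:02 test6 ntpdate[31432]: step time server 10.0.0.x offset -0.615278 sec
Mar 5 09:20:01 test6 ntpdate[31484]: step time server 10.0.0.x offset -0.856136 sec
Mar 5 09:21:01 test6 ntpdate[31541]: step time server 10.0.0.x offset -1.297398 sec
Mar 5 09:22:00 test6 ntpdate[31598]: step time server 10.0.0.x offset -2.188469 sec
Mar 5 09:23:02 test6 ntpdate[31654]: step time server 10.0.0.x offset -1.373936 sec
Mar 5 09:24:00 test6 ntpdate[31711]: step time server 10.0.0.x offset -1.646268 sec
Mar 5 09:25:00 test6 ntpdate[31758]: step time server 10.0.0.x offset -1.749338 sec
Mar 5 09:26:01 test6 ntpdate[31810]: step time server 10.0.0.x offset -0.745577 sec
Mar 5 09:27:02 test6 ntpdate[31917]: step time server 10.0.0.x offset -0.839320 sec
Mar 5 09:28:01 test6 ntpdate[31972]: step time server 10.0.0.x offset -0.545628 sec
"""

# Capture the signed offset (in seconds) from each "offset <value> sec" field.
OFFSET_RE = re.compile(r"offset (-?\d+\.\d+) sec")

offsets = [float(m.group(1)) for m in OFFSET_RE.finditer(LOG)]
mean_offset = mean(offsets)
largest_step = min(offsets)  # most negative value, i.e. the largest correction

print(f"{len(offsets)} samples, mean {mean_offset:.3f} s, largest {largest_step:.3f} s")
```

For these ten samples the mean correction is about -1.19 s per one-minute interval, which matches the "about 1 sec per minute" drift reported earlier in the ticket.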

in reply to:  21 comment:22 by Predrag, 14 years ago

Sorry for bad formatting, but it looks OK in Mozilla Firefox

Mar 5 09:19:02 test6 ntpdate[31432]: step time server 10.0.0.x offset -0.615278 sec

Mar 5 09:20:01 test6 ntpdate[31484]: step time server 10.0.0.x offset -0.856136 sec

Mar 5 09:21:01 test6 ntpdate[31541]: step time server 10.0.0.x offset -1.297398 sec

Mar 5 09:22:00 test6 ntpdate[31598]: step time server 10.0.0.x offset -2.188469 sec

Mar 5 09:23:02 test6 ntpdate[31654]: step time server 10.0.0.x offset -1.373936 sec

Mar 5 09:24:00 test6 ntpdate[31711]: step time server 10.0.0.x offset -1.646268 sec

Mar 5 09:25:00 test6 ntpdate[31758]: step time server 10.0.0.x offset -1.749338 sec

Mar 5 09:26:01 test6 ntpdate[31810]: step time server 10.0.0.x offset -0.745577 sec

Mar 5 09:27:02 test6 ntpdate[31917]: step time server 10.0.0.x offset -0.839320 sec

Mar 5 09:28:01 test6 ntpdate[31972]: step time server 10.0.0.x offset -0.545628 sec

comment:23 by Sander van Leeuwen, 14 years ago

Retry with 3.2.10. It contains an SMP performance fix that might apply to your case as well.

comment:24 by Frank Mehnert, 13 years ago

Resolution: fixed
Status: new → closed

No response, closing.
