VirtualBox

Ticket #9029 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

VBox 4.0.8 segfaults under heavy load (segfault at 18 ip b5948f37 sp b6427f50 error 4 in VBoxREM32.so[b5934000+7d000])

Reported by: sergiomacedo
Owned by:
Priority: major
Component: other
Version: VirtualBox 4.0.8
Keywords: segfault VBoxREM32
Cc:
Guest type: Linux
Host type: Linux

Description

I have a host with a Core(TM)2 Quad CPU Q9400 and 4GB RAM, running a guest that uses all four cores and 1.5GB RAM with a bridged interface. Both guest and host are CentOS 4-based, running kernel 2.6.27.45.

During tests with VBox 4.0.x, I'm executing some stress procedures on the guest (the dledford-memtest script, a home-made sequential disk write/read test and an mprime instance for each core) and a "ping -f -s 1500" from the host to the guest.

Usually this procedure runs well for at least 24 hours; however, since VBox 4.0.x we can't keep the machine running under stress for half that time (a test started at 14:57 local time crashed near 23:30).

This may or may not be related, but with VBox 4.0.2 the VM crashed as well, and with VBox 4.0.6 the VM crashed WITHOUT the stress procedure: just leaving it on was enough to get the same segfault (the host is never directly under stress, and when other VMs are running, they aren't under stress either).

The VBox.log is attached but it doesn't show much after boot time. The crash messages are:

VBox 4.0.2: Apr 20 23:56:11 host kernel: VBoxHeadless[14083]: segfault at 18 ip b5d4af37 sp b65d5f50 error 4 in VBoxREM32.so[b5d36000+7d000]

VBox 4.0.2: May 12 04:45:22 host kernel: VBoxHeadless[21710]: segfault at 18 ip b5e10f37 sp b659df50 error 4 in VBoxREM32.so[b5dfc000+7d000]

VBox 4.0.6: May 30 20:16:01 host kernel: VBoxHeadless[2043]: segfault at 18 ip b5948f37 sp b6427f50 error 4 in VBoxREM32.so[b5934000+7d000]

VBox 4.0.8: May 31 23:44:39 host kernel: VBoxHeadless[9942]: segfault at 18 ip b5d9cf37 sp b6620f50 error 4 in VBoxREM32.so[b5d88000+7d000]
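
Editorial sketch (not part of the original report): the four kernel messages above can be compared by subtracting the library's load base from the faulting instruction pointer, which gives the offset of the crash inside VBoxREM32.so regardless of where the library was mapped. The helper names `parse_segfault` and `decode_error` are hypothetical, the regular expression is only an assumption about this particular message layout, and the error-code decoding uses the standard x86 page-fault bits (bit 0 protection, bit 1 write, bit 2 user mode).

```python
import re

# Assumed layout of the x86 Linux "segfault at ..." lines quoted in this ticket.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_error(err):
    """Decode the low three bits of an x86 page-fault error code."""
    mode = "user-mode" if err & 4 else "kernel-mode"
    op = "write" if err & 2 else "read"
    cause = "protection violation" if err & 1 else "not-present page"
    return f"{mode} {op}, {cause}"


def parse_segfault(line):
    """Return the fields of one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "library": m.group("lib"),
        "offset": ip - base,  # position of the faulting instruction in the library
        "access": decode_error(int(m.group("err"), 16)),
    }


# The four messages from this ticket, without the syslog prefix.
LINES = [
    "VBoxHeadless[14083]: segfault at 18 ip b5d4af37 sp b65d5f50 error 4 in VBoxREM32.so[b5d36000+7d000]",
    "VBoxHeadless[21710]: segfault at 18 ip b5e10f37 sp b659df50 error 4 in VBoxREM32.so[b5dfc000+7d000]",
    "VBoxHeadless[2043]: segfault at 18 ip b5948f37 sp b6427f50 error 4 in VBoxREM32.so[b5934000+7d000]",
    "VBoxHeadless[9942]: segfault at 18 ip b5d9cf37 sp b6620f50 error 4 in VBoxREM32.so[b5d88000+7d000]",
]

for line in LINES:
    info = parse_segfault(line)
    print(f"{info['library']}+{info['offset']:#x}  addr={info['fault_addr']:#x}  {info['access']}")
```

For all four lines the offset comes out as 0x14f37 and the fault is a user-mode read at address 0x18, so every crash hits the same instruction with the same bad address.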

If you need something else, please ask soon because I'll have to roll back the VMs to 3.2.12 as soon as possible.

Attachments

VBox.log Download (44.3 KB) - added by sergiomacedo 3 years ago.
VBox.log.gz Download (78.5 KB) - added by sergiomacedo 3 years ago.
VBox.log related to the core files generated on 2011/06/20.

Change History

Changed 3 years ago by sergiomacedo

comment:1 Changed 3 years ago by frank

We could only debug this issue if you could provide a core dump. If you can provide one, please contact me via private E-mail at frank _dot_ mehnert _at_ oracle _dot_ com.

comment:2 Changed 3 years ago by sergiomacedo

Ok.

I configured the environment to allow core dumps and restarted the problematic procedure. If things go well (or badly, as you prefer) I should have the core dump by tomorrow.
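
Editorial sketch (not part of the original comment): the ticket does not say how the environment was configured, so the following is only one possible way to allow core dumps for a headless VM, the programmatic equivalent of `ulimit -c unlimited` in the launching shell. The function names are hypothetical, and depending on the host setup further settings (for example the kernel's `fs.suid_dumpable` sysctl) may also be needed before a core file is actually written.

```python
import os
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit for this process.

    Child processes inherit the limit, so a VM started afterwards from this
    process is allowed to dump core (subject to other host settings).
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def launch_headless(vm_name):
    """Replace the current process with VBoxHeadless for the given VM.

    vm_name is a placeholder; this call does not return on success.
    """
    enable_core_dumps()
    os.execvp("VBoxHeadless", ["VBoxHeadless", "--startvm", vm_name])
```

Calling `launch_headless("VM_NAME")` from a small wrapper script would start the VM with the raised limit in place.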

Thanks for your attention, Frank.

comment:3 Changed 3 years ago by sergiomacedo

Frank, we had a disk failure on the server where the problematic VM is running (one of the members of the RAID1 array returned "invalid CHS 0" but the system continued up and running as expected). After we replaced the disk and rebuilt the array, the VM is up and "burning" apparently without problems.

I'll keep the test running until the end of this week. If nothing happens until then, I'll ask you to close the ticket with a note "Hardware problem" or something.

Thanks for your attention.

comment:4 Changed 3 years ago by frank

Thanks for the additional information!

comment:5 Changed 3 years ago by sergiomacedo

Frank,

it took a little longer but it happened again. Two core files were generated with the event. I'll send the link to them to your private email...

Changed 3 years ago by sergiomacedo

VBox.log related to the core files generated on 2011/06/20.

comment:6 Changed 3 years ago by frank

Thanks for the core dumps. So far I could not get much information out of them except that the crash happens somewhere in the recompiler (which the VBoxREM32 in the segfault message already indicates). To me this really looks like host memory corruption. Note that your guest RAM size is very close to the limit for 32-bit Linux hosts. If you switched to a 64-bit host, your problems might vanish. But if you have some more time, you could do another test:

  • Turn off your VM
  • Do
    VBoxManage setextradata VM_NAME VBoxInternal/PGM/MaxRing3Chunks 4096
    
    (replace VM_NAME with the actual name of your VM)
  • Start your VM as usual
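
Editorial sketch (not part of the original comment): when the steps above need to be repeated across several test runs, the same command can be built from a small script. The helper names are hypothetical. The only VBoxManage behaviour relied on is the `setextradata` / `getextradata` pair, where calling `setextradata` without a value removes the key again.

```python
import subprocess

# Extra-data key named in the comment above.
KEY = "VBoxInternal/PGM/MaxRing3Chunks"


def setextradata_cmd(vm_name, value=None):
    """Build the argv for setting the key; with value=None the key is removed."""
    cmd = ["VBoxManage", "setextradata", vm_name, KEY]
    if value is not None:
        cmd.append(str(value))
    return cmd


def getextradata_cmd(vm_name):
    """Build the argv for reading the key back."""
    return ["VBoxManage", "getextradata", vm_name, KEY]


def run(cmd):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Example (with the VM powered off; VM_NAME is a placeholder):
#   run(setextradata_cmd("VM_NAME", 4096))   # apply the setting
#   print(run(getextradata_cmd("VM_NAME")))  # confirm it was stored
#   run(setextradata_cmd("VM_NAME"))         # remove it after the test
```

Reading the value back before starting the stress run makes it easier to be sure which configuration a given crash or stable period belongs to.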

In VBox 4.0 there is an algorithm that unmaps parts of the guest RAM if the guest requires more than 1GB RAM. This should make it possible to run guests with even more than 2GB on 32-bit hosts, but there might be a bug. The instructions I posted ensure that this new algorithm is disabled. If you get the same stability with VBox 4.0.8 with this setting applied as with VBox 3.2.12, that would be a strong hint about where to search for the problem.

Thanks for your help!

comment:7 Changed 3 years ago by sergiomacedo

Frank,

sorry about the silence. I got stuck on another project and could only now return to this subject.

I'll apply the change and restart the procedures. Last time we got almost one week of stability, so we'll need to wait at least two weeks before we can say the change was effective.

I'll keep you informed.

comment:8 Changed 3 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Most probably also fixed with VBox 4.1.4. Please reopen if still relevant with VBox 4.1.4.
