VirtualBox

Opened 17 years ago

Closed 14 years ago

Last modified 14 years ago

#616 closed defect (fixed)

Assertion failed in sems-linux.cpp(219) => Fixed in SVN/3.0.6

Reported by: freggy
Owned by:
Component: VMM
Version: VirtualBox 3.0.4
Keywords:
Cc:
Guest type: other
Host type: Linux

Description (last modified by Frank Mehnert)

VirtualBox OSE 1.5.0 as included in Mandriva 2008.0 Cooker crashes while installing the Mandriva 2008.0 i586 edition via network on a Mandriva 2008.0 Cooker x86_64 host. This can be found in the logs:

00:16:13.398 !!Assertion Failed!!
00:16:13.398 Expression: i < 4096
00:16:13.411 Location  : /home/mandrake/rpm/BUILD/VirtualBox-1.5.0_OSE/src/VBox/Runtime/r3/linux/sems-linux.cpp(219) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
00:16:13.475 iCur=0x1 pIntEventSem=0000000000a5ccf0

Attachments (5)

VBox.log (27.3 KB ) - added by freggy 17 years ago.
Vbox.log
2.6.30-r5config.rtf (53.4 KB ) - added by Mike Mullen 15 years ago.
2.6.30-R5 Kernel Config- Gentoo
2.6.28.7.config (80.0 KB ) - added by tg2861 15 years ago.
Kernel config on a machine with this issue
2.6.30-r4config (50.8 KB ) - added by Malte Starostik 15 years ago.
Kernel configuration 2.6.30-gentoo-r4
config-2.6.30-gentoo-r6 (64.1 KB ) - added by Nick 15 years ago.
Another Gentoo .config


Change History (81)

by freggy, 17 years ago

Attachment: VBox.log added

Vbox.log

comment:1 by freggy, 17 years ago

Actually this seems to happen when I minimise the guest VMs window in GNOME - Mandriva Cooker 2008.0, x86_64.

comment:2 by freggy, 16 years ago

It seems like this problem still exists in 1.5.4. I just had the same crash on Mandriva Cooker x86_64 (Linux 2.6.24-rc6) with Virtualbox 1.5.4:

1193:59:47.780 !!Assertion Failed!!
1193:59:47.780 Expression: i < 4096
1193:59:47.780 Location  : /home/mandrake/rpm/BUILD/VirtualBox-1.5.4_OSE/src/VBox/Runtime/r3/linux/sems-linux.cpp(219) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
1193:59:47.810 iCur=0x1 pIntEventSem=00000000009cb000

The crash mentioned in http://vbox.innotek.de/pipermail/vbox-dev/2007-November/000394.html seems to be the same problem too.

comment:3 by freggy, 16 years ago

crash in a code block to improve readability:

1193:59:47.780 
1193:59:47.780 !!Assertion Failed!!
1193:59:47.780 Expression: i < 4096
1193:59:47.780 Location  : /home/mandrake/rpm/BUILD/VirtualBox-1.5.4_OSE/src/VBox/Runtime/r3/linux/sems-linux.cpp(219) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
1193:59:47.810 iCur=0x1 pIntEventSem=00000000009cb000

comment:4 by benjamin9999, 16 years ago

i had this same assert. 1.5.4-binary on linux 2.6.24-rc8 running win2k3 guest. this same box happens to run vmware-server 1.0.3.

comment:5 by freggy, 16 years ago

This still happens very often in Mandriva Cooker 2008.1 (Linux 2.6.24 - Glibc 2.7 - x86_64) and it makes Virtualbox unusable for production use. Can anybody finally take a look at this please?

comment:6 by blueyed, 16 years ago

The bug has been reported for VirtualBox 1.5.6 on Ubuntu at https://launchpad.net/bugs/206615. The host is Ubuntu 8.04 AMD64 (beta), the guest is Windows XP, and it seems to happen after leaving the machine running/idle for a while.

The bug in Launchpad (https://launchpad.net/bugs/206615), provides additional debugging information, like a stacktrace.

comment:7 by Frank Mehnert, 16 years ago

Just to keep you up-to-date: This is a known issue. Still no fix available.

comment:8 by Matteo Pillon, 16 years ago

I think this bug is related with preemptivity enabled in kernel... I compiled a kernel without preemptivity and it disappeared. Just a day of machine uptime, I'll send another report later ;)

comment:9 by Matteo Pillon, 16 years ago

No, it's not preemptivity, it crashes less, but still aborting...

comment:10 by Frank Mehnert, 16 years ago

Description: modified (diff)

comment:11 by Sander van Leeuwen, 16 years ago

priority: major → critical

comment:12 by Sander van Leeuwen, 16 years ago

Version: VirtualBox 1.5.0 → VirtualBox 1.6.2

comment:15 by Sander van Leeuwen, 16 years ago

Similar reports in tickets 1733 and 1746.

comment:16 by Frank Mehnert, 16 years ago

Host type: other → Linux

comment:17 by Frank Mehnert, 16 years ago

Component: other → VMM

comment:18 by Aaron Freed, 16 years ago

Happens under Ubuntu Hardy Heron 64-bit edition. Core 2 Duo Penryn at 2.5 GHz, VirtualBox 1.5.6OSE as supplied as a package with Ubuntu Hardy Heron 8.04 repos.

Any other information required, please just ask! I'd sure like to know if/when this gets fixed.

comment:19 by tomcrummey, 16 years ago

I seem to be having a similar issue as described here.

Host OS is CentOS 5.2, kernel 2.6.18-92.1.10.el5.
Guest is Windows Vista SP1 32-bit.

VM is aborted. Log message:

04:06:26.848
04:06:26.848 !!Assertion Failed!!
04:06:26.848 Expression: i < 4096
04:06:26.848 Location  : /home/vbox/vbox-1.6/src/VBox/Runtime/r3/linux/semevent-linux.cpp(186) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
04:06:26.848 iCur=0x1 pThis=000000000815f390

The abort seems to happen when the screensaver on the host kicks in.

Full log available if required.

comment:20 by tomcrummey, 16 years ago

I forgot to put the VirtualBox version number in. It's 1.6.4.

comment:21 by Tim Broberg, 16 years ago

Just saw this under 1.6.6.

!!Assertion Failed!!
Expression: i < 4096
Location  : /home/vbox/vbox-1.6.6/src/VBox/Runtime/r3/linux/semevent-linux.cpp(186) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
iCur=0x1 pThis=00000000016e6e50

Running from VBoxHeadless on a Dell Precision 370 under Fedora 8 amd_64 host os and a Fedora 8 x86 guest (actually 3 Fedora guests, an XP guest, and a Win 2008 server guest in a test network). I left it pinging overnight with the failed VM acting as a gateway using host interface networking so the VM host could see all the network connections. One connection was bridged internally, the other went out a physical ethernet device.

comment:22 by Frank Mehnert, 16 years ago

Version: VirtualBox 1.6.2 → VirtualBox 2.0.0

comment:23 by raxyx, 15 years ago

Just to remind you: the problem still exists in 2.0.2.
Host: Debian Lenny 64bit
Guests: Debian Lenny 32bit and WinXP 32bit
Hardware: amd64 x2 4800+
Using the official Debian virtualbox-2.0 package

My Debian VMs are meant to be servers, manually started via the standard GUI, and then basically running idle in background somewhere
some with X installed, some without, all with bridged networking
They keep crashing with logs like this:

Executable: /usr/lib/virtualbox/VirtualBox
Arg[0]: /usr/lib/virtualbox/VirtualBox
Arg[1]: -comment
Arg[2]: Debian Lenny Postgresql
Arg[3]: -startvm
Arg[4]: 47f95c56-9d6b-419c-d7b3-f4cda9a2b8a4

!!Assertion Failed!!
Expression: i < 4096
Location  : /home/vbox/vbox-2.0.2/src/VBox/Runtime/r3/linux/semevent-linux.cpp(188) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
iCur=0x1 pThis=00000000010416b0

comment:24 by Frank Mehnert, 15 years ago

We are aware of that problem, and yes, it is annoying. Unfortunately, even the next release expected soon will not have a fix for this problem. We will completely overhaul the NAT network stack and this will fix that problem as well. We hope that the new stack will be available this year.

comment:25 by Stefan de Konink, 15 years ago

Is this a nat only problem? In that case I'll just disable my NAT network card and use bridging only.

in reply to:  25 comment:26 by M. Schinkel, 15 years ago

Replying to Skinkie:

Is this a nat only problem? In that case I'll just disable my NAT network card and use bridging only.

I have the same problem here with Virtualbox 2.0.2 Binary on x86-64 linux but I don't use NAT.

!!Assertion Failed!!
Expression: i < 4096
Location  : /home2/vbox/vbox/lnx64-rel/src/VBox/Runtime/r3/linux/semevent-linux.cpp(188) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
iCur=0x1 pThis=00007f568002cfe0
Trace/breakpoint trap

in reply to:  24 comment:27 by M. Schinkel, 15 years ago

Replying to frank:

We are aware of that problem, and yes, it is annoying. Unfortunately, even the next release expected soon will not have a fix for this problem. We will completely overhaul the NAT network stack and this will fix that problem as well. We hope that the new stack will be available this year.

Unfortunately this bug renders VirtualBox useless because we cannot rely on the VMs without watching them all the time (or doing some kind of automatic restart). This bug also applies to usage of host only network adapters which are added to a bridge. Is there a chance that this usage case gets fixed even before the new nat stack is merged?

in reply to:  24 comment:28 by M. Schinkel, 15 years ago

Replying to frank:

We are aware of that problem, and yes, it is annoying. Unfortunately, even the next release expected soon will not have a fix for this problem. We will completely overhaul the NAT network stack and this will fix that problem as well. We hope that the new stack will be available this year.

There is also a forum thread here: http://forums.virtualbox.org/viewtopic.php?t=2794

in reply to:  8 comment:29 by M. Schinkel, 15 years ago

Replying to pmatthew:

I think this bug is related with preemptivity enabled in kernel... I compiled a kernel without preemptivity and it disappeared. Just a day of machine uptime, I'll send another report later ;)

The crash occurs regardless of preemption type (on/voluntarily/off). I verified this with kernel 2.6.27.3.

comment:30 by bilbo, 15 years ago

I encountered same problem ...

Is there any chance at least for some quick temporary workaround before the complicated permanent fix? VM crashing every 4 hours or so isn't exactly the best thing ...

comment:31 by Alex, 15 years ago

I have the same trouble.
Host machine: Fedora 9 (x64)
Guest: Windows XP SP3
Last lines in the VBox.log:

00:12:05.821 NAT: DHCP offered IP address 10.0.2.15
00:12:05.823 NAT: DHCP offered IP address 10.0.2.15
00:12:05.834 PCNet#0: Init: ss32=1 GCRDRA=0x021f9420[64] GCTDRA=0x021f9020[64]
00:14:52.879 
00:14:52.879 !!Assertion Failed!!
00:14:52.879 Expression: i < 4096
00:14:52.879 Location  : /home/vbox/vbox-2.0.4/src/VBox/Runtime/r3/linux/semevent-linux.cpp(188) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
00:14:52.899 iCur=0x1 pThis=00007f432c02c8e0

comment:32 by Alex, 15 years ago

Addition to my previous post: I forgot to mention my version. I'm using 2.0.4 (linux64).

comment:33 by Sander van Leeuwen, 15 years ago

Version: VirtualBox 2.0.0 → VirtualBox 2.0.4

comment:34 by Frank Mehnert, 15 years ago

This annoying bug is not fixed as we are still not able to reproduce it. It happens only on Linux/64 hosts. We would appreciate any hint on how to reproduce this assertion. And no, this bug has nothing (at least not directly) to do with NAT. If one of the reporters could generate a core dump, that would help as well.

comment:35 by ebini, 15 years ago

Hi,

i have the same problem here. FYI: I'm not using NAT. I'm using hostinterfaces.

Host is 64-bit Linux (Ubuntu 8.10). The guests are also Linux (Ubuntu, CentOS).

and i have a coredump. (zipped about 70MB)

comment:36 by Frank Mehnert, 15 years ago

Could you make it somehow available to me (frank _dot_ mehnert _at_ sun _dot_ com)? Please don't forget to tell which package you are using.

comment:37 by Ciro Iriarte, 15 years ago

Hi, I'm using VB 2.0.4 on OpenSUSE 11.0@x86_64, and the same machine is crashing from time to time. The machine uses a host interface (bridging).

00:57:56.326 
00:57:56.326 !!Assertion Failed!!
00:57:56.326 Expression: i < 4096
00:57:56.326 Location  : /home/vbox/vbox-2.0.4/src/VBox/Runtime/r3/linux/semevent-linux.cpp(188) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
00:57:56.327 iCur=0x1 pThis=00007f48a004ccc0

comment:38 by Frank Mehnert, 15 years ago

Resolution: fixed
Status: new → closed

2.0.6 should fix that problem. Note that the fix currently only works for .deb/.rpm packages for distributions with glibc >= 2.6 (e.g. Ubuntu 7.10 / Hardy or later, Fedora 7 or later, ...). The .run packages are compiled for rhel4 and do not contain the fix. I will close that bug anyway.

in reply to:  38 ; comment:39 by M. Schinkel, 15 years ago

Resolution: fixed
Status: closed → reopened

Replying to frank:

2.0.6 should fix that problem. Note that the fix currently only works for .deb/.rpm packages for distributions with glibc >= 2.6 (e.g. Ubuntu 7.10 / Hardy or later, Fedora 7 or later, ...). The .run packages are compiled for rhel4 and do not contain the fix. I will close that bug anyway

Could you please compile the .run package on a newer system? On Gentoo (which seems to use the .run package in the app-emulation/virtualbox-bin ebuild) the bug still exists.

in reply to:  39 comment:40 by M. Schinkel, 15 years ago

Replying to schinkelm:

Replying to frank:

2.0.6 should fix that problem. Note that the fix currently only works for .deb/.rpm packages for distributions with glibc >= 2.6 (e.g. Ubuntu 7.10 / Hardy or later, Fedora 7 or later, ...). The .run packages are compiled for rhel4 and do not contain the fix. I will close that bug anyway

Could you please compile the .run package on a newer system? On Gentoo (which seems to use the .run package in the app-emulation/virtualbox-bin ebuild) the bug still exists.

I commented on the new ebuild here: http://bugs.gentoo.org/show_bug.cgi?id=248776#c11

in reply to:  39 ; comment:41 by amdg, 15 years ago

Replying to schinkelm:

Replying to frank:

2.0.6 should fix that problem. Note that the fix currently only works for .deb/.rpm packages for distributions with glibc >= 2.6 (e.g. Ubuntu 7.10 / Hardy or later, Fedora 7 or later, ...). The .run packages are compiled for rhel4 and do not contain the fix. I will close that bug anyway

Could you please compile the .run package on a newer system? On Gentoo (which seems to use the .run package in the app-emulation/virtualbox-bin ebuild) the bug still exists.

Seconding this. I'm running Gentoo on amd64 and I still see the bug (but so far, it has only happened when more than one VM is running).

in reply to:  41 comment:42 by M. Schinkel, 15 years ago

Replying to amdg:

Replying to schinkelm:

Replying to frank:

2.0.6 should fix that problem. Note that the fix currently only works for .deb/.rpm packages for distributions with glibc >= 2.6 (e.g. Ubuntu 7.10 / Hardy or later, Fedora 7 or later, ...). The .run packages are compiled for rhel4 and do not contain the fix. I will close that bug anyway

Could you please compile the .run package on a newer system? On Gentoo (which seems to use the .run package in the app-emulation/virtualbox-bin ebuild) the bug still exists.

Seconding this. I'm running Gentoo on amd64 and I still see the bug (but so far, it has only happened when more than one VM is running).

I currently run only one VM and have seen the problem on high network loads.

comment:43 by pkerwien, 15 years ago

I'm also seeing this with virtualbox-bin-2.1.4 on Gentoo amd64:

00:10:13.615 PCNet#0: Init: ss32=1 GCRDRA=0x0f9c7000[32] GCTDRA=0x0f934000[16]
01:38:05.274
01:38:05.274 !!Assertion Failed!!
01:38:05.274 Expression: i < 4096
01:38:05.274 Location  : /home/vbox/tinderbox/2.1-lnx64-rel/src/VBox/Runtime/r3/linux/semevent-linux.cpp(203) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
01:38:05.310 iCur=0x1 pThis=00000000023b2090

The guest is running Debian 5.0 i386 with a host network interface. This happens just after a few minutes when I access the webserver running on the virtual machine.

comment:44 by Frank Mehnert, 15 years ago

The problem with Gentoo is still that our .run installer is built against a libc < 2.6.

in reply to:  44 comment:45 by Malte Starostik, 15 years ago

Replying to frank:

The problem with Gentoo is still that our .run installer is built against a libc < 2.6.

So if that has been known for several months now, what exactly is the reason for linking the package with such an ancient libc? And if it's about compatibility, why not provide an alternative package that fixes this very annoying bug? I simply can't run more than one VM at a time which kind of undermines my attempts to test some different network setups :-( Thanks in advance for fixing!

comment:46 by Bram Duvigneau, 15 years ago

Same bug with Vbox 2.2.2 (closed source edition) on ArchLinux.

comment:47 by Artem Alupov, 15 years ago

I have the same bug with VBox 2.2.2 on Debian 4.0 :( It crashes reliably every two days...

Kernel: 2.6.26-bpo.1-amd64

628:36:14.255 !!Assertion Failed!!
628:36:14.255 Expression: i < 4096
628:36:14.255 Location  : /home/vbox/vbox-2.2.2/src/VBox/Runtime/r3/linux/semevent-linux.cpp(203) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
628:36:14.264 iCur=0x1 pThis=0000000001040330

Do you have a solution for this problem?

comment:48 by Frank Mehnert, 15 years ago

Debian/Etch uses a libc < 2.6, therefore we have to use our re-implementation of this event semaphore, which is obviously buggy. No idea why; contributions are welcome. If you upgraded to Debian/Lenny the problem would go away ...

comment:49 by Frank Mehnert, 15 years ago

An easy scenario how to trigger this bug as quick as possible would be helpful.

comment:50 by Kirill, 15 years ago

Same bug with VirtualBox 2.2.4 binary on Gentoo x86_64.

comment:51 by Frank Mehnert, 15 years ago

I want to repeat: An easy scenario how to trigger this bug as quick as possible would be helpful.

comment:52 by Malte Starostik, 15 years ago

Well...all I have to do is start two or more VMs and all but one of them will sooner or later die with the assertion failure, usually within five minutes. I've seen it happen with 32-bit WinXP and Win2k3 guests, I can try with other combinations if you want. Basically it's impossible to run more than one guest at a time. Host info: Gentoo Linux (x86_64 on a Core 2 Duo E6750 2.66GHz with 6GB RAM), kernel 2.6.28, glibc 2.8_p20080602-r1, VirtualBox 2.2.2 (haven't tested with 2.2.4 yet, but as Zer0COOL suggest it's still there).

comment:53 by Nick, 15 years ago

I'm still seeing this with VirtualBox 3.0.2 on a Gentoo Linux x86_64 host running kernel 2.6.30 (with Gentoo patches) & glibc 2.10.1. The problem has occurred on OpenSolaris (x86_64), Fedora 11 (x86), & Ubuntu 9.04 (x86) guests. I'm on a C2D P8400 with VT-x, PAE, 3D accel, & Nested Paging enabled. It's occurred both with and without IO-APIC enabled, and generally happens to me during the OS install phase. I do have one Windows 7 x86_64 guest which didn't run into that issue, but it did require IO-APIC enabled to install.

comment:54 by tg2861, 15 years ago

Confirmed the same issue with 3.0.4.

4GB of memory, dual Opteron CPUs. Software based RAID 1 mirrored SATA drives shared by 3 Windows server guests (Win 2k3 and Win 2k8).

Basically, if all 3 start sometime between a few hours and a few days this will occur. More than 1 guest appears to be the trigger. Each of the guests are legitimate servers (Exchange 2007, AD domain controller, and Symantec SEP console) -- each of these can have substantial I/O bursts (sometimes concurrently).

I have another identical system that only runs 1 VM at a time and it has gone 6+ mos without an issue.

comment:55 by Frank Mehnert, 15 years ago

Version: VirtualBox 2.0.4 → VirtualBox 3.0.4

comment:56 by Mike Mullen, 15 years ago

I can also confirm this issue on 3.0.4.

share VirtualBox #
!!Assertion Failed!!
Expression: i < 4096
Location  : /home/vbox/tinderbox/3.0-lnx64-rel/src/VBox/Runtime/r3/linux/semevent-linux.cpp(203) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
iCur=0x1 pThis=000000000099bfe0

[5]- Trace/breakpoint trap ./VBoxHeadless --startvm "Windows2008-Server2" --vrdpport 3387

share VirtualBox # uname -a
Linux share 2.6.30-gentoo-r6 #1 SMP Tue Sep 1 03:42:32 CDT 2009 x86_64 AMD Processor model unknown AuthenticAMD GNU/Linux

Happens when running more than 1 VM. Athlon X2 3.0. 8 GB ram.

in reply to:  56 comment:57 by Mike Mullen, 15 years ago

Very very easy to repeat this bug when installing Windows 2008 concurrently, 2 installs did it for me 3-4 times.

comment:58 by renanbirck, 15 years ago

I can reproduce this systematically here by doing anything like copying a file to the VM. I'm struggling to get SP3 installed on my Windows XP VM, because with every heavy I/O it aborts!

This is the only VM I have here. VirtualBox 3.0.4. Core 2 Duo T5550, 2GB of RAM, Arch Linux, kernel 2.6.30

comment:59 by Frank Mehnert, 15 years ago

renanbirck, copying a file over the NAT network? Could you attach a VBox.log file of such a crashed session? I have done wget in three concurrent running VMs but was still not able to reproduce this problem.

comment:60 by Frank Mehnert, 15 years ago

I have more and more the feeling that some special Linux kernel configuration is required to trigger this bug. Since I have neither ArchLinux nor Gentoo installed here, could one of you who is experiencing this bug attach the configuration of their host Linux kernel here?

by Mike Mullen, 15 years ago

Attachment: 2.6.30-r5config.rtf added

2.6.30-R5 Kernel Config- Gentoo

in reply to:  60 comment:61 by Mike Mullen, 15 years ago

Replying to frank:

I have more and more the feeling that some special Linux kernel configuration is required to trigger this bug. Since I have neither ArchLinux nor Gentoo installed here, could one of you who is experiencing this bug attach the configuration of their host Linux kernel here?

Attached. Hopefully some others can attach theirs so we can compare.

by tg2861, 15 years ago

Attachment: 2.6.28.7.config added

Kernel config on a machine with this issue

by Malte Starostik, 15 years ago

Attachment: 2.6.30-r4config added

Kernel configuration 2.6.30-gentoo-r4

comment:62 by tg2861, 15 years ago

Another config uploaded

comment:63 by Malte Starostik, 15 years ago

Me Too (TM)

by Nick, 15 years ago

Attachment: config-2.6.30-gentoo-r6 added

Another Gentoo .config

comment:64 by Nick, 15 years ago

I was going through all the .configs posted thus far, and one thing standing out is that they're all SMP machines (so perhaps some threading issues are present). I ran several older releases of VirtualBox on my previous laptop with a Pentium-M also running Gentoo (several kernels all the way up to 2.6.28), but I never saw the issue.

comment:65 by Frank Mehnert, 15 years ago

Yes, I'm using an SMP box as well (T9550 @ 2.66GHz). Yesterday I used a 2.6.30.5 kernel with the adapted config file by tg2861 -- the build ran rock solid for hours doing wget guest=>host and wget host=>guest in parallel. I did similar experiments with a Pentium-D @ 3GHz. Are you guys using a CPU with hyperthreading?

comment:66 by tg2861, 15 years ago

I'm running dual Opterons.

CPUInfo says:

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 246 HE
stepping : 10
cpu MHz : 1992.244
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow rep_good
bogomips : 3984.48
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 246 HE
stepping : 10
cpu MHz : 1992.244
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow rep_good
bogomips : 3984.72
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

comment:67 by Malte Starostik, 15 years ago

I tried again now. Created four fresh VMs, everything default. Started a simultaneous installation of XP Pro SP3 (32 bit) in all of them from a CD image. Two machines died in the text setup phase while "Setup is copying files...". The other two finished installing. After the first login I copied the contents of the installation CD to My Documents, which is when the third machine died. On a second run with the same setup, the first VM went away right after setup started to load drivers, and the second and third ones followed suit in the same phase. Again, the fourth one went on running. I re-ran the test with different Windows variants and Linux live CDs, and every time all but one VM sooner or later hit the assertion - usually sooner.

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz
stepping : 11
cpu MHz : 1998.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow vnmi flexpriority
bogomips : 5320.64
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz
stepping : 11
cpu MHz : 1998.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow vnmi flexpriority
bogomips : 5319.97
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

comment:68 by Frank Mehnert, 15 years ago

Finally that was a good test case, was able to reproduce the assertion now. However, that does not necessarily mean that the bug can easily be fixed ...

comment:69 by Frank Mehnert, 15 years ago

Well, we think we finally fixed this problem. For your convenience I've uploaded a new 3.0.6 .run package for Linux/AMD64. This new package is still not linked from the download page (note the different build number 52130 versus 52128). This package only differs in this semaphore fix. Any feedback is welcome. If it works for you then we will probably update the other affected packages as well (Debian 4.0, RHEL5, sles10.1) and change the links.

comment:70 by Frank Mehnert, 15 years ago

Summary: Assertion failed in sems-linux.cpp(219) → Assertion failed in sems-linux.cpp(219) => Fixed in SVN

comment:71 by Malte Starostik, 15 years ago

Looks very promising indeed, this build survived all my torturing so far :-) Thanks a lot for looking into this!

comment:72 by tg2861, 15 years ago

Fantastic! I'll get it installed this evening. I can't recall ever getting more than a week with all 3 of my VMs running; I'll post updates.

Thanks

comment:73 by Frank Mehnert, 15 years ago

Summary: Assertion failed in sems-linux.cpp(219) => Fixed in SVN → Assertion failed in sems-linux.cpp(219) => Fixed in SVN/3.0.6

Marked as fixed in 3.0.6 because I've replaced the packages on the download server and on the webpage. Replaced all affected packages (rhel5-amd64, Debian/Etch-amd64, SLES10.1-amd64, Linux/.run-amd64).

comment:74 by Nick, 15 years ago

Frank, any chance we can get a little more details on the fix? I submitted a Gentoo bug (285228) to get Portage updated, but downstream would appreciate a little more info (& notification).

comment:75 by Frank Mehnert, 15 years ago

Sure (and thanks btw for notifying the Gentoo people). The fixes are contained in the changesets r22950, r22952, r22953, r22954, r22955, r22956, r22957, r22958, r22959. As written above, the reason for this problem was our own implementation of an event semaphore. Older libcs (version < 2.6) contain a bug in the 64-bit futex code, which is why we have to use our own implementation there. For newer Linux distributions we used the generic implementation (Runtime/r3/posix/semevent-posix.cpp). But as we are building our generic Linux package on RHEL4 (to be compatible with a lot of older Linux distributions), the generic package contained our own implementation and therefore this bug.

The problem was that the signalling thread was responsible for adjusting the number of waiting threads. This number was used to determine whether a thread which executes RTSemEventSignal() actually has to wake up another thread or whether there are no threads sleeping. If the signalling thread was preempted just after it woke up a waiting thread, it could take some time until the signalling thread was running again (especially if the system load is very high). The following happened: One thread A was leaving a critical section with RTSemEventSignal(). Another thread B was waiting in RTSemEventWait() and was woken up by A. A was preempted before it could adjust the number of waiting threads, nWaiters. B continued to run and eventually left the critical section with RTSemEventSignal(). Because nWaiters was still not adjusted, B tried to wake up a waiting thread -- but there was no thread waiting; A just had no chance to adjust nWaiters. B was now looping and waiting for some time, but as the system load was very high, it took a long time until A was scheduled again. So the general problem was that A had to adjust nWaiters. You can browse the fixed code to see how the problem is solved.

comment:76 by Nick, 15 years ago

Thanks for the detailed response; Portage has been updated to include the new build.

comment:77 by tg2861, 14 years ago

I can confirm that this patch corrected my problems. I've been running for almost 2 weeks and all 3 VMs running on my dual Opteron system are running -- far longer than I'd ever been able to keep all 3 up.

Thanks for taking care of this!

comment:78 by Frank Mehnert, 14 years ago

Resolution: fixed
Status: reopened → closed