#616 closed defect (fixed)
Assertion failed in sems-linux.cpp(219) => Fixed in SVN/3.0.6
Reported by: | freggy | Owned by: | |
---|---|---|---|
Component: | VMM | Version: | VirtualBox 3.0.4 |
Keywords: | Cc: | ||
Guest type: | other | Host type: | Linux |
Description
Virtualbox OSE 1.5.0 as included in Mandriva 2008.0 Cooker crashes while installing the Mandriva 2008.0 i586 edition via network on Mandriva 2008.0 Cooker x86_64. This can be found in the logs:
    00:16:13.398 !!Assertion Failed!!
    00:16:13.398 Expression: i < 4096
    00:16:13.411 Location : /home/mandrake/rpm/BUILD/VirtualBox-1.5.0_OSE/src/VBox/Runtime/r3/linux/sems-linux.cpp(219) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
    00:16:13.475 iCur=0x1 pIntEventSem=0000000000a5ccf0
Change History (81)
comment:1 by , 17 years ago
Actually this seems to happen when I minimise the guest VM's window in GNOME - Mandriva Cooker 2008.0, x86_64.
comment:2 by , 17 years ago
It seems like this problem still exists in 1.5.4. I just had the same crash on Mandriva Cooker x86_64 (Linux 2.6.24-rc6) with Virtualbox 1.5.4:
    1193:59:47.780 !!Assertion Failed!!
    1193:59:47.780 Expression: i < 4096
    1193:59:47.780 Location : /home/mandrake/rpm/BUILD/VirtualBox-1.5.4_OSE/src/VBox/Runtime/r3/linux/sems-linux.cpp(219) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
    1193:59:47.810 iCur=0x1 pIntEventSem=00000000009cb000
The crash mentioned in http://vbox.innotek.de/pipermail/vbox-dev/2007-November/000394.html seems to be the same problem too.
comment:3 by , 17 years ago
crash in a code block to improve readability:
    1193:59:47.780
    1193:59:47.780 !!Assertion Failed!!
    1193:59:47.780 Expression: i < 4096
    1193:59:47.780 Location : /home/mandrake/rpm/BUILD/VirtualBox-1.5.4_OSE/src/VBox/Runtime/r3/linux/sems-linux.cpp(219) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
    1193:59:47.810 iCur=0x1 pIntEventSem=00000000009cb000
comment:4 by , 17 years ago
I had this same assert with the 1.5.4 binary on Linux 2.6.24-rc8, running a Win2k3 guest. This same box also happens to run vmware-server 1.0.3.
comment:5 by , 17 years ago
This still happens very often in Mandriva Cooker 2008.1 (Linux 2.6.24 - Glibc 2.7 - x86_64) and it makes Virtualbox unusable for production use. Can anybody finally take a look at this please?
comment:6 by , 17 years ago
The bug has been reported for VirtualBox 1.5.6 on Ubuntu at https://launchpad.net/bugs/206615. The host is Ubuntu 8.04 AMD64 (beta), the guest is Windows XP, and it seems to happen after leaving the machine running/idle for a while.
The bug in Launchpad (https://launchpad.net/bugs/206615) provides additional debugging information, such as a stack trace.
comment:7 by , 17 years ago
Just to keep you up-to-date: This is a known issue. Still no fix available.
follow-up: 29 comment:8 by , 17 years ago
I think this bug is related to preemption being enabled in the kernel... I compiled a kernel without preemption and it disappeared. That is after just a day of machine uptime; I'll send another report later ;)
comment:10 by , 17 years ago
Description: | modified (diff) |
---|
comment:11 by , 17 years ago
priority: | major → critical |
---|
comment:12 by , 17 years ago
Version: | VirtualBox 1.5.0 → VirtualBox 1.6.2 |
---|
comment:16 by , 17 years ago
Host type: | other → Linux |
---|
comment:17 by , 17 years ago
Component: | other → VMM |
---|
comment:18 by , 17 years ago
Happens under Ubuntu Hardy Heron 64-bit edition. Core 2 Duo Penryn at 2.5 GHz, VirtualBox 1.5.6 OSE as packaged in the Ubuntu Hardy Heron 8.04 repos.
Any other information required, please just ask! I'd sure like to know if/when this gets fixed.
comment:19 by , 16 years ago
I seem to be having a similar issue as described here.
Host OS is CentOS 5.2, kernel 2.6.18-92.1.10.el5. Guest is Windows Vista SP1 32-bit.
VM is aborted. Log message:
    04:06:26.848
    04:06:26.848 !!Assertion Failed!!
    04:06:26.848 Expression: i < 4096
    04:06:26.848 Location : /home/vbox/vbox-1.6/src/VBox/Runtime/r3/linux/semevent-linux.cpp(186) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
    04:06:26.848 iCur=0x1 pThis=000000000815f390
The abort seems to happen when the screensaver on the host kicks in.
Full log available if required.
comment:21 by , 16 years ago
Just saw this under 1.6.6.
    !!Assertion Failed!!
    Expression: i < 4096
    Location : /home/vbox/vbox-1.6.6/src/VBox/Runtime/r3/linux/semevent-linux.cpp(186) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
    iCur=0x1 pThis=00000000016e6e50
Running from VBoxHeadless on a Dell Precision 370 under Fedora 8 amd_64 host os and a Fedora 8 x86 guest (actually 3 Fedora guests, an XP guest, and a Win 2008 server guest in a test network). I left it pinging overnight with the failed VM acting as a gateway using host interface networking so the VM host could see all the network connections. One connection was bridged internally, the other went out a physical ethernet device.
comment:22 by , 16 years ago
Version: | VirtualBox 1.6.2 → VirtualBox 2.0.0 |
---|
comment:23 by , 16 years ago
Just to remind you: the problem still exists in 2.0.2.
Host: Debian Lenny 64bit
Guests: Debian Lenny 32bit and WinXP 32bit
Hardware: amd64 x2 4800+
Using the official Debian virtualbox-2.0 package
My Debian VMs are meant to be servers, manually started via the standard GUI, and then basically running idle in background somewhere
some with X installed, some without, all with bridged networking
They keep crashing with logs like this:
    Executable: /usr/lib/virtualbox/VirtualBox
    Arg[0]: /usr/lib/virtualbox/VirtualBox
    Arg[1]: -comment
    Arg[2]: Debian Lenny Postgresql
    Arg[3]: -startvm
    Arg[4]: 47f95c56-9d6b-419c-d7b3-f4cda9a2b8a4
    !!Assertion Failed!!
    Expression: i < 4096
    Location : /home/vbox/vbox-2.0.2/src/VBox/Runtime/r3/linux/semevent-linux.cpp(188) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
    iCur=0x1 pThis=00000000010416b0
follow-ups: 27 28 comment:24 by , 16 years ago
We are aware of that problem, and yes, it is annoying. Unfortunately, even the next release expected soon will not have a fix for this problem. We will completely overhaul the NAT network stack and this will fix that problem as well. We hope that the new stack will be available this year.
follow-up: 26 comment:25 by , 16 years ago
Is this a nat only problem? In that case I'll just disable my NAT network card and use bridging only.
comment:26 by , 16 years ago
Replying to Skinkie:
Is this a nat only problem? In that case I'll just disable my NAT network card and use bridging only.
I have the same problem here with Virtualbox 2.0.2 Binary on x86-64 linux but I don't use NAT.
    !!Assertion Failed!!
    Expression: i < 4096
    Location : /home2/vbox/vbox/lnx64-rel/src/VBox/Runtime/r3/linux/semevent-linux.cpp(188) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
    iCur=0x1 pThis=00007f568002cfe0
    Trace/breakpoint trap
comment:27 by , 16 years ago
Replying to frank:
We are aware of that problem, and yes, it is annoying. Unfortunately, even the next release expected soon will not have a fix for this problem. We will completely overhaul the NAT network stack and this will fix that problem as well. We hope that the new stack will be available this year.
Unfortunately this bug renders VirtualBox useless because we cannot rely on the VMs without watching them all the time (or doing some kind of automatic restart). This bug also applies to usage of host only network adapters which are added to a bridge. Is there a chance that this usage case gets fixed even before the new nat stack is merged?
comment:28 by , 16 years ago
Replying to frank:
We are aware of that problem, and yes, it is annoying. Unfortunately, even the next release expected soon will not have a fix for this problem. We will completely overhaul the NAT network stack and this will fix that problem as well. We hope that the new stack will be available this year.
There is also a forum thread here: http://forums.virtualbox.org/viewtopic.php?t=2794
comment:29 by , 16 years ago
Replying to pmatthew:
I think this bug is related to preemption being enabled in the kernel... I compiled a kernel without preemption and it disappeared. That is after just a day of machine uptime; I'll send another report later ;)
The crash occurs regardless of preemption type (on/voluntarily/off). I verified this with kernel 2.6.27.3.
comment:30 by , 16 years ago
I encountered same problem ...
Is there any chance at least for some quick temporary workaround before the complicated permanent fix? VM crashing every 4 hours or so isn't exactly the best thing ...
comment:31 by , 16 years ago
I have the same trouble. Host machine: Fedora 9 (x64). Guest: Windows XP SP3. Last lines in the VBox.log:
    00:12:05.821 NAT: DHCP offered IP address 10.0.2.15
    00:12:05.823 NAT: DHCP offered IP address 10.0.2.15
    00:12:05.834 PCNet#0: Init: ss32=1 GCRDRA=0x021f9420[64] GCTDRA=0x021f9020[64]
    00:14:52.879
    00:14:52.879 !!Assertion Failed!!
    00:14:52.879 Expression: i < 4096
    00:14:52.879 Location : /home/vbox/vbox-2.0.4/src/VBox/Runtime/r3/linux/semevent-linux.cpp(188) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
    00:14:52.899 iCur=0x1 pThis=00007f432c02c8e0
comment:32 by , 16 years ago
Addition to my previous post: I forgot to mention my version. I am using 2.0.4 (linux64).
comment:33 by , 16 years ago
Version: | VirtualBox 2.0.0 → VirtualBox 2.0.4 |
---|
comment:34 by , 16 years ago
This annoying bug is not fixed, as we are still not able to reproduce it. It happens only on Linux/64 hosts. We would appreciate any hint on how to reproduce this assertion. And no, this bug has nothing (at least not directly) to do with NAT. If one of the reporters could generate a core dump, that would help as well.
comment:35 by , 16 years ago
Hi,
I have the same problem here. FYI: I'm not using NAT. I'm using host interfaces.
Host is 64-bit Linux (Ubuntu 8.10). Guest is also Linux (Ubuntu, CentOS).
And I have a core dump (zipped, about 70MB).
comment:36 by , 16 years ago
Could you make it somehow available to me (frank _dot_ mehnert _at_ sun _dot_ com)? Please don't forget to tell which package you are using.
comment:37 by , 16 years ago
Hi, I'm using VB 2.0.4 on OpenSUSE 11.0 (x86_64), and the same machine is crashing from time to time. The machine uses a host interface (bridging).
    00:57:56.326
    00:57:56.326 !!Assertion Failed!!
    00:57:56.326 Expression: i < 4096
    00:57:56.326 Location : /home/vbox/vbox-2.0.4/src/VBox/Runtime/r3/linux/semevent-linux.cpp(188) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
    00:57:56.327 iCur=0x1 pThis=00007f48a004ccc0
follow-up: 39 comment:38 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
2.0.6 should fix that problem. Note that the fix currently only works for .deb/.rpm packages for distributions with glibc >= 2.6 (e.g. Ubuntu 7.10 / Hardy or later, Fedora 7 or later, ...). The .run packages are compiled for rhel4 and do not contain the fix. I will close that bug anyway.
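The glibc >= 2.6 threshold above is a numeric comparison per version component, which matters once minor versions reach two digits (2.10 is newer than 2.6, although it sorts lower as a string). A minimal sketch of such a check follows. `glibcAtLeast` is a hypothetical helper, not part of VirtualBox or its build system. Going by the comment above, what counts is the glibc of the system the package was compiled on, so the string passed in would be that of the build machine, not of the host running the VM.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (not VirtualBox code): returns true if a glibc version
// string such as "2.10.1" denotes at least major.minor.
// The comparison is numeric per component, so "2.10" counts as newer than "2.6".
bool glibcAtLeast(const char *version, int major, int minor) {
    int vMajor = 0;
    int vMinor = 0;
    // Only the first two numeric components are read; suffixes such as
    // ".1" or "_p20080602-r1" are ignored.
    if (std::sscanf(version, "%d.%d", &vMajor, &vMinor) != 2)
        return false;  // unparseable strings are treated as "too old"
    return vMajor > major || (vMajor == major && vMinor >= minor);
}
```

For example, `glibcAtLeast("2.10.1", 2, 6)` is true and `glibcAtLeast("2.5", 2, 6)` is false. On a glibc system, the version string of the running library is available from `gnu_get_libc_version()` in `<gnu/libc-version.h>`.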
follow-ups: 40 41 comment:39 by , 16 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Replying to frank:
2.0.6 should fix that problem. Note that the fix currently only works for .deb/.rpm packages for distributions with glibc >= 2.6 (e.g. Ubuntu 7.10 / Hardy or later, Fedora 7 or later, ...). The .run packages are compiled for rhel4 and do not contain the fix. I will close that bug anyway
Could you please compile the .run package on a newer system? On Gentoo (which seems to use the .run package in the app-emulation/virtualbox-bin ebuild) the bug still exists.
comment:40 by , 16 years ago
Replying to schinkelm:
Replying to frank:
2.0.6 should fix that problem. Note that the fix currently only works for .deb/.rpm packages for distributions with glibc >= 2.6 (e.g. Ubuntu 7.10 / Hardy or later, Fedora 7 or later, ...). The .run packages are compiled for rhel4 and do not contain the fix. I will close that bug anyway
Could you please compile the .run package on a newer system? On Gentoo (which seems to use the .run package in the app-emulation/virtualbox-bin ebuild) the bug still exists.
I commented on the new ebuild here: http://bugs.gentoo.org/show_bug.cgi?id=248776#c11
follow-up: 42 comment:41 by , 16 years ago
Replying to schinkelm:
Replying to frank:
2.0.6 should fix that problem. Note that the fix currently only works for .deb/.rpm packages for distributions with glibc >= 2.6 (e.g. Ubuntu 7.10 / Hardy or later, Fedora 7 or later, ...). The .run packages are compiled for rhel4 and do not contain the fix. I will close that bug anyway
Could you please compile the .run package on a newer system? On Gentoo (which seems to use the .run package in the app-emulation/virtualbox-bin ebuild) the bug still exists.
Seconding this. I'm running Gentoo on amd64 and I still see the bug (but so far, it has only happened when more than one VM is running).
comment:42 by , 16 years ago
Replying to amdg:
Replying to schinkelm:
Replying to frank:
2.0.6 should fix that problem. Note that the fix currently only works for .deb/.rpm packages for distributions with glibc >= 2.6 (e.g. Ubuntu 7.10 / Hardy or later, Fedora 7 or later, ...). The .run packages are compiled for rhel4 and do not contain the fix. I will close that bug anyway
Could you please compile the .run package on a newer system? On Gentoo (which seems to use the .run package in the app-emulation/virtualbox-bin ebuild) the bug still exists.
Seconding this. I'm running Gentoo on amd64 and I still see the bug (but so far, it has only happened when more than one VM is running).
I currently run only one VM and have seen the problem on high network loads.
comment:43 by , 16 years ago
I'm also seeing this with virtualbox-bin-2.1.4 on Gentoo amd64:
    00:10:13.615 PCNet#0: Init: ss32=1 GCRDRA=0x0f9c7000[32] GCTDRA=0x0f934000[16]
    01:38:05.274
    01:38:05.274 !!Assertion Failed!!
    01:38:05.274 Expression: i < 4096
    01:38:05.274 Location : /home/vbox/tinderbox/2.1-lnx64-rel/src/VBox/Runtime/r3/linux/semevent-linux.cpp(203) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
    01:38:05.310 iCur=0x1 pThis=00000000023b2090
The guest is running Debian 5.0 i386 with a host network interface. This happens just after a few minutes when I access the webserver running on the virtual machine.
follow-up: 45 comment:44 by , 16 years ago
The problem with Gentoo is still that our .run installer is built against a libc < 2.6.
comment:45 by , 16 years ago
Replying to frank:
The problem with Gentoo is still that our .run installer is built against a libc < 2.6.
So if that has been known for several months now, what exactly is the reason for linking the package with such an ancient libc? And if it's about compatibility, why not provide an alternative package that fixes this very annoying bug? I simply can't run more than one VM at a time which kind of undermines my attempts to test some different network setups :-( Thanks in advance for fixing!
comment:47 by , 16 years ago
I have the same bug with VBox 2.2.2 on Debian 4.0 :( It crashes reliably every two days...
Kernel 2.6.26-bpo.1-amd64
    628:36:14.255 !!Assertion Failed!!
    628:36:14.255 Expression: i < 4096
    628:36:14.255 Location : /home/vbox/vbox-2.2.2/src/VBox/Runtime/r3/linux/semevent-linux.cpp(203) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
    628:36:14.264 iCur=0x1 pThis=0000000001040330
Do you have a solution for this problem?
comment:48 by , 16 years ago
Debian/Etch uses a libc < 2.6, therefore we have to use our re-implementation of this event semaphore, which is obviously buggy. No idea why, contributions are welcome. If you upgraded to Debian/Lenny, the problem would go away ...
comment:49 by , 16 years ago
An easy scenario for triggering this bug as quickly as possible would be helpful.
comment:51 by , 16 years ago
I want to repeat: An easy scenario for triggering this bug as quickly as possible would be helpful.
comment:52 by , 16 years ago
Well...all I have to do is start two or more VMs and all but one of them will sooner or later die with the assertion failure, usually within five minutes. I've seen it happen with 32-bit WinXP and Win2k3 guests, I can try with other combinations if you want. Basically it's impossible to run more than one guest at a time. Host info: Gentoo Linux (x86_64 on a Core 2 Duo E6750 2.66GHz with 6GB RAM), kernel 2.6.28, glibc 2.8_p20080602-r1, VirtualBox 2.2.2 (haven't tested with 2.2.4 yet, but as Zer0COOL suggests it's still there).
comment:53 by , 15 years ago
I'm still seeing this with VirtualBox 3.0.2 on a Gentoo Linux x86_64 host running kernel 2.6.30 (with Gentoo patches) & glibc 2.10.1. The problem has occurred on OpenSolaris (x86_64), Fedora 11 (x86), & Ubuntu 9.04 (x86) guests. I'm on a C2D P8400 with VT-x, PAE, 3D accel, & Nested Paging enabled. It's occurred both with and without IO-APIC enabled, and generally happens to me during the OS install phase. I do have one Windows 7 x86_64 guest which didn't run into that issue, but it did require IO-APIC enabled to install.
comment:54 by , 15 years ago
Confirmed the same issue with 3.0.4.
4GB of memory, dual Opteron CPUs. Software based RAID 1 mirrored SATA drives shared by 3 Windows server guests (Win 2k3 and Win 2k8).
Basically, if all 3 are started, this will occur sometime between a few hours and a few days later. More than 1 guest appears to be the trigger. Each of the guests is a legitimate server (Exchange 2007, AD domain controller, and Symantec SEP console) -- each of these can have substantial I/O bursts (sometimes concurrently).
I have another identical system that only runs 1 VM at a time and it has gone 6+ mos without an issue.
comment:55 by , 15 years ago
Version: | VirtualBox 2.0.4 → VirtualBox 3.0.4 |
---|
follow-up: 57 comment:56 by , 15 years ago
I can also confirm this issue on 3.0.4.
    share VirtualBox # !!Assertion Failed!!
    Expression: i < 4096
    Location : /home/vbox/tinderbox/3.0-lnx64-rel/src/VBox/Runtime/r3/linux/semevent-linux.cpp(203) int RTSemEventSignal(RTSEMEVENTINTERNAL*)
    iCur=0x1 pThis=000000000099bfe0

    [5]- Trace/breakpoint trap ./VBoxHeadless --startvm "Windows2008-Server2" --vrdpport 3387

    share VirtualBox # uname -a
    Linux share 2.6.30-gentoo-r6 #1 SMP Tue Sep 1 03:42:32 CDT 2009 x86_64 AMD Processor model unknown AuthenticAMD GNU/Linux
Happens when running more than 1 VM. Athlon X2 3.0. 8 GB ram.
comment:57 by , 15 years ago
Very very easy to repeat this bug when installing Windows 2008 concurrently, 2 installs did it for me 3-4 times.
comment:58 by , 15 years ago
I can reproduce this systematically here by doing anything like copying a file to the VM. I'm struggling to get SP3 installed on my Windows XP VM, because with every heavy I/O it aborts!
This is the only VM I have here. VirtualBox 3.0.4. Core 2 Duo T5550, 2GB of RAM, Arch Linux, kernel 2.6.30
comment:59 by , 15 years ago
renanbirck, copying a file over the NAT network? Could you attach a VBox.log file of such a crashed session? I have done wget in three concurrent running VMs but was still not able to reproduce this problem.
follow-up: 61 comment:60 by , 15 years ago
I have more and more the feeling that some special Linux kernel configuration is required to trigger this bug. Since I have neither ArchLinux nor Gentoo installed here, could one of you who is experiencing this bug attach the configuration of his host Linux kernel here?
comment:61 by , 15 years ago
Replying to frank:
I have more and more the feeling that some special Linux kernel configuration is required to trigger this bug. Since I have neither ArchLinux nor Gentoo installed here, could one of you who is experiencing this bug attach the configuration of his host Linux kernel here?
Attached. Hopefully others can attach theirs to compare.
comment:64 by , 15 years ago
I was going through all the .configs posted thus far, and one thing standing out is they're all SMP machines (so perhaps some threading issues are present). I ran several older releases of VirtualBox on my previous laptop with a Pentium-M also running Gentoo (several kernels all the way up to 2.6.28), but I never saw the issue.
comment:65 by , 15 years ago
Yes, I'm using an SMP box as well (T9550 @ 2.66GHz). Yesterday I used a 2.6.30.5 kernel with the adapted config file by tg2861 -- the build ran rock solid for hours doing wget guest=>host and wget host=>guest in parallel. Did similar experiments with a Pentium-D @ 3GHz. Are you guys using a CPU with hyperthreading?
comment:66 by , 15 years ago
I'm running dual Opterons.
CPUInfo says:
    processor : 0
    vendor_id : AuthenticAMD
    cpu family : 15
    model : 5
    model name : AMD Opteron(tm) Processor 246 HE
    stepping : 10
    cpu MHz : 1992.244
    cache size : 1024 KB
    fpu : yes
    fpu_exception : yes
    cpuid level : 1
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow rep_good
    bogomips : 3984.48
    TLB size : 1024 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 40 bits physical, 48 bits virtual
    power management: ts fid vid ttp

    processor : 1
    vendor_id : AuthenticAMD
    cpu family : 15
    model : 5
    model name : AMD Opteron(tm) Processor 246 HE
    stepping : 10
    cpu MHz : 1992.244
    cache size : 1024 KB
    fpu : yes
    fpu_exception : yes
    cpuid level : 1
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow rep_good
    bogomips : 3984.72
    TLB size : 1024 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 40 bits physical, 48 bits virtual
    power management: ts fid vid ttp
comment:67 by , 15 years ago
I tried again now. Created four fresh VMs, everything default. Started a simultaneous installation of XP Pro SP3 (32 bit) in all of them from a CD image. Two machines died in the text setup phase while "Setup is copying files...". The other two finished installing. After the first login I copied the contents of the installation CD to My Documents, which is when the third machine died. On a second run with the same setup, the first VM went away right after setup started to load drivers, the second and third one followed suit in the same phase. Again, the fourth one went on running. I re-ran the test with different Windows variants and Linux live CDs and every time all but one VM sooner or later hit the assertion - usually sooner.
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 15
    model name : Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz
    stepping : 11
    cpu MHz : 1998.000
    cache size : 4096 KB
    physical id : 0
    siblings : 2
    core id : 0
    cpu cores : 2
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 10
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow vnmi flexpriority
    bogomips : 5320.64
    clflush size : 64
    cache_alignment : 64
    address sizes : 36 bits physical, 48 bits virtual
    power management:

    processor : 1
    vendor_id : GenuineIntel
    cpu family : 6
    model : 15
    model name : Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz
    stepping : 11
    cpu MHz : 1998.000
    cache size : 4096 KB
    physical id : 0
    siblings : 2
    core id : 1
    cpu cores : 2
    apicid : 1
    initial apicid : 1
    fpu : yes
    fpu_exception : yes
    cpuid level : 10
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow vnmi flexpriority
    bogomips : 5319.97
    clflush size : 64
    cache_alignment : 64
    address sizes : 36 bits physical, 48 bits virtual
    power management:
comment:68 by , 15 years ago
Finally that was a good test case, was able to reproduce the assertion now. However, that does not necessarily mean that the bug can easily be fixed ...
comment:69 by , 15 years ago
Well, we think we finally fixed this problem. For your convenience I've uploaded a new 3.0.6 .run package for Linux/AMD64. This new package is still not linked from the download page (note the different build number 52130 versus 52128). This package only differs in this semaphore fix. Any feedback is welcome. If it works for you then we will probably update the other affected packages as well (Debian 4.0, RHEL5, sles10.1) and change the links.
comment:70 by , 15 years ago
Summary: | Assertion failed in sems-linux.cpp(219) → Assertion failed in sems-linux.cpp(219) => Fixed in SVN |
---|
comment:71 by , 15 years ago
Looks very promising indeed, this build survived all my torturing so far :-) Thanks a lot for looking into this!
comment:72 by , 15 years ago
Fantastic! I'll get it installed this evening. I can't recall ever getting more than a week with all 3 of my VMs running; I'll post updates.
Thanks
comment:73 by , 15 years ago
Summary: | Assertion failed in sems-linux.cpp(219) => Fixed in SVN → Assertion failed in sems-linux.cpp(219) => Fixed in SVN/3.0.6 |
---|
Marked as fixed in 3.0.6 because I've replaced the packages on the download server and on the web page. Replaced all affected packages (rhel5-amd64, Debian/Etch-amd64, SLES12-amd64, Linux/.run-amd64).
comment:74 by , 15 years ago
Frank, any chance we can get a few more details on the fix? I submitted a Gentoo bug (285228) to get Portage updated, but downstream would appreciate a little more info (& notification).
comment:75 by , 15 years ago
Sure (and thanks btw for notifying the Gentoo people). The fixes are contained in the changesets r22950, r22952, r22953, r22954, r22955, r22956, r22957, r22958, r22959. As written above, the reason for this problem was our own implementation of an event semaphore. Older LibCs (version < 2.6) contain a bug in the 64-bit futex code. So for newer Linux distributions we used the generic implementation (Runtime/r3/posix/semevent-posix.cpp). But as we are building our generic Linux package on RHEL4 (to be compatible with a lot of older Linux distributions), the generic package contained our own implementation and therefore this bug.
The problem was that the signalling thread was responsible for adjusting the number of waiting threads. This number was used to determine whether a thread which executes RTSemEventSignal() actually has to wake up another thread or whether there are no threads sleeping. If the signalling thread was preempted just after it woke up a waiting thread, it could take some time until it was running again (especially if the system load is very high). The following happened: One thread A was leaving a critical section with RTSemEventSignal(). Another thread B was waiting in RTSemEventWait() and was woken up by A. A was preempted before it could adjust the number of waiting threads nWaiters. B continued to run and eventually left the critical section with RTSemEventSignal(). Because nWaiters was still not adjusted, B tried to wake up a waiting thread -- but there was no thread waiting, A just had had no chance to adjust nWaiters. B was now looping and waiting for some time, but as the system load was very high, it took a long time until A was scheduled again. So the general problem was that A had to adjust nWaiters. You can browse the fixed code to see how the problem is solved.
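The interleaving described above can be replayed in a small, single-threaded toy model. This is only a sketch: `ToyEventSem`, `beginWait`, `wakeOne`, `signallerAdjust`, `wakeOneWaiterAdjusts`, `spinCount` and `replay` are hypothetical names, not VirtualBox code. Letting the woken thread fix the count itself is just one conceivable way to close the window, and not necessarily what the changesets listed above do. The cap of 4096 mirrors the `i < 4096` expression from the assertion; that the assertion guards a retry loop of this kind is an assumption.

```cpp
#include <cassert>

// Toy model of the bookkeeping only; no real threads or futexes are involved.
struct ToyEventSem {
    int nWaiters = 0;  // count a signaller consults before waking anyone
    int sleeping = 0;  // threads that are really blocked in the wait call
};

// A thread starts waiting: it registers itself and goes to sleep.
void beginWait(ToyEventSem &s) { ++s.nWaiters; ++s.sleeping; }

// Scheme from the description, step 1: the signaller wakes one sleeper...
void wakeOne(ToyEventSem &s) { if (s.sleeping > 0) --s.sleeping; }
// ...step 2: the signaller adjusts nWaiters, possibly much later.
void signallerAdjust(ToyEventSem &s) { --s.nWaiters; }

// Assumed alternative: the count is corrected as part of the wake-up itself,
// so no window exists in which nWaiters is stale.
void wakeOneWaiterAdjusts(ToyEventSem &s) {
    if (s.sleeping > 0) { --s.sleeping; --s.nWaiters; }
}

// Number of retries a signaller spends while it believes someone waits
// but nobody is actually asleep, capped at `limit`.
int spinCount(const ToyEventSem &s, int limit) {
    int i = 0;
    while (s.nWaiters > 0 && s.sleeping == 0 && i < limit) ++i;
    return i;
}

// Replays: B waits, A signals and is preempted, then B signals.
// Returns true if B's signal runs into the retry limit.
bool replay(bool waiterAdjusts) {
    ToyEventSem s;
    beginWait(s);                              // B sleeps in the wait call
    if (waiterAdjusts) wakeOneWaiterAdjusts(s);
    else wakeOne(s);                           // A wakes B, then is preempted
    bool stuck = spinCount(s, 4096) == 4096;   // B now signals
    if (!waiterAdjusts) signallerAdjust(s);    // A runs again, too late for B
    return stuck;
}
```

In this model `replay(false)` returns true, because B finds `nWaiters` still at 1 with nobody asleep, while `replay(true)` returns false.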
comment:76 by , 15 years ago
Thanks for the detailed response; portage has been updated to include the new build.
comment:77 by , 15 years ago
I can confirm that this patch corrected my problems. I've been running for almost 2 weeks and all 3 VMs running on my dual Opteron system are running -- far longer than I'd ever been able to keep all 3 up.
Thanks for taking care of this!
comment:78 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |