Opened 16 years ago
Closed 15 years ago
#4433 closed defect (fixed)
SMP bugs in guest
Reported by: | Costin Grigoras | Owned by: | |
---|---|---|---|
Component: | guest smp | Version: | VirtualBox 3.0.0 |
Keywords: | Cc: | ||
Guest type: | Linux | Host type: | Linux |
Description
The guests run kernel 2.6.29.3; one is 32-bit, the other 64-bit. No message is logged in VBox.log. Here are some samples; I'll put more info in separate files.
Eeek! page_mapcount(page) went negative! (-1)
page pfn = 66f89 page->flags = 8000087c page->count = 2 page->mapping = f219f50c vma->vm_ops = 0x0
kernel BUG at mm/rmap.c:725! invalid opcode: 0000 #1 SMP
kernel BUG at arch/x86/mm/highmem_32.c:87! invalid opcode: 0000 #12 SMP last sysfs file: /sys/class/vc/vcs5/dev
BUG: unable to handle kernel paging request at ffffffde IP: [<c015710a>] buffered_rmqueue+0x125/0x211 *pde = 00008067 *pte = 00000000 Oops: 0000 #1 SMP
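To compare guests, it can help to count how often each kind of failure shows up in a guest's kernel log. The following is only a minimal sketch: the signature names and the `tally_signatures` helper are hypothetical, and the patterns are derived solely from the sample messages quoted above, so they will not cover every oops format.

```python
import re
from collections import Counter

# Hypothetical signature table; patterns are based only on the samples in this ticket.
SIGNATURES = {
    "kernel BUG": re.compile(r"kernel BUG at (\S+?):(\d+)!"),
    "paging request": re.compile(
        r"BUG: unable to handle kernel paging request at [0-9a-f]+"
    ),
    "mapcount": re.compile(r"page_mapcount\(page\) went negative"),
    "general protection fault": re.compile(r"general protection fault"),
}


def tally_signatures(lines):
    """Count occurrences of each known failure signature in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            # A flattened log line may hold several reports, so count every match.
            hits = len(pattern.findall(line))
            if hits:
                counts[name] += hits
    return dict(counts)


if __name__ == "__main__":
    sample = [
        "Eeek! page_mapcount(page) went negative! (-1)",
        "kernel BUG at mm/rmap.c:725! invalid opcode: 0000 #1 SMP",
        "kernel BUG at arch/x86/mm/highmem_32.c:87! invalid opcode: 0000 #12 SMP",
        "BUG: unable to handle kernel paging request at ffffffde IP: [<c015710a>]",
    ]
    print(tally_signatures(sample))
```

Running it over the log of each guest gives a rough per-guest tally that can be set against the number of virtual CPUs and the load.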
Attachments (5)
Change History (12)
by , 16 years ago
Attachment: | messages.32bit.log added |
---|
by , 16 years ago
Attachment: | VBox.32bit.log added |
---|
by , 16 years ago
Attachment: | VBox.64bit.log added |
---|
comment:1 by , 16 years ago
Another one just happened while running "du" on a large directory structure in the guest. du segfaulted while the kernel logged:
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/class/vc/vcs6/dev
CPU 3
Modules linked in: ipv6 parport_pc lp parport vboxvfs autofs4 sunrpc dm_mirror dm_region_hash dm_log dm_mod button battery ac i2c_piix4 i2c_core vboxadd e1000 floppy ext3 jbd ata_piix libata sd_mod scsi_mod [last unloaded: x_tables]
Pid: 16486, comm: du Tainted: G W 2.6.28.3 #1
RIP: 0010:[<ffffffff80296b49>] [<ffffffff80296b49>] do_lookup+0xcd/0x1c6
RSP: 0018:ffff880068bf5ca8 EFLAGS: 00010282
RAX: ffffffffa009d620 RBX: fffffffffffffff4 RCX: ffff88010833a3f0
RDX: ffff880068bf5e08 RSI: ffff88007f948380 RDI: ffff880097cf3288
RBP: ffff88007f948380 R08: 0000000000000007 R09: 0000000000000007
R10: 6f6f620000000000 R11: ffffffff80327ed9 R12: ffff880097cf3288
R13: ffff880097cf3358 R14: ffff880068bf5e08 R15: ffff880068bf5d38
FS: 00007f30a896b6e0(0000) GS:ffff88010ec05d00(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000084c0c8 CR3: 00000000db878000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process du (pid: 16486, threadinfo ffff880068bf4000, task ffff880104cfc0a0)
Stack:
ffffffffa009c61f ffff88010dc5ff00 ffff880068bf5d48 0000000000000000
ffff880068bf5e08 ffff880097cf3288 ffff880068bf5d98 00000000ffffff9c
ffff880068bf5d48 ffffffff802974e1 0000004468bf5da8 0000000000000003
Call Trace:
[<ffffffffa009c61f>] ? ext3_check_acl+0x0/0x53 [ext3]
[<ffffffff802974e1>] ? __link_path_walk+0x89f/0xc93
[<ffffffff8029791e>] ? path_walk+0x49/0x8e
[<ffffffff80297a73>] ? do_path_lookup+0x110/0x164
[<ffffffff80297e9f>] ? user_path_at+0x48/0x79
[<ffffffff8032b417>] ? _raw_spin_lock+0x61/0xfa
[<ffffffff8032b565>] ? _raw_spin_unlock+0x86/0x89
[<ffffffff80292adc>] ? vfs_lstat_fd+0x15/0x3f
[<ffffffff80292e48>] ? sys_newlstat+0x19/0x31
[<ffffffff8048da37>] ? _spin_lock_irqsave+0x22/0x2b
[<ffffffff8032b565>] ? _raw_spin_unlock+0x86/0x89
[<ffffffff8048dd1a>] ? error_exit+0x0/0x70
[<ffffffff8020b31a>] ? system_call_fastpath+0x16/0x1b
Code: ff ff ff 75 3e 48 89 ef 4c 89 fe b3 f4 e8 b3 73 00 00 48 85 c0 48 89 c5 74 29 49 8b 84 24 30 01 00 00 4c 89 f2 48 89 ee 4c 89 e7 <ff> 50 08 48 85 c0 48 89 c3 74 0a 48 89 ef e8 66 67 00 00 eb 03
RIP [<ffffffff80296b49>] do_lookup+0xcd/0x1c6
RSP <ffff880068bf5ca8>
---[ end trace 4eaa2a86a8e2da22 ]---
comment:2 by , 16 years ago
Changing the kernel doesn't help; here is another one with 2.6.31-rc2. I don't know whether it's related, but I've seen another kernel bug logged at about the same time by another guest on the same host. The first message is from a 32-bit 2.6.31-rc2 guest, the second from a 64-bit 2.6.28.3 guest, with 4 CPUs each. Both machines had been busy compiling for a long time.
I've also left two other guests with a single CPU each, and since then I've seen no more complaints from them, even under load. Two other guests with 4 CPUs each but very low load didn't show these problems.
BUG: unable to handle kernel paging request at c102f631
IP: [<c102f631>] sys_exit_group+0x0/0x10
*pde = 369ed063 *pte = 0102f161
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/irq
Modules linked in: ipv6 vboxvfs vboxadd autofs4 hidp rfcomm l2cap bluetooth rfkill sunrpc dm_mirror dm_multipath video output battery lp nvram ac button parport_pc parport pcspkr i2c_piix4 e1000 floppy dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Pid: 20687, comm: true Not tainted (2.6.31-rc2 #1) VirtualBox
EIP: 0060:[<c102f631>] EFLAGS: 00010283 CPU: 0
EIP is at sys_exit_group+0x0/0x10
EAX: 000000fc EBX: 00000000 ECX: 00c34186 EDX: 3280cc36
ESI: 45453280 EDI: 45453280 EBP: f2d1f000 ESP: f2d1ffb0
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process true (pid: 20687, ti=f2d1f000 task=f6301270 task.ti=f2d1f000)
Stack:
c1002ed4 00000000 00000004 00000000 45453280 45453280 bf9b5018 000000fc
<0> 0000007b 0000007b 00000000 00000000 000000fc ffffe424 00000073 00000246
<0> bf9b4fec 0000007b 00000000 00000000
Call Trace:
[<c1002ed4>] ? sysenter_do_call+0x12/0x28
Code: ff ff 89 c2 8d 80 58 01 00 00 39 82 58 01 00 00 74 e9 eb be 89 73 34 c7 43 44 08 00 00 00 64 a1 00 30 41 c1 e8 41 7e 00 00 eb c9 <0f> b6 44 24 04 c1 e0 08 e8 6e ff ff ff 31 c0 c3 0f b6 44 24 04
EIP: [<c102f631>] sys_exit_group+0x0/0x10 SS:ESP 0068:f2d1ffb0
CR2: 00000000c102f631
---[ end trace 0a336e01786aae2f ]---
------------[ cut here ]------------
kernel BUG at mm/mmap.c:2125!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/irq
CPU 0
Modules linked in: nfs lockd nfs_acl ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_multipath rfkill input_polldev battery lp floppy ac button parport_pc parport e1000 i2c_piix4 i2c_core pcspkr dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Pid: 28333, comm: config.guess Tainted: G W 2.6.28.3 #1
RIP: 0010:[<ffffffff80278de8>] [<ffffffff80278de8>] exit_mmap+0x112/0x11d
RSP: 0018:ffff8800df97de68 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff880028034460 RCX: ffffffff80278ccf
RDX: ffff88010303bbe0 RSI: ffff88003d9f5200 RDI: 0000000000000246
RBP: ffff88010303bb80 R08: 0000000000000000 R09: ffff8800281015c0
R10: ffff880008485738 R11: ffffffff802f559a R12: 0000000000000000
R13: ffff88010303bbe0 R14: 0000000000000000 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffffffff80686000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000003661c03080 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process config.guess (pid: 28333, threadinfo ffff8800df97c000, task ffff88000756cc60)
Stack:
00000000000000a5 ffff880028034460 ffff88010303bb80 ffff88010303bc28
ffff88000756cc60 ffffffff8023219a 0000000000000296 ffff88000756d110
ffff88010303bb80 ffffffff80235a20 0000000000000000 ffffffff802606a0
Call Trace:
[<ffffffff8023219a>] ? mmput+0x40/0xbd
[<ffffffff80235a20>] ? exit_mm+0xfa/0x105
[<ffffffff802606a0>] ? audit_free+0x183/0x1bb
[<ffffffff80236e14>] ? do_exit+0x1e7/0x769
[<ffffffff8048641f>] ? _spin_lock_irqsave+0x25/0x2d
[<ffffffff802373fd>] ? do_group_exit+0x67/0x96
[<ffffffff8023743e>] ? sys_exit_group+0x12/0x17
[<ffffffff8020b20a>] ? system_call_fastpath+0x16/0x1b
Code: 8d 7b 18 e8 b7 70 00 00 c7 43 08 00 00 00 00 eb 0b 4c 89 e7 e8 93 fe ff ff 49 89 c4 4d 85 e4 75 f0 48 83 bd 10 01 00 00 00 74 04 <0f> 0b eb fe 59 5e 5b 5d 41 5c c3 41 57 41 56 41 55 41 54 49 89
RIP [<ffffffff80278de8>] exit_mmap+0x112/0x11d
RSP <ffff8800df97de68>
---[ end trace 4eaa2a86a8e2da22 ]---
comment:3 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
There were several SMP-related fixes in 3.0.2, please reopen if this problem still persists.
comment:4 by , 15 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
The problem is still present even in 3.0.6. One of the virtual machines, which had worked fine with only one core for ~40 days, was upgraded to 4 cores and hung within ~6 hours. On the only console that was still open, a few date commands run at ~1 s intervals showed the following sequence:
> date
Tue Sep 15 01:59:40 CEST 2009
> date
Tue Sep 15 01:59:41 CEST 2009
> date
Tue Sep 15 01:59:42 CEST 2009
> date
Tue Sep 15 01:59:37 CEST 2009
> date
Tue Sep 15 01:59:38 CEST 2009
> date
Tue Sep 15 01:59:38 CEST 2009
> date
Tue Sep 15 01:59:39 CEST 2009
At the next command it hung for good so ... no other details are available.
Restarting it again with only one allocated core solved the problem.
This machine quite frequently runs two or even three CPU-intensive processes in parallel. It would have been the perfect use case for more cores...
Host: 64bit Ubuntu 9.10 on Intel X5450
Guest: 64bit RHEL4 with custom 2.6.28.3 (to solve the timing issues)
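The backward step in the date output above (01:59:42 followed by 01:59:37) can also be detected mechanically. This is only a rough sketch: `parse_date_output` and `find_backward_jumps` are hypothetical helper names, and the parser assumes the default C-locale `date` format shown in this comment, dropping the time zone name because `strptime` handling of `%Z` varies by platform.

```python
from datetime import datetime


def parse_date_output(line):
    """Parse a line such as 'Tue Sep 15 01:59:40 CEST 2009' into a naive datetime."""
    parts = line.split()
    # Drop the fifth field (zone name); all samples share the same zone anyway.
    without_zone = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")


def find_backward_jumps(lines):
    """Return (index, delta_seconds) for every sample that is earlier than its predecessor."""
    times = [parse_date_output(line) for line in lines]
    jumps = []
    for index in range(1, len(times)):
        delta = (times[index] - times[index - 1]).total_seconds()
        if delta < 0:
            jumps.append((index, delta))
    return jumps


if __name__ == "__main__":
    samples = [
        "Tue Sep 15 01:59:40 CEST 2009",
        "Tue Sep 15 01:59:41 CEST 2009",
        "Tue Sep 15 01:59:42 CEST 2009",
        "Tue Sep 15 01:59:37 CEST 2009",
        "Tue Sep 15 01:59:38 CEST 2009",
        "Tue Sep 15 01:59:38 CEST 2009",
        "Tue Sep 15 01:59:39 CEST 2009",
    ]
    print(find_backward_jumps(samples))
```

For the sequence above it reports one jump, at the fourth sample, where the clock steps back by 5 seconds.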
comment:6 by , 15 years ago
Component: | other → guest smp |
---|
comment:7 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
3.0.10 contains guest SMP fixes to address such issues. Reopen if necessary please.
kernel log for 32bit kernel