VirtualBox

Ticket #3537 (closed defect: fixed)

Opened 5 years ago

Last modified 3 years ago

100% CPU load of events kernel thread triggered by virtualbox

Reported by: Chris56 Owned by:
Priority: major Component: VMM
Version: VirtualBox 2.1.4 Keywords:
Cc: Guest type: other
Host type: Linux

Description (last modified by frank) (diff)

After starting virtualbox on a dual core machine, one of the events kernel threads (nicelevel: -5 !) gets 100% CPU all the time, i.e. occupying one core completely.

Further observations:

  1. This happens as soon as the guest os starts.
  2. It is also the case, when the guest os is idling.
  3. It does NOT depend on the guest operating system. Same with Windows XP as with some Linux flavour.

This is on Ubuntu Intrepid, version: 2.1.4-42893_Ubuntu_intrepid .

Attachments

vboxbug.txt Download (3.2 KB) - added by Chris56 5 years ago.
kenel messages

Change History

comment:1 Changed 5 years ago by kanatacrude

I am experiencing the same thing with VirtualBox 2.1.4. XP partition running as guest in Kubuntu 8.10 and on another PC an XP guest running on Ubuntu 8.10. As soon as the VM starts, one of my two processors is running flat out for hours until I stop the VM. My searchs for help show that so many people are experiencing this problem. This seems to be quite widespread. Wide variety of guests running on a variety of host OSes. MacOS, Ubuntu, CentOS.

This product would really kick bass if it just didn't have this glaring bug right now.

comment:2 Changed 5 years ago by Chris56

Some additional information (even though nobody seems to care...):
It seems to be a bug around vboxnetflt. See the following kernel message:

[26054.680083] BUG: unable to handle kernel NULL pointer dereference at 00000004
[26054.680090] IP: [<c011af98>] __ticket_spin_lock+0x8/0x20
[26054.680097] *pde = 00000000 
[26054.680101] Oops: 0002 [#1] SMP 
[26054.680105] Modules linked in: af_packet binfmt_misc radeon drm rfcomm sco bnep l2cap bluetooth vboxnetflt vboxdrv ppdev lirc_atiusb lirc_dev powernow_k8 cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_ondemand freq_table cpufreq_conservative pci_slot container sbs sbshc video output wmi battery bridge stp iptable_filter ip_tables x_tables ac it87 hwmon_vid sbp2 lp loop snd_hda_intel dvb_usb_dib0700 snd_pcm_oss dib7000p snd_mixer_oss dib7000m snd_pcm dvb_usb dvb_core snd_seq_dummy dib3000mc dibx000_common dib0070 snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd k8temp i2c_piix4 soundcore snd_page_alloc pcspkr i2c_core evdev parport_pc parport button shpchp pci_hotplug ati_agp agpgart ipv6 ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif ata_generic sg pata_acpi usbhid hid ohci1394 ieee1394 e1000 uhci_hcd pata_atiixp ahci r8169 mii libata ehci_hcd scsi_mod ohci_hcd dock usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
[26054.680172] 
[26054.680175] Pid: 10, comm: events/1 Not tainted (2.6.27-11-generic #1)
[26054.680177] EIP: 0060:[<c011af98>] EFLAGS: 00010246 CPU: 1
[26054.680181] EIP is at __ticket_spin_lock+0x8/0x20
[26054.680183] EAX: 00000004 EBX: f7063f5c ECX: f14588b8 EDX: 00000100
[26054.680185] ESI: f14588b4 EDI: f915c1f0 EBP: f7063f34 ESP: f7063f34
[26054.680187]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[26054.680190] Process events/1 (pid: 10, ti=f7062000 task=f7068000 task.ti=f7062000)
[26054.680191] Stack: f7063f3c c037f45d f7063f44 f91616e3 f7063f6c f915c222 00000000 f7063f5c 
[26054.680197]        00000000 f1458810 00000000 f14588b4 f700e900 f915c1f0 f7063fa4 c01436f5 
[26054.680203]        005ce478 c20f9d00 f7063f90 00000246 f14588b8 e07cf500 f700e904 ffffffff 
[26054.680208] Call Trace:
[26054.680210]  [<c037f45d>] ? _spin_lock+0xd/0x10
[26054.680215]  [<f91616e3>] ? RTSpinlockAcquire+0x13/0x20 [vboxnetflt]
[26054.680225]  [<f915c222>] ? vboxNetFltLinuxXmitTask+0x32/0x180 [vboxnetflt]
[26054.680231]  [<f915c1f0>] ? vboxNetFltLinuxXmitTask+0x0/0x180 [vboxnetflt]
[26054.680236]  [<c01436f5>] ? run_workqueue+0x95/0x160
[26054.680241]  [<c0147626>] ? finish_wait+0x16/0x70
[26054.680244]  [<c0143998>] ? worker_thread+0x88/0xf0
[26054.680247]  [<c0147560>] ? autoremove_wake_function+0x0/0x50
[26054.680250]  [<c0143910>] ? worker_thread+0x0/0xf0
[26054.680254]  [<c01471f1>] ? kthread+0x41/0x80
[26054.680257]  [<c01471b0>] ? kthread+0x0/0x80
[26054.680260]  [<c0105297>] ? kernel_thread_helper+0x7/0x10
[26054.680264]  =======================
[26054.680266] Code: 00 00 00 55 8b 10 89 e5 5d 89 d0 c1 f8 08 29 d0 25 ff 00 00 00 83 e8 01 0f 9f c0 0f b6 c0 c3 8d 74 26 00 55 ba 00 01 00 00 89 e5 <f0> 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 5d c3 8d b4 26 00 
[26054.680293] EIP: [<c011af98>] __ticket_spin_lock+0x8/0x20 SS:ESP 0068:f7063f34
[26054.680300] ---[ end trace c976274508e23837 ]---

comment:3 Changed 5 years ago by frank

  • Description modified (diff)

Actually we do care. Perhaps better late than never:

  • Is this kernel fault reproducible (I assume so reading your original remarks).
  • When did this fault occur, that is, can you show a relation between this fault and loading the module, starting a VM, running the VM? Perhaps you could do
    tail -f /var/log/kern.log
    
    to get immediately notified when this problem occurs.
  • The chances are low but could you check if VirtualBox 2.2.2 fixes the problem by any chance?

comment:4 Changed 5 years ago by Chris56

Up to now I cannot reproduce it. It occurs regularly, but I don't know what triggers it.

However, some more observations: Once the kernel event thread on one processor gets into the state of 100% cpu (endless loop I guess), it will continue until the virtualbox is stopped.
When virtualbox starts again, the same events thread goes wild again. Unloading and reloading of the virtualbox modules does not have any effect.
I can imagine, that the error might be in the kernel events thread, triggered by virtual box.
The only way to start virtualbox again without the events thread looping is to reboot the kernel beforehand.
Is virtualbox bound to processor once started on one processor?

As soon as I see the same behavior happens with 2.2.2 I will tell. However, not seeing the bug with 2.2.2 does not reliably tell it's gone, sice I can't trigger it.

comment:5 Changed 5 years ago by ptr_uzl

I'm experiencing this (or at least similar issue) with VirtualBox-2.2.2. I'm not sure what exactly triggers the oops, but it happens several times. IIRC every time it happened, the VM network ran in bridged mode.

Dmesg:

device eth1 left promiscuous mode
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffff80226515>] __ticket_spin_lock+0x9/0x20
PGD bb59d067 PUD bb58e067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
CPU 0 
Modules linked in: vboxnetflt(N) vboxdrv(N) binfmt_misc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device md5 des_generic cbc nfs lockd nfs_acl ipv6 af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave powernow_k8 fuse ext2 loop rpcsec_gss_krb5 auth_rpcgss sunrpc dm_mod snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer 8139too ohci1394 snd soundcore nvidia(PN) i2c_nforce2 sr_mod pcspkr 8139cp serio_raw ieee1394 snd_page_alloc sky2 i2c_core usb_storage k8temp(N) forcedeth floppy cdrom button joydev sg usbhid hid sd_mod crc_t10dif ohci_hcd ehci_hcd usbcore rtc_cmos rtc_core rtc_lib edd ext3 mbcache jbd fan ide_pci_generic amd74xx ide_core ata_generic pata_amd sata_nv libata scsi_mod thermal processor thermal_sys hwmon [last unloaded: preloadtrace]
Supported: Yes, External
Pid: 6, comm: events/0 Tainted: P          N  2.6.29-6-default #1  
RIP: 0010:[<ffffffff80226515>]  [<ffffffff80226515>] __ticket_spin_lock+0x9/0x20
RSP: 0018:ffff8800bee9be00  EFLAGS: 00010286
RAX: 0000000000010000 RBX: ffff8800bd809b38 RCX: 0000000000180018
RDX: ffff8800bf75e1c8 RSI: ffff8800bee9be30 RDI: 0000000000000004
RBP: ffff8800bee9be00 R08: ffff8800bee9a000 R09: ffff8800bf75ebd8
R10: ffff8800bef67dd0 R11: ffff8800bef67da0 R12: ffff8800bd809b38
R13: ffffffffa0e172b0 R14: ffff8800bd809a10 R15: ffff8800bd809b40
FS:  00007ff13a248740(0000) GS:ffffffff80966000(0000) knlGS:00000000b7233760
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000004 CR3: 00000000bb58f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process events/0 (pid: 6, threadinfo ffff8800bee9a000, task ffff8800bee98180)
Stack:
 ffff8800bee9be10 ffffffff804f7aa0 ffff8800bee9be20 ffffffffa0e1c023
 ffff8800bee9be70 ffffffffa0e172e3 0000000000000000 ffff8800bee98538
 ffff8800954924c0 ffff8800bf75e1c0 ffff8800bd809b38 ffffffffa0e172b0
Call Trace:
 [<ffffffff804f7aa0>] _spin_lock+0x17/0x1a
 [<ffffffffa0e1c023>] RTSpinlockAcquire+0xd/0xf [vboxnetflt]
 [<ffffffffa0e172e3>] vboxNetFltLinuxXmitTask+0x33/0x18c [vboxnetflt]
 [<ffffffff80258ba5>] run_workqueue+0xac/0x157
 [<ffffffff80258d2d>] worker_thread+0xdd/0xee
 [<ffffffff8025c392>] kthread+0x49/0x76
 [<ffffffff8020d73a>] child_rip+0xa/0x20
Code: 63 22 80 44 89 c6 4c 89 cf e8 76 fb ff ff eb 0e 0f b7 f2 0f b6 f8 4c 89 ca e8 8d fb ff ff c9 c3 90 90 55 b8 00 00 01 00 48 89 e5 <3e> 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 17 eb f5 
RIP  [<ffffffff80226515>] __ticket_spin_lock+0x9/0x20
 RSP <ffff8800bee9be00>
CR2: 0000000000000004
---[ end trace 28cb35fbd9aa61ca ]---
device eth1 entered promiscuous mode

Changed 5 years ago by Chris56

kenel messages

comment:6 Changed 5 years ago by Chris56

The problem persists (one events kernel thread eating up one core 100%):
Now with ubuntu SMP kernel 2.6.27-14-generic and
virtualbox-2.2 (2.2.4-47978_Ubuntu_intrepid).
It has to do with the network driver and happens with both NAT and Bridge mode, see kernel message:

[25500.040130] BUG: unable to handle kernel NULL pointer dereference at 00000004
[25500.040137] IP: [<c011afc8>] __ticket_spin_lock+0x8/0x20
[25500.040145] *pde = 00000000 
[25500.040149] Oops: 0002 [#1] SMP 
[25500.040152] Modules linked in: binfmt_misc radeon drm rfcomm sco bnep l2cap bluetooth vboxnetflt vboxdrv ppdev lirc_atiusb lirc_dev powernow_k8 cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_ondemand freq_table cpufreq_conservative pci_slot container sbs sbshc video output wmi battery bridge stp iptable_filter ip_tables x_tables ac it87 hwmon_vid sbp2 lp loop dvb_usb_dib0700 dib7000p dib7000m dvb_usb snd_hda_intel dvb_core pcspkr snd_pcm_oss snd_mixer_oss dib3000mc dibx000_common dib0070 snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event k8temp snd_seq snd_timer snd_seq_device snd soundcore snd_page_alloc evdev i2c_piix4 i2c_core parport_pc parport button ati_agp shpchp pci_hotplug agpgart ipv6 ext3 jbd mbcache ata_generic usbhid hid sr_mod sd_mod cdrom crc_t10dif sg pata_acpi ohci1394 ehci_hcd ieee1394 uhci_hcd ahci ohci_hcd pata_atiixp usbcore e1000 libata scsi_mod dock r8169 mii floppy raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
[25500.040217] 
[25500.040220] Pid: 10, comm: events/1 Not tainted (2.6.27-14-generic #1)
[25500.040222] EIP: 0060:[<c011afc8>] EFLAGS: 00010246 CPU: 1
[25500.040226] EIP is at __ticket_spin_lock+0x8/0x20
[25500.040228] EAX: 00000004 EBX: f7063f5c ECX: ed8e1eb8 EDX: 00000100
[25500.040230] ESI: ed8e1eb4 EDI: f917d300 EBP: f7063f34 ESP: f7063f34
[25500.040232]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[25500.040234] Process events/1 (pid: 10, ti=f7062000 task=f7068000 task.ti=f7062000)
[25500.040236] Stack: f7063f3c c037ffed f7063f44 f9182853 f7063f6c f917d332 00000000 f7063f5c 
[25500.040241]        c0144260 ed8e1e10 00000000 ed8e1eb4 f700e900 f917d300 f7063fa4 c01437c5 
[25500.040246]        004916d2 c20f9d00 5deb8f0c 000011b3 ed8e1eb8 f7624c40 f700e904 f7063fa4 
[25500.040251] Call Trace:
[25500.040253]  [<c037ffed>] ? _spin_lock+0xd/0x10
[25500.040258]  [<f9182853>] ? RTSpinlockAcquire+0x13/0x20 [vboxnetflt]
[25500.040268]  [<f917d332>] ? vboxNetFltLinuxXmitTask+0x32/0x1e0 [vboxnetflt]
[25500.040274]  [<c0144260>] ? queue_delayed_work+0x20/0x40
[25500.040278]  [<f917d300>] ? vboxNetFltLinuxXmitTask+0x0/0x1e0 [vboxnetflt]
[25500.040284]  [<c01437c5>] ? run_workqueue+0x95/0x160
[25500.040287]  [<c0147786>] ? finish_wait+0x16/0x70
[25500.040290]  [<c0143a68>] ? worker_thread+0x88/0xf0
[25500.040293]  [<c0147630>] ? autoremove_wake_function+0x0/0x50
[25500.040296]  [<c01439e0>] ? worker_thread+0x0/0xf0
[25500.040299]  [<c01472c1>] ? kthread+0x41/0x80
[25500.040302]  [<c0147280>] ? kthread+0x0/0x80
[25500.040305]  [<c0105297>] ? kernel_thread_helper+0x7/0x10
[25500.040309]  =======================
[25500.040310] Code: 00 00 00 55 8b 10 89 e5 5d 89 d0 c1 f8 08 29 d0 25 ff 00 00 00 83 e8 01 0f 9f c0 0f b6 c0 c3 8d 74 26 00 55 ba 00 01 00 00 89 e5 <f0> 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 5d c3 8d b4 26 00 
[25500.040334] EIP: [<c011afc8>] __ticket_spin_lock+0x8/0x20 SS:ESP 0068:f7063f34
[25500.040341] ---[ end trace 9f579f3dd54fca5b ]---

comment:7 Changed 5 years ago by Chris56

Same with VirtualBox 3.02 now!!!
Could someone link this to all virtualbox versions (at least to the ones mentioned here!)?
Bridge mode makes it much more likely to happen.
The ticket_spin_lock oops happens when shutting down the guest after the events threads has gone wild.
I'm willing to do additional research and maybe do some kernel debugging.
It shouldn't be that difficult given the traceback...see below for the virtualbox 3.02 kernel (2.6.27-14-generic #1 SMP) traceback:

[ 7149.156072] IP: [<c011afc8>] __ticket_spin_lock+0x8/0x20
[ 7149.156080] *pde = 00000000 
[ 7149.156084] Oops: 0002 [#1] SMP 
[ 7149.156087] Modules linked in: binfmt_misc radeon drm rfcomm sco bnep l2cap bluetooth vboxnetadp vboxnetflt vboxdrv ppdev lirc_atiusb lirc_dev powernow_k8 cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_ondemand freq_table cpufreq_conservative pci_slot container sbs sbshc video output wmi battery bridge stp iptable_filter ip_tables x_tables ac it87 hwmon_vid sbp2 lp loop snd_hda_intel snd_pcm_oss dvb_usb_dib0700 dib7000p dib7000m dvb_usb snd_mixer_oss dvb_core snd_pcm dib3000mc pcspkr snd_seq_dummy dibx000_common k8temp dib0070 snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device i2c_piix4 snd i2c_core soundcore snd_page_alloc evdev parport_pc parport ati_agp button shpchp pci_hotplug agpgart ipv6 ext3 jbd mbcache usbhid hid ata_generic pata_acpi sr_mod cdrom sd_mod crc_t10dif sg ohci1394 pata_atiixp ohci_hcd ehci_hcd ahci uhci_hcd e1000 ieee1394 libata r8169 mii usbcore scsi_mod dock floppy raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
[ 7149.156151] 
[ 7149.156153] Pid: 10, comm: events/1 Not tainted (2.6.27-14-generic #1)
[ 7149.156156] EIP: 0060:[<c011afc8>] EFLAGS: 00010246 CPU: 1
[ 7149.156159] EIP is at __ticket_spin_lock+0x8/0x20
[ 7149.156161] EAX: 00000004 EBX: f7063f5c ECX: f25324b8 EDX: 00000100
[ 7149.156163] ESI: f25324b4 EDI: f91761f0 EBP: f7063f34 ESP: f7063f34
[ 7149.156165]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 7149.156167] Process events/1 (pid: 10, ti=f7062000 task=f7068000 task.ti=f7062000)
[ 7149.156169] Stack: f7063f3c c037ffed f7063f44 f917b8c3 f7063f6c f9176222 00000000 f7063f5c 
[ 7149.156175]        00000000 f2532410 00000000 f25324b4 f700e900 f91761f0 f7063fa4 c01437c5 
[ 7149.156180]        00067ca7 c20f9d00 f7063f90 00000246 f25324b8 de9aaa80 f700e904 ffffffff 
[ 7149.156184] Call Trace:
[ 7149.156186]  [<c037ffed>] ? _spin_lock+0xd/0x10
[ 7149.156191]  [<f917b8c3>] ? RTSpinlockAcquire+0x13/0x20 [vboxnetflt]
[ 7149.156201]  [<f9176222>] ? vboxNetFltLinuxXmitTask+0x32/0x1e0 [vboxnetflt]
[ 7149.156207]  [<f91761f0>] ? vboxNetFltLinuxXmitTask+0x0/0x1e0 [vboxnetflt]
[ 7149.156213]  [<c01437c5>] ? run_workqueue+0x95/0x160
[ 7149.156218]  [<c0147786>] ? finish_wait+0x16/0x70
[ 7149.156221]  [<c0143a68>] ? worker_thread+0x88/0xf0
[ 7149.156224]  [<c0147630>] ? autoremove_wake_function+0x0/0x50
[ 7149.156227]  [<c01439e0>] ? worker_thread+0x0/0xf0
[ 7149.156230]  [<c01472c1>] ? kthread+0x41/0x80
[ 7149.156233]  [<c0147280>] ? kthread+0x0/0x80
[ 7149.156236]  [<c0105297>] ? kernel_thread_helper+0x7/0x10
[ 7149.156240]  =======================
[ 7149.156241] Code: 00 00 00 55 8b 10 89 e5 5d 89 d0 c1 f8 08 29 d0 25 ff 00 00 00 83 e8 01 0f 9f c0 0f b6 c0 c3 8d 74 26 00 55 ba 00 01 00 00 89 e5 <f0> 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 5d c3 8d b4 26 00 
[ 7149.156265] EIP: [<c011afc8>] __ticket_spin_lock+0x8/0x20 SS:ESP 0068:f7063f34
[ 7149.156271] ---[ end trace 79c06d8e0604e4df ]---

comment:8 Changed 4 years ago by frank

Still relevant with VBox 3.2.8?

comment:9 Changed 4 years ago by Chris56

I've not seen it happen again with 3.28 running on 2.6.32-24-generic 64 bit host.

comment:10 Changed 3 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Thanks. Please reopen this ticket if necessary (except #7908, that's a separate problem).

Note: See TracTickets for help on using tickets.

www.oracle.com
ContactPrivacy policyTerms of Use