VirtualBox

Opened 13 years ago

Closed 13 years ago

#8755 closed defect (fixed)

Ubuntu 10.04 kernel panic on high network load => Fixed in SVN

Reported by: Frederick Ryckbosch
Owned by:
Component: network
Version: VirtualBox 4.0.6
Keywords: e1000
Cc:
Guest type: Linux
Host type: Linux

Description (last modified by Frank Mehnert)

Setup:

  • VirtualBox 4.0.4
  • Ubuntu 10.04 host (64 bit)
  • 1 Ubuntu 10.04 guest (64 bit)
  • Running on a quad core AMD Opteron
  • Bridged network, using e1000.

Workload:

  • 4 concurrent http wgets to a local server (at gigabit)
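A rough way to approximate this workload from inside the guest is sketched below. It is not the reporter's actual test, which used wget; the URL in the usage note is a placeholder, and the function names are made up for illustration.

```python
import threading
import urllib.request


def download_bytes(url):
    """Fetch url, discard the body, and return the number of bytes read."""
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(1 << 20)  # read in 1 MiB chunks
            if not chunk:
                break
            total += len(chunk)
    return total


def run_concurrent_downloads(url, workers=4, fetch=download_bytes):
    """Run `workers` downloads of `url` in parallel; return per-worker results."""
    results = [None] * workers

    def worker(index):
        results[index] = fetch(url)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling run_concurrent_downloads("http://192.168.0.1/large-file.bin") in a loop would keep four transfers running; the address is hypothetical and should point at a large file on a server reachable at gigabit speed. The fetch parameter exists only so the helper can be exercised without a network.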

Problem:

  • No problems when using 1 core
  • Kernel panic in guest when using 4 cores

Kernel panic is always in e1000_clean.

Attachments (10)

VBox.log (37.7 KB ) - added by Frederick Ryckbosch 13 years ago.
kernel_panic (4.7 KB ) - added by Frederick Ryckbosch 13 years ago.
kernel_panic-php.txt (3.1 KB ) - added by Stéphane 13 years ago.
Stack trace of the kernel panic
VBox-php.log (47.3 KB ) - added by Stéphane 13 years ago.
VBox log file of the virtual machine.
e1000.2.ko (176.3 KB ) - added by Stéphane 13 years ago.
module for Ubuntu 10.04 x86_64 : uname -a : Linux demo 2.6.32-24-server #39-Ubuntu SMP Wed Jul 28 06:21:40 UTC 2010 x86_64 GNU/Linux
e1000.ko (178.5 KB ) - added by Frederick Ryckbosch 13 years ago.
e1000 driver for the 2.6.35 kernel
VBox.2.log (38.4 KB ) - added by Frederick Ryckbosch 13 years ago.
Log when using PCNet
console.txt (19.4 KB ) - added by Frederick Ryckbosch 13 years ago.
Console output when using PCNet
e1000_smp_panic_fix.patch (565 bytes ) - added by Aleksey Ilyushin 13 years ago.
Patch for panic in SMP Linux kernels
vboxcrash.JPG (142.2 KB ) - added by rgcosma 13 years ago.
SL6 crash with e1000 adapter


Change History (42)

by Frederick Ryckbosch, 13 years ago

Attachment: VBox.log added

by Frederick Ryckbosch, 13 years ago

Attachment: kernel_panic added

comment:1 by Frederick Ryckbosch, 13 years ago

Forgot to mention that the kernel panic does not occur immediately; it can take up to half an hour.

We are having the same problem when running a web server at high load.

comment:2 by Frederick Ryckbosch, 13 years ago

Tested this on VirtualBox 4.0.6 and the problem is still there.

comment:3 by Stéphane, 13 years ago

Same problem here. Also under "heavy" load. The virtual machine is running a web server with PHP.

FYI, the virtual machine has two network adapters attached (one public, one on a private subnet; the latter is our private VLAN on the host).

I'm attaching the kernel_panic dump, and the VBox.log.

by Stéphane, 13 years ago

Attachment: kernel_panic-php.txt added

Stack trace of the kernel panic

by Stéphane, 13 years ago

Attachment: VBox-php.log added

VBox log file of the virtual machine.

comment:4 by Stéphane, 13 years ago

As fryckbos said earlier, the latest VirtualBox 4.0.6 does not solve the problem.

comment:5 by Frank Mehnert, 13 years ago

Description: modified (diff)
Version: VirtualBox 4.0.4 → VirtualBox 4.0.6

comment:6 by Frank Mehnert, 13 years ago

Just to be sure: Does the guest kernel always panic within the E1000 driver? At least it looks that way, because we have two similar kernel logs from two different reporters...

comment:7 by Frank Mehnert, 13 years ago

An interesting question would be if similar guest kernel crashes also happen with the PCNet device and with the virtio device (Ubuntu should support both devices).

comment:8 by Frederick Ryckbosch, 13 years ago

The kernel always panics in e1000_clean. We've encountered this at least 10 times.

I will repeat the test with the other 2 network devices.

comment:9 by Frederick Ryckbosch, 13 years ago

The same test with Virtio is even worse. After 2 minutes the network becomes flaky:

  • I can open new ssh sessions, type in the password, but I can't issue any commands.
  • The VM doesn't hang fully: I can still login on the console and run commands.
  • If I do a ping from within the guest, the first ping returns but the others fail.

The VBox.log file does not show any output, and the guest does not show a kernel panic.

The PCNet test is now running for one hour and is doing fine at the moment.

comment:10 by Frank Mehnert, 13 years ago

Oh, one more issue: Could both of you attach the binary e1000 module to this ticket? Thanks!

by Stéphane, 13 years ago

Attachment: e1000.2.ko added

module for Ubuntu 10.04 x86_64 : uname -a : Linux demo 2.6.32-24-server #39-Ubuntu SMP Wed Jul 28 06:21:40 UTC 2010 x86_64 GNU/Linux

comment:11 by Stéphane, 13 years ago

Hi Frank,

As of now, there seem to be 2 workarounds:

  • use a 1 CPU only virtual machine (as fryckbos said),
  • use Am79C973 as NIC type.

Currently our test platform is running with the latter option without crashing (let's wait and see, but it should have already crashed according to our previous tests).
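For reference, both workarounds can be applied with VBoxManage modifyvm while the VM is powered off. The sketch below only builds the argument lists; the VM name "ubuntu-guest" in the usage note and the assumption that the affected adapter is NIC 1 are illustrative and not taken from this ticket.

```python
import subprocess


def workaround_commands(vm_name, single_cpu=True, use_pcnet=True, nic_index=1):
    """Build VBoxManage argument lists for the two workarounds."""
    commands = []
    if single_cpu:
        # Workaround 1: give the guest a single virtual CPU.
        commands.append(["VBoxManage", "modifyvm", vm_name, "--cpus", "1"])
    if use_pcnet:
        # Workaround 2: replace the e1000 adapter with the PCNet Am79C973.
        commands.append(
            ["VBoxManage", "modifyvm", vm_name, f"--nictype{nic_index}", "Am79C973"]
        )
    return commands


def apply_commands(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, apply_commands(workaround_commands("ubuntu-guest", single_cpu=False)) would switch only the NIC type. Either workaround alone was reported above to avoid the e1000 panic, so there is normally no need to apply both.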

comment:12 by Frederick Ryckbosch, 13 years ago

Is it possible that the PCNet NIC goes faster than 100 Mbit ?

comment:13 by Frank Mehnert, 13 years ago

There is no speed limitation. The e1000 device is usually a bit faster than pcnet, but even with pcnet you should easily be able to exceed 100 Mbit/s.

comment:14 by Frank Mehnert, 13 years ago

fryckbos, the kernel module you attached does not match the kernel log. Your guest kernel is apparently 2.6.35-24; please attach the correct e1000.ko module from this kernel.

by Frederick Ryckbosch, 13 years ago

Attachment: e1000.ko added

e1000 driver for the 2.6.35 kernel

comment:15 by Frank Mehnert, 13 years ago

Ok, the crash happens there at exactly the same position.

comment:16 by Frederick Ryckbosch, 13 years ago

The PCNet device also gave up, but it took about 5 hours. VirtualBox is still running, the guest is still running but network connections fail. I can still ssh, give my password, get a console but I can't type commands.

It seems to be a problem with the clocks: if I do a sleep of 10 seconds at the console, it never returns. The following message is in dmesg:

[ 936.534361] hrtimer: interrupt took 59930437 ns
[18529.704149] Clocksource tsc unstable (delta = 4686851418 ns)

comment:17 by Frank Mehnert, 13 years ago

That might be a different problem. Are there any relevant entries in the log related to the pcnet card? Could you attach the complete output of 'dmesg' from the guest and the VBox.log file?

by Frederick Ryckbosch, 13 years ago

Attachment: VBox.2.log added

Log when using PCNet

by Frederick Ryckbosch, 13 years ago

Attachment: console.txt added

Console output when using PCNet

comment:18 by Frederick Ryckbosch, 13 years ago

I attached the VBox and console logs, but as far as I know they don't show any errors. I don't know why the hrtimer message does not show up in the console log. It was in the dmesg output, but I did not save that output.

comment:19 by Frank Mehnert, 13 years ago

During testing it is always advisable to add the kernel parameters console=ttyS0 ignore_loglevel to the kernel command line, to enable a serial port in the VM settings, and to attach this port to a file. This will log any guest kernel messages to the external file.
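The setup described above could look roughly like the following sketch. The VM name and log path in the usage note are placeholders, and the helper names are invented for illustration. The I/O base 0x3F8 and IRQ 4 are the conventional COM1 settings, which correspond to ttyS0 in the guest.

```python
def serial_log_command(vm_name, log_path):
    """Build the VBoxManage call that sends the guest's COM1 output to a host file."""
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--uart1", "0x3F8", "4",         # COM1: I/O base 0x3F8, IRQ 4
        "--uartmode1", "file", log_path,  # attach the port to a file on the host
    ]


def add_kernel_params(cmdline, params=("console=ttyS0", "ignore_loglevel")):
    """Append the logging parameters to a kernel command line if they are missing."""
    words = cmdline.split()
    for param in params:
        if param not in words:
            words.append(param)
    return " ".join(words)
```

For example, serial_log_command("ubuntu-guest", "/tmp/guest-serial.log") gives the host-side command to run while the VM is powered off, and add_kernel_params("ro quiet") gives the command line to put into the guest's boot loader configuration.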

comment:20 by Frederick Ryckbosch, 13 years ago

Thanks, I was doing the logging to file with console=ttyS0 but did not have the ignore_loglevel.

comment:21 by Frederick Ryckbosch, 13 years ago

Should I do another test with the appropriate logging ?

comment:22 by Frank Mehnert, 13 years ago

Not sure if that will help. We are currently working on finding the e1000 issue.

comment:23 by Frank Mehnert, 13 years ago

We found a potential problem in the E1000 implementation. We are currently fixing this issue and will provide you with a test build when the fix is done.

comment:24 by Frank Mehnert, 13 years ago

Or a patch as you seem to compile VBox yourself.

comment:25 by Frederick Ryckbosch, 13 years ago

A patch would do fine.

by Aleksey Ilyushin, 13 years ago

Attachment: e1000_smp_panic_fix.patch added

Patch for panic in SMP Linux kernels

comment:26 by Aleksey Ilyushin, 13 years ago

I've attached a tiny patch that should fix the issue. In short, it prevents context descriptors from being written back with the DD bit set, which caused trouble in the TX cleanup function of the e1000 driver because context descriptors do not have an skb associated with them. This happened in SMP guests since they can add TX descriptors and do cleanup simultaneously. Could you check if the patch indeed solves the problem for you?
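The idea behind the fix can be illustrated with a toy model. This is not the attached patch and not the real e1000 driver code: all names are invented, and the SMP timing that exposes the problem in practice is left out. It only shows why a cleanup routine that expects an skb behind every "done" descriptor fails when a context descriptor is reported as done.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TxDescriptor:
    kind: str                  # "context" or "data"
    skb: Optional[str] = None  # stand-in for the packet buffer; context descriptors have none
    dd: bool = False           # "descriptor done" status bit


def device_write_back(desc: TxDescriptor, patched: bool) -> None:
    """Toy device model: report a processed descriptor as done."""
    if desc.kind == "context" and patched:
        return  # patched behaviour: never set DD on a context descriptor
    desc.dd = True


def driver_clean_tx(ring: List[TxDescriptor]) -> List[str]:
    """Toy cleanup: release the skb of every descriptor reported as done."""
    freed = []
    for desc in ring:
        if not desc.dd:
            continue
        if desc.skb is None:
            # Stand-in for the guest kernel panic.
            raise RuntimeError("cleanup hit a done descriptor without an skb")
        freed.append(desc.skb)
        desc.skb, desc.dd = None, False
    return freed


def simulate(patched: bool) -> List[str]:
    """Send one packet (context + data descriptor) and run the cleanup."""
    ring = [TxDescriptor("context"), TxDescriptor("data", skb="packet-0")]
    for desc in ring:
        device_write_back(desc, patched)
    return driver_clean_tx(ring)
```

In this model, simulate(patched=True) frees the single packet buffer, while simulate(patched=False) raises the error that stands in for the panic.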

comment:27 by rgcosma, 13 years ago

I will add a "me too" because I see what seems to be the same problem on Scientific Linux 6, which has the same kernel (2.6.32, 64-bit); the host is Win XP 32-bit.

by rgcosma, 13 years ago

Attachment: vboxcrash.JPG added

SL6 crash with e1000 adapter

comment:28 by Frederick Ryckbosch, 13 years ago

The tiny patch posted above fixes the problem ! Thanks a lot !

comment:29 by Frank Mehnert, 13 years ago

rgcosma, could you check if this build fixes the problem for you as well?

comment:30 by rgcosma, 13 years ago

installed, running, transferring - didn't crash yet so I guess yes.

comment:31 by Frank Mehnert, 13 years ago

Summary: Ubuntu 10.04 kernel panic on high network load → Ubuntu 10.04 kernel panic on high network load => Fixed in SVN

Great, thanks for the feedback! The fix will also be available in the next maintenance release.

comment:32 by Frank Mehnert, 13 years ago

Resolution: fixed
Status: new → closed

Fixed in VBox 4.0.8.
