VirtualBox

Ticket #8755 (closed defect: fixed)

Opened 12 years ago

Last modified 12 years ago

Ubuntu 10.04 kernel panic on high network load => Fixed in SVN

Reported by: fryckbos
Owned by:
Component: network
Version: VirtualBox 4.0.6
Keywords: e1000
Cc:
Guest type: Linux
Host type: Linux

Description (last modified by frank)

Setup:

  • VirtualBox 4.0.4
  • Ubuntu 10.04 host (64 bit)
  • 1 Ubuntu 10.04 guest (64 bit)
  • Running on a quad core AMD Opteron
  • Bridged network, using e1000.

Workload:

  • 4 concurrent HTTP wget requests to a local server (over a gigabit link)

Problem:

  • No problems when using 1 core
  • Kernel panic in guest when using 4 cores

Kernel panic is always in e1000_clean.

Attachments

VBox.log (37.7 KB) - added by fryckbos 12 years ago.
kernel_panic (4.7 KB) - added by fryckbos 12 years ago.
kernel_panic-php.txt (3.1 KB) - added by smariel 12 years ago.
Stack trace of the kernel panic
VBox-php.log (47.3 KB) - added by smariel 12 years ago.
VBox log file of the virtual machine.
e1000.2.ko (176.3 KB) - added by smariel 12 years ago.
module for Ubuntu 10.04 x86_64 : uname -a : Linux demo 2.6.32-24-server #39-Ubuntu SMP Wed Jul 28 06:21:40 UTC 2010 x86_64 GNU/Linux
e1000.ko (178.5 KB) - added by fryckbos 12 years ago.
e1000 driver for the 2.6.35 kernel
VBox.2.log (38.4 KB) - added by fryckbos 12 years ago.
Log when using PCNet
console.txt (19.4 KB) - added by fryckbos 12 years ago.
Console output when using PCNet
e1000_smp_panic_fix.patch (565 bytes) - added by aleksey 12 years ago.
Patch for panic in SMP Linux kernels
vboxcrash.JPG (142.2 KB) - added by rgcosma 12 years ago.
SL6 crash with e1000 adapter

Change History

Changed 12 years ago by fryckbos

Attachment VBox.log added.

Changed 12 years ago by fryckbos

Attachment kernel_panic added.

comment:1 Changed 12 years ago by fryckbos

Forgot to mention that the kernel panic does not occur immediately but only after a while; it can take up to half an hour.

We are having the same problem when running a web server at high load.

comment:2 Changed 12 years ago by fryckbos

Tested this on VirtualBox 4.0.6 and the problem is still there.

comment:3 Changed 12 years ago by smariel

Same problem here, also under "heavy" load. The virtual machine runs a web server with PHP.

FYI, the virtual machine has two network adapters attached (one on a public subnet, one on a private subnet; the latter is our private VLAN on the host).

I'm attaching the kernel panic dump and the VBox.log.

Changed 12 years ago by smariel

Attachment kernel_panic-php.txt added: Stack trace of the kernel panic

Changed 12 years ago by smariel

Attachment VBox-php.log added: VBox log file of the virtual machine.

comment:4 Changed 12 years ago by smariel

As fryckbos said earlier, the latest VirtualBox 4.0.6 does not solve the problem.

comment:5 Changed 12 years ago by frank

  • Version changed from VirtualBox 4.0.4 to VirtualBox 4.0.6
  • Description modified

comment:6 Changed 12 years ago by frank

Just to be sure: does the guest kernel always panic within the E1000 driver? At least it looks like it, because we have two similar kernel logs from two different reporters...

comment:7 Changed 12 years ago by frank

An interesting question would be whether similar guest kernel crashes also happen with the PCNet device and with the virtio device (Ubuntu should support both).

comment:8 Changed 12 years ago by fryckbos

The kernel always panics in e1000_clean. We've encountered this at least 10 times.

I will repeat the test with the other 2 network devices.

comment:9 Changed 12 years ago by fryckbos

The same test with Virtio is even worse. After 2 minutes the network becomes flaky:

  • I can open new ssh sessions, type in the password, but I can't issue any commands.
  • The VM doesn't hang fully: I can still login on the console and run commands.
  • If I do a ping from within the guest, the first ping returns but the others fail.

The VBox.log file does not show any output, and the guest does not show a kernel panic.

The PCNet test has now been running for one hour and is doing fine so far.

comment:10 Changed 12 years ago by frank

Oh, one more issue: Could both of you attach the binary e1000 module to this ticket? Thanks!

Changed 12 years ago by smariel

Attachment e1000.2.ko added: module for Ubuntu 10.04 x86_64 : uname -a : Linux demo 2.6.32-24-server #39-Ubuntu SMP Wed Jul 28 06:21:40 UTC 2010 x86_64 GNU/Linux

comment:11 Changed 12 years ago by smariel

Hi Franck,

As of now, there seem to be 2 workarounds:

  • use a single-CPU virtual machine (as fryckbos said),
  • use Am79C973 as NIC type.

Our test platform is currently running with the latter option without crashing (let's wait and see, but based on our previous tests it should have crashed by now).

comment:12 Changed 12 years ago by fryckbos

Is it possible for the PCNet NIC to go faster than 100 Mbit/s?

comment:13 Changed 12 years ago by frank

There is no speed limitation. The e1000 device is usually a bit faster than PCNet, but even with PCNet you should easily be able to exceed 100 Mbit/s.

comment:14 Changed 12 years ago by frank

fryckbos, the kernel module you attached does not match the kernel log. Your guest kernel is apparently 2.6.35-24; please attach the correct e1000.ko module from this kernel.

Changed 12 years ago by fryckbos

Attachment e1000.ko added: e1000 driver for the 2.6.35 kernel

comment:15 Changed 12 years ago by frank

OK, the crash happens at exactly the same position there.

comment:16 Changed 12 years ago by fryckbos

The PCNet device also gave up, but it took about 5 hours. VirtualBox and the guest are still running, but network connections fail: I can still ssh in, enter my password and get a console, but I can't type commands.

It seems to be a problem with the clocks: if I do a sleep of 10 seconds at the console, it never returns. The following message is in dmesg:

[  936.534361] hrtimer: interrupt took 59930437 ns
[18529.704149] Clocksource tsc unstable (delta = 4686851418 ns)
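The durations and timestamps in these dmesg messages are easier to compare in familiar units. The conversion below is plain arithmetic on the logged values (the bracketed dmesg prefix is seconds since boot) and is not a diagnosis of the guest's timekeeping: one timer interrupt took roughly 60 ms, the TSC delta was roughly 4.7 s, and the clocksource warning was logged just over 5 hours after boot, which lines up with the "about 5 hours" the PCNet test ran before failing.

```latex
\begin{aligned}
59\,930\,437\ \text{ns} &= 59.93\ \text{ms} \approx 60\ \text{ms} \\
4\,686\,851\,418\ \text{ns} &\approx 4.69\ \text{s} \\
\frac{18\,529.704149\ \text{s}}{3600\ \text{s/h}} &\approx 5.15\ \text{h}
\end{aligned}
```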

comment:17 Changed 12 years ago by frank

That might be a different problem. Are there any entries in the log relevant to the PCNet card? Could you attach the complete output of 'dmesg' from the guest and the VBox.log file?

Changed 12 years ago by fryckbos

Attachment VBox.2.log added: Log when using PCNet

Changed 12 years ago by fryckbos

Attachment console.txt added: Console output when using PCNet

comment:18 Changed 12 years ago by fryckbos

I attached the VBox and console logs, but as far as I know they don't show any errors. I don't know why the hrtimer message does not show up in the console log; it was in the dmesg output, but I did not save that output.

comment:19 Changed 12 years ago by frank

During testing it is always advisable to add the kernel parameters console=ttyS0 ignore_loglevel to the guest kernel command line, enable a serial port in the VM settings, and attach this port to a file. All guest kernel messages will then be logged to that external file.

comment:20 Changed 12 years ago by fryckbos

Thanks, I was already logging to a file with console=ttyS0 but did not have ignore_loglevel.

comment:21 Changed 12 years ago by fryckbos

Should I do another test with the appropriate logging?

comment:22 Changed 12 years ago by frank

Not sure if that will help. We are currently working on finding the e1000 issue.

comment:23 Changed 12 years ago by frank

We found a potential problem in the E1000 implementation. We are currently fixing this issue and will provide you with a test build when the fix is done.

comment:24 Changed 12 years ago by frank

Or a patch, as you seem to compile VBox yourself.

comment:25 Changed 12 years ago by fryckbos

A patch would do fine.

Changed 12 years ago by aleksey

Attachment e1000_smp_panic_fix.patch added: Patch for panic in SMP Linux kernels

comment:26 Changed 12 years ago by aleksey

I've attached a tiny patch that should fix the issue. In short, it prevents context descriptors from being written back with the DD (Descriptor Done) bit set. That caused trouble in the TX cleanup function of the e1000 driver, because context descriptors do not have an skb associated with them. This happened in SMP guests because they can add TX descriptors and do cleanup simultaneously. Could you check whether the patch indeed solves the problem for you?

comment:27 Changed 12 years ago by rgcosma

I'll add a "me too": I see what seems to be the same problem on Scientific Linux 6, which has the same kernel (2.6.32, 64-bit). The host is Windows XP 32-bit.

Changed 12 years ago by rgcosma

Attachment vboxcrash.JPG added: SL6 crash with e1000 adapter

comment:28 Changed 12 years ago by fryckbos

The tiny patch posted above fixes the problem! Thanks a lot!

comment:29 Changed 12 years ago by frank

rgcosma, could you check whether this build fixes the problem for you as well?

comment:30 Changed 12 years ago by rgcosma

Installed, running, transferring: it hasn't crashed yet, so I guess yes.

comment:31 Changed 12 years ago by frank

  • Summary changed from Ubuntu 10.04 kernel panic on high network load to Ubuntu 10.04 kernel panic on high network load => Fixed in SVN

Great, thanks for the feedback! The fix will be also available in the next maintenance release.

comment:32 Changed 12 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Fixed in VBox 4.0.8.
