#4434 closed defect (fixed)

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang

Reported by:	Costin Grigoras	Owned by:
Component:	network	Version:	VirtualBox 3.0.0
Keywords:		Cc:
Guest type:	Linux	Host type:	Linux

Description

I quite frequently see this kind of message in the guest, with kernels 2.6.28.3, 2.6.29.4 or 2.6.31-rc2 (e1000 ver 7.3.20-k3-NAPI and 7.3.21-k3-NAPI):

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <41>
  TDT                  <46>
  next_to_use          <46>
  next_to_clean        <41>
buffer_info[next_to_clean]
  time_stamp           <ffffc090>
  next_to_watch        <42>
  jiffies              <ffffc120>
  next_to_watch.status <0>
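To make the numbers in a dump like the one above easier to compare between reports, here is a minimal sketch that extracts the fields and derives two figures: how many descriptors sit between head (TDH) and tail (TDT), and how many jiffies passed since the stuck buffer was time-stamped. It assumes the driver prints all of these values in hexadecimal, that jiffies is a 32-bit counter (the dump shows 8 hex digits), and that the ring has 256 descriptors; `parse_dump`, `summarize` and `ring_size` are hypothetical names, not part of the driver.

```python
import re

# Fields copied from the dump in this report.
DUMP = """\
  TDH                  <41>
  TDT                  <46>
  next_to_use          <46>
  next_to_clean        <41>
  time_stamp           <ffffc090>
  next_to_watch        <42>
  jiffies              <ffffc120>
"""


def parse_dump(text):
    """Return {field: int} for every single-word 'name <hex>' line."""
    pattern = r"^[ \t]*([\w.]+)[ \t]+<([0-9a-fA-F]+)>[ \t]*$"
    return {name: int(value, 16)
            for name, value in re.findall(pattern, text, re.M)}


def summarize(fields, ring_size=256):
    """Return (pending descriptors, jiffies since time_stamp).

    ring_size=256 is an assumed ring length; both differences are
    taken modulo their counter width so a wrap-around is handled.
    """
    pending = (fields["TDT"] - fields["TDH"]) % ring_size
    stalled = (fields["jiffies"] - fields["time_stamp"]) % 2**32
    return pending, stalled


fields = parse_dump(DUMP)
pending, stalled_jiffies = summarize(fields)
print(pending, stalled_jiffies)
```

Converting the jiffies difference into seconds requires the guest kernel's HZ value, which is not stated in this report.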

Attachments (3)

panic.png (36.7 KB ) - added by Ole Tange 16 years ago.: Screenshot of kernel panic
watchdog1.png (35.5 KB ) - added by Ole Tange 16 years ago.: Watchdog output part 1
watchdog2.png (35.6 KB ) - added by Ole Tange 16 years ago.: Watchdog output part 2


Change History (18)

comment:1 by Frank Mehnert, 16 years ago

Resolution:	→ duplicate
Status:	new → closed

Most probably a duplicate of #4343.

comment:2 by Costin Grigoras, 16 years ago

Might be, though in this particular case the guest doesn't hang. Let's see then in the next version.

comment:3 by Ole Tange, 16 years ago

Resolution:	duplicate
Status:	closed → reopened

I got the error message, too. But with different values. The network would freeze, but the server would run just fine. Sometimes the network managed to get unstuck.

My setup:

Host: Linux 2.6.30-amd64, 8 cpus. Running virtualbox-ose 3.0.4.
Guest: Linux 2.6.30-amd64, 8 cpus. Bridged network.

I can provoke the error by rsyncing a large directory (100 GB) to the guest. This causes sustained inbound traffic of 80 Mbps.

If I run the guest with 8 cpus the error consistently occurs after 1000-2000 seconds. If the guest was run with 1 cpu it took 8500 seconds before it occurred.

With a NAT'ed network I managed to provoke the error as well, but only after around 6000 seconds with 8 cpus.

I have tested bridging on 8 cpu with virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb and get the same problem after 1500 seconds. So the problem is not fixed.

comment:4 by Ole Tange, 16 years ago

I wondered if increasing the number of CPUs past the number of physical CPUs would provoke this problem earlier. I just ran: Host: 8 CPU, virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb, Guest: 12 CPU.

This provoked the problem after 130 seconds on the first try, and after 1200 seconds on the second try.

I have wondered if the problem can be caused by a flaky clock.

[    0.036000] Spurious LAPIC timer interrupt on cpu 0
[    0.196001] Measured 15867 cycles TSC warp between CPUs, turning off TSC clock.
[    0.196001] Marking TSC unstable due to check_tsc_sync_source failed
[    9.548093] PCSP: Timer resolution is not sufficient (4000250nS)
[   10.116099] intel8x0_measure_ac97_clock: measured 59999 usecs (11276 samples)
[   10.120017] intel8x0: measured clock 187936 rejected

by Ole Tange, 16 years ago

Attachment:	panic.png added

Screenshot of kernel panic

comment:5 by Ole Tange, 16 years ago

The problem gets more and more peculiar: I have tested this setup:

Host: 8 CPU, virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb, Guest: 12 CPU.

Then, after 25 minutes of 80 Mbit/s sustained traffic, the network broke. I even got a dump.

But as soon as I pressed enter in the console window the network worked again.

The network broke again after 3 minutes. Pressing space in the console solved it.

The network broke again after 4 minutes. Pressing 'f' in the console solved it.

It seems pressing any key in the console makes the network run again.

I have been able to get a better screenshot of the watchdog error messages from the kernel. These are attached.

by Ole Tange, 16 years ago

Attachment:	watchdog1.png added

Watchdog output part 1

by Ole Tange, 16 years ago

Attachment:	watchdog2.png added

Watchdog output part 2

comment:6 by Ole Tange, 16 years ago

Because I had to press a key to revive the network, I got the idea that the fault might be in VirtualBox (the GUI). So I ran VBoxHeadless on the same virtual machine. It has now been running for 5000 seconds with 80 Mbit/s sustained traffic without a hiccup. This leads me to believe the problem is in the interaction with VirtualBox (the GUI).

I have now installed virtualbox-ose 3.0.6 (r52128), and will try to see if VBoxHeadless solves the issue here as well.

costing: You reported this bug. Can you reproduce it today? Is it gone if you run the vm with VBoxHeadless?

comment:7 by Costin Grigoras, 16 years ago

I was always running under VBoxHeadless. The messages were there in 3.0.4 for sure. Since upgrading to 3.0.6 I haven't seen them any more (yet?). But for me they didn't cause any major problems, the system was still working without intervention. Just that dmesg is full of such errors.

comment:8 by Ole Tange, 16 years ago

virtualbox-ose 3.0.6 (r52128) running VBoxHeadless drops the network connection after 1200 sec. But since it is headless I cannot tell if it was due to the same bug.

comment:9 by Ole Tange, 16 years ago

VBoxHeadless crashed the network, too. I have tested this setup:

Host: 8 CPU, virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb, Guest: 12 CPU.

Running under VBoxHeadless the network broke down after 5900 seconds. If I rdesktop'ed into the server after the network had stopped working and pressed 'enter', the network worked again immediately, so it seems the GUI is not the cause of the problem after all.

Will it be helpful if I make a snapshot of the server in the broken down state?

comment:10 by Frank Mehnert, 16 years ago

No. Btw, you should upgrade to the final 3.0.6 though I don't think this will solve your problem. Do you have more than one guest CPU enabled?

comment:11 by Ole Tange, 16 years ago

Yes: As mentioned I have tried with both 8 CPU and 12 CPU on guest. The 4 extra CPUs did not change anything - neither for good nor bad.

As a workaround is it possible to press 'enter' using a program on the host machine? (I.e. can I write a script that presses 'enter' every minute?).
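One way such a script could look, as a rough sketch only: VBoxManage has a `controlvm <vm> keyboardputscancode` subcommand that injects raw keyboard scancodes, and `1c`/`9c` are the press and release codes for Enter. Whether that subcommand is available in the 3.0.x builds discussed here, and whether an injected key revives the network the way a real key press does, are both untested assumptions. The helper names and the VM name `my-guest` are hypothetical.

```python
import subprocess
import time

# Scancodes for the Enter key: press, then release.
ENTER_PRESS, ENTER_RELEASE = "1c", "9c"


def enter_command(vm_name):
    """Build the VBoxManage command that sends one Enter key press."""
    return ["VBoxManage", "controlvm", vm_name,
            "keyboardputscancode", ENTER_PRESS, ENTER_RELEASE]


def press_enter_periodically(vm_name, interval_s=60, count=None,
                             run=subprocess.run, sleep=time.sleep):
    """Send Enter to the guest every interval_s seconds.

    count=None loops until interrupted; run and sleep are injectable
    so the loop can be exercised without a real VM. Returns the
    number of key presses sent.
    """
    sent = 0
    while count is None or sent < count:
        # check=False: a failed call should not stop the loop.
        run(enter_command(vm_name), check=False)
        sent += 1
        if count is None or sent < count:
            sleep(interval_s)
    return sent


# Example call (runs until interrupted with Ctrl-C):
# press_enter_periodically("my-guest", interval_s=60)
```

This would only mask the symptom; it does nothing about the underlying Tx hang.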

comment:12 by Ole Tange, 16 years ago

Also, as mentioned in my initial report: "If the guest was run with 1 cpu it took 8500 seconds before it occurred." So just using 1 CPU does not solve the issue, but seems to postpone it somewhat.

comment:13 by Ole Tange, 16 years ago

One of the things I seem to have forgotten to mention is that both host and guest are 64-bit.

comment:14 by Frank Mehnert, 14 years ago

Still relevant with VBox 4.0.6? Perhaps related to #8755?

comment:15 by Frank Mehnert, 14 years ago

Resolution:	→ fixed
Status:	reopened → closed

No response, closing.
