VirtualBox

Ticket #4434 (closed defect: fixed)

Opened 5 years ago

Last modified 3 years ago

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang

Reported by: costing
Owned by:
Priority: minor
Component: network
Version: VirtualBox 3.0.0
Keywords:
Cc:
Guest type: Linux
Host type: Linux

Description

I frequently see this kind of message in the guest, with kernels 2.6.28.3, 2.6.29.4 or 2.6.31-rc2 (e1000 driver versions 7.3.20-k3-NAPI and 7.3.21-k3-NAPI):

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <41>
  TDT                  <46>
  next_to_use          <46>
  next_to_clean        <41>
buffer_info[next_to_clean]
  time_stamp           <ffffc090>
  next_to_watch        <42>
  jiffies              <ffffc120>
  next_to_watch.status <0>
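
For readers trying to interpret a dump like the one above, here is a minimal Python sketch that extracts the fields and derives two numbers: how many descriptors sit between TDH and TDT, and how many jiffies the oldest buffer has been waiting. It assumes the bracketed values are hexadecimal (as the letters in `time_stamp` and `jiffies` suggest) and assumes a ring of 256 descriptors; the ring size is not stated in this ticket. The function names are illustrative only.

```python
import re

# Matches lines such as "  TDH                  <41>"; lines without a
# "<value>" part (the message header, "buffer_info[...]") are skipped.
FIELD_RE = re.compile(r"^\s*(.+?)\s+<([0-9a-fA-F]+)>\s*$")

DUMP = """\
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <41>
  TDT                  <46>
  next_to_use          <46>
  next_to_clean        <41>
buffer_info[next_to_clean]
  time_stamp           <ffffc090>
  next_to_watch        <42>
  jiffies              <ffffc120>
  next_to_watch.status <0>
"""


def parse_tx_hang(dump):
    """Return {field name: int}, reading every value as hex (assumption)."""
    fields = {}
    for line in dump.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def summarize(fields, ring_size=256):
    """Return (pending descriptors, jiffies since time_stamp).

    ring_size=256 is an assumption, not a value taken from the ticket.
    """
    pending = (fields["TDT"] - fields["TDH"]) % ring_size
    stalled = (fields["jiffies"] - fields["time_stamp"]) & 0xFFFFFFFF
    return pending, stalled


pending, stalled = summarize(parse_tx_hang(DUMP))
print(f"{pending} descriptors pending, oldest waiting {stalled} jiffies")
```

Converting the jiffies figure to seconds requires the guest kernel's HZ setting, which the ticket does not give.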

Attachments

panic.png (36.7 KB) - added by ole.tange 5 years ago.
Screenshot of kernel panic
watchdog1.png (35.5 KB) - added by ole.tange 5 years ago.
Watchdog output part 1
watchdog2.png (35.6 KB) - added by ole.tange 5 years ago.
Watchdog output part 2

Change History

comment:1 Changed 5 years ago by frank

  • Status changed from new to closed
  • Resolution set to duplicate

Most probably a duplicate of #4343.

comment:2 Changed 5 years ago by costing

Might be, though in this particular case the guest doesn't hang. Let's see then in the next version.

comment:3 Changed 5 years ago by ole.tange

  • Status changed from closed to reopened
  • Resolution duplicate deleted

I got the error message, too. But with different values. The network would freeze, but the server would run just fine. Sometimes the network managed to get unstuck.

My setup:

Host: Linux 2.6.30-amd64, 8 CPUs, running virtualbox-ose 3.0.4.

Guest: Linux 2.6.30-amd64, 8 CPUs, bridged network.

I can provoke the error by rsyncing a large directory (100 GB) to the guest. This causes sustained inbound traffic of 80 Mbps.

If I run the guest with 8 cpus the error consistently occurs after 1000-2000 seconds. If the guest was run with 1 cpu it took 8500 seconds before it occurred.

With a NAT'ed network I managed to provoke the error as well, but only after around 6000 seconds with 8 CPUs.

I have tested bridging on 8 cpu with virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb and get the same problem after 1500 seconds. So the problem is not fixed.
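
To compare time-to-failure figures like these across runs, one option is to pull the first hang timestamp out of the guest's kernel log. The sketch below is a rough illustration in Python: it assumes the log lines carry printk timestamps in the `[  seconds.micros]` form, and the sample log and its timestamps are made up for demonstration. The function name is hypothetical.

```python
import re

# A kernel log line with a printk timestamp followed by the hang message.
HANG_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s.*Detected Tx Unit Hang")


def first_hang_seconds(log_text):
    """Return the timestamp (seconds since guest boot) of the first
    'Detected Tx Unit Hang' line, or None if no such line exists."""
    for line in log_text.splitlines():
        match = HANG_RE.match(line)
        if match:
            return float(match.group(1))
    return None


# Hypothetical sample; the timestamps are NOT taken from the ticket.
SAMPLE = """\
[    0.196001] Marking TSC unstable due to check_tsc_sync_source failed
[ 1512.404211] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
[ 1514.404310] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
"""

print(first_hang_seconds(SAMPLE))
```

The value is relative to guest boot, not to the start of the rsync transfer, so the transfer's start time has to be subtracted to get a figure comparable with the ones quoted here.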

comment:4 Changed 5 years ago by ole.tange

I wondered if increasing the number of CPUs past the number of physical CPUs would provoke this problem earlier. I just ran: Host: 8 CPU, virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb, Guest: 12 CPU.

This provoked the problem after 130 seconds on the first try, and after 1200 seconds on the second try.

I have also wondered whether the problem could be caused by a flaky clock:

[    0.036000] Spurious LAPIC timer interrupt on cpu 0
[    0.196001] Measured 15867 cycles TSC warp between CPUs, turning off TSC clock.
[    0.196001] Marking TSC unstable due to check_tsc_sync_source failed
[    9.548093] PCSP: Timer resolution is not sufficient (4000250nS)
[   10.116099] intel8x0_measure_ac97_clock: measured 59999 usecs (11276 samples)
[   10.120017] intel8x0: measured clock 187936 rejected

Changed 5 years ago by ole.tange

Screenshot of kernel panic

comment:5 Changed 5 years ago by ole.tange

The problem gets more and more peculiar: I have tested this setup:

Host: 8 CPU, virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb, Guest: 12 CPU.

After 25 minutes of 80 Mbit/s sustained traffic the network broke. I even got a dump.

But as soon as I pressed enter in the console window the network worked again.

The network broke again after 3 minutes. Pressing space in the console solved it.

The network broke again after 4 minutes. Pressing 'f' in the console solved it.

It seems pressing any key in the console makes the network run again.

I have been able to get a better screenshot of the watchdog error messages from the kernel. These are attached.

Changed 5 years ago by ole.tange

Watchdog output part 1

Changed 5 years ago by ole.tange

Watchdog output part 2

comment:6 Changed 5 years ago by ole.tange

Because I had to press a key to revive the network, I got the idea that the fault may be in VirtualBox (the GUI). So I ran the same virtual machine under VBoxHeadless. It has now been running for 5000 seconds with 80 Mbit/s sustained traffic without a hiccup. This leads me to believe the problem is in the interaction with VirtualBox (the GUI).

I have now installed virtualbox-ose 3.0.6 (r52128), and will try to see if VBoxHeadless solves the issue here as well.

costing: You reported this bug. Can you reproduce it today? Is it gone if you run the vm with VBoxHeadless?

comment:7 Changed 5 years ago by costing

I was always running under VBoxHeadless. The messages were there in 3.0.4 for sure. Since upgrading to 3.0.6 I haven't seen them any more (yet?). But for me they didn't cause any major problems, the system was still working without intervention. Just that dmesg is full of such errors.

comment:8 Changed 5 years ago by ole.tange

virtualbox-ose 3.0.6 (r52128) running VBoxHeadless drops the network connection after 1200 sec. But since it is headless I cannot tell if it was due to the same bug.

comment:9 Changed 5 years ago by ole.tange

VBoxHeadless crashed the network, too. I have tested this setup:

Host: 8 CPU, virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb, Guest: 12 CPU.

Running under VBoxHeadless the network broke down after 5900 seconds. If I rdesktop'ed into the server after the network had stopped working and pressed 'enter', the network worked again immediately, so it seems the GUI is not the cause of the problem after all.

Will it be helpful if I make a snapshot of the server in the broken down state?

comment:10 Changed 5 years ago by frank

No. Btw, you should upgrade to the final 3.0.6 though I don't think this will solve your problem. Do you have more than one guest CPU enabled?

comment:11 Changed 5 years ago by ole.tange

Yes: As mentioned I have tried with both 8 CPU and 12 CPU on guest. The 4 extra CPUs did not change anything - neither for good nor bad.

As a workaround is it possible to press 'enter' using a program on the host machine? (I.e. can I write a script that presses 'enter' every minute?).
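
One way such a script might look, as an untested sketch: `VBoxManage controlvm <vm> keyboardputscancode` injects raw keyboard scancodes into a running guest, and `1c`/`9c` are the press and release codes for Enter in scancode set 1. Whether an injected keypress clears the hang the same way a real one does is not confirmed anywhere in this ticket. The VM name and the helper names below are hypothetical, and it is worth checking that your VBoxManage version supports this subcommand.

```python
import subprocess
import time

# Scancode set 1: Enter key press ("make") and release ("break").
ENTER_MAKE, ENTER_BREAK = "1c", "9c"


def enter_keypress_command(vm_name):
    """Build the VBoxManage command that sends one Enter keypress."""
    return ["VBoxManage", "controlvm", vm_name,
            "keyboardputscancode", ENTER_MAKE, ENTER_BREAK]


def press_enter_periodically(vm_name, interval_s=60.0, presses=None,
                             run=subprocess.run, sleep=time.sleep):
    """Send Enter to the VM every interval_s seconds.

    presses=None loops until interrupted; run and sleep are injectable
    so the loop can be exercised without VirtualBox installed.
    """
    sent = 0
    while presses is None or sent < presses:
        run(enter_keypress_command(vm_name), check=True)
        sent += 1
        if presses is None or sent < presses:
            sleep(interval_s)
    return sent


# Example (hypothetical VM name), run on the host:
# press_enter_periodically("my-guest", interval_s=60.0)
```

This only masks the symptom, and the keypresses land in whatever program has focus on the guest console.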

comment:12 Changed 5 years ago by ole.tange

Also as mentioned in initial report: "If the guest was run with 1 cpu it took 8500 seconds before it occurred." So just using 1 CPU does not solve the issue, but seems to postpone it somewhat.

comment:13 Changed 5 years ago by ole.tange

One of the things I seem to have forgotten to mention is that both host and guest are 64-bit.

comment:14 Changed 3 years ago by frank

Still relevant with VBox 4.0.6? Perhaps related to #8755?

comment:15 Changed 3 years ago by frank

  • Status changed from reopened to closed
  • Resolution set to fixed

No response, closing.
