VirtualBox

Ticket #16960 (closed defect: fixed)

Opened 4 months ago

Last modified 2 months ago

VirtualBox 5.1.26 crashes when using VLAN in linux guest over Internal Network => Fixed in 5.1.28

Reported by: RomanovR Owned by:
Priority: critical Component: network
Version: VirtualBox 5.1.26 Keywords: vlan, internal network
Cc: Guest type: Linux
Host type: Linux

Description (last modified by vushakov) (diff)

VirtualBox 5.1.26 crashes when using a VLAN in a Linux guest over an Internal Network. About 5-10 minutes after launching the guest VM, I get this error in the host console:

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [EMT-0:19382]

The same bug was present in version 5.1.24. It occurs only when the guest VM uses a VLAN over an Internal Network on the host. A portion of the host logs is attached.

Attachments

vbox.log (7.3 KB) - added by RomanovR 4 months ago.
Logs from the Linux host for the problem of using a VLAN with an Internal Network
error.png (25.0 KB) - added by rmaksimov 3 months ago.

Change History

Changed 4 months ago by RomanovR

Logs from the Linux host for the problem of using a VLAN with an Internal Network

comment:1 Changed 3 months ago by vushakov

  • Description modified (diff)
  • Summary changed from VirtualBox 5.1.26 crushing when using VLAN in linux guest over Internal Network to VirtualBox 5.1.26 crashes when using VLAN in linux guest over Internal Network

comment:2 Changed 3 months ago by vushakov

Could you please describe the scenario in more detail? I left two VMs pinging each other over an internal network with a VLAN configured, and they happily survived for a couple of hours. Do I need more load, or some specific load, to trigger this?

Also, I can't quite parse the last paragraph. What is the difference between 5.1.24 and 5.1.26 that you are trying to describe there?

comment:3 Changed 3 months ago by toMeloos

Would like to confirm this issue.

Been using VirtualBox 5.1.8 on Fedora 25 without any problem for months. Then tried VB 5.1.26 on F25, F26, and the latest CentOS 7 (with both the stock 3.10 and mainline 4.12 kernels) on an HP Z800 and a Dell PowerEdge R610, and we get the CPU soft lockup issue on both machines. Downgraded back to 5.1.8 and the problem disappears, so we are now happily running that again on CentOS 7 with the stock 3.10 kernel. We have not tested the VB versions between 5.1.8 and 5.1.26.

Our VMs have 4 vCPUs and 16 GB RAM and run Ubuntu 16.04. They run in a cluster that uses VLANs over VirtualBox Internal Networks for traffic between nodes. We have two separate VirtualBox Internal Networks, each hosting a few VLANs.

A while after our second VM comes up, the NMI watchdog starts sending error messages to stdout. I'm pretty sure this is around the time puppet on the second machine has configured networking and services and they start generating network traffic. Very soon after, the second VM crashes/gets killed and we're stuck with a defunct VirtualBox process and the watchdog still generating warnings. Only solution is to reboot the host. What's also really weird is that the NMI watchdog warnings show up even after we disabled the NMI watchdog on both the host and the guests.
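
A possible explanation for that last point, offered as an assumption rather than something confirmed in this ticket: on Linux the `kernel.nmi_watchdog` sysctl controls the hard lockup detector, while "soft lockup" reports come from the separate soft lockup detector, which older kernels log under the same "NMI watchdog:" prefix. The sketch below only prints the related sysctls so the two can be told apart. The helper name `show_watchdog_settings` is hypothetical, and `soft_watchdog` does not exist on every kernel (the stock 3.10 kernel may lack it), so missing entries are reported rather than treated as errors.

```shell
#!/bin/sh
# Print the lockup-detector sysctls that exist on this kernel.
# The optional directory argument exists only so the function can be tried
# against a scratch directory; the default is the real sysctl location.
show_watchdog_settings() {
    dir=${1:-/proc/sys/kernel}
    for name in watchdog nmi_watchdog soft_watchdog watchdog_thresh; do
        if [ -r "$dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s = (not available)\n' "$name"
        fi
    done
}

show_watchdog_settings
```

If `nmi_watchdog` reads 0 while `watchdog` (or `soft_watchdog`, where present) still reads 1, the soft lockup detector is still active, which would be consistent with warnings continuing after the NMI watchdog was switched off.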

Last edited 3 months ago by vushakov (previous) (diff)

comment:4 Changed 3 months ago by vushakov

  • Description modified (diff)

comment:5 Changed 3 months ago by vushakov

Please provide the VM's *.vbox file and the log file.

Changed 3 months ago by rmaksimov

comment:6 Changed 3 months ago by rmaksimov

I can confirm this problem.

It seems like there is a bug (???) with the Intel PRO/1000 MT Desktop (82540EM) NIC when GSO/TSO is enabled (the default for this NIC) and a VLAN is configured. It doesn't matter which "Attached to" type is used (Internal Network or Bridged Adapter).

There is a simple scheme to reproduce this behavior.
VM-1:
Intel PRO/1000 MT Desktop (82540EM)
Ubuntu Server + Wget

ip link add link eth0 name eth0.100 type vlan id 100
ip addr add 10.10.10.10/24 dev eth0.100
ip link set eth0.100 up

VM-2:
Intel PRO/1000 MT Desktop (82540EM)
Ubuntu Server + Apache (default page, ~10KiB)

ip link add link eth0 name eth0.100 type vlan id 100
ip addr add 10.10.10.20/24 dev eth0.100
ip link set eth0.100 up

Now, if you run wget against 10.10.10.20 from 10.10.10.10, VM-2 crashes and the host freezes completely a few moments later. Sometimes a window with an error appears (see the error.png attachment).
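
The reproduction steps above can be collected into one small script. This is only a sketch: the `run` and `setup_vlan` helpers and the `DRY_RUN` switch are illustrative names that do not come from this ticket, and the interface name, VLAN ID, and addresses are simply the ones used above. By default it prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the reproduction steps. With DRY_RUN=1 (the default) commands
# are only printed; set DRY_RUN=0 and run as root inside the guest to
# actually execute them.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_vlan PARENT VLAN_ID CIDR
# Create the VLAN sub-interface PARENT.VLAN_ID, assign CIDR, bring it up.
setup_vlan() {
    parent=$1
    vid=$2
    cidr=$3
    run ip link add link "$parent" name "$parent.$vid" type vlan id "$vid"
    run ip addr add "$cidr" dev "$parent.$vid"
    run ip link set "$parent.$vid" up
}

# On VM-1 (the wget client). On VM-2, use 10.10.10.20/24 instead.
setup_vlan eth0 100 10.10.10.10/24

# From VM-1, fetch the default Apache page from VM-2 over the VLAN.
run wget -O /dev/null http://10.10.10.20/
```

Given the outcome described above, anyone running this with DRY_RUN=0 on an affected build should expect the host to freeze.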

Important things are the following:

  1. Intel PRO/1000 MT Desktop (82540EM) as a NIC on VM-2
  2. GSO/TSO enabled (the default for this NIC)
  3. Payload size (transferred file size)

Solution:

  1. Just change the network card on VM-2 (other Intel cards work fine, e.g. Intel PRO/1000 T Server (82543GC), where GSO/TSO is disabled by default).
  2. Another way is to disable GSO/TSO for eth0.100 on VM-2 with ethtool:
    ethtool -K eth0.100 gso off
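
Both workarounds can be sketched as shell helpers. The helper names `switch_nic` and `disable_gso`, the VM name "VM-2", and adapter slot 1 are assumptions made for illustration. `VBoxManage modifyvm ... --nictypeN` is run on the host while the VM is powered off; `ethtool` is run as root inside the guest. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the two workarounds. With DRY_RUN=1 (the default) commands are
# only printed; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Workaround 1 (host side, VM powered off): switch the emulated NIC model.
# switch_nic VM_NAME ADAPTER_SLOT NIC_TYPE
switch_nic() {
    vm=$1
    slot=$2
    nictype=$3
    run VBoxManage modifyvm "$vm" "--nictype$slot" "$nictype"
}

# Workaround 2 (guest side, as root): turn GSO off on the VLAN interface,
# then list the offload features to check the resulting state.
disable_gso() {
    dev=$1
    run ethtool -K "$dev" gso off
    run ethtool -k "$dev"
}

switch_nic VM-2 1 82543GC
disable_gso eth0.100
```

Note that the `ethtool` setting does not survive a guest reboot unless it is also added to the guest's network configuration.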
    

The problem is present in VirtualBox 5.1.26 r117224; Windows 7 x64.

Last edited 3 months ago by rmaksimov (previous) (diff)

comment:7 Changed 2 months ago by aleksey

This was indeed a regression in 5.1.26 related to segmentation offloading. The fix will be included in the next maintenance release.

comment:8 Changed 2 months ago by aleksey

  • Summary changed from VirtualBox 5.1.26 crashes when using VLAN in linux guest over Internal Network to VirtualBox 5.1.26 crashes when using VLAN in linux guest over Internal Network => Fixed in SVN

comment:9 Changed 2 months ago by michael

  • Status changed from new to closed
  • Resolution set to fixed
  • Summary changed from VirtualBox 5.1.26 crashes when using VLAN in linux guest over Internal Network => Fixed in SVN to VirtualBox 5.1.26 crashes when using VLAN in linux guest over Internal Network => Fixed in 5.1.28