VirtualBox

Opened 9 years ago

Last modified 9 years ago

#14213 new defect

NAT networking stops responding and blocks all I/O on that interface for Ubuntu/Debian x64 guests

Reported by: Coffee_fan
Owned by:
Component: network/NAT
Version: VirtualBox 4.3.28
Keywords: NAT
Cc: pierrj@…
Guest type: Linux
Host type: Windows

Description

Summary

I can reproduce this fault almost 100% of the time. It seems to affect VirtualBox 4.3.x on both Windows 8.1 and Windows 10 hosts with Ubuntu 14.04.2 or Debian Jessie guests.

I have included a script that pings a well-known address once a second, so you can see the precise moment at which networking stops responding. You may use this script or any tool you like. For the purposes of this bug description I will assume you are using the embedded script.

Repro steps to detect fault

  1. Make sure you have a standard single-interface, NAT-based Ubuntu guest running.
  2. Log in to the guest.
  3. Install either Google Chrome (triggers the fault immediately) or chromium-browser (Ubuntu) / chromium (Debian Jessie), which trigger it very often but not 100% of the time. Install google-chrome using the following snippet:
    wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
    echo "deb http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list
    sudo apt-get -yqq update
    sudo apt-get install -yqq google-chrome-stable
  4. Start the embedded script to monitor network access.
    Expected result: It should be possible to ping the well-known address. The
     result should be something along these lines:
    ...
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    ...
    
  5. Start the Chromium browser from the UI and try to access google.com or any other address.
    Result: The script starts showing that network access is broken. After a
    short delay during which everything is frozen, lines like the following
    appear in the script output:
    ...
    1 packets transmitted, 0 received, 100% packet loss, time 0ms
    ...

Result 2: The VirtualBox process on Windows becomes unresponsive and unstable
and cannot easily be stopped. Its windows may turn blank, as the windows of hung processes do.

Expected result: Network access should work normally, no disruptions.

Script to trigger fault

For repro steps I used:

$ ./net_tester.sh -x
$ cat net_tester.sh 
#!/bin/bash

function usage()
{

  cat <<eof

Usage: $(basename "$0") [-x] [-h] [output log file]

Checks the network reliability of a VirtualBox guest by pinging once a second.

Options:

-x, --external  Ping Google public DNS (8.8.8.8) instead of the NAT gateway address.
-h, --help      This message.

eof
  exit 1
}

#
# Change the following line to the address of the NAT gateway.
#
ping_address=10.0.0.2
out_file=out.log

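# Parse options; any other argument is taken as the output log file.
while [ "$1" != "" ]
do
  case "$1" in
  -h|--help) usage ;;
  -x|--external) ping_address=8.8.8.8 ;;
  *) out_file=$1 ;;
  esac
  shift
done

# Log a timestamp and a single ping once a second until interrupted.
while :
do
  date
  ping -c 1 "${ping_address}"
  sleep 1
done | tee "${out_file}"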
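
The log written by net_tester.sh can get long, so a small post-processing step helps to find where connectivity drops and comes back. The following is a minimal sketch, not part of the original script. It assumes each probe is logged as a `date` line followed by ping output that starts with `PING ` and contains the "packets transmitted" summary line; the function name `summarize_outages` is made up for illustration.

```shell
#!/bin/bash
# Sketch: summarize outages in a log produced by net_tester.sh.
# Assumption: each probe is a `date` line followed by ping output that
# begins with "PING " and includes an "N packets transmitted, ..." line.

summarize_outages() {
  awk '
    # The line just before "PING ..." is the timestamp printed by date.
    /^PING / { stamp = prev }

    / packets transmitted, / {
      ok = ($0 !~ /, 0 received/)
      if (!ok && !down) { down = 1; start = stamp; lost = 0 }
      if (!ok)          { lost++ }
      if (ok && down)   {
        down = 0
        printf "outage: %s -> %s (%d probes lost)\n", start, stamp, lost
      }
    }

    { prev = $0 }

    END {
      if (down) printf "outage: %s -> end of log (%d probes lost)\n", start, lost
    }
  ' "$1"
}

# Run only when a log file is given on the command line.
if [ "$#" -gt 0 ]; then
  summarize_outages "$1"
fi
```

Saved as a separate file, it could be run against the default log name, for example `bash summarize.sh out.log`. If ping fails before printing its `PING` header, the timestamp shown for that probe will be the one from the previous probe.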

Attachments (4)

VBox.log (70.3 KB) - added by Coffee_fan 9 years ago.
Machine VBox.log on Windows 10
syslog.7z (51.8 KB) - added by Coffee_fan 9 years ago.
Syslog from ubuntu 14.04.2 lxde on Windows 10
vbox4.3.28_windows.log.7z (164.1 KB) - added by Coffee_fan 9 years ago.
Added file with log of connectivity which shows how network connectivity appears and disappears.
VMs_my_lxde_1432058106676_67481.vbox (9.8 KB) - added by Coffee_fan 9 years ago.
VBox file


Change History (20)

comment:1 by Valery Ushakov, 9 years ago

Probably a duplicate of #13987


comment:2 by Coffee_fan, 9 years ago

As far as I remember, I went as far back as 4.3.8 and reproduced similar behavior in each build. I was hoping it was a regression, but it does not seem to be. I will try the test build, and I will try again with 4.3.8 or 4.3.12, which seemed to be the most stable. I will also include a table with my findings.

comment:3 by Valery Ushakov, 9 years ago

Actually, since the naming is unfortunately confusing, do you use "NAT" or "NAT Network"?

comment:4 by Coffee_fan, 9 years ago

I use NAT, not NAT Network.

comment:5 by Coffee_fan, 9 years ago

This is a table I assembled about a month ago. It lists my observations from running not Google Chrome but a build process that takes about 12 minutes when there are no network interruptions. I apologize for the references to VMware, but oddly enough, I am able to run Ubuntu in VirtualBox 4.3.28 nested inside an Ubuntu VMware guest with no network problems. In that case the build takes 12 minutes, which is similar to what native VirtualBox takes when the networking is stable.

Version | Summary           | Comments
4.3.8   | Good for W81      | Network interruptions every two minutes, but they do not seem as disruptive as in other builds.
4.3.12  | Barely OK for W81 | Build takes 17m22s with network interruptions of 10 s every minute or so. Recovery takes longer.
4.3.18  | Good for W81      | Build takes 16 min with network interruptions of 10 s every 2 minutes, whereas the same build takes 11m50s in VirtualBox 4.3.28 inside a VMware VM and 16m in VMware.
4.3.28  | Too fragile       | Constant network interruptions.

comment:6 by Coffee_fan, 9 years ago

Sorry for the naming confusion; I was trying to find the best way to describe it. :-)

comment:7 by Valery Ushakov, 9 years ago

When network interruption happens, is it just packet loss or do you lose physical (well, virtual :) link with DHCP renewal afterwards?

comment:8 by Coffee_fan, 9 years ago

I tried 4.3.29.101039 on Windows 10 build 10130 and the behavior reproduces immediately. As soon as I have time, I will try the test build with Windows 8.1.

comment:9 by Valery Ushakov, 9 years ago

Please, can you attach the VBox.log that corresponds to a run that experiences the problem? If you can also provide both host-side and guest-side packet captures, that would be useful too. TIA.
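
For readers who have not taken such captures before, here is a minimal sketch of one way to collect them. It assumes the NAT adapter is adapter 1 of the VM and shows up as `eth0` in the guest; the function names, VM name and file names are placeholders. The host-side part uses the `--nictrace1` and `--nictracefile1` options of `VBoxManage modifyvm`, which must be run while the VM is powered off. The guest-side part uses `tcpdump`.

```shell
#!/bin/bash
# Sketch: capture traffic on both sides of the NAT engine.
# VM name, interface name and file names are placeholders.

# Host side: make VirtualBox write a pcap trace for adapter 1.
# Run this while the VM is powered off, then start the VM.
host_trace_on() {
  local vm="$1" file="$2"
  VBoxManage modifyvm "$vm" --nictrace1 on --nictracefile1 "$file"
}

# Host side: switch tracing off again once the problem has been captured.
host_trace_off() {
  VBoxManage modifyvm "$1" --nictrace1 off
}

# Guest side: capture full packets on the NAT interface until Ctrl-C.
guest_capture() {
  local iface="${1:-eth0}" file="${2:-guest.pcap}"
  sudo tcpdump -n -i "$iface" -s 0 -w "$file"
}
```

On a Windows host the same `VBoxManage` arguments can be typed directly at a command prompt; the bash wrappers are only there to keep the example in one language.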

in reply to:  7 comment:10 by Coffee_fan, 9 years ago

Replying to vushakov:

When network interruption happens, is it just packet loss or do you lose physical (well, virtual :) link with DHCP renewal afterwards?

I lose all connectivity. I have not sniffed the network to see whether there is a DHCP renewal or not. I have Wireshark and will check that.

comment:11 by Valery Ushakov, 9 years ago

Do you see e1000: eth0 NIC Link is Down in dmesg or in /var/log/syslog (and the corresponding network manager messages)?

by Coffee_fan, 9 years ago

Attachment: VBox.log added

Machine VBox.log on Windows 10

by Coffee_fan, 9 years ago

Attachment: syslog.7z added

Syslog from ubuntu 14.04.2 lxde on Windows 10

in reply to:  11 comment:12 by Coffee_fan, 9 years ago

Replying to vushakov:

Do you see e1000: eth0 NIC Link is Down in dmesg or in /var/log/syslog (and the corresponding network manager messages)?

I did not see NIC Link is down in dmesg. I only saw this:

[    2.665021] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready

As for /var/log/syslog, I am not sure what to look for. I attached it along with the host-side log.

I can create a fresh VM on Windows 8.1 and we can start from there if you want, since this machine runs Windows 10, which adds more variables. Your choice.
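
As a rough guide to what to look for, the link-state messages asked about above can be filtered out of kernel log text with a small helper. This is only a sketch: the function name `link_events` is made up, and the pattern assumes messages of the form quoted in this ticket ("NIC Link is Down", "link is not ready") plus the similar "link becomes ready".

```shell
#!/bin/bash
# Sketch: print link-state messages for one interface from log text on stdin.
# Assumption: messages look like "eth0 NIC Link is Down" or
# "eth0: link is not ready" / "eth0: link becomes ready".
link_events() {
  local iface="${1:-eth0}"
  grep -i -E "${iface}.*link (is|becomes) "
}

# Example usage inside the guest:
#   dmesg | link_events eth0
#   link_events eth0 < /var/log/syslog
```

A "Link is Down" line that coincides with an outage would point to a lost virtual link; no such line suggests the link stayed up and only packets were lost.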

comment:13 by Valery Ushakov, 9 years ago

Please, can you also attach the .vbox file of the VM. Your VBox.log has

NAT: Host Resolver conflicts with DNS proxy, the last one was forcely ignored

so you probably have them both turned on and that's probably not intended. You don't want host resolver, unless you have a very special setup, which you most likely don't.

by Coffee_fan, 9 years ago

Attachment: vbox4.3.28_windows.log.7z added

Added file with log of connectivity which shows how network connectivity appears and disappears.

by Coffee_fan, 9 years ago

Attachment: VMs_my_lxde_1432058106676_67481.vbox added

VBox file

comment:14 by Coffee_fan, 9 years ago

I think I use the host resolver because otherwise the intranet DNS, which comes from Windows Active Directory, does not get properly propagated to the VMs. That means I can resolve public addresses but NOT intranet addresses. When I set --natdnshostresolver1, internal name resolution works.

Could the conflict you mention be there because yesterday, while trying to find a configuration that works, I enabled --natdnsproxy1 in addition to --natdnshostresolver1?

If that is the case, the net result is that it did not help. Currently both settings are commented out in the Vagrantfile and the issue is still happening.
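
To rule the conflict out, the two DNS modes can be set explicitly instead of relying on whatever earlier experiments left behind. A minimal sketch, assuming adapter 1 is the NAT adapter and the VM is powered off; the function name `set_nat_dns` and the VM name are placeholders, while the two options are the ones named in this comment.

```shell
#!/bin/bash
# Sketch: set both NAT DNS modes for adapter 1 explicitly.
# Arguments: VM name, host resolver (on|off), DNS proxy (on|off).
set_nat_dns() {
  local vm="$1" resolver="$2" proxy="$3"
  VBoxManage modifyvm "$vm" \
    --natdnshostresolver1 "$resolver" \
    --natdnsproxy1 "$proxy"
}

# Example: host resolver only, which avoids the conflict logged in VBox.log.
#   set_nat_dns my_vm on off
# Example: neither mode, to test the plain NAT DNS behavior.
#   set_nat_dns my_vm off off
```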

comment:15 by Coffee_fan, 9 years ago

The file size limit prevents me from uploading the captures. Do you have an email address I can send them to?

comment:16 by Valery Ushakov, 9 years ago

Packet captures compress very well. If they are not significantly larger than the limit, split(1) them. If they are significantly larger, I'm afraid they will be above the limits that the mail server accepts anyway.

Dropbox or some other cloud storage?
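
The compress-and-split suggestion can be sketched as follows. The function names and the 400k piece size are placeholders (the tracker's actual attachment limit is not stated here); `gzip`, `split -b` and `cat` are the standard tools.

```shell
#!/bin/bash
# Sketch: compress a capture and cut it into pieces below an upload limit.
# The default piece size is a placeholder, not the tracker's real limit.
split_capture() {
  local file="$1" size="${2:-400k}"
  gzip -9 -c "$file" > "$file.gz"
  split -b "$size" "$file.gz" "$file.gz.part-"
}

# Reassemble on the receiving side. The shell expands the glob in sorted
# order (part-aa, part-ab, ...), which is the order split wrote them in.
join_capture() {
  local prefix="$1" out="$2"
  cat "$prefix"* | gunzip -c > "$out"
}

# Example:
#   split_capture guest.pcap 400k
#   join_capture guest.pcap.gz.part- restored.pcap
```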
