VirtualBox

Ticket #9371 (closed defect: fixed)

Opened 3 years ago

Last modified 2 years ago

NAT Interface fails -> fixed in svn

Reported by: dnahas
Owned by:
Priority: major
Component: network/NAT
Version: VirtualBox 4.1.0
Keywords:
Cc:
Guest type: Linux
Host type: Windows

Description (last modified by frank)

I have searched the bug tracker and found a few tickets with similar symptoms, but they are already resolved. In my situation, something must have changed after 3.1.6 (the latest functioning version). Before submitting a new bug, I figured I would post here.

Environment:

  • VirtualBox 4.1.0 r73009 win.x86
  • Host OS Windows 7 Release: 6.1.7600
  • Guest OS Linux 2.6.24

Problem:

  • All traffic from the guest OS (10.0.3.101) to the VirtualBox NAT interface (10.0.3.2) fails after some time.
  • Initially all traffic functions correctly, but after approximately 15-20 min the guest OS is unable to contact the VirtualBox NAT interface. Pings to VirtualBox, or to anything else through the guest's NAT interface, fail. The guest is configured with two adapters: adapter 1 is a host-only network that continues to function normally when adapter 2 fails.

Attempted Resolutions:

  • I am able to restore traffic while the guest is running by changing the interface type to something other than NAT and then back to NAT. However, the NAT failure eventually recurs.
  • Restarting the guest also recreates the failure.
  • I tried changing the adapter driver types with no success.
  • Downgrading to VirtualBox 3.1.6 r59338 is the only solution that provides continuous NAT connectivity.

Attachments

ZTM2-2011-08-02-13-56-31.log (93.8 KB) - added by dnahas 3 years ago.
VBox log
file.part01.exe (380.9 KB) - added by dnahas 3 years ago.
Network traffic pcap dump part 1
file.part02.rar (253.0 KB) - added by dnahas 3 years ago.
Network traffic pcap dump part 2

Change History

Changed 3 years ago by dnahas

VBox log

comment:1 follow-up: ↓ 2 Changed 3 years ago by dnahas

Tried using guest adapter 1 as NAT interface, still recreates the failure.

comment:2 in reply to: ↑ 1 ; follow-up: ↓ 4 Changed 3 years ago by Hachiman

Replying to dnahas:

Tried using guest adapter 1 as NAT interface, still recreates the failure.

In order to reproduce the issue locally, could you please give me a hint on how to provoke the problem:

  1. What guest do you have installed?
  2. What is the routing table on your guest?
  3. What kind of networking activity are you doing? For example, can wget http://10.0.3.2/something_big be used to illustrate the problem?
  4. Is it reproducible if the network adapter is e1000, or does it happen with PCnet only?

comment:3 follow-up: ↓ 5 Changed 3 years ago by dnahas

  1. Guest OS Linux 2.6.24
  2. The default route is 10.0.2.2, and the only other entries in the routing table are the local subnets, 10.0.2.0/24 and 10.10.10.0/24, with no gateway.
      a. eth0 10.0.2.102/24 – NAT interface
      b. eth1 10.10.10.102/24 – Host-only adapter
  3. I have not found a single specific activity that causes the failure. Once the failure occurs all connectivity is lost, and the NAT interface on the guest cannot ping the VirtualBox NAT gateway (10.0.2.2).

     The guest is a Linux-based virtual appliance used for traffic management and monitoring.

  4. I have tried all of the adapter type options available in VirtualBox with no success. VirtualBox 3.1.6 r59338 is the newest version that functions for more than 15-20 min.

comment:4 in reply to: ↑ 2 Changed 3 years ago by dnahas

Replying to Hachiman: posted below

comment:5 in reply to: ↑ 3 Changed 3 years ago by Hachiman

Replying to dnahas:

  1. Guest OS Linux 2.6.24

Where can I download the installation image?

  2. The default route is 10.0.2.2, and the only other entries in the routing table are the local subnets, 10.0.2.0/24 and 10.10.10.0/24, with no gateway.
      a. eth0 10.0.2.102/24 – NAT interface
      b. eth1 10.10.10.102/24 – Host-only adapter

Could you please add the route and ifconfig output to the defect?

  3. I have not found a single specific activity that causes the failure. Once the failure occurs all connectivity is lost, and the NAT interface on the guest cannot ping the VirtualBox NAT gateway (10.0.2.2).

     The guest is a Linux-based virtual appliance used for traffic management and monitoring.

  4. I have tried all of the adapter type options available in VirtualBox with no success. VirtualBox 3.1.6 r59338 is the newest version that functions for more than 15-20 min.

Could you please collect a network traffic dump from your VM interfaces up to the moment of the outage (please see Network_tips for details)?

comment:6 Changed 3 years ago by dnahas

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.2.0        *               255.255.255.0   U     0      0        0 eth0
10.10.10.0      *               255.255.255.0   U     0      0        0 eth1
default         10.0.2.2        0.0.0.0         UG    100    0        0 eth0


eth0      Link encap:Ethernet  HWaddr 08:00:27:9c:9a:dd
          inet addr:10.0.2.102  Bcast:10.0.2.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe9c:9add/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:20578 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21899 errors:476 dropped:0 overruns:0 carrier:12
          collisions:0 txqueuelen:1000
          RX bytes:4000612 (3.8 MB)  TX bytes:2535106 (2.4 MB)
          Interrupt:10 Base address:0xd020

eth1      Link encap:Ethernet  HWaddr 08:00:27:a4:ef:05
          inet addr:10.10.10.102  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fea4:ef05/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:27200 errors:0 dropped:0 overruns:0 frame:0
          TX packets:40175 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9375526 (8.9 MB)  TX bytes:8685903 (8.2 MB)
          Interrupt:9 Base address:0xd060

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:66163 errors:0 dropped:0 overruns:0 frame:0
          TX packets:66163 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:19082352 (18.1 MB)  TX bytes:19082352 (18.1 MB)

comment:7 Changed 3 years ago by dnahas

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.2.0        *               255.255.255.0   U     0      0        0 eth0
10.10.10.0      *               255.255.255.0   U     0      0        0 eth1
default         10.0.2.2        0.0.0.0         UG    100    0        0 eth0

Changed 3 years ago by dnahas

Network traffic pcap dump part 1

Changed 3 years ago by dnahas

Network traffic pcap dump part 2

comment:8 Changed 3 years ago by dnahas

When the failure occurs, the network trace from VBox stops. A tcpdump on the guest shows some requests, and eventually ARP requests for the 10.0.2.2 address. The install would require developer access/sign-up from the vendor. Would you like a copy of my VM?

14:58:05.361615 IP 10.0.2.102 > 10.0.2.2: ICMP echo request, id 32330, seq 4759, length 64
14:58:05.361654 IP 10.0.2.102 > 74.125.227.19: ICMP echo request, id 32330, seq 4760, length 64
14:58:05.361692 IP 10.0.2.102 > 74.125.227.18: ICMP echo request, id 32330, seq 4761, length 64

comment:9 Changed 3 years ago by Hachiman

Could you please not use self-extracting archives, and instead attach the raw archive or send it to me by mail [vasily _dot_ levchenko _at_ oracle _dot_ com]? Regarding the VM, I'd like to look at the dumps first; we can continue the conversation about the VM later.

comment:10 follow-up: ↓ 11 Changed 3 years ago by dnahas

pcap sent to requested email address.

comment:11 in reply to: ↑ 10 Changed 3 years ago by Hachiman

Replying to dnahas:

pcap sent to requested email address.

Unfortunately, it still isn't clear what the reason for the outage is. We recently fixed a bug affecting NAT networking under some conditions. Could you please try this build?

If it doesn't fix the issue for you, please don't uninstall it; we will continue investigating your case with this revision (I'll send you binaries with logging enabled). Please note that ICMP (ping) isn't reliable on Windows hosts: on Windows we don't use a socket for ICMP as in the Unix case; instead we use the ICMP API, which is itself not reliable. Of course, if the issue is reproducible only when ICMP is involved, then we should investigate exactly that case.

So please check for changes in behavior with this build; if it is still unacceptable, let me know and I will share binaries with logging enabled.

comment:12 follow-ups: ↓ 13 ↓ 16 Changed 3 years ago by dnahas

Given the Windows host ICMP reliability issue, why does the problem not exist with VirtualBox 3.1.6?

I installed the supplied update and the problem still exists; I would like to enable logging.

comment:13 in reply to: ↑ 12 Changed 3 years ago by Hachiman

Replying to dnahas:

I installed the supplied update and the problem still exists; I would like to enable logging.

Please do the following:

  1. Download VBoxDD.dll.
  2. Replace the original VBoxDD.dll with the downloaded one.
  3. Switch on tracing on the guest side as you did before.
  4. Switch on tracing on the host side (you'll probably need a Wireshark installation).
  5. Launch VBox as follows:
    set VBOX_LOG=drv_nat.e.f.l2
    set VBOX_LOG_DEST=file=c:/nat.log
    set VBOX_LOG_FLAGS="thread time"
    VirtualBox --startvm <vm-name>
    

When you have the log and pcap files, please send them to me via email.
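Step 5 can also be scripted. The following is a minimal sketch, assuming Python 3 on the Windows host and VirtualBox.exe on the PATH; the helper names `build_log_env` and `launch` are hypothetical. It passes the same three variables listed above directly to the VirtualBox process, so they cannot be lost between console sessions. One difference from the cmd version: the value `thread time` is passed without surrounding quotes, whereas cmd.exe's `set VAR="..."` keeps the quotes as part of the value.

```python
import os
import subprocess

# NAT logging settings, copied from the steps in this ticket.
NAT_LOG_SETTINGS = {
    "VBOX_LOG": "drv_nat.e.f.l2",
    "VBOX_LOG_DEST": "file=c:/nat.log",
    "VBOX_LOG_FLAGS": "thread time",
}


def build_log_env(base=None):
    """Return a copy of `base` (default: the current environment) plus the logging variables."""
    env = dict(os.environ if base is None else base)
    env.update(NAT_LOG_SETTINGS)
    return env


def launch(vm_name):
    """Start the named VM with NAT logging enabled (assumes VirtualBox is on PATH)."""
    cmd = ["VirtualBox", "--startvm", vm_name]
    return subprocess.Popen(cmd, env=build_log_env())
```

For example, `launch("my-vm")` starts a VM named `my-vm` (a placeholder name); the copy-then-update approach leaves the parent environment untouched.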

comment:14 Changed 3 years ago by dnahas

log and pcap files sent via email

comment:15 Changed 3 years ago by frank

  • Description modified

comment:16 in reply to: ↑ 12 ; follow-up: ↓ 18 Changed 3 years ago by Hachiman

Replying to dnahas:

Given the Windows host ICMP reliability issue, why does the problem not exist with VirtualBox 3.1.6?

Please understand me correctly: I don't mean that this defect is caused by ICMP instability on Windows, only that ICMP is not very reliable for detecting network outages.

You've mentioned that 3.1.6 works fine for you. Does that mean that, e.g., 3.1.8 (or 3.2.12) wasn't working for you?

comment:17 follow-up: ↓ 19 Changed 3 years ago by Hachiman

Btw, does it change anything for you if you don't ping in parallel?

comment:18 in reply to: ↑ 16 Changed 3 years ago by dnahas

Replying to Hachiman:

You've mentioned that 3.1.6 works fine for you. Does that mean that, e.g., 3.1.8 (or 3.2.12) wasn't working for you?

That is correct, version 3.1.6 works fine, and any version newer reproduces the same failure. I have tested 3.1.8, 3.2.12, 4.0.0, 4.0.4, 4.0.6, 4.0.8, 4.0.10, 4.1.0, and now 4.1.1.

comment:19 in reply to: ↑ 17 ; follow-up: ↓ 20 Changed 3 years ago by dnahas

Replying to Hachiman:

Btw, does it change anything for you if you don't ping in parallel?

The guest is used for traffic management and monitoring. The ICMP traffic from the guest is part of the application and cannot be controlled.

comment:20 in reply to: ↑ 19 ; follow-up: ↓ 21 Changed 3 years ago by Hachiman

Replying to dnahas:

Replying to Hachiman:

Btw, does it change anything for you if you don't ping in parallel?

The guest is used for traffic management and monitoring. The ICMP traffic from the guest is part of the application and cannot be controlled.

Aha, could you please try the build http://www.virtualbox.org/download/testcase/VirtualBox-2011-08-10-16-55-56-win-rel-4.1.1-r73438-MultiArch.exe? I've investigated the issue and found a bug in the VirtualBox code that processes ICMP packets, which led to an outage locally.

comment:21 in reply to: ↑ 20 Changed 3 years ago by dnahas

Replying to Hachiman:

Aha, could you please try the build http://www.virtualbox.org/download/testcase/VirtualBox-2011-08-10-16-55-56-win-rel-4.1.1-r73438-MultiArch.exe? I've investigated the issue and found a bug in the VirtualBox code that processes ICMP packets, which led to an outage locally.

Installed 4.1.1 r73438, problem still occurs. Email sent with nictrace pcap from guest.

comment:22 follow-up: ↓ 23 Changed 3 years ago by Hachiman

Thanks for the dump; I'll send you binaries with logging enabled.

comment:23 in reply to: ↑ 22 ; follow-up: ↓ 24 Changed 3 years ago by Hachiman

Replying to Hachiman:

Thanks for the dump; I'll send you binaries with logging enabled.

I've uploaded a new DSO, VBoxDD.dll, with logging enabled. Could you please repeat the steps you did before?

comment:24 in reply to: ↑ 23 Changed 3 years ago by dnahas

Replying to Hachiman:

I've uploaded a new DSO, VBoxDD.dll, with logging enabled. Could you please repeat the steps you did before?

Emailed the requested pcaps and logs.

comment:25 follow-up: ↓ 26 Changed 3 years ago by eharmic

Hi,

Just to let you know, I am experiencing problems almost identical to those reported by dnahas.

My configuration is also similar:

  • Host OS: Windows Vista
  • Vbox Version: I was using 4.0.12, just upgraded to 4.1 today
  • Guest OS: Linux 2.6.18 (Centos 5.6)

Network setup is almost identical:

  • Interface 1 is NAT
  • Interface 2 is host-only network
  • Default route is via Interface 1
  • Both interfaces using the Intel PRO/1000 MT driver

Let me know if I can provide any info to help troubleshoot it.

comment:26 in reply to: ↑ 25 Changed 3 years ago by Hachiman

Replying to eharmic:

Hi,

Just to let you know, I am experiencing problems almost identical to those reported by dnahas.

Could you please try the build  http://www.virtualbox.org/download/testcase/VirtualBox-2011-08-10-16-55-56-win-rel-4.1.1-r73438-MultiArch.exe ?

comment:27 Changed 3 years ago by eharmic

Hi,

I installed the 4.1.1-r73438 build. The good news is that so far the NAT adaptor has not failed (although I have not used it long enough to be certain yet - for me the appearance of the fault is random and sometimes the machine has to be up for a number of hours before it happens).

The bad news is that this build seems to have problems with the host-only adaptor. After upgrading to the 4.1.1 build, it initially worked fine. Then, after stopping the VM, exiting VirtualBox, and later starting it up again, I get no traffic at all through the host-only adaptor.

I can ping the local interface in the guest machine (192.168.56.101) but NOT the host's interface (192.168.56.1).

In the host machine I can ping neither the guest nor the host address.

I have tried:

  1. ifdown / ifup in the guest. It gets an address OK but no traffic gets through.
  2. poweroff / poweron the guest
  3. Disable / Re-enable the interface in the host
  4. Revert to 4.0.12

Only the last step worked.

I suspect it has something to do with the Windows virtual device driver, because normally I can ping that even when no VM is running.

I guess this is probably a separate problem? But anyway it is blocking me from trying your 4.1.1 solution out properly.

Mike

comment:28 follow-up: ↓ 29 Changed 3 years ago by eharmic

Hi,

I was able to sort out the problem that I reported yesterday - it seems when I upgraded, the IP address of the host-only adaptor was changed from 192.168.56.1 to another seemingly random address! After manually changing it back I could continue.

I used your 4.1.1-r73438 build today and the NAT fault recurred after the VM had been up for 3 hours 53 seconds.

I noticed a pattern. I left a script running that pinged an outside node every 60 secs and logged the result in a file. Under normal usage I don't use the NAT interface all that regularly. In this case it had been some hours where only the ping was ongoing. Then, when I went to make an sftp connection out (i.e. a TCP connection), it suddenly stopped. I saw that pattern twice.
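A script like the one described above can help pin down when the outage starts. The following is a minimal sketch, not the script Mike used: it assumes Python 3.6+ in the guest and a Linux `ping` that accepts `-c` (count) and `-W` (reply timeout in seconds); the function names, the target address, and the log format are hypothetical. It probes once per interval and appends a timestamped OK/FAIL line to a file.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=5):
    """Send one ICMP echo request with the system ping; True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def format_entry(when, host, ok):
    """Build one log line, e.g. '2011-09-01 12:00:00 192.0.2.1 OK'."""
    return f"{when:%Y-%m-%d %H:%M:%S} {host} {'OK' if ok else 'FAIL'}"


def monitor(host, log_path, interval_s=60, probe=ping_once, rounds=None):
    """Probe `host` every `interval_s` seconds and append the result to `log_path`.

    rounds=None runs forever; a number limits the loop. `probe` can be swapped
    for another check (for example a TCP connect) without changing the loop.
    """
    done = 0
    while rounds is None or done < rounds:
        line = format_entry(datetime.now(), host, probe(host))
        with open(log_path, "a") as log:
            log.write(line + "\n")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
```

Running `monitor("192.0.2.1", "ping.log")` (placeholder address) reproduces the "ping every 60 secs" pattern; the first FAIL line marks the approximate start of the outage.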

Regards Mike

comment:29 in reply to: ↑ 28 ; follow-up: ↓ 31 Changed 3 years ago by Hachiman

Replying to eharmic:

Hi,

I was able to sort out the problem that I reported yesterday - it seems when I upgraded, the IP address of the host-only adaptor was changed from 192.168.56.1 to another seemingly random address! After manually changing it back I could continue.

I used your 4.1.1-r73438 build today and the NAT fault recurred after the VM had been up for 3 hours 53 seconds.

Does it depend on whether the host-only attachment is present or not, i.e. does it happen if you have only the NAT adapter?

I noticed a pattern. I left a script running that pinged an outside node every 60 secs and logged the result in a file. Under normal usage I don't use the NAT interface all that regularly. In this case it had been some hours where only the ping was ongoing. Then, when I went to make an sftp connection out (i.e. a TCP connection), it suddenly stopped. I saw that pattern twice.

Thanks for the description; I will try to reproduce it here.

Regards Mike

comment:30 follow-up: ↓ 32 Changed 3 years ago by dnahas

??

comment:31 in reply to: ↑ 29 Changed 3 years ago by eharmic

Hi,

Does it depend on whether the host-only attachment is present or not, e.g. if you have only 1 adapter?

I will try that out over the weekend and let you know.

Also FYI, I upgraded to 4.1.2 and still have the same problem.

Mike

comment:32 in reply to: ↑ 30 Changed 3 years ago by Hachiman

Replying to dnahas:

??

Sorry that I haven't informed you earlier: I've been able to reproduce the issue in my local environment with the VM you uploaded for me.

comment:33 Changed 3 years ago by dnahas

Additional testing with VirtualBox 3.1.6 r59338 does reproduce the issue.

comment:34 follow-up: ↓ 35 Changed 3 years ago by dnahas

Over a month and no updates?

comment:35 in reply to: ↑ 34 Changed 3 years ago by Hachiman

Replying to dnahas:

Over a month and no updates?

Sorry, I've just returned from vacation; that is the reason for the lack of updates on this defect.

comment:36 follow-up: ↓ 37 Changed 3 years ago by Hachiman

Could you please verify the fix with this build of VBoxDD.dll 4.1.4? Note that this DSO is built against VBox 4.1.4.

comment:37 in reply to: ↑ 36 Changed 3 years ago by dnahas

Replying to Hachiman:

Could you please verify the fix with this build of VBoxDD.dll 4.1.4? Note that this DSO is built against VBox 4.1.4.

installed 4.1.4 downloaded the provided VBoxDD.dll_9371_r74291 renamed to VBoxDD.dll and copied to C:\Program Files\Oracle\VirtualBox\ reboot Ran the following commands set VBOX_LOG=drv_nat.e.f.l2 set VBOX_LOG_DEST=file=c:/nat.log set VBOX_LOG_FLAGS="thread time" VirtualBox --startvm <vm-name>

The problem is the c:\nat.log file does not get created when I launch the VM.

comment:38 follow-up: ↓ 39 Changed 3 years ago by dnahas

Sorry, that did not format nicely.

installed 4.1.4
downloaded the provided VBoxDD.dll_9371_r74291
renamed to VBoxDD.dll and copied to C:\Program Files\Oracle\VirtualBox\
Reboot
set VBOX_LOG=drv_nat.e.f.l2
set VBOX_LOG_DEST=file=c:/nat.log
set VBOX_LOG_FLAGS="thread time"
VirtualBox --startvm <vm-name>

The problem is the c:\nat.log file does not get created when I launch the VM.

comment:39 in reply to: ↑ 38 ; follow-up: ↓ 40 Changed 3 years ago by Hachiman

Replying to dnahas:

Does it change anything for you? Here, Zeus TM 7.4 has been working without issues for 4 days on a Win XP host.

comment:40 in reply to: ↑ 39 ; follow-up: ↓ 41 Changed 3 years ago by dnahas

Replying to Hachiman:

Replying to dnahas:

Does it change anything for you? Here, Zeus TM 7.4 has been working without issues for 4 days on a Win XP host.

Yes, so far I am 12 hours into testing with version 4.1.4, and the NAT interface is functioning properly.

I did not see this addressed in the changelog. What is the fix?

comment:41 in reply to: ↑ 40 Changed 3 years ago by Hachiman

Replying to dnahas:

Replying to Hachiman:

Replying to dnahas:

Does it change anything for you? Here, Zeus TM 7.4 has been working without issues for 4 days on a Win XP host.

Yes, so far I am 12 hours into testing with version 4.1.4, and the NAT interface is functioning properly.

I did not see this addressed in the changelog. What is the fix?

The fix is on trunk but not on the branch (that's why it isn't mentioned in the Changelog); I built the 4.1.4 VBoxDD DSO with the changeset from trunk. NAT stores ICMP requests in a cache as mbufs. Because the ICMP API doesn't reliably deliver ICMP replies, cached requests are never released; this eventually exceeds the mbuf limit and takes down all networking. The fix is built on the assumption that we don't expect any responses to old cached packets, so we don't let the ICMP cache grow too much, and free the old mbufs instead.
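The idea behind the fix can be illustrated with a small model. This is a hypothetical Python sketch, not VirtualBox's NAT code (which is written in C and works on mbufs); the class name `IcmpCache`, the `max_entries` limit, and the (destination, id, seq) key are invented for illustration. Outstanding requests are kept until a matching reply arrives; once the cache is full, the oldest request is dropped on the assumption that no reply for it will ever come, so unanswered pings can no longer pin every buffer.

```python
from collections import OrderedDict


class IcmpCache:
    """Bounded cache of outstanding ICMP echo requests (illustrative model only)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        # Insertion order tracks age: the first item is the oldest request.
        self._pending = OrderedDict()
        self.evicted = 0

    def add_request(self, key, buf):
        """Remember a request buffer; a repeated key counts as the newest entry."""
        if key in self._pending:
            self._pending.move_to_end(key)
        self._pending[key] = buf
        # Instead of growing without bound, free the oldest buffers.
        while len(self._pending) > self.max_entries:
            self._pending.popitem(last=False)
            self.evicted += 1

    def match_reply(self, key):
        """Release and return the cached request for a reply, or None if unknown/evicted."""
        return self._pending.pop(key, None)

    def __len__(self):
        return len(self._pending)
```

With `IcmpCache(max_entries=3)`, sending ten requests such as `("10.0.2.2", 32330, seq)` (values borrowed from the trace earlier in the ticket) that never receive replies leaves only the three newest cached and counts seven evictions. The trade-off matches the comment above: a very late reply to an evicted request is simply not matched.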

comment:42 follow-up: ↓ 43 Changed 3 years ago by Hachiman

I've backported the changes to the 4.1 branch. Could you please verify the 4.1 build?

comment:43 in reply to: ↑ 42 ; follow-up: ↓ 44 Changed 3 years ago by eharmic

Replying to Hachiman:

I've backported the changes to the 4.1 branch. Could you please verify the 4.1 build?

Hi,

I installed it yesterday. So far no problems.

Regards Mike

comment:44 in reply to: ↑ 43 Changed 3 years ago by Hachiman

Replying to eharmic:

Replying to Hachiman:

I've backported the changes to the 4.1 branch. Could you please verify the 4.1 build?

Hi,

I installed it yesterday. So far no problems.

Regards Mike

Thanks for the feedback.

comment:45 Changed 3 years ago by Hachiman

  • Summary changed from NAT Interface fails to NAT Interface fails -> fixed in svn

comment:46 Changed 2 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Fix is part of VBox 4.1.6.
