Opened 13 years ago
Closed 13 years ago
#9371 closed defect (fixed)
NAT Interface fails -> fixed in svn
Reported by: | david | Owned by: | |
---|---|---|---|
Component: | network/NAT | Version: | VirtualBox 4.1.0 |
Keywords: | Cc: | ||
Guest type: | Linux | Host type: | Windows |
Description (last modified by )
I have searched the bug tracker and found a few tickets with similar symptoms, but those bugs are marked resolved. In my situation, something must have changed after 3.1.6 (the latest functioning version). Before submitting a new bug, I figured I would post here.
Environment:
- VirtualBox 4.1.0 r73009 win.x86
- Host OS Windows 7 Release: 6.1.7600
- Guest OS Linux 2.6.24
Problem:
- All traffic from the guest OS (10.0.3.101) to the VirtualBox NAT interface (10.0.3.2) fails after some time.
- Initially all traffic functions correctly, but after approximately 15-20 min the guest OS is unable to contact the VirtualBox NAT interface. There is no ping response from VirtualBox or from anything else reachable through the guest's NAT interface. The guest is configured with two adapters: adapter 1 is a host-only network, which continues to function normally when adapter 2 (NAT) fails.
Attempted Resolutions:
- I am able to restore traffic while the guest is running by changing the interface type to something other than NAT and then switching back to NAT. However, the NAT failure simply recurs later.
- Restarting the guest also restores traffic only until the failure recurs.
- I tried changing the adapter driver types with no success.
- Downgrading to VirtualBox 3.1.6 r59338 is the only solution that provides continuous NAT connectivity.
Attachments (3)
Change History (49)
by , 13 years ago
Attachment: ZTM2-2011-08-02-13-56-31.log added
follow-up: 2 comment:1 by , 13 years ago
Tried using guest adapter 1 as NAT interface, still recreates the failure.
follow-up: 4 comment:2 by , 13 years ago
Replying to dnahas:
Tried using guest adapter 1 as NAT interface, still recreates the failure.
In order to reproduce the issue locally, could you please give me a hint on how I can provoke the problem:
- Which guest did you install?
- What is the routing table on your guest?
- What kind of networking activity are you doing? For example, wget http://10.0.3.2/something_big could be used to illustrate the problem.
- Is it reproducible with the e1000 network adapter, or does it happen with PCnet only?
follow-up: 5 comment:3 by , 13 years ago
1. Guest OS: Linux 2.6.24
2. The default route is 10.0.2.2; the only other entries in the routing table are the local subnets 10.0.2.0/24 and 10.10.10.0/24, with no gateway.
   a. eth0 10.0.2.102/24 – NAT interface
   b. eth1 10.10.10.102/24 – Host-only adapter
3. I have not found a single specific activity that causes the failure. Once the failure occurs, all connectivity is lost, and the NAT interface on the guest cannot ping the VirtualBox NAT gateway (10.0.2.2).
   The guest is a Linux-based virtual appliance used for traffic management and monitoring.
4. I have tried all of the adapter types available in VirtualBox with no success. VirtualBox 3.1.6 r59338 is the newest version that functions for more than 15-20 min.
comment:5 by , 13 years ago
Replying to dnahas:
1. Guest OS: Linux 2.6.24
Where can I download the installation image?
2. The default route is 10.0.2.2; the only other entries in the routing table are the local subnets 10.0.2.0/24 and 10.10.10.0/24, with no gateway.
   a. eth0 10.0.2.102/24 – NAT interface
   b. eth1 10.10.10.102/24 – Host-only adapter
Could you please add the route and ifconfig output to the ticket?
3. I have not found a single specific activity that causes the failure. Once the failure occurs, all connectivity is lost, and the NAT interface on the guest cannot ping the VirtualBox NAT gateway (10.0.2.2).
   The guest is a Linux-based virtual appliance used for traffic management and monitoring.
4. I have tried all of the adapter types available in VirtualBox with no success. VirtualBox 3.1.6 r59338 is the newest version that functions for more than 15-20 min.
Could you please collect a network traffic dump from your VM interfaces up to the moment of the outage (please see Network_tips for details)?
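Network_tips itself is not reproduced in this thread. As a hedged sketch, assuming the stock VBoxManage `modifyvm` options `--nictrace<N>` and `--nictracefile<N>` (the per-adapter capture that produces the "nictrace pcap" mentioned later in the ticket), the capture can be switched on from a script. The helper names, VM name and path below are hypothetical.

```python
import subprocess
from typing import List


def nictrace_command(vm_name: str, adapter: int, pcap_path: str,
                     enable: bool = True) -> List[str]:
    """Build a VBoxManage command that switches packet tracing on or off
    for one adapter slot (slots are numbered from 1)."""
    if adapter < 1:
        raise ValueError("VirtualBox adapter slots are numbered from 1")
    cmd = ["VBoxManage", "modifyvm", vm_name,
           f"--nictrace{adapter}", "on" if enable else "off"]
    if enable:
        # Where the pcap for this adapter is written.
        cmd += [f"--nictracefile{adapter}", pcap_path]
    return cmd


def run(cmd: List[str]) -> None:
    # modifyvm changes the stored VM settings, so the VM must be powered off.
    subprocess.run(cmd, check=True)
```

For example, `run(nictrace_command("my-appliance", 1, r"C:\traces\nat.pcap"))` with the VM powered off, pointing `adapter` at whichever slot holds the NAT adapter. Once the outage has been captured, tracing can be turned off again with `enable=False` so the file stops growing.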
comment:6 by , 13 years ago
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.2.0        *               255.255.255.0   U     0      0        0 eth0
10.10.10.0      *               255.255.255.0   U     0      0        0 eth1
default         10.0.2.2        0.0.0.0         UG    100    0        0 eth0

eth0      Link encap:Ethernet  HWaddr 08:00:27:9c:9a:dd
          inet addr:10.0.2.102  Bcast:10.0.2.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe9c:9add/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:20578 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21899 errors:476 dropped:0 overruns:0 carrier:12
          collisions:0 txqueuelen:1000
          RX bytes:4000612 (3.8 MB)  TX bytes:2535106 (2.4 MB)
          Interrupt:10 Base address:0xd020

eth1      Link encap:Ethernet  HWaddr 08:00:27:a4:ef:05
          inet addr:10.10.10.102  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fea4:ef05/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:27200 errors:0 dropped:0 overruns:0 frame:0
          TX packets:40175 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9375526 (8.9 MB)  TX bytes:8685903 (8.2 MB)
          Interrupt:9 Base address:0xd060

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:66163 errors:0 dropped:0 overruns:0 frame:0
          TX packets:66163 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:19082352 (18.1 MB)  TX bytes:19082352 (18.1 MB)
comment:7 by , 13 years ago
Destination | Gateway | Genmask | Flags | Metric | Ref | Use | Iface |
10.0.2.0 | * | 255.255.255.0 | U | 0 | 0 | 0 | eth0 |
10.10.10.0 | * | 255.255.255.0 | U | 0 | 0 | 0 | eth1 |
default | 10.0.2.2 | 0.0.0.0 | UG | 100 | 0 | 0 | eth0 |
comment:8 by , 13 years ago
When the failure occurs, the network trace from VBox stops. A tcpdump on the guest shows some requests, and eventually ARP requests for the 10.0.2.2 address.
The installation image would require developer access/sign-up from the vendor. Would you like a copy of my VM?
14:58:05.361615 IP 10.0.2.102 > 10.0.2.2: ICMP echo request, id 32330, seq 4759, length 64
14:58:05.361654 IP 10.0.2.102 > 74.125.227.19: ICMP echo request, id 32330, seq 4760, length 64
14:58:05.361692 IP 10.0.2.102 > 74.125.227.18: ICMP echo request, id 32330, seq 4761, length 64
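To quantify how many echo requests in a guest-side capture like this never got an answer, the tcpdump text output can be tallied per destination. This is a hypothetical helper, assuming tcpdump's default one-line ICMP format shown above and IPv4 addresses only; a request counts as answered once a reply with the same `id`/`seq` appears.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Matches tcpdump one-line ICMP echo summaries such as:
# "14:58:05.361615 IP 10.0.2.102 > 10.0.2.2: ICMP echo request, id 32330, seq 4759, length 64"
ECHO_RE = re.compile(
    r"IP (?P<src>[\d.]+) > (?P<dst>[\d.]+): ICMP echo (?P<kind>request|reply), "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)


def unanswered_echoes(lines: Iterable[str]) -> Counter:
    """Count, per destination, the echo requests that have no matching reply."""
    pending: Dict[Tuple[str, str], str] = {}
    for line in lines:
        m = ECHO_RE.search(line)
        if not m:
            continue  # ARP and other traffic is ignored
        key = (m["id"], m["seq"])
        if m["kind"] == "request":
            pending[key] = m["dst"]
        else:
            # A reply carries the same id/seq as its request.
            pending.pop(key, None)
    return Counter(pending.values())
```

Fed the three lines above, it reports one unanswered request each for 10.0.2.2, 74.125.227.19 and 74.125.227.18; on a longer capture, a per-destination count that only ever grows is a quick sign that replies stopped arriving.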
comment:9 by , 13 years ago
Could you please avoid self-extracting archives, and instead attach the raw archive or send it to me by mail [vasily _dot_ levchenko _at_ oracle _dot_ com]? Regarding the VM, I'd like to look at the dumps first; we can continue the conversation about the VM later.
comment:11 by , 13 years ago
Replying to dnahas:
pcap sent to requested email address.
Unfortunately, the reason for the outage still isn't clear. We recently fixed a bug affecting NAT networking under some conditions. Could you please try this build?
If it doesn't fix the issue for you, please don't uninstall it; we will continue investigating your case with this revision (I'll send you bits with logging enabled). Please note that ICMP (ping) isn't reliable on Windows hosts: on Unix hosts NAT uses a socket for ICMP, but on Windows we use the ICMP API instead, which is itself not reliable. Of course, if the issue is reproducible only when ICMP is involved, then that is exactly the case we should investigate.
So please check for changes in behavior with this build; if it is still unacceptable, let me know and I will share bits with logging enabled.
follow-ups: 13 16 comment:12 by , 13 years ago
I understand the ICMP reliability issue on Windows hosts, but why does the problem not exist with VirtualBox 3.1.6?
I installed the supplied update and the problem still exists. I would like to enable logging.
comment:13 by , 13 years ago
Replying to dnahas:
I installed the supplied update and the problem still exists. I would like to enable logging.
Please do the following:
- Download VBoxDD.dll.
- Replace the original VBoxDD.dll with the downloaded one.
- Enable tracing on the guest side as you did before.
- Enable tracing on the host side (you will probably need to install Wireshark).
- Launch VBox as follows:
    set VBOX_LOG=drv_nat.e.f.l2
    set VBOX_LOG_DEST=file=c:/nat.log
    set VBOX_LOG_FLAGS="thread time"
    VirtualBox --startvm <vm-name>
When you have the log and pcap files, please send them to me by email.
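The same environment can also be set from a script, which sidesteps cmd.exe quoting (cmd.exe's `set VAR="a b"` keeps the quote characters as part of the value). This is a sketch assuming `VirtualBox` is on `PATH` and the logging-enabled VBoxDD.dll from the steps above is already in place; the function names are hypothetical.

```python
import os
import subprocess
from typing import Dict, Optional

# Values from the steps above. Passed through `env`, each value reaches
# VirtualBox exactly as written here, with no shell quoting involved.
NAT_LOG_SETTINGS = {
    "VBOX_LOG": "drv_nat.e.f.l2",
    "VBOX_LOG_DEST": "file=c:/nat.log",
    "VBOX_LOG_FLAGS": "thread time",
}


def logging_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: the current environment) and add the NAT log settings."""
    env = dict(os.environ if base is None else base)
    env.update(NAT_LOG_SETTINGS)
    return env


def start_vm_with_nat_logging(vm_name: str) -> subprocess.Popen:
    # Equivalent to running "VirtualBox --startvm <vm-name>" after the set commands.
    return subprocess.Popen(["VirtualBox", "--startvm", vm_name], env=logging_env())
```

Copying the current environment rather than starting from an empty one keeps `PATH` and the other variables VirtualBox normally sees.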
comment:15 by , 13 years ago
Description: modified (diff)
follow-up: 18 comment:16 by , 13 years ago
Replying to dnahas:
I understand the ICMP reliability issue on Windows hosts, but why does the problem not exist with VirtualBox 3.1.6?
Please understand me correctly: I don't mean that this defect is caused by ICMP instability on Windows, only that ping is not a very reliable way to detect a network outage there.
You mentioned that 3.1.6 works fine for you. Does that mean that, e.g., 3.1.8 (or 3.2.12) did not work for you?
follow-up: 19 comment:17 by , 13 years ago
By the way, does anything change for you if you don't ping in parallel?
comment:18 by , 13 years ago
Replying to Hachiman:
You mentioned that 3.1.6 works fine for you. Does that mean that, e.g., 3.1.8 (or 3.2.12) did not work for you?
That is correct: version 3.1.6 works fine, and every newer version reproduces the same failure. I have tested 3.1.8, 3.2.12, 4.0.0, 4.0.4, 4.0.6, 4.0.8, 4.0.10, 4.1.0, and now 4.1.1.
follow-up: 20 comment:19 by , 13 years ago
Replying to Hachiman:
By the way, does anything change for you if you don't ping in parallel?
The guest is used for traffic management and monitoring. The ICMP traffic from the guest is part of the application and cannot be controlled.
follow-up: 21 comment:20 by , 13 years ago
Replying to dnahas:
Replying to Hachiman:
By the way, does anything change for you if you don't ping in parallel?
The guest is used for traffic management and monitoring. The ICMP traffic from the guest is part of the application and cannot be controlled.
Aha. Could you please try the build http://www.virtualbox.org/download/testcase/VirtualBox-2011-08-10-16-55-56-win-rel-4.1.1-r73438-MultiArch.exe ? I've investigated the issue and found a bug in the VirtualBox code that processes ICMP packets, which led to an outage locally.
comment:21 by , 13 years ago
Replying to Hachiman:
Aha. Could you please try the build http://www.virtualbox.org/download/testcase/VirtualBox-2011-08-10-16-55-56-win-rel-4.1.1-r73438-MultiArch.exe ? I've investigated the issue and found a bug in the VirtualBox code that processes ICMP packets, which led to an outage locally.
Installed 4.1.1 r73438, problem still occurs. Email sent with nictrace pcap from guest.
follow-up: 23 comment:22 by , 13 years ago
Thanks for the dump; I'll send you bits with logging enabled.
follow-up: 24 comment:23 by , 13 years ago
Replying to Hachiman:
Thanks for the dump; I'll send you bits with logging enabled.
I've uploaded a new VBoxDD.dll DSO with logging enabled. Could you please repeat the steps you followed before?
comment:24 by , 13 years ago
Replying to Hachiman:
I've uploaded a new VBoxDD.dll DSO with logging enabled. Could you please repeat the steps you followed before?
Emailed the requested pcaps and logs.
follow-up: 26 comment:25 by , 13 years ago
Hi,
Just to let you know, I am experiencing problems almost identical to those reported by dnahas.
My configuration is also similar:
- Host OS: Windows Vista
- VBox version: I was using 4.0.12 and just upgraded to 4.1 today
- Guest OS: Linux 2.6.18 (CentOS 5.6)
Network setup is almost identical:
- Interface 1 is NAT
- Interface 2 is host-only network
- Default route is via Interface 1
- Both interfaces using the Intel PRO/1000 MT driver
Let me know if I can provide any info to help troubleshoot it.
comment:26 by , 13 years ago
Replying to eharmic:
Hi,
Just to let you know, I am experiencing problems almost identical to those reported by dnahas.
Could you please try the build http://www.virtualbox.org/download/testcase/VirtualBox-2011-08-10-16-55-56-win-rel-4.1.1-r73438-MultiArch.exe ?
comment:27 by , 13 years ago
Hi,
I installed the 4.1.1-r73438 build. The good news is that so far the NAT adaptor has not failed (although I have not used it long enough to be certain yet - for me the appearance of the fault is random and sometimes the machine has to be up for a number of hours before it happens).
The bad news is that this build seems to have problems with the host-only adaptor. After upgrading to the 4.1.1 build, initially it worked fine. Then after stopping the VM, exiting Virtualbox, and then later starting it up again, I get no traffic through at all on the host-only adaptor.
I can ping the local interface in the guest machine (192.168.56.101) but NOT the host's interface (192.168.56.1).
In the host machine I can ping neither the guest nor the host address.
I have tried:
- ifdown / ifup in the guest. It gets an address OK but no traffic gets through.
- poweroff / poweron the guest
- Disable / Re-enable the interface in the host
- Revert to 4.0.12
Only the last step worked.
I suspect it has something to do with the Windows virtual device driver, because normally I can ping that even when no VM is running.
I guess this is probably a separate problem? But anyway it is blocking me from trying your 4.1.1 solution out properly.
Mike
follow-up: 29 comment:28 by , 13 years ago
Hi,
I was able to sort out the problem that I reported yesterday - it seems when I upgraded, the IP address of the host-only adaptor was changed from 192.168.56.1 to another seemingly random address! After manually changing it back I could continue.
I used your 4.1.1-r73438 build today and the NAT fault recurred after the VM had been up for 3 hours 53 seconds.
I noticed a pattern. I left a script running that pinged an outside node every 60 secs and logged the result to a file. Under normal usage I don't use the NAT interface all that regularly. In this case, only the ping had been running for some hours. Then, when I went to make an outgoing sftp connection (i.e. a TCP connection), it suddenly stopped. I saw that pattern twice.
Regards Mike
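A monitoring script like the one described here could also probe with a TCP connect instead of ping, since ICMP through NAT on a Windows host is noted earlier in this ticket as unreliable for outage detection. This is a hedged sketch: the probe target, interval and function names are assumptions, and a TCP connect exercises a different NAT path than ICMP, so it complements the ping log rather than reproducing it.

```python
import socket
import time
from datetime import datetime
from typing import Callable, TextIO


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure, ...
        return False


def log_connectivity(probe: Callable[[], bool], out: TextIO,
                     interval: float = 60.0, iterations: int = 0) -> None:
    """Run `probe` every `interval` seconds, appending a timestamped UP/DOWN line.

    iterations=0 means run until interrupted.
    """
    n = 0
    while iterations == 0 or n < iterations:
        status = "UP" if probe() else "DOWN"
        out.write(f"{datetime.now().isoformat(timespec='seconds')} {status}\n")
        out.flush()  # keep the log current even if the VM is killed
        n += 1
        if iterations == 0 or n < iterations:
            time.sleep(interval)
```

For example, `log_connectivity(lambda: tcp_reachable("example.com", 80), open("nat-probe.log", "a"))` inside the guest (host name and file name hypothetical); the first DOWN line then timestamps the outage.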
follow-up: 31 comment:29 by , 13 years ago
Replying to eharmic:
Hi,
I was able to sort out the problem that I reported yesterday - it seems when I upgraded, the IP address of the host-only adaptor was changed from 192.168.56.1 to another seemingly random address! After manually changing it back I could continue.
I used your 4.1.1-r73438 build today and the NAT fault recurred after the VM had been up for 3 hours 53 seconds.
Does it depend on whether the host-only attachment is present, i.e. does it still happen if you have only the NAT adapter?
I noticed a pattern. I left a script running that pinged an outside node every 60 secs and logged the result to a file. Under normal usage I don't use the NAT interface all that regularly. In this case, only the ping had been running for some hours. Then, when I went to make an outgoing sftp connection (i.e. a TCP connection), it suddenly stopped. I saw that pattern twice.
Thanks for the description; I will try to reproduce it here.
Regards Mike
comment:31 by , 13 years ago
Hi,
Does it depend on whether the host-only attachment is present, e.g. if you have only 1 adapter?
I will try that out over the weekend and let you know.
Also FYI, I upgraded to 4.1.2 and still have the same problem.
Mike
comment:32 by , 13 years ago
Replying to dnahas:
??
Sorry for not informing you earlier: I have been able to reproduce the issue in my local environment with the VM you uploaded for me.
comment:33 by , 13 years ago
Additional testing with VirtualBox 3.1.6 r59338 does reproduce the issue.
comment:35 by , 13 years ago
Replying to dnahas:
Over a month and no updates?
Sorry, I've just returned from vacation; that is the reason for the lack of updates on this defect.
follow-up: 37 comment:36 by , 13 years ago
Could you please verify the fix with this build of VBoxDD.dll? Note that this DSO is built against VBox 4.1.4.
comment:37 by , 13 years ago
Replying to Hachiman:
Could you please verify the fix with this build of VBoxDD.dll? Note that this DSO is built against VBox 4.1.4.
Installed 4.1.4.
Downloaded the provided VBoxDD.dll_9371_r74291, renamed it to VBoxDD.dll and copied it to C:\Program Files\Oracle\VirtualBox\
Rebooted.
Ran the following commands:
set VBOX_LOG=drv_nat.e.f.l2
set VBOX_LOG_DEST=file=c:/nat.log
set VBOX_LOG_FLAGS="thread time"
VirtualBox --startvm <vm-name>
The problem is the c:\nat.log file does not get created when I launch the VM.
follow-up: 39 comment:38 by , 13 years ago
Sorry, that did not format nicely.
installed 4.1.4
downloaded the provided VBoxDD.dll_9371_r74291
renamed to VBoxDD.dll and copied to C:\Program Files\Oracle\VirtualBox\
Reboot
set VBOX_LOG=drv_nat.e.f.l2
set VBOX_LOG_DEST=file=c:/nat.log
set VBOX_LOG_FLAGS="thread time"
VirtualBox --startvm <vm-name>
The problem is the c:\nat.log file does not get created when I launch the VM.
follow-up: 40 comment:39 by , 13 years ago
Replying to dnahas:
Does it change anything for you? Here, Zeus TM 7.4 has been working without issues for 4 days on a Win XP host.
follow-up: 41 comment:40 by , 13 years ago
Replying to Hachiman:
Replying to dnahas:
Does it change anything for you? Here, Zeus TM 7.4 has been working without issues for 4 days on a Win XP host.
Yes, so far I am 12 hours into testing with version 4.1.4 and the NAT interface is functioning properly.
I did not see this addressed in the changelog. What is the fix?
comment:41 by , 13 years ago
Replying to dnahas:
Replying to Hachiman:
Replying to dnahas:
Does it change anything for you? Here, Zeus TM 7.4 has been working without issues for 4 days on a Win XP host.
Yes, so far I am 12 hours into testing with version 4.1.4 and the NAT interface is functioning properly.
I did not see this addressed in the changelog. What is the fix?
The fix is on trunk but not on the branch (that is why it isn't mentioned in the changelog); I built the 4.1.4 VBoxDD DSO with the changeset from trunk. NAT stores ICMP requests in a cache as mbufs. The ICMP API doesn't reliably deliver ICMP replies, so requests whose replies never arrive stay cached; this eventually exceeds the mbuf limit and takes down all networking. The fix is built on the assumption that we don't expect any responses to old cached packets, so we don't let the ICMP cache grow too much and free the old mbufs instead.
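The eviction policy described in this comment can be illustrated with a small model. This is a Python sketch, not the VirtualBox C code: pending requests are keyed by (id, seq), `bytes` stands in for an mbuf, and the cap `max_entries` is a hypothetical parameter whose real value is not given in this thread. The oldest request is dropped first (FIFO), matching the assumption that old requests will not be answered.

```python
from collections import OrderedDict
from collections.abc import Hashable
from typing import Optional


class BoundedIcmpCache:
    """Pending ICMP echo requests with a hard size cap (hypothetical model)."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be positive")
        self.max_entries = max_entries
        self._pending: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.evicted = 0  # requests dropped without ever seeing a reply

    def add_request(self, key: Hashable, packet: bytes) -> None:
        if key in self._pending:
            self._pending.move_to_end(key)  # a resend counts as the newest entry
        self._pending[key] = packet
        while len(self._pending) > self.max_entries:
            self._pending.popitem(last=False)  # free the oldest cached buffer
            self.evicted += 1

    def match_reply(self, key: Hashable) -> Optional[bytes]:
        """Release and return the cached request for a reply, or None if unknown/evicted."""
        return self._pending.pop(key, None)

    def __len__(self) -> int:
        return len(self._pending)
```

Without the cap, every request whose reply the ICMP API never delivers stays cached, so memory use grows with the number of lost replies; with it, the cache stops at `max_entries`, and a late reply to an evicted request is simply not matched.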
follow-up: 43 comment:42 by , 13 years ago
I've backported changes to 4.1 branch. Could you please verify 4.1 build?
follow-up: 44 comment:43 by , 13 years ago
Replying to Hachiman:
I've backported changes to 4.1 branch. Could you please verify 4.1 build?
Hi,
I installed it yesterday. So far no problems.
Regards Mike
comment:44 by , 13 years ago
Replying to eharmic:
Replying to Hachiman:
I've backported changes to 4.1 branch. Could you please verify 4.1 build?
Hi,
I installed it yesterday. So far no problems.
Regards Mike
Thanks for feedback.
comment:45 by , 13 years ago
Summary: NAT Interface fails → NAT Interface fails -> fixed in svn
VBox log