Opened 14 years ago
Last modified 10 years ago
#7459 closed defect
NAT networking drops out after a few days' uptime -> fixed in svn. — at Version 29
Reported by: | dimitris | Owned by: | |
---|---|---|---|
Component: | network/NAT | Version: | VirtualBox 4.0.2 |
Keywords: | NAT DHCP | Cc: | |
Guest type: | Linux | Host type: | Linux |
Description (last modified by )
Both host and guest are Ubuntu Lucid servers, guest additions are up-to-date, guest uses a virtio adapter with NAT networking.
2-3 days after starting the guest, it loses connectivity to the outside world. These are the last few entries in its log:
31:39:39.715 NAT: adding domain name lan to search list 31:39:39.715 NAT: DHCP offered IP address 10.0.2.15 42:40:55.746 NAT: adding domain name lan to search list 42:40:55.746 NAT: DHCP offered IP address 10.0.2.15 51:08:57.554 NAT: error occurred while sending ICMP error message
The guest is running Logitech's Squeezebox Server, a perl-based music server. Ports for that have been forwarded (see machine XML) and traffic flows normally before the machine drops out.
log and machine XML to be attached.
Change History (39)
by , 14 years ago
Attachment: | squeezemachine.xml added |
---|
by , 14 years ago
by , 14 years ago
Attachment: | nodhcp.png added |
---|
comment:1 by , 14 years ago
VBoxManage controlvm <machine> reset
doesn't bring DHCP back. I have to power the machine off then back on to get DHCP working.
by , 14 years ago
Attachment: | VBox.2.log added |
---|
log including the loss of connectivity and a guest reboot
comment:2 by , 14 years ago
Just happened again. Attached guest screenshots of its ifconfig and syslog after the guest lost connectivity but before I rebooted it (from within the guest itself through the terminal server).
After the guest is rebooted, the DHCP is still not functioning, same as shown in the previously attached screenshot (nodhcp.png).
Also attaching a log that covers the guest's reboot.
comment:4 by , 14 years ago
Switched to
00:11.0 Ethernet controller: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 02)
Let's see how it does over the next few days.
comment:5 by , 14 years ago
Same behavior with Intel virtual network adapter, see latest attachments:
51:09:51.358 NAT: error occurred while sending ICMP error message
comment:8 by , 13 years ago
Version: | VirtualBox 3.2.8 → VirtualBox 3.2.10 |
---|
comment:9 by , 13 years ago
DHCP client, SSH server, the perl-based Squeezebox Server and its associated MySql server:
Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 0 127.0.0.1:9092 0.0.0.0:* LISTEN 744/mysqld tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 565/perl tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN 551/mysqld tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 648/sshd tcp 0 0 0.0.0.0:3483 0.0.0.0:* LISTEN 565/perl tcp 0 0 0.0.0.0:9090 0.0.0.0:* LISTEN 565/perl udp 0 0 0.0.0.0:3483 0.0.0.0:* 565/perl udp 0 0 0.0.0.0:50609 0.0.0.0:* 565/perl udp 0 0 0.0.0.0:68 0.0.0.0:* 586/dhclient3
comment:10 by , 13 years ago
Here are the active connections. The one on local:3483 is the hardware player, and normally that's the only active TCP connection on this guest. Traffic is minimal when there's no music streaming (clock update on the player's idle screen is driven by the server). Streaming is a typical MP3 or Ogg stream, should be less than 200kbps.
Active Internet connections (w/o servers) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 10.0.2.15:22 10.0.2.2:47061 ESTABLISHED tcp 0 0 10.0.2.15:3483 10.0.2.2:4718 ESTABLISHED
comment:11 by , 13 years ago
Here I've installed the similar vm, let see if I'll able to reproduce the issue.
comment:12 by , 13 years ago
I have the same problem but on a OS running Windows 7 and the guest is XP.
Takes about 4 hours , log says the same in the end "NAT: error occurred while sending ICMP error message".
Have to shutdown the Guest , reset wont bring the network back.
comment:15 by , 13 years ago
Version: | VirtualBox 3.2.10 → VirtualBox 3.2.12 |
---|
follow-up: 17 comment:16 by , 13 years ago
Any luck reproducing this? If not I could probably FTP the machine (XML and disk image) if that would help.
comment:17 by , 13 years ago
Replying to dimitris:
Any luck reproducing this? If not I could probably FTP the machine (XML and disk image) if that would help.
Instructions are send via mail.
comment:18 by , 13 years ago
Almost on the third day I've got effect you're describing. I'll restart the test with more information collecting to get more information what, could lead to this effect. Also I'll try to minimize tests to provoke the effect in more short period.
comment:19 by , 13 years ago
Debug build, which I've lunched about week ago, has met other unrelated issue, causing me recompile/restart build.
comment:20 by , 13 years ago
yesterday debug build enters the state you'd described, finally. Looks like drops are caused with huge amount requests to UDP.IP.dst:63.17.46.179 and ARP.TIP:63.17.46.179 from NAT service side to the guest. in order to debug the issue I'm trying to create testcase which will provoke similar issue in smaller timeframe.
comment:21 by , 13 years ago
Version: | VirtualBox 3.2.12 → VirtualBox 4.0.2 |
---|
follow-up: 23 comment:22 by , 13 years ago
Also upgraded to 4.0.2 here, problem persists, with this additional info now on the logs:
... 09:03:50.329 NAT: DHCP offered IP address 10.0.2.15 19:34:59.297 NAT: DHCP offered IP address 10.0.2.15 30:17:58.870 NAT: DHCP offered IP address 10.0.2.15 41:06:21.336 NAT: DHCP offered IP address 10.0.2.15 51:06:45.355 NAT: error occurred while sending ICMP error message 51:06:45.416 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY 51:06:45.618 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY 51:06:46.028 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY 51:06:46.842 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY 51:06:48.475 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY 51:06:51.738 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY 51:06:58.266 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY 51:07:04.354 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY 51:07:11.323 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY 51:07:18.234 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY 51:07:19.235 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY 51:07:20.234 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY 51:07:37.438 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY 51:07:38.439 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY 51:07:39.440 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY 51:08:03.550 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY 51:08:04.355 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY 51:08:04.550 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY 51:08:05.550 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY 51:09:04.354 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY 51:10:04.354 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY 51:11:04.354 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY 51:12:04.355 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY 51:13:04.356 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY 51:14:04.356 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY 51:14:26.453 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY ...
Over here though I don't see any UDP traffic, albeit on a relatively short packet capture. What I do see is the TCP traffic between the Squeezebox Server and the hardware player, e.g.:
11:45:35.855991 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [P.], seq 167190:168480, ack 3173, win 5840, length 1290 11:45:35.857154 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [.], ack 168480, win 65535, length 0 11:45:36.854949 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [P.], seq 168480:169770, ack 3173, win 5840, length 1290 11:45:36.856447 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [.], ack 169770, win 65535, length 0 11:45:37.855543 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [P.], seq 169770:171060, ack 3173, win 5840, length 1290 11:45:37.856682 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [.], ack 171060, win 65535, length 0 11:45:38.001067 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [P.], seq 171060:171090, ack 3173, win 5840, length 30 11:45:38.002478 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [.], ack 171090, win 65535, length 0 11:45:38.004206 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [P.], seq 3173:3234, ack 171090, win 65535, length 61 11:45:38.004269 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [P.], seq 3234:3295, ack 171090, win 65535, length 61 11:45:38.006554 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [.], ack 3295, win 5840, length 0 11:45:38.854897 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [P.], seq 171090:172380, ack 3295, win 5840, length 1290 11:45:38.855901 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [.], ack 172380, win 65535, length 0 11:45:39.854914 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [P.], seq 172380:173670, ack 3295, win 5840, length 1290 11:45:39.855924 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [.], ack 173670, win 65535, length 0
The above is guest-side: The guest's eth0 is at 10.0.2.15, and the hardware player, on the other side of the VBox NAT, is mapped here as 10.0.2.2.
The traffic looks the same on the host side, just with different addresses. Which UDP port(s) do you see traffic on?
comment:23 by , 13 years ago
Replying to dimitris:
Also upgraded to 4.0.2 here, problem persists, with this additional info now on the logs:
The traffic looks the same on the host side, just with different addresses. Which UDP port(s) do you see traffic on?
it's a lot of udp broadcasts to udp/3483. I don't touch web interface a lot so i have very rare tcp entries in dump. the major part is UDP broadcasts.
follow-up: 25 comment:24 by , 13 years ago
OK, on a longer capture I see the UDP traffic too:
11:11:39.001496 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 44) 10.0.2.15.3483 > 255.255.255.255.3483: UDP, length 16
It's once a minute, and IIRC it's the server process advertising itself on the network so that players can discover it.
comment:25 by , 13 years ago
Replying to dimitris:
It's once a minute, and IIRC it's the server process advertising itself on the network so that players can discover it.
Here, I've made tool broadcasting udp, that cause similar effect. But I don't sure that it's exactly or single the problem. I've fixed the issue and now I'm running the build to verify.
comment:26 by , 13 years ago
Summary: | NAT networking drops out after a few days' uptime → NAT networking drops out after a few days' uptime -> fixed in svn. |
---|
follow-up: 29 comment:28 by , 12 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I've been seeing this issue on and off for a while now and it is still in the 4.1.14.
With the help of a UDP packet blaster and the PCNetIII adapter I was able to reproduce this problem every single time.
The steps:
- Create a VM with a Natted interface, use PCNetIII
- Start a UDP packet blast with size 1400, the packets must go from the VM and get routed/natted out to the physical NIC.
- After ~5 minutes, the PCNet adapter will bounce the link
- Shortly after that, the NAT interface seems to be trying to send a DHCP ACK back to the PCNetIII adapter
- Finally, the error "error occurred while sending ICMP error message" shows up and no packets can be passed after that. You wont even be able to ping internal ip of the NAT interface (10.0.2.2)
Notes:
- I also tried the same steps with the latest SVN and the problem is still there.
- The intel interfaces do not have the same issue because they do not bounce the link of the NIC card.
- See attached log
- The same bug will happen in other cases but this is the most reliable way so far.
Please let me know if you'd like to test a patch, I can build my own vbox and test it.
by , 12 years ago
Attachment: | ubuntu-2012-05-21-14-37-24.log added |
---|
vboxlog using vbox 4.1.14 on windows 7 32bit with ubuntu 32bit guest
comment:29 by , 12 years ago
Description: | modified (diff) |
---|
Replying to rbhkamal:
I've been seeing this issue on and off for a while now and it is still in the 4.1.14.
With the help of a UDP packet blaster and the PCNetIII adapter I was able to reproduce this problem every single time.
Could you please clarify what are you doing/or what are you meaning by "UDP packet blast"?
Please let me know if you'd like to test a patch, I can build my own vbox and test it.
Thanks, for cooperation, I'll check what disabled experimental code, could possibly solve your issue, but first I'd like to understand what issue have you met.
terminal server view after controlvm reset