VirtualBox

Opened 14 years ago

Last modified 10 years ago

#7459 closed defect

NAT networking drops out after a few days' uptime -> fixed in svn. — at Version 29

Reported by: dimitris Owned by:
Component: network/NAT Version: VirtualBox 4.0.2
Keywords: NAT DHCP Cc:
Guest type: Linux Host type: Linux

Description (last modified by vasily Levchenko)

Both host and guest are Ubuntu Lucid servers, guest additions are up-to-date, guest uses a virtio adapter with NAT networking.

2-3 days after starting the guest, it loses connectivity to the outside world. These are the last few entries in its log:

31:39:39.715 NAT: adding domain name lan to search list
31:39:39.715 NAT: DHCP offered IP address 10.0.2.15
42:40:55.746 NAT: adding domain name lan to search list
42:40:55.746 NAT: DHCP offered IP address 10.0.2.15
51:08:57.554 NAT: error occurred while sending ICMP error message

The guest is running Logitech's Squeezebox Server, a perl-based music server. Ports for that have been forwarded (see machine XML) and traffic flows normally before the machine drops out.

log and machine XML to be attached.

Change History (39)

by dimitris, 14 years ago

Attachment: squeezemachine.xml added

by dimitris, 14 years ago

Attachment: VBox.log added

by dimitris, 14 years ago

Attachment: nodhcp.png added

terminal server view after controlvm reset

comment:1 by dimitris, 14 years ago

VBoxManage controlvm <machine> reset doesn't bring DHCP back. I have to power the machine off then back on to get DHCP working.

by dimitris, 14 years ago

Attachment: ifconfig.png added

ifconfig after machine loses connectivity

by dimitris, 14 years ago

Attachment: syslog.png added

guest syslog after losing connectivity

by dimitris, 14 years ago

Attachment: VBox.2.log added

log including the loss of connectivity and a guest reboot

comment:2 by dimitris, 14 years ago

Just happened again. Attached guest screenshots of its ifconfig and syslog after the guest lost connectivity but before I rebooted it (from within the guest itself through the terminal server).

After the guest is rebooted, the DHCP is still not functioning, same as shown in the previously attached screenshot (nodhcp.png).

Also attaching a log that covers the guest's reboot.

comment:3 by vasily Levchenko, 14 years ago

Does it happen with other adapters or with virt-io only?

comment:4 by dimitris, 14 years ago

Switched to

00:11.0 Ethernet controller: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 02)

Let's see how it does over the next few days.

by dimitris, 14 years ago

Attachment: dhcp.png added

DHCP log messages using Intel virtual network adapter

by dimitris, 14 years ago

Attachment: ifconfig.2.png added

ifconfig using Intel virtual network adapter

by dimitris, 14 years ago

Attachment: VBox.3.log added

log using Intel virtual network adapter.

comment:5 by dimitris, 14 years ago

Same behavior with Intel virtual network adapter, see latest attachments:

51:09:51.358 NAT: error occurred while sending ICMP error message

comment:6 by dimitris, 13 years ago

No improvement or behavior change with 3.2.10.

comment:7 by vasily Levchenko, 13 years ago

What kind of network applications are working on guest?

comment:8 by vasily Levchenko, 13 years ago

Version: VirtualBox 3.2.8VirtualBox 3.2.10

comment:9 by dimitris, 13 years ago

DHCP client, SSH server, the perl-based Squeezebox Server and its associated MySql server:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:9092          0.0.0.0:*               LISTEN      744/mysqld      
tcp        0      0 0.0.0.0:9000            0.0.0.0:*               LISTEN      565/perl        
tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      551/mysqld      
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      648/sshd        
tcp        0      0 0.0.0.0:3483            0.0.0.0:*               LISTEN      565/perl        
tcp        0      0 0.0.0.0:9090            0.0.0.0:*               LISTEN      565/perl        
udp        0      0 0.0.0.0:3483            0.0.0.0:*                           565/perl        
udp        0      0 0.0.0.0:50609           0.0.0.0:*                           565/perl        
udp        0      0 0.0.0.0:68              0.0.0.0:*                           586/dhclient3   

comment:10 by dimitris, 13 years ago

Here are the active connections. The one on local:3483 is the hardware player, and normally that's the only active TCP connection on this guest. Traffic is minimal when there's no music streaming (clock update on the player's idle screen is driven by the server). Streaming is a typical MP3 or Ogg stream, should be less than 200kbps.

Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 10.0.2.15:22            10.0.2.2:47061          ESTABLISHED
tcp        0      0 10.0.2.15:3483          10.0.2.2:4718           ESTABLISHED

comment:11 by vasily Levchenko, 13 years ago

Here I've installed the similar vm, let see if I'll able to reproduce the issue.

comment:12 by Appiah, 13 years ago

I have the same problem but on a OS running Windows 7 and the guest is XP.

Takes about 4 hours , log says the same in the end "NAT: error occurred while sending ICMP error message".

Have to shutdown the Guest , reset wont bring the network back.

comment:13 by dimitris, 13 years ago

Still here with 3.2.12.

in reply to:  13 comment:14 by vasily Levchenko, 13 years ago

Replying to dimitris:

Still here with 3.2.12.

Thank you for update.

comment:15 by vasily Levchenko, 13 years ago

Version: VirtualBox 3.2.10VirtualBox 3.2.12

comment:16 by dimitris, 13 years ago

Any luck reproducing this? If not I could probably FTP the machine (XML and disk image) if that would help.

in reply to:  16 comment:17 by vasily Levchenko, 13 years ago

Replying to dimitris:

Any luck reproducing this? If not I could probably FTP the machine (XML and disk image) if that would help.

Instructions are send via mail.

comment:18 by vasily Levchenko, 13 years ago

Almost on the third day I've got effect you're describing. I'll restart the test with more information collecting to get more information what, could lead to this effect. Also I'll try to minimize tests to provoke the effect in more short period.

comment:19 by vasily Levchenko, 13 years ago

Debug build, which I've lunched about week ago, has met other unrelated issue, causing me recompile/restart build.

comment:20 by vasily Levchenko, 13 years ago

yesterday debug build enters the state you'd described, finally. Looks like drops are caused with huge amount requests to UDP.IP.dst:63.17.46.179 and ARP.TIP:63.17.46.179 from NAT service side to the guest. in order to debug the issue I'm trying to create testcase which will provoke similar issue in smaller timeframe.

comment:21 by vasily Levchenko, 13 years ago

Version: VirtualBox 3.2.12VirtualBox 4.0.2

comment:22 by dimitris, 13 years ago

Also upgraded to 4.0.2 here, problem persists, with this additional info now on the logs:

...
09:03:50.329 NAT: DHCP offered IP address 10.0.2.15
19:34:59.297 NAT: DHCP offered IP address 10.0.2.15
30:17:58.870 NAT: DHCP offered IP address 10.0.2.15
41:06:21.336 NAT: DHCP offered IP address 10.0.2.15
51:06:45.355 NAT: error occurred while sending ICMP error message
51:06:45.416 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY
51:06:45.618 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY
51:06:46.028 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY
51:06:46.842 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY
51:06:48.475 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY
51:06:51.738 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY
51:06:58.266 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY
51:07:04.354 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY
51:07:11.323 virtio-net: failed to allocate SG buffer: size=1344 rc=VERR_NO_MEMORY
51:07:18.234 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY
51:07:19.235 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY
51:07:20.234 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY
51:07:37.438 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY
51:07:38.439 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY
51:07:39.440 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY
51:08:03.550 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY
51:08:04.355 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY
51:08:04.550 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY
51:08:05.550 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY
51:09:04.354 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY
51:10:04.354 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY
51:11:04.354 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY
51:12:04.355 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY
51:13:04.356 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY
51:14:04.356 virtio-net: failed to allocate SG buffer: size=58 rc=VERR_NO_MEMORY
51:14:26.453 virtio-net: failed to allocate SG buffer: size=42 rc=VERR_NO_MEMORY
...

Over here though I don't see any UDP traffic, albeit on a relatively short packet capture. What I do see is the TCP traffic between the Squeezebox Server and the hardware player, e.g.:

11:45:35.855991 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [P.], seq 167190:168480, ack 3173, win 5840, length 1290
11:45:35.857154 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [.], ack 168480, win 65535, length 0
11:45:36.854949 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [P.], seq 168480:169770, ack 3173, win 5840, length 1290
11:45:36.856447 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [.], ack 169770, win 65535, length 0
11:45:37.855543 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [P.], seq 169770:171060, ack 3173, win 5840, length 1290
11:45:37.856682 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [.], ack 171060, win 65535, length 0
11:45:38.001067 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [P.], seq 171060:171090, ack 3173, win 5840, length 30
11:45:38.002478 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [.], ack 171090, win 65535, length 0
11:45:38.004206 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [P.], seq 3173:3234, ack 171090, win 65535, length 61
11:45:38.004269 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [P.], seq 3234:3295, ack 171090, win 65535, length 61
11:45:38.006554 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [.], ack 3295, win 5840, length 0
11:45:38.854897 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [P.], seq 171090:172380, ack 3295, win 5840, length 1290
11:45:38.855901 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [.], ack 172380, win 65535, length 0
11:45:39.854914 IP 10.0.2.15.3483 > 10.0.2.2.38399: Flags [P.], seq 172380:173670, ack 3295, win 5840, length 1290
11:45:39.855924 IP 10.0.2.2.38399 > 10.0.2.15.3483: Flags [.], ack 173670, win 65535, length 0

The above is guest-side: The guest's eth0 is at 10.0.2.15, and the hardware player, on the other side of the VBox NAT, is mapped here as 10.0.2.2.

The traffic looks the same on the host side, just with different addresses. Which UDP port(s) do you see traffic on?

in reply to:  22 comment:23 by vasily Levchenko, 13 years ago

Replying to dimitris:

Also upgraded to 4.0.2 here, problem persists, with this additional info now on the logs:

The traffic looks the same on the host side, just with different addresses. Which UDP port(s) do you see traffic on?

it's a lot of udp broadcasts to udp/3483. I don't touch web interface a lot so i have very rare tcp entries in dump. the major part is UDP broadcasts.

comment:24 by dimitris, 13 years ago

OK, on a longer capture I see the UDP traffic too:

11:11:39.001496 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 44)
    10.0.2.15.3483 > 255.255.255.255.3483: UDP, length 16

It's once a minute, and IIRC it's the server process advertising itself on the network so that players can discover it.

in reply to:  24 comment:25 by vasily Levchenko, 13 years ago

Replying to dimitris:

It's once a minute, and IIRC it's the server process advertising itself on the network so that players can discover it.

Here, I've made tool broadcasting udp, that cause similar effect. But I don't sure that it's exactly or single the problem. I've fixed the issue and now I'm running the build to verify.

comment:26 by vasily Levchenko, 13 years ago

Summary: NAT networking drops out after a few days' uptimeNAT networking drops out after a few days' uptime -> fixed in svn.

comment:27 by Frank Mehnert, 13 years ago

Resolution: fixed
Status: newclosed

Should be fixed in 4.0.4.

comment:28 by rbhkamal, 12 years ago

Resolution: fixed
Status: closedreopened

I've been seeing this issue on and off for a while now and it is still in the 4.1.14.
With the help of a UDP packet blaster and the PCNetIII adapter I was able to reproduce this problem every single time.

The steps:

  1. Create a VM with a Natted interface, use PCNetIII
  2. Start a UDP packet blast with size 1400, the packets must go from the VM and get routed/natted out to the physical NIC.
  3. After ~5 minutes, the PCNet adapter will bounce the link
  4. Shortly after that, the NAT interface seems to be trying to send a DHCP ACK back to the PCNetIII adapter
  5. Finally, the error "error occurred while sending ICMP error message" shows up and no packets can be passed after that. You wont even be able to ping internal ip of the NAT interface (10.0.2.2)


Notes:

  • I also tried the same steps with the latest SVN and the problem is still there.
  • The intel interfaces do not have the same issue because they do not bounce the link of the NIC card.
  • See attached log
  • The same bug will happen in other cases but this is the most reliable way so far.

Please let me know if you'd like to test a patch, I can build my own vbox and test it.

by rbhkamal, 12 years ago

vboxlog using vbox 4.1.14 on windows 7 32bit with ubuntu 32bit guest

in reply to:  28 comment:29 by vasily Levchenko, 12 years ago

Description: modified (diff)

Replying to rbhkamal:

I've been seeing this issue on and off for a while now and it is still in the 4.1.14.
With the help of a UDP packet blaster and the PCNetIII adapter I was able to reproduce this problem every single time.

Could you please clarify what are you doing/or what are you meaning by "UDP packet blast"?

Please let me know if you'd like to test a patch, I can build my own vbox and test it.

Thanks, for cooperation, I'll check what disabled experimental code, could possibly solve your issue, but first I'd like to understand what issue have you met.

Note: See TracTickets for help on using tickets.

© 2023 Oracle
ContactPrivacy policyTerms of Use