VirtualBox

Ticket #6499 (new defect)

Opened 4 years ago

Last modified 4 years ago

VBox causes packet processing delays on the host network interface when using bridged interface. OpenSolaris 2009.06

Reported by: therp
Owned by:
Priority: major
Component: network/hostif
Version: VirtualBox 3.1.6
Keywords:
Cc:
Guest type: other
Host type: Solaris

Description (last modified by frank)

The moment I turn on bridging for a guest installed on an OpenSolaris host, the host's physical interface that the guest is bridged to becomes sluggish to the outside world.

192.168.1.5 is my host, which I ping from another box on the same physical LAN (192.168.1.2). You can see clearly that the ping response time jumps when bridging is active (sometimes even to almost half a second).

PING 192.168.1.5 (192.168.1.5) 56(84) bytes of data.
64 bytes from 192.168.1.5: icmp_seq=1 ttl=255 time=0.122 ms
64 bytes from 192.168.1.5: icmp_seq=2 ttl=255 time=0.152 ms
64 bytes from 192.168.1.5: icmp_seq=3 ttl=255 time=0.141 ms
64 bytes from 192.168.1.5: icmp_seq=4 ttl=255 time=0.154 ms
64 bytes from 192.168.1.5: icmp_seq=5 ttl=255 time=0.168 ms
64 bytes from 192.168.1.5: icmp_seq=6 ttl=255 time=0.150 ms
64 bytes from 192.168.1.5: icmp_seq=7 ttl=255 time=0.165 ms
64 bytes from 192.168.1.5: icmp_seq=8 ttl=255 time=0.161 ms
64 bytes from 192.168.1.5: icmp_seq=9 ttl=255 time=0.191 ms
64 bytes from 192.168.1.5: icmp_seq=10 ttl=255 time=0.139 ms
64 bytes from 192.168.1.5: icmp_seq=11 ttl=255 time=0.147 ms
64 bytes from 192.168.1.5: icmp_seq=12 ttl=255 time=0.173 ms
64 bytes from 192.168.1.5: icmp_seq=13 ttl=255 time=0.156 ms
64 bytes from 192.168.1.5: icmp_seq=14 ttl=255 time=0.192 ms
64 bytes from 192.168.1.5: icmp_seq=15 ttl=255 time=0.137 ms
64 bytes from 192.168.1.5: icmp_seq=16 ttl=255 time=0.272 ms
64 bytes from 192.168.1.5: icmp_seq=17 ttl=255 time=0.176 ms
64 bytes from 192.168.1.5: icmp_seq=18 ttl=255 time=0.188 ms
64 bytes from 192.168.1.5: icmp_seq=19 ttl=255 time=0.147 ms
64 bytes from 192.168.1.5: icmp_seq=20 ttl=255 time=0.174 ms
64 bytes from 192.168.1.5: icmp_seq=21 ttl=255 time=0.160 ms
64 bytes from 192.168.1.5: icmp_seq=22 ttl=255 time=0.156 ms
VirtualBox switched to bridged interface
64 bytes from 192.168.1.5: icmp_seq=23 ttl=255 time=0.593 ms
64 bytes from 192.168.1.5: icmp_seq=24 ttl=255 time=0.210 ms
64 bytes from 192.168.1.5: icmp_seq=25 ttl=255 time=0.208 ms
64 bytes from 192.168.1.5: icmp_seq=26 ttl=255 time=0.430 ms
64 bytes from 192.168.1.5: icmp_seq=27 ttl=255 time=404 ms
64 bytes from 192.168.1.5: icmp_seq=28 ttl=255 time=413 ms
64 bytes from 192.168.1.5: icmp_seq=29 ttl=255 time=421 ms
64 bytes from 192.168.1.5: icmp_seq=30 ttl=255 time=249 ms
64 bytes from 192.168.1.5: icmp_seq=31 ttl=255 time=427 ms
64 bytes from 192.168.1.5: icmp_seq=32 ttl=255 time=0.465 ms
64 bytes from 192.168.1.5: icmp_seq=33 ttl=255 time=0.446 ms
64 bytes from 192.168.1.5: icmp_seq=34 ttl=255 time=0.354 ms
64 bytes from 192.168.1.5: icmp_seq=35 ttl=255 time=0.254 ms
64 bytes from 192.168.1.5: icmp_seq=36 ttl=255 time=0.248 ms
64 bytes from 192.168.1.5: icmp_seq=37 ttl=255 time=0.254 ms
64 bytes from 192.168.1.5: icmp_seq=38 ttl=255 time=0.625 ms
64 bytes from 192.168.1.5: icmp_seq=39 ttl=255 time=0.248 ms
64 bytes from 192.168.1.5: icmp_seq=40 ttl=255 time=0.249 ms
64 bytes from 192.168.1.5: icmp_seq=41 ttl=255 time=0.265 ms
64 bytes from 192.168.1.5: icmp_seq=42 ttl=255 time=0.227 ms
64 bytes from 192.168.1.5: icmp_seq=43 ttl=255 time=0.235 ms
64 bytes from 192.168.1.5: icmp_seq=44 ttl=255 time=0.180 ms
64 bytes from 192.168.1.5: icmp_seq=45 ttl=255 time=0.253 ms
64 bytes from 192.168.1.5: icmp_seq=46 ttl=255 time=0.256 ms
64 bytes from 192.168.1.5: icmp_seq=47 ttl=255 time=0.231 ms
64 bytes from 192.168.1.5: icmp_seq=48 ttl=255 time=0.600 ms
64 bytes from 192.168.1.5: icmp_seq=49 ttl=255 time=0.227 ms
64 bytes from 192.168.1.5: icmp_seq=50 ttl=255 time=0.195 ms
64 bytes from 192.168.1.5: icmp_seq=51 ttl=255 time=0.791 ms
64 bytes from 192.168.1.5: icmp_seq=52 ttl=255 time=624 ms
64 bytes from 192.168.1.5: icmp_seq=53 ttl=255 time=570 ms
64 bytes from 192.168.1.5: icmp_seq=54 ttl=255 time=0.524 ms
VirtualBox switched to Not Attached interface
64 bytes from 192.168.1.5: icmp_seq=55 ttl=255 time=0.146 ms
64 bytes from 192.168.1.5: icmp_seq=56 ttl=255 time=0.416 ms
64 bytes from 192.168.1.5: icmp_seq=57 ttl=255 time=0.262 ms
64 bytes from 192.168.1.5: icmp_seq=58 ttl=255 time=0.127 ms
64 bytes from 192.168.1.5: icmp_seq=59 ttl=255 time=0.156 ms
64 bytes from 192.168.1.5: icmp_seq=60 ttl=255 time=0.138 ms
64 bytes from 192.168.1.5: icmp_seq=61 ttl=255 time=0.140 ms
64 bytes from 192.168.1.5: icmp_seq=62 ttl=255 time=0.132 ms
64 bytes from 192.168.1.5: icmp_seq=63 ttl=255 time=0.132 ms
64 bytes from 192.168.1.5: icmp_seq=64 ttl=255 time=0.119 ms
64 bytes from 192.168.1.5: icmp_seq=65 ttl=255 time=0.149 ms
64 bytes from 192.168.1.5: icmp_seq=66 ttl=255 time=1.97 ms
64 bytes from 192.168.1.5: icmp_seq=67 ttl=255 time=0.148 ms
64 bytes from 192.168.1.5: icmp_seq=68 ttl=255 time=0.119 ms
64 bytes from 192.168.1.5: icmp_seq=69 ttl=255 time=0.135 ms
64 bytes from 192.168.1.5: icmp_seq=70 ttl=255 time=0.135 ms
64 bytes from 192.168.1.5: icmp_seq=71 ttl=255 time=0.173 ms
64 bytes from 192.168.1.5: icmp_seq=72 ttl=255 time=0.133 ms
64 bytes from 192.168.1.5: icmp_seq=73 ttl=255 time=0.186 ms
64 bytes from 192.168.1.5: icmp_seq=74 ttl=255 time=0.125 ms
64 bytes from 192.168.1.5: icmp_seq=75 ttl=255 time=0.108 ms
64 bytes from 192.168.1.5: icmp_seq=76 ttl=255 time=0.124 ms
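
The jump in the log above can be made explicit by summarizing round-trip times per phase. The following is a minimal Python sketch and not part of the original report: it treats any non-ping line as a phase marker, and the function name `summarize_phases`, the 1 ms spike threshold, and the shortened sample are assumptions chosen for illustration.

```python
import re
from statistics import median

# Matches the reply lines printed by ping, e.g. "icmp_seq=27 ttl=255 time=404 ms".
PING_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def summarize_phases(lines, spike_ms=1.0):
    """Group ping replies into phases separated by non-ping marker lines.

    Returns a list of (phase_name, reply_count, median_ms, max_ms, spike_count).
    spike_ms is an arbitrary threshold picked for this sketch.
    """
    phases = [("initial", [])]
    for line in lines:
        match = PING_RE.search(line)
        if match:
            phases[-1][1].append(float(match.group(2)))
        elif line.strip() and not line.startswith("PING"):
            # Any other non-empty line starts a new phase, e.g.
            # "VirtualBox switched to bridged interface".
            phases.append((line.strip(), []))
    summary = []
    for name, rtts in phases:
        if rtts:
            spikes = sum(1 for t in rtts if t >= spike_ms)
            summary.append((name, len(rtts), median(rtts), max(rtts), spikes))
    return summary


# Shortened excerpt of the log above, used only to exercise the function.
sample = [
    "PING 192.168.1.5 (192.168.1.5) 56(84) bytes of data.",
    "64 bytes from 192.168.1.5: icmp_seq=21 ttl=255 time=0.160 ms",
    "64 bytes from 192.168.1.5: icmp_seq=22 ttl=255 time=0.156 ms",
    "VirtualBox switched to bridged interface",
    "64 bytes from 192.168.1.5: icmp_seq=26 ttl=255 time=0.430 ms",
    "64 bytes from 192.168.1.5: icmp_seq=27 ttl=255 time=404 ms",
    "64 bytes from 192.168.1.5: icmp_seq=28 ttl=255 time=413 ms",
    "VirtualBox switched to Not Attached interface",
    "64 bytes from 192.168.1.5: icmp_seq=55 ttl=255 time=0.146 ms",
]

summary = summarize_phases(sample)
for name, count, med, worst, spikes in summary:
    print(f"{name}: {count} replies, median {med:.3f} ms, max {worst:.3f} ms, {spikes} spike(s)")
```

Feeding the full log into `summarize_phases` instead of the excerpt would give one row per phase, which makes it easier to compare the bridged phase against the two phases around it.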

I attached a libpcap dump from another ping session. I reinstalled, moving from 3.1.4 to 3.1.6, to see whether the problem persists (it does). So I'm confident that you should see this problem even with fresh installations.

One side note: my OpenSolaris host is serving a lot of data as an NFS server, so that might play into this.

OpenSolaris Version is 2009.06.

The guest doesn't play a role here. The problem becomes apparent right at the boot loader.

Attachments

pingvariation.libcap (37.8 KB) - added by therp 4 years ago.
libpcap trace from the box that is pinging

Change History

Changed 4 years ago by therp

libpcap trace from the box that is pinging

comment:1 Changed 4 years ago by therp

In the libpcap dump, frame pair 73 and 74 shows a half-second delay for the ping reply.
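
Delays like this one can be located in a capture by pairing each ICMP echo request with its reply. The following is a rough Python sketch under several assumptions that the ticket does not confirm: the file is a classic libpcap capture (not pcapng) with microsecond timestamps, Ethernet framing, and untagged IPv4. The function name `icmp_echo_delays` and the synthetic two-frame capture are hypothetical and exist only to exercise the code.

```python
import struct


def icmp_echo_delays(pcap_bytes):
    """Return {(ident, seq): seconds between echo request and echo reply}."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond libpcap file")
    linktype = struct.unpack_from(endian + "I", pcap_bytes, 20)[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures (linktype 1) are handled")

    offset = 24  # size of the libpcap global header
    requests, delays = {}, {}
    while offset + 16 <= len(pcap_bytes):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset)
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # 14-byte Ethernet header; EtherType 0x0800 means IPv4.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip_header_len = (frame[14] & 0x0F) * 4
        if frame[14 + 9] != 1:  # IP protocol number 1 is ICMP
            continue
        icmp = frame[14 + ip_header_len:]
        if len(icmp) < 8:
            continue
        icmp_type, _code, _checksum, ident, seq = struct.unpack_from("!BBHHH", icmp)
        timestamp = ts_sec + ts_usec / 1e6
        if icmp_type == 8:  # echo request
            requests[(ident, seq)] = timestamp
        elif icmp_type == 0 and (ident, seq) in requests:  # echo reply
            delays[(ident, seq)] = timestamp - requests.pop((ident, seq))
    return delays


def _demo_frame(icmp_type, ident, seq):
    """Build a minimal synthetic Ethernet/IPv4/ICMP frame (checksums left at zero)."""
    ethernet = b"\x00" * 12 + b"\x08\x00"
    ip = bytes([0x45, 0, 0, 28, 0, 0, 0, 0, 64, 1, 0, 0,
                192, 168, 1, 2, 192, 168, 1, 5])
    return ethernet + ip + struct.pack("!BBHHH", icmp_type, 0, 0, ident, seq)


def _demo_record(sec, usec, frame):
    """Wrap a frame in a little-endian libpcap record header."""
    return struct.pack("<IIII", sec, usec, len(frame), len(frame)) + frame


# Synthetic capture: one request, then its reply half a second later.
demo_capture = (
    struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    + _demo_record(100, 0, _demo_frame(8, 7, 1))
    + _demo_record(100, 500000, _demo_frame(0, 7, 1))
)
demo_delays = icmp_echo_delays(demo_capture)
print(demo_delays)
```

For a real capture, the bytes would come from reading the trace file in binary mode and passing them to `icmp_echo_delays`. Sorting the result by delay would then surface the slowest request and reply pairs.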

comment:2 Changed 4 years ago by ramshankar

I'm not able to replicate this problem on my setup here. Absolutely no noticeable difference in ping times.

By "VBox switched to bridged", do you mean starting a VM with bridged over the primary interface of the box, or switching from NAT to bridged on a running VM?

Also, could you show me the ifconfig -a output of your box? And are you running ipfilter?

comment:3 Changed 4 years ago by therp

By "VBox switched to bridged", I mean switching from "Not Attached" in the network settings to "bridged" in a running VM.

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
nfo0: flags=1100843<UP,BROADCAST,RUNNING,MULTICAST,ROUTER,IPv4> mtu 1500 index 2
        inet 192.168.1.5 netmask ffffff00 broadcast 192.168.1.255
vboxnet0: flags=1100843<UP,BROADCAST,RUNNING,MULTICAST,ROUTER,IPv4> mtu 1500 index 3
        inet 192.168.56.1 netmask ffffff00 broadcast 192.168.56.255
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128 

How is the bridge designed under Solaris? Is the user space process involved in filtering out packets? If so, what will happen if the user space process is blocked for some reason? Could that explain the delays?

comment:4 Changed 4 years ago by ramshankar

Bridged networking is implemented as a combined kernel driver. And no, a busy userland process would not have any impact on the performance, but if there is heavy network IO on the interface then obviously it might have an impact. Your packet trace contains only ICMP packets, so I cannot say what kind of network traffic there is or guess the network load the NIC is being put through.

comment:5 Changed 4 years ago by therp

The original trace contained around 360 thousand frames. I guess it wouldn't be helpful either.

I tried and failed to find simpler test cases to reproduce this effect. With VirtualBox in bridged mode (VB+B), the NFS server becomes unresponsive and the usability of my NFS client drops through the floor. With VirtualBox in NAT mode (VB+NAT) everything is fine.

Sometimes I see a high CPU load of nfsd on the OpenSolaris box, sometimes I don't. A network latency test against the server with ssh didn't show anything unusual: "while true; do ssh opensolarishost date; done".

I'm at the end of my wisdom here.

One final test I have not yet concluded is to measure the run-time of "tar jxvf"-ing the Linux kernel to the NFS share and try to find a difference between the VB+B and VB+NAT cases. Probably running bonnie++ on the NFS share twice would also yield some results.
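
Both the ssh loop and the tar comparison above come down to running a command repeatedly under each network mode and comparing wall-clock times. The following is a minimal Python sketch of such a harness, not something from the original report. The helper names `time_command` and `describe` are hypothetical, and the command actually executed here is a harmless placeholder so that the sketch runs anywhere.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed runs are not timed silently.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def describe(label, durations):
    """Format the median and worst-case duration for one test condition."""
    return "%s: median %.3f s, max %.3f s over %d runs" % (
        label, statistics.median(durations), max(durations), len(durations))


# Placeholder command; for the tests described above it would be replaced by
# something like ["ssh", "opensolarishost", "date"] or the tar extraction.
placeholder = [sys.executable, "-c", "pass"]
timings = time_command(placeholder, runs=3)
print(describe("placeholder", timings))
```

Running the same command once with the VM in bridged mode and once in NAT mode, and comparing the two `describe` lines, would show whether the maximum duration is what differs, which is what the ping spikes suggest.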

However, I'm awfully sorry, I have spent an hour on that problem with no results and regular work is piling up, so I have to stop for now. The only thing I can assure you of is that my web browser (Google Chrome), running from the NFS share backed by the OpenSolaris host, becomes unresponsive the moment I start VB+B. Also, any other NFS share access is almost impossible.

comment:6 Changed 4 years ago by rlinfati

Reproduced on host: Ubuntu 9.10, guest: Windows XP SP3, VirtualBox 3.1.6.

On VB 3.1.4 it happens with less frequency.

comment:7 Changed 4 years ago by frank

Which network card do you use for the guest, and which physical network card does your host have?

comment:8 Changed 4 years ago by frank

  • Component changed from network to network/hostif
  • Description modified

comment:9 Changed 4 years ago by rlinfati

0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)

comment:10 Changed 4 years ago by frank

therp?

comment:11 Changed 4 years ago by therp

Sorry for the long delay on my side.

I'm using the gigabit chip found on the NVIDIA MCP61 platform with the nfo-2.6.3 driver from: http://homepage2.nifty.com/mrym3/taiyodo/eng/

Apr 23 18:55:25 alia nfo: [ID 770952 kern.info] nfo0: nfo_mii_config: REALTEK phy (model:11 ver:2) found
Apr 23 18:55:27 alia nfo: [ID 455749 kern.info] nfo0: auto-negotiation done, advert:de1<ASM_DIR,PAUSE,100BASE_TX_FD,100BASE_TX,10BASE_T_FD,10BASE_T>, lpable:c5e1<PAUSE,100BASE_TX_FD,100BASE_TX,10BASE_T_FD,10BASE_T>, exp:f<LPCANNXTP,CANNXTPP,PAGERCVD,LPCANAN>
Apr 23 18:55:27 alia nfo: [ID 479110 kern.info]  MII_1000TC:200<FULL>, MII_1000TS:3800<CFG_LOCALRXOK>
Apr 23 18:55:27 alia nfo: [ID 103695 kern.info] nfo0: Link up: 1000 Mbps full duplex with symmetric flow control
Apr 23 18:55:27 alia mac: [ID 435574 kern.info] NOTICE: softmac1000 link up, 1000 Mbps, full duplex

comment:12 Changed 4 years ago by ramshankar

Oh, so one of the 3rd party NIC drivers. therp, I don't suppose you've tried this on some other driver? I've not been able to reproduce this behaviour with nge and rge drivers here.

comment:13 Changed 4 years ago by therp

Sorry again for the long delay.

Switching to the built-in rge driver didn't help. The moment I start in bridged mode, my workstation grinds to a halt, presumably because it is starved of NFS replies. Starting VBox in NAT mode works fine for me.

comment:14 Changed 4 years ago by Marko73

Also reproduced on Windows XP-SP3 (host), Ubuntu 8.04 (Guest). The physical NIC is a VIA 6102.
