VirtualBox

Ticket #2900 (closed defect: fixed)

Opened 5 years ago

Last modified 4 years ago

New host interface implementation fails to work with guest bridge setups (e.g. openvpn)

Reported by: decoder
Owned by:
Priority: critical
Component: network/hostif
Version: VirtualBox 2.1.0
Keywords:
Cc: decoder-vbox@…
Guest type: Linux
Host type: Linux

Description

Hello,

I am trying to run an OpenVPN server inside a vbox guest using a bridge setup in the guest (for openvpn itself). Using VirtualBox 2.0.6 and Host Interface, this works fine. In VirtualBox 2.1.0 this does not work in any way (neither by selecting a dedicated hw interface, nor by the old-style tap bridge setup that worked with 2.0.6). There also seems to be no way to disable the new way host interfaces work.

Change History

comment:1 Changed 5 years ago by decoder

Providing more information based on request from erstazi (IRC):

Host:

wjpvm ~ # uname -a
Linux wjpvm 2.6.23-hardened-r7 #8 SMP Mon Nov 24 16:36:59 CET 2008 x86_64 Dual-Core AMD Opteron(tm) Processor 2220 AuthenticAMD GNU/Linux

Guest:

wjpvpn ~ # uname -a
Linux wjpvpn 2.6.25-hardened-r11 #2 Fri Dec 26 19:11:40 CET 2008 i686 Dual-Core AMD Opteron(tm) Processor 2220 AuthenticAMD GNU/Linux

The guest runs the OpenVPN software with a bridge setup; that is, in the guest there is a bridge between a tap interface (used by openvpn) and the network interface.

Connection to the OpenVPN service (via UDP) works fine, and the client is assigned an IP as well. ICMP through the VPN works fine too. TCP does not: it seems that packets are either dropped or delayed randomly (with delays on the order of 30 seconds to 1 minute).

As said earlier, this setup works perfectly with VirtualBox 2.0.6 and a bridged host interface. With 2.1.0, it fails in any host interface setup I have tried.

I will try to set up a test host, as this main host is a production server and had to be downgraded to 2.0.6 because of this issue. If you need any further information, feel free to reply. My guess is that it has something to do with the new way VirtualBox obtains the network traffic from the interface, hence it would be very helpful if one could internally switch between the old and the new method for testing.

comment:2 Changed 5 years ago by aleksey

Could you paste the output of ifconfig from both host and guest? I am particularly interested in MTU sizes. Please also try 'ping -s 1472' from guest to the host (and to some remote host, preferably the other end of the VPN tunnel) to see if MTU-sized packets are transferred reliably. Note that ICMP should go directly, not through the tunnel.
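
The 1472 in the ping command above is presumably the largest ICMP payload that fits into a standard 1500-byte Ethernet MTU without fragmentation: 1500 minus a 20-byte IPv4 header (no options) minus an 8-byte ICMP header. A minimal sketch of that arithmetic, assuming IPv4; the helper name `icmp_payload_for_mtu` is made up for illustration:

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# Assumes a 20-byte IPv4 header (no options) and an 8-byte ICMP header.
icmp_payload_for_mtu() {
    mtu=$1
    echo $(( mtu - 20 - 8 ))
}

# For the standard Ethernet MTU this prints 1472, the size used in the ping test.
icmp_payload_for_mtu 1500
```

If ifconfig reports an MTU other than 1500, pass that value instead. On Linux, `ping -M do -s <size>` additionally forbids fragmentation, so an oversized packet fails visibly rather than being split.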

comment:3 Changed 5 years ago by frank

  • Component changed from network to network/hostif

comment:4 Changed 5 years ago by decoder

Unfortunately, I am not able to provide the necessary information at the moment, because the system is a production system (hence we cannot upgrade right now). I don't know when I will have time to set up a second system to duplicate the same setup. If anyone else wants to volunteer and needs more information about the setup, feel free to request it here.

comment:5 Changed 5 years ago by aleksey

If your host used offloading features (rx offloading in particular, check with ethtool -k <if_name>) you may wish to switch to 2.2.4.
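
To see at a glance which offload features are active, the `ethtool -k` output can be filtered for entries reported as "on". This is only a sketch: it assumes the usual `name: on|off` line format, and the function name `enabled_offloads` is hypothetical.

```shell
#!/bin/sh
# Read `ethtool -k <if_name>` output on stdin and print the features reported as "on".
# Assumes "name: on|off" lines; the header line carries no value and is skipped.
enabled_offloads() {
    awk -F': ' '$2 ~ /^on/ { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Usage on a real host, with <if_name> replaced by the interface the guest is attached to:
#   ethtool -k <if_name> | enabled_offloads
```

A feature can be switched off for a test with `ethtool -K <if_name> <feature> off`; whether that changes the behaviour reported in this ticket has not been established here.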

comment:6 Changed 5 years ago by decoder

I tried switching to version 2.2.4 and the problem is still unsolved. I did the ping tests you wanted and ICMP seems unaffected, I haven't seen any problems.

As for the ifconfig outputs, you can find them here for the guest and here for the host. The relevant interface on the host is eth1 at the moment, i.e. I attached the guest directly to eth1 (eth0 and the bridge there are not involved with the guest).

I also did two packet captures (not simultaneously, but for the same task; if you need them simultaneously, I can do that as well): one on the guest interface eth0 (i.e. the physical interface that links the guest to the outside world) and one on the host interface eth1 (i.e. the interface that the guest is attached to).

You can find the tcpdump captures here for the guest and here for the host. In both cases, I tried an SSH from the VPN client (IP 134.96.247.201) to a server within our local subnet (IP 134.96.247.42). The command lines used were:

Guest: tcpdump -p -s 0 -i eth0 tcp and src 134.96.247.201 or dst 134.96.247.201 -w eth0.dump
Host: tcpdump -p -s 0 -i eth1 tcp and src 134.96.247.201 or dst 134.96.247.201 -w eth1.dump
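
A note on the filter expressions above: in pcap filter syntax, `and` and `or` have equal precedence and associate left to right, so `tcp and src X or dst X` is read as `(tcp and src X) or dst X`, and the second half also matches non-TCP packets sent to X. A sketch of a tighter alternative using the `host` primitive, which matches the address in either direction; the helper name `client_tcp_filter` is hypothetical:

```shell
#!/bin/sh
# Build a pcap filter that matches only TCP packets to or from one address.
# `host` matches the address as either source or destination.
client_tcp_filter() {
    echo "tcp and host $1"
}

client_tcp_filter 134.96.247.201

# Usage, mirroring the guest command above (quotes keep the filter one argument):
#   tcpdump -p -s 0 -i eth0 -w eth0.dump "$(client_tcp_filter 134.96.247.201)"
```

The captures linked in this comment were taken with the original expressions, so they may contain a few extra non-TCP packets addressed to the client.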

In both cases I started SSH and aborted it after waiting around 10 seconds. As the dumps show, some of the responses from the SSH server are seen on the host but never arrive on the guest. As far as I could see, this affects only the larger packets.
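
One rough way to check the "larger packets" observation is to compare the largest segment length seen in each capture. The sketch below assumes tcpdump's default text output, where each TCP line carries a `length <bytes>` field; the function name `max_payload_len` is made up for illustration.

```shell
#!/bin/sh
# Read tcpdump text output on stdin and print the largest "length N" value seen.
# Prints 0 if no line carries a length field.
max_payload_len() {
    awk '{ for (i = 1; i < NF; i++)
               if ($i == "length" && $(i + 1) + 0 > max) max = $(i + 1) + 0 }
         END { print max + 0 }'
}

# Usage with the two capture files from this comment:
#   tcpdump -nn -r eth0.dump | max_payload_len   # guest side
#   tcpdump -nn -r eth1.dump | max_payload_len   # host side
```

A clearly larger maximum on the host side than on the guest side would be consistent with big segments being lost between the two, though it would not say where or why.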

I hope this is somehow helpful to track down the problem. I am also available on IRC.

Regards,

Chris

comment:7 Changed 5 years ago by decoder

After hours of debugging I haven't got much further. However, I tried removing tap0 from the guest bridge (which would only affect the VPN). After doing so, connections to the guest itself started hanging, showing the same symptoms I had previously seen only over the VPN tunnel. I suspect the reason is a problem with MAC addresses. To what extent should the vboxnetflt implementation be able to cope with multiple MAC addresses, devices and bridges in the guest anyway?

I'd be so happy if this problem could be solved :(

Thanks in advance,

Chris

comment:8 Changed 5 years ago by decoder

Narrowed down the problem: it happens only with the Intel virtual NICs; with the default AMD NIC, the problem is gone. So this might be a problem in the implementation of the Intel virtual network hardware, or for some other reason it does not affect the AMD card. When I first tested I got "entering disabled state" from the bridge with the AMD card, twice, but now it runs fine, and all connections are stable and without loss for the first time since vboxnetflt was introduced.
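
For anyone wanting to try the same workaround from the command line, the emulated adapter type can be set with `VBoxManage modifyvm`. The sketch below only prints the command. It assumes adapter 1, a hypothetical VM name `vpn-guest`, and the double-dash option spelling of later VirtualBox releases (older versions may spell the option differently).

```shell
#!/bin/sh
# Print (not run) the VBoxManage command that sets the emulated NIC type of adapter 1.
# Am79C973 is the PCnet-FAST III (AMD) model; 82540EM is the Intel PRO/1000 MT Desktop.
nictype_command() {
    vm=$1
    nictype=$2
    echo "VBoxManage modifyvm $vm --nictype1 $nictype"
}

# "vpn-guest" is a placeholder VM name, not one taken from this ticket.
nictype_command vpn-guest Am79C973
```

The VM has to be powered off when the printed command is actually run.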

comment:9 Changed 5 years ago by decoder

As discussed on IRC, here is a virtual machine to reproduce the problem:

  1. Import VM (I will provide the link on IRC)
  2. Start VM and access via vrdp/on screen
  3. Adjust networking in /etc/network/interfaces (IP is static), also rm /etc/udev/rules.d/70-persistent-net.rules because of new MAC
  4. Reboot
  5. SSH to VM (root password 123456)
  6. Create persistent TAP interface (openvpn --mktun --dev tap0), up and bridge (ifconfig tap0 up && brctl addif br0 tap0)
  7. Edit /etc/openvpn.conf and adjust bind IP address as well as address range. If you use a private subnet, comment out the route push.
  8. (Re)start openvpn.
  9. Use an openvpn client to access the VPN server from outside, using the provided client config/cert/key (download, adjust IP).
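
Step 6 above can be wrapped in a small script. This is a sketch built from the three commands given in that step; the `run` wrapper and the `DRY_RUN` switch are additions for illustration, so the commands can be printed and reviewed before being executed as root.

```shell
#!/bin/sh
# With DRY_RUN=1 the commands are only printed; otherwise they run and need root.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a persistent tap device, bring it up, and add it to the bridge.
setup_tap_bridge() {
    tap=$1
    bridge=$2
    run openvpn --mktun --dev "$tap" &&
    run ifconfig "$tap" up &&
    run brctl addif "$bridge" "$tap"
}

# Dry run: print the three commands for tap0 and br0 without executing anything.
DRY_RUN=1
setup_tap_bridge tap0 br0
```

Setting `DRY_RUN=0` inside the VM runs the commands for real; the chain stops at the first command that fails.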

Once you are connected, try ssh to some other server in the VPN subnet, test different types of packets, you should see the problem. To verify that the VPN is working, you can use ping, which works for me. If you directly get "No route to host" then something with the VPN might be wrong. For me it hangs quite a while before it does anything else.

If you run into any problems reproducing, let me know and I'll try to help :)

comment:10 Changed 5 years ago by aleksey

With the latest 3.0 release I tried the following setup:

Computer 1: The host -- the machine on which the VM with the VPN server runs.
Computer 2: The client -- runs the VPN client that connects to the VPN server on the host.
Computer 3: The helper -- resides on the same Ethernet segment as the host and initiates an ssh connection to the client over the VPN.

Both the client and the helper had single Ethernet adapters, while the host had two: eth0 connected to the same Ethernet segment as the helper, and eth1 connected directly to the client. Or more simply:

The helper <-- LAN 1 --> The host <-- LAN 2 --> The client

The host did not do any routing, so it was not possible to get from LAN 1 to LAN 2.

Next, I started a VM that had two adapters configured: eth2 bridged to the host's eth0 and eth3 bridged to the host's eth1. Again, no routing was configured in the VM. The VPN server listened on eth3 and created tap0, which was bridged to eth2.

When I brought up VPN client on the client, it connected to VPN server in VM creating the tunnel "client's tap0-->client's eth0-->host's eth1-->VM's eth3-->VPN server-->VM's tap0-->VM's eth2-->host's eth0", so every Ethernet packet going to client's tap0 gets to the Ethernet segment host's eth0 is connected to and vice versa.

Copying a 2.5 GB file over ssh connection from the client to the helper got me a decent rate of 4.5 MB/s (both LANs are 10 MB/s and the host's CPU is three years old, so I get 100% CPU load on the host during this transfer). Both adapter types in VM were 'Intel PRO/1000 MT Desktop'.

At least we know that openvpn server works in VM with e1000 under certain circumstances. I'll elaborate on your setup in a short while as well as try 2.2.4 in my setup.

comment:11 Changed 5 years ago by decoder

Thanks for the effort. I will try our setup with version 3.x as soon as we upgrade. That will, however, still take a while, I guess. Until then we run with the AMD NICs :)

comment:12 Changed 4 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Please reopen if still relevant with VBox 3.1.6.
