VirtualBox

Ticket #4343 (closed defect: fixed)

Opened 5 years ago

Last modified 4 years ago

VirtualBox 3.0 freezes under network load => Fixed in SVN

Reported by: rynofinn
Owned by:
Priority: major
Component: network/NAT
Version: VirtualBox 3.0.4
Keywords: network freeze
Cc:
Guest type: Linux
Host type: Linux

Description

I can reliably cause VirtualBox 3.0 to freeze with the following configuration:

  • Fedora 11 running within a Redhat EL5 host

Fedora runs fine until I try to download a package through yum, then it freezes

  • Ubuntu Intrepid running within a Vista host

Ubuntu runs fine until I try to download a package through apt-get, then it freezes

When the client freezes, it is consuming CPU but is unresponsive.

The attached log is for the Fedora/Redhat configuration.

Attachments

Fedora-2009-07-01-10-11-56.log (44.0 KB) - added by rynofinn 5 years ago.
Ubuntu-2009-07-01-13-36-20.log (52.7 KB) - added by jjamor 5 years ago.
jjamor's log
vboxtime.txt (19.7 KB) - added by jeffhoff 5 years ago.
RTC time issue
vboxreboot.2.txt (20.1 KB) - added by jeffhoff 5 years ago.
After Reboot
Clock.jpg (22.8 KB) - added by jeffhoff 5 years ago.
Screenshot showing differences in time between the OpenSolaris host and the XP guest
2009-07-25-13-22-40.022-VirtualBox-1682.log (226 bytes) - added by jeffhoff 5 years ago.
Lock up log #1
VBox.log (120.7 KB) - added by jeffhoff 5 years ago.
XP lock up #1
VBox.2.log (64.9 KB) - added by jeffhoff 5 years ago.
Fedora lock up #1
VBox.3.log (57.2 KB) - added by jeffhoff 5 years ago.
Fedora crashed immediately after Vbox additions for 3.0.3
Screenshot-1.png (181.3 KB) - added by jeffhoff 5 years ago.
Screen shot of initial Fedora guest screen
Screenshot-1.2.png (181.3 KB) - added by jeffhoff 5 years ago.
Screen Shot of Fedora Guest
Screenshot-4.png (264.8 KB) - added by jeffhoff 5 years ago.
Screen shot of the Speed test from the Solaris Host
Screenshot-5.jpg (299.2 KB) - added by jeffhoff 5 years ago.
Screen shot of the Speed test from the XP Guest

Change History

Changed 5 years ago by rynofinn

comment:1 Changed 5 years ago by jjamor

I can confirm this bug. Host: Vista 64 bit. Client: Ubuntu 9.04 i386 Desktop.

Problem: when I put the guest under network load (for example, upgrading the distro over the Internet), it starts OK but hangs after a couple of minutes.

The machine works OK with VirtualBox 2.2, so I'm downgrading to it and waiting on this ticket.

I'm attaching my log also.

Changed 5 years ago by jjamor

jjamor's log

comment:2 Changed 5 years ago by jjamor

I think #4343 is a duplicate of #4334.

I'm also testing with the Intel Pro driver, and the VM does not hang now. I think this is a temporary workaround.

comment:3 Changed 5 years ago by bob23450

Same problem on a 32-bit host (Windows XP Pro SP3) with a 32-bit guest (Ubuntu 9.04), NAT networking, not using SMP. I haven't tried other networking modes. The hung VM can be resumed by closing the machine with "save state" and restarting it. VB 2.2.4 is not affected by this problem.

comment:4 Changed 5 years ago by frank

We believe we found the problem. The fix will be available with the next VBox release. For users compiling their own OSE binaries: Use r21153 or later.

comment:5 Changed 5 years ago by frank

  • Summary changed from VirtualBox 3.0 freezes under network load to VirtualBox 3.0 freezes under network load => Fixed in SVN

comment:6 Changed 5 years ago by seiryu

Would this issue also cause slow network performance within the VM? I'm running OSX Host with OpenSolaris Guest seeing very poor network performance. A simple web-page load takes minutes within the VM while I can surf without problem from the host at the same time.

comment:7 Changed 5 years ago by frank

Yes, definitely.

comment:8 Changed 5 years ago by bliss

This is also happening to me... Do you know when the first VBox 3 patch (3.0.1?) will be released? An approximate date? Just to know whether to go back to VBox 2.x or wait a few days...

Thanks

comment:9 Changed 5 years ago by frank

You can look at the release dates of our previous versions. It will definitely take more than a few days, so you probably want to go back to 2.2.4 for the time being. Sorry, we can't give an exact ETA.

comment:10 follow-up: ↓ 107 Changed 5 years ago by kilpatds

Not quite sure if it's the same bug, but I've seen my Windows XP VM (running under Fedora 11) freeze during network traffic. Switching off all the "use fancy CPU feature" flags seems to avoid the problem.

comment:11 Changed 5 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

comment:12 Changed 5 years ago by rasta

  • Status changed from closed to reopened
  • Resolution fixed deleted

I am not sure that it is exactly the same problem, but NAT networking is still slow and intermittent in 3.0.2 for a Solaris 10 u7 32-bit guest (with VT-x) running on a 32-bit WinXP host.

I had entered another bug for this, but it was removed with a link to here.

comment:13 Changed 5 years ago by bqbauer

NAT is still almost unusable for OpenSolaris 2009.06 guests. 64-bit guest of OpenSolaris 64-bit. Never a problem with 2.2.4, just the 3.0.x products up to this point (3.0 beta through 3.0.2).

comment:14 Changed 5 years ago by StevenWang

I'm sure it's still not fixed in 3.0.2. I found a way to unfreeze the guest: press Host+Del, the task manager appears, and you get control back.

Host: Vista64 and Vista32 (I want to confirm whether it's a 32- or 64-bit problem), guest: Windows XP

comment:15 Changed 5 years ago by Sascha-KE

Have the same issues on Vista32 host with XP guest. Grtz

comment:16 Changed 5 years ago by bliss

My guest is ubuntu... any alternative to Host+Del?

comment:17 Changed 5 years ago by rasta

For the record, with a Solaris 10 u7 32-bit guest (with VT-x) running on a 32-bit WinXP host, I do not experience real hanging related to the NAT networking problem. I only experience very slow and intermittent network connectivity, which is very annoying.

comment:18 Changed 5 years ago by bqbauer

3.0.3 build r50119 also unusable with NAT for both 32-bit XP and 64-bit OpenSolaris guests. Networking to the host works great, but connectivity beyond the host is useless.

comment:19 Changed 5 years ago by bqbauer

Further clarification with 3.0.2 and development build 3.0.3: After a few connections to external hosts, they begin to work progressively more slowly, then everything either stops working or times out. This is with both 32-bit XP and 64-bit OpenSolaris guests that I have tested. Connectivity to the host appears to be slightly more reliable during the problem periods, but can still timeout. Periodically, all connectivity comes back to life, but soon fails again.

I don't know if it's the number of connections, or simply the amount of time it takes to make a few random connections via http, FTP, or SSH. I also have not let a guest run for hours to see if it eventually loses complete connectivity, because the apparent randomness of the problem would make it very difficult to confirm how much networking was still functioning.

comment:20 follow-up: ↓ 64 Changed 5 years ago by cwixon

I can confirm the same thing rasta and bqbauer are seeing (probably others too, but they didn't provide details), on a 32-bit XP Pro host, with 32-bit Ubuntu 9.04 desktop guest.

Virtualbox 3.0.2 build 49928, software virtualization (NO VT-x/AMD-V). Guest additions version 1.4, guest kernel 2.6.28-13-generic.

I'm seeing slow, intermittent NAT connections in the guest. Lots of timeouts. I DO NOT need to kick the VM to restart connections, so it's not a total networking failure. Simply re-initiating a connection (e.g., hitting "reload" in Firefox) will always get it going again.

comment:21 Changed 5 years ago by rasta

Yes, I am getting the same behavior as cwixon using Vbox 3+. I should add that without VT-x enabled for the guest, my 32-bit Solaris guests are COMPLETELY unusable on a WinXP 32-bit host (see bug #4411).

comment:22 Changed 5 years ago by cwixon

I will also add that Vbox 2.2.x (whatever it was I upgraded from) was fine. This issue is new to 3.0.

comment:23 Changed 5 years ago by ruthrsc

I've got a similar issue on a CentOS 5.3 host (VT-x enabled, e1000 NIC) + VirtualBox 3.0.2 + FreeBSD guests (versions 7.0, 7.1, 7.2). When rsyncing a big amount of data (~100GB) over the NAT interface, the transfer times out in the middle (repeatable). I've tried disabling network offloading in the guest, assigning different network interfaces (Am79C970A, 82540EM), and increasing the NAT buffers according to the manual section "Tuning TCP/IP buffers for NAT"; the problem persists every time. The network doesn't die completely, so after rerunning rsync it works for another couple of minutes and then dies again. Switching to bridged networking solves the problem.

comment:24 Changed 5 years ago by jeffhoff

I am running opensolaris 5.11 snv_118 host 64 bit AMD quad. I see the same problem when running a Fedora 11 64 bit guest and an 32 bit XP guest. Networking is bridged. I see pretty constant Virtual Disk access and slow network speeds. Eventually the network gets so bad I can not telnet to the Open Solaris host and must power off the machine to recover.

This has happened a few times. Something has changed since 2.2.4 as that was pretty darn good.

In bridged mode running a speed test from a guest results in 1/10 the speed of the host system over my cable modem.

Host gets 15 megs down 6 megs up, guests get 1.5 megs down and .6megs up.... on a cable connection. Used to be pretty much the same.

comment:25 Changed 5 years ago by Hachiman

sent test build (r50282) for verification.

comment:26 follow-up: ↓ 27 Changed 5 years ago by jeffhoff

Question, for Hachiman do you need me to test this? Happy to help just need a bit of guidance in order to do the testing.

Also, forgot to mention both the keyboard and mouse responses are slow with this version. If that helps.

comment:27 in reply to: ↑ 26 Changed 5 years ago by Hachiman

Replying to jeffhoff:

Question, for Hachiman do you need me to test this? Happy to help just need a bit of guidance in order to do the testing.

Also, forgot to mention both the keyboard and mouse responses are slow with this version. If that helps.

Good, thanks. I'll send you a test build (the OpenSolaris one). Please check the mouse/keyboard speed and anomalies. It probably sounds strange, but socket->guest communication is done in the same way as for mouse/keyboard.

comment:28 Changed 5 years ago by Hachiman

Solaris build (r50282) has been sent.

comment:29 Changed 5 years ago by Hachiman

  • Summary changed from VirtualBox 3.0 freezes under network load => Fixed in SVN to VirtualBox 3.0 freezes under network load

comment:30 follow-ups: ↓ 31 ↓ 32 Changed 5 years ago by jeffhoff

I have installed 3.0.3-r50282 and it is running. I may need a day or two to see if it locks up. Booting and initializing the guests still seems a bit slow. Login is slow on both guests (XP: 2 minutes, Fedora: 11 minutes), and keyboard and mouse context switching is slow and inconsistent. I still notice constant virtual disk access.

Solaris Host comcastspeed test at Chicago is: Down 15.8 megs up 4.7

XP guest speeds: Down 2.1 up 1.7

Fedora guest Still trying to get logged in.. Hmmmm.. Down 2.5 up 1.6

comment:31 in reply to: ↑ 30 Changed 5 years ago by Hachiman

Replying to jeffhoff:

I have installed 3.0.3-r50282 and it is running. I may need a day or two to see if it locks up. Booting and initializing the guests still seems a bit slow. Login is slow on both guests (XP: 2 minutes, Fedora: 11 minutes), and keyboard and mouse context switching is slow and inconsistent. I still notice constant virtual disk access.

Solaris Host comcastspeed test at Chicago is: Down 15.8 megs up 4.7

XP guest speeds: Down 2.1 up 1.7

Fedora guest Still trying to get logged in.. Hmmmm.. Down 2.5 up 1.6

Thank you for the feedback; I will run the test on several versions.

comment:32 in reply to: ↑ 30 Changed 5 years ago by Hachiman

Replying to jeffhoff:

I have installed 3.0.3-r50282 and it is running. I may need a day or two to see if it locks up. Booting and initializing the guests still seems a bit slow. Login is slow on both guests (XP: 2 minutes, Fedora: 11 minutes), and keyboard and mouse context switching is slow and inconsistent. I still notice constant virtual disk access.

Solaris Host comcastspeed test at Chicago is: Down 15.8 megs up 4.7

XP guest speeds: Down 2.1 up 1.7

Fedora guest Still trying to get logged in.. Hmmmm.. Down 2.5 up 1.6

It would probably be better to show the ratio between the host's comcastspeed result and the guests' results.
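For reference, the ratio asked for here can be worked out from the figures reported in comment 30 (host 15.8 down / 4.7 up, XP guest 2.1 / 1.7, Fedora guest 2.5 / 1.6). The following is only a minimal sketch: the units are assumed to be Mbit/s, and the dictionary layout and the function name are illustrative, not part of any VirtualBox tooling.

```python
# Speed-test figures as reported in comment 30 (units assumed to be Mbit/s).
host = {"down": 15.8, "up": 4.7}
guests = {
    "XP": {"down": 2.1, "up": 1.7},
    "Fedora": {"down": 2.5, "up": 1.6},
}


def guest_to_host_ratio(guest, host):
    """Return guest throughput as a fraction of host throughput, per direction."""
    return {direction: guest[direction] / host[direction] for direction in host}


ratios = {name: guest_to_host_ratio(g, host) for name, g in guests.items()}
for name, r in ratios.items():
    print(f"{name}: down {r['down']:.0%} of host, up {r['up']:.0%} of host")
```

With these numbers the guests reach roughly 13 to 16% of the host's download rate and about a third of its upload rate.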

comment:33 follow-up: ↓ 35 Changed 5 years ago by jeffhoff

Also, I noticed later in the day that there was a 30-minute difference between the guest time and the host time: the host was at 5:00 and the XP guest was at 4:30.

Question: was there a new VBoxAdditions in that update?

Changed 5 years ago by jeffhoff

RTC time issue

Changed 5 years ago by jeffhoff

After Reboot

comment:34 Changed 5 years ago by jeffhoff

Added two dmesg logs. The first seems to explain the time loss issue. The second is after a reboot this morning.

comment:35 in reply to: ↑ 33 Changed 5 years ago by Hachiman

Replying to jeffhoff:

Also, I noticed later in the day that there was a 30-minute difference between the guest time and the host time: the host was at 5:00 and the XP guest was at 4:30.

interesting ...

Question: was there a new VBoxAdditions in that update?

No, Additions updates are installed separately by the user :)

comment:36 follow-up: ↓ 38 Changed 5 years ago by jeffhoff

I installed the VBox guest additions for both the XP and Fedora guests. I restarted the guests and both locked up with the disk at 100%. Then the Solaris host locked up.

I restarted the machine and started the XP guest; it took 20 minutes to get logged in. I restarted Fedora and never got to a login; after 8 minutes the system locked up again.

Rolling back to 3.0.2

comment:37 Changed 5 years ago by jeffhoff

3.0.2 more stable than 3.0.3..

Let me know if you would like me to test another cut.

comment:38 in reply to: ↑ 36 Changed 5 years ago by Hachiman

Replying to jeffhoff:

I installed the VBox guest additions for both the XP and Fedora guests. I restarted the guests and both locked up with the disk at 100%. Then the Solaris host locked up.

I restarted the machine and started the XP guest; it took 20 minutes to get logged in. I restarted Fedora and never got to a login; after 8 minutes the system locked up again.

Not sure it relates to NAT/networking, but thanks anyway for the feedback.

Rolling back to 3.0.2

Changed 5 years ago by jeffhoff

Screenshot showing differences in time between the OpenSolaris host and the XP guest

comment:39 Changed 5 years ago by jeffhoff

Attached a screen shot showing the clock drift in an XP guest. Two times are displayed: the one in the upper right is correct and belongs to the OpenSolaris host; the other is the Windows XP guest.

comment:40 Changed 5 years ago by jeffhoff

Some more information: the system locked up today with the XP network icon totally busy.

Fedora updated itself to 2.6.29.6-213.fc11.x86_64. When booting, it gets stuck on "Starting VirtualBox Additions". This also happens on the prior kernel 2.6.29.5.191.fc11.x86_64.

Left long enough it locks up the Solaris Host and requires a power off/on to restart.

comment:41 Changed 5 years ago by jeffhoff

Rolled back to VB 3.0.0 to get the current Fedora Kernel to boot 2.6.29.6-213.fc11.x86_64.

Something is out of alignment somewhere.

comment:42 Changed 5 years ago by jeffhoff

OK, it looks like the shell script /etc/init.d/vboxadd is stalling the startup.

I booted to single-user mode, chmod'ed the files vboxadd and vboxadd-service to 644, ran init 5, and Fedora boots the latest kernel, 2.6.29.6-213.fc11.x86_64. Of course there is no graphical login, but I have VNC.

Then, as superuser, I went back and chmod'ed the files back to 755.

I tried to start vboxadd and it stalled again. From another telnet session, vboxadd status shows it running.

Then I started vboxadd-service without issue.

Changed 5 years ago by jeffhoff

Lock up log #1

Changed 5 years ago by jeffhoff

XP lock up #1

Changed 5 years ago by jeffhoff

Fedora lock up #1

comment:43 Changed 5 years ago by jeffhoff

Attached some logs from when it locked up: Fedora was running and I then started XP.

Changed 5 years ago by jeffhoff

Fedora crashed immediately after Vbox additions for 3.0.3

comment:44 Changed 5 years ago by jeffhoff

Ok went back to 3.0.3 and when I was installing VBadditions on Fedora it aborted. I attached the log.

It also reset the desktop for the OpenSolaris host, creating a 357MB core file. I tried to attach it but it takes forever. Zipped, it comes down to 93 MB.

Happy to send it to someone or place on their ftp server.

comment:45 Changed 5 years ago by jeffhoff

Ok back to 3.0.3.

Only weirdness is the initial screen for the fedora login is jazzed up. Have to login blind as it only shows half a screen. The auto resize then does not function once logged in. tried it in both 3d and non 3d settings.

XP seems ok so far. Startup, Disk access and network access still slooooow for both guests.

Changed 5 years ago by jeffhoff

Screen shot of initial Fedora guest screen

Changed 5 years ago by jeffhoff

Screen Shot of Fedora Guest

comment:46 follow-up: ↓ 56 Changed 5 years ago by jeffhoff

More information..

Well, things seemed to be going OK when the system locked up again. On reboot I had to fsck Fedora. Then I restarted it, only to have it lock up everything with SATA errors.

comment:47 Changed 5 years ago by r_mano

Hmm. I think this bug report is quite mangled. I can confirm that with a Linux host (Ubuntu 8.10) and a Win XP guest, NAT networking performance dropped so much (especially with a WiFi connection on the host) that I could not use Outlook at all in the guest. I had to revert to 2.2.4.

comment:48 Changed 5 years ago by cwixon

I agree with r_mano that we have drifted a bit off-course here -- I wonder if there's something else new in svn that might account for jeffhoff's troubles.

In any case, I still have the NAT networking issues (per my earlier comment) but I haven't tried anything newer than the 3.0.2 release. If there's a 32-bit Windows build that incorporates Hachiman's proposed fix, I will test it. I don't have a Windows build environment to roll my own from svn.

comment:49 follow-ups: ↓ 50 ↓ 58 Changed 5 years ago by willyo

I just installed 3.0.2 on Windows Vista. After firing up the Fedora 10 guest I then installed the latest guest additions.

FWIW: After noticing bad network performance on the guest, I ran Wireshark simultaneously on both the host and the client and noted the following:

  1. Ping requests from the guest go out fine from the host and always get replies which are received by the host. However many/most of the replies don't show up in the guest.
  2. When a ping reply *is* received by the guest, about 1 out of 10 times the From IP address is the 10.0.2.... address of the (NAT router ? what's the correct term ?) rather than the IP address of the ping reply source.

I agree that it sounds like NAT is seriously messed up .....

comment:50 in reply to: ↑ 49 Changed 5 years ago by Hachiman

Replying to willyo: Your comment is more related to #4540.

comment:51 Changed 5 years ago by Hachiman

  • Component changed from network to network/NAT

comment:52 Changed 5 years ago by Hachiman

please check with 3.0.4

comment:53 in reply to: ↑ description ; follow-up: ↓ 55 Changed 5 years ago by rubentrf

Checked with 3.0.4 and I confirm NAT network is seriously SLOW, totally unusable.

Host: Mac OS Leopard Guest: Windows 7

comment:54 Changed 5 years ago by StevenWang

Host: Vista Home Premium Guest: Windows XP SP2 NAT network speed: OK (seems better than V2.2.4)

comment:55 in reply to: ↑ 53 Changed 5 years ago by Hachiman

Replying to rubentrf:

Checked with 3.0.4 and I confirm NAT network is seriously SLOW, totally unusable.

Host: Mac OS Leopard Guest: Windows 7

Could you please clarify? "Network is seriously SLOW, totally unusable" doesn't give me any chance to improve the situation for you. What would be helpful is a comparison of network speed with e.g. 2.2.4, with concrete numbers.

comment:56 in reply to: ↑ 46 Changed 5 years ago by Hachiman

Replying to jeffhoff:

More information..

Well, things seemed to be going OK when the system locked up again. On reboot I had to fsck Fedora. Then I restarted it, only to have it lock up everything with SATA errors.

jeffhoff, could you please re-check the network with 3.0.4? The problems related to the additions and the other odd effects you met in the development build should be gone, so it's a good time to compare network usability.

comment:57 Changed 5 years ago by Hachiman

  • Version changed from VirtualBox 3.0.0 to VirtualBox 3.0.4

comment:58 in reply to: ↑ 49 ; follow-up: ↓ 59 Changed 5 years ago by willyo

Replying to willyo:

I've now spent some time testing 3.0.4

From my comment above:

  1. When a ping reply *is* received by the guest, about 1 out of 10 times the From IP address is the 10.0.2.... address of the (NAT router ? what's the correct term ?) rather than the IP address of the ping reply source.

--- As noted this is bug #4540 and appears to be fixed.

  1. Ping requests from the guest go out fine from the host and always get replies which are received by the host.

However many/most of the replies don't show up in the guest.

===> This problem still exists on my Vista Host/Fedora 10 client setup.

The degree of packet loss varies. Sometimes ping running on the guest runs fine for several minutes with no missed replies then starts missing 5-10 % of the replies.

Other times I'll start ping on the guest and 200-300 replies will be missed before some are seen.

While running the pings I did 3 separate captures.

  1. In the Windows host using Wireshark;
  2. In the Fedora guest using Wireshark;
  3. Using VBoxManage modifyvm xxxx --nictrace1 on ......

Looking at the captures I see the following:

  1. In the Windows capture all ping requests from the client are captured and replies to all the requests are seen.
  2. In both the "nictrace" and the guest captures many of the ping replies are missing with the amount missing varying over time.

The nictrace and the guest captures appear to be pretty much identical.
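The comparison described above can be sketched in a few lines of Python. This assumes the ICMP echo sequence numbers of the replies have already been exported from each capture into plain lists (the export step is not shown); the function names and the sample data are made up for illustration and are not taken from the attached captures.

```python
def missing_replies(host_seqs, guest_seqs):
    """Sequence numbers whose reply reached the host but never reached the guest."""
    return sorted(set(host_seqs) - set(guest_seqs))


def loss_rate(host_seqs, guest_seqs):
    """Fraction of replies seen on the host that are absent from the guest capture."""
    host_seen = set(host_seqs)
    if not host_seen:
        return 0.0
    return len(host_seen - set(guest_seqs)) / len(host_seen)


# Made-up example data: 20 replies seen on the host, 4 of them lost before the guest.
host_replies = list(range(1, 21))
guest_replies = [s for s in host_replies if s not in (4, 9, 10, 17)]

lost = missing_replies(host_replies, guest_replies)
rate = loss_rate(host_replies, guest_replies)
print(lost)
print(f"{rate:.0%} of replies lost between host and guest")
```

Running the same comparison on successive time windows would show how the loss varies over time, as observed in the captures.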

---

If this is not the right place to report this problem please let me know.

Thanks

comment:59 in reply to: ↑ 58 ; follow-up: ↓ 60 Changed 5 years ago by Hachiman

Replying to willyo: Yes, your detection technique is right, but it is better to apply it to TCP and UDP traffic. ICMP is a bit specific, especially on Windows; please see my last comment on #4540.
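One way to apply the same idea to TCP is to time repeated connection attempts from inside the guest. The sketch below is only an illustration using the Python standard library; the function name is hypothetical. To stay self-contained, the demo connects to a temporary listener on 127.0.0.1; in a real test the host and port would be replaced by an external server reachable through NAT.

```python
import socket
import time


def tcp_connect_times(host, port, attempts=5, timeout=3.0):
    """Return a list of connect times in seconds; None marks a failed attempt."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.perf_counter() - start)
        except OSError:
            # Covers timeouts, refused connections and name resolution failures.
            results.append(None)
    return results


# Self-contained demo against a local listener (stand-in for a remote server).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(5)
port = listener.getsockname()[1]

samples = tcp_connect_times("127.0.0.1", port, attempts=3)
listener.close()

failed = sum(1 for s in samples if s is None)
print(f"{failed} of {len(samples)} connection attempts failed")
```

Long or failed connects that appear only in the guest, and not when the same probe runs on the host, would point at the NAT layer rather than at the remote server.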

comment:60 in reply to: ↑ 59 ; follow-up: ↓ 61 Changed 5 years ago by willyo

Replying to Hachiman:

Yes, your detection technique is right, but it is better to apply it to TCP and UDP traffic. ICMP is a bit specific, especially on Windows; please see my last comment on #4540.

OK: I've done some testing using scp to copy files and see no problems with packet loss.

Thanks

comment:61 in reply to: ↑ 60 Changed 5 years ago by Hachiman

  • Status changed from reopened to closed
  • Resolution set to fixed

Replying to willyo:

Replying to Hachiman:

Yes, your detection technique is right, but it is better to apply it to TCP and UDP traffic. ICMP is a bit specific, especially on Windows; please see my last comment on #4540.

OK: I've done some testing using scp to copy files and see no problems with packet loss.

Thanks

thanks for feedback.

Changed 5 years ago by jeffhoff

Screen shot of the Speed test from the Solaris Host

Changed 5 years ago by jeffhoff

Screen shot of the Speed test from the XP Guest

comment:62 Changed 5 years ago by Hachiman

  • Status changed from closed to reopened
  • Resolution fixed deleted

comment:63 Changed 5 years ago by jeffhoff

OK, updated to 3.0.4. Still had a mysterious lockup overnight..

Attached two screenshots showing the speed test from the Solaris host (Screenshot-4.png) and then from the XP guest (Screenshot-5.jpg).

Pretty significant internal VB network loss...

comment:64 in reply to: ↑ 20 Changed 5 years ago by cwixon

Replying to cwixon:

I can confirm the same thing rasta and bqbauer are seeing (probably others too, but they didn't provide details), on a 32-bit XP Pro host, with 32-bit Ubuntu 9.04 desktop guest.

Virtualbox 3.0.2 build 49928, software virtualization (NO VT-x/AMD-V). Guest additions version 1.4, guest kernel 2.6.28-13-generic.

3.0.4 is much better -- no further problems for me.

comment:65 Changed 5 years ago by bqbauer

No change here. NAT under 3.0.4 is useless for me on an OpenSolaris host (same information as before).

Ticket #2838 addressed an overall NAT performance problem, but it was usable and worked. With 2009.06 and Crossbow, that ticket was effectively resolved by the changes made with Crossbow networking, yielding NAT performance of around 200Mb/s. See my notes on ticket 2838.

However, versions 3.0.0-3.0.4 have rendered NAT completely useless, at least with OpenSolaris hosts. This is severely broken for an official release.

With 3.0.4, a simple 47MB FTP file transfer from my host finished after 10 minutes, returning a speed of 77.8KBytes/s. This is the same file I tested with in ticket 2838, with which I consistently get about 200MBits/s (note the bytes and bits differences here). DNS queries take 10-20 seconds for a single hostname. When I connect to www.google.com, the browser sits there and says "connecting" for up to a minute before anything displays, and then Google's basic web page takes another minute or two to complete loading.

I was so happy when 2009.06 fixed the problem in ticket 2838. What was done to so miserably break NAT, and moreover, why?
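To make the bytes-versus-bits comparison concrete, the two figures can be put on the same scale. This is a minimal sketch; it assumes decimal prefixes (1 KByte = 1000 bytes, 1 Mbit = 1,000,000 bits), and the function name is illustrative.

```python
def kbytes_per_s_to_mbits_per_s(kbytes_per_s, kilo=1000):
    """Convert KBytes/s to Mbit/s (decimal prefixes unless kilo=1024 is passed)."""
    return kbytes_per_s * kilo * 8 / 1_000_000


nat_rate = kbytes_per_s_to_mbits_per_s(77.8)  # FTP result under 3.0.4
baseline = 200.0                              # Mbit/s, figure quoted from ticket 2838
slowdown = baseline / nat_rate

print(f"NAT under 3.0.4: {nat_rate:.2f} Mbit/s")
print(f"About {slowdown:.0f} times slower than the {baseline:.0f} Mbit/s baseline")
```

So 77.8 KBytes/s is roughly 0.62 Mbit/s, about 320 times slower than the earlier 200 Mbit/s result.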

comment:66 follow-up: ↓ 67 Changed 5 years ago by bqbauer

An update: Further testing with NAT is showing widely varied results. One moment my same file transfer (see above) finishes with the speed of bridged networking at around 400Mb/s as in my comments in ticket 2838, the next moment everything just stops working, then it has the slow performance previously mentioned, then back to the blazing 400Mb/s. I just transferred the OpenSolaris 2008.11 ISO file at 400Mb/s from my host.

So what's going on? NAT is very much on-off-on-off. It's much more obvious with web browsing, because you can see web pages just sit there, then suddenly half will load, then it all either stops or finishes. During the pauses, command line activities such as FTP or SSH also break down. I'm not one to put much faith in ping output, but it does fluctuate between 20-ish ms to over 500ms from moment to moment. I've observed none of this random behavior with 2.2.4.

comment:67 in reply to: ↑ 66 Changed 5 years ago by Hachiman

Replying to bqbauer: Thanks for the feedback. There were several serious changes introduced in 3.0. I'll find an opportunity to send you a profiling build to detect the reasons for such behavior.

comment:68 Changed 5 years ago by andyw

I have similar issues with VirtualBox 3.0.4 r50677 running on a Win XP x64 host, VT-x enabled. Internet through NAT is working only sporadically for my openSUSE 11.1 guest; I'm just downloading the latest patches for the guest OS and the built-in update client has a network timeout roughly every 30 sec. After hitting retry and waiting another couple of seconds, the download continues, just to freeze again after a couple of seconds. I experience the same network freezes also when surfing the internet within the guest system. The guest OS itself is responsive at any time. The same issues were present also for earlier releases of the 3.x branch. I would be happy to provide additional details, if requested. Thanks for your efforts! Andy

comment:69 Changed 5 years ago by mduigou

I too have encountered similar problems with incredibly slow network performance using NAT. I have tried both the PCNet and Intel interfaces per a suggestion I saw elsewhere but it made no difference. Connectivity to the host and remote network destinations is incredibly slow.

Interestingly scp file copies have the behaviour that the connection takes a long time to set up and then each file is sent quickly but with a long pause between files. It's as if there's something timing out almost instantly if there's not additional traffic queued.

VirtualBox 3.0.0-3.0.4 Host: MacOS 10.5.7 Intel Core 2 Duo VT-x enabled. Guest : Solaris 10u7 (with matching guest additions)

I am a Sun employee and can provide access to the system via ssh for debugging.

comment:70 Changed 5 years ago by rasta

I am still getting intermittent network connectivity for a Solaris 10 u7 guest on a 32-bit WinXPsp3 host using Vbox 3.0.4. Behavior is different than in previous 3+ releases, however, in that most web pages do load, although after long delays.

comment:71 Changed 5 years ago by gordonwatts

I am also having issues similar to the above, but it depends on location. I have VB running on a portable as a host (Lenovo X61T). At home I'm behind an ADSL modem's NAT (my portable's IP address is 192.168.1.xxx). At work my portable gets an IP address on the internet directly (no NAT). At home the guest gets no network packets; at work it gets them.

Possible source of difference is simply that I'm behind a NAT here at home, but not at work.

If I ping a network address I get almost no packets back (perhaps 1 in 200 or so). Most web pages don't load.

Config: Host: VB 3.0.4, Running on Windows 7, RC. Guest: W7, Ubuntu, and generic Linux distro (made by rBuilder). In all cases the network adaptor is configured as Intel pro 1000 desktop, NAT. If I reconfigure the network adaptor as bridged then it works in all situations I've been able to test.

If I am able to test this in more than two places I'll report further. Curious if others see something similar.

comment:72 Changed 5 years ago by jeffhoff

I believe I caught the error; see ticket #4179. It contains screenshots and a log file with the error message.

comment:73 Changed 5 years ago by r_mano

jeffhoff: probably not related. I do not use AMD-V extension, and moreover, what I see here is a *network* freeze when I use NAT, not a VM freeze.

I think this bug report is quite messed up; there are two issues discussed here. I think the title of the bug is misleading. It should be "VB 3.0 NAT network freezes" or "NAT unbearably slow in 3.0".

I want to restate: NAT network (Linux host, XP guests) is very slow with 3.0 and sometimes freezes for minutes at a time. That makes VB 3.0 practically impossible to use with NAT.

comment:74 Changed 5 years ago by jeffhoff

r_mano: You are probably correct about there being two issues. But since VB is using the network stack underneath, it is hard to tell what dies first.

With regard to NAT, the only networking I have been able to successfully configure and use is bridged..

If I remember correctly, NAT configures and runs but only connects to the host and not to the web.

comment:75 Changed 5 years ago by gordonwatts

r_mano: I think the NAT problem is broader than Linux host/XP guests. I observe the amazingly slow/non-working NAT situation with a W7 host, and Linux and Windows guests. BTW - have you tried it when you are behind a NAT vs facing the internet? I'll be able to run more tests of this type next week as I'll be back at a work location that will have my portable network facing rather than NAT facing, as it is at home (where NAT does not work).

If someone wants more info, please let me know. This bug has made VB almost unusable for me.

comment:76 Changed 5 years ago by gordonwatts

I am at a second institution where my portable has gotten an IP address that is directly on the internet, not behind a NAT, and NAT in VB works just fine here as well. I have not been able to test a second NAT; only mine at home is available to me. In about a week I will have a second NAT I can test, but others could probably tell in the meantime.

For me this is starting to look like the issue. Should I open a new bug report? (Sorry, I'm a newbie when it comes to bug reports with VirtualBox.)

comment:77 Changed 5 years ago by chowes

I can also confirm this issue. I am running VB 3.0.4r50677 on a Windows 7 64-bit host. The primary guest I'm using is a WinXP SP3 system.

When I use NAT networking, the connectivity is very, very slow and there are a lot of network timeouts within the guest. This is also adversely affecting VRDP and shared folders. When I switch to bridged networking, the performance returns for applications connecting to the external network. The VRDP and shared folders are still very, very slow.

comment:78 Changed 5 years ago by gordonwatts

chowes: is your 64-bit W7 box itself behind a NAT (say, at home)? Or is it straight on the internet? I've noticed a pattern with my VB NAT networking failures: if my host is behind a NAT, then VB's NAT networking is very slow; if my box is directly on the internet, then VB's NAT network connection works just fine.

comment:79 Changed 5 years ago by chowes

My systems are behind another NAT device.

That doesn't explain the very slow VRDP and shared folder connectivity, though. I can see how this type of slowness would cause the kinds of lockups and hangups that are described in this ticket, though.

comment:80 Changed 5 years ago by gordonwatts

Chowes - I don't understand either - but I've not tried to understand the source code either. I'm just doing pattern recognition. :-) But I'd bet you a beer (or your drink of choice) that if you moved your host out from behind the NAT then everything would work just fine - including VRDP and shared folders.

I'm curious if anyone that has observed these problems does not have the host computer behind a NAT?

comment:81 Changed 5 years ago by r_mano

My host computer is not behind a NAT, just the corporate network. I can see that the problems are worse when I use VB from home, behind a NAT, but there are problems too when the host is directly on the net... so no, I do not think that being behind a NAT implies anything.

comment:82 Changed 5 years ago by gordonwatts

r_mano - thanks. That is interesting. It is night-and-day for me. Behind my home NAT the guest OS is useless unless I use bridged networking. On the corporate network, which for me is basically unfiltered (advantage of working at science labs), the guest works perfectly when using VB's NAT implementation. This is my portable - so everything else remains constant.

I wonder what is going on. Well, interested to see what further debugging finds!

comment:83 Changed 5 years ago by jbrown

I am observing similar behavior with Bridged Networking on a Windows 7 64-bit Host, Windows Server 2003 32-bit Client. The host is a Core i7 with quad cores and hyperthreading. The client has 4 virtual processors.

After half an hour or so of heavy CPU & disk load (running a parallel build) but very light network load (some HTTP requests), I am no longer able to ping the client although it appears to be still running. The client no longer responds to pings or TCP. Moreover, I lose the ability to connect to the client via RDP. The client is running within a VBoxHeadless.

Interestingly, I have observed similar but more catastrophic failures when running the client interactively rather than headless. After a while all of a sudden the screen image becomes garbled and the client window resizes to a wide but short (like 1200x300) profile. The client becomes completely unresponsive at that stage and must be powered off forcibly.

Assuming these events are in any way related, then it seems plausible that VBox has a memory corruption problem that is exacerbated by heavy load. After reading the other comments, I'm inclined to believe that NAT has nothing to do with the main issue of unresponsiveness although it may have other problems. Instead it sounds like people are experiencing most trouble downloading large files (high network & high disk activity). My own case only has very light network activity so I'm inclined to wonder whether concurrent and persistent high disk activity triggers the problem...

I am going to try changing a few parameters to see if I can make things more reliable for now.

comment:84 Changed 5 years ago by gordonwatts

Hi Jbrown, I think the NAT slowdown that we've been talking about on this thread happens under all loads - high and low network load (mine certainly happens under low network activity). Further, I think most people have been complaining about outgoing connectivity rather than incoming connectivity. Do you see any problems with outgoing connectivity?

comment:85 Changed 5 years ago by firefly

I have the same problem with bridged networking as well. I am running 64-bit OpenSuse as the host. Guests range from Windows XP to 64-bit Linux.

The problem tends to occur under heavy network load (for any of the guests). Once the network stops responding on any particular guest, it also stops working for all the other guests. I think this is related... but please let me know if I should open a new ticket.

comment:86 Changed 5 years ago by firefly

I forgot to add that closing all the guests and VirtualBox and then restarting always fixes the problem. The host doesn't need to be restarted.

comment:87 Changed 5 years ago by gordonwatts

This sounds like a different bug than the one in this thread (restarting does not help). You might open a new bug report and then see if it gets labeled as a duplicate.

comment:88 Changed 5 years ago by jbrown

I tried changing the virtual network adapter to Intel Pro/1000T Server (still Bridged) instead of PCNET-Fast III and switched the virtual hard disks from using the PIIX IDE port to a SATA (AHCI) port. I also disabled audio support (not needed for this VM anyways). The VM continued to "lock up" within half an hour with these changes.

Here "lock up" means that the VM was still consuming CPU and apparently doing work but was not responding to network access.

@gordonwatts I agree that my problem may well be different. I measured my bandwidth rate at approx. 50Mbit/s down, 8Mbit/s up on both the host and the client, at least after changing the virtual network adapter to the Intel. I did not test bandwidth with the PCNet.

I am now experimenting with reducing the number of virtual processors to 1 from 4 and the VM appears to be relatively stable at the moment.

When the VM "locks up", I often find messages in the client's Windows Event Log describing very unusual crashes such as access violations in various programs that happen to be running as part of a build. So just to be sure, I am also going to test my (host) RAM...

comment:89 Changed 5 years ago by jbrown

Actually, my issue looks just like this, random segfaults and all that when multiple virtual processors are enabled:

 http://forums.virtualbox.org/viewtopic.php?f=3&t=19530

comment:90 Changed 5 years ago by r_mano

jbrown: yes, this is yet another thing. The main comments here are about network freezes when using NAT, especially if the host is behind another NAT.

To some VB administrator: what about splitting this ticket in two or three or at least changing the name to "NAT network almost stops in VB 3"? Thanks.

comment:91 Changed 5 years ago by Hachiman

##############################################################################
VBox-3.0.6r1 was released
(please see  forum for details.)
##############################################################################
Could you please check whether the NAT freezes are gone in your environment? A good practice is to compare bridged networking with NAT. Note that VRDP and VBox shared folders do not use networking, so please create or add your comments about VRDP and shared folders in the corresponding defect categories.
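
One way to make the NAT-versus-bridged comparison concrete is to time TCP connections from the guest to the same service under each mode. What follows is only a minimal Python sketch and is not part of VirtualBox; the helper name `time_tcp_connect` and its parameters are hypothetical, and the host and port to probe are whatever service you choose.

```python
import socket
import statistics
import time


def time_tcp_connect(host, port, attempts=5, timeout=10.0):
    """Time TCP connects to host:port.

    Returns (median_seconds, failures). median_seconds is None when
    every attempt failed or timed out.
    """
    samples = []
    failures = 0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                samples.append(time.perf_counter() - start)
        except OSError:
            failures += 1
    median = statistics.median(samples) if samples else None
    return median, failures
```

Run it inside the guest once with the adapter set to NAT and once with it set to Bridged, against the same host and port, then compare the medians and failure counts. A large gap between the two modes points at the NAT path rather than at the guest or the remote service.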

comment:92 Changed 5 years ago by jeffhoff

I have installed VBox-3.0.6R1...

One thing I noticed in the VBox GUI under the system settings for a virtual machine: it used to say 1mb to whatever you had installed. Now it says 4mb to whatever you have installed. However, you can set it for less than the 4mb. My question is this: if VBox is relying on that number as a base for I/O buffers (disk, network, keyboard, mouse, etc.), is it possible that the virtual machines wind up walking over each other and causing the crashes through buffer overruns or underruns?

Just a thought..

comment:93 Changed 5 years ago by gordonwatts

jeffhoff: have you been able to test if the NAT problem remains in 3.0.6?

You are talking about memory allocation, right? That should be for each individual machine. One VM should never be able to see the internal memory of another VM. That slider controls the allocation of the VM's personal core memory.

Unless I misunderstood what you are asking about. Also, I say this from general knowledge about how these things work, not from specific source code knowledge of VB.

comment:94 follow-up: ↓ 95 Changed 5 years ago by jeffhoff

Ok I set both VBoxes to NAT. Both Fedora and XP did not connect anywhere, could not ping or Surf. After trying to set the network address in Fedora, that Vbox crashed. Going back to Bridged.

On the memory issue, I have 6 gigs total.

Fedora says 4mb to 6GB, but the setting is 1.2mb. So is it using the 4mb or the 1.2mb? XP says 4mb to 6GB, but it is set to 768kb. Ditto.

The question I have is where the allocation starts and how it gets allocated. If the math for the allocation has a typo that results in buffer overruns or underruns, that may be the cause of the random 48-hour lockups, and it probably affects network, display, shared folder, mouse, keyboard, and USB operations.

comment:95 in reply to: ↑ 94 Changed 5 years ago by Hachiman

Replying to jeffhoff:

Ok I set both VBoxes to NAT. Both Fedora and XP did not connect anywhere, could not ping or Surf. After trying to set the network address in Fedora, that Vbox crashed. Going back to Bridged.

Jeff, have you been able to get a crash dump? If so, I'll send you instructions by mail on how to transfer it here.

comment:96 Changed 5 years ago by seiryu

Back on Jul 2 I asked if this issue could also result in a painfully slow network. That aspect of the problem in my environment appears to be resolved under the 3.0.6beta. I have NAT enabled and my OpenSolaris guest running on OSX host is performing well.

comment:97 follow-up: ↓ 98 Changed 5 years ago by gordonwatts

If someone tells me where the betas can be downloaded from, I'm happy to see how things work for me.

comment:98 in reply to: ↑ 97 Changed 5 years ago by Hachiman

Replying to gordonwatts:

If someone tells me where the betas can be downloaded from, I'm happy to see how things work for me.

 3.0.6 Beta1 can be downloaded here

comment:99 follow-up: ↓ 102 Changed 5 years ago by jeffhoff

Searched for a core file and did not find one on either the Fedora VB or the OpenSolaris host...

comment:100 Changed 5 years ago by gordonwatts

I'm not running behind a NAT - which is where things seemed to be most problematic under 3.0.4 (I won't be back in that configuration until Saturday evening, unfortunately). Commands involving networking always worked under this condition for me.

  • Seems to work just fine
  • I exercised it by putting the host and the guest under a fair amount of stress (two VMs running at once, heavy network load).
  • Speed measurements using NAT networking seem to indicate things are faster under 3.0.6B1 than 3.0.4 (I have timed commands which access the network, not the data throughput, so I can't give you numbers).

No crashes.

I'll post as soon as I get more info about working behind a NAT, but that may not be till after the release, depending on your schedule. :( Sorry.

comment:101 Changed 5 years ago by gordonwatts

Oops - by "Commands involving networking always worked under this condition for me." I meant that the VMs work well when the host is not behind a NAT. Too much wine for dinner...

comment:102 in reply to: ↑ 99 Changed 5 years ago by Hachiman

Replying to jeffhoff:

Searched for a core file and did not find one on either the Fedora VB or the OpenSolaris host...

Jeff, I've updated the core dump wiki page. Please take a look, and please attach the log file of your crashing session.

comment:103 Changed 5 years ago by jeffhoff

Thanks. I have created the files and will put them in place in a bit.

Looks like OpenSolaris has a new BE, so I will update that first.

3.0.6b1 did crash sometime last night; it was dead this morning.

comment:104 Changed 5 years ago by gordonwatts

I am back home and behind the NAT. My config: Host: W7 RC. Guest: W7 RC. Host is behind a NAT (of an ADSL connection). When the host is not hidden behind a NAT, VB3.0.4 works fine. I tested the 3.0.6 B1 for this.

  • ping -t www.fnal.gov from the host always works 100% (no dropped packets). Under 3.0.4 I see maybe 1 packet in 100 returned; under 3.0.6B1 I see 4 out of 5 packets returned. So this seems like a huge improvement, but the ping problem isn't quite fixed yet.
  • Opening www.nytimes.com under 3.0.4 was a hit-or-miss operation, at best. Often I'd get nothing and eventually the page would time out. Outlook 2010 would refuse to connect to Exchange under 3.0.4; under 3.0.6B1 it seems to work.

So it would seem the NAT problem is fixed! I looked at the network status in the Resource Monitor for about 5 minutes and the % packet loss column never got above zero. However, I forgot to ever look at this on 3.0.4 to see if this was a symptom of this problem.
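
To put rough numbers on the ping results above, packet loss can be computed from the sent and received counts. The following is a minimal Python sketch; `packet_loss_percent` is a hypothetical helper name, and the figures in the comments only restate the approximate counts quoted above (reading "1 packet in 100" as one reply per 100 requests).

```python
def packet_loss_percent(sent, received):
    """Return packet loss as a percentage of packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


# Approximate figures from the comment above (assumed reading):
print(packet_loss_percent(100, 1))  # 3.0.4: about 1 reply in 100 -> 99.0
print(packet_loss_percent(5, 4))    # 3.0.6B1: 4 replies in 5 -> 20.0
```

On that reading, loss drops from roughly 99% to roughly 20%, which matches the description of a large improvement that is not yet a complete fix.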

I'll continue to use this for a few days to test its stability. If I observe problems, I'll let you know. But thanks for spending the time and going after this!!

comment:105 Changed 5 years ago by Hachiman

  • Summary changed from VirtualBox 3.0 freezes under network load to VirtualBox 3.0 freezes under network load => Fixed in SVN

comment:106 Changed 5 years ago by frank

  • Status changed from reopened to closed
  • Resolution set to fixed

comment:107 in reply to: ↑ 10 Changed 5 years ago by john.doe

Replying to kilpatds:

Not quite sure if it's the same bug, but I've seen my Windows XP VM (running under Fedora 11) freeze during network traffic. Switching off all the "use fancy CPU feature" flags seems to avoid the problem.

Thanks for this pointer. After struggling with this for a while the problem was with the multiple CPU switch. I have a Core2 processor so I thought I'd benefit from setting the CPU to 2 but I received no such benefit. All my PAE, VT-x, etc options are checked because my machine supports them. Moreover I wanted to make one change at a time to find the problem instead of turning all the options off and then guessing what the issue might have been. So, once more - my problem (which showed up with increased network activity) was actually because I had set the number of CPUs to 2.

comment:108 Changed 5 years ago by jeffhoff

They may have this corrected. See 5014 and 4775. It seems DHCP in XP causes crashes.

comment:109 follow-up: ↓ 110 Changed 5 years ago by sc1993

  • Status changed from closed to reopened
  • Resolution fixed deleted

I am running an FC11 host with VirtualBox 3.0.6 hosting two FC10 guests (samba servers), configured in bridged network mode (Intel PRO/1000 MT Server). During heavy network load, the guests still hang suddenly. This almost always happens at night, when the automatic backup runs (rsync). The only way to solve the problem is to power off the guest and restart it. I am convinced that the problem still exists.

comment:110 in reply to: ↑ 109 Changed 5 years ago by Hachiman

  • Status changed from reopened to closed
  • Resolution set to fixed

Replying to sc1993:

I am running an FC11 host with VirtualBox 3.0.6 hosting two FC10 guests (samba servers), configured in bridged network mode (Intel PRO/1000 MT Server). During heavy network load, the guests still hang suddenly. This almost always happens at night, when the automatic backup runs (rsync). The only way to solve the problem is to power off the guest and restart it. I am convinced that the problem still exists.

This bug is about NAT/networking.

comment:111 Changed 5 years ago by jeffhoff

See 5014, 3.0.7 beta available.

comment:112 Changed 5 years ago by r_mano

I humbly think that this is not fixed yet.

I have a Windows XP guest machine on an Ubuntu Jaunty Linux host. Vanilla kernel. NAT networking; I am attached via Ethernet to the university network. 32-bit host, 32-bit client, no fancy virtualization selected.

I just tried to upgrade from 2.2.4 to 3.0.8 (because the 2.2.4 vboxdrv refused to compile against the 2.6.31 kernel), and NAT networking is extremely slow. Outlook startup went from 20 seconds or so in 2.2.4 to 4 minutes with 3.0.8. Sometimes Outlook simply hangs.

I have not upgraded guest additions because otherwise downgrading VBox is a pain in the back.

In Linux I am quite a power user, but I am quite at a loss with Windows; nevertheless, if you tell me how to help debug this problem, I will try to help as much as I can.

I suggest re-opening of the ticket.

Thanks,

(*) this is a show-stopper for me. What about a tiny upgrade to 2.2 to make it compatible with newer kernels?

comment:113 Changed 5 years ago by bqbauer

It appears this is completely resolved under OpenSolaris hosts, but don't confuse this with the SMP issues (as someone already pointed out).

You can snapshot your guest, install the additions, then roll back the snapshot if you downgrade. Be careful, because VB snapshots are VERY confusing--do a test on another guest first. Perhaps you're experiencing an indirect problem by not having an optimal environment? Also, if by "no fancy virtualization selected" you mean you don't have VT-x enabled, it can't hurt to try, assuming you have the option. Hopefully someone with more knowledge of the additions can say whether not updating them will influence NAT.

comment:114 Changed 5 years ago by r_mano

I have to partially take back what I said. I have now been using 3.0.8 for a few days, and most of the time the network seems to work. Outlook still locks up sometimes, especially while sending mail to our Exchange server via a tunnel, but it may well be an Outlook or tunnel problem.

I will try to upgrade the additions. Snapshotting could be a solution, but it's messy (it will roll back the whole system). Do you know if the Additions are backward-compatible? I mean, if I have additions 3.0.8, can I run the VM with 2.2.4? That would solve everything.

Thanks

comment:115 Changed 4 years ago by mnlipp

  • Status changed from closed to reopened
  • Resolution fixed deleted

The problem is not solved by 3.0.8. I have VirtualBox running on Ubuntu 9.04 with Ubuntu 9.04 as guest (additions installed). When I try to copy 18G to the guest, the guest becomes unreachable sooner or later (I achieved between 1G and 12G in various tests). The guest is no longer reachable from the network and uses about 100% CPU. I tried both the PC Net Fast III and the PRO/1000 adapters; no difference.

The log file shows nothing between successful startup and beginning of poweroff.

comment:116 Changed 4 years ago by frank

mnlipp, so you are sure that the copy operation includes only the network card, that is the guest fetches the file from the network and stores it on a local drive, correct? And no guest additions are involved, so you don't store the file on a shared folder, correct? Which is the command you are using to copy the file (exact command line please)?

comment:117 Changed 4 years ago by mnlipp

It's the good old Unix over-the-network directory copy:

source> tar czf - . | (ssh me@guest 'cd my/target/dir; tar xzf -')

The command is issued on yet another Ubuntu 9.04 PC ("source", no VirtualBox involved here) and targets the guest ("guest").
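
Because the freeze shows up after a varying amount of data, it can help to record how many bytes made it through before the stall. The Python sketch below is one hypothetical way to do that and is not something used in this ticket; `copy_counting` and its parameters are illustrative names. It copies a byte stream in chunks and logs a running total.

```python
import sys


def copy_counting(src, dst, chunk_size=1 << 20, report_every=1 << 30,
                  log=sys.stderr):
    """Copy src to dst in chunks and return the number of bytes copied.

    A progress line is written to `log` each time the running total
    crosses another multiple of `report_every` bytes.
    """
    total = 0
    next_report = report_every
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # end of input
            break
        dst.write(chunk)
        total += len(chunk)
        if total >= next_report:
            print(f"{total} bytes copied", file=log, flush=True)
            next_report += report_every
    dst.flush()
    return total
```

Wrapped in a small script that calls `copy_counting(sys.stdin.buffer, sys.stdout.buffer)`, it could sit between the `tar czf` and `ssh` stages of the pipeline above, so the last progress line shows roughly how much compressed data was sent before the guest stopped responding.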

As a workaround, I used the exact same command to copy the files to the host. No problem there, so the host's network interface works and "tar" obviously works with my data. (I finally got the data on the guest then by shutting down the guest and mounting the guest's file system on the host and copying the data.)

If you want me to test something else, tell me. - Michael

comment:118 Changed 4 years ago by frank

Ok, but still the question: my/target/dir at the guest was not a shared folder, was it?

comment:119 Changed 4 years ago by mnlipp

Oh, that's what you meant. No, the target folder is just plain disk storage on the guest (it is a partition of the host's storage provided as rawdisk). Definitely not a shared folder.

comment:120 follow-up: ↓ 121 Changed 4 years ago by jirka_

I seem to have the same problem. I am running Ubuntu 9.04 guest on a Windows XP SP3 host with NAT networking.

After uploading a few MB of data from the guest to a physically separate server PC via ftp, the transfer becomes extremely slow, eventually hangs completely and the target computer is no longer reachable. The target server is only reachable again after restart of the guest. The network transfer definitely does not involve shared folders.

Yesterday I downgraded back to VirtualBox 2.2.4 and the problem vanished.

comment:121 in reply to: ↑ 120 Changed 4 years ago by Hachiman

Replying to jirka_, mnlipp

I seem to have the same problem. I am running Ubuntu 9.04 guest on a Windows XP SP3 host with NAT networking.

Could you please attach the logs?

comment:122 follow-up: ↓ 123 Changed 4 years ago by jaapd

This is also still happening with bridged networking in 3.1.0: an rsync or TSM backup will freeze the system.

comment:123 in reply to: ↑ 122 Changed 4 years ago by Hachiman

  • Status changed from reopened to closed
  • Resolution set to fixed

Replying to jaapd:

This is also still happening with bridged networking in 3.1.0: an rsync or TSM backup will freeze the system.

This ticket is about NAT, not bridged networking; please look at the bridged networking tickets.
