VirtualBox

Ticket #5260 (closed defect: fixed)

Opened 4 years ago

Last modified 3 years ago

page allocation failure ... vboxNetFltLinuxPacketHandler

Reported by: oetiker
Owned by:
Priority: major
Component: network
Version: VirtualBox 3.0.8
Keywords:
Cc: tobi@…
Guest type: Windows
Host type: Linux

Description

I am running VirtualBox 3.0.8 on Linux 2.6.31.4 with bridged Ethernet interfaces. I am seeing several "page allocation failure" warnings from the kernel every day. vboxNetFltLinuxPacketHandler consistently shows up in the call trace. There is currently some discussion of "page allocation failures" on the kernel mailing list, but the bug is proving rather elusive. I wonder whether my instance could somehow be related to the vboxnetflt driver?

I have put up a few of my traces at http://tobi.oetiker.ch/cluster-2009-10-20-08-31.txt
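
To check whether a host is affected in the same way, the kernel log can be scanned for allocation failure reports whose call trace includes vboxNetFltLinuxPacketHandler. The following is a minimal sketch and not part of the original report: the function name count_vbox_alloc_failures is made up for illustration, and it assumes the 2.6.3x-era message format "<process>: page allocation failure. order:N, mode:0x..." followed by a call trace.

```sh
#!/bin/sh
# Count "page allocation failure" reports read from stdin whose call trace
# mentions vboxNetFltLinuxPacketHandler, grouped by allocation order.
# Assumption: each report starts with a line containing
# "page allocation failure" and "order:N".
count_vbox_alloc_failures() {
    awk '
        /page allocation failure/ {
            # Start of a new report: remember its order, e.g. "order:5".
            order = "unknown"
            if (match($0, /order:[0-9]+/)) {
                order = substr($0, RSTART + 6, RLENGTH - 6)
            }
            in_report = 1
            next
        }
        in_report && /vboxNetFltLinuxPacketHandler/ {
            count[order]++
            in_report = 0   # count each report at most once
        }
        END {
            for (o in count) printf "order:%s %d\n", o, count[o]
        }
    ' | sort
}
```

Typical use would be `dmesg | count_vbox_alloc_failures`, which prints one line per allocation order in the form `order:N COUNT`. Reports whose trace does not include the vboxnetflt handler are ignored.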

Attachments

vboxNetFltLinuxPacketHandler.crash (122.4 KB) - added by bijwaard 4 years ago.
multiple page allocation failures with vboxNetFltLinuxPacketHandler
vbox-3.1.4.dmesg (122.3 KB) - added by bijwaard 4 years ago.
recent dmesg output of multiple page allocation errors in ubuntu 9.10 with vbox-3.1.4
config-2.6.31-20-generic (108.8 KB) - added by bijwaard 4 years ago.
Ubuntu 9.10 kernel configuration
config-2.6.32.8-vboxhost (95.7 KB) - added by oetiker 4 years ago.
the kernel config as requested
diff_vboxnetflt_linux (2.2 KB) - added by frank 4 years ago.
Patch for /usr/src/vboxnetflt-/

Change History

comment:1 Changed 4 years ago by oetiker

We just found the same failures on Ubuntu Jaunty.

comment:2 Changed 4 years ago by bijwaard

I'm running VirtualBox 3.1.2 on Linux and see the same sort of page allocation failures, reported from other programs (both our home-brewed ASF.exe and normal programs like firefox, thunderbird, vino-server) with vboxNetFltLinuxPacketHandler in the call trace. In my case the failing process is a different one (vino-server), and I have quite some TCP load between the Linux host and the MS Windows guest, plus some load from a Perl script running inside the guest. I'll attach my trace next.

My ms-windows VM is configured with 384 MB internal memory and without VT-x and nested paging on (but greyed out). According to top, VirtualBox claims 750m virtual (output from top is also in the trace to give an idea of the load in my system).

When such a page allocation failure occurs (sometimes several in a row), a Linux window (probably belonging to one of the programs listed above) freezes for more than 10 seconds.

Changed 4 years ago by bijwaard

multiple page allocation failures with vboxNetFltLinuxPacketHandler

comment:3 Changed 4 years ago by bijwaard

By the way, I'm running ubuntu 9.04. If any additional info is needed, please let me know.

comment:4 Changed 4 years ago by bijwaard

I have updated to Ubuntu 9.10 and still experience the same problem; it has even become worse. Ubuntu is sometimes almost unusable: I currently get these page allocation errors every minute or so, and multiple applications freeze each time.

comment:5 Changed 4 years ago by frank

Which VBox version are you currently using?

comment:6 Changed 4 years ago by bijwaard

I am currently using VirtualBox 3.1.4. To get a working system again, I disabled networking and removed the vbox modules vboxnetflt and vboxnetadp. My system is now much more responsive and I have not seen the page allocation error for a few hours, but I still experience application freezes.

Changed 4 years ago by bijwaard

recent dmesg output of multiple page allocation errors in ubuntu 9.10 with vbox-3.1.4

comment:7 Changed 4 years ago by oetiker

we see this problem on one of our systems, it is a 'Dual Core AMD Opteron(tm) Processor 265' with 16 GB Ram. (The cpu has no virtualization capability) ... our errors are 'order:5' ... we are running 2.6.32.8 with vbox 3.1.4.

I have seen mention on the LKML of other Linux network drivers causing similar problems. It seems to be triggered by some changes in the way the kernel allocates memory ...
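
For context, the "order" in these messages is the base-2 logarithm of the number of physically contiguous pages the kernel tried to allocate. A small sketch of the arithmetic, assuming the common x86 page size of 4096 bytes (order_to_bytes is a hypothetical helper name, not something from this ticket):

```sh
#!/bin/sh
# Convert a page allocator order into the size of the requested block.
# $1: order (e.g. 5); $2: page size in bytes (assumed 4096 if omitted).
order_to_bytes() {
    echo $(( (1 << $1) * ${2:-4096} ))
}
```

`order_to_bytes 5` gives 131072, that is 32 contiguous pages (128 KiB). A request of that size is much harder to satisfy on a host whose memory has become fragmented than a single-page request.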

comment:8 Changed 4 years ago by frank

Please could you append the configuration of your host kernel to this ticket?

Changed 4 years ago by bijwaard

Ubuntu 9.10 kernel configuration

Changed 4 years ago by oetiker

the kernel config as requested

comment:9 Changed 4 years ago by bijwaard

Some additional info about my system: I'm running a Dual Intel(R) Core(TM)2 Duo CPU E8400 @3GHz with 2GB internal memory and 6GB swap. I started the vbox network modules again today after upgrading to vbox-3.1.6; so far it has not misbehaved.

comment:10 Changed 4 years ago by bijwaard

It took some time, but the page allocation failures returned yesterday and are getting more frequent. The load on my system has not been very high since I restarted the vbox network modules for my virtual machine 8 days ago.

So, it looks like this bug is triggered after multiple days of running the vbox network modules and a virtual machine.

comment:11 Changed 4 years ago by oetiker

I have upgraded to 2.6.33.3 ... a day after reboot, the page allocation failures (with vbox 3.1.6) are back ... always order:5 ...

comment:12 Changed 4 years ago by oetiker

I found a workaround ... it seems the problem only occurs when GSO is enabled on a tg3 network card ... with

 ethtool -K eth0 gso off

the problem goes away. On debian/ubuntu I put the following into /etc/network/if-up.d:

#!/bin/sh
ETHTOOL=/usr/sbin/ethtool
# Do nothing if ethtool is not installed
if [ ! -x "$ETHTOOL" ]; then
    exit 0
fi

# vbox triggers page allocation failures when tg3 generic segmentation offload (GSO) is on
if "$ETHTOOL" -i "${IFACE}" | grep -q tg3; then
    echo "turn off gso on $IFACE"
    "$ETHTOOL" -K "${IFACE}" gso off
fi
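
To confirm that the hook took effect, the offload settings can be read back with `ethtool -k` (lowercase k shows the settings, uppercase K changes them). A sketch, assuming the feature line reads either "generic-segmentation-offload: on" or, in older ethtool versions, "generic segmentation offload: on"; gso_state is a hypothetical helper name:

```sh
#!/bin/sh
# Read `ethtool -k IFACE` output on stdin and print the GSO state:
# "on", "off", or "unknown" if no matching line is found.
gso_state() {
    awk -F': *' '
        /^[ \t]*generic[- ]segmentation[- ]offload:/ {
            # Keep only the first word, dropping suffixes such as "[fixed]".
            split($2, words, " ")
            state = words[1]
        }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

For example, `ethtool -k eth0 | gso_state` should print off once the workaround is active on eth0.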

comment:13 Changed 4 years ago by oetiker

looking through logs I found the same problem on another box with e1000e driver ...

comment:14 Changed 4 years ago by frank

Ticket #6622 has been marked as duplicate of this ticket.

comment:15 Changed 4 years ago by oleitner

thanks for linking me here...

i am still testing with the gso trick...

i have another card though:

driver: 3c59x
version:
firmware-version:
bus-info: 0000:04:00.0

if the gso trick fixes this one too, I'll let you all know.

Changed 4 years ago by frank

Patch for /usr/src/vboxnetflt-/

comment:16 Changed 4 years ago by frank

I've just attached a patch for a possible memory leak if GSO is enabled. This fix is for a rarely used error path, so it is unlikely to fix your problems, but you could try it anyway. Do the following as root on your host (first make sure that no VM is running):

cd /usr/src/vboxnetflt
patch -p0 < ~/diff_vboxnetflt_linux
/etc/init.d/vboxdrv setup
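
The "no VM is running" precondition and the patch itself can be checked before anything is modified. A hedged sketch: no_vms_running is a hypothetical helper that inspects the output of `VBoxManage list runningvms`. That command lists only the VMs of the invoking user, so it should be run as the user who owns the VMs rather than as root. The `--dry-run` option assumes GNU patch.

```sh
#!/bin/sh
# Succeeds (exit 0) when the `VBoxManage list runningvms` output on stdin
# contains no VM entries. Each running VM is listed as: "name" {uuid}
no_vms_running() {
    ! grep -q '^"'
}

# Intended use (commented out so that sourcing this file changes nothing):
#   VBoxManage list runningvms | no_vms_running || echo "stop all VMs first"
#   cd /usr/src/vboxnetflt && patch -p0 --dry-run < ~/diff_vboxnetflt_linux
```

If the dry run reports no failed hunks, the three commands above can be run as written.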

Would be interesting to know if this makes any difference for you.

comment:17 Changed 4 years ago by dbharris

Hey, looks like #5675 is also solved by this (the symptoms appear quite similar, and the ethtool trick worked).

Thanks,

Dave

comment:18 Changed 4 years ago by dbharris

Note that I don't use tg3. I use forcedeth.

comment:19 Changed 4 years ago by frank

Any chance to try the attached patch?

comment:20 Changed 4 years ago by oleitner

hello frank

the gso off setting seems to have fixed my problem. If it stays without a mem/cpu hog for another 5 days, I'll try your patch and see if it fixes this for good...

greetings Oliver

comment:21 Changed 4 years ago by frank

Thanks Oliver. Please make sure to enable GSO again when you test the patch.

comment:22 Changed 4 years ago by oleitner

sad news, the patch did not fix the problem.

but since there's a new VirtualBox version out today, and a new Linux kernel was packaged for Ubuntu Lucid Lynx a few days ago, I'm giving that combination a try; maybe the bug fixed "itself" somewhere in between.

however, I have to agree that gso off fixes the thing.

comment:23 Changed 4 years ago by Linux777

comment:24 Changed 3 years ago by rvp_lan

Hi, I'm very interested in this thread, especially for the ethtool gso trick.

But... I'm a little bit confused here. Does the trick apply to the host OS or the guest OS?

The host Linux server on which the vboxNetFlt allocmem error occurs runs OpenSuSe 11.2 x86_64, kernel 2.6.31-12, with Vbox 3.2.10-109.3.x86_64 running in headless mode. The guest system is Ubuntu 9.10 x86_64, kernel 2.6.31-20. The host system doesn't have much real memory: 2GB, backed by 6GB of swap space.

The host network card is a bond of two Broadcom BCM5780 Gigabit interfaces. The guest network card is bridged to this bond with the virtual Intel 1000e driver.

The host OS doesn't do much, but as a backup server it is very active at night when the rsync script starts. It is mostly at this time that the vboxNetFlt allocmem error occurs. The rsynced files are on an XFS filesystem on top of a 3Ware RAID6 volume.

The guest OS, by contrast, has continuous intensive network activity: access to shared files, file sharing itself, and the master network service for a computing dispatcher.

By the way, I noticed that when the guest OS starts, it does something (wrong?) to the bond: one of the network cards goes into promiscuous mode, as if the virtual card were absolutely trying to bind to a physical card.

I first thought that the vboxNetFlt allocmem error occurred because of the particular case of bridging over bonding, but I have another (pretty much identical) system with the same network config that doesn't show any allocmem error.

Thanks for any clue.

comment:25 Changed 3 years ago by aleksey

Can anybody try 4.0.2 and confirm the problem still appears?

comment:26 Changed 3 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

No response, closing.
