VirtualBox

Opened 14 years ago

Closed 13 years ago

#5260 closed defect (fixed)

page allocation failure ... vboxNetFltLinuxPacketHandler

Reported by: Tobias Oetiker
Owned by:
Component: network
Version: VirtualBox 3.0.8
Keywords:
Cc: tobi@…
Guest type: Windows
Host type: Linux

Description

I am running VirtualBox 3.0.8 on Linux 2.6.31.4 with bridged Ethernet interfaces. I am seeing several "page allocation failure" warnings from the kernel every day, and vboxNetFltLinuxPacketHandler shows up in the call trace all the time. There is currently some discussion of "page allocation failures" on the kernel mailing list, but the bug is proving rather elusive. I wonder whether my case could somehow be related to the vboxnetflt driver?

I have put up a few of my traces on http://tobi.oetiker.ch/cluster-2009-10-20-08-31.txt
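A rough way to see how often these warnings occur, and which allocation orders are involved, is to scan the kernel log. The sketch below assumes each warning line contains the literal text "page allocation failure" together with an "order:N" field, as in the messages quoted in this ticket; the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: summarise "page allocation failure" warnings read
# from standard input (for example the output of dmesg).
# Assumption: each warning line contains "page allocation failure" and an
# "order:N" field on the same line.
count_alloc_failures() {
    grep 'page allocation failure' |
        sed -n 's/.*order:\([0-9][0-9]*\).*/order:\1/p' |
        sort | uniq -c
}

# Usage: dmesg | count_alloc_failures
```

The output is one line per allocation order, prefixed with the number of warnings seen for that order.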

Attachments (5)

vboxNetFltLinuxPacketHandler.crash (122.4 KB ) - added by bijwaard 14 years ago.
multiple page allocation failures with vboxNetFltLinuxPacketHandler
vbox-3.1.4.dmesg (122.3 KB ) - added by bijwaard 14 years ago.
recent dmesg output of multiple page allocation errors in ubuntu 9.10 with vbox-3.1.4
config-2.6.31-20-generic (108.8 KB ) - added by bijwaard 14 years ago.
Ubuntu 9.10 kernel configuration
config-2.6.32.8-vboxhost (95.7 KB ) - added by Tobias Oetiker 14 years ago.
the kernel config as requested
diff_vboxnetflt_linux (2.2 KB ) - added by Frank Mehnert 14 years ago.
Patch for /usr/src/vboxnetflt-/

Change History (31)

comment:1 by Tobias Oetiker, 14 years ago

We just found the same failures on Ubuntu Jaunty.

comment:2 by bijwaard, 14 years ago

I'm running VirtualBox 3.1.2 on Linux and see the same sort of page allocation failures, reported from other programs (both our home-brewed ASF.exe and normal programs like firefox, thunderbird, vino-server) with vboxNetFltLinuxPacketHandler in the call trace. However, it is with another process (vino-server in my case), and I have quite some TCP load between the Linux host and the MS Windows guest, plus some load from a Perl script within the guest. I'll attach my trace next.

My ms-windows VM is configured with 384 MB internal memory and without VT-x and nested paging on (but greyed out). According to top, VirtualBox claims 750m virtual (output from top is also in the trace to give an idea of the load in my system).

When such a page allocation failure occurs, sometimes several in a row, a Linux window (probably one of the programs reported above) freezes for more than 10 seconds.

by bijwaard, 14 years ago

multiple page allocation failures with vboxNetFltLinuxPacketHandler

comment:3 by bijwaard, 14 years ago

By the way, I'm running ubuntu 9.04. If any additional info is needed, please let me know.

comment:4 by bijwaard, 14 years ago

I have updated to Ubuntu 9.10 and still experience the same problem; it has even become worse. Ubuntu is sometimes almost unusable: I currently get these page allocation errors every minute or so, and multiple applications freeze each time.

comment:5 by Frank Mehnert, 14 years ago

Which VBox version are you currently using?

comment:6 by bijwaard, 14 years ago

I am currently using VirtualBox 3.1.4. To get a working system again, I just disabled networking and removed the vbox modules vboxnetflt and vboxnetadp. My system is now much more responsive and I have not seen the page allocation error for a few hours, but I still experience application freezes.

by bijwaard, 14 years ago

Attachment: vbox-3.1.4.dmesg added

recent dmesg output of multiple page allocation errors in ubuntu 9.10 with vbox-3.1.4

comment:7 by Tobias Oetiker, 14 years ago

We see this problem on one of our systems, a 'Dual Core AMD Opteron(tm) Processor 265' with 16 GB RAM (the CPU has no virtualization capability). Our errors are 'order:5'. We are running 2.6.32.8 with vbox 3.1.4.

I have seen mention on the LKML of other Linux network drivers causing similar problems. It seems to be triggered by some changes in the way the kernel allocates memory ...

comment:8 by Frank Mehnert, 14 years ago

Please could you append the configuration of your host kernel to this ticket?

by bijwaard, 14 years ago

Attachment: config-2.6.31-20-generic added

Ubuntu 9.10 kernel configuration

by Tobias Oetiker, 14 years ago

Attachment: config-2.6.32.8-vboxhost added

the kernel config as requested

comment:9 by bijwaard, 14 years ago

Some additional info about my system: I'm running a Dual Intel(R) Core(TM)2 Duo CPU E8400 @3GHz with 2 GB internal memory and 6 GB swap. I started the vbox network modules again today after upgrading to vbox-3.1.6; so far it is not misbehaving.

comment:10 by bijwaard, 14 years ago

It took some time, but the page allocation failures returned yesterday and are getting more frequent. The load on my system has not been very high since I re-started the vbox network modules again for my virtual machine 8 days ago.

So, it looks like this bug is triggered after multiple days of running the vbox network modules and a virtual machine.

comment:11 by Tobias Oetiker, 14 years ago

I have upgraded to 2.6.33.3 ... a day after reboot, the page allocation failures (with vbox 3.1.6) are back ... always order:5 ...
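For context, an "order:N" allocation asks the kernel for 2^N physically contiguous pages, so order:5 means 32 pages in one piece. A small sketch of the arithmetic, assuming 4 KiB pages unless a different page size is passed in (the function name is illustrative only):

```shell
#!/bin/sh
# Size in KiB of one contiguous allocation of the given order.
# Usage: order_to_kib ORDER [PAGE_SIZE_BYTES]
order_to_kib() {
    order=$1
    page_size=${2:-4096}   # assumption: 4 KiB pages, the usual x86 value
    echo $(( (1 << $order) * $page_size / 1024 ))
}

order_to_kib 5    # 32 pages of 4 KiB each, prints 128
```

To use the page size of the running host instead of the assumed default, call it as `order_to_kib 5 "$(getconf PAGESIZE)"`. Contiguous requests of this size tend to be harder to satisfy once memory has become fragmented, which would be consistent with the failures only appearing after some uptime.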

comment:12 by Tobias Oetiker, 14 years ago

I found a workaround ... it seems the problem only occurs when GSO is enabled on a tg3 network card ... with

 ethtool -K eth0 gso off

the problem goes away. On debian/ubuntu I put the following into /etc/network/if-up.d:

#!/bin/sh
ETHTOOL=/usr/sbin/ethtool
# do nothing if ethtool is not installed
if [ ! -x "$ETHTOOL" ]; then
    exit 0
fi

# vbox triggers page allocation failures when generic segmentation
# offloading (GSO) is on for a tg3 card, so switch it off
if "$ETHTOOL" -i "${IFACE}" | grep -q tg3; then
    echo "turn off gso on $IFACE"
    "$ETHTOOL" -K "${IFACE}" gso off
fi
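To confirm that the setting took effect, the offload state can be read back with `ethtool -k` (lowercase k). The sketch below parses that output; it assumes the relevant line is labelled either "generic-segmentation-offload:" or, in older ethtool releases, "generic segmentation offload:". The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k IFACE` output on standard input and
# print the state of generic segmentation offload ("on" or "off").
# Assumption: the label is spelled with either hyphens or spaces.
gso_state() {
    sed -n 's/^[[:space:]]*generic[- ]segmentation[- ]offload:[[:space:]]*\([a-z]*\).*/\1/p'
}

# Usage: /usr/sbin/ethtool -k eth0 | gso_state
```

If this prints "on" after the if-up.d script has run, the interface is probably not using a driver the script matches, and the grep pattern needs adjusting.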

comment:13 by Tobias Oetiker, 14 years ago

looking through logs I found the same problem on another box with e1000e driver ...

comment:14 by Frank Mehnert, 14 years ago

Ticket #6622 has been marked as duplicate of this ticket.

comment:15 by Oliver Leitner, 14 years ago

thanks for linking me here...

i am still testing with the gso trick...

i have another card though:

driver: 3c59x
version:
firmware-version:
bus-info: 0000:04:00.0

if the gso trick fixes this one too, ill let you all know.

by Frank Mehnert, 14 years ago

Attachment: diff_vboxnetflt_linux added

Patch for /usr/src/vboxnetflt-/

comment:16 by Frank Mehnert, 14 years ago

I've just attached a patch for a possible memory leak when GSO is enabled. The fix is for a rarely used error path, so it probably won't fix your problems, but you could try it anyway. Do the following as root on your host (first make sure that no VM is running):

cd /usr/src/vboxnetflt
patch -p0 < ~/diff_vboxnetflt_linux
/etc/init.d/vboxdrv setup

Would be interesting to know if this makes any difference for you.

comment:17 by David Harris, 14 years ago

Hey, looks like #5675 is also solved by this (the symptoms appear quite similar, and the ethtool trick worked).

Thanks,

Dave

comment:18 by David Harris, 14 years ago

Note that I don't use tg3. I use forcedeth.

comment:19 by Frank Mehnert, 14 years ago

Any chance to try the attached patch?

comment:20 by Oliver Leitner, 14 years ago

hello frank

the gso off seems to have fixed my problem. if it stays without a mem/cpu hog for another 5 days, i'll try your patch and see if this fixes it for good...

greetings Oliver

comment:21 by Frank Mehnert, 14 years ago

Thanks Oliver. Please make sure to enable GSO again when you test the patch.

comment:22 by Oliver Leitner, 14 years ago

sad news, the patch did not fix the problem.

but since there's a new virtualbox version out today, and a new linux kernel got packaged for ubuntu lucid lynx a few days ago, i'm giving that combination a try; maybe the bug fixed "itself" somewhere in between.

however, i have to agree that gso off fixes the thing.

comment:24 by Hervé Pellan, 13 years ago

Hi, I'm very interested in this thread, especially for the ethtool gso trick.

But... I'm a little bit confused here. Does the trick apply to the host OS or the guest OS?

The host Linux server on which the vboxNetFlt allocmem error occurs runs openSUSE 11.2 x86_64, kernel 2.6.31-12, with VBox 3.2.10-109.3.x86_64 running in headless mode. The guest system is Ubuntu 9.10 x86_64, kernel 2.6.31-20. The host system doesn't have much real memory: 2 GB, backed by 6 GB of swap space.

The host network card is a bonding of two Broadcom BCM5780 Gigabit. The guest network card is bridged on this bonding with the virtual Intel 1000e driver.

The host OS doesn't do much, but as a backup server it is very busy at night when the rsync script starts. It is mostly at this time that the vboxNetFlt allocmem error occurs. The rsynced files are on an XFS filesystem on top of a 3ware RAID6 volume.

The guest OS, by contrast, has continuous intensive network activity: access to shared files, file sharing itself, and the master network service for a computing dispatcher.

By the way, I noticed that when the guest OS starts, it does something (wrong?) to the bond: one of the network cards goes into promiscuous mode, as if the virtual card absolutely insists on binding to a physical card.

I first thought that the vboxNetFlt allocmem error occurs because of the particular case of bridging over bonding, but I have another (pretty much identical) system with the same network config which doesn't cause any allocmem error.

Thanks for any clue.

comment:25 by Aleksey Ilyushin, 13 years ago

Can anybody try 4.0.2 and confirm the problem still appears?

comment:26 by Frank Mehnert, 13 years ago

Resolution: fixed
Status: new → closed

No response, closing.
