#11171 closed defect (fixed)

repeating 'page allocation failure' error in debian squeeze 64bit => Fixed in SVN

Reported by:	Marco Gaiarin	Owned by:
Component:	other	Version:	VirtualBox 4.2.4
Keywords:	page allocation failure debian squeeze	Cc:
Guest type:	all	Host type:	Linux

Description

I've upgraded a set of servers (all HP ProLiant ML350 machines of various kinds, G6 or G7) from Debian lenny (using the official Debian VirtualBox package) to Debian squeeze (using VirtualBox 4.2 provided from the repository).

On all servers I get a series (20-30 per day) of 'page allocation failure' errors. The system seems to work fine, the guest OS runs and there are no other errors, but I'm pretty sure that the trouble comes from VirtualBox, because:

other similar servers, upgraded to squeeze but without VirtualBox, show no errors like that at all
if I stop the guest and unload the module, the errors disappear completely.

Some more scattered info:

one of these servers, with the same hardware, has VirtualBox 4.1 (4.1.22-80657~Debian~squeeze), and there the 'page allocation failure' happens for some hours or days after a reboot, then disappears.
as said, all the servers run kernel 2.6.32-5-amd64
I've seen ticket #5260 and I have a Marvell Ethernet adapter, but doing 'ethtool -K eth0 gso off' does not solve the trouble.
the errors are surely related to system/network load: they happen only during working hours, when the server is busy.

I'm currently using VirtualBox 4.2 (4.2.4-81684~Debian~squeeze) with only one guest machine (WinXP), but as said, I think that does not matter at all. kern.log attached.

Thanks.

Attachments (7)

kern.log.bz2 (46.2 KB ) - added by Marco Gaiarin 12 years ago.: Kernel log
VBox.log.3 (83.6 KB ) - added by entilza 12 years ago.: Log of guest that has potential to trace page allocation
l.txt (28.6 KB ) - added by Marco Gaiarin 12 years ago.: Syslog excerpt of an error
VBox.log (79.1 KB ) - added by Marco Gaiarin 12 years ago.: Companion VBox.log
diff_gfp_nowarn (3.4 KB ) - added by Frank Mehnert 11 years ago.: Diff against VBox 4.3.6 host kernel drivers
vbox-page_alloc-err.txt (21.7 KB ) - added by 64bitten 10 years ago.: VirtualBox page allocation errors
diff_gfp_nowarn_2 (1.9 KB ) - added by Frank Mehnert 10 years ago.

Change History (34)

by Marco Gaiarin, 12 years ago

Attachment:	kern.log.bz2 added

Kernel log

comment:1 by Frank Mehnert, 12 years ago

So these page allocation failures come from the host, right? This could be a memory leak of the VM(s) on the host so monitoring the memory consumption of the host processes over a longer time might give a hint.

comment:2 by Marco Gaiarin, 12 years ago

Ahem, sorry, from the host, sure. I also forgot to say:

the servers are not short of RAM, for example:

neuromante:~# free
             total       used       free     shared    buffers     cached
Mem:       6118264    6019844      98420          0       7024    3610848
-/+ buffers/cache:    2401972    3716292
Swap:      8000328     431544    7568784

And I'm using SYSSTAT to monitor RAM consumption, and it seems normal.
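
As a rough cross-check of the figures above: in this older `free` format, the memory that is really available is free + buffers + cached, which is what the '-/+ buffers/cache' row reports. A minimal Python sketch, assuming the column layout shown above (the function name is made up for illustration):

```python
def reclaimable_free_kb(free_output: str) -> int:
    """Return free + buffers + cached (kB) from old-style `free` output."""
    for line in free_output.splitlines():
        if line.startswith("Mem:"):
            # Columns after the label: total used free shared buffers cached
            total, used, free, shared, buffers, cached = map(int, line.split()[1:7])
            return free + buffers + cached
    raise ValueError("no 'Mem:' row found")


# The output quoted in this comment.
sample = """\
             total       used       free     shared    buffers     cached
Mem:       6118264    6019844      98420          0       7024    3610848
-/+ buffers/cache:    2401972    3716292
Swap:      8000328     431544    7568784
"""

print(reclaimable_free_kb(sample))  # 3716292, matching the -/+ buffers/cache row
```

So roughly 3.5 GiB of the 6 GB host is cache that the kernel can reclaim, which fits the statement that the servers are not short of RAM.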

comment:3 by Marco Gaiarin, 12 years ago

I've reverted to 4.1.22-80657~Debian~squeeze, as on the other server, but the trouble remains.

But a Google search led me to:

http://www.linuxsmiths.com/blog/?p=527
http://lime-technology.com/forum/index.php?topic=23222.0

I had /proc/sys/vm/min_free_kbytes at 9951; now I've set it to 16384, and I'll try to increase it until (I hope) this trouble disappears...

comment:4 by Marco Gaiarin, 12 years ago

OK, it seems that I really have to expand the RAM of my host machine.

If I increase /proc/sys/vm/min_free_kbytes too much, this trouble disappears but the system tends to thrash (load sky high...).
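
For anyone repeating this tuning, here is a small Python sketch that reads the current value and relates it to total RAM. The /proc paths are the standard Linux ones; the helper names are hypothetical. Writing a new value needs root (for example `sysctl -w vm.min_free_kbytes=16384`), so the sketch only reads.

```python
from pathlib import Path


def mem_total_kb(meminfo_text: str) -> int:
    """Extract MemTotal (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def reserve_percent(min_free_kb: int, total_kb: int) -> float:
    """Share of RAM the kernel tries to keep free, in percent."""
    return 100.0 * min_free_kb / total_kb


def report_current() -> str:
    """Read the live values from /proc (Linux only)."""
    current = int(Path("/proc/sys/vm/min_free_kbytes").read_text())
    total = mem_total_kb(Path("/proc/meminfo").read_text())
    return f"min_free_kbytes={current} ({reserve_percent(current, total):.2f}% of RAM)"


# Values from this ticket: 16384 kB reserved on a host with 6118264 kB of RAM.
print(f"{reserve_percent(16384, 6118264):.2f}%")
```

Calling `report_current()` on the host prints the live figure, which makes it easier to compare settings between servers with different amounts of RAM.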

comment:5 by entilza, 12 years ago

I've had the same issue:

I am running Ubuntu 10.04 64-bit with VirtualBox 4.2.4.

I am using a bonded e1000e adapter on the Server with bridged networking on VM guests.

While using Virtualbox I am getting page allocation errors:

Each entry has [vboxnetflt] as part of the cause.

The server has 16GB of memory, and available memory is fine. The guest OS is using just 256MB of RAM, which is not much.

The system seems to work fine regardless of these messages. I did once see a remote desktop session terminate when this happened. rsync transfers work fine even though they sometimes cause this error.

I increased min_free_kbytes to 512MB!! and still get these errors.

I've seen the following processes in the log:

VBoxHeadless: page allocation failure. order:4, mode:0x4020
smbd: page allocation failure. order:4, mode:0x4020
kworker/0:1: page allocation failure. order:4, mode:0x4020
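
To read these lines: 'order:N' means the kernel was asked for 2^N physically contiguous pages. A small Python sketch that decodes one such line, assuming 4 KiB base pages (the usual size on x86-64); the function name is made up for illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as on x86-64

FAILURE_RE = re.compile(
    r"^(?P<comm>.+?): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def decode_failure(line: str) -> dict:
    """Parse one 'page allocation failure' line and size the request."""
    m = FAILURE_RE.match(line)
    if m is None:
        raise ValueError("not a page allocation failure line")
    order = int(m.group("order"))
    pages = 1 << order  # order N means 2**N contiguous pages
    return {
        "comm": m.group("comm"),
        "order": order,
        "pages": pages,
        "bytes": pages * PAGE_SIZE,
        "mode": int(m.group("mode"), 16),
    }


info = decode_failure("VBoxHeadless: page allocation failure. order:4, mode:0x4020")
print(info["pages"], info["bytes"] // 1024)  # 16 pages, 64 KiB
```

An order-4 failure therefore means that no free run of 16 contiguous pages (64 KiB) was found at that moment, which can happen through fragmentation even when plenty of memory is free in total.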

Log snippets:

<IRQ>  [<ffffffff810fe6ac>] __alloc_pages_nodemask+0x6bc/0x830
 vboxNetFltLinuxPacketHandler+0xa8/0x610 [vboxnetflt]
 vboxNetFltLinuxPacketHandler+0xa8/0x610 [vboxnetflt]
 vboxNetFltLinuxPacketHandler+0xa8/0x610 [vboxnetflt]
 vboxNetFltLinuxStartXmitFilter+0x100/0x230 [vboxnetflt]

note: Jan 18, 2013: I am using intel e1000e driver 1.5.1 - I will test updating this tonight to see if that's part of the problem.

comment:6 by entilza, 12 years ago

Update: Jan 19, 2013:

I updated e1000e drivers to 2.1.4
Updated VirtualBox to 4.2.6

SAME problem. I am able to reproduce it with high network activity, i.e. FTPing a 20 GB file from the host to the backup server.

I set /proc/sys/vm/min_free_kbytes to about 1 GB (1000000) to see if this helps.

Notes:

I am wondering if this is an issue with Ethernet bonding and VirtualBox. I have been using round-robin mode (mode 0). Yesterday I tried balance-alb (mode 6) and had massive packet loss, so I had to revert to round-robin (mode 0).

From what I see, when you have a bonded connection, it switches the controllers to "promiscuous" mode. In round-robin mode it does this to both (eth0, eth1). When I used balance-alb it did it to only one, as if VirtualBox connected to only one Ethernet controller, which would explain the packet loss.

I am also noticing "Unknown Protocols" statistics being reported by the Windows guests with "netstat -e".

My next test will be to move the guests to another computer and reduce min_free_kbytes and see if I can reproduce any more allocation errors from the host with no virtualbox running.

Afterwards I will test with just 1 ethernet and no bonding.

comment:7 by entilza, 12 years ago

Update: Jan 20, 2013:

I created a VM guest on our backup machine (different architecture).

The only thing I was curious about here was the result of "netstat -e" and the "Unknown Protocols".

They also appeared here. I then checked a physical Windows XP computer on our network, and it showed the same Unknown Protocols. So they appear to be unrelated. These Unknown Protocols seem to be common to Windows 2000 and Windows XP, at least on our network.

I got sucked into this by some strange /var/log/messages entries that appear when I shut down the guest:

vboxnetflt: dropped 296483 out of 34021461 packets

Not sure how these develop yet...

My next test will be to move the VM from the production server to the guest, reduce min_free_kbytes then see the results.

However, for now I will wait a couple of days and see if my 1 GIG > /proc/sys/vm/min_free_kbytes solves the production server alloc errors.

comment:8 by Marco Gaiarin, 12 years ago

After my bug report, I've tried different VirtualBox versions (4.1 and 4.2) on different hardware (always the same OS, Debian squeeze) and played with different values of '/proc/sys/vm/min_free_kbytes'.

It seems to me that:

the page allocation failure errors appear more frequently when the host is under pressure (high load)
playing with min_free_kbytes seems to help, but raising min_free_kbytes too much leads to more frequent load spikes (probably the scheduler has to use more swap, and so the load rises)

So I've set, by trial and error, an optimal min_free_kbytes value for every server, and now the error happens rarely.

But indeed I've not resolved it, nor have I understood what really happens.

comment:9 by Frank Mehnert, 12 years ago

I would like to see a VBox.log file of an active VM session when this page allocation error happens.

The 'vboxnetflt' message is only informational and does not indicate any error condition.

by entilza, 12 years ago

Attachment:	VBox.log.3 added

Log of guest that has potential to trace page allocation

by Marco Gaiarin, 12 years ago

Attachment:	l.txt added

Syslog excerpt of an error

by Marco Gaiarin, 12 years ago

Attachment:	VBox.log added

Companion VBox.log

comment:10 by entilza, 12 years ago

The VBox.log.3 attachment begins Jan-10,2013.

So far the increase to min_free_kbytes is holding, but it's still too early to say, at least on my end.

All the applications on our server use very little memory; the guests use the most memory by far.

Since I have 16GB on the server, setting min_free_kbytes to 1GB doesn't cause an issue. I am just surprised I need to do this, as I now have 1 GB of unused memory :)

             total       used       free     shared    buffers     cached
Mem:      16329376   14914340    1415036          0    4743608    4951692
-/+ buffers/cache:    5219040   11110336
Swap:      9764860      33500    9731360

comment:11 by entilza, 12 years ago

Gaio, I looked at your log, I assume you are not using network bonding because it says eth0. So it may be safe to assume this has nothing to do with network bonding.

comment:12 by Marco Gaiarin, 12 years ago

I can confirm: I don't use bonding on any of my installations.

comment:13 by entilza, 12 years ago

Jan 22, 2013:

Update: Had (4) full days with 0 allocation errors.

Again, increasing min_free_kbytes was the only change. However, I made it extremely large (1 GB).

Is this something to do with our hardware or kernel version? My kernel is non-standard for 10.04, but I believe Gaio's is a standard kernel.

This is very reproducible by reducing min_free_kbytes. Does this not give any hints as to why this occurs?

comment:14 by entilza, 12 years ago

If anyone has experienced this error please post your kernel version:

Currently the 2 kernels in our example are:

2.6.39.3 (Ubuntu - This is a non standard kernel I used when I setup the server)

2.6.32-5 (Debian)

comment:15 by entilza, 11 years ago

OK, after about a week I got 3 new page allocation errors, although a bit different, as the network driver [e1000e] now shows up in the trace along with [vboxnetflt].

All 3 occurred early in the morning, when there's almost no system load.

This could be totally unrelated to the min_free_kbytes fix, though, since it's now showing my network driver. So I am looking into that 'gso' setting for the e1000e to see if this particular issue is related to it.

/var/log/messages (cut out extra lines)

kworker/0:0: page allocation failure. order:4, mode:0x4020
Pid: 0, comm: kworker/0:0 Not tainted 2.6.39.3-artcraft2 #1
Call Trace:
<IRQ>  [<ffffffff810fe6ac>] __alloc_pages_nodemask+0x6bc/0x830
.
.
[<ffffffffa00a8614>] vboxNetFltLinuxPacketHandler+0xb4/0x610 [vboxnetflt]
.
.
[<ffffffffa015f34e>] e1000_receive_skb+0x5e/0x80 [e1000e]
[<ffffffffa0162659>] e1000_clean_rx_irq+0x289/0x460 [e1000e]
[<ffffffffa0169c3d>] e1000e_poll+0xbd/0x380 [e1000e]
<EOI>  [<ffffffff8133123f>] ? acpi_idle_enter_bm+0x269/0x2a1
[<ffffffff81331238>] ? acpi_idle_enter_bm+0x262/0x2a1
[<ffffffff814804d0>] cpuidle_idle_call+0xc0/0x240
[<ffffffff8100b077>] cpu_idle+0xb7/0x110
[<ffffffff8159b73d>] start_secondary+0x1dc/0x1e3

Update: With GSO off on the e1000e, I set min_free_kbytes back to a normal low value and began to receive page allocation errors again, the same as originally reported, containing [vboxnetflt] only.

So I increased min_free_kbytes again...

comment:16 by entilza, 11 years ago

Just an update after a month:

I have had 0 page allocation errors after continuing with the initial fix of increasing min_free_kbytes.

I set it to 768MB (down from 1 GB).

So the suggestion to tune min_free_kbytes for your system is essentially the trick right now, until the developers can determine why these allocations fail when min_free_kbytes is set to its regular, lower default value.

PS. I noticed both examples here have the E3-12xx series Xeon processors.. Any relevance?

comment:17 by Jan Klamta, 11 years ago

Hello, thanks for all the comments, they were useful to me.
It looks like disabling nested paging solved the 'page allocation errors' on my VirtualBox.
guest OS: 3xWinXP and 2x Debian

AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
MemTotal:        3094112 kB

kernel: 2.6.32-5-amd64
OS: Debian squeeze
Oracle VM VirtualBox Manager 4.2.6

Example of typical configuration of WinXP guest:
Memory size:     512MB
Page Fusion:     off
VRAM size:       32MB
CPU exec cap:    100%
HPET:            off
Chipset:         piix3
Firmware:        BIOS
Number of CPUs:  1
Synthetic Cpu:   off
CPUID overrides: None
... boot options ...
ACPI:            on
IOAPIC:          on
PAE:             on
Time offset:     0ms
RTC:             UTC
Hardw. virt.ext: off
Hardw. virt.ext exclusive: off
Nested Paging:   off
Large Pages:     on
VT-x VPID:       on
State:           running (since 2014-01-13T11:56:29.126000000)
Monitor count:   1
3D Acceleration: off
2D Video Acceleration: on
...

Maybe this post could help someone. ;)

Now I will try changing the piix3 chipset (I chose this one as the most stable option, but the nested paging option was enabled there...), and look into why AMD-V is still not working even though it is enabled in the BIOS.

btw: I never changed the min_free_kbytes option...

follow-ups: 22 27 comment:18 by Frank Mehnert, 11 years ago

Disabling nested paging makes no sense. The reason for the page allocation warnings of the Linux kernel in most cases is the usage of large pages -- see user manual section 8.8.1, VBoxManage modifyvm VM_NAME --largepages. If this option is enabled AND nested paging is used, VBox tries to allocate chunks of 512 contiguous pages which are then used as large pages to improve the performance (faster page table lookup). If a set of 512 contiguous pages cannot be allocated (i.e. memory too fragmented), VBox falls back to allocate 512 non-contiguous pages. This condition is not fatal, therefore a warning in the kernel does not make sense here. I will attach a patch against VBox 4.3.6 which disables these warnings.

If you don't want to apply this patch to the VBox kernel driver, disabling 'large pages' for your VM should be sufficient. Disabling nested paging prevents the warning as well but degrades your guest performance much more.

If you get such kernel warnings for allocations other than large pages, it means that your host is really low on memory.
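
The '512 contiguous pages' above correspond to an order-9 allocation, i.e. 2 MiB with 4 KiB pages. Whether such blocks are currently free can be estimated from /proc/buddyinfo, where each column counts the free blocks of one order, starting at order 0. A hedged Python sketch: the sample row is invented for illustration, and the helper name is hypothetical.

```python
LARGE_PAGE_ORDER = 9  # 2**9 = 512 pages of 4 KiB = 2 MiB


def free_blocks_at_least(buddyinfo_line: str, order: int) -> int:
    """Count free blocks of the given order or higher in one buddyinfo row."""
    # Row format: "Node 0, zone   Normal  c0 c1 c2 ..." (c_i = free blocks of order i)
    fields = buddyinfo_line.split()
    counts = [int(x) for x in fields[4:]]
    # A higher-order block can be split, so it also satisfies the request.
    return sum(counts[order:])


# Invented sample row for a fragmented zone: many small blocks, no large ones.
sample = "Node 0, zone   Normal   4210   1893    512     97     12      3      0      0      0      0      0"

print(2 ** LARGE_PAGE_ORDER * 4096 // (1024 * 1024))  # 2 (MiB per large page)
print(free_blocks_at_least(sample, LARGE_PAGE_ORDER))  # 0 free blocks of order 9 or higher
```

With a row like the sample, a request for 512 contiguous pages would most likely fail and trigger the fallback to non-contiguous pages described above, even though thousands of single pages are free.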

by Frank Mehnert, 11 years ago

Attachment:	diff_gfp_nowarn added

Diff against VBox 4.3.6 host kernel drivers

comment:19 by Frank Mehnert, 11 years ago

Summary:	repeating 'page allocation failure' error in debian squeeze 64bit → repeating 'page allocation failure' error in debian squeeze 64bit => Fixed in SVN

comment:20 by Harri, 11 years ago

Maybe I am too blind to see, but I haven't found a checkbox in the GUI to turn off largepages. Do you think a checkbox could be added to Settings --> System --> Acceleration ?

comment:21 by Frank Mehnert, 11 years ago

You are right, there is no checkbox in the GUI. Please use 'VBoxManage modifyvm VM_NAME --largepages off' to disable large pages.
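
For hosts with several VMs, the same command can be scripted. A minimal Python sketch that builds exactly the command line quoted above; the VM names are placeholders. Passing `run=True` assumes that VBoxManage is on the PATH and, as a further assumption, that the VM is powered off, since modifyvm normally does not change a running VM.

```python
import subprocess
from typing import List


def largepages_cmd(vm_name: str, enabled: bool) -> List[str]:
    """Build the VBoxManage call that switches large pages on or off."""
    return ["VBoxManage", "modifyvm", vm_name, "--largepages", "on" if enabled else "off"]


def set_largepages(vm_name: str, enabled: bool, run: bool = False) -> List[str]:
    """Return the command; execute it only when run=True."""
    cmd = largepages_cmd(vm_name, enabled)
    if run:
        # Raises CalledProcessError if VBoxManage reports a failure.
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: only print what would be executed for two placeholder VM names.
for name in ["winxp-1", "debian-1"]:
    print(" ".join(set_largepages(name, enabled=False)))
```

The dry run is the default on purpose, so the commands can be reviewed before anything is changed.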

in reply to: 18 comment:22 by Jan Klamta, 11 years ago

Replying to frank:

Disabling nested paging makes no sense. ... ... ...

Thanks a lot for your explanation! I will try it today: disable large pages and enable nested paging. But it is interesting to me that disabling only nested paging solved the problem of my guests crashing...

comment:23 by Frank Mehnert, 11 years ago

I did not see any report about guest crashes in this ticket. If your guest crashes and only disabling nested paging prevents these crashes, then this is a different problem.

comment:24 by Frank Mehnert, 10 years ago

Resolution:	→ fixed
Status:	new → closed

Fix is part of VBox 4.3.8.

by 64bitten, 10 years ago

Attachment:	vbox-page_alloc-err.txt added

VirtualBox page allocation errors

comment:25 by 64bitten, 10 years ago

Must have missed a __GFP_NOWARN somewhere as I still see page allocation warnings (attached as vbox-page_alloc-err.txt). Running Virtualbox 4.3.14 on Linux kernel 3.10.17.

by Frank Mehnert, 10 years ago

Attachment:	diff_gfp_nowarn_2 added

comment:26 by Frank Mehnert, 10 years ago

64bitten, thanks for the report. Could you apply the diff_gfp_nowarn_2 patch I just attached and check if these warnings are now finally gone? Thank you!
