VirtualBox

Ticket #11171 (closed defect: fixed)

Opened 18 months ago

Last modified 8 weeks ago

repeating 'page allocation failure' error in debian squeeze 64bit => Fixed in SVN

Reported by: gaio
Owned by:
Priority: major
Component: other
Version: VirtualBox 4.2.4
Keywords: page allocation failure debian squeeze
Cc:
Guest type: all
Host type: Linux

Description

I've upgraded a set of servers (all different HP ProLiant ML350 models, G6 or G7) from Debian lenny (using the official Debian VirtualBox package) to Debian squeeze (using VirtualBox 4.2 from the repository).

On all servers I get a series (20-30 per day) of 'page allocation errors'. The system seems to work fine, the guest OS runs and there is no other error, but I'm pretty sure the trouble comes from VirtualBox, because:

  • other similar servers, upgraded to squeeze but without VirtualBox, do not show a single error like that
  • if I stop the guest and unload the module, the errors disappear completely.

Some more scattered info:

  • one of these servers, with the same hardware, has VirtualBox 4.1 (4.1.22-80657~Debian~squeeze); there the 'page allocation failure' errors happen for some hours or days after a reboot, then disappear.
  • as said, all the servers run kernel 2.6.32-5-amd64
  • I've seen ticket #5260; I have a Marvell Ethernet adapter, but running 'ethtool -K eth0 gso off' does not solve the problem.
  • the errors are surely related to system/network load: they happen only during working hours, when the server is busy.

I'm currently using VirtualBox 4.2 (4.2.4-81684~Debian~squeeze) with only one guest machine (WinXP), but as said I think that does not matter at all. kern.log attached.

Thanks.

Attachments

  • kern.log.bz2 (46.2 KB) - added by gaio 18 months ago: Kernel log
  • VBox.log.3 (83.6 KB) - added by entilza 15 months ago: Log of guest that has potential to trace page allocation
  • l.txt (28.6 KB) - added by gaio 15 months ago: Syslog excerpt of an error
  • VBox.log (79.1 KB) - added by gaio 15 months ago: Companion VBox.log
  • diff_gfp_nowarn (3.4 KB) - added by frank 3 months ago: Diff against VBox 4.3.6 host kernel drivers

Change History

Changed 18 months ago by gaio

Kernel log

comment:1 Changed 18 months ago by frank

So these page allocation failures come from the host, right? This could be a memory leak of the VM(s) on the host so monitoring the memory consumption of the host processes over a longer time might give a hint.
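
One way to do the monitoring suggested here is to sample the resident memory of the VirtualBox processes at intervals and look for steady growth. The following is only a minimal sketch: it assumes the VMs run as VBoxHeadless processes, and the function name sum_rss_kib and the log path are made up for this illustration.

```shell
# sum_rss_kib: add up RSS values (KiB, one per line) read from stdin.
sum_rss_kib() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Usage on the host (not run here); logs one sample per minute:
#   while sleep 60; do
#       echo "$(date +%FT%T) $(ps -C VBoxHeadless -o rss= | sum_rss_kib)"
#   done >> /tmp/vbox-rss.log
```

A total that keeps climbing over hours or days while the guests are idle would point towards a leak; a flat total would not.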

comment:2 Changed 18 months ago by gaio

Ahem, sorry, from the host, sure. I also forgot to say:

  • servers are not short in RAM, for example:
    neuromante:~# free
                 total       used       free     shared    buffers     cached
    Mem:       6118264    6019844      98420          0       7024    3610848
    -/+ buffers/cache:    2401972    3716292
    Swap:      8000328     431544    7568784
    

And I'm using sysstat to monitor RAM consumption, and it seems normal.
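
For readers unfamiliar with this free layout: the "-/+ buffers/cache" row is derived from the "Mem:" row by moving buffers and page cache from the "used" side to the "free" side. A small check with the numbers from the output above (the variable names are only illustrative):

```shell
# Values in kB from the "Mem:" row of the free output above.
used_kb=6019844; free_kb=98420; buffers_kb=7024; cached_kb=3610848

# Memory used by applications, excluding buffers and page cache.
app_used=$(( used_kb - buffers_kb - cached_kb ))
# Memory that is free or reclaimable from buffers and page cache.
reclaimable=$(( free_kb + buffers_kb + cached_kb ))

echo "used by applications: ${app_used} kB"    # 2401972, as in the second row
echo "free or reclaimable:  ${reclaimable} kB" # 3716292, as in the second row
```

So although only about 96 MB is strictly free, roughly 3.5 GB is held in cache and could in principle be reclaimed.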

comment:3 Changed 17 months ago by gaio

I've reverted to 4.1.22-80657~Debian~squeeze, as on the other server, but the trouble remains.

But a Google search led me to:

 http://www.linuxsmiths.com/blog/?p=527
 http://lime-technology.com/forum/index.php?topic=23222.0

/proc/sys/vm/min_free_kbytes was set to 9951; I have now set it to 16384, and I'll try to increase it until (I hope) this trouble disappears...
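
The tuning described here can be sketched as follows. The helper names kb_to_mib and show_min_free are hypothetical, and the optional path argument exists only so the function can be tried against a test file; the value 16384 is the one mentioned above, not a recommendation.

```shell
# kb_to_mib: convert a kB value (as used by min_free_kbytes) to whole MiB.
kb_to_mib() {
    echo $(( $1 / 1024 ))
}

# show_min_free: print the current reserve in kB and MiB.
show_min_free() {
    file=${1:-/proc/sys/vm/min_free_kbytes}
    kb=$(cat "$file")
    echo "min_free_kbytes=${kb} ($(kb_to_mib "$kb") MiB)"
}

# Usage on the host (as root, not run here):
#   show_min_free
#   sysctl -w vm.min_free_kbytes=16384                      # takes effect immediately
#   echo 'vm.min_free_kbytes = 16384' >> /etc/sysctl.conf   # keep it across reboots
```

Changing the value in small steps and watching the load after each step makes it easier to back out if the host starts to swap.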

comment:4 Changed 17 months ago by gaio

OK, it seems that I really have to expand the RAM of my host machine.

If I increase /proc/sys/vm/min_free_kbytes too much, this trouble disappears but the system tends to thrash (load sky high...).

comment:5 Changed 15 months ago by entilza

I've had the same issue:

I am running Ubuntu 10.04 -64bit, with virtualbox 4.2.4.

I am using a bonded e1000e adapter on the Server with bridged networking on VM guests.

While using Virtualbox I am getting page allocation errors:

Each entry has [vboxnetflt] as part of the cause.

The server has 16GB memory, and available memory is fine. The guest OS is just using 256MB of ram not much.

The system seems to work fine regardless of these messages. I did see one time a remote desktop terminated the connection when this happened. rsync transfers work fine even though they cause this error sometime.

I increased min_free_kbytes to 512MB!! and still get these errors

I've seen the following processes in the log:

VBoxHeadless: page allocation failure. order:4, mode:0x4020
smbd: page allocation failure. order:4, mode:0x4020
kworker/0:1: page allocation failure. order:4, mode:0x4020

Log snippets:

<IRQ>  [<ffffffff810fe6ac>] __alloc_pages_nodemask+0x6bc/0x830
 vboxNetFltLinuxPacketHandler+0xa8/0x610 [vboxnetflt]
 vboxNetFltLinuxPacketHandler+0xa8/0x610 [vboxnetflt]
 vboxNetFltLinuxPacketHandler+0xa8/0x610 [vboxnetflt]
 vboxNetFltLinuxStartXmitFilter+0x100/0x230 [vboxnetflt]
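
For reference, the order and mode fields in these messages can be decoded by hand. An order-n request asks for 2^n physically contiguous pages, so order:4 is 16 pages, or 64 KiB with 4 KiB pages. The sketch below does that arithmetic and names a few bits of the mode mask. The bit values are an assumption taken from include/linux/gfp.h as it looked in 2.6.3x kernels (they differ in later kernels), and the function name decode_alloc_failure is made up for this illustration.

```shell
# decode_alloc_failure: read kernel log lines on stdin and, for each
# "page allocation failure" line, print the size of the failed request.
# Assumes 4 KiB pages (the usual x86-64 base page size).
decode_alloc_failure() {
    sed -n 's/.*page allocation failure\. order:\([0-9]*\), mode:\(0x[0-9a-fA-F]*\).*/\1 \2/p' |
    while read -r order mode; do
        pages=$(( 1 << $order ))
        kib=$(( pages * 4 ))
        flags=""
        # Bit values assumed from include/linux/gfp.h of 2.6.3x kernels.
        [ $(( $mode & 0x10 )) -ne 0 ]   && flags="$flags __GFP_WAIT"
        [ $(( $mode & 0x20 )) -ne 0 ]   && flags="$flags __GFP_HIGH"
        [ $(( $mode & 0x200 )) -ne 0 ]  && flags="$flags __GFP_NOWARN"
        [ $(( $mode & 0x4000 )) -ne 0 ] && flags="$flags __GFP_COMP"
        echo "order=$order pages=$pages contiguous_kib=$kib flags:$flags"
    done
}

# Usage (not run here):
#   grep 'page allocation failure' /var/log/messages | decode_alloc_failure
```

Fed the VBoxHeadless line above, this prints `order=4 pages=16 contiguous_kib=64 flags: __GFP_HIGH __GFP_COMP`. On that reading the mask has no __GFP_WAIT bit, i.e. the allocation may not sleep to reclaim memory, which would fit the <IRQ> context shown in the trace.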


note: Jan 18, 2013: I am using intel e1000e driver 1.5.1 - I will test updating this tonight to see if that's part of the problem.

Last edited 15 months ago by entilza

comment:6 Changed 15 months ago by entilza

Update: Jan 19, 2013:

  • I updated e1000e drivers to 2.1.4
  • Updated VirtualBox to 4.2.6

SAME problem. I am able to reproduce it with high network activity, e.g. FTPing a 20 GB file from the host to the backup server.

I set /proc/sys/vm/min_free_kbytes to 1000000 (about 1 GB) to see if this helps.

Notes:

I am wondering if this is an issue with Ethernet bonding and VirtualBox. I have been using round robin mode (mode 0). Yesterday I tried balance-alb (mode 6) and had massive packet loss, so I had to revert to round robin (mode 0).

From what I see, when you have a bonded connection it switches the controllers to "promiscuous" mode. In round robin it does this for both (eth0, eth1). When I used balance-alb it did it for only one, as if VirtualBox only connected to one Ethernet controller, and that would explain the packet loss.

I am also noticing "Unknown Protocols" statistics being reported by the Windows guests with "netstat -e".

My next test will be to move the guests to another computer and reduce min_free_kbytes and see if I can reproduce any more allocation errors from the host with no virtualbox running.

Afterwards I will test with just 1 ethernet and no bonding.

Last edited 15 months ago by entilza

comment:7 Changed 15 months ago by entilza

Update: Jan 20, 2013:

I created a VM guest on our backup machine (different architecture)

The only thing I was curious here was the results of "netstat -e" and the "Unknown Protocols"

It also appeared here. I then checked a physical Windows XP computer on our network and it also showed the same Unknown Protocols, so it appears not to be related. These Unknown Protocols seem to be common to Windows 2000 and Windows XP, at least on our network.

I got sucked into this by some strange /var/log/messages errors saying:

vboxnetflt: dropped 296483 out of 34021461 packets

when I shut down the guest. I'm not sure how these develop yet...

My next test will be to move the VM from the production server to the guest, reduce min_free_kbytes then see the results.

However for now I will wait a couple days and see if my 1 GIG > /proc/sys/vm/min_free_kbytes solves the production server alloc errors.

Last edited 15 months ago by entilza

comment:8 Changed 15 months ago by gaio

After my bug report, I tried different VirtualBox versions (4.1 and 4.2) on different hardware (but the same OS, Debian squeeze), and played with different values of '/proc/sys/vm/min_free_kbytes'.

It seems to me that:

  1. the page allocation failure error appears more frequently when the host is under pressure (high load)
  2. raising min_free_kbytes seems to help, but raising it too much leads to more frequent load spikes (probably the scheduler has to use more swap, and so the load rises)

So by trial and error I've set an optimal min_free_kbytes value for each server, and now the error happens rarely.

But I have not really resolved it, nor have I understood what really happens.
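
One possible way to look further: an order:4 request needs 16 physically contiguous free pages, so it can fail on a host with plenty of free memory if that memory is fragmented. /proc/buddyinfo lists, per memory zone, how many free blocks of each order (0, 1, 2, ...) are currently available. The sketch below counts the free blocks of at least a given order. The function name free_blocks_at_least is made up, and the idea that fragmentation is the cause here is an assumption, not something established in this ticket.

```shell
# free_blocks_at_least ORDER [FILE]: for each zone, print the number of
# free blocks whose order is >= ORDER. FILE defaults to /proc/buddyinfo;
# the argument exists so the function can be tried on saved output.
free_blocks_at_least() {
    min_order=$1
    file=${2:-/proc/buddyinfo}
    awk -v min="$min_order" '{
        n = 0
        # Field 4 is the zone name; field 5 holds the order-0 count,
        # field 6 the order-1 count, and so on.
        for (i = 5 + min; i <= NF; i++) n += $i
        print $4, n
    }' "$file"
}

# Usage on the host (not run here), ideally sampled right after a failure:
#   free_blocks_at_least 4
```

A zone that shows many low-order blocks but a count near zero here would be consistent with fragmentation, and a larger min_free_kbytes reserve may simply make it more likely that a suitable block is still free when an atomic request arrives.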

Last edited 15 months ago by gaio

comment:9 Changed 15 months ago by frank

I would like to see a VBox.log file of an active VM session when this page allocation error happens.

The 'vboxnetflt' message is only informational and does not show any error condition.

Changed 15 months ago by entilza

Log of guest that has potential to trace page allocation

Changed 15 months ago by gaio

Syslog excerpt of an error

Changed 15 months ago by gaio

Companion VBox.log

comment:10 Changed 15 months ago by entilza

The VBox.log.3 attachment begins Jan-10,2013.

So far the increase to min_free_kbytes is holding but it's still too early to say at least on my end.

All the applications on our server use very low memory, the guests use the highest memory by far.

Since I have 16GB on the server adding 1GB to min_free_kbytes doesn't cause an issue, just I am surprised I need to do this as I now have 1 Gig of unused memory :)

             total       used       free     shared    buffers     cached
Mem:      16329376   14914340    1415036          0    4743608    4951692
-/+ buffers/cache:    5219040   11110336
Swap:      9764860      33500    9731360

comment:11 Changed 15 months ago by entilza

Gaio, I looked at your log, I assume you are not using network bonding because it says eth0. So it may be safe to assume this has nothing to do with network bonding.

comment:12 Changed 15 months ago by gaio

I can confirm: I don't use bonding on any of my installations.

comment:13 Changed 15 months ago by entilza

Jan 22, 2013:

Update: Had (4) full days with 0 allocation errors.

Again, increasing min_free_kbytes was the only change. However, I made it extremely large (1 GB).

  • Is this something with our hardware, or kernel version? My kernel is non-standard for 10.04 but I believe Gaio's is a standard kernel.

This is very reproducible by reducing min_free_kbytes. Does this not give any hints as to why this occurs?

Last edited 15 months ago by entilza

comment:14 Changed 15 months ago by entilza

If anyone has experienced this error please post your kernel version:

Currently the 2 kernels in our example are:

2.6.39.3 (Ubuntu - This is a non standard kernel I used when I setup the server)

2.6.32-5 (Debian)

Last edited 15 months ago by entilza

comment:15 Changed 15 months ago by entilza

OK, after about a week I got 3 new page allocation errors. They are a bit different, however, as the network driver [e1000e] now appears in the trace along with [vboxnetflt].

All 3 occurred early in the morning, when there's almost no system load.

This could be totally unrelated to the min_free_kbytes fix, though, since it's now showing my network driver. I am looking into that 'gso' setting for the e1000e to see if this particular issue is related to it.

/var/log/messages (cut out extra lines)

kworker/0:0: page allocation failure. order:4, mode:0x4020
Pid: 0, comm: kworker/0:0 Not tainted 2.6.39.3-artcraft2 #1
Call Trace:
<IRQ>  [<ffffffff810fe6ac>] __alloc_pages_nodemask+0x6bc/0x830
.
.
[<ffffffffa00a8614>] vboxNetFltLinuxPacketHandler+0xb4/0x610 [vboxnetflt]
.
.
[<ffffffffa015f34e>] e1000_receive_skb+0x5e/0x80 [e1000e]
[<ffffffffa0162659>] e1000_clean_rx_irq+0x289/0x460 [e1000e]
[<ffffffffa0169c3d>] e1000e_poll+0xbd/0x380 [e1000e]
<EOI>  [<ffffffff8133123f>] ? acpi_idle_enter_bm+0x269/0x2a1
[<ffffffff81331238>] ? acpi_idle_enter_bm+0x262/0x2a1
[<ffffffff814804d0>] cpuidle_idle_call+0xc0/0x240
[<ffffffff8100b077>] cpu_idle+0xb7/0x110
[<ffffffff8159b73d>] start_secondary+0x1dc/0x1e3

Update: With the e1000e and GSO off I set min_free_kbytes back to a normal low value and I began to receive page allocation errors again. The same as originally reported containing [vboxnetflt] only.

So I increased min_free_kbytes again...

Last edited 15 months ago by entilza

comment:16 Changed 14 months ago by entilza

Just an update after a month,

I have had 0 page allocation errors after continuing with the initial fix of increasing min_free_kbytes.

I set it to 768MB (down from 1 GB).

So the suggestion to tune min_free_kbytes for your system is essentially the trick right now, until the developers can determine why these allocations fail when min_free_kbytes is set to its regular, lower default value.

PS. I noticed both examples here have the E3-12xx series Xeon processors.. Any relevance?

Last edited 14 months ago by entilza

comment:17 Changed 3 months ago by Ruprecht

Hello, thanks for all the comments, they were useful to me.
It looks like disabling nested paging solved the 'page allocation errors' on my VirtualBox.
guest OS: 3xWinXP and 2x Debian

AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
MemTotal:        3094112 kB

kernel: 2.6.32-5-amd64
OS: Debian squeeze
Oracle VM VirtualBox Manager 4.2.6

Example of typical configuration of WinXP guest:
Memory size:     512MB
Page Fusion:     off
VRAM size:       32MB
CPU exec cap:    100%
HPET:            off
Chipset:         piix3
Firmware:        BIOS
Number of CPUs:  1
Synthetic Cpu:   off
CPUID overrides: None
... boot options ...
ACPI:            on
IOAPIC:          on
PAE:             on
Time offset:     0ms
RTC:             UTC
Hardw. virt.ext: off
Hardw. virt.ext exclusive: off
Nested Paging:   off
Large Pages:     on
VT-x VPID:       on
State:           running (since 2014-01-13T11:56:29.126000000)
Monitor count:   1
3D Acceleration: off
2D Video Acceleration: on
...

Maybe this post could help someone. ;)

Now I will try changing the piix3 chipset (I chose it as the most stable option, but the nestedpaging option was enabled...), and find out why AMD-V is still not working even though it is enabled in the BIOS.

btw: I never changed the min_free_kbytes option...

Last edited 3 months ago by Ruprecht

comment:18 follow-up: ↓ 22 Changed 3 months ago by frank

Disabling nested paging makes no sense. The reason for the page allocation warnings of the Linux kernel in most cases is the usage of large pages -- see user manual section 8.8.1, VBoxManage modifyvm VM_NAME --largepages. If this option is enabled AND nested paging is used, VBox tries to allocate chunks of 512 contiguous pages which are then used as large pages to improve the performance (faster page table lookup). If a set of 512 contiguous pages cannot be allocated (i.e. memory too fragmented), VBox falls back to allocate 512 non-contiguous pages. This condition is not fatal, therefore a warning in the kernel does not make sense here. I will attach a patch against VBox 4.3.6 which disables these warnings.

If you don't want to apply this patch to the VBox kernel driver, disabling 'large pages' for your VM should be sufficient. Disabling nested paging prevents the warning as well but degrades your guest performance much more.

If you get such kernel warnings for allocations other than large pages, it means that your host is really low on memory.
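
To put the numbers in this explanation into perspective: with 4 KiB base pages, a chunk of 512 contiguous pages is 2 MiB (the x86-64 large page size), and if it is requested from the kernel's buddy allocator in one piece it corresponds to an order-9 allocation. A small sketch of that arithmetic, with made-up helper names, assuming 4 KiB pages and power-of-two page counts:

```shell
# chunk_kib PAGES: size in KiB of PAGES base pages of 4 KiB each.
chunk_kib() {
    echo $(( $1 * 4 ))
}

# chunk_order PAGES: buddy-allocator order for a power-of-two page count,
# i.e. the n for which 2^n == PAGES.
chunk_order() {
    pages=$1
    order=0
    while [ "$pages" -gt 1 ]; do
        pages=$(( pages / 2 ))
        order=$(( order + 1 ))
    done
    echo "$order"
}

echo "512 pages = $(chunk_kib 512) KiB, order $(chunk_order 512)"
# prints: 512 pages = 2048 KiB, order 9
```

For comparison, the traces quoted earlier in this ticket report order:4, which by the same arithmetic is 16 pages or 64 KiB.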

Changed 3 months ago by frank

Diff against VBox 4.3.6 host kernel drivers

comment:19 Changed 3 months ago by frank

  • Summary changed from repeating 'page allocation failure' error in debian squeeze 64bit to repeating 'page allocation failure' error in debian squeeze 64bit => Fixed in SVN

comment:20 Changed 3 months ago by Harri

Maybe I am too blind to see, but I haven't found a checkbox in the GUI to turn off largepages. Do you think a checkbox could be added to Settings --> System --> Acceleration ?

comment:21 Changed 3 months ago by frank

You are right, there is no checkbox in the GUI. Please use 'VBoxManage modifyvm VM_NAME --largepages off' to disable large pages.
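
A minimal sketch of checking and changing the setting from the command line. The helper names large_pages_state and disable_large_pages are hypothetical, and "winxp" is a placeholder VM name. The "Large Pages:" label is the one shown in the configuration listing in comment 17; as far as I know, modifyvm only works while the VM is powered off.

```shell
# large_pages_state: read `VBoxManage showvminfo` output on stdin and
# print the value of the "Large Pages:" line (on or off).
large_pages_state() {
    sed -n 's/^Large Pages:[[:space:]]*//p'
}

# disable_large_pages VM_NAME: wrapper around the command quoted above.
disable_large_pages() {
    VBoxManage modifyvm "$1" --largepages off
}

# Usage on the host (not run here):
#   VBoxManage showvminfo winxp | large_pages_state   # current state
#   disable_large_pages winxp
#   VBoxManage showvminfo winxp | large_pages_state   # should now say off
```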

comment:22 in reply to: ↑ 18 Changed 3 months ago by Ruprecht

Replying to frank:

Disabling nested paging makes no sense. ... ... ...

Thanks a lot for your explanation! I will try it today: disable largepages and enable nestedpaging. But it is interesting to me that disabling only nestedpaging solved the problem of my guests crashing...

comment:23 Changed 3 months ago by frank

I did not see any report about guest crashes in this ticket. If your guest crashes and only disabling nested paging prevents these crashes, then this is a different problem.

comment:24 Changed 8 weeks ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Fix is part of VBox 4.3.8.
