#11171 closed defect (fixed)
repeating 'page allocation failure' error in debian squeeze 64bit => Fixed in SVN
Reported by: | Marco Gaiarin | Owned by: | |
---|---|---|---|
Component: | other | Version: | VirtualBox 4.2.4 |
Keywords: | page allocation failure debian squeeze | Cc: | |
Guest type: | all | Host type: | Linux |
Description
I've upgraded a set of servers (all different HP ProLiant ML350 models, G6 or G7) from Debian lenny (using the official Debian VirtualBox package) to Debian squeeze (using VirtualBox 4.2 from the repository).
On all servers I get a series (20-30 per day) of 'page allocation errors': the system seems to work fine, the guest OS runs and there is no other error, but I'm pretty sure the trouble comes from VirtualBox, because:
- another similar server, upgraded to squeeze but without VirtualBox, shows not a single error like that
- if I stop the guest and unload the module, the errors disappear completely.
Some more scattered info:
- one of these servers, same hardware, has VirtualBox 4.1 (4.1.22-80657~Debian~squeeze), and there the 'page allocation failure' happens for some hours/days after a reboot, then disappears.
- as said, all the servers run kernel 2.6.32-5-amd64
- I've seen ticket #5260; I have a Marvell Ethernet adapter, but running 'ethtool -K eth0 gso off' does not solve the problem.
- the errors are surely related to system/network load: they happen only during working hours, when the server is busy.
I'm currently using VirtualBox 4.2 (4.2.4-81684~Debian~squeeze) with only one guest machine (WinXP), but as said I think that does not matter at all. kern.log attached.
Thanks.
Attachments (7)
Change History (34)
by , 12 years ago
Attachment: | kern.log.bz2 added |
---|
comment:1 by , 12 years ago
So these page allocation failures come from the host, right? This could be a memory leak of the VM(s) on the host so monitoring the memory consumption of the host processes over a longer time might give a hint.
comment:2 by , 12 years ago
Ahem, sorry, from the host, sure. I also forgot to say:
- the servers are not short of RAM, for example:
neuromante:~# free
             total       used       free     shared    buffers     cached
Mem:       6118264    6019844      98420          0       7024    3610848
-/+ buffers/cache:    2401972    3716292
Swap:      8000328     431544    7568784
And I'm using sysstat to monitor RAM consumption, and it seems normal.
comment:3 by , 12 years ago
I've reverted to 4.1.22-80657~Debian~squeeze, as on the other server, but the trouble remains.
But a Google search led me to:
http://www.linuxsmiths.com/blog/?p=527
http://lime-technology.com/forum/index.php?topic=23222.0
I had /proc/sys/vm/min_free_kbytes at 9951; now I've set it to 16384 and I'll keep increasing it until (I hope) this trouble disappears...
comment:4 by , 12 years ago
OK, it seems that I really have to expand the RAM of my host machine.
If I increase /proc/sys/vm/min_free_kbytes too much, the trouble disappears but the system tends to thrash (load sky high...).
comment:5 by , 12 years ago
I've had the same issue:
I am running Ubuntu 10.04 -64bit, with virtualbox 4.2.4.
I am using a bonded e1000e adapter on the Server with bridged networking on VM guests.
While using Virtualbox I am getting page allocation errors:
Each entry has [vboxnetflt] as part of the cause.
The server has 16 GB of memory, and available memory is fine. The guest OS uses just 256 MB of RAM, which is not much.
The system seems to work fine regardless of these messages. I did once see a remote desktop connection terminate when this happened. rsync transfers work fine even though they sometimes cause this error.
I increased min_free_kbytes to 512 MB (!) and still get these errors.
I've seen the following processes in the log:
VBoxHeadless: page allocation failure. order:4, mode:0x4020
smbd: page allocation failure. order:4, mode:0x4020
kworker/0:1: page allocation failure. order:4, mode:0x4020
Log snippets:
<IRQ>
[<ffffffff810fe6ac>] __alloc_pages_nodemask+0x6bc/0x830
vboxNetFltLinuxPacketHandler+0xa8/0x610 [vboxnetflt]
vboxNetFltLinuxPacketHandler+0xa8/0x610 [vboxnetflt]
vboxNetFltLinuxPacketHandler+0xa8/0x610 [vboxnetflt]
vboxNetFltLinuxStartXmitFilter+0x100/0x230 [vboxnetflt]
note: Jan 18, 2013: I am using intel e1000e driver 1.5.1 - I will test updating this tonight to see if that's part of the problem.
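Editorial note: in these messages, order:N means the kernel asked for 2^N physically contiguous pages. The sketch below parses a failure line in the form quoted in this comment and converts the order to a byte count. It assumes a 4 KiB base page size and that the line starts at the process name (no syslog timestamp prefix); `parse_failure` is a name made up for this illustration, not part of any tool.

```python
import re

# Assumption: 4 KiB base pages (typical for x86-64 hosts).
PAGE_SIZE = 4096

# Matches lines such as:
#   VBoxHeadless: page allocation failure. order:4, mode:0x4020
_FAILURE_RE = re.compile(
    r"^(?P<comm>.+?): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def parse_failure(line):
    """Return process name, order, mode and request size, or None if no match."""
    match = _FAILURE_RE.match(line.strip())
    if match is None:
        return None
    order = int(match.group("order"))
    return {
        "comm": match.group("comm"),
        "order": order,
        "mode": int(match.group("mode"), 16),
        # An order-N request covers 2**N contiguous pages.
        "bytes": (1 << order) * PAGE_SIZE,
    }
```

Under the 4 KiB assumption, the order:4 requests above are for 16 contiguous pages, i.e. 64 KiB each.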
comment:6 by , 12 years ago
Update: Jan 19, 2013:
- I updated e1000e drivers to 2.1.4
- Updated VirtualBox to 4.2.6
Same problem. I can reproduce it with high network activity, e.g. FTPing a 20 GB file from the host to the backup server.
I set /proc/sys/vm/min_free_kbytes to roughly 1 GB (1000000) to see if this helps.
Notes:
I am wondering if this is an issue with Ethernet bonding and VirtualBox. I have been using round robin mode (mode 0). Yesterday I tried balance-alb (mode 6) and had massive packet loss, so I had to revert to round robin (mode 0).
From what I see, when you have a bonded connection it switches the controllers to "promiscuous" mode. In round robin it does so for both (eth0, eth1). When I used balance-alb it did so for only one, as if VirtualBox only connected to one Ethernet controller, which would explain the packet loss.
I am also noticing "Unknown Protocols" statistics being reported by the Windows guests with "netstat -e".
My next test will be to move the guests to another computer and reduce min_free_kbytes and see if I can reproduce any more allocation errors from the host with no virtualbox running.
Afterwards I will test with just 1 ethernet and no bonding.
comment:7 by , 12 years ago
Update: Jan 20, 2013:
I created a VM guest on our backup machine (different architecture)
The only thing I was curious about here was the result of "netstat -e" and the "Unknown Protocols".
They also appeared here. I then checked a physical Windows XP computer on our network and it showed the same Unknown Protocols. So it appears not to be related. These Unknown Protocols seem to be common to Windows 2000 and Windows XP, at least on our network.
I got sucked into this by some strange /var/log/messages errors saying:
vboxnetflt: dropped 296483 out of 34021461 packets
These appear when I shut down the guest. Not sure how they develop yet...
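Editorial note: to put the vboxnetflt counter in proportion, the following sketch extracts the two numbers from a message in the format quoted above and computes the dropped fraction. The function name `drop_rate` is hypothetical, and the message format is taken only from the single line in this comment.

```python
import re

# Matches the message format quoted in this comment, e.g.
#   vboxnetflt: dropped 296483 out of 34021461 packets
_DROP_RE = re.compile(r"dropped (\d+) out of (\d+) packets")


def drop_rate(message):
    """Return dropped/total as a float, or None if the line does not match."""
    match = _DROP_RE.search(message)
    if match is None:
        return None
    dropped, total = int(match.group(1)), int(match.group(2))
    if total == 0:
        return None  # avoid dividing by zero on an empty counter
    return dropped / total


rate = drop_rate("vboxnetflt: dropped 296483 out of 34021461 packets")
print(f"{rate:.2%} of packets dropped")
```

For the line quoted here this comes out a little under one percent of all packets.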
My next test will be to move the VM from the production server to the guest, reduce min_free_kbytes then see the results.
However for now I will wait a couple days and see if my 1 GIG > /proc/sys/vm/min_free_kbytes solves the production server alloc errors.
comment:8 by , 12 years ago
After my bug report, I tried different VirtualBox versions (4.1 and 4.2) on different hardware (but the same OS: Debian squeeze), and played with different values of '/proc/sys/vm/min_free_kbytes'.
It seems to me that:
- the page allocation failure error appears more frequently when the host is under pressure (high load)
- raising min_free_kbytes seems to help, but raising it too much leads to more frequent load spikes (probably the scheduler has to use more swap, and so the load rises)
So I've settled by trial and error on an optimal min_free_kbytes value (for every server) and now the error happens rarely.
But I have not really resolved it, nor have I understood what actually happens.
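Editorial note: when tuning by trial and error as described above, it can help to see the reserve as a share of total RAM. This is a minimal sketch, assuming a Linux host where /proc/sys/vm/min_free_kbytes and /proc/meminfo are readable; the helper names are made up for this illustration.

```python
def parse_mem_total_kb(meminfo_text):
    """Extract MemTotal (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def min_free_share(min_free_kb, mem_total_kb):
    """Fraction of total RAM that min_free_kbytes asks the kernel to keep free."""
    return min_free_kb / mem_total_kb


def read_host_share():
    """Read both values from the running host (Linux only)."""
    with open("/proc/sys/vm/min_free_kbytes") as f:
        min_free_kb = int(f.read().strip())
    with open("/proc/meminfo") as f:
        mem_total_kb = parse_mem_total_kb(f.read())
    return min_free_share(min_free_kb, mem_total_kb)


# Values reported in this ticket: 16384 kB on a host with 6118264 kB of RAM.
print(f"{min_free_share(16384, 6118264):.2%}")
```

Writing a new value still requires root, for example by echoing it into /proc/sys/vm/min_free_kbytes as the commenters here did.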
comment:9 by , 12 years ago
I would like to see a VBox.log file of an active VM session when this page allocation error happens.
The 'vboxnetflt' message is only informational and does not indicate any error condition.
by , 12 years ago
Attachment: | VBox.log.3 added |
---|
Log of guest that has potential to trace page allocation
comment:10 by , 12 years ago
The VBox.log.3 attachment begins Jan-10,2013.
So far the increase to min_free_kbytes is holding but it's still too early to say at least on my end.
All the applications on our server use very low memory, the guests use the highest memory by far.
Since I have 16 GB on the server, giving 1 GB to min_free_kbytes doesn't cause an issue. I am just surprised that I need to do this, as I now have 1 GB of unused memory :)
             total       used       free     shared    buffers     cached
Mem:      16329376   14914340    1415036          0    4743608    4951692
-/+ buffers/cache:    5219040   11110336
Swap:      9764860      33500    9731360
comment:11 by , 12 years ago
Gaio, I looked at your log, I assume you are not using network bonding because it says eth0. So it may be safe to assume this has nothing to do with network bonding.
comment:13 by , 12 years ago
Jan 22, 2013:
Update: Had (4) full days with 0 allocation errors.
Again, increasing min_free_kbytes was the only change. However, I made it extremely large (1 GB).
- Is this something to do with our hardware or kernel version? My kernel is non-standard for 10.04, but I believe Gaio's is a standard kernel.
This is very reproducible by reducing min_free_kbytes. Does this not give any hints as to why this occurs?
comment:14 by , 12 years ago
If anyone has experienced this error please post your kernel version:
Currently the 2 kernels in our example are:
2.6.39.3 (Ubuntu - This is a non standard kernel I used when I setup the server)
2.6.32-5 (Debian)
comment:15 by , 12 years ago
OK, after about a week I got 3 new page allocation errors. They are a bit different, however, as the network driver [e1000e] now appears in the trace along with [vboxnetflt].
All 3 occurred early in the morning, when there is almost no system load.
This could be totally unrelated to the min_free_kbytes fix, though, since it's now showing my network driver. For this I am looking into that 'gso' setting for the e1000e to see if this particular issue is related to it.
/var/log/messages (cut out extra lines)
kworker/0:0: page allocation failure. order:4, mode:0x4020
Pid: 0, comm: kworker/0:0 Not tainted 2.6.39.3-artcraft2 #1
Call Trace:
<IRQ>
[<ffffffff810fe6ac>] __alloc_pages_nodemask+0x6bc/0x830
.
.
[<ffffffffa00a8614>] vboxNetFltLinuxPacketHandler+0xb4/0x610 [vboxnetflt]
.
.
[<ffffffffa015f34e>] e1000_receive_skb+0x5e/0x80 [e1000e]
[<ffffffffa0162659>] e1000_clean_rx_irq+0x289/0x460 [e1000e]
[<ffffffffa0169c3d>] e1000e_poll+0xbd/0x380 [e1000e]
<EOI>
[<ffffffff8133123f>] ? acpi_idle_enter_bm+0x269/0x2a1
[<ffffffff81331238>] ? acpi_idle_enter_bm+0x262/0x2a1
[<ffffffff814804d0>] cpuidle_idle_call+0xc0/0x240
[<ffffffff8100b077>] cpu_idle+0xb7/0x110
[<ffffffff8159b73d>] start_secondary+0x1dc/0x1e3
Update: With the e1000e and GSO off I set min_free_kbytes back to a normal low value and I began to receive page allocation errors again. The same as originally reported containing [vboxnetflt] only.
So I increased min_free_kbytes again...
comment:16 by , 12 years ago
Just an update after a month,
I have had 0 page allocations after continuing with the initial fix of increasing min_free_kbytes.
I set it to 768MB (Down from 1 GB)
So the suggestion to tune min_free_kbytes for your system is essentially the trick right now, until the developers can determine why these allocations fail when min_free_kbytes is set to its regular, lower default value.
PS. I noticed both examples here have the E3-12xx series Xeon processors.. Any relevance?
comment:17 by , 11 years ago
Hello,
thanks for all the comments, they were useful to me.
It looks like disabling nestedpaging solved the 'page allocation errors' on my VirtualBox.
guest OS: 3x WinXP and 2x Debian
AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
MemTotal: 3094112 kB
kernel: 2.6.32-5-amd64
OS: Debian squeeze
Oracle VM VirtualBox Manager 4.2.6
Example of typical configuration of WinXP guest:
Memory size: 512MB
Page Fusion: off
VRAM size: 32MB
CPU exec cap: 100%
HPET: off
Chipset: piix3
Firmware: BIOS
Number of CPUs: 1
Synthetic Cpu: off
CPUID overrides: None
... boot options ...
ACPI: on
IOAPIC: on
PAE: on
Time offset: 0ms
RTC: UTC
Hardw. virt.ext: off
Hardw. virt.ext exclusive: off
Nested Paging: off
Large Pages: on
VT-x VPID: on
State: running (since 2014-01-13T11:56:29.126000000)
Monitor count: 1
3D Acceleration: off
2D Video Acceleration: on
...
Maybe this post could help someone. ;)
Now I will try changing the piix3 chipset (I chose this one as the most stable option, but the nestedpaging option was enabled...), and find out why AMD-V is still not working even though it is enabled in the BIOS.
btw: I never changed the min_free_kbytes option...
follow-ups: 22 27 comment:18 by , 11 years ago
Disabling nested paging makes no sense. The reason for the page allocation warnings of the Linux kernel in most cases is the usage of large pages -- see user manual section 8.8.1, VBoxManage modifyvm VM_NAME --largepages. If this option is enabled AND nested paging is used, VBox tries to allocate chunks of 512 contiguous pages which are then used as large pages to improve the performance (faster page table lookup). If a set of 512 contiguous pages cannot be allocated (i.e. memory too fragmented), VBox falls back to allocate 512 non-contiguous pages. This condition is not fatal, therefore a warning in the kernel does not make sense here. I will attach a patch against VBox 4.3.6 which disables these warnings.
If you don't want to apply this patch to the VBox kernel driver, disabling 'large pages' for your VM should be sufficient. Disabling nested paging prevents the warning as well but degrades your guest performance much more.
If you get such kernel warnings not for allocating large pages then it means that your host is really low in memory.
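Editorial note: the arithmetic behind "512 contiguous pages" can be sketched as follows. This assumes a 4 KiB base page size, which the comment above does not state explicitly; `allocation_order` and `contiguous_bytes` are illustrative names, not VirtualBox or kernel functions.

```python
# Assumption: 4 KiB base pages on the host.
PAGE_SIZE = 4096


def contiguous_bytes(pages):
    """Size in bytes of a run of `pages` physically contiguous base pages."""
    return pages * PAGE_SIZE


def allocation_order(pages):
    """Smallest order N such that 2**N pages cover the request."""
    if pages < 1:
        raise ValueError("pages must be positive")
    return (pages - 1).bit_length()


large_page_pages = 512
print(contiguous_bytes(large_page_pages) // (1024 * 1024), "MiB")
print("order", allocation_order(large_page_pages))
```

Under that assumption a 512-page chunk is 2 MiB, which the kernel reports as an order-9 request; the longer the required contiguous run, the more likely it is to fail on a fragmented host.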
comment:19 by , 11 years ago
Summary: | repeating 'page allocation failure' error in debian squeeze 64bit → repeating 'page allocation failure' error in debian squeeze 64bit => Fixed in SVN |
---|
comment:20 by , 11 years ago
Maybe I am too blind to see, but I haven't found a checkbox in the GUI to turn off largepages. Do you think a checkbox could be added to Settings --> System --> Acceleration ?
comment:21 by , 11 years ago
You are right, there is no checkbox in the GUI. Please use 'VBoxManage modifyvm VM_NAME --largepages off' to disable large pages.
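Editorial note: for anyone who wants to apply this setting to several VMs, here is a small sketch that builds the command quoted above. It assumes VBoxManage is on the PATH and that the VM is powered off when modifyvm runs; the helper names are hypothetical.

```python
import subprocess


def largepages_command(vm_name, enabled):
    """Build the argv for 'VBoxManage modifyvm VM_NAME --largepages on|off'."""
    return ["VBoxManage", "modifyvm", vm_name,
            "--largepages", "on" if enabled else "off"]


def set_largepages(vm_name, enabled):
    """Run the command; raises CalledProcessError if VBoxManage fails."""
    subprocess.run(largepages_command(vm_name, enabled), check=True)


# Only print the command here; call set_largepages() to actually apply it.
print(" ".join(largepages_command("VM_NAME", False)))
```

Passing the arguments as a list avoids shell quoting problems with VM names that contain spaces.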
comment:22 by , 11 years ago
Replying to frank:
Disabling nested paging makes no sense. ... ... ...
Thanks a lot for your explanation! I will try it today: disable largepages and enable nestedpaging. But I find it interesting that disabling nestedpaging alone solved the problem of my guests crashing...
comment:23 by , 11 years ago
I did not see any report about guest crashes in this ticket. If your guest crashes and only disabling nested paging prevents these crashes, then this is a different problem.
comment:25 by , 10 years ago
Must have missed a __GFP_NOWARN somewhere as I still see page allocation warnings (attached as vbox-page_alloc-err.txt). Running VirtualBox 4.3.14 on Linux kernel 3.10.17.
by , 10 years ago
Attachment: | diff_gfp_nowarn_2 added |
---|
comment:26 by , 10 years ago
64bitten, thanks for the report. Could you apply the diff_gfp_nowarn_2 patch I just attached and check if these warnings are now finally gone? Thank you!
Kernel log