VirtualBox

Ticket #2370 (closed defect: fixed)

Opened 6 years ago

Last modified 5 years ago

Memory leak with Solaris Host for 2.0.2 => Fixed in 2.0.6

Reported by: djones666
Owned by:
Priority: critical
Component: other
Version: VirtualBox 2.0.4
Keywords:
Cc:
Guest type: Solaris
Host type: Solaris

Description (last modified by frank)

HostOS: SunOS jcc-one 5.10 Generic_137112-07 i86pc i386 i86pc
GuestOS: OpenSolaris 200805

I start off with a long-term stable memory usage of:

jcc-one> memcheck
Total memory = 3068 MB
 Used memory = 1652 MB
 Free memory = 1416 MB
jcc-one>

Then I fire up VirtualBox, and after a couple of minutes it settles to look like this:

jcc-one> memcheck
Total memory = 3068 MB
 Used memory = 2460 MB
 Free memory = 608 MB
jcc-one>

which is what I would expect given the 1024MB memory size configured in VirtualBox.

Let the system sit idle for about 6 hours and we get

jcc-one> memcheck
Total memory = 3068 MB
 Used memory = 2796 MB
 Free memory = 272 MB
jcc-one>

and if I let it continue on for another 6 we hit the death of swap city.

FYI - memcheck is this simple beastie that I wrote long ago to track database thrash on a large web site:

jcc-one> cat memcheck.c
#include <stdio.h>
#include <stdlib.h>     /* for exit() */
#include <unistd.h>     /* for sysconf() */

int main(void)
{
        long total;
        long avail;     /* renamed from "free" to avoid shadowing free() */
        long page_size;

        /* convert pages to kilo-pages */
        total = sysconf(_SC_PHYS_PAGES) / 1024;
        avail = sysconf(_SC_AVPHYS_PAGES) / 1024;

        /* convert page size to KB */
        page_size = sysconf(_SC_PAGESIZE) / 1024;

        /* kilo-pages * KB per page = MB */
        total *= page_size;
        avail *= page_size;

        printf ("Total memory = %ld MB\n", total);
        printf (" Used memory = %ld MB\n", total - avail);
        printf (" Free memory = %ld MB\n", avail);

        exit(0);
}

Change History

comment:1 Changed 6 years ago by frank

  • Description modified

comment:2 Changed 6 years ago by troydm

I've had the same memory problem, but my VirtualBox disk images were located on a ZFS file system, so ZFS was using memory to cache the virtual disk of the running VM. I solved this by setting the ARC cache limit on ZFS; search for the ZFS Evil Tuning Guide.

comment:3 Changed 6 years ago by djones666

I don't think that is the problem, since I have no ZFS file systems on the host box. Running memcheck inside the VirtualBox guest shows that memory usage is rock stable inside the virtual machine, so the leak seems to be in VirtualBox itself. If I get time, I may try to instrument the beast and see if I can track down where the memory is going. Note that the host OS is Solaris 10 u5 with all current patches.

comment:4 Changed 6 years ago by djones666

Almost forgot - thank you Frank for formatting my rather rushed original report. You made it not only readable, but beautiful. {*grin*}

comment:5 Changed 6 years ago by kjard_us

I have this same problem. I thought it was ZFS too, but it is not: I dumped my ZFS pools and this still happens. It is an odd one, too. A single running machine can go about 24 hours before it swallows up all of my 12 GB of RAM. When the box gets down near 2 GB it freezes and reboots. Solaris 10 u5, fully patched. Version 2.0.2.

please please please fix this!!!!

Oct 11 13:47:24 bio2 genunix: [ID 655072 kern.notice] fffffe800093ba70 vboxflt:vboxNetFltSolarisRecv+390 ()
Oct 11 13:47:24 bio2 genunix: [ID 655072 kern.notice] fffffe800093bab0 vboxflt:VBoxNetFltSolarisModReadPut+e3 ()
Oct 11 13:50:45 bio2 pseudo: [ID 129642 kern.info] pseudo-device: vboxdrv0
Oct 11 13:50:45 bio2 genunix: [ID 936769 kern.info] vboxdrv0 is /pseudo/vboxdrv@0
Oct 11 13:50:49 bio2 pseudo: [ID 129642 kern.info] pseudo-device: vboxflt0
Oct 11 13:50:49 bio2 genunix: [ID 936769 kern.info] vboxflt0 is /pseudo/vboxflt@0
Oct 11 13:51:55 bio2 savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe800093b890 addr=10 occurred in module "vboxflt" due to a NULL pointer dereference

comment:6 Changed 5 years ago by dirkw

I seem to be experiencing this problem, too. Depending on the load the virtual machine generates, the problem arises earlier or later. The machine becomes entirely unresponsive, with very high disk activity. So far I have been unable to trace it to a memory leak, but the stack trace from tonight's kernel panic looks similar to the one kjard_us posted.

Host is SunOS zorro 5.10 Generic_127128-11 i86pc i386 i86pc, Guest is Windows XP. Host Interface Networking, AMD-V active.

Oct 21 22:03:34 zorro genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe80000b97c0 addr=10 occurred in module "vboxflt" due to a NULL pointer dereference
Oct 21 22:03:34 zorro unix: [ID 100000 kern.notice]
Oct 21 22:03:34 zorro unix: [ID 839527 kern.notice] sched:
Oct 21 22:03:34 zorro unix: [ID 753105 kern.notice] #pf Page fault
Oct 21 22:03:34 zorro unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x10
Oct 21 22:03:34 zorro unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffff0833960, sp=0xfffffe80000b98b0, eflags=0x10246
Oct 21 22:03:34 zorro unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
Oct 21 22:03:34 zorro unix: [ID 354241 kern.notice] cr2: 10 cr3: 1359c2000 cr8: c
Oct 21 22:03:34 zorro unix: [ID 592667 kern.notice]     rdi:                0 rsi: fffffe80000b98b0 rdx:                1
Oct 21 22:03:34 zorro unix: [ID 592667 kern.notice]     rcx:                0  r8:                1  r9:                0
Oct 21 22:03:34 zorro unix: [ID 592667 kern.notice]     rax:                0 rbx: fffffe829647c1f6 rbp: fffffe80000b99a0
Oct 21 22:03:34 zorro unix: [ID 592667 kern.notice]     r10:                1 r11:             3ad0 r12: fffffe8296439280
Oct 21 22:03:34 zorro unix: [ID 592667 kern.notice]     r13: fffffe8296273c00 r14:                1 r15: ffffffff97a5eb70
Oct 21 22:03:34 zorro unix: [ID 592667 kern.notice]     fsb: fffffd7ffef62a00 gsb: fffffffffbc24e40  ds:               43
Oct 21 22:03:34 zorro unix: [ID 592667 kern.notice]      es:               43  fs:              1bb  gs:                0
Oct 21 22:03:34 zorro unix: [ID 592667 kern.notice]     trp:                e err:                0 rip: fffffffff0833960
Oct 21 22:03:34 zorro unix: [ID 592667 kern.notice]      cs:               28 rfl:            10246 rsp: fffffe80000b98b0
Oct 21 22:03:34 zorro unix: [ID 266532 kern.notice]      ss:               30
Oct 21 22:03:34 zorro unix: [ID 100000 kern.notice]
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b96d0 unix:die+da ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b97b0 unix:trap+5e6 ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b97c0 unix:cmntrap+140 ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b99a0 vboxflt:vboxNetFltSolarisRecv+390 ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b99e0 vboxflt:VBoxNetFltSolarisModReadPut+e3 ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b9a40 unix:putnext+1f1 ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b9a80 dld:dld_str_rx_raw+2f ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b9b40 dls:i_dls_link_rx+18c ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b9b90 mac:mac_rx+71 ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b9bc0 nge:nge_receive+54 ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b9be0 nge:nge_intr_handle+14f ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b9c10 nge:nge_chip_intr+7c ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b9c60 unix:av_dispatch_autovect+78 ()
Oct 21 22:03:34 zorro genunix: [ID 655072 kern.notice] fffffe80000b9c70 unix:intr_thread+5f ()

Searching the ticket database, I found this ticket. I then logged vmstat output while starting Windows XP in VBox 2.0.2, and yes, it eats all my memory away:

bash-3.00# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd cd cd   in   sy   cs us sy id
 0 0 0 10882232 4648864 120 587 961 1 1 0 107 58 44 11 11 2896 5393 1506 6 4 91
 0 0 0 10174164 3941356 19 38 0 0  0  0  0  0  0 30 29 5504 10281 3203 4 15 81
 0 0 0 10114264 3888716 0 7  0  0  0  0  0  0  0 137 125 8151 9728 4059 4 21 75
[...]
 0 0 0 7942348 1865032 0  0  0  0  0  0  0  0  0  0  0 5595 54167 5739 10 50 40
 0 0 0 7936012 1858696 0  0  0  0  0  0  0  0  0  0  0 1202 215308 1523 18 28 54
 0 0 0 7935988 1858672 0  1  0  0  0  0  0  0  0 53 56 7212 211924 5160 38 27 35
 0 0 0 7930564 1853248 0  0  0  0  0  0  0  0  0  0  0 1353 237776 1404 18 20 62
 0 0 0 7928976 1851660 0  0  0  0  0  0  0  0  0  0  0 1472 243399 1756 19 20 60

After stopping the VM, not all memory seems to be reclaimed:
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd cd cd   in   sy   cs us sy id
 0 0 0 8285508 2007164 13 28 39 0  0  0  0  5  4  0  0  920 2299 1129  2  1 97
 0 0 0 8285508 2007164 0  0  0  0  0  0  0  0  0  0  0  877 2320 1062  1  1 98
 0 0 0 8285508 2007164 0  0  0  0  0  0  0  0  0  0  0  851 2211 1099  1  0 98
 0 0 0 8285508 2007164 0  0  0  0  0  0  0  0  0  0  0  856 2142 1133  2  1 97
 0 0 0 8285508 2007164 0  0  0  0  0  0  0  0  0  0  0  845 2183 1057  1  1 98
 0 0 0 8285508 2007164 0  0  0  0  0  0  0  1  1  0  0  851 2078 1057  1  1 98
 0 0 0 8285508 2007164 0  0  0  0  0  0  0  0  0  0  0  847 2140 1069  1  0 98
 0 0 0 8285508 2007164 0  0  0  0  0  0  0  0  0  0  0  849 2109 1051  1  0 98
 0 0 0 8285508 2007164 0  0  0  0  0  0  0  0  0  0  0  899 2144 1046  2  1 97
 0 0 0 8285508 2007164 0  0  0  0  0  0  0  0  0  0  0  985 2953 1269  1  1 98


comment:7 Changed 5 years ago by ramshankar

The issues with vboxflt (the NetFilter kernel driver) have been fixed, but as far as I remember, and judging from the 2.0.2 code, the memory leak shouldn't come from vboxflt.

Could you please check whether you see the same memory behavior without host interface networking, i.e. with the safer NAT?

comment:8 Changed 5 years ago by djones666

I tried it with NAT and with networking disabled; the result was the same in both cases as with host networking. I measured memory growth averaging about 75 MB an hour for an idling VirtualBox.

I am thus pretty sure it is not due to the networking in the VB.

comment:9 Changed 5 years ago by kjard_us

Same here, doesn't matter nat or host. Using nat isn't an option for me (srss, etc)...

I am seeing the same rate of loss here. That gives me about 4 days before I have to reboot.

comment:10 Changed 5 years ago by frank

Please check if VirtualBox 2.0.4 fixes the problem.

comment:11 Changed 5 years ago by kjard_us

So far everything looks really good!

Fingers crossed!

I am going to try installing a vm guest and see what happens. That was always a quick way to bring the system to its knees.

comment:12 Changed 5 years ago by djones666

Looks like it is leaking at approximately the same rate as before.

Installed 2.0.4, fired up in same configuration, gave it an hour to stabilize and started tracking.

Net result is a 60-75MB per hour leakage with an idling opensolaris 2008.5 guest.

Any other suggestions to help track down the problem?

comment:13 Changed 5 years ago by frank

  • Priority changed from major to critical
  • Version changed from VirtualBox 2.0.2 to VirtualBox 2.0.4

comment:14 Changed 5 years ago by kjard_us

Well, mine still appears stable. I have had CentOS running all day and installed Nevada while CentOS was running. Memory was returned to the system when the machines were stopped. CentOS is consuming about 1200 MB (configured with 2 GB of RAM). I am using host interface networking, as well as SATA, AMD-V, etc. I am running VirtualBox as root.

I guess more detail can only help.

This is version 2.0.4 running on sol 10 u5 with the latest patches (including the recent kernel patch).

I am running this on an X2200 M2 with 12 GB of RAM and two 2.8 GHz AMD dual-core chips. No ZFS, but home directories come from an NFS server.

comment:15 Changed 5 years ago by djones666

Memory is returned on VB exit, but grows continuously while running here. I am also patched to latest patches on sol 10 u5.

One difference - I'm running single core AMD chips.

Here is the output of my monitor during the test run:

Hardware: IBM eServer 325, dual AMD procs, 3GB physical Memory
HostOS: SunOS jcc-one 5.10 Generic_137112-08 i86pc i386 i86pc
No zfs or NFS in use
GuestOS: opensolaris 2008.5 (idling at login)

11:32:03; Used memory = 924 MB
<<<Start VirtualBox>>>
<<<Let idle to settle in>>>
13:12:04; Used memory = 1848 MB
13:22:04; Used memory = 1860 MB
13:32:04; Used memory = 1876 MB
13:42:04; Used memory = 1892 MB
13:52:04; Used memory = 1908 MB
14:02:04; Used memory = 1920 MB
14:12:04; Used memory = 1940 MB
14:22:04; Used memory = 1952 MB
14:32:04; Used memory = 1972 MB
14:42:04; Used memory = 1984 MB
14:52:04; Used memory = 2000 MB
15:02:04; Used memory = 2024 MB
15:12:04; Used memory = 2036 MB
15:22:04; Used memory = 2052 MB
15:32:04; Used memory = 2064 MB
15:42:04; Used memory = 2084 MB
15:52:04; Used memory = 2096 MB
16:02:05; Used memory = 2104 MB
<<<Shutdown VirtualBox>>>
16:22:05; Used memory = 948 MB

comment:16 Changed 5 years ago by kjard_us

I just checked the server. After sitting overnight with the centos guest running uninterrupted the system is still stable.

Current free memory is 9342 MB. When I left work it was running around 9210 MB.

comment:17 Changed 5 years ago by kjard_us

Is there a log or something that might be helpful for comparison purposes?

comment:18 Changed 5 years ago by sandervl73

Could you try again with 2.0.6?

comment:19 Changed 5 years ago by djones666

I have now had 2.0.6 running for at least 2x as long as was possible for 2.0.4.

Memory seems to be stable now: no growing and thrashing to death like with 2.0.2 and 2.0.4.

I'll run it for a few more days just to be sure, but I think we can finally close this out.

comment:20 Changed 5 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed
  • Summary changed from Memory leak with Solaris Host for 2.0.2 to Memory leak with Solaris Host for 2.0.2 => Fixed in 2.0.6

Thanks for the feedback!
