VirtualBox

Ticket #6279 (closed defect: invalid)

Opened 4 years ago

Last modified 4 years ago

Kernel panic when running VMs with VBoxHeadless

Reported by: EdoFede
Owned by:
Priority: critical
Component: other
Version: VirtualBox 3.1.4
Keywords: kernel panic VBoxHeadless
Cc:
Guest type: other
Host type: Solaris

Description

Hello to all.

I have an issue running VirtualBox in VBoxHeadless mode on a Solaris host.

My host is a Dell 400SC with two 120GB disks in a ZFS mirror (rpool) and six 1TB external drives in a ZFS raidz pool (datas). The host runs Solaris 10 10/08 s10x_u6wos_07b X86 with the 142901-04 kernel (32-bit). I've tried both VirtualBox 3.1.2 and 3.1.4, with the same result.

I have two Linux VMs: IPCop with a 2.4 kernel and Debian with a 2.6 kernel.

If I start and use the VMs with the VirtualBox GUI, over VNC or on the local console, everything works perfectly.

However, if I run the VMs in VBoxHeadless mode, the host machine crashes with a kernel panic a few seconds after the guest OS has loaded. It does not happen immediately, but after a few seconds (during logon to the guest OS via console or ssh, although it also happens if I do not log on to the guest at all).

There are no errors in the VM logs, but here is the crash log I found in /var/adm/messages:

Feb 24 00:22:35 srv unix: [ID 836849 kern.notice]
Feb 24 00:22:35 srv ^Mpanic[cpu1]/thread=cbb34dc0:
Feb 24 00:22:35 srv genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=cbb34c2c addr=14a240cb occurred in module "genunix" due to an illegal access to a user address
Feb 24 00:22:35 srv unix: [ID 100000 kern.notice]
Feb 24 00:22:35 srv unix: [ID 839527 kern.notice] sched:
Feb 24 00:22:35 srv unix: [ID 753105 kern.notice] #pf Page fault
Feb 24 00:22:35 srv unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x14a240cb
Feb 24 00:22:35 srv unix: [ID 243837 kern.notice] pid=0, pc=0xfe8afb35, sp=0xe2b1d9f8, eflags=0x10202
Feb 24 00:22:35 srv unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6d8<xmme,fxsr,pge,mce,pse,de>
Feb 24 00:22:35 srv unix: [ID 936844 kern.notice] cr2: 14a240cb cr3: db89000
Feb 24 00:22:35 srv unix: [ID 537610 kern.notice]        gs: cc0d01b0  fs: e1070000  es:      160  ds:  d520160
Feb 24 00:22:35 srv unix: [ID 537610 kern.notice]       edi:        6 esi: fead4ffc ebp: cbb34c7c esp: cbb34c64
Feb 24 00:22:35 srv unix: [ID 537610 kern.notice]       ebx: deb9cdf8 edx:        0 ecx: 14a240cb eax:        0
Feb 24 00:22:35 srv unix: [ID 537610 kern.notice]       trp:        e err:        0 eip: fe8afb35  cs:      158
Feb 24 00:22:35 srv unix: [ID 717149 kern.notice]       efl:    10202 usp: e2b1d9f8  ss: cbb34ca4
Feb 24 00:22:35 srv unix: [ID 100000 kern.notice]
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34b8c unix:die+a7 (e, cbb34c2c, 14a240)
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34c18 unix:trap+1130 (cbb34c2c, 14a240cb,)
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34c2c unix:cmntrap+9b (cc0d01b0, e1070000,)
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34c7c genunix:avl_walk+2d (cc0d6598, e2b1d9f8,)
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34ca4 zfs:space_map_walk+45 (cc0d6598, fead4ffc,)
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34cec zfs:metaslab_sync+1b5 (cc0d6340, 87d7a, 0)
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34d14 zfs:vdev_sync+a8 (cb657200, 87d7a, 0)
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34d5c zfs:spa_sync+38e (d209c680, 87d7a, 0)
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34da8 zfs:txg_sync_thread+22c (d28fa080, 0)
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34db8 unix:thread_start+8 ()
Feb 24 00:22:35 srv unix: [ID 100000 kern.notice]
Feb 24 00:22:35 srv genunix: [ID 672855 kern.notice] syncing file systems...
Feb 24 00:22:35 srv genunix: [ID 904073 kern.notice]  done
Feb 24 00:22:36 srv genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Feb 24 00:23:25 srv genunix: [ID 409368 kern.notice] ^M100% done: 140661 pages dumped, compression ratio 2.00,
Feb 24 00:23:25 srv genunix: [ID 851671 kern.notice] dump succeeded
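
For readers who want to pull the call chain out of a panic log like the one above, the following is a minimal sketch in Python. The regular expression and the helper name `parse_frames` are assumptions made for illustration, based only on the line layout visible in this extract; this is not an official Solaris tool.

```python
import re

# Assumed layout of a stack-frame line, taken from the extract above:
#   "... kern.notice] <frame addr> <module>:<function>+<offset> (<args>)"
FRAME_RE = re.compile(
    r"kern\.notice\]\s+([0-9a-f]{8,16})\s+(\w+):(\w+)\+([0-9a-f]+)\s+\("
)

# Three lines copied from the panic log in this ticket.
SAMPLE = """\
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34c7c genunix:avl_walk+2d (cc0d6598, e2b1d9f8,)
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34ca4 zfs:space_map_walk+45 (cc0d6598, fead4ffc,)
Feb 24 00:22:35 srv genunix: [ID 353471 kern.notice] cbb34da8 zfs:txg_sync_thread+22c (d28fa080, 0)
"""

def parse_frames(log_text):
    """Return (module, function) pairs for every stack-frame line found."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group(2), match.group(3)))
    return frames

if __name__ == "__main__":
    for module, function in parse_frames(SAMPLE):
        print("%s:%s" % (module, function))
```

Run against the full extract, this lists the frames innermost first, which makes it easier to see that the fault occurred in genunix:avl_walk, called from the ZFS txg sync thread.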

My Solaris version:

root@srv{~}> uname -a
SunOS srv 5.10 Generic_142901-04 i86pc i386 i86pc


root@srv{~}> cat /etc/release
                       Solaris 10 10/08 s10x_u6wos_07b X86
           Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 27 October 2008

Attached are some logs and information about my system. YAMJ.xml.txt is the XML config file of the VM.

Attachments

dmesg.txt (17.2 KB) - added by EdoFede 4 years ago.
dmesg from host
messages_extract.txt (2.9 KB) - added by EdoFede 4 years ago.
/var/adm/messages extract of the error
prtconf_v.txt (201.6 KB) - added by EdoFede 4 years ago.
prtconf -v of the host
VBox.log.txt (65.0 KB) - added by EdoFede 4 years ago.
VBox.log.1.txt (36.7 KB) - added by EdoFede 4 years ago.
YAMJ.xml.txt (6.0 KB) - added by EdoFede 4 years ago.
VM XML config file
zfs_zpool_status.txt (3.0 KB) - added by EdoFede 4 years ago.
information about my zfs/zpool config
last_var_adm_messages.txt (3.7 KB) - added by EdoFede 4 years ago.
2nd /var/adm/messages extract

Change History

comment:1 Changed 4 years ago by ramshankar

Could you please try using NAT for the VMs in question instead of bridged networking and see if it makes a difference? Additionally, could you please enable core dumps as described here: http://www.virtualbox.org/wiki/Core_dump ?

comment:2 Changed 4 years ago by EdoFede

Hello.

I've tried with core dumps enabled (bridged network) as described on that page, but no core dump files were created. I think the OS crashes before the core dump can be generated.

-bash-3.00$ id
uid=1001(vbox) gid=1(other)
-bash-3.00$ coreadm
     global core file pattern: /var/core/core.%f.%p
     global core file content: all
       init core file pattern: %f.%p
       init core file content: all
            global core dumps: enabled
       per-process core dumps: enabled
      global setid core dumps: enabled
 per-process setid core dumps: enabled
     global core dump logging: enabled
-bash-3.00$ svcs | grep coreadm
online         11:44:41 svc:/system/coreadm:default

Now I'm running the VM with NAT networking and everything seems to be OK, but I have no way to stress the VM. With NAT networking I can't mount the NFS share from the host to run the catalog program (yamj) and reproduce the same conditions as before.

Since the other VM is IPCop, I think it is possible that the problem lies with bridged networking.

I've also tried the PCnet guest network adapter instead of the Intel PRO/1000 MT, but the result is the same.

I'm attaching the latest /var/adm/messages extract and /var/crash/srv/unix.3 (the vmcore is too large, at 580MB).

Can I run any other tests?

Thank you.

comment:3 follow-up: ↓ 7 Changed 4 years ago by EdoFede

Sorry, I can't attach /var/crash/srv/unix.3 to this ticket. The file is 1.7MB, but the maximum attachment size is 400KB.

comment:4 Changed 4 years ago by ramshankar

Oh wait, this is Solaris 10. How much swap space do you have?

comment:5 Changed 4 years ago by EdoFede

Yes, Solaris 10 :)

I have 4GB of swap space.

root@srv{~}> swap -l
swapfile             dev  swaplo blocks   free
/dev/zvol/dsk/rpool/swap 181,1       8 4194288 4194288
root@srv{~}> swap -s
total: 142092k bytes allocated + 26500k reserved = 168592k used, 4839532k available
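
As a quick cross-check of the figures above, the numbers can be converted to GiB. This is only a sketch: it assumes that `swap -l` reports sizes in 512-byte blocks and that the `k` suffix in `swap -s` means KiB. The helper names are hypothetical.

```python
BLOCK_SIZE = 512  # assumption: swap -l counts 512-byte blocks

def swap_blocks_to_gib(blocks, block_size=BLOCK_SIZE):
    """Convert a block count from `swap -l` to GiB."""
    return blocks * block_size / float(1024 ** 3)

def kib_to_gib(kib):
    """Convert a KiB figure from `swap -s` to GiB."""
    return kib / float(1024 ** 2)

if __name__ == "__main__":
    # Values copied from the swap -l and swap -s output above.
    print("swap device: %.2f GiB" % swap_blocks_to_gib(4194288))
    print("available:   %.2f GiB" % kib_to_gib(4839532))
```

Under these assumptions the swap device itself comes to roughly 2 GiB, while the "available" figure from `swap -s` is about 4.6 GiB. The latter is likely larger because it describes virtual swap, which may also count physical memory.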

comment:6 Changed 4 years ago by ramshankar

From the looks of it, this seems to be a ZFS issue with relinquishing ARC memory. Are you running the latest patches? Also, could you try limiting the ZFS ARC cache in /etc/system to ~1.5 GB:

set zfs:zfs_arc_max = 1610612736
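
The value in that line is simply 1.5 GiB expressed in bytes. A minimal sketch for deriving it, or a different cap, is shown here. It assumes the standard tunable spelling `zfs_arc_max`; the helper name `arc_max_line` is hypothetical.

```python
def arc_max_line(gib):
    """Return an /etc/system line capping the ZFS ARC at `gib` GiB.

    Assumes the tunable is named zfs_arc_max and takes a byte count.
    """
    size_bytes = int(gib * 1024 ** 3)
    return "set zfs:zfs_arc_max = %d" % size_bytes

if __name__ == "__main__":
    print(arc_max_line(1.5))  # prints: set zfs:zfs_arc_max = 1610612736
```

Changes to /etc/system take effect only after a reboot, so the cap should be verified after the host has been restarted.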

comment:7 in reply to: ↑ 3 Changed 4 years ago by ramshankar

Replying to EdoFede:

Sorry, I can't attach /var/crash/srv/unix.3 to this ticket. The file is 1.7MB, but the maximum attachment size is 400KB.

The proper core files (here "bounds", "unix.3", "vmcore.3") should be a few hundred megs, so it looks like the core is not valid.

comment:8 Changed 4 years ago by EdoFede

Hello.

It's strange that this problem appears only in headless mode and only with bridged networking, isn't it? Anyway, over the weekend I upgraded to Solaris 10/09, fully patched the system with the latest kernel updates, and limited the ZFS ARC as suggested. Now the issue seems to be solved. I've run one of the VMs for a day without any problems.

Thanks for the suggestion.

Bye, Edoardo.

comment:9 Changed 4 years ago by ramshankar

  • Status changed from new to closed
  • Resolution set to invalid

Since this was solved by applying the latest patches to Solaris, I'll close the defect. Please reopen if required.
