VirtualBox

Ticket #9905 (closed defect: worksforme)

Opened 2 years ago

Last modified 22 months ago

ESXi image as guest crashes Solaris host

Reported by: joe42
Owned by:
Priority: major
Component: other
Version: VirtualBox 4.1.6
Keywords:
Cc:
Guest type: other
Host type: Solaris

Description (last modified by frank)

We have been running an ESXi "server" as a guest under VirtualBox (no, this does not make sense in production, but it is very useful for testing ESXi API interactions in a development setting).

However, starting up the ESXi image with VBoxHeadless will often cause the entire hypervisor host to crash; the Solaris kernel dumps a core and reboots.

If the ESXi image starts up successfully, it seems to be able to continue running. But most of the time, starting the image takes down the host system.

This happened on 4.1.6 and 4.1.4.

Attachments

VBox.log.1 (49.6 KB) - added by joe42 2 years ago.
VBox.log from one of the ESXi runs that crashed

Change History

Changed 2 years ago by joe42

VBox.log from one of the ESXi runs that crashed

comment:1 Changed 2 years ago by Technologov

devs: Please up "Priority" to "Blocker".

-Technologov

comment:2 Changed 2 years ago by ramshankar

If the host is panicking, it should leave a system core dump in /var/crash/<hostname>/ and also have a backtrace in the syslog (/var/adm/messages).

Could you paste the backtrace here?

Hopefully the version of Solaris you're using has proper "dumpadm" settings that don't prevent the above.

comment:3 Changed 2 years ago by joe42

The back-trace is here (and all three crashes showed essentially the same stack trace):

Nov 11 08:44:58 turkey unix: [ID 836849 kern.notice] 
Nov 11 08:44:58 turkey ^Mpanic[cpu7]/thread=fffffe951675aac0: 
Nov 11 08:44:58 turkey genunix: [ID 683410 kern.notice] BAD TRAP: type=0 (#de Divide error) rp=fffffe8003f507a0 addr=0
Nov 11 08:44:58 turkey unix: [ID 100000 kern.notice] 
Nov 11 08:44:58 turkey unix: [ID 839527 kern.notice] VBoxHeadless: 
Nov 11 08:44:58 turkey unix: [ID 753105 kern.notice] #de Divide error
Nov 11 08:44:58 turkey unix: [ID 243837 kern.notice] pid=4090, pc=0xffffffffef138f8b, sp=0xfffffe8003f50890, eflags=0x10a47
Nov 11 08:44:58 turkey unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 26f8<xmme,fxsr,pge,mce,pae,pse,de>
Nov 11 08:44:58 turkey unix: [ID 354241 kern.notice] cr2: 0 cr3: 7edfda000 cr8: c
Nov 11 08:44:58 turkey unix: [ID 592667 kern.notice]    rdi:            f4240 rsi:  3f4e4ca00da7a1b rdx:           10788a
Nov 11 08:44:58 turkey unix: [ID 592667 kern.notice]    rcx:          429b17f  r8:          31a6be7  r9:    1b3ea2591a73c
Nov 11 08:44:58 turkey unix: [ID 592667 kern.notice]    rax: 150ab7b375223e65 rbx: fffffd7ffd054720 rbp: fffffe8003f508a0
Nov 11 08:44:58 turkey unix: [ID 592667 kern.notice]    r10:          15c9220 r11:           9a2112 r12: fffffd7ffd054748
Nov 11 08:44:58 turkey unix: [ID 592667 kern.notice]    r13: fffffd7ffd054720 r14:          3f4e602 r15: fffffd7ffd054748
Nov 11 08:44:58 turkey unix: [ID 592667 kern.notice]    fsb: fffffd7ffef05200 gsb: ffffffff9d47e000  ds:               43
Nov 11 08:44:58 turkey unix: [ID 592667 kern.notice]     es:               43  fs:                0  gs:                0
Nov 11 08:44:58 turkey unix: [ID 592667 kern.notice]    trp:                0 err:                0 rip: ffffffffef138f8b
Nov 11 08:44:58 turkey unix: [ID 592667 kern.notice]     cs:               28 rfl:            10a47 rsp: fffffe8003f50890
Nov 11 08:44:58 turkey unix: [ID 266532 kern.notice]     ss:               30
Nov 11 08:44:58 turkey unix: [ID 100000 kern.notice] 
Nov 11 08:44:58 turkey genunix: [ID 655072 kern.notice] fffffe8003f506b0 unix:real_mode_end+8071 ()
Nov 11 08:44:58 turkey genunix: [ID 655072 kern.notice] fffffe8003f50790 unix:trap+dda ()
Nov 11 08:44:58 turkey genunix: [ID 655072 kern.notice] fffffe8003f507a0 unix:cmntrap+140 ()
Nov 11 08:44:58 turkey genunix: [ID 802836 kern.notice] fffffe8003f508a0 ffffffffef138f8b ()
Nov 11 08:44:58 turkey genunix: [ID 802836 kern.notice] fffffe8003f508e0 ffffffffef139517 ()
Nov 11 08:44:58 turkey genunix: [ID 802836 kern.notice] fffffe8003f50920 ffffffffef139f40 ()
Nov 11 08:44:58 turkey genunix: [ID 802836 kern.notice] fffffe8003f50930 ffffffffef0e1d9d ()
Nov 11 08:44:58 turkey genunix: [ID 802836 kern.notice] fffffe8003f50a30 ffffffffef0e25a0 ()
Nov 11 08:44:58 turkey genunix: [ID 802836 kern.notice] fffffe8003f50a80 ffffffffef0d9582 ()
Nov 11 08:44:58 turkey genunix: [ID 802836 kern.notice] fffffe8003f50c50 ffffffffef0d6daf ()
Nov 11 08:44:58 turkey genunix: [ID 802836 kern.notice] fffffe8003f50c70 ffffffffef10e867 ()
Nov 11 08:44:58 turkey genunix: [ID 802836 kern.notice] fffffe8003f50cc0 ffffffffef0da964 ()
Nov 11 08:44:58 turkey genunix: [ID 655072 kern.notice] fffffe8003f50cf0 vboxdrv:supdrvIOCtlFast+ac ()
Nov 11 08:44:58 turkey genunix: [ID 655072 kern.notice] fffffe8003f50dc0 vboxdrv:VBoxDrvSolarisIOCtl+121 ()
Nov 11 08:44:58 turkey genunix: [ID 655072 kern.notice] fffffe8003f50dd0 genunix:cdev_ioctl+1d ()
Nov 11 08:44:58 turkey genunix: [ID 655072 kern.notice] fffffe8003f50df0 specfs:spec_ioctl+50 ()
Nov 11 08:44:58 turkey genunix: [ID 655072 kern.notice] fffffe8003f50e20 genunix:fop_ioctl+25 ()
Nov 11 08:44:58 turkey genunix: [ID 655072 kern.notice] fffffe8003f50f00 genunix:ioctl+ac ()
Nov 11 08:44:58 turkey genunix: [ID 655072 kern.notice] fffffe8003f50f10 unix:brand_sys_syscall+21d ()
Nov 11 08:44:58 turkey unix: [ID 100000 kern.notice] 

The dump files are 1.5GiB compressed, so if you want a dump file please let me know where.
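
For anyone comparing several panics like the one above, a minimal Python sketch for pulling the stack frames out of the /var/adm/messages lines may help. It only assumes the line layout visible in this trace (frame pointer, then either `module:symbol+offset` or a raw address, then `()`); the function and variable names are made up for illustration, and the sample lines are copied from the backtrace above.

```python
import re

# A few frame lines copied verbatim from the panic output above.
SAMPLE = """\
Nov 11 08:44:58 turkey genunix: [ID 655072 kern.notice] fffffe8003f507a0 unix:cmntrap+140 ()
Nov 11 08:44:58 turkey genunix: [ID 802836 kern.notice] fffffe8003f508a0 ffffffffef138f8b ()
Nov 11 08:44:58 turkey genunix: [ID 802836 kern.notice] fffffe8003f508e0 ffffffffef139517 ()
Nov 11 08:44:58 turkey genunix: [ID 655072 kern.notice] fffffe8003f50cf0 vboxdrv:supdrvIOCtlFast+ac ()
"""

# Layout assumed from the trace: "<frame pointer> <symbol or raw address> ()"
FRAME_RE = re.compile(r"kern\.notice\]\s+([0-9a-f]{16})\s+(\S+)\s+\(\)\s*$")


def parse_frames(text):
    """Return one dict per stack frame line; other lines are skipped."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue
        frame_pointer, where = match.groups()
        frames.append({
            "fp": int(frame_pointer, 16),
            "where": where,
            # Frames the kernel could symbolize contain "module:symbol".
            "resolved": ":" in where,
        })
    return frames


frames = parse_frames(SAMPLE)
unresolved = [f["where"] for f in frames if not f["resolved"]]
for f in frames:
    tag = "symbol" if f["resolved"] else "raw   "
    print(tag, f["where"])
```

The unresolved entries are the frames that lie outside the modules the Solaris kernel can symbolize, so those are the addresses that need a matching load address to be decoded.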

comment:4 Changed 2 years ago by ramshankar

Does the VBox.log correspond with this backtrace? If not could you please upload the corresponding VBox.log for this crash?

comment:5 Changed 2 years ago by joe42

Actually, the backtrace posted was from the 4.0.12 VirtualBox under which this crash happened the first time.

The backtrace that corresponds to the uploaded log file is here:

Nov 11 11:39:35 turkey unix: [ID 836849 kern.notice] 
Nov 11 11:39:35 turkey ^Mpanic[cpu15]/thread=fffffe94863f01a0: 
Nov 11 11:39:35 turkey genunix: [ID 683410 kern.notice] BAD TRAP: type=0 (#de Divide error) rp=fffffe80031ed7a0 addr=0
Nov 11 11:39:35 turkey unix: [ID 100000 kern.notice] 
Nov 11 11:39:35 turkey unix: [ID 839527 kern.notice] VBoxHeadless: 
Nov 11 11:39:35 turkey unix: [ID 753105 kern.notice] #de Divide error
Nov 11 11:39:35 turkey unix: [ID 243837 kern.notice] pid=1264, pc=0xffffffffef138f8b, sp=0xfffffe80031ed890, eflags=0x10a47
Nov 11 11:39:35 turkey unix: [ID 211416 kern.notice] cr0: 80050033<pg,wp,ne,et,mp,pe> cr4: 26f8<xmme,fxsr,pge,mce,pae,pse,de>
Nov 11 11:39:35 turkey unix: [ID 354241 kern.notice] cr2: 0 cr3: f0c0c2000 cr8: c
Nov 11 11:39:35 turkey unix: [ID 592667 kern.notice]    rdi:            f4240 rsi:  3ec34bb00da7a20 rdx:           10545f
Nov 11 11:39:35 turkey unix: [ID 592667 kern.notice]    rcx:          429b17f  r8:          311bb7e  r9:      6e05a8b9c3c
Nov 11 11:39:35 turkey unix: [ID 592667 kern.notice]    rax: a09a014289f2b5e0 rbx: fffffd7ffd054720 rbp: fffffe80031ed8a0
Nov 11 11:39:35 turkey unix: [ID 592667 kern.notice]    r10:          15c9500 r11:           9a2112 r12: fffffd7ffd054748
Nov 11 11:39:35 turkey unix: [ID 592667 kern.notice]    r13: fffffd7ffd054720 r14:          3ec359e r15: fffffd7ffd054748
Nov 11 11:39:35 turkey unix: [ID 592667 kern.notice]    fsb: fffffd7ffef05200 gsb: ffffffff9d8e4000  ds:               43
Nov 11 11:39:35 turkey unix: [ID 592667 kern.notice]     es:               43  fs:                0  gs:                0
Nov 11 11:39:35 turkey unix: [ID 592667 kern.notice]    trp:                0 err:                0 rip: ffffffffef138f8b
Nov 11 11:39:35 turkey unix: [ID 592667 kern.notice]     cs:               28 rfl:            10a47 rsp: fffffe80031ed890
Nov 11 11:39:35 turkey unix: [ID 266532 kern.notice]     ss:               30
Nov 11 11:39:35 turkey unix: [ID 100000 kern.notice] 
Nov 11 11:39:35 turkey genunix: [ID 655072 kern.notice] fffffe80031ed6b0 unix:real_mode_end+8071 ()
Nov 11 11:39:35 turkey genunix: [ID 655072 kern.notice] fffffe80031ed790 unix:trap+dda ()
Nov 11 11:39:35 turkey genunix: [ID 655072 kern.notice] fffffe80031ed7a0 unix:cmntrap+140 ()
Nov 11 11:39:35 turkey genunix: [ID 802836 kern.notice] fffffe80031ed8a0 ffffffffef138f8b ()
Nov 11 11:39:35 turkey genunix: [ID 802836 kern.notice] fffffe80031ed8e0 ffffffffef139517 ()
Nov 11 11:39:35 turkey genunix: [ID 802836 kern.notice] fffffe80031ed920 ffffffffef139f40 ()
Nov 11 11:39:35 turkey genunix: [ID 802836 kern.notice] fffffe80031ed930 ffffffffef0e1d9d ()
Nov 11 11:39:35 turkey genunix: [ID 802836 kern.notice] fffffe80031eda30 ffffffffef0e25a0 ()
Nov 11 11:39:35 turkey genunix: [ID 802836 kern.notice] fffffe80031eda80 ffffffffef0d9582 ()
Nov 11 11:39:35 turkey genunix: [ID 802836 kern.notice] fffffe80031edc50 ffffffffef0d6daf ()
Nov 11 11:39:35 turkey genunix: [ID 802836 kern.notice] fffffe80031edc70 ffffffffef10e867 ()
Nov 11 11:39:35 turkey genunix: [ID 802836 kern.notice] fffffe80031edcc0 ffffffffef0da964 ()
Nov 11 11:39:35 turkey genunix: [ID 655072 kern.notice] fffffe80031edcf0 vboxdrv:supdrvIOCtlFast+ac ()
Nov 11 11:39:35 turkey genunix: [ID 655072 kern.notice] fffffe80031eddc0 vboxdrv:VBoxDrvSolarisIOCtl+121 ()
Nov 11 11:39:35 turkey genunix: [ID 655072 kern.notice] fffffe80031eddd0 genunix:cdev_ioctl+1d ()
Nov 11 11:39:35 turkey genunix: [ID 655072 kern.notice] fffffe80031eddf0 specfs:spec_ioctl+50 ()
Nov 11 11:39:35 turkey genunix: [ID 655072 kern.notice] fffffe80031ede20 genunix:fop_ioctl+25 ()
Nov 11 11:39:35 turkey genunix: [ID 655072 kern.notice] fffffe80031edf00 genunix:ioctl+ac ()
Nov 11 11:39:35 turkey genunix: [ID 655072 kern.notice] fffffe80031edf10 unix:brand_sys_syscall+21d ()
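
One thing that might be worth checking in both register dumps: rdi is f4240 (1,000,000 decimal) each time, and rdx is slightly larger than that. A #DE is raised not only on division by zero but also when the quotient does not fit in the destination register. The sketch below tests that case under a loud assumption: that the faulting instruction is an unsigned 64-bit `div` with rdx:rax as the dividend and rdi as the divisor. Nothing in the trace confirms which instruction sits at the faulting pc, so treat this as a hypothesis to verify against a disassembly, not a diagnosis.

```python
# Register values copied from the two panic dumps above (hex as printed).
CRASHES = {
    "Nov 11 08:44:58": {"rax": 0x150ab7b375223e65, "rdx": 0x10788a, "rdi": 0xf4240},
    "Nov 11 11:39:35": {"rax": 0xa09a014289f2b5e0, "rdx": 0x10545f, "rdi": 0xf4240},
}


def div64_would_fault(rdx, rax, divisor):
    """True if an unsigned x86-64 `div` of rdx:rax by divisor would raise #DE.

    ASSUMPTION: the operand roles (rdx:rax dividend, rdi divisor) are a guess;
    the trace does not show the faulting instruction.
    """
    if divisor == 0:
        return True
    quotient = ((rdx << 64) | rax) // divisor
    # The quotient must fit in 64 bits, otherwise the CPU raises #DE.
    return quotient >= 2**64


results = {
    when: div64_would_fault(regs["rdx"], regs["rax"], regs["rdi"])
    for when, regs in CRASHES.items()
}
for when, faults in results.items():
    print(when, "quotient overflow" if faults else "no overflow")
```

If the assumption holds, both dumps would be consistent with a quotient overflow rather than a zero divisor, since rdx is not smaller than rdi in either case.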

comment:6 Changed 2 years ago by ramshankar

The log says 4.1.6 r74727 but the symbols don't seem to match the ones in the core with the load address for VMMR0. Could you do the following and paste/upload the text result (as root):

# cd /var/crash/<hostname>/
# ls
bounds unix.0 vmcore.0

Assuming "*.0" is your core file:

On Solaris 11

printf '%s\n' '::msgbuf' 'g_DevExt+0x28/J' "g_VBoxDrvSolarisModule::print -t 'struct modldrv'" '$C' | mdb 0

Edit: Gah, this doesn't work on S10's old MDB. grr. So split it as follows:

On Solaris 10

mdb 0
::msgbuf
g_DevExt+0x28/J
g_VBoxDrvSolarisModule::print -t 'struct modldrv'
$C
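
Independently of the mdb output, the raw frame addresses can be turned into module-relative offsets once a load address is known. A rough Python sketch follows; the base address in it is purely hypothetical and must be replaced with whatever load address you actually have for VMMR0. The frame addresses are copied from the backtrace, which is the same in both crashes.

```python
# Unsymbolized frames from the backtrace, innermost first.
RAW_FRAMES = [
    0xffffffffef138f8b,
    0xffffffffef139517,
    0xffffffffef139f40,
    0xffffffffef0e1d9d,
    0xffffffffef0e25a0,
    0xffffffffef0d9582,
    0xffffffffef0d6daf,
    0xffffffffef10e867,
    0xffffffffef0da964,
]

# HYPOTHETICAL load address, for illustration only. Substitute the real one.
ASSUMED_BASE = 0xffffffffef0c0000


def module_offsets(frames, base):
    """Offsets of each frame relative to the given module load address."""
    return [addr - base for addr in frames]


# Independent of any base: how far apart the frames are from each other.
span = max(RAW_FRAMES) - min(RAW_FRAMES)

print("address span: 0x%x bytes" % span)
for addr, off in zip(RAW_FRAMES, module_offsets(RAW_FRAMES, ASSUMED_BASE)):
    print("0x%016x -> base+0x%x" % (addr, off))
```

The span is about 400 KB, so the frames could plausibly all belong to a single module, though that alone does not prove it.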

comment:7 Changed 2 years ago by joe42

Apologies for the late reply.

I am not having a lot of luck with the commands you sent me. I am using savecore to get a unix and vmcore file from the vmdump file on my system. The dump is in vmdump.2 and this is what I do:

-bash-3.00# savecore -f vmdump.2
savecore: incomplete dump on dump device
savecore: System dump time: Fri Nov 11 11:39:37 2011

savecore: saving system crash dump in /var/crash//{unix,vmcore}.2
Constructing namelist /var/crash//unix.2
Constructing corefile /var/crash//vmcore.2
pfn 17241602 not found for as=fffffffffbc29020, va=fffffe8000700000
pfn 17255043 not found for as=fffffffffbc29020, va=fffffe8000701000
pfn 17245956 not found for as=fffffffffbc29020, va=fffffe8000702000
pfn 17256837 not found for as=fffffffffbc29020, va=fffffe8000703000
pfn 17251974 not found for as=fffffffffbc29020, va=fffffe8000704000
pfn 8593282 not found for as=fffffffffbc29020, va=fffffe8002400000
pfn 8593283 not found for as=fffffffffbc29020, va=fffffe8002401000
pfn 8593156 not found for as=fffffffffbc29020, va=fffffe8002402000
pfn 8593285 not found for as=fffffffffbc29020, va=fffffe8002403000
pfn 8593286 not found for as=fffffffffbc29020, va=fffffe8002404000
 1:42  99% done: 2162053 of 2172266 pages saved
savecore: bad data after page 2162053

I have not done this before so I don't know if the errors above are to be expected... Anyway, this is what happens when I continue with the commands you sent (adjusting '0' to '2' to accommodate the different file names):

-bash-3.00# mdb 2
mdb: failed to read module at ffffffffa54b6680
mdb: failed to read modctl at ffffffff8f1f0dd0: no mapping for address
Loading modules: [ unix krtldmdb: couldn't read cache at ffffffff808a3008: no mapping for address
 genunix specfs cpu.generic uppc pcplusmp ]
> ::msgbuf
mdb: failed to read mblk at ffffffffa522b3c0: no mapping for address
> g_DevExt+0x28/J
mdb: failed to read data from target: no mapping for address
g_DevExt+0x28:  
> g_VBoxDrvSolarisModule::print -t 'struct modldrv'
mdb: failed to read 470364 bytes of debug data for genunix at ffffffffa53aa000: no mapping for address
mdb: failed to read 470364 bytes of debug data for genunix at ffffffffa53aa000: no mapping for address
mdb: failed to look up type g_VBoxDrvSolarisModule: no mapping for address
> $C
fffffe80031ed8a0 0xffffffffef138f8b()
>

If I should have run the savecore differently or if I did something else wrong, please point it out - I have (luckily) not had to deal with Solaris dumps before.

comment:8 Changed 2 years ago by ramshankar

Looks like the core is missing several pages. Is there enough space on the partition where the dump is being written (usually swap)? Check the output of dumpadm for "dump device".
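
To put a number on it, here is a quick back-of-the-envelope calculation in Python using the page counts from the savecore output. The 4 KiB page size is an assumption; it matches the 0x1000 steps between the "va=" addresses in that output, but it is not stated anywhere explicitly.

```python
# Page counts from "2162053 of 2172266 pages saved" in the savecore output.
PAGES_TOTAL = 2172266
PAGES_SAVED = 2162053

# ASSUMPTION: 4 KiB base pages (consistent with the 0x1000 va steps above).
PAGE_SIZE = 4096

missing_pages = PAGES_TOTAL - PAGES_SAVED
missing_bytes = missing_pages * PAGE_SIZE
saved_pct = 100.0 * PAGES_SAVED / PAGES_TOTAL

print("missing pages:", missing_pages)
print("missing data : %.1f MiB" % (missing_bytes / 2**20))
print("saved        : %.2f%%" % saved_pct)
print("full dump    : %.2f GiB" % (PAGES_TOTAL * PAGE_SIZE / 2**30))
```

Under that assumption, roughly 40 MiB of an 8.3 GiB dump is missing, so the dump device is only slightly too small (or was cut off near the end).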

comment:9 Changed 2 years ago by ramshankar

Also delete old cores to free up space if you need to.

comment:10 Changed 22 months ago by frank

  • Status changed from new to closed
  • Resolution set to worksforme
  • Description modified (diff)

No response, closing.
