VirtualBox

Opened 3 years ago

Last modified 3 years ago

#20271 new defect

nested virtualisation with KVM: CPU hits 100%, then L2 guest crashes

Reported by: aw125
Owned by:
Component: other
Version: VirtualBox 6.1.18
Keywords:
Cc:
Guest type: Linux
Host type: Windows

Description

Hi,

It may be that this is known not to work, but the 6.1 docs mention KVM as an L2 hypervisor, so I thought I'd give it a go.

I have an Intel Core i7 9th-gen chip with 6 cores, running Windows 10 Enterprise build 18363 as my L0.

I originally tried this on VirtualBox 6.1.16 but upgraded to 6.1.18 after reading https://www.virtualbox.org/ticket/19315. That upgrade caused https://www.virtualbox.org/ticket/20199, so I'm currently running the 6.1.19 test build, which fixes that crash but didn't resolve the problem I first observed in 6.1.16, so I'm guessing it's something different.

Red Hat mentions supporting some combinations as a tech preview when the L0 is RHEL 8.2, and says that others (e.g. VMware, Xen, Amazon AWS, Nutanix AHV, Oracle VM) are not supported.

My use case for this is purely local lab environments, mainly driven by Vagrant, so I'm more interested in whether it works at all than in official support.
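
For reference, the L1 only sees VT-x if nested hardware virtualisation is switched on for it; a minimal sketch of how I understand that's done (the VM name "centos7-l1" is just a placeholder):

  # On the Windows L0, with the L1 VM powered off:
  VBoxManage modifyvm "centos7-l1" --nested-hw-virt on

  # Or the equivalent stanza in a Vagrantfile:
  config.vm.provider "virtualbox" do |vb|
    vb.customize ["modifyvm", :id, "--nested-hw-virt", "on"]
  end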

I'm trying to use Packer inside a CentOS 7 L1 VM to build images with the QEMU builder and the KVM accelerator.
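
Roughly, the builder stanza looks like this (a minimal sketch rather than my exact template; the ISO URL, checksum and credentials are placeholders):

  {
    "builders": [
      {
        "type": "qemu",
        "accelerator": "kvm",
        "iso_url": "http://mirror.example.com/CentOS-7-x86_64-Minimal.iso",
        "iso_checksum": "none",
        "disk_size": 10240,
        "ssh_username": "root",
        "ssh_password": "changeme",
        "shutdown_command": "shutdown -P now"
      }
    ]
  }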

Packer is able to launch the VM and the install starts. It's pretty slow, but it eventually fails while installing the glibc.i686 package:

Installing glibc.i686 (684/694)
[ 1354.721435] double fault: 0000 [#1] SMP
[ 1354.722093] Modules linked in: xfs fcoe libfcoe libfc scsi_transport_fc scsi_tgt zram sg pcspkr joydev i2c_piix4 parport_pc parport ext4 mbcache jbd2 loop nls_utf8 isofs sr_mod cdrom 8021q garp mrp stp llc virtio_net virtio_blk net_failover failover drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_generic pata_acpi drm ata_piix libata serio_raw virtio_pci virtio_ring virtio drm_panel_orientation_quirks sunrpc xts lrw gf128mul dm_crypt dm_round_robin dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq libcrc32c async_xor xor async_tx raid1 raid0 iscsi_ibft iscsi_boot_sysfs floppy iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi squashfs cramfs edd
[ 1354.722093] CPU: 0 PID: 17227 Comm: glibc_post_upgr Not tainted 3.10.0-1160.el7.x86_64 #1
[ 1354.722093] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[ 1354.722093] task: ffff9fa564fe8000 ti: ffff9fa562bdc000 task.ti: ffff9fa562bdc000
[ 1354.722093] RIP: 0010:[<00000000b5f97d80>]  [<00000000b5f97d80>] 0xb5f97d80
[ 1354.722093] RSP: 0018:0000000000000000  EFLAGS: 00010082
[ 1354.722093] RAX: 000000000000007a RBX: 00000000ffcc49ca RCX: 0000000000000000
[ 1354.722093] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[ 1354.722093] RBP: 00000000ffcc495c R08: 0000000000000000 R09: 0000000000000000
[ 1354.722093] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1354.722093] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1354.722093] FS:  0000000000000000(0000) GS:ffff9fa5bcc00000(0000) knlGS:0000000000000000
[ 1354.722093] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 1354.722093] CR2: 00000000b5f97d80 CR3: 0000000022a7c000 CR4: 00000000000006f0
[ 1354.722093] Call Trace:
[ 1354.722093] Code:  Bad RIP value.
[ 1354.722093] RIP  [<00000000b5f97d80>] 0xb5f97d80
[ 1354.722093]  RSP <0000000000000000>
[ 1354.722093] ---[ end trace 6fd2b3e5f947f091 ]---
[ 1354.722093] Kernel panic - not syncing: Fatal exception
[ 1354.722093] Kernel Offset: 0x34800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

I see some people are able to get KVM to run fine in VirtualBox, but I'm wondering if they have AMD chips in the L0 or some other fundamental difference.

Any suggestions on how to try to debug this further?
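
For anyone trying to reproduce this, these are the sorts of sanity checks that seem relevant (a sketch; the VM name is a placeholder, and the Hyper-V check reflects my understanding that VirtualBox can't provide nested VT-x while it is itself running on top of Hyper-V):

  # On the Windows L0: confirm nested VT-x is actually enabled for the L1 VM
  VBoxManage showvminfo "centos7-l1" --machinereadable | findstr nested

  # Also on the L0: make sure the Windows hypervisor isn't active underneath VirtualBox
  bcdedit /enum {current} | findstr hypervisorlaunchtype

  # Inside the CentOS 7 L1: check that VMX made it through and that KVM loaded cleanly
  grep -c vmx /proc/cpuinfo
  lsmod | grep kvm
  dmesg | grep -i kvm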

Change History (1)

comment:1 by aw125, 3 years ago

I tested this with oVirt 4.3 running on CentOS 7 and I get the same behaviour (I expected this, as it's qemu-kvm underneath, but I was hoping one of the built-in qemu args would be the solution I'm missing; something like the sketch below).
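
To be concrete about the kind of knob I mean, e.g. launching the L2 by hand with CPU passthrough (illustrative only; the image path is a placeholder):

  # CentOS 7 ships qemu-kvm under /usr/libexec; -cpu host passes the L1's CPU
  # model straight through to the L2 instead of a generic QEMU model
  /usr/libexec/qemu-kvm -machine accel=kvm -cpu host -m 2048 \
      -drive file=/var/lib/libvirt/images/test.qcow2,format=qcow2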

Interestingly, if I import an OVA, the VM starts OK and CPU usage is low until I do any I/O on the L2; then it jumps to 100% on the L1. It doesn't crash, though.

I've tried increasing the number of vCPUs in the L2, but then the L2 will not even launch.

I've asked a few others to try, but we all have Intel CPUs and get the same behaviour.

I'll probably have to come up with some other way to do what I need. Hyper-V worked with nested KVM, but I don't like it and it breaks all my VirtualBox stuff.

I'll leave this open in case anyone else tries something similar and comes up with a solution.
