#6100 closed defect (fixed)
2.6.32 stalls as guest in virtualbox -> fixed in SVN/3.1.4
Reported by: | Christoph Biedl | Owned by: | |
---|---|---|---|
Component: | VMM/RAW | Version: | VirtualBox 3.1.2 |
Keywords: | 2.6.32, cmpxchg8b | Cc: | |
Guest type: | Linux | Host type: | other |
Description
See my recent posting in LKML "2.6.32 stalls as guest in virtualbox" for more details that might be missing.
It seems a VirtualBox guest running the Linux kernel 2.6.32 in certain configurations, like the one attached, cannot deal with the code created by alternative_io for cmpxchg64 in arch/x86/include/asm/cmpxchg_32.h. This causes the kernel to stall rather early in the boot process, as soon as apply_alternatives is run. Reverting commit 152f9d0710a62708710161bce1b29fa8292c8c11 works around the problem by avoiding the code that calls cmpxchg64 and the 'alternative_io("call cmpxchg8b_emu", "lock; cmpxchg8b (%%esi)" (...)' inside of it.
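For readers unfamiliar with the operation being patched, here is a minimal sketch of the compare-and-exchange semantics that both alternatives (the cmpxchg8b instruction and the cmpxchg8b_emu fallback) are expected to provide. The function name cmpxchg64_sketch is hypothetical and this is not kernel code; in particular it is plain C and not atomic, whereas the real instruction is used with a lock prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration only, NOT atomic and NOT the kernel's code.
 * Compare the 64-bit value at *ptr with old_val; if they match, store
 * new_val. In both cases return the value that was found in *ptr.
 */
static uint64_t cmpxchg64_sketch(uint64_t *ptr, uint64_t old_val,
                                 uint64_t new_val)
{
    assert(ptr != NULL);

    uint64_t prev = *ptr;       /* value seen before the operation */
    if (prev == old_val)
        *ptr = new_val;         /* exchange only on a match */
    return prev;                /* caller compares this with old_val */
}
```

A caller detects success by checking whether the returned value equals the old value it passed in; on a mismatch the memory location is left untouched.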
Workarounds:
- Disable ACPI (i.e. acpi=off in the kernel command line)
- Enable VT-x/AMD-V (reportedly, couldn't check)
- Change the CPU to CONFIG_M686
Some of my findings: It appears apply_alternatives confuses the virtualized kernel terribly.
The following patch
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index de7353c..48fbb20 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -210,6 +210,7 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 	DPRINTK("%s: alt table %p -> %p\n", __func__, start, end);
 	for (a = start; a < end; a++) {
 		u8 *instr = a->instr;
+printk("apply_alternatives at %p\n", instr);
 		BUG_ON(a->replacementlen > a->instrlen);
 		BUG_ON(a->instrlen > sizeof(insnbuf));
 		if (!boot_cpu_has(a->cpuid))
@@ -225,8 +226,11 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 		memcpy(insnbuf, a->replacement, a->replacementlen);
 		add_nops(insnbuf + a->replacementlen,
 			 a->instrlen - a->replacementlen);
+printk("apply_alternatives: do it\n");
 		text_poke_early(instr, insnbuf, a->instrlen);
+printk("apply_alternatives: done this\n");
 	}
+printk("apply_alternatives: Here we go\n");
 }
 
 #ifdef CONFIG_SMP
yields as last messages
(...)
apply_alternatives at c1028dc1
apply_alternatives: do it
apply_alternatives: done this
apply_alternatives at c1028e5d
apply_alternatives: do it
apply_alternatives: done this
apply_alternatives at c10290c2
apply_alternatives: do it
I.e. there was no return from text_poke_early after patching kernel/sched_clock.c
Checking vmlinux verifies the instruction at c10290c2 is indeed cmpxchg8b_emu in sched_clock_local (kernel/sched_clock.c).
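To make the failing step concrete, here is a simplified userspace sketch of the loop shown in the patch context above: for each table entry whose CPU feature is present, the replacement bytes are copied into a scratch buffer, padded with NOPs to the original length, and written over the original instruction. The names alt_entry, apply_alternatives_sketch and feature_present are hypothetical stand-ins. The sketch pads with single-byte NOPs (0x90) and writes with memcpy, which is an assumption made for illustration; the kernel uses add_nops and text_poke_early for these steps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PATCH_LEN 16   /* arbitrary scratch size for this sketch */

/* Hypothetical, simplified stand-in for the kernel's struct alt_instr. */
struct alt_entry {
    uint8_t *instr;              /* address of the original instruction */
    const uint8_t *replacement;  /* bytes to use if the feature exists  */
    uint8_t instrlen;            /* length of the original instruction  */
    uint8_t replacementlen;      /* length of the replacement           */
    int feature_present;         /* stands in for boot_cpu_has(cpuid)   */
};

static void apply_alternatives_sketch(struct alt_entry *start,
                                      struct alt_entry *end)
{
    uint8_t insnbuf[MAX_PATCH_LEN];

    for (struct alt_entry *a = start; a < end; a++) {
        /* Same sanity checks as the BUG_ON lines in the patch context. */
        assert(a->replacementlen <= a->instrlen);
        assert(a->instrlen <= sizeof(insnbuf));

        if (!a->feature_present)
            continue;            /* keep the original instruction */

        /* Build the patched bytes: replacement, then NOP padding. */
        memcpy(insnbuf, a->replacement, a->replacementlen);
        memset(insnbuf + a->replacementlen, 0x90,
               a->instrlen - a->replacementlen);

        /* Overwrite the original instruction in place. */
        memcpy(a->instr, insnbuf, a->instrlen);
    }
}
```

For the case in this report, the original would be a 5-byte near call (opcode E8 plus a 32-bit displacement) and the replacement the 4-byte encoding F0 0F C7 0E of "lock; cmpxchg8b (%esi)", leaving one byte of padding.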
Another bit, technically disabling the concept of alternative_io for cmpxchg8b by using the same code for what should be emulation and alternative as an ugly workaround:
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -317,7 +317,7 @@ extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);
 	__typeof__(*(ptr)) __ret;				\
 	__typeof__(*(ptr)) __old = (o);				\
 	__typeof__(*(ptr)) __new = (n);				\
-	alternative_io("call cmpxchg8b_emu",			\
+	alternative_io("lock; cmpxchg8b (%%esi)",		\
 			"lock; cmpxchg8b (%%esi)" ,		\
 			X86_FEATURE_CX8,			\
 			"=A" (__ret),				\
fixes the problem, which seems to show that the trouble lies in actually changing the instruction, not in the write itself.
Now I'm stuck.
Version numbers and stuff:
- virtualbox-ose 3.0.8 (backport) running on Debian lenny, both 32bit and 64bit hosts
- Also verified on virtualbox-ose 3.1.2 running on Debian squeeze (32bit)
- guest kernel 2.6.32.6 (always 32bit), built using Debian squeeze. The config is attached
- Host CPU (32bit):
model name : Intel(R) Pentium(R) M processor 1600MHz
flags : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 tm pbe bts est tm2
- Host CPU (64bit):
model name : Intel(R) Atom(TM) CPU 330 @ 1.60GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl pni monitor ds_cpl tm2 ssse3 cx16 xtpr lahf_lm
Attachments (2)
Change History (16)
comment:1 by , 15 years ago
Component: | other → VMM/RAW |
---|
comment:2 by , 15 years ago
Thanks for the details. The patching is either missed or we're hitting a mishandled edge case here. I'll have a look next week. Is there a bootable ISO image with this kernel that I could use?
comment:3 by , 15 years ago
ISO-Image is available, it took a while to tame mkisofs ... drop me a line how I should send it to you (e-mail, saft, IRC-DCC, whatever). Size is about 3 Mbyte.
Note that the image contains the kernel only; if the boot process does not stall - due to different environment or the like - the kernel will panic since there is no root filesystem.
comment:4 by , 15 years ago
I've sent you an email using the address you've registered this account with.
comment:6 by , 15 years ago
Summary: | 2.6.32 stalls as guest in virtualbox → 2.6.32 stalls as guest in virtualbox -> fixed in SVN/3.1.4 |
---|
For some reason it didn't hang on my Win7 x64 host, but the same problem exists there. Fixed now. Thanks for the report.
comment:7 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:8 by , 15 years ago
Is there a discrete change fixing this issue, that may be backported as a patch?
comment:9 by , 15 years ago
As this has reportedly been fixed in SVN, the patch should be extractable from there. However, the revision numbers were not reported and I could not find a reference to this ticket in the commit messages.
Besides that, thanks a lot to the VirtualBox guys, the speed of both the reaction and the fix was overwhelming. Much appreciated.
comment:11 by , 15 years ago
Thanks!
(test-patch available for ubuntu 9.10, see https://bugs.launchpad.net/ubuntu/+source/virtualbox-ose/+bug/510571 for info)
comment:12 by , 15 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Running VirtualBox 3.1.4r57640 on Windows XP SP3 32-bit on a Pentium 4 with Debian Squeeze as the guest OS still crashes for me.
I tried disabling ACPI, but it didn't help.
I get a "BUG soft lockup" every N seconds and the boot process stops.
Can anyone confirm?
comment:13 by , 15 years ago
Never mind, and sorry for the noise. After removing a probably bad memory DIMM, the system boots again.
comment:14 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Kernel configuration