VirtualBox

Ticket #6100 (closed defect: fixed)

Opened 4 years ago

Last modified 4 years ago

2.6.32 stalls as guest in virtualbox -> fixed in SVN/3.1.4

Reported by: cbiedl
Owned by:
Priority: major
Component: VMM/RAW
Version: VirtualBox 3.1.2
Keywords: 2.6.32, cmpxchg8b
Cc:
Guest type: Linux
Host type: other

Description

See my recent posting on LKML, "2.6.32 stalls as guest in virtualbox", for details that might be missing here.

It seems that a VirtualBox guest running Linux kernel 2.6.32 in certain configurations, such as the one attached, cannot deal with the code generated by the alternative_io for cmpxchg64 in arch/x86/include/asm/cmpxchg_32.h. This causes the kernel to stall rather early in the boot process, while apply_alternatives is running. Reverting commit 152f9d0710a62708710161bce1b29fa8292c8c11 works around the problem by avoiding the code that calls cmpxchg64 and the 'alternative_io("call cmpxchg8b_emu", "lock; cmpxchg8b (%%esi)" (...)' inside it.
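For orientation, the operation behind cmpxchg64 is a 64-bit compare-and-exchange: compare the value in memory with an expected value, store a new value only if they match, and return what was in memory beforehand. The following is a minimal user-space sketch of those semantics only. The name cmpxchg64_model is hypothetical, and the model is deliberately not atomic, whereas the real "lock; cmpxchg8b" performs the whole step atomically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, NON-atomic model of a 64-bit compare-and-exchange.
 * It only illustrates the semantics: if *ptr equals old_val, store
 * new_val; in every case return the value that was in *ptr before.
 */
uint64_t cmpxchg64_model(uint64_t *ptr, uint64_t old_val, uint64_t new_val)
{
    uint64_t prev;

    assert(ptr != NULL);

    prev = *ptr;
    if (prev == old_val)
        *ptr = new_val;   /* exchange happens only on a match */

    return prev;          /* caller compares this with old_val */
}
```

A caller detects success by checking whether the returned value equals the expected one, which is how such a primitive is typically used in retry loops.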

Workarounds:

  • Disable ACPI (i.e. acpi=off in the kernel command line)
  • Enable VT-x/AMD-V (reportedly; I couldn't check)
  • Change the CPU to CONFIG_M686

Some of my findings: It appears apply_alternatives confuses the virtualized kernel terribly.

The following patch

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index de7353c..48fbb20 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -210,6 +210,7 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
        DPRINTK("%s: alt table %p -> %p\n", __func__, start, end);
        for (a = start; a < end; a++) {
                u8 *instr = a->instr;
+printk("apply_alternatives at %p\n", instr);
                BUG_ON(a->replacementlen > a->instrlen);
                BUG_ON(a->instrlen > sizeof(insnbuf));
                if (!boot_cpu_has(a->cpuid))
@@ -225,8 +226,11 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
                memcpy(insnbuf, a->replacement, a->replacementlen);
                add_nops(insnbuf + a->replacementlen,
                         a->instrlen - a->replacementlen);
+printk("apply_alternatives: do it\n");
                text_poke_early(instr, insnbuf, a->instrlen);
+printk("apply_alternatives: done this\n");
        }
+printk("apply_alternatives: Here we go\n");
 }
 
 #ifdef CONFIG_SMP

yields the following as its last messages:

(...)
apply_alternatives at c1028dc1
apply_alternatives: do it
apply_alternatives: done this
apply_alternatives at c1028e5d
apply_alternatives: do it
apply_alternatives: done this
apply_alternatives at c10290c2
apply_alternatives: do it

That is, text_poke_early never returned when patching kernel/sched_clock.c.

Checking vmlinux confirms that the instruction at c10290c2 is indeed the call to cmpxchg8b_emu in sched_clock_local (kernel/sched_clock.c).

Another finding: effectively disabling alternative_io for cmpxchg8b, by using the same instruction for both the emulation slot and the alternative as an ugly workaround:

--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -317,7 +317,7 @@ extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);
        __typeof__(*(ptr)) __ret;                               \
        __typeof__(*(ptr)) __old = (o);                         \
        __typeof__(*(ptr)) __new = (n);                         \
-       alternative_io("call cmpxchg8b_emu",                    \
+       alternative_io("lock; cmpxchg8b (%%esi)",               \
                        "lock; cmpxchg8b (%%esi)" ,             \
                       X86_FEATURE_CX8,                         \
                       "=A" (__ret),                            \

fixes the problem and seems to show that the problem lies in changing the code at run time, not in the act of writing to it.
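The difference between the two cases can be sketched in user space. This is a simplified model, not kernel code: struct alt_model and apply_alternative_model are hypothetical names, padding uses only the single-byte NOP (0x90), and the byte sequences assumed are E8 plus a 32-bit offset for the call and F0 0F C7 0E for "lock; cmpxchg8b (%esi)". It mirrors the copy-and-pad steps visible in the apply_alternatives diff above and reports whether the patched site actually ended up with different bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NOP 0x90  /* single-byte x86 NOP used for padding */

/* Hypothetical, simplified stand-in for the kernel's struct alt_instr. */
struct alt_model {
    unsigned char *instr;              /* site to be patched */
    const unsigned char *replacement;  /* bytes used if the feature exists */
    size_t instrlen;
    size_t replacementlen;
    int feature_present;               /* models boot_cpu_has(a->cpuid) */
};

/*
 * Copy the replacement over the site and pad with NOPs.
 * Returns 1 if the bytes at the site differ afterwards, 0 if not
 * (feature absent, or the write stored identical bytes).
 */
int apply_alternative_model(struct alt_model *a)
{
    unsigned char insnbuf[16];
    int changed;

    assert(a->replacementlen <= a->instrlen);
    assert(a->instrlen <= sizeof(insnbuf));

    if (!a->feature_present)
        return 0;

    memcpy(insnbuf, a->replacement, a->replacementlen);
    memset(insnbuf + a->replacementlen, MODEL_NOP,
           a->instrlen - a->replacementlen);

    changed = memcmp(a->instr, insnbuf, a->instrlen) != 0;
    memcpy(a->instr, insnbuf, a->instrlen);  /* the write always happens */
    return changed;
}
```

With a 5-byte call at the site and the 4-byte cmpxchg8b as replacement, the model reports a change and leaves one trailing NOP. With the workaround, where both slots hold the same 4 bytes, the write still takes place but the content of the site stays the same.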

Now I'm stuck.

Version numbers and stuff:

  • virtualbox-ose 3.0.8 (backport) running on Debian lenny, both 32bit and 64bit hosts
  • Also verified on virtualbox-ose 3.1.2 running on Debian squeeze (32bit)
  • guest kernel 2.6.32.6 (always 32bit), built using Debian squeeze. The config is attached
  • Host CPU (32bit):
    model name      : Intel(R) Pentium(R) M processor 1600MHz
    flags           : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 tm pbe bts est tm2
    
  • Host CPU (64bit):
    model name      :          Intel(R) Atom(TM) CPU  330   @ 1.60GHz
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl pni monitor ds_cpl tm2 ssse3 cx16 xtpr lahf_lm
    

Attachments

DOTconfig (45.6 KB) - added by cbiedl 4 years ago.
Kernel configuration
VBox.log (67.4 KB) - added by cbiedl 4 years ago.
VBox.log (note to self: chmod 644 eases upload)

Change History

Changed 4 years ago by cbiedl

Kernel configuration

Changed 4 years ago by cbiedl

VBox.log (note to self: chmod 644 eases upload)

comment:1 Changed 4 years ago by sandervl73

  • Component changed from other to VMM/RAW

comment:2 Changed 4 years ago by sandervl73

Thanks for the details. The patching is either missed or we're hitting a mishandled edge case here. I'll have a look next week. Is there a bootable ISO image with this kernel that I could use?

comment:3 Changed 4 years ago by cbiedl

An ISO image is available; it took a while to tame mkisofs. Drop me a line on how I should send it to you (e-mail, saft, IRC-DCC, whatever). Size is about 3 MB.

Note that the image contains the kernel only; if the boot process does not stall, due to a different environment or the like, the kernel will panic since there is no root filesystem.

comment:4 Changed 4 years ago by sandervl73

I've sent you an email using the address you've registered this account with.

comment:5 Changed 4 years ago by sandervl73

Confirmed, but only on a 32-bit host.

comment:6 Changed 4 years ago by sandervl73

  • Summary changed from 2.6.32 stalls as guest in virtualbox to 2.6.32 stalls as guest in virtualbox -> fixed in SVN/3.1.4

For some reason it didn't hang on my Win7 x64 host, but the same problem exists there. Fixed now. Thanks for the report.

comment:7 Changed 4 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

comment:8 Changed 4 years ago by arand

Is there a discrete change fixing this issue that could be backported as a patch?

comment:9 Changed 4 years ago by cbiedl

As this has reportedly been fixed in SVN, the patch should be extractable from there. However, the revision numbers were not reported and I could not find a reference to this ticket in the commit messages.

Besides that, thanks a lot to the VirtualBox guys, the speed of both the reaction and the fix was overwhelming. Much appreciated.

comment:10 Changed 4 years ago by frank

The relevant changesets are r26129 and r26130.

comment:11 Changed 4 years ago by arand

Thanks!

(A test patch is available for Ubuntu 9.10; see https://bugs.launchpad.net/ubuntu/+source/virtualbox-ose/+bug/510571 for info.)

comment:12 Changed 4 years ago by CHli

  • Status changed from closed to reopened
  • Resolution fixed deleted

Running VirtualBox 3.1.4r57640 on 32-bit Windows XP SP3 on a Pentium 4, with Debian Squeeze as the guest OS, still crashes for me.

I tried disabling ACPI, but it didn't help.

I get a "BUG soft lockup" every N seconds and the boot process stops.

Can anyone confirm?

comment:13 Changed 4 years ago by CHli

Never mind, and sorry for the noise. After removing a probably bad memory DIMM, the system boots again.

comment:14 Changed 4 years ago by frank

  • Status changed from reopened to closed
  • Resolution set to fixed