[vbox-dev] Linux-4.12-rc1 and rc2 not working as vbox guest ?

Thu Jun 1 14:48:55 GMT 2017

Hi,

On 01-06-17 16:11, Frank Mehnert wrote:
> Hi Hans,
> 
> On Thursday, 1 June 2017 15:59:10 CEST Hans de Goede wrote:
>> Hi,
>>
>> On 01-06-17 15:42, Frank Mehnert wrote:
>>> Hi Hans,
>>>
>>> On Thursday, 1 June 2017 14:22:29 CEST Hans de Goede wrote:
>>>>> Again, this is not a VirtualBox bug although this panic will probably not
>>>>> happen on real hardware.
>>>>
>>>> [...]
>>>>
>>>> Now as for disagreeing with this not being a VirtualBox bug,
>>>> even with 4.11 the oops is a problem and is IMHO a VirtualBox bug,
>>>
>>> why do you think it's a VirtualBox bug?
>>
>> Because it does not happen on real hardware and VirtualBox
>> is supposed to faithfully emulate real hardware.
> 
> Any proof that it does not happen on real hardware? :-) I think that is
> more of a guess.
> 
>> But I understand that modern hardware is so complex that it may be
>> an issue on the kernel side...
> 
> Right.
> 
>>>> I was sorta surprised when you said earlier in this thread that this
>>>> is normal and can be ignored. Oopses are never normal and should never
>>>> be ignored. Fedora / RHEL users will get a notification that the kernel
>>>> has hit a bug as soon as they log in because of the oops, and oopses
>>>> are being actively tracked for people who opt in to reporting crashes
>>>> to our backtrace server:
>>>>
>>>> https://retrace.fedoraproject.org/faf/problems/
>>>>
>>>> Now maybe this really is 2 kernel bugs, 1. The oops turning into a
>>>> hang with 4.12 and 2. The oops happening at all, but an oops is never
>>>> normal and really must be fixed. In case you believe that the oops
>>>> happening at all is also a kernel bug please file a bug for that
>>>> too and cross-reference the 2.
>>>
>>> I think so. The Linux kernel oops is about a wrong expectation of the
>>> kernel: VirtualBox exposes the size of the XSAVE area (same value as
>>> on the host, 0x3c0 = 960 bytes). VBox does not expose bits 0...3 of
>>> the eax register (XSAVEOPT, XSAVEC/XRSTOR, XGETBV, XSAVES/XRSTORS
>>> all not available). In general it's all about do_extra_xstate_size_checks().
>>> That function is a bit hard to follow without any additional debug
>>> output. All I can see is that
>>>
>>>     fpu_kernel_xstate_size = 0x440 while
>>>     paranoid_xstate_size   = 0x240
>>>
>>> and therefore the XSTATE_WARN_ON() statement triggers. As I said, the
>>> kernel expects different values in the CPUID registers and warns about
>>> mismatched expectations. I'm 99% sure that the kernel would run correctly
>>> if this warning is just ignored. The warning is paranoia.
>>
>> Ok, thank you for the explanation, let's see what the upstream devs
>> have to say. My main desire here is to not have the kernel oops,
>> independent of whether the fix is on the vbox or kernel side.
>>
>>> That XSAVE stuff is work in progress and future versions of VirtualBox
>>> will change the exposed features.
>>>
>>> I will create bug reports.
>>
>> Thank you.
> 
> Created https://bugzilla.kernel.org/show_bug.cgi?id=195961 for the first
> issue (hanging kernel during early init). I need to sort out my thoughts
> before I report the 2nd problem (the actual XSTATE warning).

Thanks, I've added myself to the Cc, and I've added a comment
to clarify that this is a kernel regression over 4.11.

Regards,

Hans
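
Note on the size comparison quoted above: the check Frank describes
compares a size recomputed from the enabled XSAVE components with the
size the CPU reports. Below is a minimal sketch of that idea, not the
kernel's actual code. The function names and the component table are
hypothetical, and the offsets and sizes are assumed to be the usual
standard-format values; on a real CPU they come from CPUID leaf 0xD.

```python
# Illustrative model only. The table below is an ASSUMPTION (typical
# standard-format XSAVE layout), not values read from the guest in this thread.

LEGACY_AREA_SIZE = 512    # x87/SSE region, as written by FXSAVE
XSAVE_HEADER_SIZE = 64    # XSAVE header that follows the legacy region

# Hypothetical table: feature bit -> (offset, size) in bytes.
ASSUMED_COMPONENTS = {
    2: (576, 256),    # AVX state (upper halves of the YMM registers)
    3: (960, 64),     # MPX bound registers
    4: (1024, 64),    # MPX bound config/status
}


def expected_xstate_size(enabled_bits, components=ASSUMED_COMPONENTS):
    """Recompute the save-area size from the enabled extended components."""
    size = LEGACY_AREA_SIZE + XSAVE_HEADER_SIZE    # 576 == 0x240
    for bit in sorted(enabled_bits):
        if bit not in components:
            continue    # unknown component: ignored in this sketch
        offset, length = components[bit]
        # In the standard (non-compacted) format each component sits at a
        # fixed offset, so the area ends after the last enabled component.
        size = max(size, offset + length)
    return size


def sizes_mismatch(reported_size, enabled_bits):
    """True when the recomputed size differs from the reported one."""
    return expected_xstate_size(enabled_bits) != reported_size


if __name__ == "__main__":
    print(hex(expected_xstate_size(set())))        # no extended components
    print(hex(expected_xstate_size({2, 3, 4})))    # AVX plus both MPX parts
    print(sizes_mismatch(0x440, set()))
```

In this model, a recomputed size of 0x240 is what comes out when no
extended component is counted, while 0x440 matches AVX plus both MPX
components. That reproduces the two numbers quoted above, but whether
it is what actually happens inside the guest is not established here.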