VirtualBox

Opened 7 years ago

Last modified 7 years ago

#16429 new defect

Hard lockup of Linux guests on Mac Host

Reported by: drewmoseley
Owned by:
Component: other
Version: VirtualBox 5.1.14
Keywords: lockup, hang
Cc:
Guest type: Linux
Host type: other

Description

I am running an Ubuntu 16.04 guest on a MacOS Sierra host. In my guest I am running a Yocto project build (https://yoctoproject.org/). This is a fairly substantial workload and eventually the guest apparently locks up. I've tried to get kernel logs out of dmesg, /var/log/syslog and the VT1 console, but when the system is hung, nothing is printed to any of those locations indicating any trouble.

I have seen this happen also on Ubuntu 14 and CentOS 7 targets.

It is reproducible 100% of the time with the Yocto project build. When it is hung, the GUI is still displayed but nothing updates, not even the clock app.

I tested with both VirtualBox 5.1.14 and 5.0.32 and could reproduce the issue with both versions. I'm testing now with the 5.1.15 nightly build.

Attachments (6)

VBox.log (123.3 KB ) - added by drewmoseley 7 years ago.
VBox-single-core-no-APIC-no-failure.log (83.0 KB ) - added by drewmoseley 7 years ago.
Log file of my single build that did _not_ lock up
VBox-debug-info-apic-ioapic.log (81.5 KB ) - added by drewmoseley 7 years ago.
VBox log from hung session in debug mode.
lockup-info-apic-ioapic.png (242.2 KB ) - added by drewmoseley 7 years ago.
Debug window output of hung system.
multicore-hang-VBox.log (125.8 KB ) - added by drewmoseley 7 years ago.
VBox.log file from multicore guest hang
multicore-hang-debug-commands.txt (35.2 KB ) - added by drewmoseley 7 years ago.
Debug command output from multicore guest hang


Change History (24)

comment:1 by Frank Mehnert, 7 years ago

The absolute minimum information required is the VBox.log file of such a VM session.

Also, is your guest working on shared folders and is your guest configured to use more than 1 VCPU?

comment:2 by drewmoseley, 7 years ago

No luck with the 5.1.15 test build.

I do use shared folders as well as NFS.

I'll run a new build and trigger the failure and attach the VBox.log file.

After that I will try as a single-core system with IOAPIC disabled. This seems similar to https://www.virtualbox.org/ticket/15529.

comment:3 by drewmoseley, 7 years ago

Attached is a VBox.log file when the system is in the hung state.

Next I'll try a single core system with IOAPIC disabled.
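
For anyone repeating this test from the host command line, here is a minimal sketch of how the single-core, IOAPIC-disabled configuration might be applied with VBoxManage. It assumes `VBoxManage` is on the PATH and that the VM is powered off; the helper names and the VM name in the usage note are hypothetical and not taken from this ticket.

```python
import subprocess


def single_core_cmd(vm_name, ioapic_enabled):
    """Build a VBoxManage command for a 1-VCPU guest with the I/O APIC on or off."""
    ioapic = "on" if ioapic_enabled else "off"
    return ["VBoxManage", "modifyvm", vm_name, "--cpus", "1", "--ioapic", ioapic]


def apply_setting(cmd):
    """Run the command on the host; raises CalledProcessError if VBoxManage fails."""
    subprocess.run(cmd, check=True)
```

Calling `apply_setting(single_core_cmd("yocto-builder", False))` would then switch a VM named "yocto-builder" (a placeholder) to one VCPU with the I/O APIC disabled.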

by drewmoseley, 7 years ago

Attachment: VBox.log added

comment:4 by drewmoseley, 7 years ago

I forgot to point out that I reverted to the released 5.1.14 r112924 build for the current testing. The 5.1.15 test build seemed to have some issues with host-only networking stability.

by drewmoseley, 7 years ago

Attachment: VBox-single-core-no-APIC-no-failure.log added

Log file of my single build that did _not_ lock up

comment:5 by drewmoseley, 7 years ago

Initial single core build with no IOAPIC succeeded. I'm going to bump up the parallelism of my build but leave it at unicore to stress the system a bit.

comment:6 by drewmoseley, 7 years ago

No failures with increased parallelism in my builds.

Using a single core build with IOAPIC explicitly enabled I am also unable to reproduce the system hang.

I'll put it back to a multi-core system and run with debugging enabled.

by drewmoseley, 7 years ago

Attachment: VBox-debug-info-apic-ioapic.log added

VBox log from hung session in debug mode.

by drewmoseley, 7 years ago

Attachment: lockup-info-apic-ioapic.png added

Debug window output of hung system.

comment:7 by drewmoseley, 7 years ago

I've attached the log and debug window output from a hung system with debugging enabled. For some reason copy/paste doesn't work from the debug window in MacOS so I attached it as a png.

comment:8 by Frank Mehnert, 7 years ago

To me this looks like a duplicate of #14089.

comment:9 by drewmoseley, 7 years ago

I'm not doing significant vboxsf traffic. I can try disabling all file shares and removing the module to see if that has any effect.

Also, I get no kernel stack traces so it's hard to say for certain whether this is the same or not.

comment:10 by drewmoseley, 7 years ago

I removed all shared folders, unloaded the vboxsf module, and the system still hung.

comment:11 by Frank Mehnert, 7 years ago

Thanks for the additional testing. So it doesn't look like this is related to shared folders at all. Let me summarize: You observe the guest hang with multi-VCPU guests as well as with a single-VCPU guest if the I/O-APIC is enabled. With I/O-APIC disabled you don't see the hang.

comment:12 by Frank Mehnert, 7 years ago

Could you provide additional information: Try to reproduce the hang with 1 VCPU with I/O-APIC enabled, then enter

info ioapic
detect
dmesg
info cpum
info cpumguest

into the debug console. I saw you already found out how to enable it. We would prefer if you could copy+paste the output from the console into a separate file and attach it to this ticket as a text file, not a .png. Please also attach the VBox.log file from the VM session in which you ran these commands.

It would also help if you could repeat the test with 2 VCPUs and attach the same set of files (info ... from above + corresponding VBox.log file).
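
If copy and paste from the debug window is a problem, the same information can possibly be gathered from a host terminal instead. The sketch below is not part of the request above; it assumes that `VBoxManage debugvm` on the installed version offers the `info`, `osdetect` and `osdmesg` subcommands, and that `osdetect` and `osdmesg` correspond to the console's `detect` and `dmesg`. The function names and the output path are hypothetical.

```python
import subprocess

# Debug console commands from the list above, mapped to what are assumed to be
# the matching `VBoxManage debugvm` subcommands (verify on your version).
DEBUGVM_SUBCOMMANDS = [
    ["info", "ioapic"],
    ["osdetect"],
    ["osdmesg"],
    ["info", "cpum"],
    ["info", "cpumguest"],
]


def debugvm_commands(vm_name):
    """Return one VBoxManage argument list per requested debug command."""
    return [["VBoxManage", "debugvm", vm_name] + sub for sub in DEBUGVM_SUBCOMMANDS]


def collect(vm_name, out_path):
    """Run each command against the running (hung) VM and save all output as text."""
    with open(out_path, "w") as out:
        for cmd in debugvm_commands(vm_name):
            result = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            # Record the command line so each section of the file is identifiable.
            out.write("$ " + " ".join(cmd) + "\n")
            out.write(result.stdout + "\n")
```

Running `collect("yocto-builder", "debug-commands.txt")` while the guest is hung would produce a single text file suitable for attaching; "yocto-builder" is a placeholder VM name.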

comment:13 by drewmoseley, 7 years ago

Frank, your summary is close. With a single core system, I have no guest OS hang with ioapic enabled or disabled. With a multicore system it seems to hang every time with no kernel messages or dmesg output on the guest OS.

As an experiment, I tried disabling nested paging and the system did not lock up. It ran extremely slowly though so I gave up after about 36h of building and my build was still only about 30% complete.
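
For reference, nested paging can be toggled from the host command line as well as from the GUI. This is a minimal sketch, assuming `VBoxManage` is on the PATH and the VM is powered off; the helper names and the VM name in the usage note are hypothetical.

```python
import subprocess


def nested_paging_cmd(vm_name, enabled):
    """Build the VBoxManage command that turns nested paging on or off."""
    state = "on" if enabled else "off"
    return ["VBoxManage", "modifyvm", vm_name, "--nestedpaging", state]


def run_on_host(cmd):
    """Execute the command; raises CalledProcessError if VBoxManage reports an error."""
    subprocess.run(cmd, check=True)
```

`run_on_host(nested_paging_cmd("yocto-builder", False))` would disable nested paging for a VM named "yocto-builder" (a placeholder), and passing `True` restores it after the experiment.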

I'll try to get the debug info you requested. When I tried to get the debug info previously, copy/paste on my MacOS Sierra host from the debug window did not work. It worked for other apps, thus the png file rather than a text file. I'll give it another shot. Are there any tricks to using copy/paste from the debug window on MacOS Sierra?

comment:14 by Socratis, 7 years ago (in reply to comment:13)

When I tried to get the debug info previously, copy/paste on my MacOS Sierra host from the debug window did not work. It worked for other apps, thus the png file rather than a text file. I'll give it another shot. Are there any tricks to using copy/paste from the debug window on MacOS Sierra?

Doesn't the right-click menu work on 10.12? Or the ⌘-A, ⌘-C shortcuts? I'm on 10.9.5 and it works as advertised.

comment:15 by drewmoseley, 7 years ago

Rereading the above, Frank, your summary is identical to mine. Apologies for confusing things.

Regarding copy/paste in the debug window, it seems to be working now. Not sure why I had issues with it before. I'm in the process of pulling those logs for all three scenarios and will post them as soon as they are available.


by drewmoseley, 7 years ago

Attachment: multicore-hang-VBox.log added

VBox.log file from multicore guest hang

by drewmoseley, 7 years ago

Attachment: multicore-hang-debug-commands.txt added

Debug command output from multicore guest hang

comment:16 by drewmoseley, 7 years ago

I've attached both the VBox.log and Debug window output from my multicore guest when the system is in the hung state.

So far I've been unable to reproduce with a unicore guest. Is there value in providing the output of those systems?

comment:17 by Frank Mehnert, 7 years ago

Hmm, guest multicore hang but multicore-hang-debug-commands.txt shows only 1 VCPU?

comment:18 by drewmoseley, 7 years ago

Apologies for going silent on this. I got sidetracked by starting a new job. I am unable to reproduce this in one or two build tries using v5.1.22. I'm not sure if that fixed it or just made it more difficult to reproduce but for the time being I am unblocked.
