VirtualBox

Ticket #16429 (new defect)

Opened 6 years ago

Last modified 5 years ago

Hard lockup of Linux guests on Mac Host

Reported by: drewmoseley
Owned by:
Component: other
Version: VirtualBox 5.1.14
Keywords: lockup, hang
Cc:
Guest type: Linux
Host type: other

Description

I am running an Ubuntu 16.04 guest on a MacOS Sierra host. In my guest I am running a Yocto Project build (https://yoctoproject.org/). This is a fairly substantial workload and eventually the guest apparently locks up. I've tried to get kernel logs out of dmesg, /var/log/syslog and the VT1 console, but when the system is hung, nothing is printed to any of those locations to indicate trouble.

I have also seen this happen on Ubuntu 14 and CentOS 7 guests.

It is reproducible 100% of the time with the Yocto project build. When it is hung, the GUI is still displayed but nothing updates, not even the clock app.

I tested with both VirtualBox 5.1.14 and 5.0.32 and could reproduce the issue with both versions. I'm now testing with the 5.1.15 nightly build.

Attachments

VBox.log (123.3 KB) - added by drewmoseley 6 years ago.
VBox-single-core-no-APIC-no-failure.log (83.0 KB) - added by drewmoseley 6 years ago.
    Log file of my single build that did _not_ lock up
VBox-debug-info-apic-ioapic.log (81.5 KB) - added by drewmoseley 6 years ago.
    VBox log from hung session in debug mode.
lockup-info-apic-ioapic.png (242.2 KB) - added by drewmoseley 6 years ago.
    Debug window output of hung system.
multicore-hang-VBox.log (125.8 KB) - added by drewmoseley 6 years ago.
    VBox.log file from multicore guest hang
multicore-hang-debug-commands.txt (35.2 KB) - added by drewmoseley 6 years ago.
    Debug command output from multicore guest hang

Change History

comment:1 Changed 6 years ago by frank

The absolute minimum required information is the VBox.log file of such a VM session.

Also, is your guest working on shared folders and is your guest configured to use more than 1 VCPU?

comment:2 Changed 6 years ago by drewmoseley

No luck with the 5.1.15 test build.

I do use shared folders as well as NFS.

I'll run a new build and trigger the failure and attach the VBox.log file.

After that I will try as a single-core system with IOAPIC disabled. This seems similar to https://www.virtualbox.org/ticket/15529.

comment:3 Changed 6 years ago by drewmoseley

Attached is a VBox.log file when the system is in the hung state.

Next I'll try a single core system with IOAPIC disabled.

Changed 6 years ago by drewmoseley

comment:4 Changed 6 years ago by drewmoseley

I forgot to point out that I reverted to the released 5.1.14 r112924 build for the current testing. The 5.1.15 test build seemed to have some issues with host-only networking stability.

Changed 6 years ago by drewmoseley

Log file of my single build that did _not_ lock up

comment:5 Changed 6 years ago by drewmoseley

Initial single core build with no IOAPIC succeeded. I'm going to bump up the parallelism of my build but leave it at unicore to stress the system a bit.

comment:6 Changed 6 years ago by drewmoseley

No failures with increased parallelism in my builds.

Using a single core build with IOAPIC explicitly enabled I am also unable to reproduce the system hang.

I'll put it back to a multi-core system and run with debugging enabled.

Changed 6 years ago by drewmoseley

VBox log from hung session in debug mode.

Changed 6 years ago by drewmoseley

Debug window output of hung system.

comment:7 Changed 6 years ago by drewmoseley

I've attached the log and debug window output from a hung system with debugging enabled. For some reason copy/paste doesn't work from the debug window in MacOS so I attached it as a png.

comment:8 Changed 6 years ago by frank

To me this looks like a duplicate of #14089.

comment:9 Changed 6 years ago by drewmoseley

I'm not doing significant vboxsf traffic. I can try disabling all file shares and removing the module to see if that has any effect.

Also, I get no kernel stack traces so it's hard to say for certain whether this is the same or not.

comment:10 Changed 6 years ago by drewmoseley

I removed all shared folders, unloaded the vboxsf module, and the system still hung.

comment:11 Changed 6 years ago by frank

Thanks for the additional testing. So it doesn't look like this is related to shared folders at all. Let me summarize: You observe the guest hang with multi-VCPU guests as well as with a single-VCPU guest if the I/O-APIC is enabled. With I/O-APIC disabled you don't see the hang.

comment:12 Changed 6 years ago by frank

Could you provide additional information: Try to reproduce the hang with 1 VCPU with I/O-APIC enabled, then enter

info ioapic
detect
dmesg
info cpum
info cpumguest

into the debug console. I saw you already found out how to enable it. We would prefer it if you could copy and paste the output from the console into a separate file and attach it to this ticket as a text file, not a .png. Please also attach the VBox.log file of the VM session in which you ran the commands.

It would also help if you could repeat the test with 2 VCPUs and attach the same set of files (the info ... output from above plus the corresponding VBox.log file).
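As an alternative to copying text out of the debug window, the info items requested above can probably be captured from a host terminal with the VBoxManage debugvm subcommand. The following is a minimal sketch, assuming VBoxManage is on the host's PATH and the VM is running; the VM name "yocto-build" and the helper names are hypothetical. It covers only the three info items, since detect and dmesg are debugger console commands rather than info items.

```python
import subprocess
from typing import List

# Info items requested in this ticket. "detect" and "dmesg" are debugger
# console commands, not info items, so they are deliberately left out.
INFO_ITEMS = ["ioapic", "cpum", "cpumguest"]


def debugvm_info_command(vm_name: str, item: str) -> List[str]:
    """Build the argument list for `VBoxManage debugvm <vm> info <item>`."""
    return ["VBoxManage", "debugvm", vm_name, "info", item]


def save_info(vm_name: str, out_path: str) -> None:
    """Run each info command and write the combined output to a text file.

    Assumption: VBoxManage is on PATH and the VM named vm_name is running.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        for item in INFO_ITEMS:
            result = subprocess.run(
                debugvm_info_command(vm_name, item),
                capture_output=True,
                text=True,
                check=True,
            )
            out.write(f"=== info {item} ===\n")
            out.write(result.stdout)
            out.write("\n")
```

Calling save_info("yocto-build", "debug-info.txt") while the guest is hung would then produce a single text file suitable for attaching to the ticket.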

comment:13 follow-up: ↓ 14 Changed 6 years ago by drewmoseley

Frank, your summary is close. With a single core system, I have no guest OS hang with ioapic enabled or disabled. With a multicore system it seems to hang every time with no kernel messages or dmesg output on the guest OS.

As an experiment, I tried disabling nested paging and the system did not lock up. It ran extremely slowly though so I gave up after about 36h of building and my build was still only about 30% complete.
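For anyone repeating these tests, the guest configurations tried so far could be scripted roughly as follows. This is a minimal sketch, not the exact procedure used in this ticket: the VM name "yocto-build" is hypothetical, the multicore count of 2 is an assumption (the ticket does not state how many VCPUs the hanging guest had), and nested paging is assumed to have been left on except in the one experiment above. The settings only apply to a powered-off VM.

```python
from typing import Dict, List, NamedTuple


class GuestConfig(NamedTuple):
    cpus: int
    ioapic: bool
    nested_paging: bool


def on_off(flag: bool) -> str:
    """VBoxManage expects the literal strings 'on' and 'off'."""
    return "on" if flag else "off"


def modifyvm_command(vm_name: str, cfg: GuestConfig) -> List[str]:
    """Build the `VBoxManage modifyvm` argument list for one configuration."""
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--cpus", str(cfg.cpus),
        "--ioapic", on_off(cfg.ioapic),
        "--nestedpaging", on_off(cfg.nested_paging),
    ]


# Configurations reported in this ticket. The value 2 for the multicore
# cases is an assumption; the observed results are noted in the comments.
CONFIGS: Dict[str, GuestConfig] = {
    "single-core, IOAPIC off": GuestConfig(1, False, True),   # no hang
    "single-core, IOAPIC on": GuestConfig(1, True, True),     # no hang
    "multicore": GuestConfig(2, True, True),                  # hangs
    "multicore, no nested paging": GuestConfig(2, True, False),  # no hang, very slow
}

for label, cfg in CONFIGS.items():
    print(label + ": " + " ".join(modifyvm_command("yocto-build", cfg)))
```

The script only prints the commands so they can be reviewed before being run by hand.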

I'll try to get the debug info you requested. When I tried to get the debug info previously, copy/paste on my MacOS Sierra host from the debug window did not work. It worked for other apps, thus the png file rather than a text file. I'll give it another shot. Are there any tricks to using copy/paste from the debug window on MacOS Sierra?

comment:14 in reply to: ↑ 13 Changed 6 years ago by socratis

When I tried to get the debug info previously, copy/paste on my MacOS Sierra host from the debug window did not work. It worked for other apps, thus the png file rather than a text file. I'll give it another shot. Are there any tricks to using copy/paste from the debug window on MacOS Sierra?

Isn't the right-click menu working on 10.12? Or the ⌘-A, ⌘-C shortcuts? I'm on 10.9.5 and it works as advertised.

comment:15 Changed 6 years ago by drewmoseley

Regarding copy/paste in the debug window, it seems to be working now. Not sure why I had issues with it before. I'm in the process of pulling those logs for all three scenarios and will post them as soon as they are available.

Last edited 6 years ago by drewmoseley

Changed 6 years ago by drewmoseley

VBox.log file from multicore guest hang

Changed 6 years ago by drewmoseley

Debug command output from multicore guest hang

comment:16 Changed 6 years ago by drewmoseley

I've attached both the VBox.log and Debug window output from my multicore guest when the system is in the hung state.

So far I've been unable to reproduce with a unicore guest. Is there value in providing the output of those systems?

comment:17 Changed 5 years ago by frank

Hmm, guest multicore hang but multicore-hang-debug-commands.txt shows only 1 VCPU?

comment:18 Changed 5 years ago by drewmoseley

Apologies for going silent on this. I got sidetracked by starting a new job. I am unable to reproduce this in one or two build tries using v5.1.22. I'm not sure whether that release fixed it or just made it more difficult to reproduce, but for the time being I am unblocked.
