VirtualBox

Opened 10 years ago

Closed 7 years ago

#13956 closed defect (fixed)

VM hangs in AHCI when host busy since 4.3.x

Reported by: Bernd Hohman
Owned by:
Component: virtual disk
Version: VirtualBox 4.3.24
Keywords: AHCI, Host I/O Cache
Cc:
Guest type: Linux
Host type: Linux

Description

When the host disk (RAID-5 with slow hard disks) is moderately busy, several Debian Guests with AHCI and Host I/O Cache enabled stop working.

Today it happened between 11:05 and 11:15 GMT to 6 Guests in parallel. Other Guests on the same Host with AHCI and Host I/O Cache disabled complained about AsyncCompletion but continued running.

No information in the Guest logfiles (syslog, kern.log), only VBox.log. No information in the Host logfiles (syslog, kern.log).

More information on the "not working" VMs: 'VBoxManage list runningvms' shows the VM as running. 'VBoxManage controlvm acpipowerbutton' has no effect. 'VBoxManage controlvm poweroff' claims in VBox.log to have stopped the VM, but the command hangs displaying "0%...10%...20%...30%...40%..." and must be interrupted with Ctrl+C; the VBoxHeadless process then has to be killed with "pkill".
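The manual procedure above (try poweroff, interrupt it when it hangs, then kill the process) could be scripted roughly as follows. This is only a sketch: the 60-second timeout, the function names and the pkill pattern are assumptions and not something taken from the logs. It also assumes the VM name appears on the VBoxHeadless command line.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Return True if cmd exited within timeout_s seconds, False if it timed out."""
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return False


def poweroff_or_kill(vm_name, timeout_s=60):
    """Try a normal poweroff; fall back to pkill if the command hangs.

    "returned" only means VBoxManage came back in time, not that it succeeded.
    """
    if run_with_timeout(["VBoxManage", "controlvm", vm_name, "poweroff"], timeout_s):
        return "returned"
    print(f"poweroff of {vm_name} timed out, falling back to pkill", file=sys.stderr)
    # Assumption: the VM name is part of the VBoxHeadless command line.
    subprocess.run(["pkill", "-f", f"VBoxHeadless.*{vm_name}"], check=False)
    return "killed"
```

Calling poweroff_or_kill("blog") would then replace the Ctrl+C step; the VM name "blog" is just an example here.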

All Guests were restarted around 02:30 GMT last night and crashed around 11:10 GMT. The hanging VMs were restarted around 13:30 GMT.

Description of attached files (renamed VBox.log files):

1) bad_blog.log: Guest with AHCI + Host I/O Cache problem. Hang after 08:33.32 runtime, then you can see the poweroff request.

2) bad_dns1.log: Guest with AHCI + Host I/O Cache problem. Hang after 08:33.59 runtime. The Guest process was stopped with 'pkill', that's why it doesn't show the last log line with "AHCI#0P0: Canceled write at offset" as in bad_blog.log.

3) ok_oso.log: Guest with AHCI + no Host I/O Cache. Shows some AsyncCompletion problems but continues working.

4) ok_test.log: Same as (3).

5) Host version info

In the past we disabled Host I/O Cache on all Guests, which simply turned the problem into AsyncCompletion problems and crashes (and caused very bad memory usage too).
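For reference, the Host I/O Cache setting can be switched per storage controller with 'VBoxManage storagectl ... --hostiocache on|off'. A small sketch that only builds the command line; the VM name "blog" and controller name "SATA" used below are placeholders, the real controller name has to be taken from the VM's configuration.

```python
def hostiocache_cmd(vm_name, controller_name, enabled):
    """Build the VBoxManage call that turns Host I/O Cache on or off."""
    return [
        "VBoxManage", "storagectl", vm_name,
        "--name", controller_name,
        "--hostiocache", "on" if enabled else "off",
    ]


# Placeholder names, not taken from the ticket's configuration.
print(" ".join(hostiocache_cmd("blog", "SATA", False)))
```

The list form can be passed directly to subprocess.run() once the names are filled in.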

Attachments (5)

bad_blog.log (74.5 KB) - added by Bernd Hohman 10 years ago.
AHCI + Host I/O Cache crashed
bad_dns1.log (68.1 KB) - added by Bernd Hohman 10 years ago.
AHCI + Host I/O Cache crashed
ok_oso.log (88.9 KB) - added by Bernd Hohman 10 years ago.
AHCI + *no* Host I/O Cache working
ok_test.log (97.7 KB) - added by Bernd Hohman 10 years ago.
AHCI + *no* Host I/O Cache working
version (198 bytes) - added by Bernd Hohman 10 years ago.
Host Version Information


Change History (11)

comment:1 by Bernd Hohman, 10 years ago

This happens on all 3 of our Hosts since we upgraded from 4.2.x to 4.3.x, so it seems to be a regression. I'm switching to the IDE controller wherever possible and will report the result.

comment:2 by Mihai Hanor, 10 years ago

Duplicate of #13105

in reply to:  2 comment:3 by Bernd Hohman, 10 years ago

Replying to mhanor:

Duplicate of #13105

Not really xD - my RAIDs are all fine.

comment:4 by Mihai Hanor, 10 years ago

The cause is the same, regardless of whether it's a bad sector or a timeout due to overload.

comment:5 by Bernd Hohman, 10 years ago

Well, at least changing to IDE doesn't help:

18:30:26.476015 PIIX3 ATA: Ctl#0: RESET, DevSel=0 AIOIf=0 CmdIf0=0xca (87557414 usec ago) CmdIf1=0xa0 (-1 usec ago)
18:30:30.050297 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
18:30:31.588167 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
18:30:31.598047 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
18:30:54.313827 PIIX3 ATA: execution time for ATA command 0xca was 115 seconds
18:30:54.313846 PIIX3 ATA: Ctl#0: finished processing RESET

(I guess the Guest disk went read-only after this.)
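To check other VBox.log files for the same pattern, something like the following could count the "stuck" messages and pull out the slow ATA commands. It is a rough sketch that only matches the two message formats seen in the excerpt above; other VirtualBox versions may word these lines differently.

```python
import re

STUCK = "Async I/O thread probably stuck in operation"
SLOW_CMD = re.compile(r"execution time for ATA command (0x[0-9a-fA-F]+) was (\d+) seconds")


def summarize_ata_stalls(lines):
    """Return (number of 'stuck' messages, list of (command, seconds))."""
    stuck = 0
    slow = []
    for line in lines:
        if STUCK in line:
            stuck += 1
        match = SLOW_CMD.search(line)
        if match:
            slow.append((match.group(1), int(match.group(2))))
    return stuck, slow


# The log lines from the comment above.
SAMPLE = """\
18:30:26.476015 PIIX3 ATA: Ctl#0: RESET, DevSel=0 AIOIf=0 CmdIf0=0xca (87557414 usec ago) CmdIf1=0xa0 (-1 usec ago)
18:30:30.050297 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
18:30:31.588167 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
18:30:31.598047 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
18:30:54.313827 PIIX3 ATA: execution time for ATA command 0xca was 115 seconds
18:30:54.313846 PIIX3 ATA: Ctl#0: finished processing RESET
"""

print(summarize_ata_stalls(SAMPLE.splitlines()))  # (3, [('0xca', 115)])
```

For a real log, pass open("VBox.log") instead of SAMPLE.splitlines().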

comment:6 by Frank Mehnert, 7 years ago

Resolution: fixed
Status: new → closed

Please reopen if still relevant with VBox 5.1.22.
