#8311 reopened defect

SATA AHCI Errors: READ FPDMA QUEUED

Reported by:	andornaut	Owned by:
Component:	other	Version:	VirtualBox 4.0.2
Keywords:		Cc:
Guest type:	Linux	Host type:	Windows

Description (last modified by aeichner)

My guest OS hangs periodically with a harddrive command error followed by an attempt to reset the HD (reproduced with Ubuntu and ArcLinux). This issue does not occur in the host OS (Windows 7), only in the Guest OS.

Environment:

Ubuntu running in a Virtual Box 4.0.2 running on Windows 7 64bit.
Asus G53JW Laptop
Intel X25 SSD HD

Log excerpts:

Jan 17 14:48:32 vm kernel: [ 1513.120252] ata3: hard resetting link
Jan 17 14:48:33 vm kernel: [ 1513.470345] ata3: SATA link up 3.0 Gbps (SStatus
123 SControl 300)
Jan 17 14:48:33 vm kernel: [ 1513.471232] ata3.00: configured for UDMA/133
Jan 17 14:48:33 vm kernel: [ 1513.471241] ata3.00: device reported invalid CHS
sector 0
Jan 17 14:48:33 vm kernel: [ 1513.471255] ata3: EH complete

[ 1513.120200] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[ 1513.120210] ata3.00: failed command: READ FPDMA QUEUED
[ 1513.120222] ata3.00: cmd 60/08:00:30:66:4c/00:00:00:00:00/40 tag 0 ncq 4096
in
[ 1513.120224]          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
(timeout)
[ 1513.120230] ata3.00: status: { DRDY }
[ 1513.120252] ata3: hard resetting link
[ 1513.470345] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1513.471232] ata3.00: configured for UDMA/133
[ 1513.471241] ata3.00: device reported invalid CHS sector 0
[ 1513.471255] ata3: EH complete

Attachments (5)

dmesg-jackieku.txt (48.1 KB ) - added by JackieKu 13 years ago.
VBox.log (102.2 KB ) - added by pot 7 years ago.: Logs of the machine descriped in comment by pot
Host.messages (21.0 KB ) - added by Harry M 7 years ago.: CentOS Virtualbox Host /var/log/messages file
Devuan.messages (59.7 KB ) - added by Harry M 7 years ago.: Devuan (guest VM) /var/log/messages
Devuan-VM.log (512.0 KB ) - added by Harry M 7 years ago.: Virtualbox guest Log for Devuan VM

Download all attachments as: .zip

Change History (21)

comment:1 by cmeyer, 14 years ago

This issue is also occurring for me on Fedora 14 and Ubuntu 10.10. I have tried with using SATA, SCSI and SAS and the error still occurs.

Environment: DELL E6410 Windows 7 64bit

Guest OS's Tested: Ubuntu 10.10 i386 Fedora 14 x86_64

This issue does not seem to occur on a Windows XP 32 bit virtual machine.

comment:2 by andornaut, 14 years ago

I've been able to workaround (not to say "solve") this issue by using the following kernel boot option: nolapic_timer

This workaround is described in this bug: #7619

comment:3 by Frank Mehnert, 14 years ago

Resolution:	→ fixed
Status:	new → closed

Please reopen if still relevant with VBox 4.1.4. From your description this looks like an issue related to the virtual time and there were some fixes in between.

comment:4 by JackieKu, 13 years ago

Resolution:	fixed
Status:	closed → reopened

This issue still happens in 4.1.12 and 4.1.14. Added noapictimer kernel options seems to reduce (but not gone completely) the happen rate.

Several disks got reset when it occurs. I'm using SATA AHCI controller.

Environment:
Host: Windows 7 x64
Guest: Gentoo Linux x64 (linux 3.2.12)
HDD: 6 Seagate 1TB (ST31000524AS and ST31000528AS) whole disk raw access + 1 WD 80GB raw access with some partitions

Last edited 13 years ago by JackieKu (previous) (diff)

by JackieKu, 13 years ago

Attachment:	dmesg-jackieku.txt added

comment:5 by StarNamer, 10 years ago

I recently started testing ZFS using a VirtualBox VM and have started to see this and similar errors. My environment is VirtualBox 4.3.18_Debian r96516 running under Debian (Sid) kernel 3.16.0-4-amd64. The VM is also running kernel 3.16.0-4-amd64 with ZFS/SPL 0.6.3 (from ZFSforLinux). The host system is not reporting any errors.

Apr  6 10:12:59 model-zfs kernel: [290529.896417] ata7.00: exception Emask 0x0 SAct 0x200 SErr 0x0 action 0x6 frozen
Apr  6 10:12:59 model-zfs kernel: [290529.896434] ata7.00: failed command: WRITE FPDMA QUEUED
Apr  6 10:12:59 model-zfs kernel: [290529.896449] ata7.00: cmd 61/01:48:f4:59:00/00:00:00:00:00/40 tag 9 ncq 512 out
Apr  6 10:12:59 model-zfs kernel: [290529.896449]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Apr  6 10:12:59 model-zfs kernel: [290529.896456] ata7.00: status: { DRDY }
Apr  6 10:12:59 model-zfs kernel: [290529.896476] ata7: hard resetting link
Apr  6 10:13:00 model-zfs kernel: [290530.216457] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  6 10:13:00 model-zfs kernel: [290530.217616] ata7.00: configured for UDMA/133
Apr  6 10:13:00 model-zfs kernel: [290530.217633] ata7.00: device reported invalid CHS sector 0
Apr  6 10:13:00 model-zfs kernel: [290530.217661] ata7: EH complete

Apr  6 10:24:56 model-zfs kernel: [291246.888769] ata4.00: exception Emask 0x0 SAct 0x400 SErr 0x0 action 0x6 frozen
Apr  6 10:24:56 model-zfs kernel: [291246.888801] ata4.00: failed command: WRITE FPDMA QUEUED
Apr  6 10:24:56 model-zfs kernel: [291246.888830] ata4.00: cmd 61/19:50:2b:a7:01/00:00:00:00:00/40 tag 10 ncq 12800 out
Apr  6 10:24:56 model-zfs kernel: [291246.888830]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  6 10:24:56 model-zfs kernel: [291246.888852] ata4.00: status: { DRDY }
Apr  6 10:24:56 model-zfs kernel: [291246.888883] ata4: hard resetting link
Apr  6 10:24:57 model-zfs kernel: [291247.248428] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  6 10:24:57 model-zfs kernel: [291247.249216] ata4.00: configured for UDMA/133
Apr  6 10:24:57 model-zfs kernel: [291247.249229] ata4.00: device reported invalid CHS sector 0
Apr  6 10:24:57 model-zfs kernel: [291247.249254] ata4: EH complete

Apr  6 10:32:02 model-zfs kernel: [291672.872331] ata4.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x6 frozen
Apr  6 10:32:02 model-zfs kernel: [291672.872348] ata4.00: failed command: WRITE FPDMA QUEUED
Apr  6 10:32:02 model-zfs kernel: [291672.872364] ata4.00: cmd 61/02:80:92:03:00/00:00:00:00:00/40 tag 16 ncq 1024 out
Apr  6 10:32:02 model-zfs kernel: [291672.872364]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Apr  6 10:32:02 model-zfs kernel: [291672.872371] ata4.00: status: { DRDY }
Apr  6 10:32:02 model-zfs kernel: [291672.872390] ata4: hard resetting link
Apr  6 10:32:03 model-zfs kernel: [291673.193487] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  6 10:32:03 model-zfs kernel: [291673.194270] ata4.00: configured for UDMA/133
Apr  6 10:32:03 model-zfs kernel: [291673.194302] ata4.00: device reported invalid CHS sector 0
Apr  6 10:32:03 model-zfs kernel: [291673.194335] ata4: EH complete

Apr  6 10:36:45 model-zfs kernel: [291955.882996] ata2.00: exception Emask 0x0 SAct 0xc000000 SErr 0x0 action 0x6 frozen
Apr  6 10:36:45 model-zfs kernel: [291955.883014] ata2.00: failed command: READ FPDMA QUEUED
Apr  6 10:36:45 model-zfs kernel: [291955.883028] ata2.00: cmd 60/30:d0:b1:d7:75/00:00:01:00:00/40 tag 26 ncq 24576 in
Apr  6 10:36:45 model-zfs kernel: [291955.883028]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  6 10:36:45 model-zfs kernel: [291955.883036] ata2.00: status: { DRDY }
Apr  6 10:36:45 model-zfs kernel: [291955.883042] ata2.00: failed command: READ FPDMA QUEUED
Apr  6 10:36:45 model-zfs kernel: [291955.883053] ata2.00: cmd 60/10:d8:f1:d7:75/00:00:01:00:00/40 tag 27 ncq 8192 in
Apr  6 10:36:45 model-zfs kernel: [291955.883053]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  6 10:36:45 model-zfs kernel: [291955.883060] ata2.00: status: { DRDY }
Apr  6 10:36:45 model-zfs kernel: [291955.883077] ata2: hard resetting link
Apr  6 10:36:46 model-zfs kernel: [291956.200464] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  6 10:36:46 model-zfs kernel: [291956.201409] ata2.00: configured for UDMA/133
Apr  6 10:36:46 model-zfs kernel: [291956.201422] ata2.00: device reported invalid CHS sector 0
Apr  6 10:36:46 model-zfs kernel: [291956.201428] ata2.00: device reported invalid CHS sector 0
Apr  6 10:36:46 model-zfs kernel: [291956.201454] ata2: EH complete

Is this a known bug or simple the result of ZFS stressing the virtual system too much?

comment:6 by aeichner, 10 years ago

Description:	modified (diff)

Please upload the VBox.log of the problematic VM.

comment:7 by tigreci_2, 8 years ago

Regresion in 5.1.12 r112440(Qt5.6.2), same error ata3.00: failed command: WRITE FPDMA QUEUED ata3.00: cmd 61/88:88;78:8f:a5/03:00:00:00:00/40 tag 17 ncq 462848 out

res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

ata3.00: status: { DRDY }

comment:8 by aeichner, 8 years ago

Please upload a VBox.log of the affected VM.

comment:9 by pot, 7 years ago

I have an apparently similar problem. I have had a series of disk errors inside the guest box. This is the last one.

Host Debian buster
4.14.0-3-amd64
Virtualbox Debian 5.2.8-dfsg-2

Guest Debian 7.11
Linux kernel 3.2.0.5-686-pae

Guest system error log:

Mar  8 07:13:39 ala kernel: [129507.443035] ata4: failed to read log page 10h (errno=-5)
Mar  8 07:13:39 ala kernel: [129507.444165] ata4.00: exception Emask 0x1 SAct 0xc0 SErr 0x0 action 0x0
Mar  8 07:13:39 ala kernel: [129507.445373] ata4.00: irq_stat 0x40000008
Mar  8 07:13:39 ala kernel: [129507.446367] ata4.00: failed command: READ FPDMA QUEUED
Mar  8 07:13:39 ala kernel: [129507.446743] ata4.00: cmd 60/30:30:7e:2c:77/00:00:2f:00:00/40 tag 6 ncq 24576 in
Mar  8 07:13:39 ala kernel: [129507.446743]          res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
Mar  8 07:13:39 ala kernel: [129507.450247] ata4.00: status: { DRDY }
Mar  8 07:13:39 ala kernel: [129507.450473] ata4.00: failed command: READ FPDMA QUEUED
Mar  8 07:13:39 ala kernel: [129507.450733] ata4.00: cmd 60/38:38:be:37:77/00:00:2f:00:00/40 tag 7 ncq 28672 in
Mar  8 07:13:39 ala kernel: [129507.450733]          res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
Mar  8 07:13:39 ala kernel: [129507.451656] ata4.00: status: { DRDY }
Mar  8 07:13:39 ala kernel: [129507.452753] ata4.00: configured for UDMA/133
Mar  8 07:13:39 ala kernel: [129507.453856] ata4: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
Mar  8 07:13:39 ala kernel: [129507.455029] ata4: irq_stat 0x40000008

I have attached virtual box logs.

Last edited 7 years ago by pot (previous) (diff)

by pot, 7 years ago

Attachment:	VBox.log added

Logs of the machine descriped in comment by pot

comment:10 by Harry M, 7 years ago

Add me to the list of victims. I have a CentOS host with vbox 5.2.8 and multiple guests. My Devuan VM ended up with its /var filesystem (LVM ext4) mounted ro sometime after boot. I had run package updates successfully 3 times in the 10 days it had been up without incident, but failed upon an attempt to do a package update (apt uses /var) yesterday.

I ran fsck to fix /var as it had bad free inode and bad free block counts. After a reboot of the VM, all is well again. I can find corresponding errors in both the guest (Devuan) /var/log/messages and the VirtualBox log for the guest VM. Also, I have been able to ascertain that the errors seem to be unknown to the host OS (no related error messages at the same time). This squares with the other reporters' findings.

In the vbox log, note messages beginning at 22:14:24.001378 244:26:07.059612

In the Devuan /var/log/messages file, corresponding messages are around Mar 28 15:22

The host log does not seem to indicate any issues with disk access.

We are 8 hours west of GMT, and I think we are savings time, so actually -7 behind. The VM may be off several seconds from the host (I have them using ntp rather than host time sync).

Last edited 7 years ago by Harry M (previous) (diff)

by Harry M, 7 years ago

Attachment:	Host.messages added

CentOS Virtualbox Host /var/log/messages file

by Harry M, 7 years ago

Attachment:	Devuan.messages added

Devuan (guest VM) /var/log/messages

by Harry M, 7 years ago

Attachment:	Devuan-VM.log added

Virtualbox guest Log for Devuan VM

follow-up: 13 comment:11 by rst07, 7 years ago

the problem occurs if the disk is configured as dynamic, when you transfer the disk to a fixed size, the problem disappears

comment:12 by ckujau, 7 years ago

Same here with VirtualBox-5.2-5.2.20_125813_fedora26-1.x86_64 running on a Fedora 28 host. The VM is running Ubuntu 18.04.1 LTS (Linux 4.15) and has its disks configured as:

$ vboxmanage showvminfo ubuntu0 | grep Stora
Storage Controller Name (0):            SATA
Storage Controller Type (0):            IntelAhci
Storage Controller Instance Number (0): 0
Storage Controller Max Port Count (0):  30
Storage Controller Port Count (0):      3
Storage Controller Bootable (0):        on

On the host, no I/O errors whatsoever are logged, so the storage (an external USB disk) is assumed to be stable. When there's lots of disk I/O going on in the guest, the following is logged to Logs/VBox.log from time to time:

01:33:14.526566 VD#2: Write request was active for 40 seconds
01:33:14.526656 VD#2: Cancelling all active requests
01:33:14.529303 AHCI#0: Port 2 reset
01:33:14.532915 VD#2: Cancelling all active requests

And in the guest then:

Oct 23 14:30:55 ubuntu0 kernel: ata3.00: exception Emask 0x0 SAct 0x7dffffff SErr 0x0 action 0x6 frozen
Oct 23 14:30:55 ubuntu0 kernel: ata3.00: failed command: WRITE FPDMA QUEUED
Oct 23 14:30:55 ubuntu0 kernel: ata3.00: cmd 61/90:00:e8:5c:4d/03:00:08:00:00/40 tag 0 ncq dma 466944 out
                                         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 23 14:30:55 ubuntu0 kernel: ata3.00: status: { DRDY }
[...]
Oct 23 14:31:01 ubuntu0 kernel: ata3.00: device reported invalid CHS sector 0

The rsync process running in the guest (copying data from one vdisk to another) doesn't seem to notice though and is still running, ~~I wonder about data integrity though~~. OK, so the rsync process in the guest ended w/o errors (RC 0) and copied ~70 GB from one disk to another. But seeing these device errors in the guest I'm running now a btrfs scrub on the disk and here the errors are indeed visible:

$ btrfs scrub status /mnt
scrub status for 044722df-de28-4604-a48d-92779769f47c
        scrub started at Tue Oct 23 14:37:33 2018, running for 00:25:32
        total bytes scrubbed: 45.30GiB with 192 errors
        error details: csum=192
        corrected errors: 0, uncorrectable errors: 192, unverified errors: 0

And while scrubbing, the guest's syslog tell me which inodes (and files) have these checksum errors - all Git .pack files. That's exactly the reason why I copied these files from an xfs to a btrfs disk within this guest because git failed to read its objects before and now I know why :-\

Last edited 7 years ago by ckujau (previous) (diff)

in reply to: 11 comment:13 by ckujau, 6 years ago

Replying to rst07:

the problem occurs if the disk is configured as dynamic, when you transfer the disk to a fixed size, the problem disappears

Thanks for the hint! I can confirm that the problem went away with a Fixed variant disk.

comment:14 by coyotte, 6 years ago

Same here virtualbox 5.2 Host and Guest ubuntu 18.04

comment:15 by hwertz, 5 years ago

I've found a usable workaround to be switching my disks in VirtualBox to IDE. The virtual IDE "can" still time out, but IDE's multisector read and write mode max at like 16 sectors (8KB), or maybe 256 sectors (128KB), versus SATA allowing for 64MB or more via larger transfers and native command queueing; that's 1/512th the data in that queue. In days of yore, SATA gave a massive speed increase in VMs compared to IDE (the 8KB transfers plus mucking about with I/O ports meant a lot of virtualization overhead); with VT-X flipped on the difference is low enough; I haven't run some synthetic disk I/O benchmark but in real use I can't tell the difference in terms of speed or CPU usage. If my physical disk I/O is nice, the IDE runs nice; if the disk I/O is slow, the OS in the virtual box (Linux or Windows) simply run through their read and write queues pretty slowly, rather than spitting out large 64MB requests then timing out.

Root cause is the Linux write cache (on the host) becoming quite large, then disk I/O is quite slow during the time that cache flushes out. If any disk I/O from a VirtualBox takes over ~15 seconds, the Linux or Windows OS in the virtual machine assumes the SATA drive failed and counts it as a read or write error.

For a Linux VM, you just add the IDE controller, remove your disk (and CD if you want) from SATA and add them onto IDE. Remove the SATA controller if you want. Done.

For Windows 7, add IDE, boot into 7. Make sure it installs the IDE driver. Shut down and move the disks to IDE, remove SATA if you want.

For 10, google windows 10 sata to ide, and where they talk about mucking about in the BIOS muck about in VirtualBox storage settings instead; you have to add the IDE controller, boot, run a bcdedit command, power down, switch the disks to the IDE controller (and remove SATA if you want), it boots into a safe mode once, then run a bcdedit command again so it boots into normal mode.

Last edited 5 years ago by hwertz (previous) (diff)

comment:16 by PetroCC, 4 years ago

I ran into this but this afternoon on 6.1.22r144080 running on FreeBSD.

I didn't save the guest system logs, because they were just like the above.

I attempted to work around the problem by detaching the disks, removing the SATA controller, adding a SAS controller, and then attaching the disks to that.

That got a little further down the road, but then hit this:

Aug 23 04:27:59 mailserver kernel: [  406.717018] mptscsih: ioc0: attempting task abort! (sc=0000000016c22ccc)
Aug 23 04:27:59 mailserver kernel: [  406.717029] sd 2:0:0:0: [sda] tag#107 CDB: Read(10) 28 00 00 41 1d 80 00 00 08 00
Aug 23 04:27:59 mailserver kernel: [  406.889068] mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=0000000016c22ccc)
Aug 23 04:27:59 mailserver kernel: [  406.889076] mptscsih: ioc0: attempting task abort! (sc=000000001b122e0b)
Aug 23 04:27:59 mailserver kernel: [  406.889085] sd 2:0:0:0: [sda] tag#104 CDB: Write(10) 2a 00 01 04 18 e8 00 00 08 00
Aug 23 04:28:00 mailserver kernel: [  407.056968] mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=000000001b122e0b)
Aug 23 04:28:00 mailserver kernel: [  407.073044] mptscsih: ioc0: attempting target reset! (sc=0000000016c22ccc)
Aug 23 04:28:00 mailserver kernel: [  407.073055] sd 2:0:0:0: [sda] tag#107 CDB: Read(10) 28 00 00 41 1d 80 00 00 08 00
Aug 23 04:28:00 mailserver kernel: [  407.249019] mptscsih: ioc0: target reset: SUCCESS (sc=0000000016c22ccc)
Aug 23 04:28:22 mailserver PackageKit: daemon quit
Aug 23 04:28:22 mailserver systemd[1]: packagekit.service: Succeeded.
Aug 23 04:28:30 mailserver kernel: [  437.436685] mptscsih: ioc0: attempting task abort! (sc=000000005ef0906a)
Aug 23 04:28:30 mailserver kernel: [  437.436694] sd 2:0:0:0: [sda] tag#96 CDB: Write(10) 2a 00 00 c5 19 80 00 00 80 00
Aug 23 04:28:30 mailserver kernel: [  437.772752] mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=000000005ef0906a)
Aug 23 04:28:30 mailserver kernel: [  437.827952] mptscsih: ioc0: attempting target reset! (sc=000000001b122e0b)
Aug 23 04:28:30 mailserver kernel: [  437.827976] sd 2:0:0:0: [sda] tag#108 CDB: Write(10) 2a 00 01 04 18 e8 00 00 08 00
Aug 23 04:28:30 mailserver kernel: [  437.992679] mptscsih: ioc0: target reset: SUCCESS (sc=000000001b122e0b)
Aug 23 04:29:01 mailserver kernel: [  468.160319] mptscsih: ioc0: attempting target reset! (sc=000000005ef0906a)
Aug 23 04:29:01 mailserver kernel: [  468.160328] sd 2:0:0:0: [sda] tag#97 CDB: Write(10) 2a 00 00 c5 19 80 00 00 80 00

I'm guessing that the bug is a little further down the stack than the SATA code.

Note: See TracTickets for help on using tickets.

Download in other formats: