Opened 14 years ago
Last modified 3 years ago
#8311 reopened defect
SATA AHCI Errors: READ FPDMA QUEUED
Reported by: | andornaut | Owned by: | |
---|---|---|---|
Component: | other | Version: | VirtualBox 4.0.2 |
Keywords: | Cc: | ||
Guest type: | Linux | Host type: | Windows |
Description (last modified by )
My guest OS hangs periodically with a harddrive command error followed by an attempt to reset the HD (reproduced with Ubuntu and ArcLinux). This issue does not occur in the host OS (Windows 7), only in the Guest OS.
Environment:
- Ubuntu running in a Virtual Box 4.0.2 running on Windows 7 64bit.
- Asus G53JW Laptop
- Intel X25 SSD HD
Log excerpts:
Jan 17 14:48:32 vm kernel: [ 1513.120252] ata3: hard resetting link Jan 17 14:48:33 vm kernel: [ 1513.470345] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Jan 17 14:48:33 vm kernel: [ 1513.471232] ata3.00: configured for UDMA/133 Jan 17 14:48:33 vm kernel: [ 1513.471241] ata3.00: device reported invalid CHS sector 0 Jan 17 14:48:33 vm kernel: [ 1513.471255] ata3: EH complete
[ 1513.120200] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen [ 1513.120210] ata3.00: failed command: READ FPDMA QUEUED [ 1513.120222] ata3.00: cmd 60/08:00:30:66:4c/00:00:00:00:00/40 tag 0 ncq 4096 in [ 1513.120224] res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) [ 1513.120230] ata3.00: status: { DRDY } [ 1513.120252] ata3: hard resetting link [ 1513.470345] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 1513.471232] ata3.00: configured for UDMA/133 [ 1513.471241] ata3.00: device reported invalid CHS sector 0 [ 1513.471255] ata3: EH complete
Attachments (5)
Change History (21)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
I've been able to workaround (not to say "solve") this issue by using the following kernel boot option: nolapic_timer
This workaround is described in this bug: #7619
comment:3 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Please reopen if still relevant with VBox 4.1.4. From your description this looks like an issue related to the virtual time and there were some fixes in between.
comment:4 by , 12 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
This issue still happens in 4.1.12 and 4.1.14. Added noapictimer kernel options seems to reduce (but not gone completely) the happen rate.
Several disks got reset when it occurs. I'm using SATA AHCI controller.
Environment:
Host: Windows 7 x64
Guest: Gentoo Linux x64 (linux 3.2.12)
HDD: 6 Seagate 1TB (ST31000524AS and ST31000528AS) whole disk raw access + 1 WD 80GB raw access with some partitions
by , 12 years ago
Attachment: | dmesg-jackieku.txt added |
---|
comment:5 by , 9 years ago
I recently started testing ZFS using a VirtualBox VM and have started to see this and similar errors. My environment is VirtualBox 4.3.18_Debian r96516 running under Debian (Sid) kernel 3.16.0-4-amd64. The VM is also running kernel 3.16.0-4-amd64 with ZFS/SPL 0.6.3 (from ZFSforLinux). The host system is not reporting any errors.
Apr 6 10:12:59 model-zfs kernel: [290529.896417] ata7.00: exception Emask 0x0 SAct 0x200 SErr 0x0 action 0x6 frozen Apr 6 10:12:59 model-zfs kernel: [290529.896434] ata7.00: failed command: WRITE FPDMA QUEUED Apr 6 10:12:59 model-zfs kernel: [290529.896449] ata7.00: cmd 61/01:48:f4:59:00/00:00:00:00:00/40 tag 9 ncq 512 out Apr 6 10:12:59 model-zfs kernel: [290529.896449] res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) Apr 6 10:12:59 model-zfs kernel: [290529.896456] ata7.00: status: { DRDY } Apr 6 10:12:59 model-zfs kernel: [290529.896476] ata7: hard resetting link Apr 6 10:13:00 model-zfs kernel: [290530.216457] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Apr 6 10:13:00 model-zfs kernel: [290530.217616] ata7.00: configured for UDMA/133 Apr 6 10:13:00 model-zfs kernel: [290530.217633] ata7.00: device reported invalid CHS sector 0 Apr 6 10:13:00 model-zfs kernel: [290530.217661] ata7: EH complete Apr 6 10:24:56 model-zfs kernel: [291246.888769] ata4.00: exception Emask 0x0 SAct 0x400 SErr 0x0 action 0x6 frozen Apr 6 10:24:56 model-zfs kernel: [291246.888801] ata4.00: failed command: WRITE FPDMA QUEUED Apr 6 10:24:56 model-zfs kernel: [291246.888830] ata4.00: cmd 61/19:50:2b:a7:01/00:00:00:00:00/40 tag 10 ncq 12800 out Apr 6 10:24:56 model-zfs kernel: [291246.888830] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 6 10:24:56 model-zfs kernel: [291246.888852] ata4.00: status: { DRDY } Apr 6 10:24:56 model-zfs kernel: [291246.888883] ata4: hard resetting link Apr 6 10:24:57 model-zfs kernel: [291247.248428] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Apr 6 10:24:57 model-zfs kernel: [291247.249216] ata4.00: configured for UDMA/133 Apr 6 10:24:57 model-zfs kernel: [291247.249229] ata4.00: device reported invalid CHS sector 0 Apr 6 10:24:57 model-zfs kernel: [291247.249254] ata4: EH complete Apr 6 10:32:02 model-zfs kernel: [291672.872331] ata4.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x6 frozen Apr 6 10:32:02 model-zfs kernel: [291672.872348] ata4.00: failed command: WRITE FPDMA QUEUED Apr 6 10:32:02 model-zfs kernel: [291672.872364] ata4.00: cmd 61/02:80:92:03:00/00:00:00:00:00/40 tag 16 ncq 1024 out Apr 6 10:32:02 model-zfs kernel: [291672.872364] res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) Apr 6 10:32:02 model-zfs kernel: [291672.872371] ata4.00: status: { DRDY } Apr 6 10:32:02 model-zfs kernel: [291672.872390] ata4: hard resetting link Apr 6 10:32:03 model-zfs kernel: [291673.193487] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Apr 6 10:32:03 model-zfs kernel: [291673.194270] ata4.00: configured for UDMA/133 Apr 6 10:32:03 model-zfs kernel: [291673.194302] ata4.00: device reported invalid CHS sector 0 Apr 6 10:32:03 model-zfs kernel: [291673.194335] ata4: EH complete Apr 6 10:36:45 model-zfs kernel: [291955.882996] ata2.00: exception Emask 0x0 SAct 0xc000000 SErr 0x0 action 0x6 frozen Apr 6 10:36:45 model-zfs kernel: [291955.883014] ata2.00: failed command: READ FPDMA QUEUED Apr 6 10:36:45 model-zfs kernel: [291955.883028] ata2.00: cmd 60/30:d0:b1:d7:75/00:00:01:00:00/40 tag 26 ncq 24576 in Apr 6 10:36:45 model-zfs kernel: [291955.883028] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 6 10:36:45 model-zfs kernel: [291955.883036] ata2.00: status: { DRDY } Apr 6 10:36:45 model-zfs kernel: [291955.883042] ata2.00: failed command: READ FPDMA QUEUED Apr 6 10:36:45 model-zfs kernel: [291955.883053] ata2.00: cmd 60/10:d8:f1:d7:75/00:00:01:00:00/40 tag 27 ncq 8192 in Apr 6 10:36:45 model-zfs kernel: [291955.883053] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 6 10:36:45 model-zfs kernel: [291955.883060] ata2.00: status: { DRDY } Apr 6 10:36:45 model-zfs kernel: [291955.883077] ata2: hard resetting link Apr 6 10:36:46 model-zfs kernel: [291956.200464] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Apr 6 10:36:46 model-zfs kernel: [291956.201409] ata2.00: configured for UDMA/133 Apr 6 10:36:46 model-zfs kernel: [291956.201422] ata2.00: device reported invalid CHS sector 0 Apr 6 10:36:46 model-zfs kernel: [291956.201428] ata2.00: device reported invalid CHS sector 0 Apr 6 10:36:46 model-zfs kernel: [291956.201454] ata2: EH complete
Is this a known bug or simple the result of ZFS stressing the virtual system too much?
comment:6 by , 9 years ago
Description: | modified (diff) |
---|
Please upload the VBox.log of the problematic VM.
comment:7 by , 8 years ago
Regresion in 5.1.12 r112440(Qt5.6.2), same error ata3.00: failed command: WRITE FPDMA QUEUED ata3.00: cmd 61/88:88;78:8f:a5/03:00:00:00:00/40 tag 17 ncq 462848 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
comment:9 by , 7 years ago
I have an apparently similar problem. I have had a series of disk errors inside the guest box. This is the last one.
Host Debian buster 4.14.0-3-amd64 Virtualbox Debian 5.2.8-dfsg-2
Guest Debian 7.11 Linux kernel 3.2.0.5-686-pae
Guest system error log:
Mar 8 07:13:39 ala kernel: [129507.443035] ata4: failed to read log page 10h (errno=-5) Mar 8 07:13:39 ala kernel: [129507.444165] ata4.00: exception Emask 0x1 SAct 0xc0 SErr 0x0 action 0x0 Mar 8 07:13:39 ala kernel: [129507.445373] ata4.00: irq_stat 0x40000008 Mar 8 07:13:39 ala kernel: [129507.446367] ata4.00: failed command: READ FPDMA QUEUED Mar 8 07:13:39 ala kernel: [129507.446743] ata4.00: cmd 60/30:30:7e:2c:77/00:00:2f:00:00/40 tag 6 ncq 24576 in Mar 8 07:13:39 ala kernel: [129507.446743] res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error) Mar 8 07:13:39 ala kernel: [129507.450247] ata4.00: status: { DRDY } Mar 8 07:13:39 ala kernel: [129507.450473] ata4.00: failed command: READ FPDMA QUEUED Mar 8 07:13:39 ala kernel: [129507.450733] ata4.00: cmd 60/38:38:be:37:77/00:00:2f:00:00/40 tag 7 ncq 28672 in Mar 8 07:13:39 ala kernel: [129507.450733] res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error) Mar 8 07:13:39 ala kernel: [129507.451656] ata4.00: status: { DRDY } Mar 8 07:13:39 ala kernel: [129507.452753] ata4.00: configured for UDMA/133 Mar 8 07:13:39 ala kernel: [129507.453856] ata4: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4 Mar 8 07:13:39 ala kernel: [129507.455029] ata4: irq_stat 0x40000008
I have attached virtual box logs.
comment:10 by , 6 years ago
Add me to the list of victims. I have a CentOS host with vbox 5.2.8 and multiple guests. My Devuan VM ended up with its /var filesystem (LVM ext4) mounted ro sometime after boot. I had run package updates successfully 3 times in the 10 days it had been up without incident, but failed upon an attempt to do a package update (apt uses /var) yesterday.
I ran fsck to fix /var as it had bad free inode and bad free block counts. After a reboot of the VM, all is well again. I can find corresponding errors in both the guest (Devuan) /var/log/messages and the VirtualBox log for the guest VM. Also, I have been able to ascertain that the errors seem to be unknown to the host OS (no related error messages at the same time). This squares with the other reporters' findings.
In the vbox log, note messages beginning at 22:14:24.001378 244:26:07.059612
In the Devuan /var/log/messages file, corresponding messages are around Mar 28 15:22
The host log does not seem to indicate any issues with disk access.
We are 8 hours west of GMT, and I think we are savings time, so actually -7 behind. The VM may be off several seconds from the host (I have them using ntp rather than host time sync).
follow-up: 13 comment:11 by , 6 years ago
the problem occurs if the disk is configured as dynamic, when you transfer the disk to a fixed size, the problem disappears
comment:12 by , 6 years ago
Same here with VirtualBox-5.2-5.2.20_125813_fedora26-1.x86_64
running on a Fedora 28 host. The VM is running Ubuntu 18.04.1 LTS (Linux 4.15) and has its disks configured as:
$ vboxmanage showvminfo ubuntu0 | grep Stora Storage Controller Name (0): SATA Storage Controller Type (0): IntelAhci Storage Controller Instance Number (0): 0 Storage Controller Max Port Count (0): 30 Storage Controller Port Count (0): 3 Storage Controller Bootable (0): on
On the host, no I/O errors whatsoever are logged, so the storage (an external USB disk) is assumed to be stable. When there's lots of disk I/O going on in the guest, the following is logged to Logs/VBox.log
from time to time:
01:33:14.526566 VD#2: Write request was active for 40 seconds 01:33:14.526656 VD#2: Cancelling all active requests 01:33:14.529303 AHCI#0: Port 2 reset 01:33:14.532915 VD#2: Cancelling all active requests
And in the guest then:
Oct 23 14:30:55 ubuntu0 kernel: ata3.00: exception Emask 0x0 SAct 0x7dffffff SErr 0x0 action 0x6 frozen Oct 23 14:30:55 ubuntu0 kernel: ata3.00: failed command: WRITE FPDMA QUEUED Oct 23 14:30:55 ubuntu0 kernel: ata3.00: cmd 61/90:00:e8:5c:4d/03:00:08:00:00/40 tag 0 ncq dma 466944 out res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Oct 23 14:30:55 ubuntu0 kernel: ata3.00: status: { DRDY } [...] Oct 23 14:31:01 ubuntu0 kernel: ata3.00: device reported invalid CHS sector 0
The rsync
process running in the guest (copying data from one vdisk to another) doesn't seem to notice though and is still running, I wonder about data integrity though. OK, so the rsync
process in the guest ended w/o errors (RC 0) and copied ~70 GB from one disk to another. But seeing these device errors in the guest I'm running now a btrfs scrub on the disk and here the errors are indeed visible:
$ btrfs scrub status /mnt scrub status for 044722df-de28-4604-a48d-92779769f47c scrub started at Tue Oct 23 14:37:33 2018, running for 00:25:32 total bytes scrubbed: 45.30GiB with 192 errors error details: csum=192 corrected errors: 0, uncorrectable errors: 192, unverified errors: 0
And while scrubbing, the guest's syslog tell me which inodes (and files) have these checksum errors - all Git .pack
files. That's exactly the reason why I copied these files from an xfs
to a btrfs
disk within this guest because git
failed to read its objects before and now I know why :-\
comment:13 by , 6 years ago
Replying to rst07:
the problem occurs if the disk is configured as dynamic, when you transfer the disk to a fixed size, the problem disappears
Thanks for the hint! I can confirm that the problem went away with a Fixed
variant disk.
comment:15 by , 5 years ago
I've found a usable workaround to be switching my disks in VirtualBox to IDE. The virtual IDE "can" still time out, but IDE's multisector read and write mode max at like 16 sectors (8KB), or maybe 256 sectors (128KB), versus SATA allowing for 64MB or more via larger transfers and native command queueing; that's 1/512th the data in that queue. In days of yore, SATA gave a massive speed increase in VMs compared to IDE (the 8KB transfers plus mucking about with I/O ports meant a lot of virtualization overhead); with VT-X flipped on the difference is low enough; I haven't run some synthetic disk I/O benchmark but in real use I can't tell the difference in terms of speed or CPU usage. If my physical disk I/O is nice, the IDE runs nice; if the disk I/O is slow, the OS in the virtual box (Linux or Windows) simply run through their read and write queues pretty slowly, rather than spitting out large 64MB requests then timing out.
Root cause is the Linux write cache (on the host) becoming quite large, then disk I/O is quite slow during the time that cache flushes out. If any disk I/O from a VirtualBox takes over ~15 seconds, the Linux or Windows OS in the virtual machine assumes the SATA drive failed and counts it as a read or write error.
For a Linux VM, you just add the IDE controller, remove your disk (and CD if you want) from SATA and add them onto IDE. Remove the SATA controller if you want. Done.
For Windows 7, add IDE, boot into 7. Make sure it installs the IDE driver. Shut down and move the disks to IDE, remove SATA if you want.
For 10, google windows 10 sata to ide, and where they talk about mucking about in the BIOS muck about in VirtualBox storage settings instead; you have to add the IDE controller, boot, run a bcdedit command, power down, switch the disks to the IDE controller (and remove SATA if you want), it boots into a safe mode once, then run a bcdedit command again so it boots into normal mode.
comment:16 by , 3 years ago
I ran into this but this afternoon on 6.1.22r144080 running on FreeBSD.
I didn't save the guest system logs, because they were just like the above.
I attempted to work around the problem by detaching the disks, removing the SATA controller, adding a SAS controller, and then attaching the disks to that.
That got a little further down the road, but then hit this:
Aug 23 04:27:59 mailserver kernel: [ 406.717018] mptscsih: ioc0: attempting task abort! (sc=0000000016c22ccc) Aug 23 04:27:59 mailserver kernel: [ 406.717029] sd 2:0:0:0: [sda] tag#107 CDB: Read(10) 28 00 00 41 1d 80 00 00 08 00 Aug 23 04:27:59 mailserver kernel: [ 406.889068] mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=0000000016c22ccc) Aug 23 04:27:59 mailserver kernel: [ 406.889076] mptscsih: ioc0: attempting task abort! (sc=000000001b122e0b) Aug 23 04:27:59 mailserver kernel: [ 406.889085] sd 2:0:0:0: [sda] tag#104 CDB: Write(10) 2a 00 01 04 18 e8 00 00 08 00 Aug 23 04:28:00 mailserver kernel: [ 407.056968] mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=000000001b122e0b) Aug 23 04:28:00 mailserver kernel: [ 407.073044] mptscsih: ioc0: attempting target reset! (sc=0000000016c22ccc) Aug 23 04:28:00 mailserver kernel: [ 407.073055] sd 2:0:0:0: [sda] tag#107 CDB: Read(10) 28 00 00 41 1d 80 00 00 08 00 Aug 23 04:28:00 mailserver kernel: [ 407.249019] mptscsih: ioc0: target reset: SUCCESS (sc=0000000016c22ccc) Aug 23 04:28:22 mailserver PackageKit: daemon quit Aug 23 04:28:22 mailserver systemd[1]: packagekit.service: Succeeded. Aug 23 04:28:30 mailserver kernel: [ 437.436685] mptscsih: ioc0: attempting task abort! (sc=000000005ef0906a) Aug 23 04:28:30 mailserver kernel: [ 437.436694] sd 2:0:0:0: [sda] tag#96 CDB: Write(10) 2a 00 00 c5 19 80 00 00 80 00 Aug 23 04:28:30 mailserver kernel: [ 437.772752] mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=000000005ef0906a) Aug 23 04:28:30 mailserver kernel: [ 437.827952] mptscsih: ioc0: attempting target reset! (sc=000000001b122e0b) Aug 23 04:28:30 mailserver kernel: [ 437.827976] sd 2:0:0:0: [sda] tag#108 CDB: Write(10) 2a 00 01 04 18 e8 00 00 08 00 Aug 23 04:28:30 mailserver kernel: [ 437.992679] mptscsih: ioc0: target reset: SUCCESS (sc=000000001b122e0b) Aug 23 04:29:01 mailserver kernel: [ 468.160319] mptscsih: ioc0: attempting target reset! (sc=000000005ef0906a) Aug 23 04:29:01 mailserver kernel: [ 468.160328] sd 2:0:0:0: [sda] tag#97 CDB: Write(10) 2a 00 00 c5 19 80 00 00 80 00
I'm guessing that the bug is a little further down the stack than the SATA code.
This issue is also occurring for me on Fedora 14 and Ubuntu 10.10. I have tried with using SATA, SCSI and SAS and the error still occurs.
Environment: DELL E6410 Windows 7 64bit
Guest OS's Tested: Ubuntu 10.10 i386 Fedora 14 x86_64
This issue does not seem to occur on a Windows XP 32 bit virtual machine.