VirtualBox

Ticket #8311 (reopened defect)

Opened 8 years ago

Last modified 6 weeks ago

SATA AHCI Errors: READ FPDMA QUEUED

Reported by: andornaut Owned by:
Priority: major Component: other
Version: VirtualBox 4.0.2 Keywords:
Cc: Guest type: Linux
Host type: Windows

Description (last modified by aeichner) (diff)

My guest OS hangs periodically with a harddrive command error followed by an attempt to reset the HD (reproduced with Ubuntu and ArcLinux). This issue does not occur in the host OS (Windows 7), only in the Guest OS.

Environment:

  • Ubuntu running in a Virtual Box 4.0.2 running on Windows 7 64bit.
  • Asus G53JW Laptop
  • Intel X25 SSD HD

Log excerpts:

Jan 17 14:48:32 vm kernel: [ 1513.120252] ata3: hard resetting link
Jan 17 14:48:33 vm kernel: [ 1513.470345] ata3: SATA link up 3.0 Gbps (SStatus
123 SControl 300)
Jan 17 14:48:33 vm kernel: [ 1513.471232] ata3.00: configured for UDMA/133
Jan 17 14:48:33 vm kernel: [ 1513.471241] ata3.00: device reported invalid CHS
sector 0
Jan 17 14:48:33 vm kernel: [ 1513.471255] ata3: EH complete
[ 1513.120200] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[ 1513.120210] ata3.00: failed command: READ FPDMA QUEUED
[ 1513.120222] ata3.00: cmd 60/08:00:30:66:4c/00:00:00:00:00/40 tag 0 ncq 4096
in
[ 1513.120224]          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
(timeout)
[ 1513.120230] ata3.00: status: { DRDY }
[ 1513.120252] ata3: hard resetting link
[ 1513.470345] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1513.471232] ata3.00: configured for UDMA/133
[ 1513.471241] ata3.00: device reported invalid CHS sector 0
[ 1513.471255] ata3: EH complete

Attachments

dmesg-jackieku.txt Download (48.1 KB) - added by JackieKu 7 years ago.
VBox.log Download (102.2 KB) - added by pot 9 months ago.
Logs of the machine descriped in comment by pot
Host.messages Download (21.0 KB) - added by HarryM 9 months ago.
CentOS Virtualbox Host /var/log/messages file
Devuan.messages Download (59.7 KB) - added by HarryM 9 months ago.
Devuan (guest VM) /var/log/messages
Devuan-VM.log Download (512.0 KB) - added by HarryM 9 months ago.
Virtualbox guest Log for Devuan VM

Change History

comment:1 Changed 8 years ago by cmeyer

This issue is also occurring for me on Fedora 14 and Ubuntu 10.10. I have tried with using SATA, SCSI and SAS and the error still occurs.

Environment: DELL E6410 Windows 7 64bit

Guest OS's Tested: Ubuntu 10.10 i386 Fedora 14 x86_64

This issue does not seem to occur on a Windows XP 32 bit virtual machine.

comment:2 Changed 8 years ago by andornaut

I've been able to workaround (not to say "solve") this issue by using the following kernel boot option: nolapic_timer

This workaround is described in this bug: #7619

comment:3 Changed 7 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Please reopen if still relevant with VBox 4.1.4. From your description this looks like an issue related to the virtual time and there were some fixes in between.

comment:4 Changed 7 years ago by JackieKu

  • Status changed from closed to reopened
  • Resolution fixed deleted

This issue still happens in 4.1.12 and 4.1.14. Added noapictimer kernel options seems to reduce (but not gone completely) the happen rate.

Several disks got reset when it occurs. I'm using SATA AHCI controller.

Environment:
Host: Windows 7 x64
Guest: Gentoo Linux x64 (linux 3.2.12)
HDD: 6 Seagate 1TB (ST31000524AS and ST31000528AS) whole disk raw access + 1 WD 80GB raw access with some partitions

Last edited 7 years ago by JackieKu (previous) (diff)

Changed 7 years ago by JackieKu

comment:5 Changed 4 years ago by StarNamer

I recently started testing ZFS using a VirtualBox VM and have started to see this and similar errors. My environment is VirtualBox 4.3.18_Debian r96516 running under Debian (Sid) kernel 3.16.0-4-amd64. The VM is also running kernel 3.16.0-4-amd64 with ZFS/SPL 0.6.3 (from ZFSforLinux). The host system is not reporting any errors.

Apr  6 10:12:59 model-zfs kernel: [290529.896417] ata7.00: exception Emask 0x0 SAct 0x200 SErr 0x0 action 0x6 frozen
Apr  6 10:12:59 model-zfs kernel: [290529.896434] ata7.00: failed command: WRITE FPDMA QUEUED
Apr  6 10:12:59 model-zfs kernel: [290529.896449] ata7.00: cmd 61/01:48:f4:59:00/00:00:00:00:00/40 tag 9 ncq 512 out
Apr  6 10:12:59 model-zfs kernel: [290529.896449]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Apr  6 10:12:59 model-zfs kernel: [290529.896456] ata7.00: status: { DRDY }
Apr  6 10:12:59 model-zfs kernel: [290529.896476] ata7: hard resetting link
Apr  6 10:13:00 model-zfs kernel: [290530.216457] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  6 10:13:00 model-zfs kernel: [290530.217616] ata7.00: configured for UDMA/133
Apr  6 10:13:00 model-zfs kernel: [290530.217633] ata7.00: device reported invalid CHS sector 0
Apr  6 10:13:00 model-zfs kernel: [290530.217661] ata7: EH complete

Apr  6 10:24:56 model-zfs kernel: [291246.888769] ata4.00: exception Emask 0x0 SAct 0x400 SErr 0x0 action 0x6 frozen
Apr  6 10:24:56 model-zfs kernel: [291246.888801] ata4.00: failed command: WRITE FPDMA QUEUED
Apr  6 10:24:56 model-zfs kernel: [291246.888830] ata4.00: cmd 61/19:50:2b:a7:01/00:00:00:00:00/40 tag 10 ncq 12800 out
Apr  6 10:24:56 model-zfs kernel: [291246.888830]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  6 10:24:56 model-zfs kernel: [291246.888852] ata4.00: status: { DRDY }
Apr  6 10:24:56 model-zfs kernel: [291246.888883] ata4: hard resetting link
Apr  6 10:24:57 model-zfs kernel: [291247.248428] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  6 10:24:57 model-zfs kernel: [291247.249216] ata4.00: configured for UDMA/133
Apr  6 10:24:57 model-zfs kernel: [291247.249229] ata4.00: device reported invalid CHS sector 0
Apr  6 10:24:57 model-zfs kernel: [291247.249254] ata4: EH complete

Apr  6 10:32:02 model-zfs kernel: [291672.872331] ata4.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x6 frozen
Apr  6 10:32:02 model-zfs kernel: [291672.872348] ata4.00: failed command: WRITE FPDMA QUEUED
Apr  6 10:32:02 model-zfs kernel: [291672.872364] ata4.00: cmd 61/02:80:92:03:00/00:00:00:00:00/40 tag 16 ncq 1024 out
Apr  6 10:32:02 model-zfs kernel: [291672.872364]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Apr  6 10:32:02 model-zfs kernel: [291672.872371] ata4.00: status: { DRDY }
Apr  6 10:32:02 model-zfs kernel: [291672.872390] ata4: hard resetting link
Apr  6 10:32:03 model-zfs kernel: [291673.193487] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  6 10:32:03 model-zfs kernel: [291673.194270] ata4.00: configured for UDMA/133
Apr  6 10:32:03 model-zfs kernel: [291673.194302] ata4.00: device reported invalid CHS sector 0
Apr  6 10:32:03 model-zfs kernel: [291673.194335] ata4: EH complete

Apr  6 10:36:45 model-zfs kernel: [291955.882996] ata2.00: exception Emask 0x0 SAct 0xc000000 SErr 0x0 action 0x6 frozen
Apr  6 10:36:45 model-zfs kernel: [291955.883014] ata2.00: failed command: READ FPDMA QUEUED
Apr  6 10:36:45 model-zfs kernel: [291955.883028] ata2.00: cmd 60/30:d0:b1:d7:75/00:00:01:00:00/40 tag 26 ncq 24576 in
Apr  6 10:36:45 model-zfs kernel: [291955.883028]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  6 10:36:45 model-zfs kernel: [291955.883036] ata2.00: status: { DRDY }
Apr  6 10:36:45 model-zfs kernel: [291955.883042] ata2.00: failed command: READ FPDMA QUEUED
Apr  6 10:36:45 model-zfs kernel: [291955.883053] ata2.00: cmd 60/10:d8:f1:d7:75/00:00:01:00:00/40 tag 27 ncq 8192 in
Apr  6 10:36:45 model-zfs kernel: [291955.883053]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  6 10:36:45 model-zfs kernel: [291955.883060] ata2.00: status: { DRDY }
Apr  6 10:36:45 model-zfs kernel: [291955.883077] ata2: hard resetting link
Apr  6 10:36:46 model-zfs kernel: [291956.200464] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  6 10:36:46 model-zfs kernel: [291956.201409] ata2.00: configured for UDMA/133
Apr  6 10:36:46 model-zfs kernel: [291956.201422] ata2.00: device reported invalid CHS sector 0
Apr  6 10:36:46 model-zfs kernel: [291956.201428] ata2.00: device reported invalid CHS sector 0
Apr  6 10:36:46 model-zfs kernel: [291956.201454] ata2: EH complete

Is this a known bug or simple the result of ZFS stressing the virtual system too much?

comment:6 Changed 4 years ago by aeichner

  • Description modified (diff)

Please upload the VBox.log of the problematic VM.

comment:7 Changed 2 years ago by tigreci_2

Regresion in 5.1.12 r112440(Qt5.6.2), same error ata3.00: failed command: WRITE FPDMA QUEUED ata3.00: cmd 61/88:88;78:8f:a5/03:00:00:00:00/40 tag 17 ncq 462848 out

res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

ata3.00: status: { DRDY }

comment:8 Changed 2 years ago by aeichner

Please upload a VBox.log of the affected VM.

comment:9 Changed 9 months ago by pot

I have an apparently similar problem. I have had a series of disk errors inside the guest box. This is the last one.

Host Debian buster
4.14.0-3-amd64
Virtualbox Debian 5.2.8-dfsg-2
Guest Debian 7.11
Linux kernel 3.2.0.5-686-pae

Guest system error log:

Mar  8 07:13:39 ala kernel: [129507.443035] ata4: failed to read log page 10h (errno=-5)
Mar  8 07:13:39 ala kernel: [129507.444165] ata4.00: exception Emask 0x1 SAct 0xc0 SErr 0x0 action 0x0
Mar  8 07:13:39 ala kernel: [129507.445373] ata4.00: irq_stat 0x40000008
Mar  8 07:13:39 ala kernel: [129507.446367] ata4.00: failed command: READ FPDMA QUEUED
Mar  8 07:13:39 ala kernel: [129507.446743] ata4.00: cmd 60/30:30:7e:2c:77/00:00:2f:00:00/40 tag 6 ncq 24576 in
Mar  8 07:13:39 ala kernel: [129507.446743]          res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
Mar  8 07:13:39 ala kernel: [129507.450247] ata4.00: status: { DRDY }
Mar  8 07:13:39 ala kernel: [129507.450473] ata4.00: failed command: READ FPDMA QUEUED
Mar  8 07:13:39 ala kernel: [129507.450733] ata4.00: cmd 60/38:38:be:37:77/00:00:2f:00:00/40 tag 7 ncq 28672 in
Mar  8 07:13:39 ala kernel: [129507.450733]          res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
Mar  8 07:13:39 ala kernel: [129507.451656] ata4.00: status: { DRDY }
Mar  8 07:13:39 ala kernel: [129507.452753] ata4.00: configured for UDMA/133
Mar  8 07:13:39 ala kernel: [129507.453856] ata4: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
Mar  8 07:13:39 ala kernel: [129507.455029] ata4: irq_stat 0x40000008

I have attached virtual box logs.

Last edited 9 months ago by pot (previous) (diff)

Changed 9 months ago by pot

Logs of the machine descriped in comment by pot

comment:10 Changed 9 months ago by HarryM

Add me to the list of victims. I have a CentOS host with vbox 5.2.8 and multiple guests. My Devuan VM ended up with its /var filesystem (LVM ext4) mounted ro sometime after boot. I had run package updates successfully 3 times in the 10 days it had been up without incident, but failed upon an attempt to do a package update (apt uses /var) yesterday.

I ran fsck to fix /var as it had bad free inode and bad free block counts. After a reboot of the VM, all is well again. I can find corresponding errors in both the guest (Devuan) /var/log/messages and the VirtualBox log for the guest VM. Also, I have been able to ascertain that the errors seem to be unknown to the host OS (no related error messages at the same time). This squares with the other reporters' findings.

In the vbox log, note messages beginning at 22:14:24.001378 244:26:07.059612

In the Devuan /var/log/messages file, corresponding messages are around Mar 28 15:22

The host log does not seem to indicate any issues with disk access.

We are 8 hours west of GMT, and I think we are savings time, so actually -7 behind. The VM may be off several seconds from the host (I have them using ntp rather than host time sync).

Last edited 9 months ago by HarryM (previous) (diff)

Changed 9 months ago by HarryM

CentOS Virtualbox Host /var/log/messages file

Changed 9 months ago by HarryM

Devuan (guest VM) /var/log/messages

Changed 9 months ago by HarryM

Virtualbox guest Log for Devuan VM

comment:11 follow-up: ↓ 13 Changed 4 months ago by rst07

the problem occurs if the disk is configured as dynamic, when you transfer the disk to a fixed size, the problem disappears

comment:12 Changed 8 weeks ago by ckujau

Same here with VirtualBox-5.2-5.2.20_125813_fedora26-1.x86_64 running on a Fedora 28 host. The VM is running Ubuntu 18.04.1 LTS (Linux 4.15) and has its disks configured as:

$ vboxmanage showvminfo ubuntu0 | grep Stora
Storage Controller Name (0):            SATA
Storage Controller Type (0):            IntelAhci
Storage Controller Instance Number (0): 0
Storage Controller Max Port Count (0):  30
Storage Controller Port Count (0):      3
Storage Controller Bootable (0):        on

On the host, no I/O errors whatsoever are logged, so the storage (an external USB disk) is assumed to be stable. When there's lots of disk I/O going on in the guest, the following is logged to Logs/VBox.log from time to time:

01:33:14.526566 VD#2: Write request was active for 40 seconds
01:33:14.526656 VD#2: Cancelling all active requests
01:33:14.529303 AHCI#0: Port 2 reset
01:33:14.532915 VD#2: Cancelling all active requests

And in the guest then:

Oct 23 14:30:55 ubuntu0 kernel: ata3.00: exception Emask 0x0 SAct 0x7dffffff SErr 0x0 action 0x6 frozen
Oct 23 14:30:55 ubuntu0 kernel: ata3.00: failed command: WRITE FPDMA QUEUED
Oct 23 14:30:55 ubuntu0 kernel: ata3.00: cmd 61/90:00:e8:5c:4d/03:00:08:00:00/40 tag 0 ncq dma 466944 out
                                         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 23 14:30:55 ubuntu0 kernel: ata3.00: status: { DRDY }
[...]
Oct 23 14:31:01 ubuntu0 kernel: ata3.00: device reported invalid CHS sector 0

The rsync process running in the guest (copying data from one vdisk to another) doesn't seem to notice though and is still running, I wonder about data integrity though. OK, so the rsync process in the guest ended w/o errors (RC 0) and copied ~70 GB from one disk to another. But seeing these device errors in the guest I'm running now a  btrfs scrub on the disk and here the errors are indeed visible:

$ btrfs scrub status /mnt
scrub status for 044722df-de28-4604-a48d-92779769f47c
        scrub started at Tue Oct 23 14:37:33 2018, running for 00:25:32
        total bytes scrubbed: 45.30GiB with 192 errors
        error details: csum=192
        corrected errors: 0, uncorrectable errors: 192, unverified errors: 0

And while scrubbing, the guest's syslog tell me which inodes (and files) have these checksum errors - all Git .pack files. That's exactly the reason why I copied these files from an xfs to a btrfs disk within this guest because git failed to read its objects before and now I know why :-\

Last edited 8 weeks ago by ckujau (previous) (diff)

comment:13 in reply to: ↑ 11 Changed 6 weeks ago by ckujau

Replying to rst07:

the problem occurs if the disk is configured as dynamic, when you transfer the disk to a fixed size, the problem disappears

Thanks for the hint! I can confirm that the problem went away with a Fixed variant disk.

Note: See TracTickets for help on using tickets.

www.oracle.com
ContactPrivacy policyTerms of Use