#13022 new defect

I/O errors reported by kernel for device backed by VDI

Reported by:	Chris Murphy	Owned by:
Component:	other	Version:	VirtualBox 4.3.10
Keywords:		Cc:
Guest type:	Linux	Host type:	Mac OS X

Description

/dev/sda backed by a VDI

/dev/sdb backed by a vmdk, created with the command: sudo VBoxManage internalcommands createrawvmdk -filename /Users/chris/VirtualBox\ VMs/Fedora\ Rawhide/320g.vmdk -rawdisk /dev/disk2

/media/sf_chris/ is a Shared Folder whose source is HFS+

During sustained copying from /media/sf_chris and writing to /dev/sdb, the kernel reports spurius read and write errors from /dev/sda which is essentially inactive in this process. Long file copies (10 minutes) eventually result in segfaults, the system will no longer respond to commands and I can't even get kernel messages out of the VM to see what's gone wrong other than it looks like the I/O errors are effectively corrupting the OS as it's running.

[ 222.202808] rawhide.localdomain kernel: sd 0:0:0:0: [sda] Unhandled error code [ 222.203003] rawhide.localdomain kernel: sd 0:0:0:0: [sda] [ 222.203287] rawhide.localdomain kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT [ 222.203407] rawhide.localdomain kernel: sd 0:0:0:0: [sda] CDB: [ 222.203528] rawhide.localdomain kernel: Write(10): 2a 00 00 58 a1 c8 00 02 00 00 [ 222.203644] rawhide.localdomain kernel: end_request: I/O error, dev sda, sector 5808584 [ 222.203746] rawhide.localdomain kernel: BTRFS: bdev /dev/sda3 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0 [ 273.598086] rawhide.localdomain kernel: sd 0:0:0:0: [sda] Unhandled error code [ 273.598179] rawhide.localdomain kernel: sd 0:0:0:0: [sda] [ 273.598255] rawhide.localdomain kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT [ 273.598341] rawhide.localdomain kernel: sd 0:0:0:0: [sda] CDB: [ 273.598437] rawhide.localdomain kernel: Read(10): 28 00 01 17 73 40 00 00 20 00 [ 273.598533] rawhide.localdomain kernel: end_request: I/O error, dev sda, sector 18314048 [ 273.598618] rawhide.localdomain kernel: BTRFS: bdev /dev/sda3 errs: wr 1, rd 1, flush 0, corrupt 0, gen 0

The host kernel messages report no IO related errors. But does report other things related to VirtualBox at the time the problem occurs:

May 8 23:04:16 ming kernel[0]: VBoxDrv: host_vmxoff -> vmx_use_count=0 May 8 23:04:44 ming kernel[0]: VirtualBox (map: 0xffffff8050dfec18) triggered DYLD shared region unnest for map: 0xffffff8050dfec18, region 0x7fff8ba00000->0x7fff8bc00000. While not abnormal for debuggers, this increases system memory footprint until the target exits. May 8 23:04:48 ming kernel[0]: VirtualBoxVM (map: 0xffffff803cc1b138) triggered DYLD shared region unnest for map: 0xffffff803cc1b138, region 0x7fff8ba00000->0x7fff8bc00000. While not abnormal for debuggers, this increases system memory footprint until the target exits. May 8 23:04:48 ming kernel[0]: VBoxDrv: host_vmxon -> vmx_use_count=1 rc=0 May 8 23:04:48 ming kernel[0]: VBoxDrv: host_vmxoff -> vmx_use_count=0 May 8 23:04:48 ming kernel[0]: VBoxDrv: host_vmxon -> vmx_use_count=1 rc=0 May 8 23:13:46 ming kernel[0]: VBoxDrv: host_vmxoff -> vmx_use_count=0

The vboxlog reports a lot of issues related to AHCI:

00:43:34.854982 AHCI#0P0: Cancelled task 7 00:43:34.855589 AHCI#0: Port 0 reset 00:43:50.165909 AHCI#0P0: Cancelled task 31 00:43:50.165928 AHCI#0: Port 0 reset 00:44:05.474666 AHCI#0P0: Cancelled task 31 00:44:05.474685 AHCI#0: Port 0 reset 00:44:35.781838 AHCI#0P0: Cancelled task 31 00:44:35.782117 AHCI#0: Port 0 reset 00:44:43.107712 AsyncCompletion: Task 0x00000131801a40 completed after 98 seconds 00:44:43.107728 AsyncCompletion: Task 0x00000131801ac0 completed after 98 seconds 00:44:43.107734 AsyncCompletion: Task 0x00000131801a80 completed after 98 seconds 00:44:43.108556 AsyncCompletion: Task 0x00000131801b40 completed after 98 seconds 00:44:43.108572 AsyncCompletion: Task 0x00000131801b00 completed after 98 seconds 00:44:43.108605 AsyncCompletion: Task 0x00000131801b80 completed after 98 seconds 00:44:43.148090 AHCI#0P0: Flush request was active for 98 seconds 00:44:43.148101 AHCI#0P0: Canceled flush returned rc=VINF_SUCCESS 00:44:43.148108 AsyncCompletion: Task 0x00000131801bc0 completed after 98 seconds 00:44:43.473168 AHCI#0P0: Flush request was active for 68 seconds 00:44:43.473183 AHCI#0P0: Canceled flush returned rc=VINF_SUCCESS 00:44:44.688476 AHCI#0P0: Flush request was active for 54 seconds 00:44:44.688495 AHCI#0P0: Canceled flush returned rc=VINF_SUCCESS 00:44:44.688838 AHCI#0P0: Flush request was active for 38 seconds 00:44:44.688850 AHCI#0P0: Canceled flush returned rc=VINF_SUCCESS

Since this is reporting the problem is with sda and not sdb, I don't think it's the physical external drive causing the problem, it's either a guest additions bug or how AHCI is implemented in VBox and interacting with Linux.

kernel-3.15.0-0.rc4.git0.1.fc21.x86_64

Attachments (13)

VBox.log (110.5 KB ) - added by Chris Murphy 11 years ago.: VBox.log
VBox.log.2 (119.4 KB ) - added by Chris Murphy 11 years ago.: VBox2.log
journal2.txt (238.7 KB ) - added by Chris Murphy 11 years ago.: journalctl -b copy2
journalctl_exceptionwhilewritingISOsto320GFW800.txt (256.5 KB ) - added by Chris Murphy 11 years ago.: journalctl -b copy1
osxdmesg.txt (10.6 KB ) - added by Chris Murphy 11 years ago.: osx dmesg 3 prior hours, filtered for kernel messages
dmesg_315rc4_01.txt (51.0 KB ) - added by Chris Murphy 11 years ago.: applies to comment 1
VBox_01.log (83.5 KB ) - added by Chris Murphy 11 years ago.: applies to comment 1
dmesg_3142_02.txt (44.9 KB ) - added by Chris Murphy 11 years ago.: applies to comment 2
VBox_02.log (78.2 KB ) - added by Chris Murphy 11 years ago.: applies to comment 2
dmesg_315rc4_04.txt (53.3 KB ) - added by Chris Murphy 11 years ago.
VBox_04.log (101.8 KB ) - added by Chris Murphy 11 years ago.
VBox.2.log (102.8 KB ) - added by Chris Murphy 11 years ago.: vboxlog_vbox4.3.14-linux3.16.0-guestadditions4.3.15.log
VBox.3.log (146.6 KB ) - added by Chris Murphy 11 years ago.: comment 9

Download all attachments as: .zip

Change History (25)

by Chris Murphy, 11 years ago

Attachment:	VBox.log added

VBox.log

by Chris Murphy, 11 years ago

Attachment:	VBox.log.2 added

VBox2.log

by Chris Murphy, 11 years ago

Attachment:	journal2.txt added

journalctl -b copy2

by Chris Murphy, 11 years ago

Attachment:	journalctl_exceptionwhilewritingISOsto320GFW800.txt added

journalctl -b copy1

by Chris Murphy, 11 years ago

Attachment:	osxdmesg.txt added

osx dmesg 3 prior hours, filtered for kernel messages

comment:1 by Chris Murphy, 11 years ago

With 3.15rc4, and guest additions, copying from shared folder /media/sf_chris to raw external device.

Problem does not occur with many small files (tens of thousands, all < 1MB). Problem does occur with many medium sized files (hundreds, 8-15MB each). Problem does occur with few very large files (several, each < 500MB).

Problem is with ata1.00 /dev/sda which is the boot/rootfs volume. It's not involved in the copy. Will attach dmesg_315rc4_01.txt to apply to this comment.

by Chris Murphy, 11 years ago

Attachment:	dmesg_315rc4_01.txt added

applies to comment 1

by Chris Murphy, 11 years ago

Attachment:	VBox_01.log added

applies to comment 1

comment:2 by Chris Murphy, 11 years ago

With 3.14.2-200.fc20.x86_64 and guest additions, the problem is also reproducible. Further the dmesg suggests some file system corruption is occurring as Btrfs is correcting for read errors, e.g.

[ 349.219711] BTRFS: read error corrected: ino 1 off 14071566336 (dev /dev/sda2 sector 29597064)

Will attach dmesg_3142_02.txt and VBox_02.log for this comment. Again the copy is from shared folder /media/sf_chris going to raw external device on /dev/sdc ata3.00. Yet the dmesg shows read/write errors with uninvolved device ata1.00 /dev/sda.

by Chris Murphy, 11 years ago

Attachment:	dmesg_3142_02.txt added

applies to comment 2

by Chris Murphy, 11 years ago

Attachment:	VBox_02.log added

applies to comment 2

comment:3 by Chris Murphy, 11 years ago

Populated a VDI with hundreds of ~10MB files, continued to use Linux 3.14.2 with guest additions installed. Then copied files from the populated VDI (/dev/sdb) to raw external device VMDK (/dev/sdc), and the problem is not reproducible. Sounds like the problem is with using the shared folder (as a source), and is manifesting with read/write errors on /dev/sda.

comment:4 by Chris Murphy, 11 years ago

OK no, it's not shared folders or guest additions. After removing all traces of vbox/VBox and rebooting, I can reproduce the problem by doing a btrfs scrub on /dev/sdc which is backed by VMDK file pointing to a raw external firewire 800 device. After ~30 seconds I start getting read/write errors from /dev/sda. So it sounds like ata3 being very busy is tying up ata1, and normal requests to rootfs are simplying hanging, the SCSI block layer in the kernel is timing out at 30 seconds like it should and then resets the interface. Attaching VBox_04.log and dmesg_315rc4_04.txt.

by Chris Murphy, 11 years ago

Attachment:	dmesg_315rc4_04.txt added

by Chris Murphy, 11 years ago

Attachment:	VBox_04.log added

comment:5 by aeichner, 11 years ago

This looks like an I/O problem with host hard drive actually. The VBox.log shows that several I/O requests almost take up to two minutes to complete which is way beyond the 30 seconds limit of the guest. There is nothing we can do as far as I can see. Do you use FileVault to encrypt your home directory?

comment:6 by Chris Murphy, 11 years ago

Which host drive are you referring to? There are two. /dev/sda and /media/sf_chris are both on an SSD, which is what OS X is booting from, and that's using FileVaul2 so that whole host disk is encrypted. The host remains completely responsive throughout these events so I don't see how this host drive is to blame or I'd see such delays from the host.

The firewire external drive is exclusively used by VBox, I haven't attempted nor wouldn't attempt to use it from the host. But that drive isn't the one the guest kernel is report IO errors for.

Rebooting the host to use the same Linux kernel version used in the vbox guest, I'm unable to reproduce the problem. I can even do a concurrent Btrfs scrub of the external firewire drive while copying multiple large files to or from the same drive. So I don't think it's an IO problem with either of the drives at a hardware level.

comment:7 by Chris Murphy, 11 years ago

copy from to	Linux IO Errors	VBox Log
/media/sf_chris to VDI	no	clean
VDI to VDI	no	clean
VDI to raw external FW	no	dirty [1]
/media/sf_chris to raw external FW	yes [2]	dirty [3]

So far all of the failures involve copies to (or scrubs of) the raw external firewire device, but in every case that is not the device report by either Linux kernel or in VBox.log. Both kernel messages and vbox.log report it's /dev/sda, ata1, AHCI#0P0 which is the boot/rootfs volume for the guest, and is a VDI not raw device.

[1] Multiple instances of:
00:01:58.381819 AsyncCompletion: Task 0x0000010b0d1a40 completed after 28 seconds
00:01:58.381824 AsyncCompletion: Task 0x0000010b0d1940 completed after 27 seconds

[2] Multiple instances of:
[ 222.202808] kernel: sd 0:0:0:0: [sda] Unhandled error code
[ 222.203003] kernel: sd 0:0:0:0: [sda]
[ 222.203287] kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 222.203407] kernel: sd 0:0:0:0: [sda] CDB:
[ 222.203528] kernel: Write(10): 2a 00 00 58 a1 c8 00 02 00 00
[ 222.203644] kernel: end_request: I/O error, dev sda, sector 5808584
But also include flush time outs, read and write errors.

[3] Multiple instances of:
00:55:49.874239 AHCI#0P0: Cancelled task 0
00:55:49.874264 AHCI#0: Port 0 reset
00:55:56.088805 AHCI#0P0: Read request was active for 36 seconds
00:55:56.088823 AHCI#0P0: Canceled read at offset 3597271040 (24576 bytes left) returned rc=VINF_SUCCESS
00:55:56.088837 AsyncCompletion: Task 0x0000010b0d1080 completed after 36 seconds

comment:8 by Chris Murphy, 11 years ago

This still happens with Virtual Box 4.3.14 on OS X host, and linux 3.16.0-1.fc21.x86_64 with VBoxGuestAdditions_4.3.15-95180.iso. It only happens via the raw device when copying from shared folder. Rebooting baremetal the same kernel, copies to the same drive don't have any errors.

Aug 11 22:01:35 twenty1v.localdomain kernel: ata1.00: exception Emask 0x0 SAct 0x200000 SErr 0x0 action 0x6 frozen
Aug 11 22:01:35 twenty1v.localdomain kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 11 22:01:35 twenty1v.localdomain kernel: ata1.00: cmd 60/08:a8:60:c7:1a/00:00:00:00:00/40 tag 21 ncq 4096 in
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:01:35 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:01:35 twenty1v.localdomain kernel: ata1: hard resetting link
Aug 11 22:01:35 twenty1v.localdomain kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 11 22:01:35 twenty1v.localdomain kernel: ata1.00: configured for UDMA/133
Aug 11 22:01:35 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:01:35 twenty1v.localdomain kernel: ata1: EH complete

by Chris Murphy, 11 years ago

Attachment:	VBox.2.log added

vboxlog_vbox4.3.14-linux3.16.0-guestadditions4.3.15.log

comment:9 by Chris Murphy, 11 years ago

And then this just happened on a 1.5 GB file copy from shared folder to vmdk raw device (FireWire 400 hard drive). I will separately attach a new vbox.log, the events in that log time stamp 00:40:37.023014 AHCI#0: Port 0 reset is when the below events occurred in the guest OS.

Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: NCQ disabled due to excessive errors
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:00:f8:a0:2a/00:00:00:00:00/40 tag 0 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:08:90:a0:2a/00:00:00:00:00/40 tag 1 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:10:48:a0:2a/00:00:00:00:00/40 tag 2 ncq 4096 out
                                                      res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:18:70:9f:2a/00:00:00:00:00/40 tag 3 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:20:a0:9e:2a/00:00:00:00:00/40 tag 4 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:28:88:9e:2a/00:00:00:00:00/40 tag 5 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:30:60:9d:2a/00:00:00:00:00/40 tag 6 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/50:38:a0:9a:2a/00:00:00:00:00/40 tag 7 ncq 40960 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/28:40:70:9a:2a/00:00:00:00:00/40 tag 8 ncq 20480 out
                                                      res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:48:e0:95:2a/00:00:00:00:00/40 tag 9 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:50:f8:93:2a/00:00:00:00:00/40 tag 10 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/10:58:c0:93:2a/00:00:00:00:00/40 tag 11 ncq 8192 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:60:a8:93:2a/00:00:00:00:00/40 tag 12 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:68:88:93:2a/00:00:00:00:00/40 tag 13 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:70:a8:92:2a/00:00:00:00:00/40 tag 14 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:78:08:92:2a/00:00:00:00:00/40 tag 15 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:80:30:91:2a/00:00:00:00:00/40 tag 16 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:88:a0:90:2a/00:00:00:00:00/40 tag 17 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:90:70:90:2a/00:00:00:00:00/40 tag 18 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:98:60:90:2a/00:00:00:00:00/40 tag 19 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:a0:30:90:2a/00:00:00:00:00/40 tag 20 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:a8:f0:8f:2a/00:00:00:00:00/40 tag 21 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:b0:d8:8f:2a/00:00:00:00:00/40 tag 22 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:b8:a0:8f:2a/00:00:00:00:00/40 tag 23 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:c0:70:8f:2a/00:00:00:00:00/40 tag 24 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/10:c8:50:8f:2a/00:00:00:00:00/40 tag 25 ncq 8192 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:d0:28:8f:2a/00:00:00:00:00/40 tag 26 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:d8:a8:8e:2a/00:00:00:00:00/40 tag 27 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:e0:80:8e:2a/00:00:00:00:00/40 tag 28 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:e8:60:8e:2a/00:00:00:00:00/40 tag 29 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd 61/08:f0:50:8e:2a/00:00:00:00:00/40 tag 30 ncq 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1: hard resetting link
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: configured for UDMA/133
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1: EH complete
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: failed command: WRITE DMA
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: cmd ca/00:08:50:8e:2a/00:00:00:00:00/e0 tag 14 dma 4096 out
                                                      res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: status: { DRDY }
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1: hard resetting link
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: configured for UDMA/133
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 22:15:06 twenty1v.localdomain kernel: ata1: EH complete

by Chris Murphy, 11 years ago

Attachment:	VBox.3.log added

comment 9

comment:10 by Chris Murphy, 11 years ago

This problem appears resolvable by disabling block device NCQ with this:

echo 1 > /sys/block/sdb/device/queue_depth

It seems to me the bug is that vbox is telling the kernel this device supports NCQ; when either it doesn't, or doesn't pass it through correctly.

[    1.235799] twenty1v.localdomain kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    1.237877] twenty1v.localdomain kernel: ata2.00: ATA-6: VBOX HARDDISK, 1.0, max UDMA/133
[    1.239316] twenty1v.localdomain kernel: ata2.00: 976773168 sectors, multi 128: LBA48 NCQ (depth 31/32)
[    1.241620] twenty1v.localdomain kernel: ata2.00: configured for UDMA/133
[    1.243500] twenty1v.localdomain kernel: scsi 1:0:0:0: Direct-Access     ATA      VBOX HARDDISK    1.0  PQ: 0 ANSI: 5
[    1.246480] twenty1v.localdomain kernel: sd 1:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[    1.246646] twenty1v.localdomain kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
[    1.249312] twenty1v.localdomain kernel: sd 1:0:0:0: [sdb] Write Protect is off
[    1.250106] twenty1v.localdomain kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    1.250164] twenty1v.localdomain kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.252394] twenty1v.localdomain kernel:  sdb: unknown partition table
[    1.253511] twenty1v.localdomain kernel: sd 1:0:0:0: [sdb] Attached SCSI disk

The virtual SATA controller should instead inform the kernel that NCQ is disabled/not supported, and then line three would show a queue depth of 1/32 instead of 31/32.

comment:11 by Megachip, 10 years ago

Anything new on it? Seems to have a similar problem :(

comment:12 by aeichner, 10 years ago

It could be possible that your host can't cope with I/o load produced by the guest. As a workaround you could try to limit the number of requests the guest starts with

VBoxManage setextradata <VM name> "VBoxInternal/Devices/ahci/0/Config/CmdSlotsAvail" <nr>

<nr> needs to be a number between 1 and 32 indicating the number of requests which can be active simultaneously.

Note: See TracTickets for help on using tickets.

Download in other formats: