VirtualBox

Ticket #16677 (new defect) — at Initial Version

Opened 2 years ago

Last modified 2 years ago

A reliable way to hang AioMgr0-N when 'fs readahead'(hdparm -a) is on (and is 256(KiB)) but only when 'Use Host I/O Cache' is disabled, for IDE(iso) and SATA(vdi) devices

Reported by: xftroxgpx Owned by:
Component: virtual disk Version: VirtualBox 5.1.18
Keywords: Cc:
Guest type: Linux Host type: Linux

Description

A reliable way to hang AioMgr0-N when 'fs readahead'(hdparm -a) is on (and is 256(KiB)) but only when 'Use Host I/O Cache' is disabled

in virtualbox 5.1.18 r114002 (currently latest available) on Arch Linux 64bit, host OS. Basically, reading from guest SATA HDD and/or IDE CD will hang AioMgr0-N(until killed) on host OS and guest OS will fail to read/boot etc.

This seems to be virtualbox settings-agnostic, except for the 'Use Host I/O Cache' which has to be disabled for this to occurr AND guest OS readahead has to be enabled (usually set to 256KiB; able to repro with 16KiB too, with different dd 'skip' settings). So, this doesn't ocurr when hdparm -a0 /dev/sda was executed, or any value between 0 and 7, inclusive, (keeps readahead 'off')

Note:

hdparm -a0 /dev/sda

seems to do the same as

# echo 0 > /sys/class/block/sda/queue/read_ahead_kb

Steps to reproduce this issue on a guest HDD/vdi (as opposed to a guest CD/IDE/iso which is in the 'next section'):

  1. get archlinux-2017.03.01-dual.iso (or any current/latest iso, BUT then the dd repro will most likely not work at the same 'skip'!)

However according to this comment from frank https://www.virtualbox.org/ticket/15281#comment:8 , I shouldn't be posting any such links here, but then reproducing this(with accuracy) is going to be really hard without them, so I'm going to be obscuring them with base64, hopefully that's a good compromise between not posting them and posting them...

Pick a download location(476.0 MB):

archlinux webpage with torrent/magnet links, located here:

aHR0cHM6Ly93d3cuYXJjaGxpbnV4Lm9yZy9yZWxlbmcvcmVsZWFzZXMvMjAxNy4wMy4wMS8=

(my)google drive, directory with the iso file:

aHR0cHM6Ly9kcml2ZS5nb29nbGUuY29tL29wZW4/aWQ9MEI2UC05VXlKRmp6S1ZsWkhUa1E0YUZWbGRUUQ==

sha1sum for that iso is:

f426866ca632a35a3eeae8e4080cff25ec8da614  archlinux-2017.03.01-dual.iso

  1. get my VM

the reproduction steps only work with my VM with that specific vdi file; if that vdi gets cloned(even if to same format: vdi) it will not reproduce anymore! The IDE CD repro. steps will still work and make it hang reliably though (see 'next section')

Pick one download location(1.9G):

dropbox:

aHR0cHM6Ly93d3cuZHJvcGJveC5jb20vc2gvaHFiMGIwNDB4aTYyMDN2L0FBRE50RVF5Z0tGU1duNnpRaDNZS1VVcmE/ZGw9MA==

(my)google drive:

aHR0cHM6Ly9kcml2ZS5nb29nbGUuY29tL2RyaXZlL2ZvbGRlcnMvMEI2UC05VXlKRmp6S1Rrc3lZM0pZWlhSbU1Xcz91c3A9c2hhcmluZw==

download these 3 files: arch3.vbox arch3.vdi checksums

md5,sha1,sha256 and sha512 for those two are:

1aa8124771fd7c8761435ff26ee2d55b  ./arch3.vbox
f15c97a939cda4c6bdd413fb52a9519d  ./arch3.vdi
9c1a10c0ab0eb945b677faee22ef9ab2d1ba8511  ./arch3.vbox
a9ed9c0c3542b9d174a4b3fa2b89d1fd72f249f3  ./arch3.vdi
7ac7315a292311f19f59af86bbba687885813d8da7a3f263a0bb2031369fe1f1  ./arch3.vbox
d6141ff8690d1c114fd2e8067107473e456a8b30e196fac17d601f1da564f676  ./arch3.vdi
57742f80911f72186fe6de168e310da84e9d6f39c5683b3f7669abfdf57708f1965b923d5b7462d4af82dc75ddd83ea5cbbcef27330b833f8dabe7da7e5a0d9d  ./arch3.vbox
1e752ad68de113d41999591d105798acaff130716ec81f88325fc9474e160e761eef305fd87885bdf500f1e400e3ec31f32c43c4b5a924304f4027133611e880  ./arch3.vdi

  1. TODO: "import" the .vbox somehow in the VirtualBox UI! (export/import wouldn't have worked because it clones my vdi; and imports it as vmdk)
  2. ensure 'Controller: IDE' has 'Use Host I/O Cache' enabled (it does by default when creating new VM)

this means CD access won't hang

  1. ensure 'Controller: SATA' has 'Use Host I/O Cache' disabled (it's disabled by default when creating new VM)

this means HDD access will hang, below.

  1. start, Boot from CD/iso, you see grub "Boot Arch Linux (x86_64)", press Enter to boot from the iso

(do not boot HDD's OS, it will hang during boot sequence)

  1. you get to the root prompt, auto logged in.

info:

This is the default, 256KiB of readahead:

# hdparm -a /dev/sda



/dev/sda:
 readahead     = 256 (on)
# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
	Model Number:       VBOX HARDDISK                           
	Serial Number:      VB78ecd1fa-712db085 
	Firmware Revision:  1.0     
Standards:
	Used: ATA/ATAPI-6 published, ANSI INCITS 361-2002 
	Supported: 6 5 4 
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:    16514064
	LBA    user addressable sectors:    25165824
	LBA48  user addressable sectors:    25165824
	Logical/Physical Sector size:           512 bytes
	device size with M = 1024*1024:       12288 MBytes
	device size with M = 1000*1000:       12884 MBytes (12 GB)
	cache/buffer size  = 256 KBytes (type=DualPortCache)
Capabilities:
	LBA, IORDY(cannot be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Vendor, no device specific minimum
	R/W multiple sector transfer: Max = 128	Current = 128
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	48-bit Address feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	Gen2 signaling speed (3.0Gb/s)
	   *	Native Command Queueing (NCQ)
Checksum: correct

  1. how to hang (boot session 3 is most interesting, below)

No hang when the following: ----

boot session 1:

dd if=/dev/sda of=/dev/null bs=512 skip=42208 count=8

boot session 2:

dd if=/dev/sda of=/dev/null bs=512 skip=42216 count=2

boot session 3:

dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=1
dd if=/dev/sda of=/dev/null bs=512 skip=42216 count=1

Hang when the following: ----

boot session 1:

dd if=/dev/sda of=/dev/null bs=512 skip=42208 count=9

boot session 2:

dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=2

boot session 3:

dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=2

To exit the hang, just kill the PID of AioMgr0-N, eg. in 'top', press Shift+H to show threads names (maybe pressing 'c' is needed to toggle between cmdline/names, depending on your 'top' settings) then see the PID of AioMgr0-N, it should be at the top of the list using CPU (if sorted by %CPU, press x to see sort column) it's using about 22% constantly, unless you have set 3 CPUs in VM, then it's 100% (aka 1 core used fully) That kills the VM window, but not the VirtualBox UI window, which is what I want anyway.

What you see in guest's dmesg after the hang:

[   16.159742] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready
[  291.480202] ata1.00: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x6 frozen
[  291.480544] ata1.00: failed command: READ FPDMA QUEUED
[  291.480821] ata1.00: cmd 60/20:c0:e8:a4:00/00:00:00:00:00/40 tag 24 ncq dma 16384 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  291.481437] ata1.00: status: { DRDY }
[  291.481653] ata1: hard resetting link
[  291.792860] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  291.793245] ata1.00: configured for UDMA/133
[  291.793251] ata1.00: device reported invalid CHS sector 0
[  291.793270] ata1: EH complete
[  323.906810] ata1.00: exception Emask 0x0 SAct 0x8000000 SErr 0x0 action 0x6 frozen
[  323.907248] ata1.00: failed command: READ FPDMA QUEUED
[  323.907515] ata1.00: cmd 60/20:d8:e8:a4:00/00:00:00:00:00/40 tag 27 ncq dma 16384 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  323.908164] ata1.00: status: { DRDY }
[  323.908868] ata1: hard resetting link
[  324.223058] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  324.224317] ata1.00: configured for UDMA/133
[  324.224323] ata1.00: device reported invalid CHS sector 0
[  324.224342] ata1: EH complete
[  354.626843] ata1.00: exception Emask 0x0 SAct 0x4000 SErr 0x0 action 0x6 frozen
[  354.628012] ata1.00: failed command: READ FPDMA QUEUED
[  354.628631] ata1.00: cmd 60/20:70:e8:a4:00/00:00:00:00:00/40 tag 14 ncq dma 16384 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  354.631160] ata1.00: status: { DRDY }
[  354.631854] ata1: hard resetting link
[  354.942422] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  354.943605] ata1.00: configured for UDMA/133
[  354.943613] ata1.00: device reported invalid CHS sector 0
[  354.943637] ata1: EH complete
[  387.053504] ata1.00: NCQ disabled due to excessive errors
[  387.053508] ata1.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen
[  387.054265] ata1.00: failed command: READ FPDMA QUEUED
[  387.054859] ata1.00: cmd 60/20:08:e8:a4:00/00:00:00:00:00/40 tag 1 ncq dma 16384 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  387.057247] ata1.00: status: { DRDY }
[  387.057842] ata1: hard resetting link
[  387.373510] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  387.374544] ata1.00: configured for UDMA/133
[  387.374553] ata1.00: device reported invalid CHS sector 0
[  387.374588] ata1: EH complete
[  419.480161] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  419.481049] ata1.00: failed command: READ DMA
[  419.481819] ata1.00: cmd c8/00:20:e8:a4:00/00:00:00:00:00/e0 tag 19 dma 16384 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  419.485077] ata1.00: status: { DRDY }
[  419.485813] ata1: hard resetting link
[  419.799146] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  419.799713] ata1.00: configured for UDMA/133
[  419.799719] ata1.00: device reported invalid CHS sector 0
[  419.799736] ata1: EH complete
[  451.906813] ata1: limiting SATA link speed to 1.5 Gbps
[  451.906817] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  451.907568] ata1.00: failed command: READ DMA
[  451.908150] ata1.00: cmd c8/00:20:e8:a4:00/00:00:00:00:00/e0 tag 6 dma 16384 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  451.911967] ata1.00: status: { DRDY }
[  451.912716] ata1: hard resetting link
[  452.225783] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  452.226126] ata1.00: configured for UDMA/133
[  452.226126] ata1.00: device reported invalid CHS sector 0
[  452.226126] sd 0:0:0:0: [sda] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[  452.226126] sd 0:0:0:0: [sda] tag#6 Sense Key : 0x5 [current] 
[  452.226127] sd 0:0:0:0: [sda] tag#6 ASC=0x21 ASCQ=0x4 
[  452.226127] sd 0:0:0:0: [sda] tag#6 CDB: opcode=0x28 28 00 00 00 a4 e8 00 00 20 00
[  452.226127] blk_update_request: I/O error, dev sda, sector 42216
[  452.227928] ata1: EH complete

In the real world when 'Use Host I/O Cache' is disabled:

  • for the VM's IDE CD drive: I cannot boot from the ArchLinux or Manjaro iso (it will hang similarily)
  • for the VM's SATA HDD drive: cannot boot existing arch linux install

(also seen this on other guest linux distros; it's not an ArchLinux thing)


next section:

For hanging when reading /dev/sr0 aka 'the guest OS CD drive with archlinux-2017.03.01-dual.iso', when booting the OS from HDD,

two ways to reproduce:

Note: You cannot reproduce if readahead is off, eg. hdparm -a0 /dev/sr0

(also note, that -a256 being the default, the same dd 'skip' values from below will not work for a different readahead value)

way 1.

a) to make HDD not hang AioMgr, clone it first (this magically gets rid of any AioMgr0-N hangs for some reason) $ VBoxManage clonehd --format VDI arch3.vdi /tmp/cloned.vdi (obvious: to clone, first remove arch3.vdi from the VM, clone, then add cloned.vdi to the VM) b) set both IDE and SATA to have disabled 'Use Host I/O Cache' c) boot from HDD(it boots from iso, at grub menu choose second option which is: 'Boot existing OS'); Don't boot iso's OS because it hangs instantly at Loading boot/x86_64/vmlinuz... and hangs AioMgr1-N with 99.9% cpu usage until you kill PID of AioMgr1-N which closes that VM's window.

way 2.

a) if you didn't clone the vdi b) set only IDE(where iso is) on disbled 'Use Host I/O Cache', but let SATA(where vdi is) on enabled 'Use Host I/O Cache' c) boot, it will boot from iso, choose 'Boot existing OS' to boot from HDD(vdi)

way1&2 commons:

d) at root prompt(you have to login as 'root', password is 'a'):

info:

[root@vm ~]# hdparm -a /dev/sr0

/dev/sr0:
 readahead     = 256 (on)
[root@vm ~]# hdparm -A /dev/sr0

/dev/sr0:
 HDIO_DRIVE_CMD(identify) failed: Input/output error

No hang when the following:

boot session 1:

dd if=/dev/sr0 of=/dev/null bs=512 skip=88038 count=2
dd if=/dev/sr0 of=/dev/null bs=512 skip=88040 count=2

boot session 2:

dd if=/dev/sr0 of=/dev/null bs=512 skip=88038 count=1
dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=1
dd if=/dev/sr0 of=/dev/null bs=512 skip=88040 count=1

Hang when the following:

boot session 1:

# dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2

boot session 2:

# dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2

way1:

# time dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2

dd: error reading '/dev/sr0': Input/output error
1+0 records in
1+0 records out
512 bytes copied, 101.679 s, 0.0 kB/s

real	1m41.692s
user	0m0.000s
sys	0m0.007s

top:

AioMgr1-N (yep, 1 not 0) constant CPU usage of 19.6% -> 21.6% ...even after dd finishes!

dmesg:

[   89.179123] systemd-logind[196]: New session c2 of user root.
[  129.346812] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  129.347540] ata3.00: cmd a0/01:00:00:00:40/00:00:00:00:00/a0 tag 0 dma 16384 in
                        opcode=0x28 28 00 00 00 55 fa 00 00 08 00res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[  129.355906] ata3.00: status: { DRDY }
[  134.400076] ata3: link is slow to respond, please be patient (ready=0)
[  139.386826] ata3: device not ready (errno=-16), forcing hardreset
[  139.387505] ata3: soft resetting link
[  144.586725] ata3: link is slow to respond, please be patient (ready=0)
[  149.403407] ata3: SRST failed (errno=-16)
[  149.404249] ata3: soft resetting link
[  154.603401] ata3: link is slow to respond, please be patient (ready=0)
[  159.420725] ata3: SRST failed (errno=-16)
[  159.421558] ata3: soft resetting link
[  164.620145] ata3: link is slow to respond, please be patient (ready=0)
[  194.426716] ata3: SRST failed (errno=-16)
[  194.427477] ata3: soft resetting link
[  199.456775] ata3: SRST failed (errno=-16)
[  199.471749] ata3: reset failed, giving up
[  199.472636] ata3.00: disabled
[  199.474100] ata3: EH complete
[  199.474938] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  199.476392] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 08 00
[  199.491977] blk_update_request: I/O error, dev sr0, sector 88040
[  199.494604] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  199.496175] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 02 00
[  199.497683] blk_update_request: I/O error, dev sr0, sector 88040
[  199.498588] Buffer I/O error on dev sr0, logical block 11005, async page read


way2:

# time dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2
dd: error reading '/dev/sr0': Input/output error
1+0 records in
1+0 records out
512 bytes copied, 102.138 s, 0.0 kB/s

real	1m42.147s
user	0m0.007s
sys	0m0.000s

top:

AioMgr0-N (0 here) constant CPU usage of 19.6% -> 21.6% (100% would mean 1 core is fully used) ...even after dd finishes!

dmesg:

[  102.492843] systemd-logind[202]: New session c2 of user root.
[  102.494941] systemd[1]: Started Session c2 of user root.
[  139.586919] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  139.587909] ata3.00: cmd a0/01:00:00:00:40/00:00:00:00:00/a0 tag 0 dma 16384 in
                        opcode=0x28 28 00 00 00 55 fa 00 00 08 00res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[  139.592029] ata3.00: status: { DRDY }
[  144.633536] ata3: link is slow to respond, please be patient (ready=0)
[  149.620205] ata3: device not ready (errno=-16), forcing hardreset
[  149.621023] ata3: soft resetting link
[  154.820197] ata3: link is slow to respond, please be patient (ready=0)
[  159.636906] ata3: SRST failed (errno=-16)
[  159.637643] ata3: soft resetting link
[  164.836889] ata3: link is slow to respond, please be patient (ready=0)
[  169.653533] ata3: SRST failed (errno=-16)
[  169.654227] ata3: soft resetting link
[  174.854340] ata3: link is slow to respond, please be patient (ready=0)
[  204.660185] ata3: SRST failed (errno=-16)
[  204.660805] ata3: soft resetting link
[  209.690159] ata3: SRST failed (errno=-16)
[  209.703203] ata3: reset failed, giving up
[  209.704066] ata3.00: disabled
[  209.704786] ata3: EH complete
[  209.705448] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  209.719212] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 08 00
[  209.720520] blk_update_request: I/O error, dev sr0, sector 88040
[  209.722434] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  209.724036] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 02 00
[  209.725383] blk_update_request: I/O error, dev sr0, sector 88040
[  209.726154] Buffer I/O error on dev sr0, logical block 11005, async page read


my host is Lenovo Z575, kernel: Linux z5 4.11.0-rc7-g4f7d029b9bf0 #151 SMP PREEMPT Wed Apr 19 18:36:02 CEST 2017 x86_64 GNU/Linux

virtualbox 5.1.18 r114002 on Arch Linux 64bit

Potentially relevant tickets:

https://www.virtualbox.org/ticket/15281

https://www.virtualbox.org/ticket/13022

Change History

Changed 2 years ago by xftroxgpx

When booted from iso(/dev/sr0) and hanged HDD(/dev/sda) with dd 42215 count=2 (see step 8)

Changed 2 years ago by xftroxgpx

same log as before, but after 50 more minutes and after killing the hanged AioMgr0-N thread

Note: See TracTickets for help on using tickets.

www.oracle.com
ContactPrivacy policyTerms of Use