id summary reporter owner description type status component version resolution keywords cc guest host 16677 A reliable way to hang AioMgr0-N under certain conditions deletedgmailduetoJamesDamore_firing "A reliable way to hang AioMgr0-N when 'fs readahead'(hdparm -a) is on (and is 256(KiB)) but only when 'Use Host I/O Cache' is disabled in virtualbox 5.1.18 !r114002 (currently latest available) on Arch Linux 64bit, host OS. Basically, reading from guest SATA HDD and/or IDE CD will hang AioMgr0-N(until killed) on host OS and guest OS will fail to read/boot etc. This seems to be virtualbox settings-agnostic, except for the 'Use Host I/O Cache' which has to be disabled for this to occurr AND guest OS readahead has to be enabled (usually set to 256KiB; able to repro with 16KiB too, with different dd 'skip' settings). So, this doesn't ocurr when hdparm -a0 /dev/sda was executed, or any value between 0 and 7, inclusive, (keeps readahead 'off') Note: {{{ hdparm -a0 /dev/sda }}} seems to do the same as {{{ # echo 0 > /sys/class/block/sda/queue/read_ahead_kb }}} ---- Steps to reproduce this issue on a guest HDD/vdi (as opposed to a guest CD/IDE/iso which is in the 'next section'): 1. get archlinux-2017.03.01-dual.iso (or any current/latest iso, BUT then the dd repro will most likely not work at the same 'skip'!) However according to this comment from frank https://www.virtualbox.org/ticket/15281#comment:8 , I shouldn't be posting any such links here, but then reproducing this(with accuracy) is going to be really hard without them, so I'm going to be obscuring them with base64, hopefully that's a good compromise between not posting them and posting them... Pick a download location(476.0 MB): archlinux webpage with torrent/magnet links, located here: {{{ aHR0cHM6Ly93d3cuYXJjaGxpbnV4Lm9yZy9yZWxlbmcvcmVsZWFzZXMvMjAxNy4wMy4wMS8= }}} (my)google drive, directory with the iso file: {{{ aHR0cHM6Ly9kcml2ZS5nb29nbGUuY29tL29wZW4/aWQ9MEI2UC05VXlKRmp6S1ZsWkhUa1E0YUZWbGRUUQ== }}} sha1sum for that iso is: {{{ f426866ca632a35a3eeae8e4080cff25ec8da614 archlinux-2017.03.01-dual.iso }}} 2. get my VM the reproduction steps only work with my VM with that specific vdi file; if that vdi gets cloned(even if to same format: vdi) it will not reproduce anymore! The IDE CD repro. steps will still work and make it hang reliably though (see 'next section') Pick one download location(1.9G): dropbox: {{{ aHR0cHM6Ly93d3cuZHJvcGJveC5jb20vc2gvaHFiMGIwNDB4aTYyMDN2L0FBRE50RVF5Z0tGU1duNnpRaDNZS1VVcmE/ZGw9MA== }}} (my)google drive: {{{ aHR0cHM6Ly9kcml2ZS5nb29nbGUuY29tL2RyaXZlL2ZvbGRlcnMvMEI2UC05VXlKRmp6S1Rrc3lZM0pZWlhSbU1Xcz91c3A9c2hhcmluZw== }}} download these 3 files: ''arch3.vbox, arch3.vdi, checksums'' md5,sha1,sha256 and sha512 for those two are: {{{ 1aa8124771fd7c8761435ff26ee2d55b ./arch3.vbox f15c97a939cda4c6bdd413fb52a9519d ./arch3.vdi 9c1a10c0ab0eb945b677faee22ef9ab2d1ba8511 ./arch3.vbox a9ed9c0c3542b9d174a4b3fa2b89d1fd72f249f3 ./arch3.vdi 7ac7315a292311f19f59af86bbba687885813d8da7a3f263a0bb2031369fe1f1 ./arch3.vbox d6141ff8690d1c114fd2e8067107473e456a8b30e196fac17d601f1da564f676 ./arch3.vdi 57742f80911f72186fe6de168e310da84e9d6f39c5683b3f7669abfdf57708f1965b923d5b7462d4af82dc75ddd83ea5cbbcef27330b833f8dabe7da7e5a0d9d ./arch3.vbox 1e752ad68de113d41999591d105798acaff130716ec81f88325fc9474e160e761eef305fd87885bdf500f1e400e3ec31f32c43c4b5a924304f4027133611e880 ./arch3.vdi }}} 3. TODO: ""import"" the .vbox somehow in the VirtualBox UI! (export/import wouldn't have worked because it clones my vdi; and imports it as vmdk) 4. ensure 'Controller: IDE' has 'Use Host I/O Cache' enabled (it does by default when creating new VM) ^ this means CD access won't hang 5. ensure 'Controller: SATA' has 'Use Host I/O Cache' disabled (it's disabled by default when creating new VM) ^ this means HDD access will hang, below. 6. start, Boot from CD/iso, you see grub ""Boot Arch Linux (x86_64)"", press Enter to boot from the iso (do not boot HDD's OS, it will hang during boot sequence) 7. you get to the root prompt, auto logged in. info: This is the default, 256KiB of readahead: {{{ # hdparm -a /dev/sda /dev/sda: readahead = 256 (on) }}} {{{ # hdparm -I /dev/sda /dev/sda: ATA device, with non-removable media Model Number: VBOX HARDDISK Serial Number: VB78ecd1fa-712db085 Firmware Revision: 1.0 Standards: Used: ATA/ATAPI-6 published, ANSI INCITS 361-2002 Supported: 6 5 4 Configuration: Logical max current cylinders 16383 16383 heads 16 16 sectors/track 63 63 -- CHS current addressable sectors: 16514064 LBA user addressable sectors: 25165824 LBA48 user addressable sectors: 25165824 Logical/Physical Sector size: 512 bytes device size with M = 1024*1024: 12288 MBytes device size with M = 1000*1000: 12884 MBytes (12 GB) cache/buffer size = 256 KBytes (type=DualPortCache) Capabilities: LBA, IORDY(cannot be disabled) Queue depth: 32 Standby timer values: spec'd by Vendor, no device specific minimum R/W multiple sector transfer: Max = 128 Current = 128 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 Cycle time: min=120ns recommended=120ns PIO: pio0 pio1 pio2 pio3 pio4 Cycle time: no flow control=120ns IORDY flow control=120ns Commands/features: Enabled Supported: * Power Management feature set * Write cache * Look-ahead * 48-bit Address feature set * Mandatory FLUSH_CACHE * FLUSH_CACHE_EXT * Gen2 signaling speed (3.0Gb/s) * Native Command Queueing (NCQ) Checksum: correct }}} ---- 8. how to hang (boot session 3 is most interesting, below) No hang when the following: ---- boot session 1: {{{ dd if=/dev/sda of=/dev/null bs=512 skip=42208 count=8 }}} boot session 2: {{{ dd if=/dev/sda of=/dev/null bs=512 skip=42216 count=2 }}} boot session 3: {{{ dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=1 dd if=/dev/sda of=/dev/null bs=512 skip=42216 count=1 }}} Hang when the following: ---- boot session 1: {{{ dd if=/dev/sda of=/dev/null bs=512 skip=42208 count=9 }}} boot session 2: {{{ dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=2 }}} boot session 3: {{{ dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=2 }}} To exit the hang, just kill the PID of AioMgr0-N, eg. in 'top', press Shift+H to show threads names (maybe pressing 'c' is needed to toggle between cmdline/names, depending on your 'top' settings) then see the PID of AioMgr0-N, it should be at the top of the list using CPU (if sorted by %CPU, press x to see sort column) it's using about 22% constantly, unless you have set 3 CPUs in VM, then it's 100% (aka 1 core used fully) That kills the VM window, but not the VirtualBox UI window, which is what I want anyway. What you see in guest's dmesg after the hang: {{{ [ 16.159742] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready [ 291.480202] ata1.00: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x6 frozen [ 291.480544] ata1.00: failed command: READ FPDMA QUEUED [ 291.480821] ata1.00: cmd 60/20:c0:e8:a4:00/00:00:00:00:00/40 tag 24 ncq dma 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 291.481437] ata1.00: status: { DRDY } [ 291.481653] ata1: hard resetting link [ 291.792860] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 291.793245] ata1.00: configured for UDMA/133 [ 291.793251] ata1.00: device reported invalid CHS sector 0 [ 291.793270] ata1: EH complete [ 323.906810] ata1.00: exception Emask 0x0 SAct 0x8000000 SErr 0x0 action 0x6 frozen [ 323.907248] ata1.00: failed command: READ FPDMA QUEUED [ 323.907515] ata1.00: cmd 60/20:d8:e8:a4:00/00:00:00:00:00/40 tag 27 ncq dma 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 323.908164] ata1.00: status: { DRDY } [ 323.908868] ata1: hard resetting link [ 324.223058] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 324.224317] ata1.00: configured for UDMA/133 [ 324.224323] ata1.00: device reported invalid CHS sector 0 [ 324.224342] ata1: EH complete [ 354.626843] ata1.00: exception Emask 0x0 SAct 0x4000 SErr 0x0 action 0x6 frozen [ 354.628012] ata1.00: failed command: READ FPDMA QUEUED [ 354.628631] ata1.00: cmd 60/20:70:e8:a4:00/00:00:00:00:00/40 tag 14 ncq dma 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 354.631160] ata1.00: status: { DRDY } [ 354.631854] ata1: hard resetting link [ 354.942422] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 354.943605] ata1.00: configured for UDMA/133 [ 354.943613] ata1.00: device reported invalid CHS sector 0 [ 354.943637] ata1: EH complete [ 387.053504] ata1.00: NCQ disabled due to excessive errors [ 387.053508] ata1.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen [ 387.054265] ata1.00: failed command: READ FPDMA QUEUED [ 387.054859] ata1.00: cmd 60/20:08:e8:a4:00/00:00:00:00:00/40 tag 1 ncq dma 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 387.057247] ata1.00: status: { DRDY } [ 387.057842] ata1: hard resetting link [ 387.373510] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 387.374544] ata1.00: configured for UDMA/133 [ 387.374553] ata1.00: device reported invalid CHS sector 0 [ 387.374588] ata1: EH complete [ 419.480161] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 419.481049] ata1.00: failed command: READ DMA [ 419.481819] ata1.00: cmd c8/00:20:e8:a4:00/00:00:00:00:00/e0 tag 19 dma 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 419.485077] ata1.00: status: { DRDY } [ 419.485813] ata1: hard resetting link [ 419.799146] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 419.799713] ata1.00: configured for UDMA/133 [ 419.799719] ata1.00: device reported invalid CHS sector 0 [ 419.799736] ata1: EH complete [ 451.906813] ata1: limiting SATA link speed to 1.5 Gbps [ 451.906817] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 451.907568] ata1.00: failed command: READ DMA [ 451.908150] ata1.00: cmd c8/00:20:e8:a4:00/00:00:00:00:00/e0 tag 6 dma 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 451.911967] ata1.00: status: { DRDY } [ 451.912716] ata1: hard resetting link [ 452.225783] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [ 452.226126] ata1.00: configured for UDMA/133 [ 452.226126] ata1.00: device reported invalid CHS sector 0 [ 452.226126] sd 0:0:0:0: [sda] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 [ 452.226126] sd 0:0:0:0: [sda] tag#6 Sense Key : 0x5 [current] [ 452.226127] sd 0:0:0:0: [sda] tag#6 ASC=0x21 ASCQ=0x4 [ 452.226127] sd 0:0:0:0: [sda] tag#6 CDB: opcode=0x28 28 00 00 00 a4 e8 00 00 20 00 [ 452.226127] blk_update_request: I/O error, dev sda, sector 42216 [ 452.227928] ata1: EH complete }}} --------- In the real world when 'Use Host I/O Cache' is disabled: - for the VM's IDE CD drive: I cannot boot from the ArchLinux or Manjaro iso (it will hang similarily) - for the VM's SATA HDD drive: cannot boot existing arch linux install (also seen this on other guest linux distros; it's not an ArchLinux thing) --------- next section: For hanging when reading /dev/sr0 aka 'the guest OS CD drive with archlinux-2017.03.01-dual.iso', when booting the OS from HDD, two ways to reproduce: Note: You cannot reproduce if readahead is off, eg. hdparm -a0 /dev/sr0 (also note, that -a256 being the default, the same dd 'skip' values from below will not work for a different readahead value) way 1. a. to make HDD not hang AioMgr, clone it first (this magically gets rid of any AioMgr0-N hangs for some reason) $ VBoxManage clonehd --format VDI arch3.vdi /tmp/cloned.vdi (obvious: to clone, first remove arch3.vdi from the VM, clone, then add cloned.vdi to the VM) b. set both IDE and SATA to have disabled 'Use Host I/O Cache' c) boot from HDD(it boots from iso, at grub menu choose second option which is: 'Boot existing OS'); Don't boot iso's OS because it hangs instantly at Loading boot/x86_64/vmlinuz... and hangs AioMgr1-N with 99.9% cpu usage until you kill PID of AioMgr1-N which closes that VM's window. way 2. a. if you didn't clone the vdi b. set only IDE(where iso is) on disbled 'Use Host I/O Cache', but let SATA(where vdi is) on enabled 'Use Host I/O Cache' c. boot, it will boot from iso, choose 'Boot existing OS' to boot from HDD(vdi) way1&2 commons: d. at root prompt(you have to login as 'root', password is 'a'): info: {{{ [root@vm ~]# hdparm -a /dev/sr0 /dev/sr0: readahead = 256 (on) [root@vm ~]# hdparm -A /dev/sr0 /dev/sr0: HDIO_DRIVE_CMD(identify) failed: Input/output error }}} No hang when the following: boot session 1: {{{ dd if=/dev/sr0 of=/dev/null bs=512 skip=88038 count=2 dd if=/dev/sr0 of=/dev/null bs=512 skip=88040 count=2 }}} boot session 2: {{{ dd if=/dev/sr0 of=/dev/null bs=512 skip=88038 count=1 dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=1 dd if=/dev/sr0 of=/dev/null bs=512 skip=88040 count=1 }}} Hang when the following: boot session 1: {{{ # dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2 }}} boot session 2: {{{ # dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2 }}} ------- way1: {{{ # time dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2 dd: error reading '/dev/sr0': Input/output error 1+0 records in 1+0 records out 512 bytes copied, 101.679 s, 0.0 kB/s real 1m41.692s user 0m0.000s sys 0m0.007s }}} top: AioMgr1-N (yep, 1 not 0) constant CPU usage of 19.6% -> 21.6% ...even after dd finishes! dmesg: {{{ [ 89.179123] systemd-logind[196]: New session c2 of user root. [ 129.346812] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 129.347540] ata3.00: cmd a0/01:00:00:00:40/00:00:00:00:00/a0 tag 0 dma 16384 in opcode=0x28 28 00 00 00 55 fa 00 00 08 00res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout) [ 129.355906] ata3.00: status: { DRDY } [ 134.400076] ata3: link is slow to respond, please be patient (ready=0) [ 139.386826] ata3: device not ready (errno=-16), forcing hardreset [ 139.387505] ata3: soft resetting link [ 144.586725] ata3: link is slow to respond, please be patient (ready=0) [ 149.403407] ata3: SRST failed (errno=-16) [ 149.404249] ata3: soft resetting link [ 154.603401] ata3: link is slow to respond, please be patient (ready=0) [ 159.420725] ata3: SRST failed (errno=-16) [ 159.421558] ata3: soft resetting link [ 164.620145] ata3: link is slow to respond, please be patient (ready=0) [ 194.426716] ata3: SRST failed (errno=-16) [ 194.427477] ata3: soft resetting link [ 199.456775] ata3: SRST failed (errno=-16) [ 199.471749] ata3: reset failed, giving up [ 199.472636] ata3.00: disabled [ 199.474100] ata3: EH complete [ 199.474938] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 [ 199.476392] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 08 00 [ 199.491977] blk_update_request: I/O error, dev sr0, sector 88040 [ 199.494604] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 [ 199.496175] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 02 00 [ 199.497683] blk_update_request: I/O error, dev sr0, sector 88040 [ 199.498588] Buffer I/O error on dev sr0, logical block 11005, async page read }}} ------- way2: {{{ # time dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2 dd: error reading '/dev/sr0': Input/output error 1+0 records in 1+0 records out 512 bytes copied, 102.138 s, 0.0 kB/s real 1m42.147s user 0m0.007s sys 0m0.000s }}} top: AioMgr0-N (0 here) constant CPU usage of 19.6% -> 21.6% (100% would mean 1 core is fully used) ...even after dd finishes! dmesg: {{{ [ 102.492843] systemd-logind[202]: New session c2 of user root. [ 102.494941] systemd[1]: Started Session c2 of user root. [ 139.586919] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 139.587909] ata3.00: cmd a0/01:00:00:00:40/00:00:00:00:00/a0 tag 0 dma 16384 in opcode=0x28 28 00 00 00 55 fa 00 00 08 00res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout) [ 139.592029] ata3.00: status: { DRDY } [ 144.633536] ata3: link is slow to respond, please be patient (ready=0) [ 149.620205] ata3: device not ready (errno=-16), forcing hardreset [ 149.621023] ata3: soft resetting link [ 154.820197] ata3: link is slow to respond, please be patient (ready=0) [ 159.636906] ata3: SRST failed (errno=-16) [ 159.637643] ata3: soft resetting link [ 164.836889] ata3: link is slow to respond, please be patient (ready=0) [ 169.653533] ata3: SRST failed (errno=-16) [ 169.654227] ata3: soft resetting link [ 174.854340] ata3: link is slow to respond, please be patient (ready=0) [ 204.660185] ata3: SRST failed (errno=-16) [ 204.660805] ata3: soft resetting link [ 209.690159] ata3: SRST failed (errno=-16) [ 209.703203] ata3: reset failed, giving up [ 209.704066] ata3.00: disabled [ 209.704786] ata3: EH complete [ 209.705448] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 [ 209.719212] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 08 00 [ 209.720520] blk_update_request: I/O error, dev sr0, sector 88040 [ 209.722434] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 [ 209.724036] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 02 00 [ 209.725383] blk_update_request: I/O error, dev sr0, sector 88040 [ 209.726154] Buffer I/O error on dev sr0, logical block 11005, async page read }}} ------- my host is Lenovo Z575, kernel: Linux z5 4.11.0-rc7-g4f7d029b9bf0 #151 SMP PREEMPT Wed Apr 19 18:36:02 CEST 2017 x86_64 GNU/Linux virtualbox 5.1.18 !r114002 on Arch Linux 64bit Potentially relevant tickets: #15281, #13022 " defect new virtual disk VirtualBox 5.1.18 Linux Linux