#16677 closed defect (fixed)
A reliable way to hang AioMgr0-N under certain conditions
Reported by: | deletedgmailduetoJamesDamore_firing | Owned by: | |
---|---|---|---|
Component: | virtual disk | Version: | VirtualBox 5.1.18 |
Keywords: | Cc: | ||
Guest type: | Linux | Host type: | Linux |
Description (last modified by )
A reliable way to hang AioMgr0-N when 'fs readahead'(hdparm -a) is on (and is 256(KiB)) but only when 'Use Host I/O Cache' is disabled in virtualbox 5.1.18 r114002 (currently latest available) on Arch Linux 64bit, host OS. Basically, reading from guest SATA HDD and/or IDE CD will hang AioMgr0-N(until killed) on host OS and guest OS will fail to read/boot etc.
This seems to be virtualbox settings-agnostic, except for the 'Use Host I/O Cache' which has to be disabled for this to occurr AND guest OS readahead has to be enabled (usually set to 256KiB; able to repro with 16KiB too, with different dd 'skip' settings). So, this doesn't ocurr when hdparm -a0 /dev/sda was executed, or any value between 0 and 7, inclusive, (keeps readahead 'off')
Note:
hdparm -a0 /dev/sda
seems to do the same as
# echo 0 > /sys/class/block/sda/queue/read_ahead_kb
Steps to reproduce this issue on a guest HDD/vdi (as opposed to a guest CD/IDE/iso which is in the 'next section'):
- get archlinux-2017.03.01-dual.iso (or any current/latest iso, BUT then the dd repro will most likely not work at the same 'skip'!)
However according to this comment from frank https://www.virtualbox.org/ticket/15281#comment:8 , I shouldn't be posting any such links here, but then reproducing this(with accuracy) is going to be really hard without them, so I'm going to be obscuring them with base64, hopefully that's a good compromise between not posting them and posting them...
Pick a download location(476.0 MB):
archlinux webpage with torrent/magnet links, located here:
aHR0cHM6Ly93d3cuYXJjaGxpbnV4Lm9yZy9yZWxlbmcvcmVsZWFzZXMvMjAxNy4wMy4wMS8=
(my)google drive, directory with the iso file:
aHR0cHM6Ly9kcml2ZS5nb29nbGUuY29tL29wZW4/aWQ9MEI2UC05VXlKRmp6S1ZsWkhUa1E0YUZWbGRUUQ==
sha1sum for that iso is:
f426866ca632a35a3eeae8e4080cff25ec8da614 archlinux-2017.03.01-dual.iso
- get my VM the reproduction steps only work with my VM with that specific vdi file; if that vdi gets cloned(even if to same format: vdi) it will not reproduce anymore! The IDE CD repro. steps will still work and make it hang reliably though (see 'next section')
Pick one download location(1.9G):
dropbox:
aHR0cHM6Ly93d3cuZHJvcGJveC5jb20vc2gvaHFiMGIwNDB4aTYyMDN2L0FBRE50RVF5Z0tGU1duNnpRaDNZS1VVcmE/ZGw9MA==
(my)google drive:
aHR0cHM6Ly9kcml2ZS5nb29nbGUuY29tL2RyaXZlL2ZvbGRlcnMvMEI2UC05VXlKRmp6S1Rrc3lZM0pZWlhSbU1Xcz91c3A9c2hhcmluZw==
download these 3 files: arch3.vbox, arch3.vdi, checksums
md5,sha1,sha256 and sha512 for those two are:
1aa8124771fd7c8761435ff26ee2d55b ./arch3.vbox f15c97a939cda4c6bdd413fb52a9519d ./arch3.vdi 9c1a10c0ab0eb945b677faee22ef9ab2d1ba8511 ./arch3.vbox a9ed9c0c3542b9d174a4b3fa2b89d1fd72f249f3 ./arch3.vdi 7ac7315a292311f19f59af86bbba687885813d8da7a3f263a0bb2031369fe1f1 ./arch3.vbox d6141ff8690d1c114fd2e8067107473e456a8b30e196fac17d601f1da564f676 ./arch3.vdi 57742f80911f72186fe6de168e310da84e9d6f39c5683b3f7669abfdf57708f1965b923d5b7462d4af82dc75ddd83ea5cbbcef27330b833f8dabe7da7e5a0d9d ./arch3.vbox 1e752ad68de113d41999591d105798acaff130716ec81f88325fc9474e160e761eef305fd87885bdf500f1e400e3ec31f32c43c4b5a924304f4027133611e880 ./arch3.vdi
- TODO: "import" the .vbox somehow in the VirtualBox UI! (export/import wouldn't have worked because it clones my vdi; and imports it as vmdk)
- ensure 'Controller: IDE' has 'Use Host I/O Cache' enabled (it does by default when creating new VM) this means CD access won't hang
- ensure 'Controller: SATA' has 'Use Host I/O Cache' disabled (it's disabled by default when creating new VM) this means HDD access will hang, below.
- start, Boot from CD/iso, you see grub "Boot Arch Linux (x86_64)", press Enter to boot from the iso (do not boot HDD's OS, it will hang during boot sequence)
- you get to the root prompt, auto logged in.
info:
This is the default, 256KiB of readahead:
# hdparm -a /dev/sda /dev/sda: readahead = 256 (on)
and
# hdparm -I /dev/sda /dev/sda: ATA device, with non-removable media Model Number: VBOX HARDDISK Serial Number: VB78ecd1fa-712db085 Firmware Revision: 1.0 Standards: Used: ATA/ATAPI-6 published, ANSI INCITS 361-2002 Supported: 6 5 4 Configuration: Logical max current cylinders 16383 16383 heads 16 16 sectors/track 63 63 -- CHS current addressable sectors: 16514064 LBA user addressable sectors: 25165824 LBA48 user addressable sectors: 25165824 Logical/Physical Sector size: 512 bytes device size with M = 1024*1024: 12288 MBytes device size with M = 1000*1000: 12884 MBytes (12 GB) cache/buffer size = 256 KBytes (type=DualPortCache) Capabilities: LBA, IORDY(cannot be disabled) Queue depth: 32 Standby timer values: spec'd by Vendor, no device specific minimum R/W multiple sector transfer: Max = 128 Current = 128 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 Cycle time: min=120ns recommended=120ns PIO: pio0 pio1 pio2 pio3 pio4 Cycle time: no flow control=120ns IORDY flow control=120ns Commands/features: Enabled Supported: * Power Management feature set * Write cache * Look-ahead * 48-bit Address feature set * Mandatory FLUSH_CACHE * FLUSH_CACHE_EXT * Gen2 signaling speed (3.0Gb/s) * Native Command Queueing (NCQ) Checksum: correct
- how to hang (boot session 3 is most interesting, below)
No hang when the following: ----
boot session 1:
dd if=/dev/sda of=/dev/null bs=512 skip=42208 count=8
boot session 2:
dd if=/dev/sda of=/dev/null bs=512 skip=42216 count=2
boot session 3:
dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=1 dd if=/dev/sda of=/dev/null bs=512 skip=42216 count=1
Hang when the following: ----
boot session 1:
dd if=/dev/sda of=/dev/null bs=512 skip=42208 count=9
boot session 2:
dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=2
boot session 3:
dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=2
To exit the hang, just kill the PID of AioMgr0-N, eg. in 'top', press Shift+H to show threads names (maybe pressing 'c' is needed to toggle between cmdline/names, depending on your 'top' settings) then see the PID of AioMgr0-N, it should be at the top of the list using CPU (if sorted by %CPU, press x to see sort column) it's using about 22% constantly, unless you have set 3 CPUs in VM, then it's 100% (aka 1 core used fully) That kills the VM window, but not the VirtualBox UI window, which is what I want anyway.
What you see in guest's dmesg after the hang:
[ 16.159742] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready [ 291.480202] ata1.00: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x6 frozen [ 291.480544] ata1.00: failed command: READ FPDMA QUEUED [ 291.480821] ata1.00: cmd 60/20:c0:e8:a4:00/00:00:00:00:00/40 tag 24 ncq dma 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 291.481437] ata1.00: status: { DRDY } [ 291.481653] ata1: hard resetting link [ 291.792860] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 291.793245] ata1.00: configured for UDMA/133 [ 291.793251] ata1.00: device reported invalid CHS sector 0 [ 291.793270] ata1: EH complete [ 323.906810] ata1.00: exception Emask 0x0 SAct 0x8000000 SErr 0x0 action 0x6 frozen [ 323.907248] ata1.00: failed command: READ FPDMA QUEUED [ 323.907515] ata1.00: cmd 60/20:d8:e8:a4:00/00:00:00:00:00/40 tag 27 ncq dma 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 323.908164] ata1.00: status: { DRDY } [ 323.908868] ata1: hard resetting link [ 324.223058] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 324.224317] ata1.00: configured for UDMA/133 [ 324.224323] ata1.00: device reported invalid CHS sector 0 [ 324.224342] ata1: EH complete [ 354.626843] ata1.00: exception Emask 0x0 SAct 0x4000 SErr 0x0 action 0x6 frozen [ 354.628012] ata1.00: failed command: READ FPDMA QUEUED [ 354.628631] ata1.00: cmd 60/20:70:e8:a4:00/00:00:00:00:00/40 tag 14 ncq dma 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 354.631160] ata1.00: status: { DRDY } [ 354.631854] ata1: hard resetting link [ 354.942422] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 354.943605] ata1.00: configured for UDMA/133 [ 354.943613] ata1.00: device reported invalid CHS sector 0 [ 354.943637] ata1: EH complete [ 387.053504] ata1.00: NCQ disabled due to excessive errors [ 387.053508] ata1.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen [ 387.054265] ata1.00: failed command: READ FPDMA QUEUED [ 387.054859] ata1.00: cmd 60/20:08:e8:a4:00/00:00:00:00:00/40 tag 1 ncq dma 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 387.057247] ata1.00: status: { DRDY } [ 387.057842] ata1: hard resetting link [ 387.373510] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 387.374544] ata1.00: configured for UDMA/133 [ 387.374553] ata1.00: device reported invalid CHS sector 0 [ 387.374588] ata1: EH complete [ 419.480161] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 419.481049] ata1.00: failed command: READ DMA [ 419.481819] ata1.00: cmd c8/00:20:e8:a4:00/00:00:00:00:00/e0 tag 19 dma 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 419.485077] ata1.00: status: { DRDY } [ 419.485813] ata1: hard resetting link [ 419.799146] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 419.799713] ata1.00: configured for UDMA/133 [ 419.799719] ata1.00: device reported invalid CHS sector 0 [ 419.799736] ata1: EH complete [ 451.906813] ata1: limiting SATA link speed to 1.5 Gbps [ 451.906817] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 451.907568] ata1.00: failed command: READ DMA [ 451.908150] ata1.00: cmd c8/00:20:e8:a4:00/00:00:00:00:00/e0 tag 6 dma 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 451.911967] ata1.00: status: { DRDY } [ 451.912716] ata1: hard resetting link [ 452.225783] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [ 452.226126] ata1.00: configured for UDMA/133 [ 452.226126] ata1.00: device reported invalid CHS sector 0 [ 452.226126] sd 0:0:0:0: [sda] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 [ 452.226126] sd 0:0:0:0: [sda] tag#6 Sense Key : 0x5 [current] [ 452.226127] sd 0:0:0:0: [sda] tag#6 ASC=0x21 ASCQ=0x4 [ 452.226127] sd 0:0:0:0: [sda] tag#6 CDB: opcode=0x28 28 00 00 00 a4 e8 00 00 20 00 [ 452.226127] blk_update_request: I/O error, dev sda, sector 42216 [ 452.227928] ata1: EH complete
In the real world when 'Use Host I/O Cache' is disabled:
- for the VM's IDE CD drive: I cannot boot from the ArchLinux or Manjaro iso (it will hang similarily)
- for the VM's SATA HDD drive: cannot boot existing arch linux install
(also seen this on other guest linux distros; it's not an ArchLinux thing)
next section:
For hanging when reading /dev/sr0 aka 'the guest OS CD drive with archlinux-2017.03.01-dual.iso', when booting the OS from HDD,
two ways to reproduce:
Note: You cannot reproduce if readahead is off, eg. hdparm -a0 /dev/sr0
(also note, that -a256 being the default, the same dd 'skip' values from below will not work for a different readahead value)
way 1.
- to make HDD not hang AioMgr, clone it first (this magically gets rid of any AioMgr0-N hangs for some reason)
$ VBoxManage clonehd --format VDI arch3.vdi /tmp/cloned.vdi (obvious: to clone, first remove arch3.vdi from the VM, clone, then add cloned.vdi to the VM)
- set both IDE and SATA to have disabled 'Use Host I/O Cache'
c) boot from HDD(it boots from iso, at grub menu choose second option which is: 'Boot existing OS'); Don't boot iso's OS because it hangs instantly at Loading boot/x86_64/vmlinuz... and hangs AioMgr1-N with 99.9% cpu usage until you kill PID of AioMgr1-N which closes that VM's window.
way 2.
- if you didn't clone the vdi
- set only IDE(where iso is) on disbled 'Use Host I/O Cache', but let SATA(where vdi is) on enabled 'Use Host I/O Cache'
- boot, it will boot from iso, choose 'Boot existing OS' to boot from HDD(vdi)
way1&2 commons:
- at root prompt(you have to login as 'root', password is 'a'):
info:
[root@vm ~]# hdparm -a /dev/sr0 /dev/sr0: readahead = 256 (on) [root@vm ~]# hdparm -A /dev/sr0 /dev/sr0: HDIO_DRIVE_CMD(identify) failed: Input/output error
No hang when the following:
boot session 1:
dd if=/dev/sr0 of=/dev/null bs=512 skip=88038 count=2 dd if=/dev/sr0 of=/dev/null bs=512 skip=88040 count=2
boot session 2:
dd if=/dev/sr0 of=/dev/null bs=512 skip=88038 count=1 dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=1 dd if=/dev/sr0 of=/dev/null bs=512 skip=88040 count=1
Hang when the following:
boot session 1:
# dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2
boot session 2:
# dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2
way1:
# time dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2 dd: error reading '/dev/sr0': Input/output error 1+0 records in 1+0 records out 512 bytes copied, 101.679 s, 0.0 kB/s real 1m41.692s user 0m0.000s sys 0m0.007s
top:
AioMgr1-N (yep, 1 not 0) constant CPU usage of 19.6% -> 21.6% ...even after dd finishes!
dmesg:
[ 89.179123] systemd-logind[196]: New session c2 of user root. [ 129.346812] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 129.347540] ata3.00: cmd a0/01:00:00:00:40/00:00:00:00:00/a0 tag 0 dma 16384 in opcode=0x28 28 00 00 00 55 fa 00 00 08 00res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout) [ 129.355906] ata3.00: status: { DRDY } [ 134.400076] ata3: link is slow to respond, please be patient (ready=0) [ 139.386826] ata3: device not ready (errno=-16), forcing hardreset [ 139.387505] ata3: soft resetting link [ 144.586725] ata3: link is slow to respond, please be patient (ready=0) [ 149.403407] ata3: SRST failed (errno=-16) [ 149.404249] ata3: soft resetting link [ 154.603401] ata3: link is slow to respond, please be patient (ready=0) [ 159.420725] ata3: SRST failed (errno=-16) [ 159.421558] ata3: soft resetting link [ 164.620145] ata3: link is slow to respond, please be patient (ready=0) [ 194.426716] ata3: SRST failed (errno=-16) [ 194.427477] ata3: soft resetting link [ 199.456775] ata3: SRST failed (errno=-16) [ 199.471749] ata3: reset failed, giving up [ 199.472636] ata3.00: disabled [ 199.474100] ata3: EH complete [ 199.474938] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 [ 199.476392] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 08 00 [ 199.491977] blk_update_request: I/O error, dev sr0, sector 88040 [ 199.494604] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 [ 199.496175] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 02 00 [ 199.497683] blk_update_request: I/O error, dev sr0, sector 88040 [ 199.498588] Buffer I/O error on dev sr0, logical block 11005, async page read
way2:
# time dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2 dd: error reading '/dev/sr0': Input/output error 1+0 records in 1+0 records out 512 bytes copied, 102.138 s, 0.0 kB/s real 1m42.147s user 0m0.007s sys 0m0.000s
top:
AioMgr0-N (0 here) constant CPU usage of 19.6% -> 21.6% (100% would mean 1 core is fully used) ...even after dd finishes!
dmesg:
[ 102.492843] systemd-logind[202]: New session c2 of user root. [ 102.494941] systemd[1]: Started Session c2 of user root. [ 139.586919] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 139.587909] ata3.00: cmd a0/01:00:00:00:40/00:00:00:00:00/a0 tag 0 dma 16384 in opcode=0x28 28 00 00 00 55 fa 00 00 08 00res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout) [ 139.592029] ata3.00: status: { DRDY } [ 144.633536] ata3: link is slow to respond, please be patient (ready=0) [ 149.620205] ata3: device not ready (errno=-16), forcing hardreset [ 149.621023] ata3: soft resetting link [ 154.820197] ata3: link is slow to respond, please be patient (ready=0) [ 159.636906] ata3: SRST failed (errno=-16) [ 159.637643] ata3: soft resetting link [ 164.836889] ata3: link is slow to respond, please be patient (ready=0) [ 169.653533] ata3: SRST failed (errno=-16) [ 169.654227] ata3: soft resetting link [ 174.854340] ata3: link is slow to respond, please be patient (ready=0) [ 204.660185] ata3: SRST failed (errno=-16) [ 204.660805] ata3: soft resetting link [ 209.690159] ata3: SRST failed (errno=-16) [ 209.703203] ata3: reset failed, giving up [ 209.704066] ata3.00: disabled [ 209.704786] ata3: EH complete [ 209.705448] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 [ 209.719212] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 08 00 [ 209.720520] blk_update_request: I/O error, dev sr0, sector 88040 [ 209.722434] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 [ 209.724036] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 02 00 [ 209.725383] blk_update_request: I/O error, dev sr0, sector 88040 [ 209.726154] Buffer I/O error on dev sr0, logical block 11005, async page read
my host is Lenovo Z575, kernel: Linux z5 4.11.0-rc7-g4f7d029b9bf0 #151 SMP PREEMPT Wed Apr 19 18:36:02 CEST 2017 x86_64 GNU/Linux
virtualbox 5.1.18 r114002 on Arch Linux 64bit
Attachments (10)
Change History (30)
by , 7 years ago
Attachment: | arch3-2017-04-20-22-24-23.log added |
---|
by , 7 years ago
Attachment: | arch3-2017-04-20-23-14-44.log added |
---|
same log as before, but after 50 more minutes and after killing the hanged AioMgr0-N thread
comment:1 by , 7 years ago
Briefly tried to hang AioMgr1-N (with the IDE/iso steps from above in 'next section', boot session 2) with the just-released(in ArchLinux testing repo, I mean) VirtualBox 5.1.20 r114628 (but I did recompile it myself though)
[root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2 dd: error reading '/dev/sr0': Input/output error 1+0 records in 1+0 records out 512 bytes copied, 102.141 s, 0.0 kB/s
dmesg:
[ 47.309063] mandb[208]: Updating index cache for path `/usr/share/man/man3'. Wait... [ 110.573626] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 110.574843] ata2.00: cmd a0/01:00:00:00:40/00:00:00:00:00/a0 tag 0 dma 16384 in opcode=0x28 28 00 00 00 55 fa 00 00 08 00res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout) [ 110.586192] ata2.00: status: { DRDY } [ 115.633487] ata2: link is slow to respond, please be patient (ready=0) [ 120.623465] ata2: device not ready (errno=-16), forcing hardreset [ 120.624545] ata2: soft resetting link [ 125.823550] ata2: link is slow to respond, please be patient (ready=0) [ 130.640159] ata2: SRST failed (errno=-16) [ 130.641283] ata2: soft resetting link [ 135.840245] ata2: link is slow to respond, please be patient (ready=0) [ 140.690183] ata2: SRST failed (errno=-16) [ 140.691178] ata2: soft resetting link [ 145.890241] ata2: link is slow to respond, please be patient (ready=0) [ 165.753379] mandb[208]: /usr/bin/mandb: warning: /usr/share/man/man3/gnutls_x509_trust_list_add_system_trust.3.gz: whatis parse for gnutls_x509_trust_list_add_system_trust(3) failed [ 175.697630] ata2: SRST failed (errno=-16) [ 175.698734] ata2: soft resetting link [ 180.727614] ata2: SRST failed (errno=-16) [ 180.763306] ata2: reset failed, giving up [ 180.766737] ata2.00: disabled [ 180.767882] ata2: EH complete [ 180.768853] sr 1:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 [ 180.771402] sr 1:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 08 00 [ 180.773281] blk_update_request: I/O error, dev sr0, sector 88040 [ 180.784906] sr 1:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 [ 180.786796] sr 1:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 02 00 [ 180.788529] blk_update_request: I/O error, dev sr0, sector 88040 [ 180.789697] Buffer I/O error on dev sr0, logical block 11005, async page read
Killing the hung AioMgr1-N thread yielded no new lines in .log file, tried Refresh. Attaching log next...
by , 7 years ago
Attachment: | arch3_1-2017-04-21-00-18-30.log added |
---|
hung AioMgr1-N with the IDE/iso steps
comment:2 by , 7 years ago
Found more equivalents for this line:
dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2
as the following:
dd if=/dev/sr0 of=/dev/null bs=1 skip=45075968 count=513 dd if=/dev/sr0 of=/dev/null bs=1 skip=45076479 count=2
no hang with:
dd if=/dev/sr0 of=/dev/null bs=1 skip=45076479 count=1
The above are with the default:
hdparm -a256 /dev/sr0
Here's one with -a16 (actually just tested, the following works the same with -a256 too! but the above do not work with -a16)
no hang with:
[root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084671 count=1 1+0 records in 1+0 records out 1 byte copied, 0.00255341 s, 0.4 kB/s [root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084672 count=1 1+0 records in 1+0 records out 1 byte copied, 0.00293639 s, 0.3 kB/s [root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084672 count=2 2+0 records in 2+0 records out 2 bytes copied, 0.00257371 s, 0.8 kB/s
hang with:
[root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084671 count=2 dd: error reading '/dev/sr0': Input/output error 1+0 records in 1+0 records out 1 byte copied, 101.377 s, 0.0 kB/s
hangs AioMgr1-N with 99.9% cpu usage! even after dd finished
killed the AioMgr1-N PID via 'kill number' but no new lines appeared in log(unlike the second .log I've posted in this thread, the one where I kill it after 50mins)
Attaching Vbox.log next.
by , 7 years ago
Attachment: | arch3_1-2017-04-21-00-51-21.log added |
---|
hung with -a16 for a 2 byte dd of /dev/sr0
comment:3 by , 7 years ago
btw, the above 'no hang with' and 'hang with' commands are all in the same boot session, which makes me wonder how come guest linux didn't cache the bytes/readahead-sectors(?) from the previous commands, so that the last command would then not fail. But, most likely I don't know how caching works :) Maybe it only happens for the file system...
The guest OS kernel is/was: Linux vm 4.10.9-1-ARCH #1 SMP PREEMPT Sat Apr 8 12:39:59 CEST 2017 x86_64 GNU/Linux
Updated (pacman -Syu) and kernel is now: Linux vm 4.10.10-1-ARCH #1 SMP PREEMPT Wed Apr 12 18:50:28 CEST 2017 x86_64 GNU/Linux
Can still reproduce the previous comment:
[root@vm ~]# uname -a Linux vm 4.10.10-1-ARCH #1 SMP PREEMPT Wed Apr 12 18:50:28 CEST 2017 x86_64 GNU/Linux [root@vm ~]# hdparm -a16 /dev/sr0 /dev/sr0: setting fs readahead to 16 readahead = 16 (on) [root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084671 count=1 1+0 records in 1+0 records out 1 byte copied, 0.00228378 s, 0.4 kB/s [root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084672 count=1 1+0 records in 1+0 records out 1 byte copied, 0.00230094 s, 0.4 kB/s [root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084672 count=2 2+0 records in 2+0 records out 2 bytes copied, 0.00265822 s, 0.8 kB/s [root@vm ~]# time dd if=/dev/sr0 of=/dev/null bs=1 skip=45084671 count=2 dd: error reading '/dev/sr0': Input/output error 1+0 records in 1+0 records out 1 byte copied, 101.113 s, 0.0 kB/s real 1m41.123s user 0m0.003s sys 0m0.003s [root@vm ~]#
by , 7 years ago
Attachment: | arch3_1-2017-04-21-01-11-40.log added |
---|
powereoff guest(didn't finish) after the above without killing AioMgr1-N, new status lines in log; no new lines after killing it
comment:4 by , 7 years ago
I've just discovered that the reason why cloning the vdi didn't trigger this issue anymore is because I kept the cloned vdi in tmpfs (/tmp). By moving it back from /tmp to SSD and removing it from Virtual Media Manager(release then remove), then adding it again to the same VM then, only now 8 hours later, tried booting from 'existing OS' AioMgr0-N hangs again during boot. Then, I copied it back to /tmp (tmpfs) and removed from Virtual Media Manager(release then remove) then added the /tmp one to the same VM => no hang! So this makes me think that the location/storage of the vdi is what's making AioMgr hang. There are no bad sectors on my SSD, though.
I've also just tested exporting the VM, then importing it; this creates a .vmdk disk file, stored on my SSD(on host) and booting from 'existing OS' does not cause the AioMgr hang. I then removed the vmdk from the VM via UI, cloned it via VBoxManage clonehd --format VDI arch3_1-disk002.vmdk /tmp/arch3_1-disk002.vdi and then added the clone to the VM, no hang because vdi is in tmpfs; then removed it from Virtual Media Manager(release and remove), moved the vdi on the SSD, added it to the VM, boot from existing OS => bam, hang!
So this tells me, vdi (but not vmdk) location or storage is a major factor to hanging AioMgr: hangs if vdi is on my SSD but not if it's on /tmp tmpfs.
comment:5 by , 7 years ago
The above is true for the iso hangs too: the iso was on my SSD until now and just tried to hang it to make sure it still hangs and it does (via dd if=/dev/sr0 of=/dev/null bs=1 skip=45084671 count=2); then copied the iso to /tmp (tmpfs) then removed the old iso from the CD drive of the VM, browsed for the one in /tmp and added it to the VM as CD, booted 'existing OS'(that doesn't hang) then tried to hang with the same dd, no hang, tried a full-on dd (via dd if=/dev/sr0 of=/dev/null) and no hang!
So, the OP with reproduction is then true only if the vdi and/or iso (depending on which you're trying to hang) are located on my host's SSD, but not when they are located in /tmp (tmpfs)
comment:6 by , 7 years ago
Setting queue depth from here https://www.virtualbox.org/ticket/13022#comment:12 has no effect on this issue and I confirmed it by cat /sys/block/sda/device/queue_depth
returning '1' (even though I only used the VBoxManage command to set it: VBoxManage setextradata arch3 "VBoxInternal/Devices/ahci/0/Config/CmdSlotsAvail" 1) and then was able to hang AioMgr with the dd line from step 8 boot session 3 from OP: dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=2
.
comment:7 by , 7 years ago
Description: | modified (diff) |
---|---|
Summary: | A reliable way to hang AioMgr0-N when 'fs readahead'(hdparm -a) is on (and is 256(KiB)) but only when 'Use Host I/O Cache' is disabled, for IDE(iso) and SATA(vdi) devices → A reliable way to hang AioMgr0-N under certain conditions |
comment:8 by , 7 years ago
Description: | modified (diff) |
---|
comment:9 by , 7 years ago
Description: | modified (diff) |
---|
comment:10 by , 7 years ago
Figured out how to attach gdb to the hung PID of AioMgr1-N, when hanging /dev/sr0, looks like this:
$ sudo gdb -p 7577 [sudo] password for xftroxgpx: GNU gdb (GDB) 7.12.1 Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-pc-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word". Attaching to process 7577 Reading symbols from /usr/lib/virtualbox/VirtualBox...done. Reading symbols from /usr/lib/libpthread.so.0...(no debugging symbols found)...done. [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/libthread_db.so.1". Reading symbols from /usr/lib/libdl.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/libc.so.6...(no debugging symbols found)...done. Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/virtualbox/VBoxRT.so...done. Reading symbols from /usr/lib/libcrypt.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/librt.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxml2.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/libcurl.so.4...(no debugging symbols found)...done. Reading symbols from /usr/lib/libssl.so.1.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libcrypto.so.1.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libstdc++.so.6...done. Reading symbols from /usr/lib/libgcc_s.so.1...done. Reading symbols from /usr/lib/liblzma.so.5...(no debugging symbols found)...done. Reading symbols from /usr/lib/libm.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/libssh2.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libpsl.so.5...(no debugging symbols found)...done. Reading symbols from /usr/lib/libgssapi_krb5.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/libkrb5.so.3...(no debugging symbols found)...done. Reading symbols from /usr/lib/libk5crypto.so.3...(no debugging symbols found)...done. Reading symbols from /usr/lib/libcom_err.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/libicuuc.so.58...(no debugging symbols found)...done. Reading symbols from /usr/lib/libicudata.so.58...(no debugging symbols found)...done. Reading symbols from /usr/lib/libkrb5support.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libkeyutils.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libresolv.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/virtualbox/VirtualBox.so...done. Reading symbols from /usr/lib/libXinerama.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libGL.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/virtualbox/VBoxKeyboard.so...done. Reading symbols from /usr/lib/libXext.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/libX11.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/virtualbox/VBoxXPCOM.so...done. Reading symbols from /usr/lib/libQt5X11Extras.so.5...(no debugging symbols found)...done. Reading symbols from /usr/lib/libQt5Core.so.5...(no debugging symbols found)...done. Reading symbols from /usr/lib/libQt5Gui.so.5...(no debugging symbols found)...done. Reading symbols from /usr/lib/libQt5Widgets.so.5...(no debugging symbols found)...done. Reading symbols from /usr/lib/libQt5PrintSupport.so.5...(no debugging symbols found)...done. Reading symbols from /usr/lib/libQt5OpenGL.so.5...(no debugging symbols found)...done. Reading symbols from /usr/lib/libXau.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/libXdmcp.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/libGLX.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libGLdispatch.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libicui18n.so.58...(no debugging symbols found)...done. Reading symbols from /usr/lib/libsystemd.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libpcre16.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libglib-2.0.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libpng16.so.16...(no debugging symbols found)...done. Reading symbols from /usr/lib/libharfbuzz.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libcap.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/liblz4.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libgcrypt.so.20...(no debugging symbols found)...done. Reading symbols from /usr/lib/libgpg-error.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libpcre.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libfreetype.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/libgraphite2.so.3...(no debugging symbols found)...done. Reading symbols from /usr/lib/libbz2.so.1.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/qt/plugins/platforms/libqxcb.so...(no debugging symbols found)...done. Reading symbols from /usr/lib/libQt5XcbQpa.so.5...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-xinerama.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libQt5DBus.so.5...(no debugging symbols found)...done. Reading symbols from /usr/lib/libX11-xcb.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libXi.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/libSM.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/libICE.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-xkb.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-render-util.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-render.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-sync.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-xfixes.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-randr.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-image.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-shm.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-keysyms.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-icccm.so.4...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-shape.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxkbcommon-x11.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxkbcommon.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libfontconfig.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libdbus-1.so.3...(no debugging symbols found)...done. Reading symbols from /usr/lib/libuuid.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-util.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libexpat.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libXcursor.so.1...done. Reading symbols from /usr/lib/libXrender.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libXfixes.so.3...(no debugging symbols found)...done. Reading symbols from /usr/lib/qt/plugins/xcbglintegrations/libqxcb-glx-integration.so...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-glx.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/qt/plugins/platforminputcontexts/libcomposeplatforminputcontextplugin.so...(no debugging symbols found)...done. Reading symbols from /usr/lib/libGLX_mesa.so.0...done. Reading symbols from /usr/lib/libxcb-dri3.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-present.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxshmfence.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libglapi.so.0...done. Reading symbols from /usr/lib/libXdamage.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libxcb-dri2.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/libXxf86vm.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libdrm.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/xorg/modules/dri/r600_dri.so...done. Reading symbols from /usr/lib/libdrm_nouveau.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/libdrm_radeon.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libdrm_amdgpu.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libelf.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libLLVM-3.9.so...(no debugging symbols found)...done. Reading symbols from /usr/lib/../lib/libffi.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/../lib/libedit.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/../lib/libncursesw.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/libtxc_dxtn.so...(no debugging symbols found)...done. Reading symbols from /usr/lib/virtualbox/components/VBoxXPCOMIPCC.so...done. Reading symbols from /usr/lib/libnss_compat.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/libnsl.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/libnss_nis.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/libnss_files.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/virtualbox/components/VBoxDDU.so...done. Reading symbols from /usr/lib/virtualbox/components/VBoxREM.so...done. Reading symbols from /usr/lib/virtualbox/VBoxVMM.so...done. Reading symbols from /usr/lib/virtualbox/components/VBoxRT.so...done. Reading symbols from /usr/lib/virtualbox/components/VBoxVMM.so...done. Reading symbols from /usr/lib/virtualbox/components/VBoxXPCOM.so...done. Reading symbols from /usr/lib/virtualbox/components/VBoxC.so...done. Reading symbols from /usr/lib/libvpx.so.4...(no debugging symbols found)...done. Reading symbols from /usr/lib/virtualbox/components/VBoxSVCM.so...done. Reading symbols from /usr/lib/qt/plugins/imageformats/libqgif.so...(no debugging symbols found)...done. Reading symbols from /usr/lib/qt/plugins/imageformats/libqico.so...(no debugging symbols found)...done. Reading symbols from /usr/lib/qt/plugins/imageformats/libqjpeg.so...(no debugging symbols found)...done. Reading symbols from /usr/lib/libjpeg.so.8...(no debugging symbols found)...done. Reading symbols from /usr/lib/virtualbox/VBoxSharedClipboard.so...done. Reading symbols from /usr/lib/libXt.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/virtualbox/VBoxDragAndDropSvc.so...done. Reading symbols from /usr/lib/virtualbox/VBoxGuestPropSvc.so...done. Reading symbols from /usr/lib/virtualbox/VBoxGuestControlSvc.so...done. Reading symbols from /usr/lib/virtualbox/VBoxDD.so...done. Reading symbols from /usr/lib/virtualbox/VBoxDD2.so...done. Reading symbols from /usr/lib/virtualbox/VBoxSharedFolders.so...done. 0x00007f71d3414889 in syscall () from /usr/lib/libc.so.6 (gdb) set pagination off (gdb) thread apply all bt full Thread 1 (Thread 0x7f71709fa700 (LWP 7577)): #0 0x00007f71d3414889 in syscall () from /usr/lib/libc.so.6 No symbol table info available. #1 0x00007f71d304e147 in rtFileAsyncIoLinuxSubmit (pcSubmitted=<synthetic pointer>, ppIoCB=0x7f71709f9d60, cReqs=1, AioContext=<optimized out>) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/Runtime/r3/linux/fileaio-linux.cpp:261 rc = <optimized out> #2 RTFileAioCtxSubmit (hAioCtx=0x7f71742fd160, pahReqs=pahReqs@entry=0x7f71709f9d60, cReqs=cReqs@entry=1) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/Runtime/r3/linux/fileaio-linux.cpp:607 cReqsSubmitted = 0 rc = <optimized out> pCtxInt = 0x7f71742fd160 i = <optimized out> pReqInt = <optimized out> #3 0x00007f71b64485c0 in pdmacFileAioMgrNormalReqsEnqueue (cReqs=1, pahReqs=0x7f71709f9d60, pEndpoint=0x7f7174300830, pAioMgr=0x7f7174300c40) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/VMM/VMMR3/PDMAsyncCompletionFileNormal.cpp:557 rc = <optimized out> #4 pdmacFileAioMgrNormalReqCompleteRc (cbTransfered=4096, rcReq=0, hReq=<optimized out>, pAioMgr=0x7f7174300c40) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/VMM/VMMR3/PDMAsyncCompletionFileNormal.cpp:1509 offStart = <optimized out> cbToTransfer = <optimized out> pbBuf = <optimized out> rc = <optimized out> pEndpoint = <optimized out> pTask = <optimized out> #5 pdmacFileAioMgrNormalReqComplete (hReq=<optimized out>, pAioMgr=0x7f7174300c40) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/VMM/VMMR3/PDMAsyncCompletionFileNormal.cpp:1342 cbTransfered = 4096 rcReq = 0 #6 pdmacFileAioMgrNormal (hThreadSelf=<optimized out>, pvUser=0x7f7174300c40) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/VMM/VMMR3/PDMAsyncCompletionFileNormal.cpp:1674 i = 0 apReqs = {0x7f71480008c0, 0xffffffff, 0x7f71709f9e00, 0x1ba1250, 0x7f71709fa700, 0x7f7174301180, 0x7f71709f9e00, 0x7f71d304ae3c <RTSemRWRequestWriteNoResume(RTSEMRW, RTMSINTERVAL)+156>, 0x300000000, 0x396b675ed5f9600, 0x7f71709fa700, 0x1ba1250, 0x7f71709f9e00, 0x7f71d304b1ce <RTSemRWReleaseWrite(RTSEMRW)+142>, 0x74301a60, 0xffffffffffffffff, 0x7f71a4909348, 0x7f7174301180, 0x7f71709f9e40, 0x7f71d2fb349f <rtThreadInsert(PRTTHREADINT, RTNATIVETHREAD)+223>} cReqsCompleted = 1 cReqsWait = <optimized out> uMillisCurr = <optimized out> rc = <optimized out> pAioMgr = 0x7f7174300c40 uMillisEnd = <optimized out> __PRETTY_FUNCTION__ = "int pdmacFileAioMgrNormal(RTTHREAD, void*)" #7 0x00007f71d2fb40f0 in rtThreadMain (pThread=0x7f7174301180, NativeThread=<optimized out>, pszThreadName=<optimized out>) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/Runtime/common/misc/thread.cpp:717 rc = <optimized out> #8 0x00007f71d304b94b in rtThreadNativeMain (pvArgs=0x7f7174301180) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/Runtime/r3/posix/thread-posix.cpp:327 pThread = 0x7f7174301180 Self = 140125197543168 rc = <optimized out> __PRETTY_FUNCTION__ = "void* rtThreadNativeMain(void*)" #9 0x00007f71d38dc2e7 in start_thread () from /usr/lib/libpthread.so.0 No symbol table info available. #10 0x00007f71d341954f in clone () from /usr/lib/libc.so.6 No symbol table info available. (gdb)
$ pacman -Qo /usr/lib/libc.so.6 /usr/lib/libc.so.6 is owned by glibc 2.25-1
(gdb) continue Continuing.
After a few seconds of AioMgr1-N still hanging using CPU, I then did a: kill -2 7577 from another terminal.
Program received signal SIGINT, Interrupt. 0x00007f71d3414889 in syscall () from /usr/lib/libc.so.6 (gdb) thread apply all bt full Thread 1 (Thread 0x7f71709fa700 (LWP 7577)): #0 0x00007f71d3414889 in syscall () from /usr/lib/libc.so.6 No symbol table info available. #1 0x00007f71d304e44d in rtFileAsyncIoLinuxGetEvents (pTimeout=0x0, paIoResults=0x7f71709f94c0, cReqs=1, cReqsMin=1, AioContext=<optimized out>) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/Runtime/r3/linux/fileaio-linux.cpp:289 rc = <optimized out> #2 RTFileAioCtxWait (hAioCtx=0x7f71742fd160, cMinReqs=cMinReqs@entry=1, cMillies=<optimized out>, cMillies@entry=4294967295, pahReqs=pahReqs@entry=0x7f71709f9d70, cReqs=1, pcReqs=pcReqs@entry=0x7f71709f9d50) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/Runtime/r3/linux/fileaio-linux.cpp:709 aPortEvents = {{pvUser = 0x7f7144000950, pIoCB = 0x7f71480008c0, rc = 4096, rc2 = 0}, {pvUser = 0x0, pIoCB = 0x0, rc = 0, rc2 = 0} <repeats 21 times>, {pvUser = 0x0, pIoCB = 0x0, rc = 0, rc2 = 140126859542208}, {pvUser = 0x0, pIoCB = 0x7f0, rc = 28848304, rc2 = 140126846543672}, {pvUser = 0x7f71d2eacb48, pIoCB = 0x7f71d3afc7d3 <do_lookup_x+2019>, rc = 2032, rc2 = 140126846634824}, {pvUser = 0x1b830b0, pIoCB = 0x7f71709f9878, rc = 140125197539444, rc2 = 401}, {pvUser = 0x7f71b625f4b2, pIoCB = 0x7f71b624d770, rc = 140125197539448, rc2 = 3134804711}, {pvUser = 0x2eb654b, pIoCB = 0x7f71d2eacb48, rc = 140125197539664, rc2 = 140126846543672}, {pvUser = 0x7f71709f9874, pIoCB = 0x7f71709f9940, rc = 30333904, rc2 = 83}, {pvUser = 0x709f98c8, pIoCB = 0x68b9a337, rc = 27453068, rc2 = 0}, {pvUser = 0x7f71709f99a0, pIoCB = 0x7f7180002070, rc = 140125197539808, rc2 = 0}, {pvUser = 0x5, pIoCB = 0x0, rc = 140125197539848, rc2 = 140126859545825}, {pvUser = 0x5, pIoCB = 0x0, rc = 5, rc2 = 0}, {pvUser = 0x1, pIoCB = 0x1e8f570, rc = 5, rc2 = 0}, {pvUser = 0x0, pIoCB = 0x7f7180002070, rc = 140125197539664, rc2 = 4294967296}, {pvUser = 0x1e8f570, pIoCB = 0x7f71709f9940, rc = 140126363972786, rc2 = 140125455523472}, {pvUser = 0xffffffff, pIoCB = 0x7f71d3afc7d3 <do_lookup_x+2019>, rc = 140126846592440, rc2 = 140126859542208}, {pvUser = 0x7f71709f99a0, pIoCB = 0x216, rc = 28848304, rc2 = 140126846543672}, {pvUser = 0x7f71d2eacb48, pIoCB = 0x7f71d3afc7d3 <do_lookup_x+2019>, rc = 534, rc2 = 140126846634824}, {pvUser = 0x1b830b0, pIoCB = 0x7f71709f9a38, rc = 140125197539892, rc2 = 140126369477656}, {pvUser = 0x7f71b625f46e, pIoCB = 0x7f71b624b100, rc = 140125197539896, rc2 = 1236669803}, {pvUser = 0x126d855, pIoCB = 0x7f71d2eacb48, rc = 140125197540112, rc2 = 140126846543672}, {pvUser = 0x7f71709f9a34, pIoCB = 0x7f71709f9b00, rc = 30333904, rc2 = 83}, {pvUser = 0x0, pIoCB = 0x0, rc = 0, rc2 = 0}, {pvUser = 0x0, pIoCB = 0x7f7180002070, rc = 140125197540256, rc2 = 0}, {pvUser = 0x5, pIoCB = 0x0, rc = 140125197540296, rc2 = 140126859545825}, {pvUser = 0x5, pIoCB = 0x0, rc = 5, rc2 = 0}, {pvUser = 0x1, pIoCB = 0x1e8f570, rc = 0, rc2 = -256}, {pvUser = 0x0, pIoCB = 0x7f7180002070, rc = 140125197540112, rc2 = 6907273216}, {pvUser = 0x1e8f570, pIoCB = 0x7f71709f9b00, rc = 140126363972718, rc2 = 140125455523472}, {pvUser = 0xffffffff, pIoCB = 0x7f71d304e147 <RTFileAioCtxSubmit(RTFILEAIOCTX, PRTFILEAIOREQ, size_t)+359>, rc = 140126846556488, rc2 = 140125197540336}, {pvUser = 0x7f7174300c40, pIoCB = 0x1, rc = 140125197540224, rc2 = 140126365969813}, {pvUser = 0x7f71709f9b80, pIoCB = 0x7f71d304da3f <RTFileAioReqPrepareRead(RTFILEAIOREQ, RTFILE, RTFOFF, void*, size_t, void*)+95>, rc = 140124515993792, rc2 = 0}, {pvUser = 0x7f7144000950, pIoCB = 0x7f7174300830, rc = 1, rc2 = 140125257337920}, {pvUser = 0x7f71709f9cc0, pIoCB = 0x7f71b64473a0 <pdmacFileAioMgrNormalProcessTaskList(PPDMACTASKFILE, PPDMACEPFILEMGR, PPDMASYNCCOMPLETIONENDPOINTFILE)+1088>, rc = 2048, rc2 = 140125257337920}, {pvUser = 0x7f71709f9e40, pIoCB = 0x0, rc = 140124515993792, rc2 = 140126857340673}, {pvUser = 0x7f71709f9be8, pIoCB = 0x4000, rc = 140125197540336, rc2 = 140125197540324}, {pvUser = 0x80, pIoCB = 0x7f71480008c0, rc = 140124515993792, rc2 = 0}, {pvUser = 0x7f7174300d40, pIoCB = 0x0, rc = 140126857331153, rc2 = 140125197540400}, {pvUser = 0x0, pIoCB = 0x0, rc = 152, rc2 = 140125257338128}, {pvUser = 0x7f7174300d40, pIoCB = 0x7f7100000000, rc = 74, rc2 = 258594646230275584}, {pvUser = 0x4000206e0, pIoCB = 0x7f7174300d10, rc = 140125197540592, rc2 = 140125257339264}, {pvUser = 0x7f7174300d40, pIoCB = 0x7f71709f9ca8, rc = 140125197540518, rc2 = 0}, {pvUser = 0x7f7174300c40, pIoCB = 0x0, rc = 140125197540592, rc2 = 1}} cRequestsToWait = 1 cDone = <optimized out> pCtxInt = 0x7f71742fd160 pTimeout = 0x0 Timeout = {tv_sec = 0, tv_nsec = 0} StartNanoTS = 0 rc = 0 cRequestsCompleted = 0 #3 0x00007f71b6448240 in pdmacFileAioMgrNormal (hThreadSelf=<optimized out>, pvUser=0x7f7174300c40) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/VMM/VMMR3/PDMAsyncCompletionFileNormal.cpp:1664 apReqs = {0x7f71480008c0, 0xffffffff, 0x7f71709f9e00, 0x1ba1250, 0x7f71709fa700, 0x7f7174301180, 0x7f71709f9e00, 0x7f71d304ae3c <RTSemRWRequestWriteNoResume(RTSEMRW, RTMSINTERVAL)+156>, 0x300000000, 0x396b675ed5f9600, 0x7f71709fa700, 0x1ba1250, 0x7f71709f9e00, 0x7f71d304b1ce <RTSemRWReleaseWrite(RTSEMRW)+142>, 0x74301a60, 0xffffffffffffffff, 0x7f71a4909348, 0x7f7174301180, 0x7f71709f9e40, 0x7f71d2fb349f <rtThreadInsert(PRTTHREADINT, RTNATIVETHREAD)+223>} cReqsCompleted = 0 cReqsWait = <optimized out> uMillisCurr = <optimized out> rc = <optimized out> pAioMgr = 0x7f7174300c40 uMillisEnd = 14936863 __PRETTY_FUNCTION__ = "int pdmacFileAioMgrNormal(RTTHREAD, void*)" #4 0x00007f71d2fb40f0 in rtThreadMain (pThread=0x7f7174301180, NativeThread=<optimized out>, pszThreadName=<optimized out>) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/Runtime/common/misc/thread.cpp:717 rc = <optimized out> #5 0x00007f71d304b94b in rtThreadNativeMain (pvArgs=0x7f7174301180) at /home/xftroxgpx/build/1packages/virtualbox/makepkg_pacman/virtualbox/src/VirtualBox-5.1.20/src/VBox/Runtime/r3/posix/thread-posix.cpp:327 pThread = 0x7f7174301180 Self = 140125197543168 rc = <optimized out> __PRETTY_FUNCTION__ = "void* rtThreadNativeMain(void*)" #6 0x00007f71d38dc2e7 in start_thread () from /usr/lib/libpthread.so.0 No symbol table info available. #7 0x00007f71d341954f in clone () from /usr/lib/libc.so.6 No symbol table info available. (gdb)
Not sure how helpful this is...
comment:11 by , 7 years ago
So far, found out it's stuck in this 'while':
00:00:03.378654 inside pdmacFileAioMgrNormal // ^ this only happens once ...<snip>... 00:00:55.731723 before while 00:00:55.731731 Waiting for 1 of 1 tasks to complete 00:00:55.732040 1 tasks completed // ^ those 2 lines repeat ...<snip>... 00:01:01.936480 Waiting for 1 of 1 tasks to complete 00:01:01.936723 1 tasks completed 00:01:01.936728 32768 messages from group PDM_ASYNC_COMPLETION (#339), muting it.
The .patch to show the above output in VBox.log will be added next.
VirtualBox was started like this:
export VBOX_LOG="-all+dev_ide.e.l.f+rt_file.e.l.f+pdm_async_completion.e.l.f" export VBOX_RELEASE_LOG="-all+dev_ide.e.l.f+rt_file.e.l.f+pdm_async_completion.e.l.f" /usr/bin/VirtualBox
Frankly, I don't know how to show those LogFlow() so I replaced them with LogRel() :D
More to be found, stay tuned xD
by , 7 years ago
The .patch to show the output from comment#11 in VBox.log
comment:12 by , 7 years ago
So far, the hang happens because 'Restarting incomplete transfer' which in turn happens because 12288 or '4096 bytes transferred)' is less than 'pTask->DataSeg.cbSeg=16384'. Why is cbSeg 16384? no idea, yet.
00:01:11.964411 before while 1 00:01:11.964421 0)reqsactive=1 00:01:11.964426 Waiting for 1 of 1 tasks to complete 00:01:11.964744 1 tasks completed 00:01:11.964791 1)reqsactive=1 00:01:11.964797 1_1)cbTransfered=4096 rcReq=0 hReq=00007f17ec0008c0 00:01:11.964805 pAioMgr=0x007f181c413a40 hReq=0x007f17ec0008c0 failure=0 00:01:11.964818 Task=0x007f17e80009b0 completed with VINF_SUCCESS 00:01:11.964884 2)reqsactive=0 00:01:11.964889 3)reqsactive=0 00:01:11.964894 4)reqsactive=0 00:01:11.964900 5)reqsactive=0 00:01:11.964905 after while 00:01:11.966153 before while 1 00:01:11.966164 0)reqsactive=1 00:01:11.966169 Waiting for 1 of 1 tasks to complete 00:01:11.966640 1 tasks completed 00:01:11.966674 1)reqsactive=1 00:01:11.966695 1_1)cbTransfered=12288 rcReq=0 hReq=00007f17ec0008c0 00:01:11.966716 pAioMgr=0x007f181c413a40 hReq=0x007f17ec0008c0 failure=0 00:01:11.966726 Restarting incomplete transfer 0x007f17e80009b0 (12288 bytes transferred) pTask->DataSeg.cbSeg=16384 pTask->cbBounceBuffer=0 00:01:11.966770 2)reqsactive=1 00:01:11.966776 3)reqsactive=1 00:01:11.966781 4)reqsactive=1 00:01:11.966787 5)reqsactive=1 00:01:11.966792 0)reqsactive=1 00:01:11.966796 Waiting for 1 of 1 tasks to complete 00:01:11.966803 1 tasks completed 00:01:11.966807 1)reqsactive=1 00:01:11.966812 1_1)cbTransfered=4096 rcReq=0 hReq=00007f17ec0008c0 00:01:11.966819 pAioMgr=0x007f181c413a40 hReq=0x007f17ec0008c0 failure=0 00:01:11.966825 Restarting incomplete transfer 0x007f17e80009b0 (4096 bytes transferred) pTask->DataSeg.cbSeg=16384 pTask->cbBounceBuffer=0 00:01:11.966886 2)reqsactive=1 00:01:11.966897 3)reqsactive=1 00:01:11.966903 4)reqsactive=1 00:01:11.966907 5)reqsactive=1 00:01:11.966912 0)reqsactive=1 00:01:11.966917 Waiting for 1 of 1 tasks to complete 00:01:11.967295 1 tasks completed 00:01:11.967316 1)reqsactive=1 00:01:11.967333 1_1)cbTransfered=8192 rcReq=0 hReq=00007f17ec0008c0 00:01:11.967355 pAioMgr=0x007f181c413a40 hReq=0x007f17ec0008c0 failure=0 00:01:11.967380 Restarting incomplete transfer 0x007f17e80009b0 (8192 bytes transferred) pTask->DataSeg.cbSeg=16384 pTask->cbBounceBuffer=0 00:01:11.967457 2)reqsactive=1 00:01:11.967562 3)reqsactive=1 00:01:11.967567 4)reqsactive=1 00:01:11.967572 5)reqsactive=1 00:01:11.967576 0)reqsactive=1 00:01:11.967581 Waiting for 1 of 1 tasks to complete 00:01:11.967708 1 tasks completed
comment:13 by , 7 years ago
All I know is that pTask->DataSeg.cbSeg=16384
gets set to 16384 inside drvvdAsyncIOReadSync of file src/VBox/Devices/Storage/DrvVD.cpp on the line
DataSeg.cbSeg = cbRead;
But I've no idea how to dump a stacktrace(ok, I tried but stdout/stderr won't show anywhere), since setting a breakpoint with gdb causes the thread to terminate via
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
So, I've no idea why it gets set to 16384, why cbRead would be that, but maybe I'm looking at it wrongly, and it's the cbTransfered=12288 (and later 4096) that should be 16384 ?
comment:14 by , 7 years ago
Ok, I think I get it: it tries to read 16384 bytes(aka pTask->DataSeg.cbSeg), but only reads 12288(aka cbTransfered) and then schedules the remaining(4096 bytes) to be read next, but without changing that pTask->DataSeg.cbSeg of 16384 to 4096 ... which is why it keeps restarting, and next it tries to read and gets only 8192 then 4096 and these two keep repeating...
00:01:12.547260 before while 1 00:01:12.547324 0)reqsactive=1 00:01:12.547329 Waiting for 1 of 1 tasks to complete 00:01:12.547768 1 tasks completed 00:01:12.547825 1)reqsactive=1 00:01:12.547831 1_1)cbTransfered=4096 rcReq=0 hReq=00007f93140008c0 00:01:12.547839 pAioMgr=0x007f9344413a40 hReq=0x007f93140008c0 failure=0 pTask->DataSeg.cbSeg=4096 00:01:12.547852 Task=0x007f93100009b0 completed with VINF_SUCCESS 00:01:12.547904 2)reqsactive=0 00:01:12.547909 3)reqsactive=0 00:01:12.547913 4)reqsactive=0 00:01:12.547919 5)reqsactive=0 00:01:12.547923 after while 00:01:12.548726 !!!!3! DataSeg.cbSeg = 16384 00:01:12.549003 before while 1 00:01:12.549014 0)reqsactive=1 00:01:12.549019 Waiting for 1 of 1 tasks to complete 00:01:12.549599 1 tasks completed 00:01:12.549650 1)reqsactive=1 00:01:12.549656 1_1)cbTransfered=12288 rcReq=0 hReq=00007f93140008c0 00:01:12.549664 pAioMgr=0x007f9344413a40 hReq=0x007f93140008c0 failure=0 pTask->DataSeg.cbSeg=16384 00:01:12.549672 Restarting incomplete transfer 0x007f93100009b0 (12288 bytes transferred) pTask->DataSeg.cbSeg=16384 pTask->cbBounceBuffer=0 00:01:12.549680 Here2 00:01:12.549684 Here3 00:01:12.549721 2)reqsactive=1 00:01:12.549726 3)reqsactive=1 00:01:12.549731 4)reqsactive=1 00:01:12.549736 5)reqsactive=1 00:01:12.549741 0)reqsactive=1 00:01:12.549745 Waiting for 1 of 1 tasks to complete 00:01:12.549751 1 tasks completed 00:01:12.549756 1)reqsactive=1 00:01:12.549760 1_1)cbTransfered=4096 rcReq=0 hReq=00007f93140008c0 00:01:12.549766 pAioMgr=0x007f9344413a40 hReq=0x007f93140008c0 failure=0 pTask->DataSeg.cbSeg=16384 00:01:12.549773 Restarting incomplete transfer 0x007f93100009b0 (4096 bytes transferred) pTask->DataSeg.cbSeg=16384 pTask->cbBounceBuffer=0 00:01:12.549779 Here2 00:01:12.549783 Here3 00:01:12.549849 2)reqsactive=1 00:01:12.549856 3)reqsactive=1 00:01:12.549860 4)reqsactive=1 00:01:12.549865 5)reqsactive=1 00:01:12.549869 0)reqsactive=1 00:01:12.549873 Waiting for 1 of 1 tasks to complete 00:01:12.550457 1 tasks completed 00:01:12.550525 1)reqsactive=1 00:01:12.550531 1_1)cbTransfered=8192 rcReq=0 hReq=00007f93140008c0 00:01:12.550539 pAioMgr=0x007f9344413a40 hReq=0x007f93140008c0 failure=0 pTask->DataSeg.cbSeg=16384 00:01:12.550547 Restarting incomplete transfer 0x007f93100009b0 (8192 bytes transferred) pTask->DataSeg.cbSeg=16384 pTask->cbBounceBuffer=0 00:01:12.550554 Here2 00:01:12.550559 Here3 00:01:12.550646 2)reqsactive=1 00:01:12.550653 3)reqsactive=1 00:01:12.550658 4)reqsactive=1 00:01:12.550663 5)reqsactive=1 00:01:12.550668 0)reqsactive=1 00:01:12.550672 Waiting for 1 of 1 tasks to complete 00:01:12.551186 1 tasks completed 00:01:12.551238 1)reqsactive=1 00:01:12.551244 1_1)cbTransfered=4096 rcReq=0 hReq=00007f93140008c0 00:01:12.551252 pAioMgr=0x007f9344413a40 hReq=0x007f93140008c0 failure=0 pTask->DataSeg.cbSeg=16384 00:01:12.551260 Restarting incomplete transfer 0x007f93100009b0 (4096 bytes transferred) pTask->DataSeg.cbSeg=16384 pTask->cbBounceBuffer=0 00:01:12.551267 Here2 00:01:12.551271 Here3 00:01:12.551470 2)reqsactive=1 00:01:12.551477 3)reqsactive=1 00:01:12.551482 4)reqsactive=1 00:01:12.551487 5)reqsactive=1 00:01:12.551492 0)reqsactive=1 00:01:12.551496 Waiting for 1 of 1 tasks to complete 00:01:12.551918 1 tasks completed 00:01:12.551965 1)reqsactive=1 00:01:12.551971 1_1)cbTransfered=8192 rcReq=0 hReq=00007f93140008c0 00:01:12.551979 pAioMgr=0x007f9344413a40 hReq=0x007f93140008c0 failure=0 pTask->DataSeg.cbSeg=16384 00:01:12.551987 Restarting incomplete transfer 0x007f93100009b0 (8192 bytes transferred) pTask->DataSeg.cbSeg=16384 pTask->cbBounceBuffer=0 00:01:12.551994 Here2 00:01:12.551998 Here3 00:01:12.552084 2)reqsactive=1 00:01:12.552090 3)reqsactive=1 00:01:12.552095 4)reqsactive=1 00:01:12.552100 5)reqsactive=1 00:01:12.552105 0)reqsactive=1 00:01:12.552109 Waiting for 1 of 1 tasks to complete 00:01:12.552689 1 tasks completed 00:01:12.552737 1)reqsactive=1 00:01:12.552743 1_1)cbTransfered=4096 rcReq=0 hReq=00007f93140008c0 00:01:12.552751 pAioMgr=0x007f9344413a40 hReq=0x007f93140008c0 failure=0 pTask->DataSeg.cbSeg=16384 00:01:12.552759 Restarting incomplete transfer 0x007f93100009b0 (4096 bytes transferred) pTask->DataSeg.cbSeg=16384 pTask->cbBounceBuffer=0 00:01:12.552781 Here2 00:01:12.552787 Here3 00:01:12.552875 2)reqsactive=1 00:01:12.552882 3)reqsactive=1 00:01:12.552887 4)reqsactive=1 00:01:12.552893 5)reqsactive=1 00:01:12.552897 0)reqsactive=1 00:01:12.552902 Waiting for 1 of 1 tasks to complete 00:01:12.553198 1 tasks completed 00:01:12.553233 1)reqsactive=1 00:01:12.553252 1_1)cbTransfered=8192 rcReq=0 hReq=00007f93140008c0 00:01:12.553348 pAioMgr=0x007f9344413a40 hReq=0x007f93140008c0 failure=0 pTask->DataSeg.cbSeg=16384 00:01:12.553357 Restarting incomplete transfer 0x007f93100009b0 (8192 bytes transferred) pTask->DataSeg.cbSeg=16384 pTask->cbBounceBuffer=0 00:01:12.553364 Here2 00:01:12.553369 Here3 00:01:12.553457 2)reqsactive=1 00:01:12.553465 3)reqsactive=1 00:01:12.553470 4)reqsactive=1 00:01:12.553475 5)reqsactive=1 00:01:12.553480 0)reqsactive=1 00:01:12.553484 Waiting for 1 of 1 tasks to complete 00:01:12.553769 1 tasks completed 00:01:12.553774 1)reqsactive=1 00:01:12.553779 1_1)cbTransfered=4096 rcReq=0 hReq=00007f93140008c0 00:01:12.553785 pAioMgr=0x007f9344413a40 hReq=0x007f93140008c0 failure=0 pTask->DataSeg.cbSeg=16384 00:01:12.553792 Restarting incomplete transfer 0x007f93100009b0 (4096 bytes transferred) pTask->DataSeg.cbSeg=16384 pTask->cbBounceBuffer=0 00:01:12.553798 Here2 00:01:12.553802 Here3 00:01:12.553827 2)reqsactive=1 00:01:12.553833 3)reqsactive=1 00:01:12.553837 4)reqsactive=1 00:01:12.553842 5)reqsactive=1 00:01:12.553846 0)reqsactive=1 00:01:12.553850 Waiting for 1 of 1 tasks to complete 00:01:12.554133 1 tasks completed
comment:15 by , 7 years ago
Alright well, I can make AioMgr not hang anymore(patch included next), but the transfer still fails the same inside guest OS.
The question is, why does the 16KiB read fail by reading only 12K and then the next 4K gets scheduled to be read, and well, what happens after that? is that properly merged together?(I don't even know what I'm doing here, heh).
Another thing is: why does it say 'Restarting transfer' if it continues it instead ?
00:00:49.794916 before while 1 00:00:49.794927 0)reqsactive=1 00:00:49.794932 Waiting for 1 of 1 tasks to complete 00:00:49.795475 1 tasks completed 00:00:49.795532 1)reqsactive=1 00:00:49.795538 1_1)cbTransfered=4096 rcReq=0 hReq=00007fb5540008c0 00:00:49.795546 pAioMgr=0x007fb580413a40 hReq=0x007fb5540008c0 failure=0 pTask->DataSeg.cbSeg=4096 00:00:49.795575 Task=0x007fb5500009b0 completed with VINF_SUCCESS 00:00:49.795673 2)reqsactive=0 00:00:49.795722 3)reqsactive=0 00:00:49.795736 4)reqsactive=0 00:00:49.795815 5)reqsactive=0 00:00:49.795847 after while 00:00:49.800052 !!!!3! DataSeg.cbSeg = 16384 00:00:49.800511 before while 1 00:00:49.800522 0)reqsactive=1 00:00:49.800527 Waiting for 1 of 1 tasks to complete 00:00:49.801160 1 tasks completed 00:00:49.801207 1)reqsactive=1 00:00:49.801215 1_1)cbTransfered=12288 rcReq=0 hReq=00007fb5540008c0 00:00:49.801225 pAioMgr=0x007fb580413a40 hReq=0x007fb5540008c0 failure=0 pTask->DataSeg.cbSeg=16384 00:00:49.801235 Restarting incomplete transfer 0x007fb5500009b0 (12288 bytes transferred) pTask->DataSeg.cbSeg=16384 pTask->cbBounceBuffer=0 00:00:49.801242 Here2 00:00:49.801247 Here3 00:00:49.801292 2)reqsactive=1 00:00:49.801298 3)reqsactive=1 00:00:49.801303 4)reqsactive=1 00:00:49.801309 5)reqsactive=1 00:00:49.801314 0)reqsactive=1 00:00:49.801320 Waiting for 1 of 1 tasks to complete 00:00:49.801328 1 tasks completed 00:00:49.801332 1)reqsactive=1 00:00:49.801337 1_1)cbTransfered=4096 rcReq=0 hReq=00007fb5540008c0 00:00:49.801343 pAioMgr=0x007fb580413a40 hReq=0x007fb5540008c0 failure=0 pTask->DataSeg.cbSeg=4096 00:00:49.801356 Task=0x007fb5500009b0 completed with VINF_SUCCESS 00:00:49.801385 2)reqsactive=0 00:00:49.801390 3)reqsactive=0 00:00:49.801394 4)reqsactive=0 00:00:49.801399 5)reqsactive=0 00:00:49.801404 after while 00:00:53.180166 !!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 49152!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 49152!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 12288!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 12288!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 69632!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 12288!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 16384!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 16384!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 32768!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 24576!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 24576!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 16384!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 12288PIIX3 ATA: Ctl#1: RESET, DevSel=0 AIOIf=0 CmdIf0=0xa0 (41170068 usec ago) CmdIf1=0x00 (-1 usec ago) 00:01:32.142770 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:33.276371 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:34.409715 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:35.542929 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:36.676460 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:37.809827 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:38.942610 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:40.075851 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:40.986261 PIIX3 ATA: Ctl#1: RESET, DevSel=0 AIOIf=0 CmdIf0=0xa0 (51189250 usec ago) CmdIf1=0x00 (-1 usec ago) 00:01:41.255947 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:42.389855 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:43.522580 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:44.656404 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:45.789990 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:46.922614 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:48.055937 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:49.189393 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:50.323394 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:51.006639 PIIX3 ATA: Ctl#1: RESET, DevSel=0 AIOIf=0 CmdIf0=0xa0 (61209628 usec ago) CmdIf1=0x00 (-1 usec ago) 00:01:51.503214 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:52.636391 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:53.769538 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:54.902948 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:56.036080 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:57.169764 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:58.303267 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:59.436424 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:01:59.737028 !!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 4096!!!!5! pAhciReq->u.Io.DataSeg.cbSeg = 8192PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:01.703318 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:02.836052 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:03.969964 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:05.103007 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:06.236351 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:07.369245 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:08.502770 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:09.635958 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:10.769690 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:11.903395 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:13.036394 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:14.169681 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:15.302546 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting 00:02:16.436566 PIIX3 ATA LUN#2: Async I/O thread probably stuck in operation, interrupting
# dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2 dd: error reading '/dev/sr0': Input/output error 1+0 records in 1+0 records out 512 bytes copied, 101.355 s, 0.0 kB/s
by , 7 years ago
Attachment: | le.2.patch added |
---|
does not hang AioMgr1-N anymore(see line with comment 'fix?')
comment:16 by , 7 years ago
I'll leave it to the experts, they can probably figure this all out in 20 minutes or less :)
Cheers!
by , 7 years ago
Attachment: | pdm_async_completion_incomplete_req.patch added |
---|
comment:17 by , 7 years ago
Thanks a lot for the very detailed report and patch you provided (doesn't happen very often to get such a detailed description from a reporter)! Your patch points in the right direction but is not complete as it covers only one case so far and doesn't take into account when a request might be restarted multiple times to complete the transfer. I attached a patch which should cover all cases, would be great if you could try it locally. We can then incorporate it into the next maintenance release. If you can't test the patch we can also provide you with a test build.
comment:18 by , 7 years ago
Hey, thanks so much @aeichner! Patch works!
I have tested the patch against two VMs(with vdi`s on my SSD) with the steps from OP and with dd-ing the entire devices(sr0 and sda) and everything is working as expected. No more hangs or errors. All good here!
Really glad this is no longer an issue. Much appreciated, @aeichner! All the best to you! Cheers.
PS: I might remove the hosted VM file downloads from OP at some point, since they are no longer needed.
Hooray! #mehappy :))
by , 7 years ago
Attachment: | arch3_1-2017-04-22-09-23-31.log added |
---|
one VM's log after the perfection patch
by , 7 years ago
Attachment: | arch3-2017-04-22-09-23-31.log added |
---|
another VM's log after the perfection patch
comment:19 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fix is part of the 5.1.22 release. Thanks again for the detailed analysis!
When booted from iso(/dev/sr0) and hanged HDD(/dev/sda) with dd 42215 count=2 (see step 8)