VirtualBox

Ticket #16677 (new defect) — at Version 7

Opened 2 years ago

Last modified 2 years ago

A reliable way to hang AioMgr0-N under certain conditions

Reported by: xftroxgpx Owned by:
Component: virtual disk Version: VirtualBox 5.1.18
Keywords: Cc:
Guest type: Linux Host type: Linux

Description (last modified by frank) (diff)

A reliable way to hang AioMgr0-N when 'fs readahead'(hdparm -a) is on (and is 256(KiB)) but only when 'Use Host I/O Cache' is disabled in virtualbox 5.1.18 r114002 (currently latest available) on Arch Linux 64bit, host OS. Basically, reading from guest SATA HDD and/or IDE CD will hang AioMgr0-N(until killed) on host OS and guest OS will fail to read/boot etc.

This seems to be virtualbox settings-agnostic, except for the 'Use Host I/O Cache' which has to be disabled for this to occurr AND guest OS readahead has to be enabled (usually set to 256KiB; able to repro with 16KiB too, with different dd 'skip' settings). So, this doesn't ocurr when hdparm -a0 /dev/sda was executed, or any value between 0 and 7, inclusive, (keeps readahead 'off')

Note:

hdparm -a0 /dev/sda

seems to do the same as

# echo 0 > /sys/class/block/sda/queue/read_ahead_kb

Steps to reproduce this issue on a guest HDD/vdi (as opposed to a guest CD/IDE/iso which is in the 'next section'):

  1. get archlinux-2017.03.01-dual.iso (or any current/latest iso, BUT then the dd repro will most likely not work at the same 'skip'!)

However according to this comment from frank https://www.virtualbox.org/ticket/15281#comment:8 , I shouldn't be posting any such links here, but then reproducing this(with accuracy) is going to be really hard without them, so I'm going to be obscuring them with base64, hopefully that's a good compromise between not posting them and posting them...

Pick a download location(476.0 MB):

archlinux webpage with torrent/magnet links, located here:

aHR0cHM6Ly93d3cuYXJjaGxpbnV4Lm9yZy9yZWxlbmcvcmVsZWFzZXMvMjAxNy4wMy4wMS8=

(my)google drive, directory with the iso file:

aHR0cHM6Ly9kcml2ZS5nb29nbGUuY29tL29wZW4/aWQ9MEI2UC05VXlKRmp6S1ZsWkhUa1E0YUZWbGRUUQ==

sha1sum for that iso is:

f426866ca632a35a3eeae8e4080cff25ec8da614  archlinux-2017.03.01-dual.iso
  1. get my VM

the reproduction steps only work with my VM with that specific vdi file; if that vdi gets cloned(even if to same format: vdi) it will not reproduce anymore! The IDE CD repro. steps will still work and make it hang reliably though (see 'next section')

Pick one download location(1.9G):

dropbox:

aHR0cHM6Ly93d3cuZHJvcGJveC5jb20vc2gvaHFiMGIwNDB4aTYyMDN2L0FBRE50RVF5Z0tGU1duNnpRaDNZS1VVcmE/ZGw9MA==

(my)google drive:

aHR0cHM6Ly9kcml2ZS5nb29nbGUuY29tL2RyaXZlL2ZvbGRlcnMvMEI2UC05VXlKRmp6S1Rrc3lZM0pZWlhSbU1Xcz91c3A9c2hhcmluZw==

download these 3 files: arch3.vbox arch3.vdi checksums

md5,sha1,sha256 and sha512 for those two are:

1aa8124771fd7c8761435ff26ee2d55b  ./arch3.vbox
f15c97a939cda4c6bdd413fb52a9519d  ./arch3.vdi
9c1a10c0ab0eb945b677faee22ef9ab2d1ba8511  ./arch3.vbox
a9ed9c0c3542b9d174a4b3fa2b89d1fd72f249f3  ./arch3.vdi
7ac7315a292311f19f59af86bbba687885813d8da7a3f263a0bb2031369fe1f1  ./arch3.vbox
d6141ff8690d1c114fd2e8067107473e456a8b30e196fac17d601f1da564f676  ./arch3.vdi
57742f80911f72186fe6de168e310da84e9d6f39c5683b3f7669abfdf57708f1965b923d5b7462d4af82dc75ddd83ea5cbbcef27330b833f8dabe7da7e5a0d9d  ./arch3.vbox
1e752ad68de113d41999591d105798acaff130716ec81f88325fc9474e160e761eef305fd87885bdf500f1e400e3ec31f32c43c4b5a924304f4027133611e880  ./arch3.vdi
  1. TODO: "import" the .vbox somehow in the VirtualBox UI! (export/import wouldn't have worked because it clones my vdi; and imports it as vmdk)
  2. ensure 'Controller: IDE' has 'Use Host I/O Cache' enabled (it does by default when creating new VM)

this means CD access won't hang

  1. ensure 'Controller: SATA' has 'Use Host I/O Cache' disabled (it's disabled by default when creating new VM)

this means HDD access will hang, below.

  1. start, Boot from CD/iso, you see grub "Boot Arch Linux (x86_64)", press Enter to boot from the iso

(do not boot HDD's OS, it will hang during boot sequence)

  1. you get to the root prompt, auto logged in.

info:

This is the default, 256KiB of readahead:

# hdparm -a /dev/sda



/dev/sda:
 readahead     = 256 (on)
# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
	Model Number:       VBOX HARDDISK                           
	Serial Number:      VB78ecd1fa-712db085 
	Firmware Revision:  1.0     
Standards:
	Used: ATA/ATAPI-6 published, ANSI INCITS 361-2002 
	Supported: 6 5 4 
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:    16514064
	LBA    user addressable sectors:    25165824
	LBA48  user addressable sectors:    25165824
	Logical/Physical Sector size:           512 bytes
	device size with M = 1024*1024:       12288 MBytes
	device size with M = 1000*1000:       12884 MBytes (12 GB)
	cache/buffer size  = 256 KBytes (type=DualPortCache)
Capabilities:
	LBA, IORDY(cannot be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Vendor, no device specific minimum
	R/W multiple sector transfer: Max = 128	Current = 128
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	48-bit Address feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	Gen2 signaling speed (3.0Gb/s)
	   *	Native Command Queueing (NCQ)
Checksum: correct

  1. how to hang (boot session 3 is most interesting, below)

No hang when the following: ----

boot session 1:

dd if=/dev/sda of=/dev/null bs=512 skip=42208 count=8

boot session 2:

dd if=/dev/sda of=/dev/null bs=512 skip=42216 count=2

boot session 3:

dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=1
dd if=/dev/sda of=/dev/null bs=512 skip=42216 count=1

Hang when the following: ----

boot session 1:

dd if=/dev/sda of=/dev/null bs=512 skip=42208 count=9

boot session 2:

dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=2

boot session 3:

dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=2

To exit the hang, just kill the PID of AioMgr0-N, eg. in 'top', press Shift+H to show threads names (maybe pressing 'c' is needed to toggle between cmdline/names, depending on your 'top' settings) then see the PID of AioMgr0-N, it should be at the top of the list using CPU (if sorted by %CPU, press x to see sort column) it's using about 22% constantly, unless you have set 3 CPUs in VM, then it's 100% (aka 1 core used fully) That kills the VM window, but not the VirtualBox UI window, which is what I want anyway.

What you see in guest's dmesg after the hang:

[   16.159742] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready
[  291.480202] ata1.00: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x6 frozen
[  291.480544] ata1.00: failed command: READ FPDMA QUEUED
[  291.480821] ata1.00: cmd 60/20:c0:e8:a4:00/00:00:00:00:00/40 tag 24 ncq dma 16384 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  291.481437] ata1.00: status: { DRDY }
[  291.481653] ata1: hard resetting link
[  291.792860] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  291.793245] ata1.00: configured for UDMA/133
[  291.793251] ata1.00: device reported invalid CHS sector 0
[  291.793270] ata1: EH complete
[  323.906810] ata1.00: exception Emask 0x0 SAct 0x8000000 SErr 0x0 action 0x6 frozen
[  323.907248] ata1.00: failed command: READ FPDMA QUEUED
[  323.907515] ata1.00: cmd 60/20:d8:e8:a4:00/00:00:00:00:00/40 tag 27 ncq dma 16384 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  323.908164] ata1.00: status: { DRDY }
[  323.908868] ata1: hard resetting link
[  324.223058] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  324.224317] ata1.00: configured for UDMA/133
[  324.224323] ata1.00: device reported invalid CHS sector 0
[  324.224342] ata1: EH complete
[  354.626843] ata1.00: exception Emask 0x0 SAct 0x4000 SErr 0x0 action 0x6 frozen
[  354.628012] ata1.00: failed command: READ FPDMA QUEUED
[  354.628631] ata1.00: cmd 60/20:70:e8:a4:00/00:00:00:00:00/40 tag 14 ncq dma 16384 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  354.631160] ata1.00: status: { DRDY }
[  354.631854] ata1: hard resetting link
[  354.942422] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  354.943605] ata1.00: configured for UDMA/133
[  354.943613] ata1.00: device reported invalid CHS sector 0
[  354.943637] ata1: EH complete
[  387.053504] ata1.00: NCQ disabled due to excessive errors
[  387.053508] ata1.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen
[  387.054265] ata1.00: failed command: READ FPDMA QUEUED
[  387.054859] ata1.00: cmd 60/20:08:e8:a4:00/00:00:00:00:00/40 tag 1 ncq dma 16384 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  387.057247] ata1.00: status: { DRDY }
[  387.057842] ata1: hard resetting link
[  387.373510] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  387.374544] ata1.00: configured for UDMA/133
[  387.374553] ata1.00: device reported invalid CHS sector 0
[  387.374588] ata1: EH complete
[  419.480161] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  419.481049] ata1.00: failed command: READ DMA
[  419.481819] ata1.00: cmd c8/00:20:e8:a4:00/00:00:00:00:00/e0 tag 19 dma 16384 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  419.485077] ata1.00: status: { DRDY }
[  419.485813] ata1: hard resetting link
[  419.799146] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  419.799713] ata1.00: configured for UDMA/133
[  419.799719] ata1.00: device reported invalid CHS sector 0
[  419.799736] ata1: EH complete
[  451.906813] ata1: limiting SATA link speed to 1.5 Gbps
[  451.906817] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  451.907568] ata1.00: failed command: READ DMA
[  451.908150] ata1.00: cmd c8/00:20:e8:a4:00/00:00:00:00:00/e0 tag 6 dma 16384 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  451.911967] ata1.00: status: { DRDY }
[  451.912716] ata1: hard resetting link
[  452.225783] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  452.226126] ata1.00: configured for UDMA/133
[  452.226126] ata1.00: device reported invalid CHS sector 0
[  452.226126] sd 0:0:0:0: [sda] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[  452.226126] sd 0:0:0:0: [sda] tag#6 Sense Key : 0x5 [current] 
[  452.226127] sd 0:0:0:0: [sda] tag#6 ASC=0x21 ASCQ=0x4 
[  452.226127] sd 0:0:0:0: [sda] tag#6 CDB: opcode=0x28 28 00 00 00 a4 e8 00 00 20 00
[  452.226127] blk_update_request: I/O error, dev sda, sector 42216
[  452.227928] ata1: EH complete

In the real world when 'Use Host I/O Cache' is disabled:

  • for the VM's IDE CD drive: I cannot boot from the ArchLinux or Manjaro iso (it will hang similarily)
  • for the VM's SATA HDD drive: cannot boot existing arch linux install

(also seen this on other guest linux distros; it's not an ArchLinux thing)


next section:

For hanging when reading /dev/sr0 aka 'the guest OS CD drive with archlinux-2017.03.01-dual.iso', when booting the OS from HDD,

two ways to reproduce:

Note: You cannot reproduce if readahead is off, eg. hdparm -a0 /dev/sr0

(also note, that -a256 being the default, the same dd 'skip' values from below will not work for a different readahead value)

way 1.

  1. to make HDD not hang AioMgr, clone it first (this magically gets rid of any AioMgr0-N hangs for some reason)

$ VBoxManage clonehd --format VDI arch3.vdi /tmp/cloned.vdi (obvious: to clone, first remove arch3.vdi from the VM, clone, then add cloned.vdi to the VM)

  1. set both IDE and SATA to have disabled 'Use Host I/O Cache'

c) boot from HDD(it boots from iso, at grub menu choose second option which is: 'Boot existing OS'); Don't boot iso's OS because it hangs instantly at Loading boot/x86_64/vmlinuz... and hangs AioMgr1-N with 99.9% cpu usage until you kill PID of AioMgr1-N which closes that VM's window.

way 2.

  1. if you didn't clone the vdi
  2. set only IDE(where iso is) on disbled 'Use Host I/O Cache', but let SATA(where vdi is) on enabled 'Use Host I/O Cache'
  3. boot, it will boot from iso, choose 'Boot existing OS' to boot from HDD(vdi)

way1&2 commons:

  1. at root prompt(you have to login as 'root', password is 'a'):

info:

[root@vm ~]# hdparm -a /dev/sr0

/dev/sr0:
 readahead     = 256 (on)
[root@vm ~]# hdparm -A /dev/sr0

/dev/sr0:
 HDIO_DRIVE_CMD(identify) failed: Input/output error

No hang when the following:

boot session 1:

dd if=/dev/sr0 of=/dev/null bs=512 skip=88038 count=2
dd if=/dev/sr0 of=/dev/null bs=512 skip=88040 count=2

boot session 2:

dd if=/dev/sr0 of=/dev/null bs=512 skip=88038 count=1
dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=1
dd if=/dev/sr0 of=/dev/null bs=512 skip=88040 count=1

Hang when the following:

boot session 1:

# dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2

boot session 2:

# dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2

way1:

# time dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2

dd: error reading '/dev/sr0': Input/output error
1+0 records in
1+0 records out
512 bytes copied, 101.679 s, 0.0 kB/s

real	1m41.692s
user	0m0.000s
sys	0m0.007s

top:

AioMgr1-N (yep, 1 not 0) constant CPU usage of 19.6% -> 21.6% ...even after dd finishes!

dmesg:

[   89.179123] systemd-logind[196]: New session c2 of user root.
[  129.346812] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  129.347540] ata3.00: cmd a0/01:00:00:00:40/00:00:00:00:00/a0 tag 0 dma 16384 in
                        opcode=0x28 28 00 00 00 55 fa 00 00 08 00res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[  129.355906] ata3.00: status: { DRDY }
[  134.400076] ata3: link is slow to respond, please be patient (ready=0)
[  139.386826] ata3: device not ready (errno=-16), forcing hardreset
[  139.387505] ata3: soft resetting link
[  144.586725] ata3: link is slow to respond, please be patient (ready=0)
[  149.403407] ata3: SRST failed (errno=-16)
[  149.404249] ata3: soft resetting link
[  154.603401] ata3: link is slow to respond, please be patient (ready=0)
[  159.420725] ata3: SRST failed (errno=-16)
[  159.421558] ata3: soft resetting link
[  164.620145] ata3: link is slow to respond, please be patient (ready=0)
[  194.426716] ata3: SRST failed (errno=-16)
[  194.427477] ata3: soft resetting link
[  199.456775] ata3: SRST failed (errno=-16)
[  199.471749] ata3: reset failed, giving up
[  199.472636] ata3.00: disabled
[  199.474100] ata3: EH complete
[  199.474938] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  199.476392] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 08 00
[  199.491977] blk_update_request: I/O error, dev sr0, sector 88040
[  199.494604] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  199.496175] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 02 00
[  199.497683] blk_update_request: I/O error, dev sr0, sector 88040
[  199.498588] Buffer I/O error on dev sr0, logical block 11005, async page read


way2:

# time dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2
dd: error reading '/dev/sr0': Input/output error
1+0 records in
1+0 records out
512 bytes copied, 102.138 s, 0.0 kB/s

real	1m42.147s
user	0m0.007s
sys	0m0.000s

top:

AioMgr0-N (0 here) constant CPU usage of 19.6% -> 21.6% (100% would mean 1 core is fully used) ...even after dd finishes!

dmesg:

[  102.492843] systemd-logind[202]: New session c2 of user root.
[  102.494941] systemd[1]: Started Session c2 of user root.
[  139.586919] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  139.587909] ata3.00: cmd a0/01:00:00:00:40/00:00:00:00:00/a0 tag 0 dma 16384 in
                        opcode=0x28 28 00 00 00 55 fa 00 00 08 00res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[  139.592029] ata3.00: status: { DRDY }
[  144.633536] ata3: link is slow to respond, please be patient (ready=0)
[  149.620205] ata3: device not ready (errno=-16), forcing hardreset
[  149.621023] ata3: soft resetting link
[  154.820197] ata3: link is slow to respond, please be patient (ready=0)
[  159.636906] ata3: SRST failed (errno=-16)
[  159.637643] ata3: soft resetting link
[  164.836889] ata3: link is slow to respond, please be patient (ready=0)
[  169.653533] ata3: SRST failed (errno=-16)
[  169.654227] ata3: soft resetting link
[  174.854340] ata3: link is slow to respond, please be patient (ready=0)
[  204.660185] ata3: SRST failed (errno=-16)
[  204.660805] ata3: soft resetting link
[  209.690159] ata3: SRST failed (errno=-16)
[  209.703203] ata3: reset failed, giving up
[  209.704066] ata3.00: disabled
[  209.704786] ata3: EH complete
[  209.705448] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  209.719212] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 08 00
[  209.720520] blk_update_request: I/O error, dev sr0, sector 88040
[  209.722434] sr 2:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  209.724036] sr 2:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 02 00
[  209.725383] blk_update_request: I/O error, dev sr0, sector 88040
[  209.726154] Buffer I/O error on dev sr0, logical block 11005, async page read


my host is Lenovo Z575, kernel: Linux z5 4.11.0-rc7-g4f7d029b9bf0 #151 SMP PREEMPT Wed Apr 19 18:36:02 CEST 2017 x86_64 GNU/Linux

virtualbox 5.1.18 r114002 on Arch Linux 64bit

Potentially relevant tickets: #15281, #13022

Change History

Changed 2 years ago by xftroxgpx

When booted from iso(/dev/sr0) and hanged HDD(/dev/sda) with dd 42215 count=2 (see step 8)

Changed 2 years ago by xftroxgpx

same log as before, but after 50 more minutes and after killing the hanged AioMgr0-N thread

comment:1 Changed 2 years ago by xftroxgpx

Briefly tried to hang AioMgr1-N (with the IDE/iso steps from above in 'next section', boot session 2) with the just-released(in ArchLinux testing repo, I mean) VirtualBox 5.1.20 r114628 (but I did recompile it myself though)

[root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2

dd: error reading '/dev/sr0': Input/output error
1+0 records in
1+0 records out
512 bytes copied, 102.141 s, 0.0 kB/s

dmesg:

[   47.309063] mandb[208]: Updating index cache for path `/usr/share/man/man3'. Wait...
[  110.573626] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  110.574843] ata2.00: cmd a0/01:00:00:00:40/00:00:00:00:00/a0 tag 0 dma 16384 in
                        opcode=0x28 28 00 00 00 55 fa 00 00 08 00res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[  110.586192] ata2.00: status: { DRDY }
[  115.633487] ata2: link is slow to respond, please be patient (ready=0)
[  120.623465] ata2: device not ready (errno=-16), forcing hardreset
[  120.624545] ata2: soft resetting link
[  125.823550] ata2: link is slow to respond, please be patient (ready=0)
[  130.640159] ata2: SRST failed (errno=-16)
[  130.641283] ata2: soft resetting link
[  135.840245] ata2: link is slow to respond, please be patient (ready=0)
[  140.690183] ata2: SRST failed (errno=-16)
[  140.691178] ata2: soft resetting link
[  145.890241] ata2: link is slow to respond, please be patient (ready=0)
[  165.753379] mandb[208]: /usr/bin/mandb: warning: /usr/share/man/man3/gnutls_x509_trust_list_add_system_trust.3.gz: whatis parse for gnutls_x509_trust_list_add_system_trust(3) failed
[  175.697630] ata2: SRST failed (errno=-16)
[  175.698734] ata2: soft resetting link
[  180.727614] ata2: SRST failed (errno=-16)
[  180.763306] ata2: reset failed, giving up
[  180.766737] ata2.00: disabled
[  180.767882] ata2: EH complete
[  180.768853] sr 1:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  180.771402] sr 1:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 08 00
[  180.773281] blk_update_request: I/O error, dev sr0, sector 88040
[  180.784906] sr 1:0:0:0: [sr0] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  180.786796] sr 1:0:0:0: [sr0] tag#0 CDB: opcode=0x28 28 00 00 00 55 fa 00 00 02 00
[  180.788529] blk_update_request: I/O error, dev sr0, sector 88040
[  180.789697] Buffer I/O error on dev sr0, logical block 11005, async page read

Killing the hung AioMgr1-N thread yielded no new lines in .log file, tried Refresh. Attaching log next...

Changed 2 years ago by xftroxgpx

hung AioMgr1-N with the IDE/iso steps

comment:2 Changed 2 years ago by xftroxgpx

Found more equivalents for this line:

dd if=/dev/sr0 of=/dev/null bs=512 skip=88039 count=2

as the following:

dd if=/dev/sr0 of=/dev/null bs=1 skip=45075968 count=513
dd if=/dev/sr0 of=/dev/null bs=1 skip=45076479 count=2

no hang with:

dd if=/dev/sr0 of=/dev/null bs=1 skip=45076479 count=1

The above are with the default:

hdparm -a256 /dev/sr0

Here's one with -a16 (actually just tested, the following works the same with -a256 too! but the above do not work with -a16)

no hang with:

[root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084671 count=1
1+0 records in
1+0 records out
1 byte copied, 0.00255341 s, 0.4 kB/s
[root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084672 count=1
1+0 records in
1+0 records out
1 byte copied, 0.00293639 s, 0.3 kB/s
[root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084672 count=2
2+0 records in
2+0 records out
2 bytes copied, 0.00257371 s, 0.8 kB/s

hang with:

[root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084671 count=2
dd: error reading '/dev/sr0': Input/output error
1+0 records in
1+0 records out
1 byte copied, 101.377 s, 0.0 kB/s

hangs AioMgr1-N with 99.9% cpu usage! even after dd finished

killed the AioMgr1-N PID via 'kill number' but no new lines appeared in log(unlike the second .log I've posted in this thread, the one where I kill it after 50mins)

Attaching Vbox.log next.

Changed 2 years ago by xftroxgpx

hung with -a16 for a 2 byte dd of /dev/sr0

comment:3 Changed 2 years ago by xftroxgpx

btw, the above 'no hang with' and 'hang with' commands are all in the same boot session, which makes me wonder how come guest linux didn't cache the bytes/readahead-sectors(?) from the previous commands, so that the last command would then not fail. But, most likely I don't know how caching works :) Maybe it only happens for the file system...

The guest OS kernel is/was: Linux vm 4.10.9-1-ARCH #1 SMP PREEMPT Sat Apr 8 12:39:59 CEST 2017 x86_64 GNU/Linux

Updated (pacman -Syu) and kernel is now: Linux vm 4.10.10-1-ARCH #1 SMP PREEMPT Wed Apr 12 18:50:28 CEST 2017 x86_64 GNU/Linux

Can still reproduce the previous comment:

[root@vm ~]# uname -a
Linux vm 4.10.10-1-ARCH #1 SMP PREEMPT Wed Apr 12 18:50:28 CEST 2017 x86_64 GNU/Linux
[root@vm ~]# hdparm -a16 /dev/sr0

/dev/sr0:
 setting fs readahead to 16
 readahead     = 16 (on)
[root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084671 count=1
1+0 records in
1+0 records out
1 byte copied, 0.00228378 s, 0.4 kB/s
[root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084672 count=1
1+0 records in
1+0 records out
1 byte copied, 0.00230094 s, 0.4 kB/s
[root@vm ~]# dd if=/dev/sr0 of=/dev/null bs=1 skip=45084672 count=2
2+0 records in
2+0 records out
2 bytes copied, 0.00265822 s, 0.8 kB/s
[root@vm ~]# time dd if=/dev/sr0 of=/dev/null bs=1 skip=45084671 count=2
dd: error reading '/dev/sr0': Input/output error
1+0 records in
1+0 records out
1 byte copied, 101.113 s, 0.0 kB/s

real	1m41.123s
user	0m0.003s
sys	0m0.003s
[root@vm ~]# 

Changed 2 years ago by xftroxgpx

powereoff guest(didn't finish) after the above without killing AioMgr1-N, new status lines in log; no new lines after killing it

comment:4 Changed 2 years ago by xftroxgpx

I've just discovered that the reason why cloning the vdi didn't trigger this issue anymore is because I kept the cloned vdi in tmpfs (/tmp). By moving it back from /tmp to SSD and removing it from Virtual Media Manager(release then remove), then adding it again to the same VM then, only now 8 hours later, tried booting from 'existing OS' AioMgr0-N hangs again during boot. Then, I copied it back to /tmp (tmpfs) and removed from Virtual Media Manager(release then remove) then added the /tmp one to the same VM => no hang! So this makes me think that the location/storage of the vdi is what's making AioMgr hang. There are no bad sectors on my SSD, though.

I've also just tested exporting the VM, then importing it; this creates a .vmdk disk file, stored on my SSD(on host) and booting from 'existing OS' does not cause the AioMgr hang. I then removed the vmdk from the VM via UI, cloned it via VBoxManage clonehd --format VDI arch3_1-disk002.vmdk /tmp/arch3_1-disk002.vdi and then added the clone to the VM, no hang because vdi is in tmpfs; then removed it from Virtual Media Manager(release and remove), moved the vdi on the SSD, added it to the VM, boot from existing OS => bam, hang!

So this tells me, vdi (but not vmdk) location or storage is a major factor to hanging AioMgr: hangs if vdi is on my SSD but not if it's on /tmp tmpfs.

Last edited 2 years ago by xftroxgpx (previous) (diff)

comment:5 Changed 2 years ago by xftroxgpx

The above is true for the iso hangs too: the iso was on my SSD until now and just tried to hang it to make sure it still hangs and it does (via dd if=/dev/sr0 of=/dev/null bs=1 skip=45084671 count=2); then copied the iso to /tmp (tmpfs) then removed the old iso from the CD drive of the VM, browsed for the one in /tmp and added it to the VM as CD, booted 'existing OS'(that doesn't hang) then tried to hang with the same dd, no hang, tried a full-on dd (via dd if=/dev/sr0 of=/dev/null) and no hang!

So, the OP with reproduction is then true only if the vdi and/or iso (depending on which you're trying to hang) are located on my host's SSD, but not when they are located in /tmp (tmpfs)

Last edited 2 years ago by xftroxgpx (previous) (diff)

comment:6 Changed 2 years ago by xftroxgpx

Setting queue depth from here https://www.virtualbox.org/ticket/13022#comment:12 has no effect on this issue and I confirmed it by cat /sys/block/sda/device/queue_depth returning '1' (even though I only used the VBoxManage command to set it: VBoxManage setextradata arch3 "VBoxInternal/Devices/ahci/0/Config/CmdSlotsAvail" 1) and then was able to hang AioMgr with the dd line from step 8 boot session 3 from OP: dd if=/dev/sda of=/dev/null bs=512 skip=42215 count=2 .

comment:7 Changed 2 years ago by frank

  • Description modified (diff)
  • Summary changed from A reliable way to hang AioMgr0-N when 'fs readahead'(hdparm -a) is on (and is 256(KiB)) but only when 'Use Host I/O Cache' is disabled, for IDE(iso) and SATA(vdi) devices to A reliable way to hang AioMgr0-N under certain conditions
Note: See TracTickets for help on using tickets.

www.oracle.com
ContactPrivacy policyTerms of Use