VirtualBox

Opened 5 years ago

Last modified 17 months ago

#18176 new defect

USB resets and hangs during high I/O — at Version 8

Reported by: Jimson
Owned by:
Component: USB
Version: VirtualBox 5.2.22
Keywords: usb reset superspeed
Cc:
Guest type: Linux
Host type: Mac OS X

Description (last modified by michaln)

Experiencing USB resets and hangs on VirtualBox 5.2.22 on a MacBook Pro 2017 running macOS 10.13.6, with Oracle Linux 7.5 and Ubuntu 18.04 VMs. I'm able to reproduce the problem very quickly by running the following:

$ sudo dd if=/dev/zero of=./testfile status=progress bs=1024k
639631360 bytes (640 MB) copied, 5.448752 s, 117 MB/s

It starts out pretty fast, but once a reset occurs, the speed drops below 50 MB/s. The OL system log shows the resets and I/O errors as:

Dec 10 15:55:28 jdnissen-lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Dec 10 15:55:38 jdnissen-lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Dec 10 15:55:44 jdnissen-lobi7 systemd: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 1858 (cma)
Dec 10 15:55:44 jdnissen-lobi7 systemd: Mounting Arbitrary Executable File Formats File System...
Dec 10 15:55:45 jdnissen-lobi7 systemd: Mounted Arbitrary Executable File Formats File System.
Dec 10 15:55:46 jdnissen-lobi7 kernel: EXT4-fs (dm-3): Delayed block allocation failed for inode 12 at logical offset 546816 with max blocks 2048 with error 5
Dec 10 15:55:46 jdnissen-lobi7 kernel: EXT4-fs (dm-3): This should not happen!! Data will be lost
Dec 10 15:55:46 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:55:51 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:55:57 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:01 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:07 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:12 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:17 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:22 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:27 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:32 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
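A log excerpt like the one above can be summarized by counting the relevant message types. The following is a minimal sketch, not part of the original report; the pattern names and the `tally` helper are hypothetical, and the regular expressions are based only on the messages quoted in this ticket, so other kernels may word them differently.

```python
import re
from collections import Counter

# Patterns derived from the messages quoted in this ticket (assumption:
# the guest kernel words them the same way on other systems).
PATTERNS = {
    "usb_reset": re.compile(r"reset \S+ USB device number \d+ using xhci_hcd"),
    "jbd2_io_error": re.compile(r"JBD2: Detected IO errors"),
    "ext4_message": re.compile(r"EXT4-fs"),
}


def tally(lines):
    """Count how many lines match each named pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Dec 10 15:55:28 jdnissen-lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd",
        "Dec 10 15:55:44 jdnissen-lobi7 systemd: Mounting Arbitrary Executable File Formats File System...",
        "Dec 10 15:55:46 jdnissen-lobi7 kernel: EXT4-fs (dm-3): This should not happen!! Data will be lost",
        "Dec 10 15:55:46 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8",
    ]
    for name, count in sorted(tally(sample).items()):
        print(name, count)
```

Feeding it the lines of the guest's system log gives a quick before/after comparison when a setting is changed.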

I've attempted to work around this with the new test build 6.0.0 RC1, but it didn't help. OL and Ubuntu updates don't help, either.

It's happening on two different USB drives: a Seagate 2TB and a Western Digital 2TB. The Seagate is formatted as NTFS and the WD as Linux ext4.

In addition, I can attach the drives to an old laptop running bare-metal OL 7.2, and they work just fine: no resets, hangs, or slow I/O during the 'dd' (or file copies).
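Because `dd` reports only a running average, it can be useful to time each chunk separately to see where throughput drops. The sketch below is one way to do that under stated assumptions: the function names are hypothetical, each chunk is followed by `fsync` so that stalls surface immediately (which makes the numbers not directly comparable to the `dd` run), and the 50 MB/s default threshold is simply the figure mentioned in the description.

```python
import os
import time


def timed_write(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write zeros to path and return a list of (seconds, bytes) per chunk."""
    chunk = bytes(chunk_bytes)
    samples = []
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            start = time.monotonic()
            f.write(chunk[:n])
            f.flush()
            os.fsync(f.fileno())  # push data to the device so stalls are visible
            samples.append((time.monotonic() - start, n))
            written += n
    return samples


def slow_chunks(samples, min_mb_per_s=50.0):
    """Return indices of chunks written slower than the threshold."""
    slow = []
    for i, (secs, n) in enumerate(samples):
        if secs > 0 and (n / 1e6) / secs < min_mb_per_s:
            slow.append(i)
    return slow
```

An I/O error such as the ones in the log above would surface as an `OSError` from `write` or `fsync`, and the samples collected up to that point show how far the copy got.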

I work for Oracle, and this problem is preventing me from being able to copy large patch bundles, used during customer installs.

Change History (10)

by Jimson, 5 years ago

Attachment: VBox.log added

comment:1 by Jimson, 5 years ago

Made several attempts at reproducing this issue on a non-Mac host, using the same VM exported and imported. However, my attempts on two Windows 10 hosts failed: on both, the device wouldn't even mount in the VM, with the following errors on each mount attempt...

Dec 11 12:05:50 jdnissen-lobi7 kernel: usb 1-1: new high-speed USB device number 2 using xhci_hcd
Dec 11 12:05:50 jdnissen-lobi7 kernel: usb 1-1: device descriptor read/64, error 18
Dec 11 12:05:51 jdnissen-lobi7 kernel: usb 1-1: device descriptor read/64, error 18
Dec 11 12:05:51 jdnissen-lobi7 kernel: usb 1-1: new high-speed USB device number 3 using xhci_hcd
Dec 11 12:05:51 jdnissen-lobi7 kernel: usb 1-1: device descriptor read/64, error 18
Dec 11 12:05:51 jdnissen-lobi7 kernel: usb 1-1: device descriptor read/64, error 18
Dec 11 12:05:52 jdnissen-lobi7 kernel: usb 1-1: new high-speed USB device number 4 using xhci_hcd
Dec 11 12:05:52 jdnissen-lobi7 kernel: usb 1-1: Invalid ep0 maxpacket: 9
Dec 11 12:05:52 jdnissen-lobi7 kernel: usb 1-1: new high-speed USB device number 5 using xhci_hcd
Dec 11 12:05:52 jdnissen-lobi7 kernel: usb 1-1: Invalid ep0 maxpacket: 9
Dec 11 12:05:52 jdnissen-lobi7 kernel: usb usb1-port1: unable to enumerate USB device

I ran out of time to troubleshoot this Windows 10 VirtualBox USB issue, so I made another attempt on an old Dell E7440 running Oracle Linux 7.6. Though it took much longer, the USB resets occurred there, too...

Dec 12 08:54:23 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Dec 12 08:54:23 lobi7 kernel: sd 0:0:0:0: [sdb] FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Dec 12 08:54:23 lobi7 kernel: sd 0:0:0:0: [sdb] CDB: ATA command pass through(16) 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00
Dec 12 08:54:23 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Dec 12 08:54:23 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Dec 12 08:54:23 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Dec 12 08:54:23 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Dec 12 08:54:23 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Dec 12 08:54:23 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd

[lobi7 ~]$ cd /run/media/jdnissen/b8049a7e-c0ce-4739-a586-2bb6e05ec5e6/
[lobi7 b8049a7e-c0ce-4739-a586-2bb6e05ec5e6]$ sudo dd if=/dev/zero of=./testfile status=progress bs=1024k
1393245028352 bytes (1.4 TB) copied, 13969.899470 s, 99.7 MB/s
dd: error writing ‘./testfile’: Read-only file system
1328739+0 records in
1328738+0 records out
1393283014656 bytes (1.4 TB) copied, 14164.6 s, 98.4 MB/s

Log is attached.

comment:2 by Jimson, 5 years ago (in reply to: description)

The original errors in code-blocks for readability...

Dec  7 10:09:50 lobi7 kernel: usb 2-1: new SuperSpeed USB device number 4 using xhci_hcd
Dec  7 10:09:50 lobi7 kernel: usb 2-1: New USB device found, idVendor=1058, idProduct=25a2
Dec  7 10:09:50 lobi7 kernel: usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Dec  7 10:09:50 lobi7 kernel: usb 2-1: Product: Elements 25A2
Dec  7 10:09:50 lobi7 kernel: usb 2-1: Manufacturer: Western Digital
Dec  7 10:09:50 lobi7 kernel: usb 2-1: SerialNumber: 575854314536384353524154
Dec  7 10:09:50 lobi7 kernel: usb-storage 2-1:1.0: USB Mass Storage device detected
Dec  7 10:09:50 lobi7 kernel: scsi host6: usb-storage 2-1:1.0
Dec  7 10:09:50 lobi7 mtp-probe: checking bus 2, device 4: "/sys/devices/pci0000:00/0000:00:0c.0/usb2/2-1"
Dec  7 10:09:50 lobi7 mtp-probe: bus: 2, device: 4 was not an MTP device
Dec  7 10:09:51 lobi7 kernel: scsi 6:0:0:0: Direct-Access     WD       Elements 25A2    1021 PQ: 0 ANSI: 6
Dec  7 10:09:51 lobi7 kernel: sd 6:0:0:0: Attached scsi generic sg2 type 0
Dec  7 10:09:51 lobi7 kernel: sd 6:0:0:0: [sdc] 3906963456 512-byte logical blocks: (2.00 TB/1.81 TiB)
Dec  7 10:09:51 lobi7 kernel: sd 6:0:0:0: [sdc] Write Protect is off
Dec  7 10:09:51 lobi7 kernel: sd 6:0:0:0: [sdc] No Caching mode page found
Dec  7 10:09:51 lobi7 kernel: sd 6:0:0:0: [sdc] Assuming drive cache: write through
Dec  7 10:09:51 lobi7 kernel: sd 6:0:0:0: [sdc] Attached SCSI disk
...
Dec  7 09:47:23 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec  7 09:47:23 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec  7 09:47:24 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec  7 09:47:24 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec  7 09:47:24 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec  7 09:47:24 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec  7 09:47:24 lobi7 kernel: sd 5:0:0:0: [sdb] FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Dec  7 09:47:24 lobi7 kernel: sd 5:0:0:0: [sdb] CDB: Write(10) 2a 00 07 4e c4 00 00 04 00 00
Dec  7 09:47:24 lobi7 kernel: blk_update_request: I/O error, dev sdb, sector 122602496
Dec  7 09:47:24 lobi7 kernel: EXT4-fs warning (device dm-3): ext4_end_bio:316: I/O error -5 writing to inode 12 (offset 41783656448 size 8388608 starting block 15324800)
Dec  7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324800
Dec  7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324801
Dec  7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324802
Dec  7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324803
Dec  7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324804
Dec  7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324805
Dec  7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324806
Dec  7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324807
Dec  7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324808
Dec  7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324809
…
Dec  7 09:47:26 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec  7 09:47:26 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec  7 09:47:26 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec  7 09:47:26 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec  7 09:47:26 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec  7 09:47:26 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec  7 09:47:26 lobi7 kernel: sd 5:0:0:0: [sdb] FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Dec  7 09:47:26 lobi7 kernel: sd 5:0:0:0: [sdb] CDB: Write(10) 2a 00 07 4e d0 00 00 04 00 00
Dec  7 09:47:26 lobi7 kernel: blk_update_request: I/O error, dev sdb, sector 122605568
Dec  7 09:47:26 lobi7 kernel: EXT4-fs warning (device dm-3): ext4_end_bio:316: I/O error -5 writing to inode 12 (offset 41792045056 size 8388608 starting block 15325184)
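The failed `Write(10)` commands in the log can be decoded to check that they line up with the sectors reported by `blk_update_request`. A minimal sketch follows, using the standard 10-byte WRITE(10) layout (opcode 0x2A, a 4-byte big-endian LBA at bytes 2 to 5, a 2-byte big-endian transfer length at bytes 7 to 8); the helper name `decode_write10` is hypothetical.

```python
def decode_write10(cdb_hex):
    """Return (lba, blocks) for a WRITE(10) CDB given as a hex string."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks


if __name__ == "__main__":
    for cdb in ("2a 00 07 4e c4 00 00 04 00 00",
                "2a 00 07 4e d0 00 00 04 00 00"):
        print(decode_write10(cdb))
```

For the two CDBs quoted above this yields LBAs 122602496 and 122605568, which match the sector numbers in the corresponding I/O error lines, each with a transfer length of 1024 blocks.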

My concern isn't so much the performance impact as the fact that large file copies stop abruptly. With hundreds of files to copy at a time, it means restarting the process where it left off.

Last edited 5 years ago by Jimson

comment:3 by michaln, 5 years ago

I was able to reproduce this... and then I wasn't. But that certainly wasn't because anything got intentionally fixed.

The underlying problem unfortunately appears to be some race condition which is inherently very configuration-specific and, at least for me, currently so hard to reproduce that I can't meaningfully try to fix it. I can read hundreds of gigabytes at ~180 MB/s, no problem; the resets just aren't happening. That doesn't mean the resets won't happen on other machines, but at the moment I don't have those other machines.

Please try again with a Windows host, and report any issues encountered. I believe the problem is much less severe on a Windows host because it only slows things down but does not cause I/O errors.

If you find something that makes the errors more frequent (number of virtual CPUs? idle/busy host? VM memory size?), please add a comment here.
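To compare how often the resets occur under different settings, one option is to bucket the reset messages by minute. This is a sketch only: the helper name is hypothetical, and it assumes the classic syslog timestamp format and the reset message wording seen in the logs in this ticket.

```python
import re
from collections import Counter

# Captures "Mon DD HH:MM" from a classic syslog timestamp on reset lines.
RESET = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}):\d{2} .*reset \S+ USB device number"
)


def resets_per_minute(lines):
    """Map 'Mon DD HH:MM' to the number of USB reset messages in that minute."""
    counts = Counter()
    for line in lines:
        m = RESET.match(line)
        if m:
            # Collapse the padding in timestamps such as "Dec  7".
            counts[" ".join(m.group(1).split())] += 1
    return counts
```

Running the same workload with, say, a different number of virtual CPUs and comparing the resulting counts gives a rough measure of whether a change makes the resets more or less frequent.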

comment:4 by michaln, 5 years ago

Summary: USB 3.0 resets and hangs during high I/O → USB resets and hangs during high I/O

I'll add that the problem is not USB 3.0 specific. There is some probability involved, and the error may be more likely to show up as the number of USB transactions goes up, so with fast devices the error probably just shows up quicker.

There is definitely a problem in the VirtualBox Linux USB proxy in that it does not allow USB devices to be truly reset. It looks like the USB device may get sufficiently confused that it simply stops responding (that could in fact be triggering the initial reset attempt) and because we do not really reset it, the USB device stays confused until the guest OS gives up on it.

comment:5 by Jimson, 5 years ago

My attempts at reproducing this problem on two home Windows 10 computers failed at first, I think due to bug 84741. However, an upgrade to VirtualBox 6.0 fixed that, and interestingly, I have been unable to reproduce this bug on Windows 10 hosts running VirtualBox 6.0 with the same Oracle Linux VM. I have read a large 450GB file and written a large 1.7TB file, until the device filled up. However, there were a few minor USB resets:

Dec 21 11:21:38 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Dec 21 12:33:36 lobi7 kernel: EXT4-fs (dm-3): error count since last fsck: 5
Dec 21 12:33:36 lobi7 kernel: EXT4-fs (dm-3): initial error at time 1544198049: ext4_journal_check_start:56
Dec 21 12:33:36 lobi7 kernel: EXT4-fs (dm-3): last error at time 1544478866: ext4_wait_block_bitmap:497
Dec 21 12:33:56 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd 

The upgrade to VirtualBox 6.0 doesn't fix this bug on the Mac host, however. The problem still occurs there, rather quickly.

Last edited 5 years ago by michaln

comment:6 by michaln, 5 years ago

Actually, you did reproduce half of the problem -- it's that "reset SuperSpeed USB device number 2 using xhci_hcd" message. The difference is that on a Windows host, the VM / USB device / VirtualBox can recover from that situation, and on Linux or OS X hosts it can't.

I know exactly why the problem exists on Linux: we only pretend to reset the USB device but don't really. On OS X I'm not sure why it's happening. I thought we should be really resetting the device, but perhaps we aren't. I haven't looked at the OS X behavior in detail yet.

On Linux hosts, the question is how to fix it, and I don't have the answer yet though I have some ideas.
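For reference, this is roughly what a real reset request looks like from user space on a Linux host, via the usbdevfs `USBDEVFS_RESET` ioctl. It is an illustration only, not VirtualBox code and not a proposed fix: the helper names are hypothetical, the ioctl number assumes the common Linux encoding of `_IO('U', 20)` (as on x86 and ARM), and whether the USB proxy could issue such a reset safely is exactly the open question above.

```python
import fcntl
import os

# _IO('U', 20) from <linux/usbdevice_fs.h>; this value assumes the common
# Linux ioctl encoding in which the "no data" direction bits are zero.
USBDEVFS_RESET = (ord("U") << 8) | 20


def usb_device_node(bus, device):
    """Path of the usbdevfs node for a given bus and device number."""
    return f"/dev/bus/usb/{bus:03d}/{device:03d}"


def reset_usb_device(bus, device):
    """Ask the host kernel to perform a port reset on the device."""
    fd = os.open(usb_device_node(bus, device), os.O_WRONLY)
    try:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)
    finally:
        os.close(fd)
```

Calling `reset_usb_device` needs write access to the device node (typically root), and the bus and device numbers are the host's, as shown by `lsusb`, not the ones the guest reports.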

comment:7 by Socratis, 5 years ago

Michal,

Do you need any more data points? I'm on OSX 10.11.6 with plenty of VMs to test this on. And 2 USB3 HDs.

Also, since you have the power, could you edit the messages from 'Jimson' to include the logs/messages in {{{ ... }}} tags, to make reading a little bit easier? TIA.

Or you could do it too Jimson, they're your messages, you have the authority. Except the original one, the ticket report itself. As the poet once said, "U Can't Touch This!" ;)

comment:8 by michaln, 5 years ago

Description: modified (diff)