Opened 11 years ago
Closed 8 years ago
#12117 closed defect (obsolete)
USB, guest OHCI: lost bulk-out transfers using usbserial
Reported by: | chuck-h | Owned by: | |
---|---|---|---|
Component: | USB | Version: | VirtualBox 4.2.18 |
Keywords: | Cc: | ||
Guest type: | Linux | Host type: | Linux |
Description
An application running in the guest, repeatedly writing short serial strings to a USB serial adapter, hangs after seconds or minutes of activity. Disconnecting/reconnecting USB device from guest (from bottom toolbar of guest window) recovers from hang.
This behavior might be revealing a bug in memory-access synchronization of the emulated OHCI controller that would apply to many other USB devices besides serial adapters.
How to cause the error
Simple test application:
#!/usr/bin/env python from datetime import * from time import sleep from serial import * port = Serial("/dev/ttyUSB0", 57600, timeout=0, writeTimeout=0.5) try: print datetime.now().strftime("%H:%M:%S.%f") while True: port.write(datetime.now().strftime("%S.%f")) sleep(0.01) except Exception as e: print datetime.now().strftime("%H:%M:%S.%f"), e
Result:
chuck@rosman:~/ohci-dbg$ ./serial_blabber.py 13:44:13.102901 13:44:15.099878 Write timeout chuck@rosman:~/ohci-dbg$
Serial adapter is on host usb bus 4 (Intel UHCI); identifies as Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC
Host and Guest OS's are both Ubuntu 12.04LTS, both with multiple cores.
Diagnostics
Recompiled guest OS kernel with ohci-hcd as loadable module. Recompiled ohci-hcd with DEBUG and OHCI_VERBOSE_DEBUG enabled. Installed Wireshark 1.6.7 on both guest and host and captured USB traffic.
Last successful transfer contained the data 31342e323936343432 = "14.296442"
GUEST CAPTURE <src> <dest> <len> <info> <data> <wallclock time> host 5.2 73 URB_BULK out 31342e323936343432 13:44:14.296562 5.2 host 64 URB_BULK out 13:44:14.298193 host 5.2 73 URB_BULK out 31342e333036383930 13:44:14.306988 host 5.2 73 URB_BULK out 31342e333137333530 13:44:14.317401 (no more endpoint 5.2 transactions occur on guest) HOST CAPTURE host 77.2 73 URB_BULK out 31342e323936343432 13:44:14.603489 77.2 host 64 URB_BULK out 13:44:14.603755 (no more endpoint 77.2 transactions occur on host)
(there is about 0.30 sec offset between wallclocks of host and guest)
kernel log: successful submission and return of last successful URB
Sep 15 13:44:14 rosman kernel: [ 6846.201221] /home/chuck/ohci-dbg/ohci-dbg.c: SUB ffff88003a3eacc0 dev=5 ep=2out-bulk flags=10000 len=0/9 stat=-115 Sep 15 13:44:14 rosman kernel: [ 6846.201228] /home/chuck/ohci-dbg/ohci-dbg.c: data(0/9): 31 34 2e 32 39 36 34 34 32 stat:-115 Sep 15 13:44:14 rosman kernel: [ 6846.202759] /home/chuck/ohci-dbg/ohci-dbg.c: RET ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=2/512 stat=0 Sep 15 13:44:14 rosman kernel: [ 6846.202766] /home/chuck/ohci-dbg/ohci-dbg.c: data(2/512): 01 60 stat:0 Sep 15 13:44:14 rosman kernel: [ 6846.202800] /home/chuck/ohci-dbg/ohci-dbg.c: SUB ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=0/512 stat=-115 Sep 15 13:44:14 rosman kernel: [ 6846.202817] /home/chuck/ohci-dbg/ohci-dbg.c: RET ffff88003a3eacc0 dev=5 ep=2out-bulk flags=10000 len=9/9 stat=0
next urb is submitted but no return
Sep 15 13:44:14 rosman kernel: [ 6846.211654] /home/chuck/ohci-dbg/ohci-dbg.c: SUB ffff88003a3eacc0 dev=5 ep=2out-bulk flags=10000 len=0/9 stat=-115 Sep 15 13:44:14 rosman kernel: [ 6846.211664] /home/chuck/ohci-dbg/ohci-dbg.c: data(0/9): 31 34 2e 33 30 36 38 39 30 stat:-115 Sep 15 13:44:14 rosman kernel: [ 6846.211809] /home/chuck/ohci-dbg/ohci-dbg.c: RET ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=2/512 stat=0 Sep 15 13:44:14 rosman kernel: [ 6846.211997] /home/chuck/ohci-dbg/ohci-dbg.c: data(2/512): 01 60 stat:0 Sep 15 13:44:14 rosman kernel: [ 6846.212055] /home/chuck/ohci-dbg/ohci-dbg.c: SUB ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=0/512 stat=-115 Sep 15 13:44:14 rosman kernel: [ 6846.213769] /home/chuck/ohci-dbg/ohci-dbg.c: RET ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=2/512 stat=0 Sep 15 13:44:14 rosman kernel: [ 6846.213779] /home/chuck/ohci-dbg/ohci-dbg.c: data(2/512): 01 60 stat:0 Sep 15 13:44:14 rosman kernel: [ 6846.213826] /home/chuck/ohci-dbg/ohci-dbg.c: SUB ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=0/512 stat=-115 Sep 15 13:44:14 rosman kernel: [ 6846.215760] /home/chuck/ohci-dbg/ohci-dbg.c: RET ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=2/512 stat=0 Sep 15 13:44:14 rosman kernel: [ 6846.215769] /home/chuck/ohci-dbg/ohci-dbg.c: data(2/512): 01 60 stat:0 Sep 15 13:44:14 rosman kernel: [ 6846.215895] /home/chuck/ohci-dbg/ohci-dbg.c: SUB ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=0/512 stat=-115 Sep 15 13:44:14 rosman kernel: [ 6846.217780] /home/chuck/ohci-dbg/ohci-dbg.c: RET ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=2/512 stat=0 Sep 15 13:44:14 rosman kernel: [ 6846.217789] /home/chuck/ohci-dbg/ohci-dbg.c: data(2/512): 01 60 stat:0 ...
One more urb submitted, also no return
Sep 15 13:44:14 rosman kernel: [ 6846.222044] /home/chuck/ohci-dbg/ohci-dbg.c: SUB ffff88003a3eab40 dev=5 ep=2out-bulk flags=10000 len=0/9 stat=-115 Sep 15 13:44:14 rosman kernel: [ 6846.222051] /home/chuck/ohci-dbg/ohci-dbg.c: data(0/9): 31 34 2e 33 31 37 33 35 30 stat:-115 Sep 15 13:44:14 rosman kernel: [ 6846.224000] /home/chuck/ohci-dbg/ohci-dbg.c: RET ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=2/512 stat=0 Sep 15 13:44:14 rosman kernel: [ 6846.224045] /home/chuck/ohci-dbg/ohci-dbg.c: data(2/512): 01 60 stat:0 Sep 15 13:44:14 rosman kernel: [ 6846.224225] /home/chuck/ohci-dbg/ohci-dbg.c: SUB ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=0/512 stat=-115 Sep 15 13:44:14 rosman kernel: [ 6846.225802] /home/chuck/ohci-dbg/ohci-dbg.c: RET ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=2/512 stat=0 Sep 15 13:44:14 rosman kernel: [ 6846.225816] /home/chuck/ohci-dbg/ohci-dbg.c: data(2/512): 01 60 stat:0 Sep 15 13:44:14 rosman kernel: [ 6846.225872] /home/chuck/ohci-dbg/ohci-dbg.c: SUB ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=0/512 stat=-115 Sep 15 13:44:14 rosman kernel: [ 6846.227783] /home/chuck/ohci-dbg/ohci-dbg.c: RET ffff88003a3ea540 dev=5 ep=1in-bulk flags=10200 len=2/512 stat=0 ...
Incidentally, the two "stuck" URB's do eventually get unlinked and returned when I disconnect the USB device:
Sep 15 13:44:19 rosman kernel: [ 6851.417909] usb 1-2: USB disconnect, device number 5 Sep 15 13:44:19 rosman kernel: [ 6851.417983] /home/chuck/ohci-dbg/ohci-dbg.c: UNLINK ffff88003a3eacc0 dev=5 ep=2out-bulk flags=10000 len=0/9 stat=-108 Sep 15 13:44:19 rosman kernel: [ 6851.418024] /home/chuck/ohci-dbg/ohci-dbg.c: UNLINK ffff88003a3eab40 dev=5 ep=2out-bulk flags=10000 len=0/9 stat=-108 Sep 15 13:44:19 rosman kernel: [ 6851.418035] /home/chuck/ohci-dbg/ohci-dbg.c: UNLINK ffff88003a3eab40 dev=5 ep=2out-bulk flags=10000 len=0/9 stat=-2 Sep 15 13:44:19 rosman kernel: [ 6851.435975] /home/chuck/ohci-dbg/ohci-dbg.c: RET ffff88003a3eacc0 dev=5 ep=2out-bulk flags=10000 len=0/9 stat=0 Sep 15 13:44:19 rosman kernel: [ 6851.436086] /home/chuck/ohci-dbg/ohci-dbg.c: RET ffff88003a3eab40 dev=5 ep=2out-bulk flags=10000 len=0/9 stat=0 Sep 15 13:44:19 rosman kernel: [ 6851.436754] ftdi_sio ttyUSB0: FTDI USB Serial Device converter now disconnected from ttyUSB0 Sep 15 13:44:19 rosman kernel: [ 6851.436774] ftdi_sio 1-2:1.0: device disconnected
So it appears that the guest is placing the URB transaction with data 31342e333036383930 into the OHCI queue, but the Virtualbox "emulated OHCI host controller" never sees it and does not pass it on to the the real UHCI host controller on the host.
Conjectured cause
As the guest is running 4 cores, it is likely that the enqueing of the transaction descriptor (TD) into the OHCI linked-list data structure occurs on one CPU, while the emulated OHCI controller code is executing on a different CPU. It is important that the "ED Tail Pointer" gets updated *after* the TD is completely filled in, and the kernel code in ohci-q.c has a write memory barrier (wmb) call to ensure the proper execution ordering:
/* append to queue */ list_add_tail (&td->td_list, &td->ed->td_list); /* hash it for later reverse mapping */ hash = TD_HASH_FUNC (td->td_dma); td->td_hash = ohci->td_hash [hash]; ohci->td_hash [hash] = td; /* HC might read the TD (or cachelines) right away ... */ wmb (); td->ed->hwTailP = td->hwNextTD;
According to https://www.kernel.org/doc/Documentation/memory-barriers.txt, there should be a matching data dependency barrier in the queue-reading code, VirtualBox's DevOHCI.cpp. However, I did not see it there.
The conjecture is that, occasionally, the CPU executing the emulated host controller DevOHCI sees the ED tail pointer updated before it sees the corresponding TD. The corrupt TD data prevents the transaction from completing normally.
The conjecture seems plausible, but I do not have explicit evidence that it is true.
Attachments (6)
Change History (17)
by , 11 years ago
by , 11 years ago
Attachment: | guest_outonly.tcpdump.zip added |
---|
by , 11 years ago
Attachment: | host_outonly.tcpdump.zip added |
---|
by , 11 years ago
Attachment: | guest_outonly_klog.txt.zip added |
---|
comment:2 by , 11 years ago
Continued investigation - built Virtualbox from source so I could get detailed logfiles and test code changes going forward. (Followed instructions at https://www.virtualbox.org/wiki/Linux%20build%20instructions using http://download.virtualbox.org/virtualbox/4.2.18/VirtualBox-4.2.18.tar.bz2 and BUILD_TYPE=debug). With no source changes, results were very similar to comment 1 above.
1-CPU guest
chuck@rosman:~/ohci-dbg$ ./serial_blabber.py 13:18:33.608861 13:28:26.251678 Write timeout chuck@rosman:~/ohci-dbg$
4-CPU guest
chuck@rosman:~/ohci-dbg$ ./serial_blabber.py 14:02:08.568468 14:02:14.579741 Write timeout chuck@rosman:~/ohci-dbg$ ./serial_blabber.py 14:02:35.586765 14:02:38.284246 Write timeout chuck@rosman:~/ohci-dbg$ ./serial_blabber.py 14:03:03.693417 14:03:05.250200 Write timeout
A side-effect of running the compiled-from-source VirtualBox is that disconnecting the USB device from the guest machine (using the bottom toolbar USB icon) causes the guest to abort with this message in the log:
00:23:01.717717 00:23:01.717720 !!Assertion Failed!! 00:23:01.717721 Expression: PDMCritSectIsOwner(pOhci->pDevInsR3->pCritSectRoR3) 00:23:01.717723 Location : /home/chuck/Downloads/VirtualBox-4.2.18/src/VBox/Devices/USB/DevOHCI.cpp(2508) bool ohciRhXferError(PVUSBIROOTHUBPORT, PVUSBURB)
Thus the "disconnect/reconnect" hang-recovery method described at the beginning of the original ticket won't work. Hang recovery is possible by physically disconnecting the device, re-inserting, and re-connecting to the guest. This gives a typical guest log:
00:01:29.575081 VUSB: detached '00007fd62cbfd2f0[proxy 0403:6001]' from port 2 00:01:49.385215 VUSB: attached '00007fd62cbfd7f0[proxy 0403:6001]' to port 2
(Possibly related to http://lists.freebsd.org/pipermail/freebsd-emulation/2012-November/010177.html)
comment:3 by , 11 years ago
Possible red flag in this comment in defn of ohciStartOfFrame(POHCI pOhci) @ devOHCI.cpp line 3676:
/* "After writing to HCCA, HC will set SF in HcInterruptStatus" - guest isn't executing, so ignore the order! */
This suggests the author might not have been anticipating a multiprocessor environment here (and possibly elsewhere in the code without explicit comment). I have not yet traced thru the implications of this.
comment:4 by , 11 years ago
Guest type: | other → Linux |
---|---|
Host type: | other → Linux |
comment:5 by , 11 years ago
If you want to continue over assertions (they will still be logged), make sure that the VM process runs with environment variable VBOX_ASSERT=n - this will suppress the breakpoint which aborts the process. Of course you could also run the VM process in a debugger and investigate the state when the breakpoint happens, and continue.
comment:6 by , 11 years ago
Regarding "red flag" in comment 3: restoring the ideal sequence -- i.e. to first call ohciUpdateHCCA(pOhci), then ohciSetInterrupt(pOhci, OHCI_INTR_START_OF_FRAME) -- did not seem to change the serial hang behavior.
comment:7 by , 11 years ago
A possible clue. Running !VirtualBox4.2.18 with BUILD_TYPE=debug and VBOX_LOG="dev_usb.e.l".
On the last successful transfer before hang, it looks like devOHCI processed an HcCommandStatus_w() callback, setting the Bulk List Full (BLF) flag, while it was inside the ohciStartOfFrame() routine. I believe this call came in normal fashion from the guest, executing td_submit_urb() in the ohci-q.cpp kernel code.
Inside ohciStartOfFrame(), the BLF flag is tested in order to decide to call ohciServiceBulkList(), then tested again to decide whether to unwind a disabled bulk list. There seems to be a narrow window of vulnerability if the guest sets BLF between these two tests, but no indication in the logs that this ocurred.
It is worth noting that the HcCommandStatus_w() log entry is the only indication we have in the host logs that the guest has just finished queuing a new bulk TD. We may presume that the other parts of the TD-queuing process occurred at about the same time (i.e. during host ohciStartOfFrame execution), and it might be one of these actions (e.g. updating pointers) that causes trouble, not the BLF setting itself. The SMP "memory-barriers" issue of the original ticket may be pertinent.
(The sequencing of log entries below may not be perfect; as far as I can tell it appears that outputs from Log() and Log2() statements may be interleaved in a different sequence from their execution. Or perhaps I am rather confused!)
--- last 9-byte transfer in host debug log (outgoing transfer only) ---
ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_WRITE_DONE_HEAD ohciStartOfFrame: HcCommandStatus=0x00000000: HCR=0 CLF=0 *BLF=0 OCR=0 SOC=0 HcCommandStatus_w(0x00000004) => HCR=0 CLF=0 BLF=1 OCR=0 SOC=0 ohci: cbp=0x11815600 be=0x11815608 cbTotal=9 URB 00007f9568040500 Bulk/m>3591: ohciServiceTdMultiple: submitting cbData=0x9 EdAddr=0x36d6d0a0 cTds=1 TdAddr0=0x36ad1180 ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_WRITE_DONE_HEAD ohci_in_flight_remove: reaping TD=0x36ad1180 1 frames (0x000038c3-0x000038c4) ohci: cbp=0x11815600 be=0x11815608 cbTotal=9 URB 00007f9568040500 Bulk/m>3591: ohciRhXferCompleteGeneralURB: ED=0x36d6d0a0 TD=0x36ad1180 Age=1 cbTotal=0x0 NewCbp=0x00000009 dqic=0 ohci: Writeback Done (0x36ad1301) on frame 0x38c5 (age 0x1)
--- next-to-last transfer ---
ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_WRITE_DONE_HEAD ohciStartOfFrame: HcCommandStatus=0x00000000: HCR=0 CLF=0 *BLF=0 OCR=0 SOC=0 ohci: cbp=0x11815600 be=0x11815608 cbTotal=9 URB 00007f9568040500 Bulk/m>3585: ohciServiceTdMultiple: submitting cbData=0x9 EdAddr=0x36d6d0a0 cTds=1 TdAddr0=0x36ad11e0 ohci_in_flight_remove: reaping TD=0x36ad11e0 0 frames (0x000038b8-0x000038b8) ohci: cbp=0x11815600 be=0x11815608 cbTotal=9 URB 00007f9568040500 Bulk/m>3585: ohciRhXferCompleteGeneralURB: ED=0x36d6d0a0 TD=0x36ad11e0 Age=0 cbTotal=0x0 NewCbp=0x00000009 dqic=0 ohci: Writeback Done (0x36ad11e1) on frame 0x38b9 (age 0x1)
comment:8 by , 11 years ago
A possibly-relevant characteristic of the usb serial driver is that there is constant polling traffic on endpoint 1; the driver is asking the FTDI device whether there are any incoming characters ready. This traffic is apparent in the kernel logs of the original ticket, but I have edited out the endpoint 1 (input) traffic from my recent comments and left only the endpoint 2 (output) traffic.
I suppose it is possible that this steady background of endpoint 1 traffic, at about 1000 transactions/sec, might also be stressing the OHCI device emulator in some way.
comment:9 by , 11 years ago
Continuing with more verbose logging, VBOX_LOG="dev_usb.e.l.l2.l3.f". This time I did not edit out the other traffic. When there is no outgoing traffic, the serial driver's read polling continually produces sequences like this:
ohci: Writeback Done (0x36831241) on frame 0x1c96 (age 0x1) ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_WRITE_DONE_HEAD ohciStartOfFrame: HcCommandStatus=0x00000000: HCR=0 CLF=0 *BLF=0 OCR=0 SOC=0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831000,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831240 BE=0x1a5421ff UNK=0x0 ohciReadTd(,0x36831000,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831240 BE=0x1a5421ff UNK=0x0 ohci: cbp=0x1a542000 be=0x1a5421ff cbTotal=512 URB 00007fdfa4007690 Bulk/m<2407: ohciServiceTdMultiple: submitting cbData=0x200 EdAddr=0x368260f0 cTds=1 TdAddr0=0x36831000 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831000,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831240 BE=0x1a5421ff UNK=0x0 URB 00007fdfa4007690 Bulk/m<2407: ohciRhXferCompletion: EdAddr=0x368260f0 cTds=1 TdAddr0=0x36831000 ohci_in_flight_remove: reaping TD=0x36831000 1 frames (0x00001c97-0x00001c98) ohciReadTd(,0x36831000,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831240 BE=0x1a5421ff UNK=0x0 ohci: cbp=0x1a542000 be=0x1a5421ff cbTotal=512 URB 00007fdfa4007690 Bulk/m<2407: ohciRhXferCompleteGeneralURB: ED=0x368260f0 TD=0x36831000 Age=1 cbTotal=0x0 NewCbp=0x00000002 dqic=441720834 ohciWriteTd(,0x36831000,): R=1 DP=2 DI=0x0 T=0 EC=0 *CC=0x0 *CBP=0x1a542002 *NextTD=0x00000000 BE=0x1a5421ff (retired) ohciWriteEd(,0x368260f0,): FA=0x3 EN=0x1 D=0x2 S=0 K=0 F=0 MPS=0x40 TailP=0x36831240 *HeadP=0x36831240 H=0 C=0 NextED=0x00000000 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohci: Writeback Done (0x36831001) on frame 0x1c99 (age 0x1) ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_WRITE_DONE_HEAD
In the logs below, this type of activity is interleaved with the 9-byte output transfers.
Again, the final transfer includes a "HcCommandStatus_w(0x00000004)" entry.
--- last completed 9-byte transfer ---
ohci: Writeback Done (0x36831001) on frame 0x1ca6 (age 0x1) ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_WRITE_DONE_HEAD ohciStartOfFrame: HcCommandStatus=0x00000000: HCR=0 CLF=0 *BLF=0 OCR=0 SOC=0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831240,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831000 BE=0x1a5421ff UNK=0x0 ohciReadTd(,0x36831240,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831000 BE=0x1a5421ff UNK=0x0 ohci: cbp=0x1a542000 be=0x1a5421ff cbTotal=512 URB 00007fdfa4007690 Bulk/m<2415: ohciServiceTdMultiple: submitting cbData=0x200 EdAddr=0x368260f0 cTds=1 TdAddr0=0x36831240 HcCommandStatus_w(0x00000004) => HCR=0 CLF=0 BLF=1 OCR=0 SOC=0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831240,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831000 BE=0x1a5421ff UNK=0x0 ohciReadTd(,0x36831300,): R=1 DP=1 DI=0 T=0 EC=0 CC=0xf CBP=0x18130c00 NextTD=0x368311e0 BE=0x18130c08 UNK=0x0 ohciReadTd(,0x36831300,): R=1 DP=1 DI=0 T=0 EC=0 CC=0xf CBP=0x18130c00 NextTD=0x368311e0 BE=0x18130c08 UNK=0x0 ohci: cbp=0x18130c00 be=0x18130c08 cbTotal=9 URB 00007fdfa400c980 Bulk/m>2416: ohciServiceTdMultiple: submitting cbData=0x9 EdAddr=0x36826140 cTds=1 TdAddr0=0x36831300 URB 00007fdfa4007690 Bulk/m<2415: ohciRhXferCompletion: EdAddr=0x368260f0 cTds=1 TdAddr0=0x36831240 ohci_in_flight_remove: reaping TD=0x36831240 1 frames (0x00001ca7-0x00001ca8) ohciReadTd(,0x36831240,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831000 BE=0x1a5421ff UNK=0x0 ohci: cbp=0x1a542000 be=0x1a5421ff cbTotal=512 URB 00007fdfa4007690 Bulk/m<2415: ohciRhXferCompleteGeneralURB: ED=0x368260f0 TD=0x36831240 Age=1 cbTotal=0x0 NewCbp=0x00000002 dqic=441720834 ohciWriteTd(,0x36831240,): R=1 DP=2 DI=0x0 T=0 EC=0 *CC=0x0 *CBP=0x1a542002 *NextTD=0x00000000 BE=0x1a5421ff (retired) ohciWriteEd(,0x368260f0,): FA=0x3 EN=0x1 D=0x2 S=0 K=0 F=0 MPS=0x40 TailP=0x36831000 *HeadP=0x36831000 H=0 C=0 NextED=0x36826140 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831300,): R=1 DP=1 DI=0 T=0 EC=0 CC=0xf CBP=0x18130c00 NextTD=0x368311e0 BE=0x18130c08 UNK=0x0 ohci: Writeback Done (0x36831241) on frame 0x1ca9 (age 0x1) ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_WRITE_DONE_HEAD URB 00007fdfa400c980 Bulk/m>2416: ohciRhXferCompletion: EdAddr=0x36826140 cTds=1 TdAddr0=0x36831300 ohci_in_flight_remove: reaping TD=0x36831300 1 frames (0x00001ca8-0x00001ca9) ohciReadTd(,0x36831300,): R=1 DP=1 DI=0 T=0 EC=0 CC=0xf CBP=0x18130c00 NextTD=0x368311e0 BE=0x18130c08 UNK=0x0 ohci: cbp=0x18130c00 be=0x18130c08 cbTotal=9 URB 00007fdfa400c980 Bulk/m>2416: ohciRhXferCompleteGeneralURB: ED=0x36826140 TD=0x36831300 Age=1 cbTotal=0x0 NewCbp=0x00000009 dqic=0 ohciWriteTd(,0x36831300,): R=1 DP=1 DI=0x0 T=0 EC=0 *CC=0x0 *CBP=0x00000000 *NextTD=0x00000000 BE=0x18130c08 (retired) ohciWriteEd(,0x36826140,): FA=0x3 EN=0x2 D=0x1 S=0 K=0 F=0 MPS=0x40 TailP=0x368311e0 *HeadP=0x368311e0 H=0 C=0 NextED=0x00000000 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831000,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831240 BE=0x1a5421ff UNK=0x0 ohci: Writeback Done (0x36831301) on frame 0x1caa (age 0x1) ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_WRITE_DONE_HEAD
--- typical normal transfer, earlier in log ---
ohci: Writeback Done (0x36831241) on frame 0x1c90 (age 0x1) ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_WRITE_DONE_HEAD ohciStartOfFrame: HcCommandStatus=0x00000000: HCR=0 CLF=0 *BLF=0 OCR=0 SOC=0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831000,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831240 BE=0x1a5421ff UNK=0x0 ohciReadTd(,0x36831300,): R=1 DP=1 DI=0 T=0 EC=0 CC=0xf CBP=0x18130c00 NextTD=0x368311e0 BE=0x18130c08 UNK=0x0 ohciReadTd(,0x36831000,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831240 BE=0x1a5421ff UNK=0x0 ohci: cbp=0x1a542000 be=0x1a5421ff cbTotal=512 URB 00007fdfa400c980 Bulk/m<2404: ohciServiceTdMultiple: submitting cbData=0x200 EdAddr=0x368260f0 cTds=1 TdAddr0=0x36831000 ohciReadTd(,0x36831300,): R=1 DP=1 DI=0 T=0 EC=0 CC=0xf CBP=0x18130c00 NextTD=0x368311e0 BE=0x18130c08 UNK=0x0 ohci: cbp=0x18130c00 be=0x18130c08 cbTotal=9 URB 00007fdfa4007690 Bulk/m>2405: ohciServiceTdMultiple: submitting cbData=0x9 EdAddr=0x36826140 cTds=1 TdAddr0=0x36831300 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831000,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831240 BE=0x1a5421ff UNK=0x0 ohciReadTd(,0x36831300,): R=1 DP=1 DI=0 T=0 EC=0 CC=0xf CBP=0x18130c00 NextTD=0x368311e0 BE=0x18130c08 UNK=0x0 URB 00007fdfa400c980 Bulk/m<2404: ohciRhXferCompletion: EdAddr=0x368260f0 cTds=1 TdAddr0=0x36831000 ohci_in_flight_remove: reaping TD=0x36831000 1 frames (0x00001c91-0x00001c92) ohciReadTd(,0x36831000,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a542000 NextTD=0x36831240 BE=0x1a5421ff UNK=0x0 ohci: cbp=0x1a542000 be=0x1a5421ff cbTotal=512 URB 00007fdfa400c980 Bulk/m<2404: ohciRhXferCompleteGeneralURB: ED=0x368260f0 TD=0x36831000 Age=1 cbTotal=0x0 NewCbp=0x00000002 dqic=441720834 ohciWriteTd(,0x36831000,): R=1 DP=2 DI=0x0 T=0 EC=0 *CC=0x0 *CBP=0x1a542002 *NextTD=0x00000000 BE=0x1a5421ff (retired) ohciWriteEd(,0x368260f0,): FA=0x3 EN=0x1 D=0x2 S=0 K=0 F=0 MPS=0x40 TailP=0x36831240 *HeadP=0x36831240 H=0 C=0 NextED=0x36826140 URB 00007fdfa4007690 Bulk/m>2405: ohciRhXferCompletion: EdAddr=0x36826140 cTds=1 TdAddr0=0x36831300 ohci_in_flight_remove: reaping TD=0x36831300 1 frames (0x00001c91-0x00001c92) ohciReadTd(,0x36831300,): R=1 DP=1 DI=0 T=0 EC=0 CC=0xf CBP=0x18130c00 NextTD=0x368311e0 BE=0x18130c08 UNK=0x0 ohci: cbp=0x18130c00 be=0x18130c08 cbTotal=9 URB 00007fdfa4007690 Bulk/m>2405: ohciRhXferCompleteGeneralURB: ED=0x36826140 TD=0x36831300 Age=1 cbTotal=0x0 NewCbp=0x00000009 dqic=0 ohciWriteTd(,0x36831300,): R=1 DP=1 DI=0x0 T=0 EC=0 *CC=0x0 *CBP=0x00000000 *NextTD=0x36831000 BE=0x18130c08 (retired) ohciWriteEd(,0x36826140,): FA=0x3 EN=0x2 D=0x1 S=0 K=0 F=0 MPS=0x40 TailP=0x368311e0 *HeadP=0x368311e0 H=0 C=0 NextED=0x00000000 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohciReadTd(,0x36831120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x36d25000 NextTD=0x368310c0 BE=0x36d25005 UNK=0x0 ohci: Writeback Done (0x36831301) on frame 0x1c93 (age 0x1) ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_WRITE_DONE_HEAD
Attached complete logfile https://www.virtualbox.org/attachment/ticket/12117/2013-09-30-00-16-05.050-VirtualBox-23403.log.gz of a later run. Same behavior.
by , 11 years ago
Attachment: | 2013-09-30-00-16-05.050-VirtualBox-23403.log.gz added |
---|
debug logfile VBOX_LOG="dev_usb.e.l.l2.l3.f", # define VBOX_WITH_R0_LOGGING
comment:10 by , 11 years ago
I patched the guest's ftdi-sio.ko driver, forcing it to "throttle" and stop polling the input endpoint. That reduced clutter in the logfiles but the "hang" still occurs.
It now seems that the problem arises when two ohci interrupt requests "pile up". Specifically, the logs seem to show a failure with a WDH (Writeback Done Head) interrupt being serviced, then an SOF (Start of Frame) interrupt ocurring before the first irq service completes. The SOF interrupt never gets handled, which leaves the Endpoint Descriptor in an "UNLINK" state and stalls the port.
I don't know exactly how overlapping interrupts are *supposed* to be handled; I assume the later one should be queued up and processed after the first interrupt returns, but I don't know how the details work. With multiple cores is it possible that the two interrupts could be executing simultaneously on two different cores?
I wonder if this is a defect in the guest ohci driver, that is stressed by the timing conditions generated by the virtual DevOHCI emulation. I should probably try the latest Linux kernel; the guest is currently
uname -a Linux rosman 3.2.0-53-generic #81 SMP Fri Sep 13 00:28:49 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux
Here are some host logfile extracts, with ***
annotations describing what is going on in the guest code (ohci-hcd.c and ohci-q.c).
Comparing "normal" WritebackDone interrupt service with "failure" (last) one --- normal --- <Note: "1130462695" here is an RTTimeMilliTS() timestamp > 1130462695 ohci: Writeback Done (0x1044f061) on frame 0x484d (age 0x1) 1130462695 ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_WRITE_DONE_HEAD ohciStartOfFrame: HcCommandStatus=0x00000000: HCR=0 CLF=0 *BLF=0 OCR=0 SOC=0 *** guest: ohci_irq() { HcInterruptStatus_r() -> 0x00000006 - SO=0 WDH=1 SF=1 RD=0 UE=0 FNO=0 RHSC=0 OC=0 HcInterruptEnable_r() -> 0x8000005a - SO=0 WDH=1 SF=0 RD=1 UE=1 FNO=0 RHSC=1 OC=0 MIE=1 *** WDH (writeback done head), mask bit 0x00000002, is the active interrupt *** dl_done_list() { *** takeback_td() { *** if no tds left on list, call start_ed_unlink() { *** ed_deschedule() : reset BLE to 0 HcControl_w(0x0000028f) => CBSR=3 PLE=1 IE=1 CLE=0 *BLE=0 HCFS=0x2 IR=0 RWC=1 RWE=0 *** enable SOF interrupt HcInterruptStatus_w(0x00000004) => SO=0 WDH=1 *SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 1130462695 ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - HcInterruptStatus_w HcInterruptEnable_w(0x00000004) => SO=0 WDH=1 *SF=1 RD=1 UE=1 FNO=0 RHSC=1 OC=0 MIE=1 1130462695 ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - HcInterruptEnable_w HcControl_r -> 0x0000028f - CBSR=3 PLE=1 IE=1 CLE=0 BLE=0 HCFS=0x2 IR=0 RWC=1 RWE=0 *** } } } *** finish out irq service HcInterruptStatus_w(0x00000002) => SO=0 *WDH=0 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 HcInterruptEnable_w(0x80000000) => SO=0 WDH=1 SF=1 RD=1 UE=1 FNO=0 RHSC=1 OC=0 MIE=1 HcControl_r -> 0x0000028f - CBSR=3 PLE=1 IE=1 CLE=0 BLE=0 HCFS=0x2 IR=0 RWC=1 RWE=0 *** } // ohci_irq ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 1130462696 ohci: Fired off interrupt 0x00000004 - SO=0 WDH=0 SF=1 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_START_OF_FRAME *** guest: ohci_irq() { HcInterruptStatus_r() -> 0x00000004 - SO=0 WDH=0 SF=1 RD=0 UE=0 FNO=0 RHSC=0 OC=0 HcInterruptEnable_r() -> 0x8000005e - SO=0 WDH=1 SF=1 RD=1 UE=1 FNO=0 RHSC=1 OC=0 MIE=1 *** SOF (start of frame), mask bit 0x00000004, is the active interrupt *** finish_unlinks() *** finish out irq service HcInterruptDisable_w(0x00000004) => SO=0 WDH=1 *SF=0 RD=1 UE=1 FNO=0 RHSC=1 OC=0 MIE=1 HcInterruptStatus_w(0x00000004) => SO=0 WDH=0 *SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 HcInterruptEnable_w(0x80000000) => SO=0 WDH=1 SF=0 RD=1 UE=1 FNO=0 RHSC=1 OC=0 MIE=1 HcControl_r -> 0x0000028f - CBSR=3 PLE=1 IE=1 CLE=0 BLE=0 HCFS=0x2 IR=0 RWC=1 RWE=0 *** } // ohci_irq ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 [...] ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 *** guest posts new URB *** ohci_urb_enqueue { *** ed_schedule() { HcBulkHeadED_w(0x2c1d8140) - old=0x2c1d8140 new=0x2c1d8140 HcBulkCurrentED_w(0x00000000) - old=0x00000000 new=0x00000000 HcControl_w(0x000002af) => CBSR=3 PLE=1 IE=1 CLE=0 *BLE=1 HCFS=0x2 IR=0 RWC=1 RWE=0 *** } *** td_submit_urb() { *** kickstart bulk list 1130462704 HcCommandStatus_w(0x00000004) => HCR=0 CLF=0 *BLF=1 OCR=0 SOC=0
--- failure --- 1130462707 ohci: Writeback Done (0x1044f241) on frame 0x4856 (age 0x1) 1130462707 ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_WRITE_DONE_HEAD ohciStartOfFrame: HcCommandStatus=0x00000000: HCR=0 CLF=0 *BLF=0 OCR=0 SOC=0 *** guest: ohci_irq() { HcInterruptStatus_r() -> 0x00000006 - SO=0 WDH=1 SF=1 RD=0 UE=0 FNO=0 RHSC=0 OC=0 HcInterruptEnable_r() -> 0x8000005a - SO=0 WDH=1 SF=0 RD=1 UE=1 FNO=0 RHSC=1 OC=0 MIE=1 *** WDH (writeback done head), mask bit 0x00000002, is the active interrupt *** dl_done_list() { *** takeback_td() { *** if no tds left on list, call start_ed_unlink() { *** ed_deschedule() : reset BLE to 0 HcControl_w(0x0000028f) => CBSR=3 PLE=1 IE=1 CLE=0 *BLE=0 HCFS=0x2 IR=0 RWC=1 RWE=0 *** enable SOF interrupt HcInterruptStatus_w(0x00000004) => SO=0 WDH=1 *SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 1130462708 ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - HcInterruptStatus_w HcInterruptEnable_w(0x00000004) => SO=0 WDH=1 *SF=1 RD=1 UE=1 FNO=0 RHSC=1 OC=0 MIE=1 1130462708 ohci: Fired off interrupt 0x00000002 - SO=0 WDH=1 SF=0 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - HcInterruptEnable_w ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 1130462708 ohci: Fired off interrupt 0x00000006 - SO=0 WDH=1 SF=1 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - OHCI_INTR_START_OF_FRAME *** SOF interrupt before WDH interrupt has completed HcControl_r -> 0x0000028f - CBSR=3 PLE=1 IE=1 CLE=0 BLE=0 HCFS=0x2 IR=0 RWC=1 RWE=0 *** } } } *** finish out WDH irq service HcInterruptStatus_w(0x00000002) => SO=0 *WDH=0 SF=1 RD=0 UE=0 FNO=0 RHSC=0 OC=0 1130462708 ohci: Fired off interrupt 0x00000004 - SO=0 WDH=0 SF=1 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - HcInterruptStatus_w HcInterruptEnable_w(0x80000000) => SO=0 WDH=1 SF=1 RD=1 UE=1 FNO=0 RHSC=1 OC=0 MIE=1 1130462708 ohci: Fired off interrupt 0x00000004 - SO=0 WDH=0 SF=1 RD=0 UE=0 FNO=0 RHSC=0 OC=0 - HcInterruptEnable_w HcControl_r -> 0x0000028f - CBSR=3 PLE=1 IE=1 CLE=0 BLE=0 HCFS=0x2 IR=0 RWC=1 RWE=0 *** } // ohci_irq *** SOF interrupt never handled, so never execute finish_unlinks() ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 [...] ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 *** guest posts new URB *** ohci_urb_enqueue { *** would schedule ED here if state==ED_IDLE, but we are stuck in ED_UNLINK *** therefore BLE never gets turned back on in HcControl register 1130462715 HcCommandStatus_w(0x00000004) => HCR=0 CLF=0 *BLF=1 OCR=0 SOC=0 ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 ohciReadTd(,0x1044f120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x1a31d000 NextTD=0x1044f0c0 BE=0x1a31d005 UNK=0x0 [... until timeout]
comment:11 by , 8 years ago
Resolution: | → obsolete |
---|---|
Status: | new → closed |
Please reopen if still relevant with a recent VirtualBox release.
More information relevant to the multi-processor synchronization conjecture. The effect is definitely worse with multi-cpu's in the guest, but transfers still hang with single-cpu guest configuration:
1-CPU guest VM: ~ 6 minutes to fail
4-CPU guest VM: ~15 seconds to fail
Can we expect the DevOHCI code to be executing in the host environment, and the ohci-hcd code to be executing in the guest environment? IIRC, the VM gets a dedicated CPU in a multiprocessor host environment, thus they will always be executing on different CPUs. Having 4 instead of 1 guest CPUs apparently increases the stress level.