[vbox-dev] vgdrvIoCtl_CancelAllWaitEvents inconsistent behavior ?

Hans de Goede hdegoede at redhat.com
Wed Jul 5 13:55:55 GMT 2017


Hi,

On 05-07-17 12:14, Michael Thayer wrote:
> Hello Hans,
> 
> 05.07.2017 11:18, Hans de Goede wrote:
> [Discussion of inconsistent wait event cancelling code in VBoxGuest.]
>> I've been looking at the userspace code myself and this bit:
>>
>> src/VBox/Additions/x11/VBoxClient/seamless.cpp:
>>
>> int SeamlessMain::selfTest()
>> {
>>      int rc = VERR_INTERNAL_ERROR;
>>      const char *pcszStage;
>>
>>      LogRelFlowFunc(("\n"));
>>      do {
>>          pcszStage = "Testing event loop cancellation";
>>          VbglR3InterruptEventWaits();
>>          if (RT_FAILURE(VbglR3WaitEvent(VMMDEV_EVENT_VALID_EVENT_MASK, 0,
>> NULL)))
>>              break;
>>          if (   VbglR3WaitEvent(VMMDEV_EVENT_VALID_EVENT_MASK, 0, NULL)
>>              != VERR_TIMEOUT)
>>              break;
>>          rc = VINF_SUCCESS;
>>      } while(0);
>>      if (RT_FAILURE(rc))
>>          LogRel(("VBoxClient (seamless): self test failed.  Stage:
>> \"%s\"\n",
>>                  pcszStage));
>>      return rc;
>> }
>>
>> Relies on the current behavior that a cancel while there are no waiters
>> results in the next wait succeeding (without reporting any events), so
>> I will mimick that behavior when moving this code over to linux
>> waitqueues.
> 
> Actually it seems to me that it relies on the fact that the 0ms time out
> is checked before the interruption.

No, it does a cancel while there are no waiters, then a VbglR3WaitEvent
which it expects to succeed, not timeout. Which matches the kernel code
which does an initial check for pSession->fPendingCancelWaitEvents
before checking the timeout.

The kernel code does:

1) check for pSession->fPendingCancelWaitEvents (and for pending events)
2) check for timeout 0, return VERR_TIMEOUT if timeout == 0
3) Alloc WaitList item + add to list
4) Recheck for pSession->fPendingCancelWaitEvents (and for pending events)
5) Wait
...

And the selftest relies on 1 and 2 basically.

> At some point we (I) used
> VbglR3InterruptEventWaits() in the seamless code, but I removed it when
> I cleaned up the code as it is not nice.  I realised while searching for
> uses of INTERRUPT_WAITEVENTS that I had forgotten to remove this pretty
> pointless test.  I plan to deprecate calling WAITEVENT in a session
> after INTERRUPT_WAITEVENTS has been called to make the matter moot and
> to change WAITEVENT to enforce this.
> 
> Which brings us back to the question of ABI stability.  If something
> like this were to pop up after you had released your code, I would
> expect to handle it by announcing that it is deprecated and changing the
> ABI later, say in the next major release.  Does that seem reasonable to
> you, or too short?

Since we're talking kernel ABI here now, you would need to align it
to the kernel cycle, not vbox major releases. I would expect a deprecation
like this to take at least a year, so put a note on the deprecation in:
linux/Documentation/ABI/obsolete/ioctl-vboxguest and add a deprecation date,
then once that date is past you can merge a patch upstream removing the
backward compat stuff / enforcing new behavior and that will then get
merged and released in the next kernel cycle, assuming there are no
objections.

Regards,

Hans

p.s.

I've finished converting the event WaitList to a standard Linux
wait_queue, the cleanup is not that big (yet) because I still need
to convert the HGCM WaitList also, after that all the WaitList
code can be removed which should be a nice cleanup.

If you want to check my work see:
https://github.com/jwrdegoede/vboxguest/commit/dce610f4b4af6b9ad14c466cca71d0501460b2cd

Note it is probably easier to use the diff to see where the changes are
and then just look at the new file, as I've basically rewritten a couple
of functions.



More information about the vbox-dev mailing list