[vbox-dev] vgdrvIoCtl_CancelAllWaitEvents inconsistent behavior ?

Wed Jul 5 09:18:06 GMT 2017

Hi,

On 04-07-17 16:57, Michael Thayer wrote:
> Hello Hans,
> 
> 04.07.2017 14:24, Hans de Goede wrote:
>> Hi,
>>
>> On 04-07-17 11:46, Michael Thayer wrote:
>>> Hello Hans,
>>>
>>> 04.07.2017 08:55, Hans de Goede wrote:
>>> [Discussion of inconsistent wait event cancelling code in VBoxGuest.]
>>>> So some remarks about this, I was thinking that *maybe* the old
>>>> behavior is on purpose, imagine the following:
>>>>
>>>> 1) cancel, no waiters, pSession->fPendingCancelWaitEvents get set
>>>> 2) the thread for which the cancel was intended stops, without ever
>>>>      calling into vgdrvIoCtl_WaitEvent again and so does not clear
>>>>      the fPendingCancelWaitEvents flag
>>>> 3) (much) later a new thread in the same session starts waiting
>>>>
>>>> old behavior: new thread sees an empty event, goes back to
>>>> waiting
>>>>
>>>> new behavior: new thread sees a VERR_INTERRUPTED -> and stops
>>>> ? (note I've not looked at the userspace code).
>>>
>>> I think you are reading too much intelligent design into the interface.
>>> I was the guilty person who created it, long ago, and I don't think I
>>> thought it through sufficiently well at the time.  That said, I don't
>>> think I wrote the current implementation code.  I looked at the
>>> user-space code I mentioned again, and actually realised that it does
>>> handle the old (and the new) code.  Testing seems to confirm this.
>>
>> Ok, so you're saying that the best thing todo for a rewrite is for
>> all waiters to always exit with VBOXGUEST_WAITEVENT_INTERRUPTED ? On
>> vgdrvIoCtl_CancelAllWaitEvents  ?
>>
>> Is the fPendingCancelWaitEvents flag really still necessary if there
>> are no waiters ? And won't this cause issues for new waiters? Or will
>> new waiters always have a new session ?
> [...]
> 
> We talked about this a bit over lunch and decided that CANCEL_WAITEVENT
> is too clumsy an interface to be used for anything other than session
> termination.  The caller should set a flag first, and no further calls
> to WAITEVENT should be be made after that.  That makes the problems you
> brought up less relevant, and also means that you could in theory make
> the changes you said and still have things work.

I've been looking at the userspace code myself and this bit:

src/VBox/Additions/x11/VBoxClient/seamless.cpp:

int SeamlessMain::selfTest()
{
     int rc = VERR_INTERNAL_ERROR;
     const char *pcszStage;

     LogRelFlowFunc(("\n"));
     do {
         pcszStage = "Testing event loop cancellation";
         VbglR3InterruptEventWaits();
         if (RT_FAILURE(VbglR3WaitEvent(VMMDEV_EVENT_VALID_EVENT_MASK, 0, NULL)))
             break;
         if (   VbglR3WaitEvent(VMMDEV_EVENT_VALID_EVENT_MASK, 0, NULL)
             != VERR_TIMEOUT)
             break;
         rc = VINF_SUCCESS;
     } while(0);
     if (RT_FAILURE(rc))
         LogRel(("VBoxClient (seamless): self test failed.  Stage: \"%s\"\n",
                 pcszStage));
     return rc;
}

Relies on the current behavior that a cancel while there are no waiters
results in the next wait succeeding (without reporting any events), so
I will mimick that behavior when moving this code over to linux
waitqueues.

> Of course, in practice there is always the risk of accidental
> dependencies on current behaviour in all parts of the kernel-user
> interface, but in the end it was your choice to try to duplicate an
> interface created by several people independently, often under time
> pressure, which has evolved more or less in parallel to the matching
> user-space and host parts.  I would still recommend using our code as it
> is in Fedora without submitting it upstream, and investing any
> additional time you have for the task in helping improve our code.

I believe that there is value for Fedora, VirtualBox and our users in
having this code upstream. The current code really is not suitable for
upstream. I already have an idea of what the cleaned up code will look
like and I think when you see it you will agree that it is an improvement
over the current vboxguest kernel module code. The downside is that this
improvement will come at the cost of it being Linux specific.

 > 04.07.2017 16:57, Michael Thayer wrote:
 > Maybe I should have written that part in private mail.  If so, very sorry!

No problem / no worries, I'm fine with having this discussion in the open.

Regards,

Hans