VirtualBox

Opened 14 months ago

Closed 4 weeks ago

#22339 closed defect (fixed)

pasted text on host doesn't include the last character, from wayland guest

Reported by: foofoofeefeebambam Owned by:
Component: clipboard Version: VirtualBox-7.1.6
Keywords: Cc:
Guest type: Linux Host type: Windows

Description (last modified by foofoofeefeebambam)

tl;dr: the attached patch ensures the null terminator of utf16 strings is part of the returned size, else pasted clipboard on host (even though copied correctly from a wayland guest) doesn't include the last character in the text.

Long read(no need): I've Arch Linux kde plasma wayland running as guest OS, and Windows 11 as host OS. VBoxClient --wayland running on guest OS (because VBoxClient --clipboard can't paste on host (wayland lacks support? as per https://gitlab.freedesktop.org/spice/linux/vd_agent/-/issues/26 ) )

I'd note that archlinux does apply some patches on virtualbox&co which I haven't really checked: https://gitlab.archlinux.org/archlinux/packaging/packages/virtualbox/-/tree/7.1.6-2?ref_type=tags

But I don't believe they affect this bug.

I've a patch for 2 places (didn't really look for more) which don't properly return the size of the buffer they made (it's off by one, in the less direction). I've tested it to work: the pasted text includes the last character, ie. I copy "sometext" in guest OS, then paste it on host OS as "sometext" instead of "sometex", that is, when pasting to host actually does paste anything at all(another bug report about this not pasting on host incoming later, EDIT: ok here: https://www.virtualbox.org//ticket/22340 ).

I've tested this with gentoo host too, but it was an older version of virtualbox, it works the same (in bugged and patched versions, regarding this issue) but it hangs more and is unreliable possibly due to being too old and not same version as the guest OS's virtualbox things.

To my limited understanding, what's happening here with this patch is that text copied on guest OS from Konsole terminal(as latin1) or from mousepad(as utf8), gets transformed (from native) into vbox format which is apparently utf16 plus zero terminator (which is 1 utf16 zero char, which is two zero bytes) (note that the zero terminator is added to the utf16 string, it didn't exist in the native(latin1 or utf8) string), but without the patch, the zero terminator wasn't counted even though it was put in there, so somehow something(ahem, in func. VbglR3ClipboardWriteDataEx, these 2 lines: Msg.Parms.pData.SetPtr(pvData, cbData/*+1*/); and rc = VbglR3HGCMCall(&Msg.Hdr, sizeof(Msg));) took the clipboard as being 1 character less thus pasting it without the last character(so pasting "something" would be pasted as "somethin", without the "g"). There may or may not be other places where this off-by-one error is present, I haven't really looked, please check?. Also, I look with zero understanding of what's going on, had to guess/debug what's going on, so maybe, MAYBE, I'm wrong about all this, even though it seems obvious now in retrospect what's going on, in this issue. Actual devs should double check me :) I'm not a dev, btw, but was happy to waste 1 day on this.

Note that for this copy/pasting to work at all, there's the requirement that on kde plasma wayland the systray Clipboard must be showing (definitely not Disabled) - this can be changed in Configure System Tray menu that appears if you RMB on the free space or up arrow of systray, then under Entries, Clipboard. (I haven't tried if 'Always hidden'/'Show when relevant' works, but 'Always showing' does work and Disabled does not.)

Attachments (2)

offby1in2places.patch (910 bytes ) - added by foofoofeefeebambam 14 months ago.
patch that includes the utf16 null terminator in size computation when returning
offby1in2places.2.patch (1.4 KB ) - added by changedemailandwasforcedtosetnewnick 14 months ago.
this time adding +1 in the caller

Download all attachments as: .zip

Change History (10)

by foofoofeefeebambam, 14 months ago

Attachment: offby1in2places.patch added

patch that includes the utf16 null terminator in size computation when returning

comment:1 by foofoofeefeebambam, 14 months ago

Description: modified (diff)

comment:2 by changedemailandwasforcedtosetnewnick, 14 months ago

Actually looking at the header file, it says:

/**
 * Converts an UTF-16 string with LF EOL to an UTF-16 string with CRLF EOL.
 *
 * Convenience function which returns the allocated + converted string on success.
 *
 * @returns VBox status code.
 * @param   pcwszSrc            UTF-16 string to convert.
 * @param   cwcSrc              Size of the string int RTUTF16 units.
 * @param   ppwszDst            Where to return the allocated converted string. Must be free'd by the caller.
 * @param   pcwDst              Where to return the size of the converted string in RTUTF16 units.
 *                              Does not include the terminator.
 */
int ShClConvUtf16LFToCRLFA(PCRTUTF16 pcwszSrc, size_t cwcSrc, PRTUTF16 *ppwszDst, size_t *pcwDst);

therefore the function was correct as it was, so then in the caller it should be where the +1 to the size should be added to include the terminator. The caller is ShClConvUtf8LFToUtf16CRLF however it too says:

/**
 * Converts an UTF-8 string with LF EOL into UTF-16 CRLF.
 *
 * @returns VBox status code.
 * @param  pcszSrc              UTF-8 string to convert.
 * @param  cbSrc                Size of UTF-8 string to convert (in bytes), not counting the terminating zero.
 * @param  ppwszDst             Where to return the allocated buffer on success.
 * @param  pcwDst               Where to return the size (in RTUTF16 units) of the allocated buffer on success.
 *                              Does not include terminator.
 */
int ShClConvUtf8LFToUtf16CRLF(const char *pcszSrc, size_t cbSrc, PRTUTF16 *ppwszDst, size_t *pcwDst);

so we go to the caller of that which is:

/**
 * A helper function that converts UTF-8 string into UTF-16.
 *
 * @returns IPRT status code.
 * @param   pvBufIn         Input buffer which contains UTF-8 data.
 * @param   cbBufIn         Size of input buffer in bytes.
 * @param   ppvBufOut       Newly allocated output buffer which will contain UTF-16 data (must be freed by caller).
 * @param   pcbBufOut       Size of output buffer.
 */
static DECLCALLBACK(int) vbConvertUtf8ToUtf16(void *pvBufIn, int cbBufIn, void **ppvBufOut, size_t *pcbBufOut)
{
    int rc;
    size_t cwDst;
    void  *pvDst = NULL;

    rc = RTStrValidateEncodingEx((char *)pvBufIn, cbBufIn, 0);
    if (RT_SUCCESS(rc))
    {
        rc = ShClConvUtf8LFToUtf16CRLF((const char *)pvBufIn, cbBufIn, (PRTUTF16 *)&pvDst, &cwDst);
        if (RT_SUCCESS(rc))
        {
            *ppvBufOut = pvDst;
            *pcbBufOut = cwDst * sizeof(RTUTF16);
        }
        else
            LogRel(("Data Converter: unable to convert input UTF8 string into VBox format, rc=%Rrc\n", rc));
    }
    else
        LogRel(("Data Converter: unable to validate input UTF8 string, rc=%Rrc\n", rc));

    return rc;
}

which doesn't say it doesn't include the terminator hmmm... so either in this one we include it, or, in its caller which is this:

RTDECL(int) VBoxMimeConvNativeToVBox(const char *pcszMimeType, void *pvBufIn, int cbBufIn, void **ppvBufOut, size_t *pcbBufOut)
{
    for (unsigned i = 0; i < RT_ELEMENTS(g_aConverterFormats); i++)
        if (RTStrNCmp(g_aConverterFormats[i].pcszMimeType, pcszMimeType, VBOX_WAYLAND_MIME_TYPE_NAME_MAX) == 0)
            return g_aConverterFormats[i].pfnConvertToVbox(pvBufIn, cbBufIn, ppvBufOut, pcbBufOut);

    return VERR_NOT_FOUND;
}

So ignore my patch, it's clearly not good, the +1 must be somewhere up the caller tree. It occurred to me only now to look in the .h for a comment header for the functions.

comment:3 by changedemailandwasforcedtosetnewnick, 14 months ago

in file VirtualBox-7.1.6/include/VBox/GuestHost/clipboard-helper.h, there's a comment header for the function:

/**
 * Converts a Latin-1 string with LF EOL into UTF-16 CRLF.
 *
 * @returns VBox status code.
 * @param  pcszSrc              UTF-8 string to convert.
 * @param  cbSrc                Size of string (in bytes), not counting the terminating zero.
 * @param  ppwszDst             Where to return the allocated buffer on success.
 * @param  pcwDst               Where to return the size (in RTUTF16 units) of the allocated buffer on success.
 *                              Does not include terminator.
 */
int ShClConvLatin1LFToUtf16CRLF(const char *pcszSrc, size_t cbSrc, PRTUTF16 *ppwszDst, size_t *pcwDst);

however, there's one in the clipboard-common.cpp too:

/**
 * Converts a Latin-1 string with LF line endings into an UTF-16 string with CRLF endings.
 *
 * @returns VBox status code.
 * @param   pcszSrc             Latin-1 string to convert.
 * @param   cbSrc               Size (in bytes) of Latin-1 string to convert.
 * @param   ppwszDst            Where to return the converted UTF-16 string on success.
 * @param   pcwDst              Where to return the length (in UTF-16 characters) on success.
 *
 * @note    Only converts the source until the string terminator is found (or length limit is hit).
 */
int ShClConvLatin1LFToUtf16CRLF(const char *pcszSrc, size_t cbSrc,
                                PRTUTF16 *ppwszDst, size_t *pcwDst)
{

Now, I bet this is why I didn't even bother to look at the .h for these functions(in my patch) in the first place, I didn't think there's be anything new there, yet here's 2 docs for the same function but only the doc in the .h file mentions that the returned size doesn't include the terminator.

So the patch is bad for this second function too.

No problem, I'll find the relevant caller(vbConvertLatin1ToUtf16) to +1 ... and sorry for the mistakes, obviously! Will try to update the patch next.

Last edited 14 months ago by changedemailandwasforcedtosetnewnick (previous) (diff)

comment:4 by changedemailandwasforcedtosetnewnick, 14 months ago

I've just spotted the x11 variant in VirtualBox-7.1.6/src/VBox/GuestHost/SharedClipboard/clipboard-x11.cpp that already does this +1 in the same (caller) space as:

                /* If we are given broken UTF-8, we treat it as Latin1. */ /** @todo BUGBUG Is this acceptable? */
                if (RT_SUCCESS(RTStrValidateEncodingEx((char *)pvSrc, cbSrc, 0)))
                    rc = ShClConvUtf8LFToUtf16CRLF((const char *)pvSrc, cbSrc,
                                                   (PRTUTF16 *)&pvDst, &cwDst);
                else
                    rc = ShClConvLatin1LFToUtf16CRLF((char *)pvSrc, cbSrc,
                                                     (PRTUTF16 *)&pvDst, &cwDst);
                if (RT_SUCCESS(rc))
                {
                    cwDst += 1                        /* Include terminator */;
                    cbDst  = cwDst * sizeof(RTUTF16); /* Convert RTUTF16 units to bytes. */

                    LogFlowFunc(("UTF-16 text (%zu bytes):\n%ls\n", cbDst, pvDst));
                }
                break;

so I must be on the right track then.

by changedemailandwasforcedtosetnewnick, 14 months ago

Attachment: offby1in2places.2.patch added

this time adding +1 in the caller

comment:5 by changedemailandwasforcedtosetnewnick, 14 months ago

I've tested this second patch to work from Konsole and from mousepad, ie. a Copy on guest OS(arch linux), and a paste in host OS(win11). (But must do it in under 1 second to work(to will Paste) due to bug: https://www.virtualbox.org//ticket/22340 )

Note that I see https://www.virtualbox.org/browser/vbox/trunk?rev=108453 is the latest version which was 13 days ago, so I don't know if vbox devs are on vacation or something, but enjoy if you are. Or they're busy, or didn't sync the source to here. Maybe syncs happen only every 2 weeks or on new releases, I don't know. I guess what I mean here is that perhaps they've already fixed this offline but didn't get the chance to sync the repo (since I see 'vboxsync' is the author for every commit there) so we can't see if it was or wasn't fixed already, and thus this fix comes a little late.

Last edited 14 months ago by changedemailandwasforcedtosetnewnick (previous) (diff)

comment:6 by changedemailandwasforcedtosetnewnick, 13 months ago

I just looked again at patch 2 and it seemed like I did it in the wrong (too deep still) caller, however I noticed then that other functions already are doing this (in the same file too), like:

175     /**
176      * A helper function that converts HTML data into internal VBox representation (UTF-8).
177      *
178      * @returns IPRT status code.
179      * @param   pvBufIn         Input buffer which contains HTML data.
180      * @param   cbBufIn         Size of input buffer in bytes.
181      * @param   ppvBufOut       Newly allocated output buffer which will contain UTF-8 data (must be freed by caller).
182      * @param   pcbBufOut       Size of output buffer.
183      */
184     static DECLCALLBACK(int) vbConvertHtmlToVBox(void *pvBufIn, int cbBufIn, void **ppvBufOut, size_t *pcbBufOut)
185     {
186         int rc = VERR_PARSE_ERROR;
187         void *pvDst = NULL;
188         size_t cbDst = 0;
189     
190         /* From X11 counterpart: */
191     
192         /*
193          * The common VBox HTML encoding will be - UTF-8
194          * because it more general for HTML formats then UTF-16
195          * X11 clipboard returns UTF-16, so before sending it we should
196          * convert it to UTF-8.
197          *
198          * Some applications sends data in UTF-16, some in UTF-8,
199          * without indication it in MIME.
200          *
201          * In case of UTF-16, at least [Open|Libre] Office adds an byte order mark (0xfeff)
202          * at the start of the clipboard data.
203          */
204     
205         if (   cbBufIn >= (int)sizeof(RTUTF16)
206             && *(PRTUTF16)pvBufIn == VBOX_SHCL_UTF16LEMARKER)
207         {
208             /* Input buffer is expected to be UTF-16 encoded. */
209             LogRel(("Data Converter: unable to convert UTF-16 encoded HTML data into VBox format\n"));
210     
211             rc = RTUtf16ValidateEncodingEx((PCRTUTF16)pvBufIn, cbBufIn / sizeof(RTUTF16),
212                                            RTSTR_VALIDATE_ENCODING_ZERO_TERMINATED | RTSTR_VALIDATE_ENCODING_EXACT_LENGTH);
213             if (RT_SUCCESS(rc))
214             {
215                 rc = ShClConvUtf16ToUtf8HTML((PRTUTF16)pvBufIn, cbBufIn / sizeof(RTUTF16), (char**)&pvDst, &cbDst);
216                 if (RT_SUCCESS(rc))
217                 {
218                     *ppvBufOut = pvDst;
219                     *pcbBufOut = cbDst;
220                 }
221                 else
222                     LogRel(("Data Converter: unable to convert input UTF16 string into VBox format, rc=%Rrc\n", rc));
223             }
224             else
225                 LogRel(("Data Converter: unable to validate input UTF8 string, rc=%Rrc\n", rc));
226         }
227         else
228         {
229             LogRel(("Data Converter: converting UTF-8 encoded HTML data into VBox format\n"));
230     
231             /* Input buffer is expected to be UTF-8 encoded. */
232             rc = RTStrValidateEncodingEx((char *)pvBufIn, cbBufIn, 0);
233             if (RT_SUCCESS(rc))
234             {
235                 pvDst = RTMemAllocZ(cbBufIn + 1 /* '\0' */);
236                 if (pvDst)
237                 {
238                     memcpy(pvDst, pvBufIn, cbBufIn);
239                     *ppvBufOut = pvDst;
240                     *pcbBufOut = cbBufIn + 1 /* '\0' */;
241                 }
242                 else
243                     rc = VERR_NO_MEMORY;
244             }
245         }
246     
247         return rc;
248     }

and its comment says it's for \0 ... but if this isn't the right place to put it, you may put it in caller(s).

comment:7 by galitsyn, 4 weeks ago

Hello,

We just released VirtualBox 7.2.8. This issue should be fixed in this version. You can download it from https://www.virtualbox.org/wiki/Downloads. Thank you for reporting the issue.

comment:8 by galitsyn, 4 weeks ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.

© 2025 Oracle Support Privacy / Do Not Sell My Info Terms of Use Trademark Policy Automated Access Etiquette