VirtualBox

Ticket #17716 (closed defect: invalid)

Opened 3 years ago

Last modified 3 years ago

VBox stuck in write loop with 5.2.10, corruption -> Arch Linux issue

Reported by: xnoreq
Owned by:
Component: shared folders
Version: VirtualBox 5.2.10
Keywords:
Cc:
Guest type: Linux
Host type: Windows

Description

Host: Windows 10, VBox Version 5.2.10 r122406 (Qt5.6.2)

Guest: Arch Linux 4.16.4-1-ARCH Installed package: virtualbox-guest-modules-arch 5.2.10-3  https://www.archlinux.org/packages/community/x86_64/virtualbox-guest-modules-arch/

In the guest I have a shared folder with this fstab entry:

shared /mnt/shared vboxsf uid=1000,gid=1000,rw,dmode=700,fmode=600,comment=systemd.automount

Now if I run qbittorrent (also installed from Arch repos) in the guest and e.g. download Arch (see magnet link at  https://www.archlinux.org/download/) then as soon as it starts to download the iso:

1) VirtualBox.exe CPU usage on the host increases dramatically
2) VirtualBox.exe shows heavy I/O write (20x - 30x higher than the download rate within the guest)
3) qbittorrent in the guest shows high CPU usage

Analyzing the writes on the host:

20:55:36,3166640	VirtualBox.exe	1840	CreateFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Disallow Exclusive, Attributes: N, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
20:55:36,3170140	VirtualBox.exe	1840	ReadFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	Offset: 4.096, Length: 4.096
20:55:36,3171295	VirtualBox.exe	1840	ReadFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	Offset: 8.192, Length: 4.096
20:55:36,3172539	VirtualBox.exe	1840	ReadFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	Offset: 12.288, Length: 4.096
...
the file is being allocated
...
20:55:42,1340471	VirtualBox.exe	1840	ReadFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	Offset: 284.164.096, Length: 4.096
20:55:42,1341867	VirtualBox.exe	1840	ReadFile	C:\shared\archlinux-2018.04.01-x86_64.iso	END OF FILE	Offset: 284.168.192, Length: 4.096
20:55:42,1342749	VirtualBox.exe	1840	ReadFile	C:\shared\archlinux-2018.04.01-x86_64.iso	END OF FILE	Offset: 284.688.384, Length: 4.096
20:55:42,1343604	VirtualBox.exe	1840	ReadFile	C:\shared\archlinux-2018.04.01-x86_64.iso	END OF FILE	Offset: 285.212.672, Length: 4.096
20:55:42,1344587	VirtualBox.exe	1840	ReadFile	C:\shared\archlinux-2018.04.01-x86_64.iso	END OF FILE	Offset: 285.736.960, Length: 4.096
20:55:42,1345399	VirtualBox.exe	1840	ReadFile	C:\shared\archlinux-2018.04.01-x86_64.iso	END OF FILE	Offset: 286.261.248, Length: 4.096
20:55:42,1346192	VirtualBox.exe	1840	ReadFile	C:\shared\archlinux-2018.04.01-x86_64.iso	END OF FILE	Offset: 286.785.536, Length: 4.096
20:55:42,1347007	VirtualBox.exe	1840	ReadFile	C:\shared\archlinux-2018.04.01-x86_64.iso	END OF FILE	Offset: 287.309.824, Length: 4.096
20:55:42,1347766	VirtualBox.exe	1840	ReadFile	C:\shared\archlinux-2018.04.01-x86_64.iso	END OF FILE	Offset: 287.834.112, Length: 4.096
20:55:45,6189281	VirtualBox.exe	1840	QueryOpen	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	CreationTime: 27.04.2018 20:52:44, LastAccessTime: 27.04.2018 20:52:44, LastWriteTime: 27.04.2018 20:54:20, ChangeTime: 27.04.2018 20:54:20, AllocationSize: 284.168.192, EndOfFile: 284.168.192, FileAttributes: A
20:55:45,6190843	VirtualBox.exe	1840	QueryOpen	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	CreationTime: 27.04.2018 20:52:44, LastAccessTime: 27.04.2018 20:52:44, LastWriteTime: 27.04.2018 20:54:20, ChangeTime: 27.04.2018 20:54:20, AllocationSize: 284.168.192, EndOfFile: 284.168.192, FileAttributes: A
20:55:45,6191496	VirtualBox.exe	1840	CreateFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	Desired Access: Generic Read/Write, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Disallow Exclusive, Attributes: N, ShareMode: Read, Write, AllocationSize: 0, OpenResult: Opened
20:55:45,6192084	VirtualBox.exe	1840	QueryInformationVolume	C:\shared\archlinux-2018.04.01-x86_64.iso	BUFFER OVERFLOW	VolumeCreationTime: xxx, VolumeSerialNumber: xxx, SupportsObjects: True, VolumeLabel: xxx
20:55:45,6192206	VirtualBox.exe	1840	QueryAllInformationFile	C:\shared\archlinux-2018.04.01-x86_64.iso	BUFFER OVERFLOW	CreationTime: 27.04.2018 20:52:44, LastAccessTime: 27.04.2018 20:52:44, LastWriteTime: 27.04.2018 20:54:20, ChangeTime: 27.04.2018 20:54:20, FileAttributes: A, AllocationSize: 284.168.192, EndOfFile: 284.168.192, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x3400000001536f, EaSize: 0, Access: Generic Read/Write, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Word
20:55:45,6193266	VirtualBox.exe	1840	CloseFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	
20:55:45,6194764	VirtualBox.exe	1840	QueryOpen	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	CreationTime: 27.04.2018 20:52:44, LastAccessTime: 27.04.2018 20:52:44, LastWriteTime: 27.04.2018 20:54:20, ChangeTime: 27.04.2018 20:54:20, AllocationSize: 284.168.192, EndOfFile: 284.168.192, FileAttributes: A
20:55:45,6196035	VirtualBox.exe	1840	WriteFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	Offset: 1.572.864, Length: 4.096, Priority: Normal
20:55:45,6197488	VirtualBox.exe	1840	WriteFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	Offset: 1.572.864, Length: 4.096
20:55:45,6198623	VirtualBox.exe	1840	WriteFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	Offset: 1.572.864, Length: 4.096
20:55:45,6199472	VirtualBox.exe	1840	WriteFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	Offset: 1.572.864, Length: 4.096
20:55:45,6200234	VirtualBox.exe	1840	WriteFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	Offset: 1.572.864, Length: 4.096
...
this keeps on going and never stops
...
20:56:09,7814536	VirtualBox.exe	1840	WriteFile	C:\shared\archlinux-2018.04.01-x86_64.iso	SUCCESS	Offset: 1.572.864, Length: 4.096

Naturally, the resulting file is corrupted.

Change History

comment:1 Changed 3 years ago by socratis

Just FYI, VirtualBox's shared folders present a very simplified file system implementation, just enough to read/write files from/to the guest. Many applications can error when using shared folders, because they expect advanced features, like file locking or access controls, which don't exist for shared folders.

Like a torrent client that needs to keep the file(s) open at all times and update specific chunks. Maybe the load is too much and it can't take the I/O.

I would use a true network share (Samba, NFS). Shared folders were never designed to be anything more than a simple copy mechanism, AFAIK...

comment:2 Changed 3 years ago by xnoreq

No, I really don't think that the load is too much. The guest application and host system are the *constants* here - it worked with every version of VirtualBox before in much heavier load scenarios without problems but with 5.2.10 it breaks even in the simplest case.

comment:3 Changed 3 years ago by socratis

Does it work if you downgrade? Can you try different combinations?

  • 5.2.8 VirtualBox with 5.2.10 GAs
  • 5.2.8 VirtualBox with 5.2.8 GAs
  • 5.2.10 VirtualBox with 5.2.8 GAs

comment:4 Changed 3 years ago by xnoreq

As soon as I upgrade the guest modules to 5.2.10 it breaks, regardless of the host version (5.2.8, 5.2.10, I had even tried 5.2.11 builds).

As soon as I downgrade the guest to 5.2.8 it starts to work again as it did with the previous versions, that is, every released version since 5.1.34, I believe.

Maybe it is a change in the 5.2.10 vboxsf implementation that confuses the application. That would explain the high CPU usage of the guest process.

I will try to strace the guest process to see what the kernel (module) does differently in terms of I/O.

comment:5 Changed 3 years ago by socratis

Excellent analysis! That will give the developers a much narrower focus on the issue. Kudos!

comment:6 Changed 3 years ago by xnoreq

Still broken in 5.2.12.

comment:7 Changed 3 years ago by socratis

My first comment still stands; you shouldn't be using VirtualBox shared folders for this type of activity. There's no guarantee that it will work, or that it will keep on working.

Maybe they changed something that was breaking something more fundamental than your case, and it broke your case. As I said, no guarantees were ever made (or implied) that this would ever work. It's a small miracle that it did.

My advice? Switch to normal networked folders and be done with it. If it gets fixed in VirtualBox (who knows), great. If it doesn't, you still have your rock-solid solution and you don't care in any event...

comment:8 Changed 3 years ago by xnoreq

you shouldn't be using VirtualBox shared folders for this type of activity

Having an application reading and writing to a shared folder?! You're essentially saying that shared folders should not be used to share data.

There's no guarantee that it will work, or that it will keep on working.

I don't understand this response at all. You're again essentially saying that users cannot expect features of VirtualBox, that are also present in other virtualization solutions, to work or not to break randomly.

That makes it a horrible product which forces users to switch to a different solution .. fine. It's just a shame because shared folders worked fine for years before 5.2.10.

My advice? Switch to normal networked folders

Been there, done that. The result is abysmal I/O performance and applications failing in very interesting ways. So not an option.

comment:9 Changed 3 years ago by socratis

On the contrary, shared folders are meant as an easy way to share data. But that's it. An open database residing in a shared folder is not shared data. I said (and I'm sticking by my statement) that shared folders are good enough to copy data. Period. Anything more than that, anything that asks for advanced filesystem features, might fail.

The features that are there, are expected to work. But work as expected. If for application XYZ it works, but that advanced feature gets changed in the future, a feature that was never explicitly promised to work, it might fail. A copy will always work.

If your network folders are failing when using your application, chances are that it's the application that's written without shared folders in mind. Expect more failures if using VirtualBox shared folders. If true network folders is not an option, you got to change your game plan.

comment:10 Changed 3 years ago by xnoreq

If your network folders are failing when using your application, chances are that it's the application that's written without shared folders in mind.

There is no such thing as "shared folders" in Linux. The shared folder feature of VirtualBox is implemented as a VFS which is transparent to the applications. That's the whole point of VFSs... and properly implemented, it supports a wide variety of different features which may or may not be supported by any filesystem.

So it's broken. I don't see how this attempt at making excuses for this breakage helps. It just adds more noise to the ticket.

comment:11 Changed 3 years ago by socratis

and properly implemented

That's the part you don't want to understand. The implementation part doesn't fit your needs, therefore it's a bug? Not really. If FAT32 doesn't fit your needs to transfer multi-GB files with your stick, it's not a bug. That's how it's supposed to work.

it supports a wide variety of different features which may or may not be supported by any filesystem.

Not the VirtualBox ones. They support a very specific set of features. Copying a file. End of story.

I don't see how this attempt at making excuses for this breakage helps

You do realize that if something states "Not to be used for any purpose other than ..." then it's not broken, but that's the specification it was built with, right? Huge difference.

It just adds more noise to the ticket.

Not really. It merely tries to make you understand why this ticket shouldn't exist in the first place, why this ticket should be closed as "Invalid".

In all fairness, it could very well turn out to be a bug. What I'm simply trying to get to people that use shared folders with weird, corner cases is "Don't use shared folders with weird, corner cases".

comment:12 Changed 3 years ago by xnoreq

That's the part you don't want to understand.

That shared folders after 5.2.10 isn't properly implemented? That's the reason for the ticket, duh.

The implementation part doesn't fit your needs, therefore it's a bug?

You don't know what you're talking about. If a filesystem doesn't support a feature then it is typically not a problem for applications because then the filesystem doesn't advertise the feature.

Not the VirtualBox ones. They support a very specific set of features. Copying a file. End of story.

Again, you don't know what you're talking about. VFS has no concept of "copying a file". The only relevant operations here are basic file operations, such as open(2), read(2) or write(2) ...

that's the specification it was built with, right?

What specification?! Again, you are just making up stuff you have no clue about.

It merely tries to make you understand why this ticket shouldn't exist in the first place, why this ticket should be closed as "Invalid".

If anything shouldn't exist, then it is the noise you've added. Please stop it.

In all fairness, it could very well turn out to be a bug.

... but let's rather spam a ticket with excuses and noise and derail it instead of getting it fixed?!

This is outrageous behavior. Are you working for Oracle?

I'm simply trying to get to people that use shared folders with weird, corner cases

What file operations that I'm using are weird corner cases?

Please answer me this.


comment:13 Changed 3 years ago by socratis

I'll skip the "duh" and the rest of the tone/insults/misunderstandings, until you've graduated from high school, it's not fair.

But, I'll ask you this: Do you have a problem copying files from/to a shared folder? Then it would be a problem. For anything else you'll need to adjust your expectation-meter.

I tried to explain a couple of things, you seem to be not wanting to hear. I'm outta here... Buona fortuna!


PS. A couple of questions unrelated to the ticket:

  1. No, I'm not Oracle. Why, do you have a support contract? Or am I only allowed to speak if I'm Oracle? You have heard of the concept of "open source" and "user supported", right?
  2. You do realize that everybody can comment on a ticket, right? This isn't exactly personalized support, you haven't paid for that, you can't afford that. If you could you wouldn't be here...

comment:14 Changed 3 years ago by xnoreq

I hope you finally understood that you really shouldn't be commenting on tickets if you have no clue what you're talking about.

Evading my questions seals the deal. Thank you & goodbye.

comment:15 Changed 3 years ago by xnoreq

Now, after all this noise, back to the bug:

Pre 5.2.10:

openat(AT_FDCWD, "/mnt/shared/archlinux-2018.05.01-x86_64.iso", O_RDWR|O_CREAT|O_NOATIME, 0666) = 59
pwritev(59, [{iov_base="\3148"..., iov_len=16384}], 32, 412090368) = 524288 <2.838525>
pwritev(59, [{iov_base="i\331"..., iov_len=16384}], 16, 487063552) = 262144 <0.422510>

These are the first two writes, and I have confirmed that the bytes written at those offsets match the bytes that the host wrote to the file at those offset.

The operations take some time, but they finish.

Contrast this with 5.2.10 and 5.2.12:

openat(AT_FDCWD, "/mnt/shared/archlinux-2018.05.01-x86_64.iso", O_RDWR|O_CREAT|O_NOATIME, 0666) = 67
pwritev(67, [{iov_base="a\236"..., iov_len=16384}], 32, 331874304 <unfinished ...>
pwritev(67, [{iov_base="\377\331"..., iov_len=16384}], 28, 250609664 <unfinished ...>
pwritev(67, [{iov_base="qH"..., iov_len=16384}], 18, 455081984 <unfinished ...>

Even the first pwritev doesn't finish.

At the beginning of the first write offset (331874304) the data in the file matches the pwritev data.

After 4096 bytes the file ends however (should be 16k) and the bytes stop matching at some point within those 4k.

So it seems that the host gets stuck in what appears to be a loop writing to the file somewhere around this offset.

This is a very serious bug causing data loss and/or corruption.
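The byte counts in the traces above can be checked mechanically: a complete pwritev() should return the sum of all iov_len values (32 x 16384 = 524288 for the first pre-5.2.10 call). The following is a minimal sketch of such a check. The helper names iov_total and pwritev_full are hypothetical and not part of any library, and the wrapper only detects errors and short writes; it cannot detect a call that never returns, which is what happens here.

```c
#define _GNU_SOURCE             /* for pwritev() */
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Sum of all iov_len fields: the byte count a complete pwritev() reports. */
size_t iov_total(const struct iovec *iov, int iovcnt)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    return total;
}

/*
 * Hypothetical wrapper around pwritev().
 * Returns 0 if every byte was written, -1 on error or short write.
 */
int pwritev_full(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    size_t expected = iov_total(iov, iovcnt);
    ssize_t written = pwritev(fd, iov, iovcnt, offset);

    if (written < 0) {
        perror("pwritev");
        return -1;
    }
    if ((size_t)written != expected) {
        fprintf(stderr, "short write: %zd of %zu bytes\n", written, expected);
        return -1;
    }
    return 0;
}
```

Calling pwritev_full(fd, iov, 32, offset) in place of the raw pwritev() would make a truncated write visible immediately instead of leaving it to be discovered later in the file contents.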


comment:16 Changed 3 years ago by xnoreq

Test program to reproduce the issue:

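#define _GNU_SOURCE     /* for pwritev() */
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BUFLEN 16384    /* size of each buffer */
#define NUMBUFS 32      /* number of buffers passed to pwritev() */
#define OFFSET 10000    /* file offset of the write (not a multiple of 4096) */

int main(void) {

    struct iovec iov[NUMBUFS];

    /* Fill buffer i with the letter 'A' + i % 26 so the file is easy to inspect. */
    for (int i = 0; i < NUMBUFS; i++) {
        uint8_t* buf = malloc(BUFLEN);
        if (buf == NULL) {
            perror("malloc");
            return 1;
        }

        for (int j = 0; j < BUFLEN; j++) {
            buf[j] = (uint8_t)(i%26 + 'A');
        }

        iov[i].iov_base = buf;
        iov[i].iov_len = BUFLEN;
    }

    int fd = open("/mnt/shared/test.txt", O_RDWR|O_CREAT, S_IRUSR|S_IWUSR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* With the affected guest modules this call is expected to hang. */
    ssize_t count = pwritev(fd, iov, NUMBUFS, OFFSET);
    if (count < 0) {
        perror("pwritev");
        return 1;
    }

    printf("wrote: %zd\n", count);
    close(fd);
    return 0;
}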

fstab entry:

shared  /mnt/shared     vboxsf  uid=1000,gid=1000,rw,dmode=700,fmode=600 0 0
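To check the result of the test program from the host or from the guest, the file can be read back and compared against the pattern the program writes (buffer i is filled with 'A' + i % 26, starting at the write offset). This is a minimal sketch, not part of the original reproducer; the function name verify_pattern and its return convention are assumptions made for illustration.

```c
#define _GNU_SOURCE             /* for pread() */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical checker for the file written by the test program.
 * Returns -1 if all buflen * numbufs bytes starting at `offset` match the
 * pattern, -2 if the file cannot be opened, otherwise the index (relative
 * to `offset`) of the first wrong or missing byte.
 */
long long verify_pattern(const char *path, off_t offset, size_t buflen, int numbufs)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -2;
    }

    uint8_t chunk[4096];
    long long total = (long long)buflen * numbufs;
    long long pos = 0;
    long long result = -1;

    while (pos < total && result == -1) {
        size_t want = sizeof chunk;
        if ((long long)want > total - pos)
            want = (size_t)(total - pos);

        ssize_t got = pread(fd, chunk, want, offset + pos);
        if (got <= 0) {
            result = pos;       /* read error or premature end of file */
            break;
        }

        for (ssize_t k = 0; k < got; k++) {
            /* Byte at index pos + k belongs to buffer (pos + k) / buflen. */
            uint8_t expected = (uint8_t)(((pos + k) / (long long)buflen) % 26 + 'A');
            if (chunk[k] != expected) {
                result = pos + k;
                break;
            }
        }
        pos += got;
    }

    close(fd);
    return result;
}
```

For the test program above, the call would be verify_pattern("/mnt/shared/test.txt", 10000, 16384, 32). A non-negative result gives the position at which the file stops matching, which is the kind of truncation described in comment:15.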

comment:17 Changed 3 years ago by xnoreq

Four months and two kernel versions later, the issue still hasn't been fixed.

Archlinux bug:  https://bugs.archlinux.org/task/58583

Redhat bug:  https://bugzilla.redhat.com/show_bug.cgi?id=1481630#c80

comment:18 Changed 3 years ago by xnoreq

Eight months and four kernel versions later ...

comment:19 Changed 3 years ago by xnoreq

This hang/loop/corruption issue with shared folders still exists in linux 4.20.3 with virtualbox 6.0.2.

comment:20 Changed 3 years ago by hansg

Ok, so I've finally gotten around to debugging this, sorry for taking so long and thank you for the reproducer.

This only happens when using my cleaned-up standalone version of vboxsf, which is intended for merging upstream from:  https://github.com/jwrdegoede/vboxsf/

So it seems that you are seeing this problem because the virtualbox-guest-modules-arch package has been using my version of vboxsf starting with the troublesome version. Therefore I believe that this ticket can be closed, as this is not an upstream VirtualBox bug.

During the refactoring / cleanup of the code to prepare it for merging into the mainline kernel I messed up the return value of the sf_write_end function. Unlike the other mmap handling functions, it is supposed to return the number of bytes written on success or 0 on error, instead of 0 on success and a negative errno on error (which is documented nowhere). With that fixed, your reproducer works as expected.

This is fixed in my vboxsf version with this commit:  https://github.com/jwrdegoede/vboxsf/commit/6738af37c935f3d9b0db138678c2cd3d8bc1fc99
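Why a wrong return value turns into an endless write at one offset can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it is not the vboxsf or kernel code, all names (model_write, write_end_ok, write_end_zero_on_success) are hypothetical, and it assumes the caller treats a return value of 0 as "nothing was written, retry the same page", which matches the repeated 4.096-byte WriteFile calls at a single offset in the host trace.

```c
#include <stddef.h>
#include <stdio.h>

#define MODEL_PAGE_SIZE 4096u

/* A page "commit" step: receives the bytes copied, returns the bytes it reports as written. */
typedef int (*write_end_fn)(size_t copied);

/* Expected convention: report the number of bytes committed. */
int write_end_ok(size_t copied) { return (int)copied; }

/* Mixed-up convention: 0 means "success", which the caller reads as "0 bytes written". */
int write_end_zero_on_success(size_t copied) { (void)copied; return 0; }

/*
 * Simplified model of a page-by-page buffered write of `len` bytes at `pos`.
 * Returns the number of write_end calls made, or -1 if `max_calls` was
 * reached before the write completed. *final_pos receives the last position.
 */
long model_write(size_t pos, size_t len, write_end_fn write_end,
                 long max_calls, size_t *final_pos)
{
    size_t end = pos + len;
    long calls = 0;

    while (pos < end) {
        if (calls == max_calls) {       /* safety cap so the model terminates */
            *final_pos = pos;
            return -1;
        }

        /* Never cross a page boundary in one step. */
        size_t in_page = MODEL_PAGE_SIZE - (pos % MODEL_PAGE_SIZE);
        size_t bytes = (end - pos < in_page) ? end - pos : in_page;

        int status = write_end(bytes);  /* the page is pushed to the backing store here */
        calls++;
        if (status < 0)
            break;                      /* a real error ends the write */

        pos += (size_t)status;          /* status == 0: same page is retried */
    }

    *final_pos = pos;
    return calls;
}
```

With write_end_ok the model finishes the 32 x 16384 byte write from the reproducer and ends at offset 10000 + 524288. With write_end_zero_on_success the position never advances, so the same page is written again on every iteration until the cap is hit; without the cap the loop would not end.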

comment:21 Changed 3 years ago by michael

  • Status changed from new to closed
  • Resolution set to invalid
  • Summary changed from VBox stuck in write loop with 5.2.10, corruption to VBox stuck in write loop with 5.2.10, corruption -> Arch Linux issue