VirtualBox

Opened 10 years ago

Last modified 8 years ago

#13135 new defect

File corruption with shared folders using VB 4.3.12

Reported by: kminder
Owned by:
Component: shared folders
Version: VirtualBox 4.3.12
Keywords: corruption
Cc:
Guest type: Windows
Host type: Mac OS X

Description

VirtualBox 4.3.12r93733
Host: Mac OS X 10.8.5
Guest: Windows Server Data Center 2012 R2 (4.3.12 Guest Additions installed)

I have a large ZIP file (837 MB) in a directory on my host.

To reproduce:
1) Create a shared folder mapped to the directory containing the ZIP on the host.
2) Copy the ZIP to a local guest directory (IMPORTANT).
3) Attempt to unzip the file using Windows native tools, Java's jar.exe or 7zip. Each fails with a fairly meaningless error message:
   Windows "Extract All..." says: "Error 0x80004005: Unspecified error"
   Java's jar says: "java.util.zip.ZipException: invalid stored block lengths"

Note that step #2 is important. Unzipping the file in place from the share seems to work, so something goes wrong when the file is copied. I've also confirmed that the MD5 hash of the file does indeed change after it is copied from the share. I used this tool from Microsoft to compute the hash: http://support.microsoft.com/kb/841290/en-us
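For anyone who wants to repeat the hash check without the Microsoft tool, the following is a minimal sketch in Python. It is not the tool used in this report; the function names `file_md5` and `copies_match` are made up for illustration. It reads the file in chunks so that a file of several hundred MB does not have to fit in memory.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path, copy_path):
    """True when both files have the same MD5 digest."""
    return file_md5(source_path) == file_md5(copy_path)
```

Running `file_md5` once on the host and once on the guest copy gives two digests that can be compared by eye, independent of any archive tool.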

Attachments (1)

VBox.log (75.2 KB ) - added by kminder 10 years ago.
VBox.log for VM with corrupt copy

Change History (12)

comment:1 by kminder, 10 years ago

Also note that this is a four-VM setup using Vagrant 1.6.3. I mention this because further investigation suggests the issue may have something to do with multiple VMs using the same host folder as a shared folder. A few additional observations:
1) The file corruption was very frequent but never 100% reproducible.
2) I was never able to reproduce the corruption if I destroyed 3 of the 4 VMs.
3) If I attempted to copy the file simultaneously in several of the VMs, they would occasionally hang.

comment:2 by sunlover, 10 years ago

How did you copy the file exactly? Using Windows Explorer or some other application?

Also please attach VBox.log of the VM after file corruption.

comment:3 by kminder, 10 years ago

The copy was originally done by a PowerShell provisioning script run via Vagrant, using the Copy-Item cmdlet. However, I also tried Explorer, the copy command from the command line, and even the venerable old xcopy. I was able to reproduce the issue in each case, although as I've said, the issue unfortunately does not reproduce every time. I will reproduce it and provide VBox.log today.

by kminder, 10 years ago

Attachment: VBox.log added

VBox.log for VM with corrupt copy

comment:4 by kminder, 10 years ago

Notice how the MD5 hash changes between the two copies on the windows guest VM.

On mac host

~/Projects/seclab2/windows-setup> md5 media/hdp-2.1.0.0-winpkg.zip
MD5 (media/hdp-2.1.0.0-winpkg.zip) = 8e010bf3d19dd271a3cc99cc369cb0d1

On win guest

PS C:\Users\Administrator> Copy-Item C:\media\hdp-2.1.0.0-winpkg.zip c:\
PS C:\Users\Administrator> Get-FileHash -Path C:\hdp-2.1.0.0-winpkg.zip -Algorithm MD5

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
MD5             E4322860EE40502016BB2C1299A5508C                                       C:\hdp-2.1.0.0-winpkg.zip


PS C:\Users\Administrator> Copy-Item C:\media\hdp-2.1.0.0-winpkg.zip c:\
PS C:\Users\Administrator> Get-FileHash -Path C:\hdp-2.1.0.0-winpkg.zip -Algorithm MD5

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
MD5             8E010BF3D19DD271A3CC99CC369CB0D1                                       C:\hdp-2.1.0.0-winpkg.zip
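A hash mismatch only says that the copy is bad, not where. As a hedged sketch (not something run for this ticket), the Python function below compares two files block by block and reports the byte ranges that differ, which may help show whether the damage is a single region, a repeating pattern, or a truncated tail. The name `differing_blocks` and the 4096-byte block size are assumptions chosen for illustration.

```python
def differing_blocks(path_a, path_b, block_size=4096):
    """Return a list of (offset, length) ranges where the two files differ.

    Adjacent differing blocks are merged into one range. A length mismatch
    shows up as a differing range at the end of the shorter file.
    """
    ranges = []
    offset = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            block_a = file_a.read(block_size)
            block_b = file_b.read(block_size)
            if not block_a and not block_b:
                break  # both files exhausted
            length = max(len(block_a), len(block_b))
            if block_a != block_b:
                if ranges and ranges[-1][0] + ranges[-1][1] == offset:
                    # Extend the previous range instead of starting a new one.
                    ranges[-1] = (ranges[-1][0], ranges[-1][1] + length)
                else:
                    ranges.append((offset, length))
            offset += length
    return ranges
```

Comparing a corrupt local copy against a known-good local copy (rather than against the file on the share) keeps the shared folder out of the comparison itself.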

comment:5 by kminder, 10 years ago

I'm actually seeing general I/O issues with large files, in particular when extracting large ZIP files. Specifically, after copying the ZIP above to a non-shared directory and confirming via MD5 hashes that the copy is valid, the extract will still frequently fail.

comment:6 by Frank Mehnert, 10 years ago

Could you do some more tests to find out a bit more about the file size when it fails? This would help greatly as 'large files' is a relative statement :)
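One way to run such a size test is sketched below in Python, under stated assumptions: probe files of several sizes are created on the host, their MD5 digests are computed on the host, and the script then runs in the guest, copying each probe out of the shared folder several times and counting bad copies. The function name `count_bad_copies` and its parameters are hypothetical, not part of any VirtualBox tooling.

```python
import hashlib
import os
import shutil


def _md5(path, chunk_size=1024 * 1024):
    """Hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_bad_copies(source_path, dest_dir, expected_md5, attempts=5):
    """Copy `source_path` into `dest_dir` `attempts` times.

    Returns how many of the copies did not match `expected_md5`, which is
    assumed to have been computed on the host side.
    """
    bad = 0
    for attempt in range(attempts):
        target = os.path.join(
            dest_dir, "%d-%s" % (attempt, os.path.basename(source_path))
        )
        shutil.copyfile(source_path, target)
        if _md5(target) != expected_md5.lower():
            bad += 1
        os.remove(target)  # keep the destination directory clean
    return bad
```

Calling it for probe files of, say, 1 MB, 10 MB, 100 MB and 800 MB would give a rough failure count per size. Because the corruption is reported as intermittent, a count of zero for a given size does not prove that size is safe.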

comment:7 by sunlover, 10 years ago

kminder, it would be interesting to find out what triggers the corruption. You mentioned that it does not happen with 1 VM but happens with 4 VMs. Could you please test 1 VM but with a high host load (CPU and disk), for example zipping a lot of data on the host while copying a large file to the VM from a shared folder?

I've tried to reproduce the corruption on a Windows host with 2 VMs running but it did not happen.

comment:8 by harendra, 9 years ago

I am facing data corruption when data is accessed via a shared folder mount. I am not sure if it is the same issue as the one reported in this ticket, but I will describe it here anyway in case it gives any clue as to what the problem might be.

VirtualBox Version 4.3.20 r96996 on Mac OS X Yosemite 10.10.4. Guest is Debian Linux 7.5, 3.2.0-4-amd64. The symptoms of the problem are:

1) A file which is shared from the Mac OS X via shared folders is served through the Apache Web server. The webserver (apache) shows completely different content for the file than what is inside the file. However, all regular commands on the guest like vi or less show correct content. cp also copies correct content to the destination file.

2) When the same file is copied to the local filesystem on the guest (a location which is not in the shared folder) and the new copy is then accessed via the webserver, it shows accurate content, which shows that the problem is not with the webserver.

3) When the virtualbox mount point was unmounted and mounted again the problem disappeared. But it starts showing up again after some time.

4) Strangely, if I remove one of the lines in the file then the problem disappears. If I add the line back the problem reappears. So it is dependent on the content of the file. I guess it might have something to do with the cache; perhaps the content hash is matching some wrong content and that content is being served instead. Though I do not know why you would use the content hash to retrieve a buffer.

The contents of the file which was not being served correctly were:

[
[1438885800000,1.4],
[1439145000000,1.41],
[1439231400000,1.42],
[1439317800000,1.44],
[1439404200000,1.44],
[1439490600000,1.42],
[1439749800000,1.43],
[1439836200000,1.43],
[1439922600000,1.42],
[1440009000000,1.44],
[1440095400000,1.42],
[1440354600000,1.51]
]

When this file is accessed via the webserver it displayed the following contents:

[
[915129000000,1.83],
[915388200000,1.81],
[915474600000,1.79],
[915561000000,1.75],
[915647400000,1.71],
[915733800000,1.64],
[915993000000,1.65],
[916079400000,1.69],
[916165800000,1.69],
[916252200000,1.7],
[916338600000,1.74],
[916597800000,1.72],
[916684200000

Perhaps this is picked up from some other buffer in the cache which might be full of such content since I am mostly accessing this type of json content through the webserver.

However if I remove one of the lines from the file the webserver shows the file accurately.

comment:9 by cjames, 8 years ago

We are having the exact same problem but with small files. These are small (a few KB) text files that we edit on the host system (Mac running Yosemite 10.10.5, VirtualBox 4.3.30) and are accessed by Apache on the guest system (Ubuntu 12.0.4). When a file is edited, Apache sees the new length but still gets the old contents; often the tail end of the file is binary junk.

One very surprising fact is that when this happens, restarting VirtualBox does NOT fix the problem: the incorrect file is still delivered. Rebooting the Mac fixes it until the next time you change the file.

A workaround is to make a copy of the file and replace the original with the copy, e.g. "cp foo bar; mv bar foo".
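The same copy-and-replace step can be scripted. The Python sketch below mirrors "cp foo bar; mv bar foo"; the function name `rewrite_via_copy` and the ".tmp-copy" suffix are hypothetical. The assumption here, not confirmed anywhere in this ticket, is that the workaround helps because the replaced file is a fresh file on the host filesystem rather than the edited original.

```python
import os
import shutil


def rewrite_via_copy(path):
    """Replace `path` with a fresh copy of itself (like "cp foo bar; mv bar foo")."""
    temp_path = path + ".tmp-copy"  # hypothetical temporary name
    shutil.copy(path, temp_path)    # copies the data and permission bits
    os.replace(temp_path, path)     # renames the copy over the original
```

Calling `rewrite_via_copy("foo")` after each edit would apply the workaround automatically, for example from an editor save hook.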

comment:10 by cjames, 8 years ago

UPDATED: See following note. This solution did NOT fix the problem.

We found the source of our problem. It's not VirtualBox exactly. It's an interaction between Google Chrome, Apache and the timezone of the guest operating system. WireShark helped me track this down.

To see if this is your problem, start up an "incognito" or "private" browser window that doesn't share any state or history with your browser, and reload the page that's giving trouble. If the page is now correct, then it's not the file that's corrupt.

In our case, the timezone of the guest Linux was set to UTC (GMT), and the Mac host system was PDT (GMT-8). Strangely, you could change the timezone on Linux ("export TZ=PDT"), and it would still show the GMT time, except claim it was PDT.

Thus, the file's modification time was off by eight hours. Ubuntu knew this, and Apache knew it, but it would report it to Chrome as GMT when in fact it was reporting PDT (or maybe vice versa?).

So when Chrome asks "get me this file, but only if it's newer than this timestamp", what it gets back is "not modified", so it doesn't re-fetch the file. BUT ... I think this is a bug in Chrome: it seems to use part of the changed file and/or the changed file's length. So you end up with parts of the new file injected into the older version, or random binary bytes at the end of the file.

You can use a different browser, or wget, or curl, or (as noted above) an incognito browser, and you get the correct file, which clearly indicates that VirtualBox isn't the problem. Firefox and Safari don't seem to have this problem, but Chrome does.
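The check with wget or curl can also be automated. The following Python sketch fetches a URL with no conditional-request headers and compares the response body against the file on disk; the names `served_digest`, `disk_digest` and `served_matches_disk` are invented for this example, and the URL and path are whatever your setup uses. A match suggests the browser cache is at fault; a mismatch means the server really is delivering different bytes.

```python
import hashlib
import urllib.request


def served_digest(url):
    """MD5 of the body returned for `url`, fetched without validators."""
    # No If-Modified-Since / If-None-Match is sent, so the server cannot
    # answer "304 Not Modified"; Cache-Control asks intermediaries to revalidate.
    request = urllib.request.Request(url, headers={"Cache-Control": "no-cache"})
    with urllib.request.urlopen(request) as response:
        return hashlib.md5(response.read()).hexdigest()


def disk_digest(path):
    """MD5 of the file at `path` as read directly from the filesystem."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def served_matches_disk(url, path):
    """True when the bytes served over HTTP equal the bytes on disk."""
    return served_digest(url) == disk_digest(path)
```

Note that both helpers read the whole file into memory, which is fine for the small text files discussed here but not for very large ones.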

I fixed it by using "dpkg-reconfigure tzdata" and setting the timezone to "America/Los_Angeles", then rebooting the virtual box. Problem gone.

comment:11 by cjames, 8 years ago

OK I TAKE IT ALL BACK. Apache and timezones were NOT the problem. Actually, they were a problem: files were not delivered appropriately, as described above in my previous message.

But even with this fixed, VirtualBox sends out corrupt files via Apache. I have found similar complaints in a number of places, where users have mysterious and inexplicable differences between the files on the guest system and those on the host system. This has been going on for years (I found reports at least four years old and as recent as two weeks ago), and apparently nobody cares.

This is becoming a showstopper for us. This bug is marked "critical" and has been open for 17 months with no response. It's probably time to look at other VM products.
