VirtualBox

Opened 14 years ago

Closed 14 years ago

#6124 closed defect (fixed)

corruption reading from cdrom with multiple cpus

Reported by: Tom Gonzalez
Owned by:
Component: guest smp
Version: VirtualBox 3.1.2
Keywords: corruption
Cc:
Guest type: Solaris
Host type: Windows

Description

My Solaris guest reports a corrupted file when reading from the cdrom when I have more than 1 cpu core assigned to the guest.

Details:

VirtualBox 3.1.2

Host: Windows 7 Home Premium x64, Intel i7

Guest: Solaris 10 5/08 no additional patches in 64-bit mode, host cdrom active

I am trying to copy or just extract the 10_x86_Recommended.tar.bz2 patch cluster from an EIS DVD-ROM. It's about an 800MB file.

# cd /var/tmp

# bunzip2 -c /cdrom/cdrom0/pathtofile/10_x86_Recommended.tar.bz2 | tar xvf -

With 2 cpu cores assigned, bunzip2 stops after a dozen patches and reports that the file is corrupt.

With 1 cpu assigned, the process fully completes the extract without error.

Other settings: VT-x=on , NestedPaging=on , APIC=on , PAE=on

I tried several variations with these settings on and off, all worked unless more than one cpu was assigned. So I am just showing the simplest change that caused the issue.

Attached will be log files from a one cpu run, and a two cpu run.

Attachments (2)

Sol10_508-2010-02-03-15-01-33_with_2cpus_FAILS.log (63.5 KB ) - added by Tom Gonzalez 14 years ago.
log file from 2 cpu config that failed
Sol10_508-2010-02-03-14-47-19_with_1cpu_WORKS.log (62.3 KB ) - added by Tom Gonzalez 14 years ago.
log file from 1 cpu config that worked


Change History (19)

comment:1 by Tom Gonzalez, 14 years ago

Additional note:

If I copy the file from the cdrom to /var/tmp in the 2 cpu configuration, it fails to extract in both the 2 cpu config and the 1 cpu config.

If I copy the file from the cdrom to /var/tmp in the 1 cpu configuration, it completes the extract in both the 1 cpu config and in the 2 cpu config.

So it seems the corruption happens during a copy or read from the cdrom while in the 2 cpu config, not during the extract itself.
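
One way to check that the copy itself is what goes wrong, rather than bunzip2, is to compare the copy with the source byte for byte. The following is only a minimal sketch, assuming POSIX `cp`, `cmp` and `cksum` are available in the guest; the `verify_copy` helper is hypothetical and not part of the original report.

```shell
#!/bin/sh
# verify_copy SRC DST: copy SRC to DST, then compare the two byte for byte.
# Hypothetical helper for illustration; not part of the original report.
verify_copy() {
    src=$1
    dst=$2
    cp "$src" "$dst" || return 2
    # cmp -s prints nothing and exits non-zero if the files differ.
    if cmp -s "$src" "$dst"; then
        echo "OK: copy matches source"
    else
        echo "MISMATCH: copy differs from source"
        cksum "$src" "$dst"   # print CRC and size of both files for the record
        return 1
    fi
}

# Example call (paths as given in the description above):
#   verify_copy /cdrom/cdrom0/pathtofile/10_x86_Recommended.tar.bz2 \
#               /var/tmp/10_x86_Recommended.tar.bz2
```

A match does not prove the data is good, because the comparison re-reads the source through the same cdrom path. Comparing against a checksum taken in a known-good configuration (for example with 1 cpu) is a stronger check.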

comment:2 by Sander van Leeuwen, 14 years ago

Component: changed from "other" to "guest smp"
Summary: changed from "corruption reading from cdrom with multiple cpus" to "corruption reading from cdrom with multiple cpus -> try with 3.1.4"

Try the 3.1.4 beta discussed on the forum.

comment:3 by Tom Gonzalez, 14 years ago

I just installed 3.1.4_BETA2_r57282 released 4 Feb 2010. I observe the same behavior. With 1 cpu the file reads off the cdrom ok. With 2 cpus the file is corrupted a short ways into reading. No change in status. Do you need the logs from the 3.1.4 BETA2 runs, or any other information?

comment:4 by Frank Mehnert, 14 years ago

Summary: changed from "corruption reading from cdrom with multiple cpus -> try with 3.1.4" to "corruption reading from cdrom with multiple cpus"

Do you read this file from a physical drive or from a DVD image?

comment:5 by Tom Gonzalez, 14 years ago

From a DVD disc in the physical drive mapped in from the Windows 7 host. I will try a mapped iso file to see if that has the same issue or not.

comment:6 by Tom Gonzalez, 14 years ago

Okay, I'm still using 3.1.4_BETA2. Using an ISO file mapped into the guest, the results are the same. Works ok with 1 cpu, fails with 2 cpus assigned. Same error as before. Bunzip2 stops and states the file seems to be corrupt.

comment:7 by Frank Mehnert, 14 years ago

Interesting. So far I wasn't able to reproduce this issue with a medium CDROM image (~700MB) on a Linux host with 2 guest CPUs. Maybe restricted to Windows hosts.

comment:8 by Tom Gonzalez, 14 years ago

Just to be clear, when I copied the file from the DVD to the HD, no error was produced; it silently corrupted the file during the copy. It produced an error only when I tried to uncompress it after the copy, or when uncompressing it on the fly as it read from the DVD. Sorry, just want to be clear.

I have done more testing, I've installed the same version of Solaris 10 5/08 on a real PC with dual core 64-bit, and tested the same issue using an IDE DVD and a SATA DVD. Not sure how VirtualBox is emulating the controller. Using either controller did not produce an error. I wanted to test it on real HW with that version of Solaris as a comparison. I've never heard of this issue with Solaris on real HW.

Next, I tried Solaris 10 10/09 in VirtualBox and tested the same 1 CPU and 2 CPU configurations, and both worked without error. So, I then patched one of my Solaris 10 5/08 VirtualBox VMs to Dec 2009 patch levels, and it now extracts ok with 2 CPUs.

So it seems like a patch between the 5/08 and 10/09 releases resolves this issue. I have looked through the patches and have not seen which patch may have corrected this. It's possible some other bug fix also fixed this issue and therefore it is not listed in any of the patches.

Again, I stumbled into this because I installed Solaris 10 5/08 and attempted to patch it configured with 2 cpus.

Is this still a valid VirtualBox bug? Should Solaris 10 5/08 work without issue as is? Or should we at least find out which patch resolved it?

I will check a few other releases of Solaris 10 to see if others are affected and what version it seems to be fixed in. I'm curious to see the results. It may take me a week or so.

Thanks for your time and patience.

comment:9 by Frank Mehnert, 14 years ago

Yes, I've checked the md5sum of the big file (with /dev/urandom content) ...
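
A test file of this kind can be prepared roughly as follows. This is only a sketch, assuming a Linux host with `dd` and GNU `md5sum`; the `make_test_file` helper, the file name and the size are illustrative and not taken from the ticket, and the step of packing the file into a CD image is left out.

```shell
#!/bin/sh
# make_test_file OUT SIZE_MB: write SIZE_MB megabytes of random data to OUT
# and record its MD5 sum in OUT.md5.
# Hypothetical helper; name, path and size are not from the ticket.
make_test_file() {
    out=$1
    size_mb=$2
    dd if=/dev/urandom of="$out" bs=1048576 count="$size_mb" 2>/dev/null || return 1
    md5sum "$out" > "$out.md5"
}

# Example: a file of roughly CD size, then verify it later with md5sum -c.
#   make_test_file random.bin 700
#   md5sum -c random.bin.md5
```

Because random data does not compress and has no structure, any flipped or dropped block changes the checksum, which makes silent corruption easy to spot.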

comment:10 by Tom Gonzalez, 14 years ago

After removing kernel patch 141445-09 (released Oct/13/2009) from a fully patched system, the problem returns.

Just for reference, the kernel patch previous to the above is 139556-08 (released May/07/2009), and a system running with it installed does experience this issue. So it seems running any kernel patch prior to 141445-09 would produce this corruption, at least with my configuration. FYI, I have installed it and reproduced the issue several times now.

Maybe it's something unique with running on Windows 7 in 64-bit as the host, maybe something else or a combination of things.

Is there any more testing anyone can think of to help narrow the description of this bug? Is this still a valid bug to pursue? Just wondering, since I have not seen or heard of this happening with Solaris on real HW.

Well, at least there is a workaround for anyone experiencing this, use 1 cpu until you can apply kernel patch 141445-09 or later.

I'd be happy to do more testing, provide logs or whatever is needed... as time permits.

comment:11 by Tom Gonzalez, 14 years ago

New information. I thought I had tested all the possible configuration settings, I made a chart to keep track and everything, but it seems I missed one.

I turned off Nested Paging and the CDROM corruption is no longer present.

I wanted to test this on my iMac. I turned on all the same settings, but I noticed that Nested Paging did not activate. I saw this when I hovered the mouse over the chip icon in the lower right of the Vbox window. The iMac did not have the issue.

According to the VirtualBox User Manual, on Intel CPUs Nested Paging is only available starting with the Core i7 processor. My Windows 7 PC has an i7. My iMac only has a Core 2 Duo, so no Nested Paging for my iMac.

Anyway, it looks like the issue is now defined as: Corruption reading from the CDROM while having more than 1 CPU and Nested Paging available and active. And for the Solaris guest, having a kernel patch of 139556-08 or earlier. I'm not sure if other guests are affected.
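
The two workarounds this narrows down to (a single guest cpu, or Nested Paging off) can also be applied from the host command line. A sketch, assuming the `VBoxManage modifyvm` options `--cpus` and `--nestedpaging` as documented for VirtualBox 3.x (newer releases may spell them differently) and a VM that is powered off; the `apply_workaround` helper and the VM name are hypothetical.

```shell
#!/bin/sh
# apply_workaround VM MODE: apply one of the two workarounds to a powered-off VM.
#   one-cpu    keep Nested Paging as it is, assign a single guest cpu
#   no-nested  keep the cpu count as it is, turn Nested Paging off
# Hypothetical helper; "MySolarisVM" below is a placeholder VM name.
apply_workaround() {
    vm=$1
    case $2 in
        one-cpu)   VBoxManage modifyvm "$vm" --cpus 1 ;;
        no-nested) VBoxManage modifyvm "$vm" --nestedpaging off ;;
        *) echo "usage: apply_workaround VM one-cpu|no-nested" >&2
           return 2 ;;
    esac
}

# Example:
#   apply_workaround "MySolarisVM" no-nested
```

Either change alone should be enough according to the tests above, so there is no need to apply both.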

comment:12 by Tom Gonzalez, 14 years ago

One more thing. All the tests in the previous post were done on the full released version of 3.1.4, not the beta.

comment:13 by Sander van Leeuwen, 14 years ago

That's an interesting observation. Will have to check this here. Thanks.

comment:14 by Sander van Leeuwen, 14 years ago

I've just fixed a problem related to nested paging and guest SMP. If you're interested in a test build, then I can make one available for you.

comment:15 by Sander van Leeuwen, 14 years ago

Actually no, that can't be the problem. You'd always get a guru meditation otherwise. Still have to try it myself then.

comment:16 by Tom Gonzalez, 14 years ago

VirtualBox 3.2.0 does not seem to have this issue with Nested Paging and multiple cpus. I tested my same Solaris 10 VM in VB 3.1.6 to show I still have the issue (I did), then upgraded to VB 3.2.0, and the corruption problem did not occur. I ran my test several times, all ok. The change log for 3.2.0 shows a few significant updates to the Nested Paging routines. If you can verify the same result, I think we can close this issue as being fixed in 3.2.0.

comment:17 by Sander van Leeuwen, 14 years ago

Resolution: fixed
Status: changed from "new" to "closed"