VirtualBox

Ticket #6124 (closed defect: fixed)

Opened 4 years ago

Last modified 4 years ago

corruption reading from cdrom with multiple cpus

Reported by: tgonz99
Owned by:
Priority: major
Component: guest smp
Version: VirtualBox 3.1.2
Keywords: corruption
Cc:
Guest type: Solaris
Host type: Windows

Description

My Solaris guest reports a corrupted file when reading from the cdrom when I have more than 1 cpu core assigned to the guest.

Details:

VirtualBox 3.1.2

Host: Windows 7 Home Premium x64, Intel i7

Guest: Solaris 10 5/08 no additional patches in 64-bit mode, host cdrom active

I am trying to copy or just extract the 10_x86_Recommended.tar.bz2 patch cluster from an EIS DVD-ROM. It's about an 800MB file.

# cd /var/tmp

# bunzip2 -c /cdrom/cdrom0/pathtofile/10_x86_Recommended.tar.bz2 | tar xvf -

With 2 cpu cores assigned, bunzip2 stops after about a dozen patches and reports that the file is corrupt.

With 1 cpu assigned, the process fully completes the extract without error.
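The archive can also be checked without extracting anything, because `bzip2 -t` only tests the integrity of the compressed stream. The following is a minimal sketch, assuming `bzip2` is on the PATH in the guest; the function name `test_archive_repeatedly` is hypothetical and not part of the original report. Backticks and `expr` are used so that it should also run under an older Bourne shell.

```shell
#!/bin/sh
# Run `bzip2 -t` on FILE several times and count how many passes fail.
# Usage: test_archive_repeatedly FILE [PASSES]   (PASSES defaults to 3)
test_archive_repeatedly() {
    file=$1
    passes=${2:-3}
    failures=0
    i=1
    while [ "$i" -le "$passes" ]; do
        if bzip2 -t "$file" 2>/dev/null; then
            echo "pass $i: ok"
        else
            echo "pass $i: corrupt"
            failures=`expr "$failures" + 1`
        fi
        i=`expr "$i" + 1`
    done
    echo "$failures of $passes passes failed"
    # Exit status 0 only when every pass succeeded.
    [ "$failures" -eq 0 ]
}
```

For the file in this report it would be called as `test_archive_repeatedly /cdrom/cdrom0/pathtofile/10_x86_Recommended.tar.bz2 5`, once with 1 cpu and once with 2 cpus assigned.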

Other settings: VT-x=on , NestedPaging=on , APIC=on , PAE=on

I tried several variations with these settings on and off, all worked unless more than one cpu was assigned. So I am just showing the simplest change that caused the issue.

Attached will be log files from a one cpu run, and a two cpu run.

Attachments

Sol10_508-2010-02-03-15-01-33_with_2cpus_FAILS.log (63.5 KB) - added by tgonz99 4 years ago.
log file from 2 cpu config that failed
Sol10_508-2010-02-03-14-47-19_with_1cpu_WORKS.log (62.3 KB) - added by tgonz99 4 years ago.
log file from 1 cpu config that worked

Change History

comment:1 Changed 4 years ago by tgonz99

Additional note:

If I copy the file from the cdrom to /var/tmp in the 2 cpu configuration, it fails to extract in both the 2 cpu config and the 1 cpu config.

If I copy the file from the cdrom to /var/tmp in the 1 cpu configuration, it completes the extract in both the 1 cpu config and in the 2 cpu config.

So the corruption seems to happen during a copy or read from the cdrom while in the 2 cpu config.
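Because the copy itself reports no error, a checksum comparison is one way to catch the corruption at copy time. This is a minimal sketch, not something from the original report: it assumes a reference CRC was recorded beforehand in a known-good configuration (for example the 1 cpu run), and the function names `file_crc` and `copy_and_verify` are hypothetical. It uses the POSIX `cksum` utility.

```shell
#!/bin/sh
# Print the CRC of a file (first field of the cksum output).
file_crc() {
    cksum "$1" | awk '{ print $1 }'
}

# Copy SRC to DST, then compare the copy against a reference CRC that was
# recorded in a known-good configuration (assumption: the 1 cpu run).
# Returns 0 when the copy matches, 1 otherwise.
copy_and_verify() {
    src=$1
    dst=$2
    expected=$3
    cp "$src" "$dst" || return 1
    actual=`file_crc "$dst"`
    if [ "$actual" = "$expected" ]; then
        echo "OK: $dst matches reference CRC $expected"
        return 0
    fi
    echo "MISMATCH: $dst has CRC $actual, expected $expected"
    return 1
}
```

The reference value would come from running `file_crc` on the file in the 1 cpu configuration; the 2 cpu configuration is then checked with `copy_and_verify` using that value.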

comment:2 Changed 4 years ago by sandervl73

  • Component changed from other to guest smp
  • Summary changed from corruption reading from cdrom with multiple cpus to corruption reading from cdrom with multiple cpus -> try with 3.1.4

Try the 3.1.4 beta discussed on the forum.

comment:3 Changed 4 years ago by tgonz99

I just installed 3.1.4_BETA2_r57282 released 4 Feb 2010. I observe the same behavior. With 1 cpu the file reads off the cdrom ok. With 2 cpus the file is corrupted a short ways into reading. No change in status. Do you need the logs from the 3.1.4 BETA2 runs, or any other information?

comment:4 Changed 4 years ago by frank

  • Summary changed from corruption reading from cdrom with multiple cpus -> try with 3.1.4 to corruption reading from cdrom with multiple cpus

Do you read this file from a physical drive or from a DVD image?

comment:5 Changed 4 years ago by tgonz99

From a DVD disc in the physical drive mapped in from the Windows 7 host. I will try a mapped iso file to see if that has the same issue or not.

comment:6 Changed 4 years ago by tgonz99

Okay, I'm still using 3.1.4_BETA2. Using an ISO file mapped into the guest, the results are the same. Works ok with 1 cpu, fails with 2 cpus assigned. Same error as before. Bunzip2 stops and states the file seems to be corrupt.

comment:7 Changed 4 years ago by frank

Interesting. So far I wasn't able to reproduce this issue with a medium-sized CDROM image (~700MB) on a Linux host with 2 guest CPUs. Maybe it is restricted to Windows hosts.

comment:8 Changed 4 years ago by tgonz99

Just to be clear, when I copied the file from the DVD to the HD, no error was produced; the file was silently corrupted during the copy. The error appeared only when I tried to uncompress it after the copy, or when uncompressing it on the fly as it was read from the DVD. Sorry, just want to be clear.

I have done more testing. I installed the same version of Solaris 10 5/08 on a real PC with a dual-core 64-bit CPU and tested for the same issue using an IDE DVD drive and a SATA DVD drive. I'm not sure how VirtualBox is emulating the controller. Neither controller produced an error. I wanted to test it on real HW with that version of Solaris as a comparison. I've never heard of this issue with Solaris on real HW.

Next, I tried Solaris 10 10/09 in VirtualBox and tested the same 1 CPU and 2 CPU configurations, and both worked without error. So, I then patched one of my Solaris 10 5/08 VirtualBox VMs to Dec 2009 patch levels, and it now extracts ok with 2 CPUs.

So it seems like a patch between 5/08 and 10/09 resolves this issue. I have looked through the patches and have not seen which patch may have corrected this. It's possible some other bug fix also fixed this issue and therefore it is not listed in any of the patches.

Again, I stumbled into this because I installed Solaris 10 5/08 and attempted to patch it configured with 2 cpus.

Is this still a valid VirtualBox bug? Should Solaris 10 5/08 work without issue as is? Or should we at least find out which patch resolved it?

I will check a few other releases of Solaris 10 to see if others are affected and what version it seems to be fixed in. I'm curious to see the results. It may take me a week or so.

Thanks for your time and patience.

comment:9 Changed 4 years ago by frank

Yes, I've checked the md5sum of the big file (with /dev/urandom content) ...

comment:10 Changed 4 years ago by tgonz99

After removing kernel patch 141445-09 (released Oct/13/2009) from a fully patched system, the problem returns.

Just for reference, the kernel patch before the one above is 139556-08 (released May/07/2009), and a system running with it installed does experience this issue. So it seems running any kernel patch prior to 141445-09 would produce this corruption, at least with my configuration. FYI, I have installed it and reproduced the problem several times now.

Maybe it's something unique with running on Windows 7 in 64-bit as the host, maybe something else or a combination of things.

Is there any more testing anyone can think of to help narrow the description of this bug? Is this still a valid bug to pursue? Just wondering, since I have not seen or heard of this happening with Solaris on real HW.

Well, at least there is a workaround for anyone experiencing this, use 1 cpu until you can apply kernel patch 141445-09 or later.
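Whether a guest already has the fixed kernel can be checked from its kernel version string. The sketch below is a hedged illustration only: it assumes that `uname -v` in the guest prints a string of the form `Generic_141445-09`, and that a higher kernel patch number means a later patch. The function name `kernel_has_fix` is hypothetical.

```shell
#!/bin/sh
# Succeeds (exit 0) when VERSION names kernel patch 141445-09 or a later one.
# Assumption: VERSION looks like the `uname -v` output "Generic_141445-09".
# Assumption: a higher patch number means a later kernel patch.
kernel_has_fix() {
    version=$1
    # Patch number, e.g. 141445.
    base=`echo "$version" | sed -n 's/^Generic_\([0-9][0-9]*\)-[0-9][0-9]*$/\1/p'`
    # Revision with leading zeros stripped, e.g. 09 -> 9.
    rev=`echo "$version" | sed -n 's/^Generic_[0-9][0-9]*-0*\([0-9][0-9]*\)$/\1/p'`
    if [ -z "$base" ] || [ -z "$rev" ]; then
        echo "unrecognised kernel version: $version" >&2
        return 2
    fi
    if [ "$base" -gt 141445 ]; then
        return 0
    fi
    [ "$base" -eq 141445 ] && [ "$rev" -ge 9 ]
}
```

In the guest it would be called with the output of `uname -v` as its single argument; a non-zero exit status suggests staying on 1 cpu until the kernel patch is applied.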

I'd be happy to do more testing, provide logs or whatever is needed... as time permits.

comment:11 Changed 4 years ago by tgonz99

New information. I thought I had tested all the possible configuration settings (I made a chart to keep track and everything), but it seems I missed one.

I turned off Nested Paging and the CDROM corruption is no longer present.

I wanted to test this on my iMac. I turned on all the same settings, but I noticed that Nested Paging did not activate. I saw this when I hovered the mouse over the chip icon in the lower right of the Vbox window. The iMac did not have the issue.

According to the VirtualBox Users manual, Nested Paging is only available on Intel CPUs starting with the i7 processor. My Windows 7 PC has an i7. My iMac only has a Core 2 Duo, so no Nested Paging for my iMac.

Anyway, it looks like the issue is now defined as: Corruption reading from the CDROM while having more than 1 CPU and Nested Paging available and active. And for the Solaris guest, having a kernel patch of 139556-08 or earlier. I'm not sure if other guests are affected.
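Either condition can be removed on the host from the command line. The following is a minimal sketch, assuming the `--cpus` and `--nestedpaging` options of `VBoxManage modifyvm` are available in the installed version and that the VM is powered off; the VM name `Sol10_508` is guessed from the log file names, and the wrapper function `apply_workaround` is hypothetical.

```shell
#!/bin/sh
# VBOXMANAGE can be overridden (for example with "echo") for a dry run.
VBOXMANAGE=${VBOXMANAGE:-VBoxManage}

# Apply one of the two workarounds from this ticket to a powered-off VM.
# Usage: apply_workaround VM_NAME single-cpu|no-nested-paging
apply_workaround() {
    vm=$1
    mode=$2
    case "$mode" in
        single-cpu)
            # Workaround 1: give the guest only one cpu.
            "$VBOXMANAGE" modifyvm "$vm" --cpus 1
            ;;
        no-nested-paging)
            # Workaround 2: keep several cpus but turn Nested Paging off.
            "$VBOXMANAGE" modifyvm "$vm" --nestedpaging off
            ;;
        *)
            echo "usage: apply_workaround VM_NAME single-cpu|no-nested-paging" >&2
            return 2
            ;;
    esac
}
```

For example, `apply_workaround Sol10_508 no-nested-paging` would keep both cpus while disabling Nested Paging. Setting `VBOXMANAGE=echo` first prints the command instead of running it.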

comment:12 Changed 4 years ago by tgonz99

One more thing. All the tests in the previous post were done on the full released version of 3.1.4, not the beta.

comment:13 Changed 4 years ago by sandervl73

That's an interesting observation. Will have to check this here. Thanks.

comment:14 Changed 4 years ago by sandervl73

I've just fixed a problem related to nested paging and guest SMP. If you're interested in a test build, then I can make one available for you.

comment:15 Changed 4 years ago by sandervl73

Actually no, that can't be the problem. You'd always get a guru meditation otherwise. Still have to try it myself then.

comment:16 Changed 4 years ago by tgonz99

VirtualBox 3.2.0 does not seem to have this issue with Nested Paging and multiple cpus. I tested my same Solaris 10 VM in VB 3.1.6 to show I still have the issue (I did), then upgraded to VB 3.2.0, and the corruption problem did not occur. I ran my test several times, all ok. The change log for 3.2.0 shows a few significant updates to the Nested Paging routines. If you can verify the same result, I think we can close this issue as being fixed in 3.2.0.

comment:17 Changed 4 years ago by sandervl73

  • Status changed from new to closed
  • Resolution set to fixed