#959 closed defect (fixed)
problems with BIOS reads beyond LBA boundary?
Reported by: | Clemens Fruhwirth | Owned by: | |
---|---|---|---|
Component: | other | Version: | VirtualBox 1.5.2 |
Keywords: | bios read broken | Cc: | |
Guest type: | other | Host type: | other |
Description
I observe the following problems with VirtualBox. An explanation that fits this problem pattern is that BIOS-initiated reads beyond the LBA boundary of a disk are broken.
- Problem: My VMware based installation (on VMDK) does not boot properly. It fails to find NTLDR.
- Problem: Using the VMware installation from above (=uncleaned partition, partially filled) and doing a separate installation into the typical C:\WINDOWS.0 fails too (installation done under VirtualBox). Again the boot loader is not able to load NTLDR.
- No problem: Installing a fresh copy of Windows (under VBox) onto the same partition, but this time reformatting it. NTLDR works, Windows boots, world is fine.
- Problem: The image from the installation is modified and I dump all my old vmware stuff into C:\. Replacing the folders Program Files, Windows, etc.etc. Result after reboot: NTLDR found, boot menu, but launching Windows fails with STOP 0x0007b (or similar), the error that indicates that it can't access the boot device.
Here is an explanation that fits the above problem pattern: in real mode, reads of disk sectors beyond the LBA boundary (these go through a separate interface, INT 0x13 with AH=0x42, IIRC) are broken. The FAT/NTFS MBR uses the AH=0x42 interface for everything beyond sector 1024*255*63.
This fits all problem/no problems:
- my existing installation has an NTLDR beyond the LBA boundary sector: 1024*255*63. The loader tries to access NTLDR, loads some rubbish, signature verification fails. boot loader dies.
- the fresh windows installation on the half-filled disk dumps new data to disk beyond the LBA boundary. situation doesn't change. boot loader dies.
- a fresh installation to an empty disk fills up the disk but does not reach the LBA boundary (around 8GB). Hence, NTLDR is placed below this boundary, and all DLLs necessary for booting are placed below this boundary as well.
- modifying the fresh installation from above and putting DLLs beyond the LBA boundary gets the boot loader into trouble. NTLDR is unmodified, hence it loads, but after that no driver dlls can be accessed.
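The "around 8GB" figure follows from the boundary sector 1024*255*63 named above. A minimal sketch of the arithmetic, assuming 512-byte sectors (the sector size is not stated in this report):

```python
SECTOR_SIZE = 512  # assumption: bytes per sector, not stated in the report


def chs_capacity_sectors(cylinders: int, heads: int, sectors_per_track: int) -> int:
    """Number of sectors addressable with the given CHS geometry."""
    return cylinders * heads * sectors_per_track


boundary = chs_capacity_sectors(1024, 255, 63)
boundary_bytes = boundary * SECTOR_SIZE

print(boundary)                           # 16450560 sectors
print(f"{boundary_bytes / 10**9:.2f}")    # 8.42 (decimal GB)
print(f"{boundary_bytes / 2**30:.2f}")    # 7.84 (GiB)
```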
Side notes: VMware is able to boot in ALL of the 3 cases where VirtualBox isn't. For problem 4, I made sure to add the necessary drivers to the criticaldevice section of the Windows registry. I'm not sure if I did that correctly, but what really indicates a problem at a much lower level (BIOS level) is that in problem 3 I'm not only unable to boot into Windows, but also unable to boot into the recovery console. The recovery console refuses to load because HAL.dll is not found (as I said, HAL.dll is likely to reside beyond the LBA boundary).
My next test case is: format a plain NTFS disk, fill it up to 8GB, and do a fresh Win XP install. The result (if my thesis is correct) should be that NTLDR is not found and booting dies. (I will do that in about a week; I am filing this bug now so that the issue is known and this knowledge isn't lost if for some reason I forget to do it.)
Of course, if you have further insight you are free to provide it.
Attachments (3)
Change History (13)
comment:1 by , 17 years ago
comment:2 by , 17 years ago
Diffing working scenario 3 against non-working scenario 4 rules out an issue caused by CHS differences between VMware and VirtualBox. To be more precise, the very same VMDK setup(*) works when I use it for a fresh installation and fails after I fiddle with it afterwards. I even used VirtualBox with a secondary Windows installation to modify the filesystem to generate scenario 4. So, however VirtualBox interprets the disk geometry in the VMDK file, it is the same for scenarios 3 and 4. Hence, this error can't be attributed to a geometry difference.
VMware was only involved in generating the setup of the windows root-dir that I copied over to the fresh installation. But there are no traces of CHS geometry of the system disk in C:\WINDOWS (at least I hope so).
(*) My VMDK setup is fullDevice on plain LVM blockdev. Description file is attached.
comment:3 by , 17 years ago
Now this is really strange. The VMDK (in the geometry settings) claims to have at least 257939640 sectors, whereas the RW section says there are about 100000000 sectors. How do you create this VMDK? How big is the LVM volume? Are there any read/write errors in VBox.log?
The VMDK is at least inconsistent, and thus is the most probable source of all the trouble you get.
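The inconsistency described here can be checked mechanically by comparing the sector count implied by the geometry entries with the sum of the extent sizes. The sketch below is only an illustration: the sample descriptor is made up (it is not the attached file), its geometry values are chosen so that the product equals the 257939640 sectors quoted above, and the helper name `descriptor_sector_counts` is hypothetical.

```python
import re

# Hypothetical descriptor text, NOT the file attached to this ticket.
SAMPLE_DESCRIPTOR = '''
# Extent description
RW 100000000 FLAT "/dev/example-volume" 0

# The Disk Data Base
ddb.geometry.cylinders = "16056"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
'''


def descriptor_sector_counts(descriptor: str) -> tuple[int, int]:
    """Return (sectors implied by geometry, sectors covered by extents)."""
    geom: dict[str, int] = {}
    extent_sectors = 0
    for raw in descriptor.splitlines():
        line = raw.strip()
        m = re.match(r'ddb\.geometry\.(cylinders|heads|sectors)\s*=\s*"(\d+)"', line)
        if m:
            geom[m.group(1)] = int(m.group(2))
            continue
        # Extent lines start with an access mode followed by a sector count.
        m = re.match(r'(RW|RDONLY|NOACCESS)\s+(\d+)\s', line)
        if m:
            extent_sectors += int(m.group(2))
    geometry_sectors = geom["cylinders"] * geom["heads"] * geom["sectors"]
    return geometry_sectors, extent_sectors


geometry_sectors, extent_sectors = descriptor_sector_counts(SAMPLE_DESCRIPTOR)
if geometry_sectors > extent_sectors:
    print("geometry claims more sectors than the extents provide")
```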
comment:4 by , 17 years ago
as you advised on IRC, I regenerated the vmdk file (attached). the VBox.log file is also attached.
comment:5 by , 17 years ago
If you want to try instrumenting the BIOS, it's fairly easy: just add BX_INFO(...) calls where you want something to be logged. That function takes printf-style parameters, first the format string and then the parameters. The log lines (don't forget the line ending) get written to VBox.log.
The BIOS source is in src/VBox/Devices/PC/BIOS/rombios.c. For this to work you don't even need a debug build.
comment:6 by , 17 years ago
I just tried installing to a pre-formatted NTFS partition (filled with an 8.4 GB file) and, contrary to the prediction of my hypothesis above, it succeeded. that means it installs and boots correctly.
probably, I'll try to debug the behaviour of the MBR tomorrow, diffing it to the behaviour I get under vmware.
comment:7 by , 17 years ago
ok, here are my insights from debugging virtualbox and vmware in parallel:
yes, it is a CHS issue, and yes, it's related to LBA. The latter because the FAT32 boot loader (and the NTFS one) has a special and stupid logic: it uses LBA addressing for any sector beyond a self-calculated sector boundary, but before that boundary it uses broken CHS access via int13/ah=0x02 calls. The boundary is cyls*heads*sectors as returned by int13/ah=0x08. virtualbox returns its LCHS in this case.
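A minimal sketch of that decision, with hypothetical function and parameter names. Treating the boundary sector itself as "beyond" is an assumption; the comment does not say which side the comparison falls on.

```python
def choose_access(sector: int, cylinders: int, heads: int, sectors_per_track: int) -> str:
    """Pick the access method the boot code would use for an absolute sector.

    The geometry arguments stand for what int13/ah=0x08 reports.
    """
    boundary = cylinders * heads * sectors_per_track
    # Assumption: the boundary sector itself already counts as "beyond".
    return "lba" if sector >= boundary else "chs"


print(choose_access(63, 1024, 255, 63))          # chs
print(choose_access(20_000_000, 1024, 255, 63))  # lba
```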
the problem comes from the LCHS being different from the values stored in the BPB (BIOS parameter block). the BPB is embedded in the first sector of a FAT file system and tells the boot loader (for whatever reason) what the CHS geometry is. if this stored CHS geometry is different from the actual LCHS, the boot sector calculates nonsense and accesses the wrong sector.
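The effect can be reproduced on paper: split an absolute sector number into CHS with one geometry, then let the BIOS map that CHS triple back with another. The sketch below uses the standard CHS conversion formulas; the two geometries (255 and 240 heads, 63 sectors per track) are made-up values for illustration, not the ones from this setup.

```python
def lba_to_chs(lba: int, heads: int, spt: int) -> tuple[int, int, int]:
    """Split an absolute sector number into (cylinder, head, sector); sector is 1-based."""
    cylinder, rest = divmod(lba, heads * spt)
    head, sector0 = divmod(rest, spt)
    return cylinder, head, sector0 + 1


def chs_to_lba(cylinder: int, head: int, sector: int, heads: int, spt: int) -> int:
    """Map a CHS triple back to an absolute sector number."""
    return (cylinder * heads + head) * spt + (sector - 1)


def sector_actually_read(lba: int, bpb_geom: tuple[int, int], bios_geom: tuple[int, int]) -> int:
    """Sector the BIOS ends up reading when the boot code splits with BPB values."""
    c, h, s = lba_to_chs(lba, *bpb_geom)      # boot sector uses the BPB geometry
    return chs_to_lba(c, h, s, *bios_geom)    # BIOS interprets CHS with its LCHS


wanted = 100_000
print(sector_actually_read(wanted, (255, 63), (255, 63)))  # 100000, geometries agree
print(sector_actually_read(wanted, (255, 63), (240, 63)))  # 94330, wrong sector
```

Sectors in cylinder 0 still map correctly in this example, which would explain why damage only shows up for data placed further into the disk.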
look at the subroutine starting at 0x7ce0 in http://mirror.href.com/thestarman/asm/mbr/ntFAT32BR.htm. On entry, EAX = target sector, ECX = number of sectors to read, EBX = target memory. the first CMP instruction checks whether the sector to access is greater than the self-calculated boundary stored at [BP-08]; if it is, it jumps to 0x7D34. here is the broken calculation, which uses [BP+18] and [BP+1A] to break down the absolute sector number in EAX into CHS values. [BP+18] and [BP+1A] are the CHS geometry stored in the BPB.
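To see which geometry a given boot sector carries, the two BPB fields can be read directly from a dump of the sector. This sketch assumes BP points at the start of the loaded boot sector, so that [BP+18] and [BP+1A] line up with the standard BPB fields at offsets 0x18 (sectors per track) and 0x1A (number of heads); the function name is hypothetical.

```python
import struct


def bpb_geometry(boot_sector: bytes) -> tuple[int, int]:
    """Return (heads, sectors_per_track) as stored in the BPB of a FAT/NTFS boot sector."""
    if len(boot_sector) < 0x1C:
        raise ValueError("boot sector too short to contain the BPB geometry fields")
    # Two little-endian 16-bit values: sectors per track at 0x18, heads at 0x1A.
    sectors_per_track, heads = struct.unpack_from("<HH", boot_sector, 0x18)
    return heads, sectors_per_track


# Synthetic 512-byte sector with only the two geometry fields filled in.
sector = bytearray(512)
struct.pack_into("<HH", sector, 0x18, 63, 255)
print(bpb_geometry(bytes(sector)))  # (255, 63)
```

Comparing this pair with the LCHS that VBox.log reports should show directly whether the two disagree.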
comment:8 by , 17 years ago
(sorry, for double commenting, I intended to hit "Preview" instead of "Submit")
further, virtualbox heroically tries to guess the LCHS, but it guesses it from the partition table (DevATA.cpp:ataGuessDiskLCHS). that's a good start, but the failure comes from the BPB in my case, so guessing from the BPB would be better. however, as there might be four primary partitions on a disk, there might be four BPBs and four conflicting values. I'd say that guessing this might turn out to be a mess.
probably, making this configurable by hand might be a solution for the experienced user.
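For illustration, one common way to guess a geometry from the partition table is to look at the ending CHS address of a partition, on the convention that partitions end on the last head and last sector of a cylinder. The sketch below shows that heuristic only; it is not a transcription of ataGuessDiskLCHS, and the function name is hypothetical.

```python
PARTITION_TABLE_OFFSET = 0x1BE  # start of the four 16-byte entries in the MBR
ENTRY_SIZE = 16


def guess_lchs_from_mbr(mbr: bytes):
    """Guess (heads, sectors_per_track) from the first usable partition entry.

    Returns None when no entry yields a usable guess.
    """
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        entry = mbr[start:start + ENTRY_SIZE]
        if len(entry) < ENTRY_SIZE or entry[4] == 0:
            continue  # truncated input or unused entry (partition type 0)
        end_head = entry[5]             # 0-based head of the last sector
        end_sector = entry[6] & 0x3F    # low 6 bits; upper 2 bits belong to the cylinder
        if end_sector == 0:
            continue  # CHS sector numbers start at 1, so 0 is not usable
        return end_head + 1, end_sector
    return None


# Synthetic MBR: one NTFS-type entry ending at head 254, sector 63.
mbr = bytearray(512)
mbr[PARTITION_TABLE_OFFSET + 4] = 0x07
mbr[PARTITION_TABLE_OFFSET + 5] = 254
mbr[PARTITION_TABLE_OFFSET + 6] = 0xFF
print(guess_lchs_from_mbr(bytes(mbr)))  # (255, 63)
```

The same conflict noted above applies here too: each entry may suggest a different geometry, and none of them has to match what a BPB stores.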
my solution for the moment is to NOP-out the boundary check in the boot sector and always go for LBA addressing. this instantly fixes the problem.
quite interestingly, the whole boot sector is also embedded in the file containing the recovery console. hence it also contains a BPB with mismatching CHS values, and hence it failed to boot when I copied it over from a vmware installation.
comment:9 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Please reopen if this is still an issue with VirtualBox 2.2.0.
comment:10 by , 15 years ago
It is not a VirtualBox bug, so yes please close.
For reference: It is more like a bug in the Windows boot loader, which believes too passionately in the CHS values it once saw.
Read more here http://blog.clemens.endorphin.org/2007/12/removing-chs-based-access-from-windows_3170.html or get http://clemens.endorphin.org/killchs.c
Usually such bugs are caused by the disk geometry that VirtualBox detects/uses being different from what VMware uses. So it's normally not caused by LBA as such; it's due to a CHS mapping difference. VirtualBox has had issues like this before (mainly in the 1.4.x series), but I haven't received any CHS geometry issue reports for a long time.
To resolve this issue (it could be a remaining boundary case), please provide the VBox.log file of an unsuccessful boot. It contains the CHS values used by VirtualBox (there are two sets of them, so attach it as a whole instead of extracting the info yourself). Also I need the VMDK image description. Depending on what VMDK variant you're using, it's more or less hard to get at this: for fixed images, it's a separate text file (~600 bytes usually), and for sparse images it's embedded in the header. I don't need (at least not initially) the full VMDK image.