[vbox-dev] VMDK inflation and grain alignment/padding

Fri Mar 18 10:00:00 GMT 2016

Hi all,

I can't find any evidence in the VMDK spec that there is a possibility of 
having a smaller grain anywhere, including at the end of the image. 
There is one value in the header which defines the grain size, which 
typically is 64K.

The spec says that the total capacity of an image "should" be a 
multiple of the grain size. Since it says "should" rather than "must", 
that implies there can be images where it isn't the case.
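
To make the arithmetic concrete, here is a rough sketch (not VirtualBox 
code; the function name grain_layout is made up for this mail). It assumes 
512-byte sectors and a grain size given in sectors, and applies the 
"power of two, greater than 4K" rule quoted further down in this thread.

```python
SECTOR_SIZE = 512  # bytes per sector (assumption stated above)


def grain_layout(capacity_sectors, grain_sectors=128):
    """Return (grain_count, tail_sectors) for an extent.

    tail_sectors is 0 when the capacity is a multiple of the grain size;
    otherwise it is how many sectors of the last grain the capacity covers.
    """
    # 8 sectors == 4K; the grain size has to be a larger power of two.
    if grain_sectors <= 8 or grain_sectors & (grain_sectors - 1):
        raise ValueError("grain size must be a power of two greater than 8 sectors")
    full, tail = divmod(capacity_sectors, grain_sectors)
    return full + (1 if tail else 0), tail


# A capacity of 10 full 64K grains plus 98 sectors (49K) in the last one.
count, tail = grain_layout(10 * 128 + 98)
missing_bytes = (128 - tail) * SECTOR_SIZE if tail else 0
```

With these made-up numbers the last grain covers 49K of the capacity and 
15K of it lies beyond the end of the disk, which matches the figures in 
the case discussed below.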

VirtualBox can handle this, but assumes that there still is a full last 
grain backing the end of the image. A grain is a grain. With 
streamOptimized images there are compressed grains, but again there's no 
hint anywhere that partial grains are allowed.
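
The strict reading of "a grain is a grain" can be sketched as follows. 
This is only an illustration, not the actual VirtualBox implementation: 
decompress_grain is a hypothetical helper, and Python's zlib module is 
used merely as a stand-in so there is something to decompress.

```python
import zlib

GRAIN_BYTES = 64 * 1024  # the typical grain size mentioned above


def decompress_grain(compressed, grain_bytes=GRAIN_BYTES):
    """Decompress one grain and insist on a full grain of data.

    Hypothetical helper for illustration; it rejects any grain whose
    decompressed length differs from the single grain size in the header.
    """
    data = zlib.decompress(compressed)
    if len(data) != grain_bytes:
        raise ValueError(
            f"grain decompressed to {len(data)} bytes, expected {grain_bytes}"
        )
    return data


# A full 64K grain is accepted; a 49K "partial grain" is rejected.
full_grain = zlib.compress(bytes(GRAIN_BYTES))
short_grain = zlib.compress(bytes(49 * 1024))
```

Passing full_grain returns 64K of data, while passing short_grain raises 
ValueError, since nothing in the spec says what the remaining 15K would be.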

The fact that qemu-img produces images which bend the "one grain 
size" rule does not convince me that these are to be considered valid 
images. Only if VMware software creates such images would I be 
sufficiently convinced that the spec is incomplete.

Klaus

On 23.02.2016 10:06, Michal Necasek wrote:
> On 2/18/2016 6:25 PM, Christian Svensson wrote:
>>
>> On Thu, Feb 18, 2016 at 4:35 PM, Michal Necasek
>> <michal.necasek at oracle.com> wrote:
>>
>>         According to my reading of it, qemu-img produced a rather suspect
>>      image. You have a supposedly 64K grain but after decompression, there is
>>      only 49K of data. That's obviously a problem because what do you do with
>>      the missing 15K? You could make something up, but what? Zeros? All bits
>>      set? Random noise? There's no good answer.
>>
>>
>> Well, do you need to make something up?
>>
>    If such a grain occurs in the middle of the disk image, obviously yes.
>
>>         If I understood it correctly, in your case the inconsistent grain
>>      occurs at the end of the disk. Does that mean that the size of the disk
>>      in grains is larger than the size in sectors?
>>
>>
>> I think so. I'll verify, but VMware showed some pretty funky numbers for
>> capacity.
>>
>    Would be great if you could find out how VMware actually interpreted
> the disk image.
>
>>         At the end of a disk it would be possible to guess "what the user
>>      wanted" but if such a grain occurred in the middle, how do you deal
>>      with it?
>>
>>
>> I'm not convinced that you need to. Isn't the image integrity on the
>> user, and if it is why does VBox care about doing arbitrary checks about
>> data integrity? For OVA we have manifest files to verify the integrity
>> for example.
>
>    You're confusing integrity and validity. The checksum verifies that
> the data were transmitted without error. It says absolutely nothing
> about the data being valid. If I create a checksummed OVA archive with a
> VMDK image that's filled with random noise, the checksum will verify
> that the garbage wasn't corrupted during transfer. It doesn't magically
> turn it into valid data.
>
>> Would simply decompressing all the grains and appending them to each
>> other be acceptable?
>>
>    See above. If decompression produces less data than the grain size,
> what do we do with the missing bits? It would only not be a problem if
> we could conclusively determine that the missing bits are not meant to
> be part of the disk image.
>
>    The problem we have here is that the VMDK spec says the disk capacity
> "should be a multiple of the grain size" and does not give any hint how
> to interpret an image where that is not the case.
>
>>         To be fair you probably asked qemu-img to do an impossible task. The
>>      grain size must be a power of two and greater than 4K, but your disk
>>      image apparently has a size that's not even a multiple of 2K. The VMDK
>>      spec also says that the capacity of the disk (extent) should be a
>>      multiple of the grain size. So you created a disk image that cannot
>>      very well be represented as a VMDK.
>>
>>
>> If this is true qemu-img should be made to complain, not create a
>> "broken" VMDK. The fact that VMware products seem to be able to parse
>> them would possibly imply a grandfather rule. I'm open to raising the
>> question with qemu, but if VBox can be made to handle these images I
>> think that makes more sense. "Be strict when generating output, be
>> liberal when accepting input" -- Somebody somewhere.
>>
>    That's a good rule of thumb. The problem we have here is how to
> determine if there is only one possible interpretation of the input data.
>
>    As for grandfathering in, VMware doesn't define the OVF/OVA format;
> DMTF does. If you could show that VMware, Xen, Red Hat, Oracle VM, etc.
> all interpret the input the same way and VirtualBox is the only odd one
> out, that would be a very convincing argument to change VirtualBox. If
> that's not the case, that would be a good reason for VirtualBox not to
> accept questionable input either.
>
>
>         Regards,
>            Michal