[vbox-dev] VMDK inflation and grain alignment/padding

Michal Necasek michal.necasek at oracle.com
Tue Feb 23 09:06:02 UTC 2016

On 2/18/2016 6:25 PM, Christian Svensson wrote:
> On Thu, Feb 18, 2016 at 4:35 PM, Michal Necasek
> <michal.necasek at oracle.com> wrote:
>        According to my reading of it, qemu-img produced a rather suspect
>     image. You have a supposedly 64K grain but after decompression, there is
>     only 49K of data. That's obviously a problem because what do you do with
>     the missing 15K? You could make something up, but what? Zeros? All bits
>     set? Random noise? There's no good answer.
> Well, do you need to make something up?
  If such a grain occurs in the middle of the disk image, obviously yes.

>        If I understood it correctly, in your case the inconsistent grain
>     occurs at the end of the disk. Does that mean that the size of the disk
>     in grains is larger than the size in sectors?
> I think so. I'll verify, but VMware showed some pretty funky numbers for
> capacity.
  Would be great if you could find out how VMware actually interpreted 
the disk image.

>        At the end of a disk it would be possible to guess "what the user
>     wanted" but if such a grain occurred in the middle, how do you deal
>     with it?
> I'm not convinced that you need to. Isn't image integrity the user's
> responsibility? And if it is, why does VBox perform arbitrary checks on
> data integrity? For OVA, for example, we have manifest files to verify
> integrity.
  You're confusing integrity and validity. The checksum verifies that 
the data were transmitted without error. It says absolutely nothing 
about the data being valid. If I create a checksummed OVA archive with a 
VMDK image that's filled with random noise, the checksum will verify 
that the garbage wasn't corrupted during transfer. It doesn't magically 
turn it into valid data.
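To make the integrity/validity distinction concrete, here is a minimal Python sketch. It assumes a manifest records a SHA-256 digest, and it uses the sparse VMDK magic bytes "KDMV" as a deliberately weak validity test; the helper names are illustrative only, not VirtualBox code. The checksum over garbage verifies perfectly, yet the garbage is still not a disk image.

```python
import hashlib

# Assumption: a hosted sparse VMDK extent starts with the magic number
# 0x564d444b, stored little-endian, i.e. the bytes b"KDMV".
SPARSE_VMDK_MAGIC = b"KDMV"


def digest(data: bytes) -> str:
    """SHA-256 digest, as a (hypothetical) manifest entry might record it."""
    return hashlib.sha256(data).hexdigest()


def integrity_ok(data: bytes, expected: str) -> bool:
    """Integrity: are these exactly the bytes that were checksummed?"""
    return digest(data) == expected


def looks_like_sparse_vmdk(data: bytes) -> bool:
    """A very weak validity check: does the data even start like a VMDK?"""
    return data[:4] == SPARSE_VMDK_MAGIC


# Deterministic filler standing in for an image "filled with random noise".
garbage = bytes((i * 37 + 11) % 256 for i in range(4096))
manifest_entry = digest(garbage)

print(integrity_ok(garbage, manifest_entry))  # True: not corrupted in transit
print(looks_like_sparse_vmdk(garbage))        # False: still not a valid image
```

A real validity check would go much further (header fields, grain tables, decompressed grain sizes), but even this trivial one catches what the checksum cannot.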

> Would simply decompressing all the grains and appending them to each
> other be acceptable?
  See above. If decompression produces less data than the grain size, 
what do we do with the missing bits? The gap would be harmless only if 
we could conclusively determine that the missing bits are not meant to 
be part of the disk image.

  The problem we have here is that the VMDK spec says the disk capacity 
"should be a multiple of the grain size" and does not give any hint how 
to interpret an image where that is not the case.
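One way to make "conclusively determine" precise: a short grain is harmless only if every missing byte lies beyond the disk capacity, which can only happen in the last grain. The Python sketch below is a hypothetical reader policy (the function and exception names are made up for illustration; this is not how VirtualBox or VMware actually behave). It zero-pads only the tail past the end of the disk and refuses everything else.

```python
SECTOR = 512  # VMDK grain sizes and capacities are counted in 512-byte sectors


class AmbiguousGrain(ValueError):
    """Raised when there is no single correct way to interpret a grain."""


def expand_grain(data: bytes, grain_index: int, grain_sectors: int,
                 capacity_sectors: int) -> bytes:
    """Return a full-size grain, or raise if padding would invent disk data.

    Hypothetical policy: bytes past the end of the disk may be zero-filled,
    because no guest-visible sector depends on them; anything else is an error.
    """
    grain_bytes = grain_sectors * SECTOR
    first_sector = grain_index * grain_sectors
    if first_sector >= capacity_sectors:
        raise AmbiguousGrain(f"grain {grain_index} lies beyond the disk capacity")
    if len(data) > grain_bytes:
        raise AmbiguousGrain(f"grain {grain_index} decompresses to too much data")
    # Bytes of this grain that fall inside the disk and therefore must exist.
    required = min(grain_bytes, (capacity_sectors - first_sector) * SECTOR)
    if len(data) < required:
        raise AmbiguousGrain(
            f"grain {grain_index}: {len(data)} bytes, {required} needed")
    return data + bytes(grain_bytes - len(data))


# 64K grains (128 sectors); the disk ends 49K (98 sectors) into grain 2.
grain = 128
capacity = 2 * grain + 98
short = bytes(49 * 1024)

print(len(expand_grain(short, 2, grain, capacity)))  # 65536: padded tail only
try:
    expand_grain(short, 1, grain, capacity)  # same short grain mid-disk
except AmbiguousGrain as err:
    print(err)  # grain 1: 50176 bytes, 65536 needed
```

Refusing the mid-disk case rather than zero-filling it is the point: there, zeros would be a guess about data the guest can actually see.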

>        To be fair you probably asked qemu-img to do an impossible task. The
>     grain size must be a power of two and greater than 4K, but your disk
>     image apparently has a size that's not even a multiple of 2K. The VMDK
>     spec also says that the capacity of the disk (extent) should be a
>     multiple of the grain size. So you created a disk image that cannot
>     very well be represented as a VMDK.
> If this is true, qemu-img should be made to complain, not create a
> "broken" VMDK. The fact that VMware products seem to be able to parse
> them might imply a grandfather rule. I'm open to raising the
> question with qemu, but if VBox can be made to handle these images I
> think that makes more sense. "Be strict when generating output, be
> liberal when accepting input" -- Somebody somewhere.
  That's a good rule of thumb. The problem we have here is how to 
determine if there is only one possible interpretation of the input data.
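On the "strict when generating output" side, the constraints discussed in this thread are cheap to check before writing an image. A minimal Python sketch, assuming sizes in bytes and grain sizes in 512-byte sectors; `vmdk_geometry_problems` is a hypothetical helper, not an actual qemu-img or VirtualBox function:

```python
SECTOR = 512  # bytes per sector


def vmdk_geometry_problems(size_bytes: int, grain_sectors: int) -> list[str]:
    """List reasons a disk of this size cannot be cleanly stored as a VMDK."""
    problems = []
    # Grain size: a power of two greater than 8 sectors (4K).
    if grain_sectors <= 8 or grain_sectors & (grain_sectors - 1):
        problems.append("grain size is not a power of two greater than 8 sectors")
    if size_bytes % SECTOR:
        problems.append("size is not a whole number of 512-byte sectors")
    elif grain_sectors > 0 and (size_bytes // SECTOR) % grain_sectors:
        # The spec says the capacity "should be a multiple of the grain size".
        problems.append("capacity is not a multiple of the grain size")
    return problems


print(vmdk_geometry_problems(10 * 1024 * 1024, 128))  # []
print(vmdk_geometry_problems(10 * 1024 * 1024 + 49 * 1024, 128))
# ['capacity is not a multiple of the grain size']
```

Returning a list rather than raising on the first problem lets a tool report everything at once; a strict generator could simply refuse to write the image whenever the list is non-empty, which is the complaint suggested above for qemu-img.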

  As for grandfathering in, VMware doesn't define the OVF/OVA format; 
DMTF does. If you could show that VMware, Xen, Red Hat, Oracle VM, etc. 
all interpret the input the same way and VirtualBox is the only odd one 
out, that would be a very convincing argument to change VirtualBox. If 
that's not the case, that would be a good reason for VirtualBox not to 
accept questionable input either.

