
#21963 new defect

VirtualBox destroys qcow2 image when attaching it to a VM

Reported by:	bhaible	Owned by:
Component:	virtual disk	Version:	VirtualBox 6.1.38
Keywords:	qcow2	Cc:	bhaible
Guest type:	Linux	Host type:	Linux

Description

Some valid qcow2 images get destroyed when the user attaches them to a VM.

How to reproduce:

1) Get a particular qcow2 image:

$ wget http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-9/latest/evbarm-aarch64/binary/gzimg/arm64.img.gz
$ gunzip arm64.img
$ qemu-img convert -f raw -O qcow2 arm64.img arm64-cr.qcow2
$ qemu-img resize -f qcow2 arm64-cr.qcow2 10G

2) Create a backup of the image:

$ cp -p arm64-cr.qcow2 arm64-cr.qcow2.bak

3) Edit the settings of a Linux guest VM (I used an Ubuntu 23.10 VM) by adding a new SATA hard disk and picking arm64-cr.qcow2. When I do this, VirtualBox presents a failure dialog with the message

"QCow: Reading the L1 table for image '/home/bruno/Downloads/qemu-2102/arm64-cr.qcow2' failed (VERR_EOF)."

When I repeat the same step, it succeeds.

4) Start the VM, and inside the VM do

# fdisk /dev/sdb

It shows no partitions, although the original image had 2 partitions (a vfat and a ufs partition).

5) Attempt to terminate the VM through the VirtualBox GUI. It does not terminate. I had to manually kill the VirtualBoxVM process.

6) The resulting .qcow2 file is not valid any more:

$ qemu-img check arm64-cr.qcow2.bak
No errors were found on the image.
16081/163840 = 9.82% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 1054408704
$ qemu-img check arm64-cr.qcow2
ERROR found L1 entry with reserved bits set: 40000000080
ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x40000000000 size 0x10000
ERROR found L1 entry with reserved bits set: 471b00000080
ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x471b00000000 size 0x10000
ERROR found L1 entry with reserved bits set: 483b00000080
ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x483b00000000 size 0x10000
Leaked cluster 4 refcount=1 reference=0
Leaked cluster 5 refcount=1 reference=0
Leaked cluster 6 refcount=1 reference=0
Leaked cluster 7 refcount=1 reference=0
...
Leaked cluster 16085 refcount=1 reference=0
Leaked cluster 16086 refcount=1 reference=0
Leaked cluster 16087 refcount=1 reference=0

6 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

16084 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
Image end offset: 1054408704

7) Look at the remaining data in the .qcow2 file: Measured as the gzip-compressed size of the raw contents, it has been reduced from 302 MB to 10 MB.

$ qemu-img convert -f qcow2 -O raw arm64-cr.qcow2.bak arm64-cr.tmp.img; gzip -c -9 < arm64-cr.tmp.img | wc -c; rm -f arm64-cr.tmp.img
302447697
$ qemu-img convert -f qcow2 -O raw arm64-cr.qcow2 arm64-cr.tmp.img; gzip -c -9 < arm64-cr.tmp.img | wc -c; rm -f arm64-cr.tmp.img
10420385

Change History (1)

comment:1 by bhaible, 4 months ago

Hanna Czenczek has analyzed the issue by delving into the VirtualBox sources. <https://gitlab.com/qemu-project/qemu/-/issues/2102>

Her findings are:

Given the specification of the QCOW2 format at https://github.com/qemu/qemu/blob/master/docs/interop/qcow2.txt, it's a bug in VirtualBox, not a bug in qemu-img convert.
VirtualBox assumes that the size of the L1 table is a multiple of the cluster size, and tries to read full clusters. The specification does not guarantee this.
VirtualBox converts the L1 table entries from big endian to native endian only if the L1 table has been loaded successfully.
In the end, the L1 table entries in arm64-cr.qcow2 are in little-endian format, rather than in the big-endian format that the specification requires.
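
To see the last finding concretely, one can dump the L1 table entry by entry and byte-swap a reported value by hand. The following is only a sketch and not part of Hanna's analysis: the function names func_qcow2_l1table_dump and func_bswap64 are made up for illustration, and GNU od, fold and tac are assumed. It relies only on the header fields l1_size (offset 36) and l1_table_offset (offset 40) described in the specification.

```shell
# Hypothetical helper: print the active L1 table entries of a .qcow2 file,
# one 64-bit entry per line, interpreted as big-endian hex.
func_qcow2_l1table_dump ()
{
  l1table_entries=$(od -A n --endian=big -t d4 -j 36 -N 4 "$1" | tr -d ' ')
  l1table_offset=$(od -A n --endian=big -t d8 -j 40 -N 8 "$1" | tr -d ' ')
  # -v prints repeated lines instead of collapsing them into '*'.
  od -A n -v --endian=big -t x8 -w8 \
     -j "$l1table_offset" -N $((8 * l1table_entries)) "$1" | tr -d ' '
}
# Hypothetical helper: reverse the byte order of a 64-bit value given in hex
# (without a 0x prefix, at most 16 hex digits).
func_bswap64 ()
{
  # Left-pad to 16 hex digits, split into bytes, reverse the byte order.
  printf '%16s\n' "$1" | tr ' ' '0' | fold -w 2 | tac | tr -d '\n'
  echo
}
```

For example, func_bswap64 40000000080 prints 8000000000040000. Read as a big-endian L1 entry, that would be an L2 table at offset 0x40000 (cluster 4 with 64 KiB clusters, the first cluster that qemu-img check reports as leaked) with the top flag bit set. This is consistent with the entries having been written back in little-endian order.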

Here are a couple of shell functions that help understand the situation:

# Function to output the cluster size (in bytes) of a .qcow2 file.
func_qcow2_cluster_size ()
{
  cluster_bits=`od -A n --endian=big -t d4 -j 20 -N 4 "$1"`
  echo "2 ^ $cluster_bits" | bc
}
# Function to output the active L1 table size (in bytes) of a .qcow2 file.
func_qcow2_l1table_size ()
{
  l1table_entries=`od -A n --endian=big -t d4 -j 36 -N 4 "$1"`
  echo "8 * $l1table_entries" | bc
}
# Function to output the offset of the active L1 table of a .qcow2 file.
func_qcow2_l1table_offset ()
{
  od -A n --endian=big -t d8 -j 40 -N 8 "$1"
}
# Function to output the size (in bytes) of a regular file.
func_qcow2_file_size ()
{
  stat -c %s "$1"
}
# Function to output the distance (in bytes) from the start of the active
# L1 table to the end of the file.
func_qcow2_distance_from_l1table_to_end ()
{
  l1table_offset=`func_qcow2_l1table_offset "$1"`
  file_size=`func_qcow2_file_size "$1"`
  expr $file_size - $l1table_offset
}

This is what these functions tell about the original image:

$ func_qcow2_cluster_size arm64-cr.qcow2.bak
65536
$ func_qcow2_l1table_size arm64-cr.qcow2.bak
160
$ func_qcow2_distance_from_l1table_to_end arm64-cr.qcow2.bak
160

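The peculiar property of arm64-cr.qcow2.bak is that the value of func_qcow2_distance_from_l1table_to_end (160) is not a multiple of the cluster size (65536): the L1 table occupies only 160 bytes and sits at the very end of the file. An attempt to read a full cluster starting at the L1 table offset therefore runs past the end of the file, which is consistent with the VERR_EOF error in step 3.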
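
The condition can also be written as a small check. This is a sketch under an assumption: it models "reads full clusters" as rounding the L1 table size up to the next multiple of the cluster size, which may not match the VirtualBox code exactly. The function name func_qcow2_l1table_cluster_read_overruns is hypothetical; its three arguments are the values printed by the functions above.

```shell
# Hypothetical helper: tell whether reading the L1 table in whole clusters
# would run past the end of the file.
# Arguments (all in bytes): cluster size, L1 table size,
# distance from the start of the L1 table to the end of the file.
func_qcow2_l1table_cluster_read_overruns ()
{
  cluster_size=$1
  l1table_size=$2
  distance=$3
  # Assumption: the reader rounds the L1 table size up to full clusters.
  rounded=$(( (l1table_size + cluster_size - 1) / cluster_size * cluster_size ))
  if [ "$rounded" -gt "$distance" ]; then
    echo "overrun: would read $rounded bytes, only $distance available"
  else
    echo "ok"
  fi
}
```

With the values of the original image:

$ func_qcow2_l1table_cluster_read_overruns 65536 160 160
overrun: would read 65536 bytes, only 160 available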
