VirtualBox

Ticket #11344 (closed defect: fixed)

Opened 16 months ago

Last modified 14 months ago

Extending VDI size destroys XFS filesystem => fixed in svn

Reported by: Grimeton
Owned by:
Priority: critical
Component: virtual disk
Version: VirtualBox 4.2.6
Keywords: xfs,broken
Cc:
Guest type: other
Host type: Linux

Description

Hello,

After resizing a VDI image from 1TB to 1.3TB while the VM was powered off, the XFS filesystem inside was destroyed. I talked to Eric Sandeen from the XFS team, and in every test we ran the filesystem ended up destroyed.

We're testing on 4.2.4 and 4.2.6 on Linux (amd64) and OS X (10.8.x) hosts with this procedure:

  • Power off the VM
  • Create a virtual harddisk with 1TB space as VDI
  • Attach the virtual harddisk to a port on the sata controller
  • Power up the VM
  • Create an XFS filesystem on top of the drive (NO PARTITIONS)
  • Mount the filesystem and create some directories/files on it like this:

cd /mntpoint && for i in {1..30}; do mkdir $i; cd $i; for j in {a..z}; do touch $j; done; cd ..; done

  • Unmount the fs
  • Power down the machine
  • Change the VDI size like this: vboxmanage modifyhd /path/to/vdi --resize 1331000
  • Power the VM on again
  • Mount the fs again (no error yet!)
  • do a "ls /mountpoint"
  • see lots of errors
  • run xfs_repair -n and see lots of errors.

So far we can't say whether this is a VirtualBox issue, an XFS issue, or a result of the combination of the two. We tested with different image sizes and always ended up with a destroyed filesystem.

Eric will look into this and keep me updated on it.

For now, just a warning: DO NOT RESIZE a VirtualBox VDI image if you have XFS inside it.

Happy new year!

KR,

Grimeton

Change History

comment:1 Changed 16 months ago by aeichner

What happens if you create a partition before creating the XFS filesystem? I don't know what the XFS on-disk format looks like, but is it possible that the filesystem relies on the size of the block device to find some metadata? If you increase the size of the disk, VBox will not touch the data on it; it only increases the stored size of the disk and creates more space for the block allocation table.

comment:2 Changed 16 months ago by Grimeton

Eric dug a bit further, and this is what he wrote to me last night:

04:40 <sandeen> FWIW, I see it here too.

06:55 <sandeen> the first miscompare is at 0x4000000001

06:55 <sandeen> random? ;)

06:56 <sandeen> anyway, this has to be a vb problem I think, I'm not going to spend more time on it unless the vb guys need a hand

So it looks like the problem is VB related.
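For anyone who wants to locate a "first miscompare" like the one Eric mentions, here is a minimal sketch. It assumes you still have an untouched copy of the image (or device) to compare against; the function name `first_miscompare` is made up for illustration and is not something Eric used.

```shell
#!/bin/sh
# first_miscompare A B   (hypothetical helper, not from the ticket)
# Print the 1-based number (in hex) of the first byte that differs between
# A and B, or "identical" if cmp lists no differing byte. Files of different
# length are only compared over their common prefix.
first_miscompare() {
    # cmp -l prints one line per differing byte:
    #   <decimal byte number> <octal value in A> <octal value in B>
    # Only the first line is needed, so head stops the pipeline early.
    n=$(cmp -l "$1" "$2" 2>/dev/null | head -n 1 | awk '{print $1}')
    if [ -n "$n" ]; then
        printf '0x%x\n' "$n"
    else
        echo identical
    fi
}
```

Usage would be something like `first_miscompare /dev/sdc /dev/sdb`, with the pre-resize copy first and the resized disk second. Both arguments are only read, never written.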

I also ran some tests with a partition on the drive. The results are even more catastrophic as the kernel refuses to mount the filesystem at all. I also see a calltrace of the xfs module (already communicated to Eric).

Eric explained to me how the fs works. This is what he told me:

When the fs is created, it creates so-called allocation groups (AGs). Those AGs are laid out on the drive at a predefined spacing. The initial number of AGs is four. If you create a directory or a file, XFS puts the new object into the next AG on the counter. So it goes 0,1,2,3 -> back to 0,1,2,3 and so on.

The superblock at the beginning of the disk records the last block on the disk that is in use. So if the disk size is extended, XFS will see more space at the end that it can use.

Now the "funny" part:

Eric tells me that XFS sees empty space behind its last block on the device and that's it. There is nothing touched at all until one runs xfs_growfs.

You tell me - if one changes the image size of the VDI only empty space is added at the end ...

Seems like something else is going on, because after the resize (of the VDI image) XFS complains about missing information in the first blocks on the drive which you can see here:  http://sprunge.us/SLdi

So it looks like something has been overwritten there.

If you like I can attach the complete chat logs from last night. Maybe that sheds some more light on this.

KR,

Oliver

comment:3 Changed 16 months ago by Grimeton

A screenshot from last night that shows that only every fourth AG is affected:  http://i.imgur.com/elHV0.png

comment:4 Changed 16 months ago by sandeen

This is completely unrelated to XFS, please don't get off on the wrong track. :)

I did this:

Create a 1.00T .vdi disk, dynamically allocated, attach it, and boot the guest. In the guest, pattern the disk from 255G to 257G. I used this command, in the guest:

# xfs_io -c "pwrite 255g 2g" /dev/sdb

to do this, but any method to write a pattern to the disk would work. The above writes the pattern "0xcdcdcdcd" into that 2g range from 255g to 257g, as we can see:

# dd if=/dev/sdb bs=1G skip=255 count=2 | hexdump -C
00000000  cd cd cd cd cd cd cd cd  cd cd cd cd cd cd cd cd  |................|
*
80000000
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 149.839 s, 14.3 MB/s

I then powered off the guest, and resized the .vdi:

# VBoxManage modifyhd 1TDisk1.vdi --resize 1331200

Prior to this I also cloned that disk to have the original for comparison.
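A side note on the number passed to --resize: VBoxManage takes it in megabytes. Assuming it counts one megabyte as 1024 × 1024 bytes, 1331200 corresponds to 1300 GiB. The helper below is only an illustration of that arithmetic; the name `gib_to_resize_mb` is hypothetical.

```shell
#!/bin/sh
# gib_to_resize_mb GIB   (hypothetical helper)
# Print the --resize value in MB for a disk of GIB gibibytes,
# assuming 1 MB = 1024 * 1024 bytes.
gib_to_resize_mb() {
    echo $(( $1 * 1024 ))
}

gib_to_resize_mb 1300   # prints 1331200, the value used above
gib_to_resize_mb 1024   # prints 1048576, i.e. the original 1.00T disk
```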

I rebooted the guest and then looked at the contents of the resized disk:

# dd if=/dev/sdb bs=1G skip=255 count=2 | hexdump -C
00000000  cd cd cd cd cd cd cd cd  cd cd cd cd cd cd cd cd  |................|
*
00100000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00200000  cd cd cd cd cd cd cd cd  cd cd cd cd cd cd cd cd  |................|
*
80000000

2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 211.092 s, 10.2 MB/s

As you can see, a large swath of zeros has now appeared in that range as a result of the disk image resize. If I look at the pre-resize clone (attached to /dev/sdc now), I still see what I expect:

# dd if=/dev/sdc bs=1G skip=255 count=2 | hexdump -C
00000000  cd cd cd cd cd cd cd cd  cd cd cd cd cd cd cd cd  |................|
*
80000000
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 149.839 s, 14.3 MB/s

This seems to clearly be a problem with data on the disk image getting corrupted during a resize, regardless of what the format of that data is...
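The same pattern-and-check test can be sketched with nothing but dd, tr and wc, in case xfs_io is not at hand. This is a rough equivalent under the assumption that a plain 0xcd fill is all that is needed; the function names are made up, and `fill_cd` overwrites whatever is in the target range.

```shell
#!/bin/sh
MIB=1048576

# fill_cd FILE OFFSET_MIB LENGTH_MIB   (hypothetical helper, DESTRUCTIVE)
# Overwrite LENGTH_MIB MiB of FILE, starting at OFFSET_MIB MiB, with 0xcd bytes.
fill_cd() {
    dd if=/dev/zero bs="$MIB" count="$3" 2>/dev/null |
        LC_ALL=C tr '\000' '\315' |
        dd of="$1" bs="$MIB" seek="$2" conv=notrunc 2>/dev/null
}

# count_not_cd FILE OFFSET_MIB LENGTH_MIB   (hypothetical helper, read-only)
# Print how many bytes in that range are NOT 0xcd; 0 means the pattern is intact.
# Bytes past the end of FILE are not counted.
count_not_cd() {
    n=$(dd if="$1" bs="$MIB" skip="$2" count="$3" 2>/dev/null |
        LC_ALL=C tr -d '\315' | wc -c)
    echo $(( n ))
}
```

In the guest this would be `fill_cd /dev/sdb 261120 2048` before the resize (261120 MiB = 255G) and `count_not_cd /dev/sdb 261120 2048` afterwards. For the corrupted output shown above, where zeros run from 0x00100000 to 0x00200000, the count would be 1048576.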

Thanks, -Eric

comment:5 Changed 16 months ago by Grimeton

We should change the title to "destroys filesystem inside".

I guess other filesystems will show the same problem.

KR,

Oliver

comment:6 Changed 16 months ago by aeichner

  • Summary changed from Extending VDI size destroys XFS filesystem to Extending VDI size destroys XFS filesystem => fixed in svn

Reproduced it with the instructions above. The bug appears if more than one block needs to be relocated to create enough space for the block allocation table. This will be fixed in the next maintenance release. A workaround for now is to increase the size of the disk in smaller steps until the desired size is reached. Thanks for the report!
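The stepwise workaround could be scripted roughly as below. This is only a sketch: it prints the commands instead of running them, the function name `print_resize_steps` is hypothetical, and the step size is an arbitrary assumption, since this ticket does not say how small a step has to be to stay clear of the bug. Keep a copy of the image before trying any of it.

```shell
#!/bin/sh
# print_resize_steps IMAGE CURRENT_MB TARGET_MB STEP_MB   (hypothetical helper)
# Print (do not run) one VBoxManage command per intermediate size,
# growing by at most STEP_MB each time until TARGET_MB is reached.
print_resize_steps() {
    if [ "$4" -le 0 ]; then
        echo "STEP_MB must be positive" >&2
        return 1
    fi
    size=$2
    while [ "$size" -lt "$3" ]; do
        size=$(( size + $4 ))
        # Never overshoot the requested final size.
        if [ "$size" -gt "$3" ]; then
            size=$3
        fi
        echo "VBoxManage modifyhd $1 --resize $size"
    done
}

# Example with an assumed step of 102400 MB (100 GiB); prints three commands,
# ending with --resize 1331200.
print_resize_steps 1TDisk1.vdi 1048576 1331200 102400
```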

comment:7 Changed 16 months ago by sandeen

aeichner, thanks. Asking on behalf of the original reporter, is there any way to un-mangle an image impacted by the original bug?

comment:8 Changed 16 months ago by Grimeton

Thanks for asking Eric, but I already deleted the image and created one twice the size of the available disk space on the host, so I'll never have that problem again ;)

KR,

Oliver

comment:9 Changed 14 months ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Fix is part of VBox 4.2.8.
