VirtualBox

Ticket #15831 (closed defect: fixed)

Opened 9 months ago

Last modified 7 months ago

VM state corrupted because of failed snapshot deletion

Reported by: tim43263246
Owned by:
Priority: major
Component: other
Version: VirtualBox 5.1.4
Keywords:
Cc:
Guest type: all
Host type: Linux

Description

Hi,

I tried to delete a snapshot while the VM was switched off, but then I couldn't start the VM anymore because it says that some file is missing. When I looked into dmesg I saw:

[362393.820323] DeleteSnap[29354]: segfault at 70 ip 000000000070a181 sp 00007fe4b6837940 error 4 in VBoxSVC[400000+489000]

So it looks like the process responsible for deleting a snapshot crashed. Could you please fix the problem that a crash of this process leaves the VM in a corrupt state where it can't be started anymore, and tell me how to fix my corrupt VM now?

Thanks

Attachments

strace-output (223.8 KB) - added by rowland 9 months ago.
strace output following all child processes when VBoxSVC segfaults deleting a snapshot

Change History

comment:1 Changed 9 months ago by tim43263246

OK, I was now able to fix the VM, but not without losing all snapshots ;( You should really fix that bug.

comment:2 Changed 9 months ago by rowland

I had the same problem. I posted to the forum thread "Discuss the 5.1.4 release". What I wrote follows. I am commenting on this ticket instead of opening a new one for the same problem.

I had a problem with snapshots in 5.1.4 that was not present in 5.1.2. I'm running VirtualBox 5.1.4 on Ubuntu 16.04 LTS. I have one VM that has 5 disks. If I try to remove a snapshot on that VM, the UI gets stuck saying it's deleting the snapshot, but it never finishes. If I close and restart VirtualBox, it claims a disk is missing. The snapshot was removed for the first disk, but not the other 4. The VM appears to still have the snapshot, but it can't be removed. I see the following in dmesg output on my system:

Aug 19 16:56:10 ubuntu-oryx kernel: [14500.247312] DeleteSnap[21019]: segfault at 31 ip 000000000053574c sp 00007f00ce01d950 error 4 in VBoxSVC[400000+49c000]
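For anyone trying to read the dmesg line above, here is a minimal sketch (not part of the original report) that splits it into its fields. It assumes the usual x86 Linux "segfault at ... ip ... sp ... error ... in module[base+size]" layout with hexadecimal values, and it interprets the error value using the x86 page-fault error code bits (bit 0: page present, bit 1: write access, bit 2: user mode). The function name `decode_segfault` and the dictionary keys are made up for illustration.

```python
import re

# Layout of the kernel's x86 segfault message as quoted in this ticket.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<module>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the decoded fields of a segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "thread": m["comm"],                  # thread name, e.g. DeleteSnap
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),  # address that was accessed
        "module": m["module"],                # mapping containing the instruction
        "module_offset": ip - base,           # instruction offset inside the mapping
        "page_present": bool(err & 1),        # bit 0 of the page-fault error code
        "access": "write" if err & 2 else "read",  # bit 1
        "user_mode": bool(err & 4),           # bit 2
    }


line = ("Aug 19 16:56:10 ubuntu-oryx kernel: [14500.247312] DeleteSnap[21019]: "
        "segfault at 31 ip 000000000053574c sp 00007f00ce01d950 error 4 "
        "in VBoxSVC[400000+49c000]")
info = decode_segfault(line)
print(hex(info["module_offset"]), info["access"], hex(info["fault_address"]))
```

For this line the sketch reports a user-mode read of a non-present page at address 0x31, from an instruction at offset 0x13574c inside VBoxSVC. A fault address that small is often a sign of a member access through a null or near-null pointer, but that is only a guess without debugging symbols.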

I completely removed and reinstalled VirtualBox 5.1.4, but that didn't solve the problem. I got the VM back by manually editing its configuration file, then adding the virtual disks back. The only way I could get snapshots working for this VM was to downgrade to VirtualBox 5.1.2. That version has no problem with snapshots on this VM. I can reproduce this in VirtualBox 5.1.4 by just taking a snapshot and then deleting that same snapshot (the only snapshot) immediately.

The 5.1.4 version didn't have this problem with another VM that only has one virtual disk however. VirtualBox 5.1.2 works all the time.

All I have to do to reproduce this on Ubuntu 16.04 LTS is to create a new VM with two virtual disks, snapshot them, then delete the snapshot. I attached gdb to the VBoxSVC process and saw this when it crashed (I know this probably isn't very helpful):

[rowland@ubuntu-nuc ~]$ sudo gdb /usr/lib/virtualbox/VBoxSVC 10043
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/virtualbox/VBoxSVC...(no debugging symbols found)...done.
Attaching to program: /usr/lib/virtualbox/VBoxSVC, process 10043
[New LWP 10045]
[New LWP 10046]
[New LWP 10047]
[New LWP 10048]
[New LWP 10049]
[New LWP 10068]
[New LWP 10069]
[New LWP 10096]
[New LWP 10097]
[New LWP 10100]
[New LWP 10103]
[New LWP 10107]
[New LWP 10181]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f511c426d13 in select () at ../sysdeps/unix/syscall-template.S:84
84	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) cont
Continuing.
[New Thread 0x7f510b789700 (LWP 10313)]

Thread 15 "DeleteSnap" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f510b789700 (LWP 10313)]
0x000000000053574c in ?? ()
(gdb) bt
#0  0x000000000053574c in ?? ()
#1  0x0000000000535939 in ?? ()
#2  0x0000000000535f68 in ?? ()
#3  0x0000000000535fa1 in ?? ()
#4  0x0000000000513c43 in ?? ()
#5  0x00000000005593f2 in ?? ()
#6  0x000000000055eaf4 in ?? ()
#7  0x00000000004b057a in ?? ()
#8  0x00007f511d4bf5ec in ?? () from /usr/lib/virtualbox/VBoxRT.so
#9  0x00007f511d548e7b in ?? () from /usr/lib/virtualbox/VBoxRT.so
#10 0x00007f511d8296fa in start_thread (arg=0x7f510b789700)
    at pthread_create.c:333
#11 0x00007f511c430b5d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) 

I repeated the experiment and collected the strace output on the same process (following child processes, etc.) in the attachment. Again, I have no idea how useful that is. This is a pretty big problem for my work (though that is definitely not a complaint - VirtualBox is both free and awesome). I just hope to help make it better somehow. Also, I did try the latest testing build as well. It has the same problem. Thanks!

Changed 9 months ago by rowland

strace output following all child processes when VBoxSVC segfaults deleting a snapshot

comment:3 Changed 9 months ago by Papolytic

I see this bug isn't owned by anyone. Is there another similar bug number (I can't find it via search) that I can look at to see if anyone is working on it?

As far as other functionality goes, things seem to be going well in the latest 5.1.5 test builds, such as build 110598. The problem for me, as noted above, is that I can't reliably delete snapshots (well, not all of them) on any guest VM that has 2 or more .vdi disks. I'm running a Win 7 Pro 64 host. Guests are 1) Win 10 64, and 2) Ubuntu 16.04 64.

I tried a vboxmanage command for deleting the snapshots and this did work on one snapshot, but it failed on the next, so it apparently isn't a reliable workaround either.

Thanks.

Last edited 8 months ago by Papolytic

comment:4 Changed 8 months ago by Jim Carroll

I have the same problem on Ubuntu 15.10 using both the .deb install as well as the "All Platforms" script.

comment:5 Changed 8 months ago by Papolytic

Confirming the same problem on 5.1.6. Is there a recommended strace-like utility for Win 7 64 that would help with debugging? Thanks.

comment:6 Changed 8 months ago by brobertson

This problem appears to be a serious regression introduced in 5.1.4 and persisting in 5.1.6. I can't reproduce it on 5.1.2 (and have reverted to that release to get around the problem).

It also occurs on Windows 8.1 Pro (up-to-date with all Microsoft updates as of this update) hosts. It is 100% repeatable on my PC but I don't have another machine I can test it on right now.

As noted above, it only occurs when there are two or more virtual HDDs.

I've only seen it when deleting the first snapshot on the machine - for example if Snap1, Snap2, and Snap3 are taken in that order, Snap2 and Snap3 can be deleted w/o problems. A workaround therefore MIGHT be to create a snapshot immediately after creating a VM and then never delete that snapshot.

The problem doesn't even require starting the VM (although it still happens in more realistic scenarios where the VM is started/stopped). The guest OS does not seem to matter (I've seen it with Ubuntu 16.04 "Live CD mode", Ubuntu 16.04 installed onto the disks, and Knoppix 7.6.1). In fact, it isn't even necessary to have a bootable system -- the simple reproducible case below exploits this.

When VBoxSVC.exe crashes while deleting the snapshot, the .vbox file is left unchanged and still reflects the snapshot. According to 'showmediuminfo', the base vdi and the snapshot vdi for the first disk have the same relationship after the crash as before -- but the snapshot vdi file's state is 'inaccessible' and the file is, in fact, missing on disk. According to 'showmediuminfo', the base vdi and the snapshot vdi for the second disk have the same relationship after the crash and both the base and snapshot vdi file still exist.
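The mismatch described above (the .vbox file still references a snapshot image that is gone from disk) can be checked without starting VirtualBox. The following is a rough sketch only. It assumes that the .vbox file is XML in which every disk image appears as a `HardDisk` element with a `location` attribute, and that relative locations are relative to the folder containing the .vbox file. Verify that against your own file before relying on it. The function name `find_missing_disks` is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def find_missing_disks(vbox_path):
    """Return disk image paths referenced by a .vbox file that do not exist on disk."""
    base_dir = os.path.dirname(os.path.abspath(vbox_path))
    missing = []
    for elem in ET.parse(vbox_path).iter():
        # Drop any XML namespace prefix: "{namespace}HardDisk" -> "HardDisk".
        tag = elem.tag.rsplit("}", 1)[-1]
        location = elem.get("location")
        if tag != "HardDisk" or location is None:
            continue
        # Assumption: relative locations are relative to the .vbox folder.
        path = location if os.path.isabs(location) else os.path.join(base_dir, location)
        if not os.path.exists(path):
            missing.append(path)
    return missing
```

Calling `find_missing_disks` with the path to the VM's .vbox file returns a list of referenced images that are absent. On a VM hit by this bug one would expect it to contain the snapshot image of the first disk, matching what 'showmediuminfo' reports as 'inaccessible'.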

A representative EventLog event for the crash is:

Faulting application name: VBoxSVC.exe, version: 5.1.6.10634, time stamp: 0x57d6d545
Faulting module name: VBoxSVC.exe, version: 5.1.6.10634, time stamp: 0x57d6d545
Exception code: 0xc0000005
Fault offset: 0x00000000000d10b0
Faulting process id: 0x23a0
Faulting application start time: 0x01d211488790ac06
Faulting application path: c:\Program Files\Oracle\VirtualBox\VBoxSVC.exe
Faulting module path: c:\Program Files\Oracle\VirtualBox\VBoxSVC.exe
Report Id: c58778b0-7d3b-11e6-8306-74d4351740fe
Faulting package full name: 
Faulting package-relative application ID: 

The following distilled sequence of commands creates the problem every time on my PC on 5.1.4 & 5.1.6 but not on 5.1.2 -- the last command causes the VBoxSVC crash:

vboxmanage createvm --name TestSnap --basefolder J:\Virtual_Machines\VirtualBox --ostype Linux_64 --register
vboxmanage modifyvm TestSnap --memory 2048 --acpi on --ioapic on --mouse usbtablet
vboxmanage storagectl TestSnap --add sata --name SATA
vboxmanage createhd disk --size 1000 --filename J:\Virtual_Machines\VirtualBox\TestSnap\disk-1
vboxmanage storageattach TestSnap --storagectl SATA --type hdd --port 0 --device 0 --medium J:\Virtual_Machines\VirtualBox\TestSnap\disk-1.vdi
vboxmanage createhd disk --size 1000 --filename J:\Virtual_Machines\VirtualBox\TestSnap\disk-2
vboxmanage storageattach TestSnap --storagectl SATA --type hdd --port 1 --device 0 --medium J:\Virtual_Machines\VirtualBox\TestSnap\disk-2.vdi
vboxmanage snapshot TestSnap take Snap01
vboxmanage snapshot TestSnap delete Snap01
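Since the crash seems to depend on the number of virtual disks, it may help to generate the same command sequence for a variable disk count. The sketch below only builds the command strings from the sequence above and does not run anything. The function name `repro_commands` and its parameters are made up for illustration, and the default paths are the ones from this comment.

```python
def repro_commands(vm="TestSnap", base=r"J:\Virtual_Machines\VirtualBox", disks=2):
    """Build the reproduction command list from this comment for `disks` virtual disks."""
    vm_dir = base + "\\" + vm
    cmds = [
        f"vboxmanage createvm --name {vm} --basefolder {base} --ostype Linux_64 --register",
        f"vboxmanage modifyvm {vm} --memory 2048 --acpi on --ioapic on --mouse usbtablet",
        f"vboxmanage storagectl {vm} --add sata --name SATA",
    ]
    for i in range(1, disks + 1):
        disk = f"{vm_dir}\\disk-{i}"
        # One createhd/storageattach pair per disk, on consecutive SATA ports.
        cmds.append(f"vboxmanage createhd disk --size 1000 --filename {disk}")
        cmds.append(
            f"vboxmanage storageattach {vm} --storagectl SATA --type hdd "
            f"--port {i - 1} --device 0 --medium {disk}.vdi"
        )
    # Take a single snapshot and delete it again; the delete is the step that crashes.
    cmds.append(f"vboxmanage snapshot {vm} take Snap01")
    cmds.append(f"vboxmanage snapshot {vm} delete Snap01")
    return cmds


for cmd in repro_commands(disks=2):
    print(cmd)
```

With `disks=2` this prints the nine commands listed above. With `disks=1` it gives the single-disk variant, which, according to the reports in this ticket, should not trigger the crash.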

comment:7 Changed 8 months ago by Papolytic

I was just reading your line about how you only see it when deleting the first snapshot. I actually have the problem every time on the first snapshot, and sometimes on others. Recently I was able to delete a couple of 2nd and 3rd snapshots using vboxmanage, but unless you can delete them all, there is no order-based workaround for me.

I've also done this on non-setup systems just using a livecd along with any type of two virtual disks. I haven't seen it at all on a single vdi vm, but anytime I create two, whether or not I think I've used them, I get the hang + crash + corruption.

I'm a bit surprised that this doesn't seem to be on anyone's radar. I suppose I'll roll back to 5.0.12 (or whichever one I used to use that had the least problems) and use that.

comment:8 Changed 8 months ago by Organic_Marble

I have this same problem, and it is a serious one. My main use case for VB is to try changes and revert them. All my VMs have multiple disks, so I cannot confirm if it works OK with a single disk system.

comment:9 Changed 8 months ago by leetoo

I also have the same problem. Documented in the forum: https://forums.virtualbox.org/viewtopic.php?f=6&t=79809.

VM VirtualBox Manager GUI v5.1.4 and v5.1.6 crash when deleting snapshot from a Win7 guest with 3 HDs on SATA Controller.

Deleting snapshots for the same VM was working OK under VBox 5.1.2, so I went back to v5.1.2.

Host: Win7Pro 64bit

Guest: Win7Pro 64bit (3 HDs on SATA Controller)

Last edited 8 months ago by leetoo

comment:10 Changed 8 months ago by Draenan

Also experiencing this problem.

Host: macOS Sierra 10.12, VirtualBox 5.1.6

Guest: FreeBSD 64-bit

Two vDisks on SATA, one snapshot. Attempting to delete the snapshot results in some work, then the progress bar disappears and the first vDisk is listed as "Differencing, Inaccessible" in the VirtualBox Manager. The disk is also listed as "{UUID string}.vdi" instead of its actual name "vDisk1.vdi". It appears to be the snapshot name? In the "Snapshots" folder there is only one entry, which appears to belong to the second disk.

comment:11 Changed 8 months ago by Vorg

The same situation: 2 disks, and after snapshot deletion the whole VM was corrupted and had to be deleted. The VM's VDIs were in an inconsistent state and had to be deleted as well. Two days of VM settings are gone :-(

Host: Win7 64 Pro

Guest: Win7 64 Pro, SATA, 2 HD, 1 Optical

comment:12 Changed 8 months ago by nim

I tested it myself and it's still broken in the 5.1.7-111038 test build. Maybe you guys are too busy with other critical issues and did not have time to fix this small problem.

comment:13 Changed 8 months ago by Papolytic

I'm hoping that Oracle Vbox workers aren't belittling "no snapshot deletion with > 1 VDI" as unimportant. I never hear anything about progress on this issue.

I'm very willing to be patient about problems with open software, but I don't want this to fall off the radar.

Thanks

comment:14 Changed 7 months ago by Red Viking

I encounter this problem, too.

Host: Windows 10, VirtualBox 5.1.6

Guest: Lubuntu 16.04.1

My virtual machine also has two VDIs.

comment:15 Changed 7 months ago by klaus

Don't worry, snapshot issues (like everything which puts the user's data at risk) will not fall off the radar.

Currently the testbuild upload is running... check https://www.virtualbox.org/wiki/Testbuilds - any 5.1 builds with revision 111231 or later should be working again.

This particular issue was a 5.1.4 regression caused by a tiny behavior change as part of a cleanup, converting to one code base for task management.

Last edited 7 months ago by klaus

comment:16 Changed 7 months ago by Papolytic

@comment:15, Klaus: Thanks for the heads-up on 111231. I'll give that a try and report back later today.

Cheers

Edit: Tests with build 111231:

Host: Win 7 SP1 Pro 64

Guest 1: 4-VDI Ubuntu 16.04 LTS, up to date, 64-bit

Guest 2: 3-VDI Win 10 (most recent fast-ring build), up to date, 64-bit

Result: I deleted 7 snapshots on the 4-VDI Ubuntu guest and 3 on the Win 10 guest.

No problems. Thanks for the info and good work to the team! Many thanks.

Edit 2: Tested with build 111271 this morning and all seems well with snapshots and other issues.

Last edited 7 months ago by Papolytic

comment:17 Changed 7 months ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Fix is part of VBox 5.1.8. Please open separate tickets for unrelated issues.
