#20480 reopened defect

SSM: Giving up: Too many passes! (1048576)

Reported by:	AO	Owned by:
Component:	other	Version:	VirtualBox 6.1.10
Keywords:	Snapshot	Cc:	j@…
Guest type:	Windows	Host type:	Linux

Description

We use VirtualBox to make crash-consistent backups of VMs. The script basically does the following:

1. Take snapshot
2. Clone snapshot to backup folder
3. Delete snapshot
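
The three steps above can be sketched as a command sequence. This is only an illustration: the real commands are in the attached script-commands.txt, which is not reproduced here. The function name `backup_commands`, the clone naming scheme, and the example VM name and paths are assumptions. The `snapshot ... take --live`, `clonevm --snapshot`, and `snapshot ... delete` subcommands are standard VBoxManage subcommands.

```python
from typing import List


def backup_commands(vm: str, snapshot: str, backup_dir: str,
                    live: bool = True) -> List[List[str]]:
    """Build the take / clone / delete command lines for one backup run.

    Hypothetical helper: the argument lists are returned, not executed,
    so they can be logged or passed to subprocess.run() by the caller.
    """
    take = ["VBoxManage", "snapshot", vm, "take", snapshot]
    if live:
        # --live keeps the guest running while its memory is being saved.
        take.append("--live")
    clone = [
        "VBoxManage", "clonevm", vm,
        "--snapshot", snapshot,
        "--name", f"{vm}-{snapshot}",  # assumed naming scheme for the clone
        "--basefolder", backup_dir,
    ]
    delete = ["VBoxManage", "snapshot", vm, "delete", snapshot]
    return [take, clone, delete]


if __name__ == "__main__":
    for cmd in backup_commands("db-vm", "nightly", "/mnt/backup"):
        print(" ".join(cmd))
```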

In some cases (we believe it might be 'busy Windows servers running databases') the backup takes longer than normal. The next day it will take even longer, until after a few days the whole snapshot-taking will fail:

VBoxManage: error: Failed to take snapshot
VBoxManage: error: Failed to save the machine state to '/mnt/path/to/file.sav' (VERR_SSM_TOO_MANY_PASSES)
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component ConsoleWrap, interface IConsole
VBoxManage: error: Context: "RTEXITCODE handleSnapshot(HandlerArg*)" at line 483 of file VBoxManageSnapshot.cpp

An excerpt from the log file shows the problem building up over several days. Once the problem occurs (the snapshot fails and keeps failing), it does not go away until we reboot the host:

20:58:09.224319 SSM: Step 1 completed after pass 77.
44:58:12.823156 SSM: Step 1 completed after pass 66.
68:58:11.413950 SSM: Step 1 completed after pass 68.
92:58:12.074819 SSM: Step 1 completed after pass 66.
116:58:13.100936 SSM: Step 1 completed after pass 71.
140:58:17.220420 SSM: Step 1 completed after pass 76.
164:58:16.008601 SSM: Step 1 completed after pass 67.
188:58:17.074839 SSM: Step 1 completed after pass 67.
212:58:17.135282 SSM: Step 1 completed after pass 69.
236:58:17.638284 SSM: Step 1 completed after pass 66.
260:58:18.045740 SSM: Step 1 completed after pass 68.
284:58:18.171871 SSM: Step 1 completed after pass 67.
308:58:17.953139 SSM: Step 1 completed after pass 66.
332:58:18.609908 SSM: Step 1 completed after pass 69.
356:58:19.043087 SSM: Step 1 completed after pass 67.
380:58:45.760012 SSM: Step 1 completed after pass 397.
405:00:09.558703 SSM: Step 1 completed after pass 1598.
429:01:14.283550 SSM: Step 1 completed after pass 2457.
453:16:20.819291 SSM: Step 1 completed after pass 14096.
498:30:29.200112 SSM: Step 1 completed after pass 964826.
549:39:38.213154 SSM: Giving up: Too many passes! (1048576)
549:39:47.451837 SSM: Failed to save the VM state to '/mnt/path/to/file.sav' (file deleted): VERR_SSM_TOO_MANY_PASSES
597:35:18.928754 SSM: Giving up: Too many passes! (1048576)
597:35:40.379878 SSM: Failed to save the VM state to '/mnt/path/to/file.sav' (file deleted): VERR_SSM_TOO_MANY_PASSES
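
For anyone who wants to spot this build-up before the snapshot fails outright, the pass counts can be pulled out of VBox.log and compared against an early baseline. The sketch below is a rough illustration, not part of the reporter's script: the function names, the baseline length of five runs, and the threefold threshold are all arbitrary assumptions. It relies only on the "Step 1 completed after pass N." line format shown above.

```python
import re
from statistics import median
from typing import List, Optional, Tuple

# Matches e.g. "380:58:45.760012 SSM: Step 1 completed after pass 397."
PASS_RE = re.compile(
    r"(\d+):\d{2}:\d{2}\.\d+ SSM: Step 1 completed after pass (\d+)\."
)


def pass_history(log_text: str) -> List[Tuple[int, int]]:
    """Return (uptime_hour, pass_count) for every completed live save."""
    return [(int(hour), int(count)) for hour, count in PASS_RE.findall(log_text)]


def first_escalation(history: List[Tuple[int, int]],
                     factor: float = 3.0,
                     baseline_len: int = 5) -> Optional[Tuple[int, int]]:
    """Return the first entry whose pass count exceeds factor * baseline.

    The baseline is the median of the first `baseline_len` pass counts.
    Both parameters are assumed values, not anything VirtualBox defines.
    """
    counts = [count for _, count in history]
    if len(counts) <= baseline_len:
        return None
    baseline = median(counts[:baseline_len])
    for hour, count in history[baseline_len:]:
        if count > factor * baseline:
            return hour, count
    return None
```

A nightly job could call `first_escalation(pass_history(log_text))` on the current log and raise an alert when it returns something other than `None`.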

We observed that the .sav file grows very large before the snapshot fails.

The whole VBox.log is attached, as well as the relevant commands of our bash-script.

Attachments (3)

VBox.log.zip (83.8 KB ) - added by AO 3 years ago.: VBox log file
script-commands.txt (194 bytes ) - added by AO 3 years ago.: Commands from our bash script
Increase in daily snapshot size.txt (1.2 KB ) - added by AO 3 years ago.: Increase in daily snapshot file.

Change History (7)

by AO, 3 years ago

Attachment:	VBox.log.zip added

VBox log file

by AO, 3 years ago

Attachment:	script-commands.txt added

Commands from our bash script

comment:1 by Klaus Espenlaub, 3 years ago

Resolution:	→ invalid
Status:	new → closed

You're asking for it (in other words, VirtualBox is behaving exactly as designed): live snapshotting causes exactly those symptoms if the VM is "too active". It keeps the VM running while trying to incrementally save the changed memory faster than the guest can change it. It should be quite obvious that this causes big saved state files and cannot be guaranteed to work, since memory can be changed much faster than it can be written to disk.

If you want the snapshot to succeed, then don't use the --live option. Otherwise, deal with the failure and try again later, when the guest is less busy changing memory.

The other side effect of not using --live is that the saved state file will usually be much smaller.
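
The "deal with the failure and try again later" advice in comment:1 could look roughly like the sketch below. It is a hypothetical wrapper, not something VirtualBox ships: the function name, the attempt count, and the wait time are assumptions, and it assumes the VERR_SSM_TOO_MANY_PASSES text appears on VBoxManage's stderr as in the error output quoted in the description. The `run` and `sleep` parameters exist only so the logic can be exercised without a real VM.

```python
import subprocess
import time


def take_live_snapshot(vm, name, attempts=3, wait_seconds=600,
                       run=subprocess.run, sleep=time.sleep):
    """Try a --live snapshot, retrying only on VERR_SSM_TOO_MANY_PASSES.

    Returns True on success and False if every attempt hit the pass limit.
    Any other failure is raised immediately, since waiting will not help.
    """
    cmd = ["VBoxManage", "snapshot", vm, "take", name, "--live"]
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if "VERR_SSM_TOO_MANY_PASSES" not in result.stderr:
            raise RuntimeError(result.stderr.strip())
        if attempt < attempts:
            # Guest was too busy dirtying memory; wait for a quieter moment.
            sleep(wait_seconds)
    return False
```

When the function returns `False`, the caller still has to decide whether to skip that night's backup or fall back to a snapshot without --live, which pauses the guest while its state is saved.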

by AO, 3 years ago

Attachment:	Increase in daily snapshot size.txt added

Increase in daily snapshot file.

comment:2 by AO, 3 years ago

L.S,

You are missing the problem.

The problem is that the daily snapshots are related. When a snapshot grows on day 1, it also grows on day 2 and on the subsequent days, until after about a week the snapshot collapses with an error.

See the newly attached file. It is an excerpt from the VBox log file.

Regards,

comment:3 by AO, 3 years ago

Resolution:	invalid
Status:	closed → reopened

comment:4 by AO, 3 years ago

L.S,

The problem is really very annoying and we would like to have a solution to this problem. The crashes occur about every other week.

It would help us a lot if you could add a switch (like --NoMemDump) that suppresses the generation of the memory dump when we take a --live snapshot.

I know that it is possible to suppress the memory dump by not using the --live option, but the consequence is that we would have to stop production servers every night for several hours, which is obviously out of the question.

Do I have to open a new ticket, since this is a request for a new feature, or can we take it from here?

Regards,

August
