Ticket #16795 (new defect)

Opened 3 years ago

Last modified 2 weeks ago

TRIM on SSD not stable with Linux, Windows 10, FreeBSD

Reported by: linuxguru
Owned by:
Component: other
Version: VirtualBox 5.1.22
Keywords: TRIM SSD
Cc:
Guest type: all
Host type: Windows

Description (last modified by frank)

TRIM sometimes times out on Linux and on FreeBSD; Windows 10 doesn't boot at all.

See below and attachments for details:

Linux Fedora 25 latest updates

  1. Attach an image:
    VBoxManage storageattach "Fedora 25" --storagectl "SATA" --port 0 --device 0 --nonrotational on --discard on --medium "Fedora 25.vdi" --type hdd
  2. Check that TRIM is enabled
    hdparm -I /dev/sda | grep TRIM
               *    Data Set Management TRIM supported (limit unknown)
  3. Enable it in the mount options and reboot afterwards
    /dev/mapper/fedora-root /                       ext4    defaults,discard        1 1
  4. Run fstrim /

fstrim sometimes hangs with timeouts.

FreeBSD 11 latest updates

  1. Attach the disk as under Linux
  2. Check for TRIM support
    camcontrol identify /dev/ada0 | grep -i trim
    Data Set Management (DSM/TRIM) yes
  3. Enable TRIM support on filesystem
    mount | grep -i ufs
    /dev/ada0p2 on / (ufs, local, journaled soft-updates)
    # Boot into single user mode
    tunefs -t enable /dev/ada0p2
    tunefs: issue TRIM to the disk set
    bash -c 'tunefs -p /dev/ada0p2 2>&1 | grep -i trim'
    tunefs: trim: (-t)                                         enabled


VirtualBox_Fedora 25_28_05_2017_08_44_23.png (13.1 KB) - added by linuxguru 3 years ago.
Fedora 25 TRIM problem
VirtualBox_FreeBSD - UFS_27_05_2017_22_08_25.png (11.9 KB) - added by linuxguru 3 years ago.
FreeBSD UFS Trim problem
VirtualBox_FreeBSD_27_05_2017_22_20_32.png (10.5 KB) - added by linuxguru 3 years ago.
FreeBSD ZFS Trim problem

Change History

Changed 3 years ago by linuxguru

Fedora 25 TRIM problem

Changed 3 years ago by linuxguru

FreeBSD UFS Trim problem

Changed 3 years ago by linuxguru

FreeBSD ZFS Trim problem

comment:1 Changed 3 years ago by frank

  • priority changed from blocker to major
  • Description modified

comment:2 Changed 3 years ago by linuxguru

Any update on this issue? It is important for thin provisioning.

comment:3 Changed 3 years ago by deAtog

I believe this is related to #16450. I've reviewed the source code for the discard option and it appears to have several issues. The current implementation handles discards as follows:

  1. If the TRIM'd range clears only part of a VDI data block, the area is filled with 0's.
  2. If the TRIM'd range clears an entire VDI data block:
    1. The TRIM'd block in the VDI block header is marked as unallocated.
    2. The last block in the VDI is read into memory and written to the TRIM'd block location.
    3. The block header entry pointing to the last block in the VDI file is updated with the new location.
    4. The VDI is truncated to remove the last allocated block.
As you can see, in the 2nd case above, a lot of IO happens whenever the guest OS TRIMs an entire VDI data block. None of the steps in that case take any precaution to ensure that they occur sequentially and uninterrupted. Any interruption results in an IO error reported to the guest OS, which may subsequently retry the operation. This further exacerbates the problem and is what I believe causes the issues seen here. It is my opinion that the implementation of the discard option needs to be completely rewritten.
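To make the two cases concrete, here is a minimal sketch of how a discard byte range divides into partially covered regions and fully covered data blocks. It is not taken from the VirtualBox source: the function name split_discard and the 1 MiB block size are assumptions chosen for illustration only.

```python
BLOCK_SIZE = 1 << 20  # assumed VDI data block size (1 MiB), for illustration


def split_discard(offset, length, block_size=BLOCK_SIZE):
    """Split a discard byte range into partial regions and whole blocks.

    Returns (partial, whole): partial is a list of (offset, length) byte
    ranges that cover only part of a block, whole is a list of block
    indexes that the range covers completely.
    """
    if length <= 0:
        return [], []
    end = offset + length
    first_whole = -(-offset // block_size)  # round the start up to a block boundary
    last_whole = end // block_size          # round the end down (exclusive index)
    if first_whole >= last_whole:
        # The range never spans a complete block: everything is "partial".
        return [(offset, length)], []
    partial = []
    head_end = first_whole * block_size
    if offset < head_end:
        partial.append((offset, head_end - offset))
    tail_start = last_whole * block_size
    if end > tail_start:
        partial.append((tail_start, end - tail_start))
    return partial, list(range(first_whole, last_whole))
```

Only the block indexes in the second list would trigger the expensive move-and-truncate path described in case 2; the byte ranges in the first list fall under case 1.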

I would attempt such a rewrite, but other projects are currently occupying my time. For any developers looking at this ticket, I would do the following to resolve this and improve the functionality of this option.

  1. When a VDI is opened and this option is enabled, do the following:
    1. Create a list of all VDI data blocks that have been allocated (based on block size and file size).
    2. Iterate over the VDI block header and remove all blocks from the list which are in use.
    3. Convert the remaining list to a min heap, called the free-data heap.
    • Note: in a 100% used VDI, this will result in an empty heap.
  2. When a TRIM command is received:
    1. If a partial VDI block is TRIM'd:
      1. Don't do anything. There's no requirement that free space must contain 0's.
    2. If an entire VDI block is TRIM'd:
      1. Mark the block as unallocated in the VDI block header.
      2. Add the location to the free-data heap.
  3. When a new block needs to be allocated:
    1. If the free-data heap is NOT empty:
      1. Remove the minimum free data location from the free-data heap.
      2. Assign that location to the block being allocated and update the VDI block header.
    2. If the free-data heap IS empty:
      1. Enlarge the VDI by the VDI block size.
      2. Assign the location of the new space to the block being allocated.
  4. When the VDI is closed:
    1. Iterate over any remaining free locations in the free-data heap.
    2. Move the data from the last data blocks in the VDI to the available free-data locations.
    3. Update the VDI block header with the new data locations as data blocks are moved.
    4. Truncate the VDI by the number of data blocks moved.

If the above is implemented, discards are reduced to a quick update of the VDI block header. Every data block that has been freed but still occupies space in the file also makes future block allocations much simpler. The consistency of the VDI is maintained because the free-data heap is rebuilt whenever the VDI is opened, which ensures that any blocks that are allocated in the file but free are correctly added to the heap. In an ideal world, the initial free-data heap would always be empty.
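The proposal above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not VirtualBox code: the class VdiSketch, its method names, and the representation of the block header as a Python list mapping logical blocks to physical slots are all hypothetical, and no file IO is performed. Unlike step 4 as written, the close step here also truncates free slots that already sit at the end of the image, which needs no data move.

```python
import heapq


class VdiSketch:
    """Hypothetical in-memory model of the proposed free-data heap."""

    def __init__(self, block_map, physical_blocks):
        # block_map[logical] is a physical slot index, or None if unallocated.
        self.block_map = list(block_map)
        self.physical_blocks = physical_blocks
        # On open: every physical slot not referenced by the header is free.
        in_use = {p for p in self.block_map if p is not None}
        self.free_heap = [p for p in range(physical_blocks) if p not in in_use]
        heapq.heapify(self.free_heap)

    def discard_whole_block(self, logical):
        # A whole-block TRIM only touches the header and the heap.
        phys = self.block_map[logical]
        if phys is not None:
            self.block_map[logical] = None
            heapq.heappush(self.free_heap, phys)

    def allocate(self, logical):
        if self.block_map[logical] is not None:
            return self.block_map[logical]
        if self.free_heap:
            phys = heapq.heappop(self.free_heap)  # reuse the lowest free slot
        else:
            phys = self.physical_blocks           # grow the image by one block
            self.physical_blocks += 1
        self.block_map[logical] = phys
        return phys

    def compact_on_close(self):
        """Fill free slots from the tail, truncate, return (src, dst) moves."""
        owner = {p: l for l, p in enumerate(self.block_map) if p is not None}
        free = set(self.free_heap)
        moves = []
        while free:
            last = self.physical_blocks - 1
            if last in free:
                free.remove(last)                 # trailing slot is already free
            else:
                dst = min(free)
                free.remove(dst)
                logical = owner.pop(last)
                self.block_map[logical] = dst     # header now points at new slot
                owner[dst] = logical
                moves.append((last, dst))
            self.physical_blocks -= 1             # truncate by one block
        self.free_heap = []
        return moves
```

In this model a discard never moves data; the relocation cost is paid once, when the image is closed, and each free slot shrinks the image by exactly one block.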

comment:4 Changed 2 years ago by SixEcho

Agreed, discard/TRIM needs serious attention and would be a really useful feature (see SATA 3.1 Queued TRIM).

Testing on a Windows 8 VDI with a lot of non-trimmed space: running disk optimize causes a lot of TRIMs to be issued, which seems to overwhelm the discard handling and makes the disk unresponsive to the guest, which eventually crashes/resets.

00:01:29.409587 AHCI#0: Port 0 reset
00:01:30.545526 VD#0: Discard request was active for 31 seconds
00:01:30.545569 VD#0: Cancelling all active requests
00:02:00.550698 AHCI#0: Port 0 reset
00:02:03.724005 GIM: HyperV: Guest indicates a fatal condition! P0=0x7a P1=0xc68e28 P2=0xc000000e P3=0x19b8e880 P4=0x8d1c5670
00:02:06.691342 VMMDev: vmmDevHeartbeatFlatlinedTimer: Guest seems to be unresponsive. Last heartbeat received 4 seconds ago
00:02:07.302353 GIM: HyperV: Reset initiated through MSR
00:02:07.302423 Changing the VM state from 'RUNNING' to 'RESETTING'

comment:5 Changed 16 months ago by oddsocks

FWIW this still seems to occur with VB 6.0.0

comment:6 Changed 2 weeks ago by facboy

As a point of reference, this is still occurring on VB 6.1.4, Windows 10 host, CentOS 8 guest.
