Although the Trim command has been defined for nearly a decade, for some reason I have never written a post to explain it. It’s time for that to change.
Trim was never needed for HDDs, so it was defined as a new command once SSDs became prevalent. The command is required because of one of those awkward encumbrances that NAND users must accommodate: erase before write.
NAND flash bits cannot be altered the same way as those on an HDD. On an HDD, a bit that’s currently set to a “1” can be rewritten to a “0” and vice versa, and writing a bit either way takes the same amount of time. In NAND flash a 1 can be written to a 0, but not the other way around. Instead, the entire block (4-16k bytes) must be erased at once, after which all bits are set to 1. Once that has been done, zeros can be written into that block to store data. An erase is an excruciatingly slow operation, taking up to a half second to perform. Writes are faster, but they’re still slow.
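To make the one-way 1-to-0 rule concrete, here is a toy Python model of a single erase block. It is only a sketch under simplifying assumptions: the class and method names are hypothetical, the block is just a few bytes long, and real controllers program whole pages rather than individual bytes.

```python
ERASED = 0xFF  # an erased NAND cell reads back as all 1s


class NandBlock:
    """Toy model of one erase block: programming can only clear bits (1 -> 0)."""

    def __init__(self, size: int) -> None:
        self.data = bytearray([ERASED] * size)

    def erase(self) -> None:
        # Erase is all-or-nothing: every bit in the block returns to 1.
        self.data[:] = bytes([ERASED] * len(self.data))

    def can_program(self, offset: int, value: int) -> bool:
        # A write succeeds only if it never has to turn a 0 back into a 1.
        return (self.data[offset] & value) == value

    def program(self, offset: int, value: int) -> None:
        if not self.can_program(offset, value):
            raise ValueError("needs a 0 -> 1 change: erase the whole block first")
        self.data[offset] &= value  # programming can only pull bits down to 0


block = NandBlock(4)
block.program(0, 0xA5)             # 0xFF -> 0xA5 only clears bits, so it is allowed
print(block.can_program(0, 0x81))  # True: clears still more bits
print(block.can_program(0, 0xFF))  # False: would need 0 -> 1, so an erase comes first
```

The bitwise AND captures the asymmetry: a new value is writable only if it is a subset of the 1s already there, which is why any other change forces an erase of the whole block.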
Let’s say that a program needs to update a single byte within a block. The most straightforward way to do this is to have the controller read all of the data from the block and write it into another “fresh” (erased) block, substituting the new value for the byte that needs to be changed. The new block is then mapped to the address that the old block originally occupied, and the old block is scheduled for erasure. As an SSD fills up there are fewer fresh blocks to move data into, and eventually the SSD slows down as it waits for blocks to be erased so they can accept new data.
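The read-copy-remap sequence above can be sketched as a tiny flash translation layer. This is a simplified illustration, not any real controller’s design: the names (`ToySSD`, `update_byte`, `erase_pending`) are hypothetical, mapping is done per whole block, and a write simply fails when no erased block is left rather than stalling.

```python
ERASED = 0xFF


class ToySSD:
    """Toy flash translation layer: data is never updated in place (names are hypothetical)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.phys = [bytearray([ERASED] * block_size) for _ in range(num_blocks)]
        self.l2p = {}                        # logical block -> physical block
        self.free = list(range(num_blocks))  # erased blocks ready for data
        self.pending_erase = []              # stale blocks scheduled for erasure

    def update_byte(self, lbn: int, offset: int, value: int) -> None:
        if not self.free:
            raise RuntimeError("no erased blocks: the write must wait for an erase")
        new = self.free.pop(0)
        old = self.l2p.get(lbn)
        # Read the whole old block (a never-written block reads as erased)...
        if old is None:
            contents = bytearray([ERASED] * self.block_size)
        else:
            contents = bytearray(self.phys[old])
        contents[offset] = value             # ...change only the target byte...
        self.phys[new][:] = contents         # ...and write it all into the fresh block.
        self.l2p[lbn] = new                  # remap the logical block to its new home
        if old is not None:
            self.pending_erase.append(old)   # old copy is stale; erase it later

    def erase_pending(self) -> None:
        # The slow erases can happen during idle time, refilling the free pool.
        while self.pending_erase:
            blk = self.pending_erase.pop(0)
            self.phys[blk][:] = bytes([ERASED] * self.block_size)
            self.free.append(blk)

    def read_byte(self, lbn: int, offset: int) -> int:
        return self.phys[self.l2p[lbn]][offset]


ssd = ToySSD(num_blocks=3, block_size=8)
ssd.update_byte(0, 2, 0x42)
ssd.update_byte(0, 5, 0x17)  # copies the block; physical block 0 is now pending erase
print(ssd.read_byte(0, 2), ssd.pending_erase, ssd.free)  # 66 [0] [2]
```

Each one-byte update consumes a whole fresh block, so the `free` list shrinks quickly unless `erase_pending` gets a chance to run, which is the slowdown described above.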
It would be good to optimize the SSD so that as many blocks as possible are scheduled for erasure as early as possible, to avoid bottlenecks. But existing programs don’t usually tell an SSD or HDD when they stop using a block. The SSD often has to wait until a block is overwritten before it learns that the old data is no longer useful, so it may spend seconds, minutes, hours, or even days holding obsolete data in a block that could have been erased during idle moments.
The Trim command allows software to inform the SSD that a block will no longer be used. It essentially gives the SSD permission to erase the block whenever it wants. It doesn’t “command” the SSD to erase the block (as in “Do this NOW!”), but it lets the SSD do a little more housekeeping than it otherwise could. As Trim is defined, the SSD may or may not act on it at all; this is left entirely to the controller’s discretion.
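Here is a minimal sketch of what “permission, not a command” might look like inside a controller, assuming a page-level mapping table. The class, its methods, and the `honor_trim` switch are hypothetical illustrations: Trim only marks data as stale so later housekeeping can skip it, nothing is erased on the spot, and a controller is free to ignore the hint.

```python
class ToyFTL:
    """Toy page-level FTL showing Trim as a hint (all names are hypothetical)."""

    def __init__(self, honor_trim: bool = True) -> None:
        self.honor_trim = honor_trim  # the controller decides whether to act on Trim
        self.l2p = {}                 # logical page -> (block, page) holding its data
        self.valid = set()            # physical (block, page) locations with live data

    def write(self, lpn: int, location: tuple) -> None:
        old = self.l2p.get(lpn)
        if old is not None:
            self.valid.discard(old)   # without Trim, an overwrite is how staleness is learned
        self.l2p[lpn] = location
        self.valid.add(location)

    def trim(self, lpns: list) -> None:
        # Trim grants permission only: nothing is erased here, and a
        # controller may ignore the hint entirely.
        if not self.honor_trim:
            return
        for lpn in lpns:
            loc = self.l2p.pop(lpn, None)
            if loc is not None:
                self.valid.discard(loc)  # stale now; housekeeping can skip it

    def live_pages_in(self, block: int) -> int:
        return sum(1 for (b, _) in self.valid if b == block)


ftl = ToyFTL()
for lpn in range(4):
    ftl.write(lpn, (0, lpn))  # four logical pages land in physical block 0
ftl.trim([1, 2])              # the host deleted the files that used pages 1 and 2
print(ftl.live_pages_in(0))   # 2
```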
The benefit of using Trim is that more erased blocks will be available in time for data writes, and this means that the SSD will perform more consistently and will be a little faster than it would be without Trim.
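One way to quantify that benefit is write amplification: the ratio of data the SSD physically writes to NAND to the data the host asked it to write. The sketch below uses invented numbers: a hypothetical 256-page erase block and three made-up page states. It also assumes a steady state in which every block reclaimed by garbage collection looks like this one. Before erasing the block, the SSD must copy every page it believes is live, and without Trim, pages belonging to deleted files still look live.

```python
from collections import Counter


def pages_to_copy(block_states: list, trim_received: bool) -> int:
    """Pages garbage collection must relocate before erasing this block."""
    counts = Counter(block_states)
    # "overwritten" pages are already known to be stale. "deleted" pages hold data
    # the host no longer needs; without Trim the SSD can't tell them from live data.
    return counts["live"] + (0 if trim_received else counts["deleted"])


def write_amplification(pages_per_block: int, copied: int) -> float:
    """Steady-state NAND writes per host write if every victim block looks like this one."""
    freed = pages_per_block - copied
    if freed <= 0:
        raise ValueError("nothing to reclaim in this block")
    # Each reclaimed block costs `copied` GC writes, then absorbs `freed` host writes.
    return (freed + copied) / freed


# Hypothetical 256-page block: 64 live, 96 overwritten, 96 belonging to deleted files.
victim = ["live"] * 64 + ["overwritten"] * 96 + ["deleted"] * 96
for trim in (False, True):
    copied = pages_to_copy(victim, trim)
    print(f"Trim={trim}: copy {copied} pages, WA = {write_amplification(len(victim), copied):.2f}")
# Trim=False: copy 160 pages, WA = 2.67
# Trim=True: copy 64 pages, WA = 1.33
```

In this made-up case Trim halves the copying overhead per host write. Real gains depend entirely on the workload and on how the controller uses the hint.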
“Trim” doesn’t stand for anything: it’s not an acronym, so standard English usage says it shouldn’t be spelled in all caps. Still, many people write it as “TRIM”, just as they do with NAND flash, which is also not an acronym. The SSD Guy likes to follow standard rules, though, so I capitalize only the “T”. Since it’s a relatively new standard, either form can be used. In fact, it goes by other names in non-SATA protocols: it’s called “Deallocate” in NVMe and “Unmap” in SCSI/SAS.
(This post’s graphic, in case you were wondering, is a trimmer resistor, which I simply chose for its name. It has nothing to do with the SATA Trim command. Linear designs sometimes use these to adjust a tricky voltage level. It measures about a centimeter across. You insert a small screwdriver into the slot and turn it back and forth until you get the desired results. It’s how you “Trim” a linear circuit.)
The trim command is only useful to reduce write amplification. The SSD can only erase a full erase block, which is a fairly large area. To do that, it needs to move the in-use data to another place; then it can erase the erase block and get it ready to be written again. The trim command allows the SSD to skip copying data that is effectively unused, and thus reduces write amplification.
Just to make sure things are “aligned”: NAND can be read and written in full pages, which are typically 4K-16K in size, but is erased in blocks, which are typically MBs in size (this varies across suppliers, NAND generations, etc.). This is what creates “garbage”: valid and invalid pages mixed within the same erase block. The SSD needs to “garbage collect” over time so erase blocks can be reclaimed, moving the valid data to other blocks and erasing the invalid data.
Amnon, that was very well explained.
Many thanks for taking the time to write it!
Jim
Can you comment on how different controllers handle Trim? I know SandForce was pretty weird in its day when it came to Trim.
I have to admit that this is an area I have never investigated.
SandForce had good reason to be weird: the controller’s internal compression made their block lengths vary. That was already difficult to manage, since the fixed-length HDD-interface sectors were converted into variable-length compressed blocks before being stored in fixed-length flash pages. Freeing up space in this environment would necessarily take a lot of management.