How Controllers Maximize SSD Life – Internal NAND Management

Suppose you have used all of the other forms of improving SSD wear that we have discussed so far, but you still find that this isn’t enough.  What do you do next?  Well, a few SSD controllers go one step further and manage some of the inner workings of the NAND flash chip itself.

If that sounds like a significant undertaking to you, then you clearly understand why so very few controllers take this approach.  The information used to perform this function is not generally available.  It takes a special relationship with the NAND flash supplier, and you can’t develop this relationship unless the NAND supplier is certain that you won’t go sharing its secrets with its competition.  This means that the SSD controller is dedicated to a single brand of NAND, and it means that the SSD maker can’t shop around among NAND suppliers for the best price.  Furthermore, the NAND supplier won’t share this information unless it believes that there is some compelling reason to work with the SSD manufacturer.  Since there are hundreds of SSD makers, it’s really difficult to get a NAND supplier to pay attention to you!

The SSD manufacturers that have this kind of relationship with their flash suppliers are very rare and very special.

Let’s say that you’re an SSD maker that’s in the happy position where a NAND supplier is willing to disclose such information to you.  What will you learn?  You will be told how to manage a number of variables that can be controlled by changing internal settings on the NAND flash, and about the parameters that influence bit errors:

  • Thresholds between levels in MLC
  • Programming times and algorithms
  • Physical placement relationships between adjacent cells
  • Means of de-trapping electrons trapped in the tunnel oxide

There are more than these, but these are the ones The SSD Guy knows about.  These are the kinds of things that can be controlled if you understand how to access the special internal test modes that are implanted within every flash chip sold today.

To give you an idea of how this works, let’s look at a single parameter: MLC level sensing.

When MLC flash is programmed, two bits are stored on each cell by charging the cell to one of four levels:

  • Uncharged
  • 1/3 charged
  • 2/3 charged
  • Fully charged
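
As a rough illustration of the four levels above, here is a minimal Python sketch that maps each two-bit value to an idealized charge level.  The names LEVELS, BIT_PAIRS and program are made up for this example, and the particular bit assignment (a Gray code, in which neighboring levels differ by only one bit) is an assumption made for illustration rather than something any particular NAND supplier has disclosed.

```python
# Idealized charge levels from the list above, as fractions of full charge.
LEVELS = [0.0, 1 / 3, 2 / 3, 1.0]

# Assumed bit pair for each level (illustrative only). A Gray code is used
# so that mistaking a level for its neighbor corrupts only one bit.
BIT_PAIRS = ["11", "10", "00", "01"]


def program(bits: str) -> float:
    """Return the idealized charge level that stores the given two bits."""
    return LEVELS[BIT_PAIRS.index(bits)]


def bits_differing(a: str, b: str) -> int:
    """Count the bit positions in which two bit pairs differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    for pair in BIT_PAIRS:
        print(pair, "->", round(program(pair), 3))
```

With this assignment, a cell that is read one level too high or too low produces a single bit error rather than two, which is easier for error correction to handle.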

Certain inevitable issues can cause these levels to veer off-center.  Trapped electrons in the tunnel oxide can make the floating gate appear to hold a larger charge than the programming algorithm actually placed on it.  Thermal or other influences can drain electrons off the cell to reduce the charge.  Adjacent cell reads and writes may add or remove electrons from the floating gate.

A controller that understands these phenomena can adjust the sensing thresholds within the NAND chip to help reduce the chance that a bit will be misinterpreted.  If a cell is supposed to be 2/3 charged, but it looks like it’s 1/2 charged, will it be read as 1/3 charged or as 2/3 charged?  If the SSD controller manages the thresholds to compensate for the factors described above, it can reduce the impact of this source of bit errors.
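
To make the threshold idea concrete, here is a minimal sketch in Python.  It is a toy model under stated assumptions: charge is expressed as a fraction of full charge, the default thresholds sit midway between the four idealized levels, and the names sense and shifted, as well as the offset value, are hypothetical.  Real chips sense a cell’s threshold voltage through supplier-specific settings that are not public, so none of this represents an actual NAND command.

```python
from bisect import bisect_right

# Assumed default thresholds, midway between the idealized levels
# 0, 1/3, 2/3 and 1 (illustrative values only).
DEFAULT_THRESHOLDS = [1 / 6, 1 / 2, 5 / 6]


def sense(charge, thresholds=DEFAULT_THRESHOLDS):
    """Return the level index (0..3) that a measured charge is read as."""
    # The level index is the number of thresholds at or below the charge.
    return bisect_right(thresholds, charge)


def shifted(thresholds, offset):
    """Return a copy of the thresholds moved by a fixed offset."""
    return [t + offset for t in thresholds]


if __name__ == "__main__":
    # A cell programmed to 2/3 (level 2) that has leaked down to 0.48.
    leaked = 0.48
    print(sense(leaked))                                     # read as level 1
    print(sense(leaked, shifted(DEFAULT_THRESHOLDS, -0.1)))  # read as level 2
```

In this model the leaked cell falls just under the default middle threshold and is misread as 1/3 charged.  Lowering all three thresholds by the same hypothetical offset lets the same cell read correctly as 2/3 charged.  A real controller would have to choose its adjustments from what it knows about the wear and age of each block.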

Clearly this is a pretty extreme way to squeeze the last bit of life out of NAND flash.  It is used to push the technology far beyond its specified wear, but the companies that use it are very successful in achieving flash life well beyond that of controllers that don’t use this approach.

This post is the last of a series published by The SSD Guy in September-November 2012 to describe the leading methods SSD architects use to get the longest life out of an SSD despite the limited number of erase/write cycles that NAND flash specifications guarantee.  The following list provides the names of all of these articles, and hot links to them:

Click on any of the above links to learn about how each of these techniques works.

Alternatively, you can visit the Storage Networking Industry Association (SNIA) website to download the entire series as a 20-page booklet in PDF format.