How Controllers Maximize SSD Life – Other Error Management

There are more advanced means than simple error correction to help remove bit errors in NAND flash, and those are the subject of this post.  The general term for this approach is “DSP” (digital signal processing), although it seems to have very little to do with the kind of DSP algorithms used to perform filtering or build modem chips.

While ECC corrects errors without knowing how they got there, DSP helps to correct the more predictable errors, those caused by error mechanisms that are inherent to the design of the chip.  A prime example of such an error would be adjacent cell disturb.

Here’s a brief explanation of adjacent cell disturb: Since each bit cell in a flash chip is like one of an array of tiny capacitors all manufactured on the same die, there can be cross coupling between adjacent cells.  An adjacent cell may bleed charge off of a neighboring cell.  During reads and writes the energy passing through the adjacent cell may push a neighboring cell’s bit just past its threshold – during a read the current drawn off the adjacent cell may draw some charge off a neighboring cell, and during a write the high fields in the adjacent cell may increase the charge of its neighbor.
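To make the write case concrete, here is a toy model of a row of three cells.  It is only a sketch: the threshold, the programming charge, and the coupling fraction are made-up numbers chosen for illustration, not parameters of any real NAND chip, and the names `program` and `read` are hypothetical.

```python
# Toy model: a row of cells, each holding an analog charge level.
# All constants are illustrative assumptions, not device parameters.
READ_THRESHOLD = 0.5   # a cell above this level reads as "programmed"
PROGRAM_CHARGE = 1.0   # charge added to the cell being written
COUPLING = 0.2         # fraction of that charge picked up by each neighbor


def program(cells, index):
    """Program one cell and disturb its immediate neighbors."""
    cells[index] += PROGRAM_CHARGE
    for neighbor in (index - 1, index + 1):
        if 0 <= neighbor < len(cells):
            cells[neighbor] += PROGRAM_CHARGE * COUPLING


def read(cells):
    """Return True for every cell whose charge exceeds the threshold."""
    return [charge > READ_THRESHOLD for charge in cells]


# The middle cell holds a little charge and is never written itself.
cells = [0.0, 0.2, 0.0]
program(cells, 0)      # first neighbor written: middle cell rises to 0.4
program(cells, 2)      # second neighbor written: middle cell passes 0.5
print(read(cells))     # [True, True, True] - the middle cell was disturbed
```

In this model a single neighboring write leaves the middle cell readable, but writing both neighbors pushes it past its threshold, which is the kind of predictable, position-dependent error described above.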

The DSP engine knows which bits in a chip are adjacent to which other bits, and re-maps the data to be written to the flash into “symbols” (a term from communications technology).  The symbols are chosen to make the data less sensitive to errors stemming from the effects of adjacent cell disturb.

There is a science surrounding the selection of the right symbols, since they impact other aspects of the SSD including write speed.

When the flash is read, the symbols are re-mapped into data, a process that corrects predictable errors before the data is passed through the ECC algorithm.
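The mapping step can be illustrated with a deliberately simple constrained code.  This is a sketch under stated assumptions, not how any particular controller works: the codebook is hypothetical, "1" stands for a cell holding high charge and "0" for a cell holding low charge, and the pattern to avoid is assumed to be a low cell sandwiched between two high cells ("101").  Real symbol sets are proprietary and far more efficient.

```python
# Hypothetical codebook: every symbol ends in two low cells ("00"), so no
# sequence of symbols can ever contain the pattern high-low-high ("101").
ENCODE = {"00": "0000", "01": "0100", "10": "1000", "11": "1100"}
DECODE = {symbol: data for data, symbol in ENCODE.items()}


def to_symbols(bits):
    """Map a bit string to cell states, two data bits per 4-cell symbol."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(ENCODE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def from_symbols(cells):
    """Re-map stored cell states back into the original bit string."""
    if len(cells) % 4:
        raise ValueError("cell string length must be a multiple of 4")
    return "".join(DECODE[cells[i:i + 4]] for i in range(0, len(cells), 4))


data = "110111"                # written directly, this contains "101"
stored = to_symbols(data)
print(stored)                  # 110001001100
print("101" in stored)         # False
print(from_symbols(stored) == data)  # True
```

This toy code spends twice as many cells as it stores data bits, so it would never be used as-is.  It does show why symbol selection involves trade-offs: the more patterns the symbols avoid, the more cells have to be written for the same amount of data.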

DSP corrects the more predictable errors, leaving the less predictable errors for the ECC algorithm to clean up.

Since cleaner data is passed to ECC for correction, DSP effectively adds more bits of error correction to the ECC.  As we saw in the ECC post, adding bits of error correction can extend the endurance of the flash chip, so in an indirect way, DSP not only corrects errors, but it also extends the life of the flash in an SSD.
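A rough calculation shows why cleaner input acts like extra correction bits.  The sketch below assumes independent bit errors and uses invented numbers throughout (a 4,096-bit codeword, 8 correctable bits, and a DSP stage that halves the raw bit error rate); none of these figures come from a real device, and the function name is hypothetical.

```python
from math import comb


def codeword_failure_probability(n_bits, t_correctable, bit_error_rate):
    """Probability that a codeword has more errors than ECC can correct.

    Assumes every bit fails independently with the same probability.
    """
    p_correctable = sum(
        comb(n_bits, k)
        * bit_error_rate ** k
        * (1.0 - bit_error_rate) ** (n_bits - k)
        for k in range(t_correctable + 1)
    )
    return 1.0 - p_correctable


# Illustrative assumptions only.
N_BITS = 4096            # codeword size in bits
T_CORRECTABLE = 8        # errors the ECC can fix per codeword
RAW_BER = 1e-3           # bit error rate without DSP
BER_AFTER_DSP = 5e-4     # assumed rate once predictable errors are removed

without_dsp = codeword_failure_probability(N_BITS, T_CORRECTABLE, RAW_BER)
with_dsp = codeword_failure_probability(N_BITS, T_CORRECTABLE, BER_AFTER_DSP)
print(f"uncorrectable codewords without DSP: {without_dsp:.2e}")
print(f"uncorrectable codewords with DSP:    {with_dsp:.2e}")
```

Under these assumptions the same ECC fails far less often once the predictable errors have been removed, which has the same effect as giving the ECC more correctable bits on the original, noisier data.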

DSP is not all that widely used today, but a growing number of future SSD controllers are likely to include this approach.

This post is part of a series published by The SSD Guy in September-November 2012 to describe the leading methods SSD architects use to get the longest life out of an SSD despite the limited number of erase/write cycles that NAND flash specifications guarantee.

The entire series can also be downloaded from the Storage Networking Industry Association (SNIA) website as a 20-page booklet in PDF format.