Extreme SSD Error Correction
At last week’s International Solid State Circuits Conference (ISSCC), Shuhei Tanakamaru, a researcher from Japan’s Chuo University, detailed a scheme to reduce MLC SSD bit error rates (BER) by a factor of 32 compared with conventional techniques. The approach uses an impressive combination of mirroring, vertical and horizontal error correction, and a deep understanding of the kinds of bit errors flash is most likely to experience.
This is a novel and well-conceived technique that may find its way into future SSDs.
The steps included in the paper are used in addition to the standard error correction schemes that I presented in my recent series How Controllers Maximize SSD Life. Here are brief descriptions of the elements of the architecture.
- Standard ECC is used as a foundation
- The mirroring approach doesn’t simply replicate pages: the data is written into the mirror page in reverse order. This helps to reveal errors that stem from the data’s distance within the page from the source line (a signal line at the starting end of the page)
- The data in the mirror page is also inverted from that of the original page. This helps the controller understand the nature of an error and decide whether the original page or the mirrored page carries the correct data (see the next bullet)
- A technique the researchers call ERS (Error Reduction Synthesis) looks at the errors with respect to their position on the page: In the lower half it is more common for a “1” to erroneously become a “0.” In the upper half the opposite is true. The error is corrected according to this understanding. When combined with the above two mirroring techniques this step gives a correct decision in all but 0.01% of the cases.
- The block is treated as a RAID array whose members are NAND pages rather than HDDs, with the last page used to store parity for all the other pages in the block. This parity is calculated as each page is written and is temporarily stored in an ReRAM in this design. Any of a number of other technologies could be used instead, including PCM, MRAM, and even SRAM or DRAM, depending on how the designer wants to deal with power failures. Once the block has been filled the ReRAM data is written into the last page.
- A final touch, called “Error Masking,” keeps a compressed map of all previously failed bits.
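To make the reversed, inverted mirror and the position-aware decision more concrete, here is a minimal Python sketch. It is my own illustration, not the researchers' implementation. It assumes that "lower half" means the first half of the bit positions in a page, which the description above does not spell out, and the function names are hypothetical. Under that assumption, a disagreement between the two copies of a bit in the lower half most likely means the true value was "1" (the original tends to drop 1 to 0 there, while its inverted partner sits in the upper half where 0 tends to rise to 1), and the reverse holds for the upper half.

```python
def make_mirror(page):
    """Build the mirror copy: bit order reversed and every bit inverted."""
    return [1 - b for b in reversed(page)]


def decode_mirror(mirror):
    """Undo the reversal and inversion to get back the original bit order."""
    return [1 - b for b in reversed(mirror)]


def resolve(primary_read, mirror_read):
    """Pick a value for each bit from the two copies.

    Assumption (not from the paper): positions below len/2 are the
    "lower half", where 1 -> 0 errors dominate; the rest are the
    "upper half", where 0 -> 1 errors dominate.
    """
    n = len(primary_read)
    candidate = decode_mirror(mirror_read)
    result = []
    for i, (p, m) in enumerate(zip(primary_read, candidate)):
        if p == m:
            result.append(p)   # copies agree, accept the value
        elif i < n // 2:
            result.append(1)   # lower half: the likely errors only hit a stored 1
        else:
            result.append(0)   # upper half: the likely errors only hit a stored 0
    return result


page = [1, 1, 0, 1, 0, 0, 1, 0]
mirror = make_mirror(page)

primary_read = list(page)
primary_read[0] = 0   # 1 -> 0 error in the lower half of the original
primary_read[5] = 1   # 0 -> 1 error in the upper half of the original

mirror_read = list(mirror)
mirror_read[6] = 1    # 0 -> 1 error in the upper half of the mirror

print(resolve(primary_read, mirror_read) == page)  # True
```

Note that this toy version cannot detect the case where both copies of the same bit are wrong, and it will guess wrong when an error runs against the usual direction. Standard ECC and the block-level parity are still needed for those cases.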
The combination of all these techniques improves the bit error rate of 2xnm MLC NAND by 91% over conventional ECC. This can either be used to reduce the BER for a given SSD lifetime, or to extend the SSD lifetime for a given BER.
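The block-level parity in the RAID-style bullet above can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: I use simple XOR parity (as in RAID 5), which the description does not confirm, and a plain in-memory buffer stands in for the ReRAM. The class and method names are hypothetical.

```python
class ParityBlock:
    """Toy NAND block whose last page holds parity for all the other pages."""

    def __init__(self, pages_per_block, page_size):
        self.page_size = page_size
        self.capacity = pages_per_block - 1        # last page is reserved for parity
        self.pages = []                            # data pages written so far
        self.parity_buffer = bytearray(page_size)  # stand-in for the ReRAM
        self.parity_page = None                    # set once the block is full

    def write_page(self, data):
        if len(self.pages) >= self.capacity:
            raise ValueError("block is full")
        if len(data) != self.page_size:
            raise ValueError("wrong page size")
        self.pages.append(bytes(data))
        # Update the running parity as each page is written (XOR is an assumption).
        for i, byte in enumerate(data):
            self.parity_buffer[i] ^= byte
        # Once the block has been filled, commit the buffer to the last page.
        if len(self.pages) == self.capacity:
            self.parity_page = bytes(self.parity_buffer)

    def rebuild_page(self, lost_index):
        """Recover one unreadable page from the parity and the surviving pages."""
        if self.parity_page is None:
            raise ValueError("parity has not been committed yet")
        rebuilt = bytearray(self.parity_page)
        for idx, page in enumerate(self.pages):
            if idx != lost_index:
                for i, byte in enumerate(page):
                    rebuilt[i] ^= byte
        return bytes(rebuilt)


block = ParityBlock(pages_per_block=4, page_size=4)
block.write_page(b"\x01\x02\x03\x04")
block.write_page(b"\xff\x00\xaa\x55")
block.write_page(b"\x10\x20\x30\x40")
print(block.rebuild_page(1) == b"\xff\x00\xaa\x55")  # True
```

As with disk RAID, a single parity page of this kind can rebuild only one lost page per block. If the buffer were volatile (SRAM or DRAM), a power failure before the block fills would lose the running parity, which is the trade-off mentioned in the bullet.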
Before you rush to point out that a mirrored design requires double the NAND flash of a conventional SSD, consider the fact that a gigabyte of SLC NAND costs 6-7 times as much as a gigabyte of MLC NAND. Even with twice the flash, a mirrored MLC SSD's flash cost per usable gigabyte would be roughly a third that of SLC, so an MLC design that matches the BER or lifetime of an SLC SSD will be the less costly alternative.
The Chuo University design is a thorough approach to improving SSD quality, one that helps counter the continuing decline in NAND endurance as flash migrates to ever-finer processes.