Extreme ECC Enables Big SSD Advances

A new and highly efficient error correction scheme has recently been revealed by a joint university research team.  The SSD Guy has learned that this largely overlooked research, performed by a cross-university team from the University of North by Northeast Wales in the UK (UN-NeW) and Poland’s Trzetrzelewska University, could bring great economies to SSD manufacturers and all-flash array (AFA) companies.

Dr. Peter Llanfairpullguryngyllgogeryohuryrndrodullllantysiliogogogoch of UN-NeW, who generally shortens his name to Llanfairpullguryngyll, and Dr. Agnieszka Włotrzewiszczykowycki of Trzetrzelewska University have determined that today’s standard ECC engines can be dramatically improved upon, both increasing the available storage for a given price and accelerating throughput.  This is achieved through the use of new and highly complex algorithms that differ radically from current ECC approaches, which are simply linear improvements upon past algorithms.

According to Dr. Włotrzewiszczykowycki: “The beauty of semiconductors is that Moore’s Law not only allows memory makers to squeeze more bits into a NAND flash chip, but it also supports ever-increasing numbers of transistors on the controller.  Today’s SSD controllers can economically perform tasks that we could only dream of as recently as five years ago.”

This means that the controller chip can double in sophistication every year or two.  Now that SSDs have been in high volume for over a decade, a controller at a given price can naturally contain 1-2 orders of magnitude more transistors than its early predecessors.  Given that there is little need for this level of improvement in other parts of the SSD controller, a growing percentage of those transistors can be made available to the ECC engine, enabling the number of transistors used in the ECC engine to grow by three or even four orders of magnitude.

Llanfairpullguryngyll and Włotrzewiszczykowycki pondered how to take advantage of this windfall.  They decided to take ECC to its logical extreme and, for the past five years, have devoted their research teams’ efforts to devising the most powerful ECC algorithms ever conceived.

According to Dr. Llanfairpullguryngyll: “The math behind this algorithm consumes tens of pages of equations.”  In the interests of brevity, those equations will not be included in this blog post.  We can, however, look at some of the newer techniques these universities are championing.

One helpful approach is to correct blocks rather than pages, since the number of correction bits required is proportional to the logarithm of the size of the data set to be corrected (i.e., for this algorithm it only takes one more bit to correct the same number of errors in a data set of twice the size).  This finding has led the researchers to take this approach one step further, abandoning the block sizes of the NAND chips and instead combining several of the flash chip’s blocks into “Super Blocks” which support even higher levels of correction.

By merging blocks in this way, the number of available correction bits increases linearly with the size of the super block, yet the correction bit requirement only increases logarithmically.  This naturally leads to a situation where much larger numbers of data bits can be corrected than can be achieved when using more conventional methods.
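Taking the researchers’ claim at face value, the arithmetic can be sketched roughly as follows.  This is only a toy illustration and not their algorithm: the function names and the 64-bit baseline requirement are hypothetical, and the 16K-byte page with 2K ECC bytes is simply the typical layout this post cites for NAND flash.

```python
import math

PAGE_DATA_BYTES = 16 * 1024   # typical page size cited in this post
PAGE_ECC_BYTES = 2 * 1024     # typical ECC (spare) area per page
BASE_REQUIRED_BITS = 64       # hypothetical requirement for a single page


def available_ecc_bits(pages: int) -> int:
    """Spare-area bits grow linearly with the number of pages merged."""
    return pages * PAGE_ECC_BYTES * 8


def required_ecc_bits(pages: int) -> int:
    """Claimed requirement: one extra bit each time the data set doubles."""
    return BASE_REQUIRED_BITS + math.ceil(math.log2(pages))


def surplus_table(page_counts):
    """Return (pages, available, required, surplus) per super block size."""
    rows = []
    for pages in page_counts:
        avail = available_ecc_bits(pages)
        need = required_ecc_bits(pages)
        rows.append((pages, avail, need, avail - need))
    return rows


if __name__ == "__main__":
    for pages, avail, need, surplus in surplus_table([1, 2, 256, 65536]):
        print(f"{pages:>6} pages: {avail:>11} available, "
              f"{need:>3} required, {surplus:>11} surplus")
```

Under these assumptions the surplus widens with every doubling of the super block, which is the gap the researchers propose to exploit.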

Says Trzetrzelewska’s Włotrzewiszczykowycki: “Of course, when taken to the extreme, this involves dumping the entire contents of the SSD’s NAND chips into a DRAM for error correction.  Since today’s largest commonly-available flash chips contain 128Gb while DRAMs top out at 8Gb this means that we need to have 16 DRAMs for every NAND chip in the SSD.  This may offset the cost savings of the higher level of error correction.”
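The DRAM count in that quote is easy to check.  The sketch below uses only the two chip densities quoted above; the function name and its parameters are made up for illustration.

```python
NAND_CHIP_GBIT = 128  # largest commonly available flash chip, per the quote
DRAM_CHIP_GBIT = 8    # largest DRAM chip, per the quote


def dram_chips_needed(nand_chips: int,
                      nand_gbit: int = NAND_CHIP_GBIT,
                      dram_gbit: int = DRAM_CHIP_GBIT) -> int:
    """DRAM chips needed to hold the full contents of the NAND chips."""
    total_gbit = nand_chips * nand_gbit
    # Ceiling division without floating point.
    return -(-total_gbit // dram_gbit)


if __name__ == "__main__":
    print(dram_chips_needed(1))   # DRAM chips for a single NAND chip
    print(dram_chips_needed(8))   # DRAM chips for a hypothetical 8-chip SSD
```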

Even if a smaller complement of DRAM is used, one benefit of this approach is that more failed bits can be corrected, prolonging the useful life of each flash chip and enabling the SSD designer to use less costly but more error-prone NAND chips.

Another novel approach is the teams’ use of the ECC bits that accompany each page for data storage.  Although a typical NAND flash chip might support ECC by teaming each 16K-byte page with about 2K ECC bytes, there is no hard-and-fast rule requiring the ECC bytes to be used in this way.  Since the use of super blocks provides an overabundance of ECC bits, the professors have chosen to repurpose some of the NAND chips’ ECC bits, allowing designers to create pages of unusual sizes like 17K bytes.  This, in turn, leads to unusual SSD capacities.  Włotrzewiszczykowycki comments that: “Bits are bits.  It doesn’t matter where they come from.  We will use them the best way we can.”
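To see where a 17K-byte page could come from, here is a minimal sketch of the bookkeeping.  It assumes the 16K-byte page and 2K-byte ECC area mentioned above; the helper names are hypothetical, and how many ECC bytes would really be reclaimed is not something the researchers have spelled out.

```python
PAGE_DATA_BYTES = 16 * 1024  # nominal data area per page
PAGE_ECC_BYTES = 2 * 1024    # nominal ECC (spare) area per page


def repurposed_page(reclaimed_ecc_bytes: int) -> tuple[int, int]:
    """Return (data_bytes, ecc_bytes) after moving ECC bytes to data use."""
    if not 0 <= reclaimed_ecc_bytes <= PAGE_ECC_BYTES:
        raise ValueError("cannot reclaim more ECC bytes than the page has")
    return (PAGE_DATA_BYTES + reclaimed_ecc_bytes,
            PAGE_ECC_BYTES - reclaimed_ecc_bytes)


def capacity_gain_percent(reclaimed_ecc_bytes: int) -> float:
    """Extra user capacity relative to the nominal page size."""
    return 100 * reclaimed_ecc_bytes / PAGE_DATA_BYTES


if __name__ == "__main__":
    data, ecc = repurposed_page(1024)  # reclaim half of the ECC area
    print(data // 1024, "K data bytes,", ecc // 1024, "K ECC bytes")
    print(capacity_gain_percent(1024), "percent more capacity")
```

Reclaiming 1K of the 2K ECC bytes gives the 17K-byte page mentioned above, which is why the resulting SSD capacities no longer land on familiar round numbers.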

Włotrzewiszczykowycki and Llanfairpullguryngyll have a vision for the future of this approach, which they are naming after themselves.  With the ballooning number of ECC transistors that will be available in a few years, the number of correctable bits should approach, and eventually match, the total number of bits in the NAND flash chips.  According to Dr. Llanfairpullguryngyll: “The sophistication of this kind of ECC is growing faster than the number of failed bits on the NAND flash chips, allowing better and better SSDs to be made from ever-worsening NAND flash chips.”

This means that ECC will improve at such a rapid rate that, over the long run, SSD controllers will be able to pull good data out of non-functioning NAND chips, removing any need to include any NAND flash in an SSD.

Happy April Fool’s Day from The SSD Guy!

