At last week’s International Solid-State Circuits Conference (ISSCC), Shuhei Tanakamaru, a researcher from Japan’s Chuo University, detailed a scheme that reduces MLC SSD bit error rates (BER) by a factor of 32 compared with conventional techniques. The approach used an impressive combination of mirroring, vertical and horizontal error correction, and a deep understanding of the kinds of bit errors flash is most likely to experience.
This is a novel and well-conceived technique that may find industry adoption in future SSDs.
The steps included in the paper are used in addition to the Continue reading “Extreme SSD Error Correction”
SNIA (The Storage Networking Industry Association) has conferred a great honor upon the SSD Guy by bringing all of the blog posts in the series How Controllers Maximize SSD Life into a single printed volume of the same name.
Readers can either request a print copy from SNIA or download a PDF version by visiting the SNIA SSSI (Solid State Storage Initiative) education web page.
During this month’s Storage Visions conference, SMART Storage Systems hosted a “NAND Band” party. The company kept the details secret until the guests were all there, after which two “Blues Brothers” impersonators (SMART’s president John Scaramuzzo and Rick Neff, Director of Business Development) showed up in a video singing their new rendition of the 1966 Spencer Davis Group hit “Gimme Some Lovin’.” SMART’s version was called “Gimme Some Endurance,” and the lyrics centered on the importance of endurance in SSDs.
(SMART’s NAND Band should not be confused with the techno band named NAND, which I only discovered while writing this post.)
The reception was held only a couple of hours after Continue reading “The NAND Band!”
This is a bad day for The SSD Guy. I just finished publishing an eight-part series explaining How Controllers Maximize SSD Life, then my evil twin The Memory Guy today published a post telling of a new flash design from Macronix that might just eliminate the flash wear-out mechanism!
But my concerns are inconsequential compared to the feelings of all those folks who have devoted phenomenal time and energy to develop wear management algorithms.
This all stems from an article in the IEEE Spectrum that details a flash chip design that Continue reading “SSDs that Don’t Wear Out”
Suppose you have used all of the other ways of managing SSD wear that we have discussed so far, but you find that this still isn’t enough. What do you do next? Well, a few SSD controllers go one step further and manage some of the inner workings of the NAND flash chip itself.
If that sounds like a significant undertaking to you, then you clearly understand why so very few controllers take this approach. The information used to perform this function is not generally available – it takes a special relationship with the NAND flash supplier – and you can’t develop this relationship unless the NAND supplier Continue reading “How Controllers Maximize SSD Life – Internal NAND Management”
One way that SSD controllers maximize the life of an SSD is to use feedback on the life of flash blocks to determine how wear has impacted them. Although this used to be very uncommon, it is now being incorporated into a number of controllers.
Here’s what this is all about: Everybody knows that endurance specifications tell how much life there is in a block, right? For SLC it is typically 100,000 erase/write cycles, and for MLC it can be as high as 10,000 cycles (for older processes) but goes down to 5,000 or even 3,000 for newer processes. TLC endurance can be in the hundreds of cycles. Now the question is: “What happens after that?”
In most cases individual bits start to Continue reading “How Controllers Maximize SSD Life – Feedback on Block Wear”
Over provisioning is one of the most common ways that SSD designers can help assure that an SSD has a longer life than the flash’s endurance rating would support. If an SSD contains more flash than is presented at its interface, the controller can manage wear across a larger number of blocks while at the same time accelerating disk performance by moving slow operations like block erases out of the way of the SSD’s key functions.
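To put a rough number on this, here is a minimal Python sketch. It assumes the commonly used definition of the over provisioning ratio (spare flash divided by the capacity presented at the interface) and assumes ideal wear leveling, so that erases spread evenly over every block. The function names and the example capacities are made up for illustration and do not come from any particular SSD.

```python
def over_provisioning_ratio(physical_gb: float, user_gb: float) -> float:
    """Spare flash as a fraction of the capacity presented at the interface.

    Assumption: the common definition (physical - user) / user.
    """
    if user_gb <= 0 or physical_gb < user_gb:
        raise ValueError("need 0 < user_gb <= physical_gb")
    return (physical_gb - user_gb) / user_gb


def average_erases_per_block(block_erases_needed: int, total_blocks: int) -> float:
    """Average erase count per block, assuming ideal wear leveling.

    With more physical blocks, the same workload is spread more thinly.
    """
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    return block_erases_needed / total_blocks


if __name__ == "__main__":
    # Hypothetical drive: 128 GB of flash, 100 GB presented to the host.
    print(over_provisioning_ratio(128, 100))
    # Same workload on 1,000 blocks versus 1,280 blocks.
    print(average_erases_per_block(1_000_000, 1_000))
    print(average_erases_per_block(1_000_000, 1_280))
```

The second function shows the wear side of the argument: the same number of block erases costs each block fewer cycles when the controller has more blocks to rotate through.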
Many people like to compare wear leveling to rotating a car’s tires. In this vein, think of over provisioning as having a bunch of spare Continue reading “How Controllers Maximize SSD Life – Over Provisioning”
Write amplification plays a critical role in maximizing an SSD’s usable life. The lower the write amplification, the longer the SSD will last. SSD architects pay special attention to this aspect of controller design.
Unlike the other factors described in this series, this is not a technique that extends flash life beyond the 10,000 erase/write cycles that one would normally expect to result in a failure, but it is very important to SSD longevity.
Write Amplification is sufficiently complex that I won’t try to define it in this post, but Continue reading “How Controllers Maximize SSD Life – Reduced Write Amplification”
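As a back-of-the-envelope illustration of why lower write amplification means longer life, the Python sketch below treats write amplification as the widely used ratio of bytes written to flash to bytes written by the host. It is a simplification that assumes ideal wear leveling and a fixed cycle rating, and the capacities and function names are hypothetical.

```python
def write_amplification(flash_bytes_written: float, host_bytes_written: float) -> float:
    """Ratio of data physically written to flash to data the host asked to write."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written


def lifetime_host_writes_tb(capacity_gb: float, rated_cycles: int, wa: float) -> float:
    """Rough total host writes (in TB) before the flash reaches its cycle rating.

    Assumptions: ideal wear leveling, and every block fails exactly at rated_cycles.
    """
    if wa <= 0:
        raise ValueError("write amplification must be positive")
    return capacity_gb * rated_cycles / wa / 1000


if __name__ == "__main__":
    # Hypothetical 256 GB drive built from flash rated at 10,000 cycles.
    for wa in (1.0, 2.0, 4.0):
        print(wa, lifetime_host_writes_tb(256, 10_000, wa))
```

Under these assumptions, halving the write amplification doubles the amount of data the host can write over the drive's life, even though the flash's own cycle rating never changes.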
There are more advanced means than simple error correction to help remove bit errors in NAND flash, and those will be the subject of this post. The general term for this approach is “DSP,” although it seems to have very little to do with the kind of DSP algorithm used to perform filtering or build modem chips.
While ECC corrects errors without knowing how they got there, DSP helps to correct any of the more predictable errors that are caused by internal error mechanisms that are inherent to the design of the chip. A prime example of such an error would be adjacent cell disturb.
Here’s a brief explanation of Continue reading “How Controllers Maximize SSD Life – Other Error Management”
Error correction (ECC) can have a very big impact on the longevity of an SSD, although few understand how such a standard item can make much difference to an SSD’s life. The SSD Guy will try to explain it in relatively simple terms here.
All NAND flash requires ECC to correct random bit errors (“soft” errors). This is because the inside of a NAND chip is very noisy and the signal levels of bits passed through a NAND string are very weak. One of the ways that NAND has been able to become the cheapest of all memories is by requiring error correction external to the chip.
This same error correction also helps to correct bit errors due to wear. Wear can cause bits to become stuck in one state or the other (a “hard” error), and it can increase the frequency of soft errors.
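To make the idea concrete, here is a toy Python sketch using a Hamming(7,4) code, which can correct any single flipped bit in a 7-bit codeword. This is only an illustration of the principle: real SSD controllers use far stronger codes such as BCH or LDPC over much larger blocks of data, and nothing here is taken from an actual controller design.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one wrong bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error was detected
    if error_position:
        c[error_position - 1] ^= 1  # flip the offending bit back
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    stored = hamming74_encode(data)

    # A "soft" error: noise flips one bit on a read.
    noisy = stored.copy()
    noisy[5] ^= 1
    print(hamming74_decode(noisy) == data)

    # A "hard" error: a worn cell is stuck at 0 no matter what was written.
    worn = stored.copy()
    worn[2] = 0
    print(hamming74_decode(worn) == data)
```

The decoder neither knows nor cares whether the wrong bit came from noise or from a worn-out cell, which is why the same correction logic handles both cases as long as the number of bad bits stays within what the code can correct.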
Although it is not widely Continue reading “How Controllers Maximize SSD Life – Improved ECC”