SSDs Need Controllers with More, NO! Less Power

The Storage Developer Conference in September gave a rare glimpse into two very different directions that SSD architectures are pursuing.  While some of the conference’s presentations touted SSDs with increasing processing power (Eideticom, NGD, Samsung, and ScaleFlux), other presentations advocated moving processing power out of the SSD and into the host server (Alibaba, CNEX, and Western Digital).

Why would either of these make sense?

A standard SSD has very high internal bandwidth that encounters a bottleneck when data is forced through a much narrower host interface.  It’s easy to see that an SSD with 20 or more NAND chips, each with an 8-bit interface, could access 160 or more bits simultaneously.  Since there’s already a processor inside the SSD, why not open it to external programming so that it can perform certain tasks within the SSD itself and harness all of that bandwidth?
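As a rough back-of-the-envelope illustration (the figures below are assumptions chosen for the sake of the example, not the specifications of any particular drive), the aggregate bandwidth of that many flash channels comfortably exceeds what a typical host interface can carry:

    #include <stdio.h>

    /* Back-of-the-envelope comparison of an SSD's internal flash bandwidth
     * against its host interface.  All figures are illustrative assumptions,
     * not the specifications of any particular product. */
    int main(void)
    {
        const double chips          = 20.0;   /* NAND chips, one 8-bit bus each   */
        const double mtransfers_s   = 800.0;  /* assumed per-chip transfer rate   */
        const double bytes_per_xfer = 1.0;    /* an 8-bit bus moves 1 byte/cycle  */

        double internal_gb_s = chips * mtransfers_s * bytes_per_xfer / 1000.0;
        double host_gb_s     = 3.9;           /* roughly a PCIe Gen3 x4 link      */

        printf("Aggregate internal flash bandwidth: ~%.0f GB/s\n", internal_gb_s);
        printf("Host interface bandwidth:           ~%.1f GB/s\n", host_gb_s);
        printf("Ratio:                              ~%.0fx\n", internal_gb_s / host_gb_s);
        return 0;
    }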

Example tasks include sorts and other database operations, compression and deduplication, encryption, and so forth: tasks that don’t require much processing power, but that move a lot of data out of, and back into, the storage medium.
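To see why that trade pays off, consider a simple filter over a large table stored on the drive.  A conventional approach pulls every block across the interface so the host can examine it; a Computational Storage approach sends a small command down to the drive and gets back only the matching records.  The sketch below is conceptual only: csd_submit_filter() is a hypothetical placeholder for whatever vendor-specific command set a given drive actually exposes.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical vendor call -- an assumption, not a real API: asks the drive
     * to scan an LBA range internally and return only the matching records.   */
    extern long csd_submit_filter(int fd, uint64_t start_lba, uint64_t num_blocks,
                                  const void *predicate, size_t predicate_bytes,
                                  void *results, size_t results_capacity);

    /* Host-side filtering: the entire table crosses the SSD interface. */
    static long filter_on_host(const uint8_t *table, size_t table_bytes,
                               size_t row_bytes, int (*match)(const uint8_t *),
                               uint8_t *results)
    {
        long hits = 0;
        for (size_t off = 0; off + row_bytes <= table_bytes; off += row_bytes) {
            if (match(table + off)) {
                memcpy(results + (size_t)hits * row_bytes, table + off, row_bytes);
                hits++;
            }
        }
        return hits;  /* cost: table_bytes of bus traffic for hits*row_bytes of answers */
    }

    /* In-drive filtering: only a small predicate goes down, only hits come back. */
    static long filter_in_drive(int fd, uint64_t start_lba, uint64_t num_blocks,
                                const void *predicate, size_t predicate_bytes,
                                void *results, size_t results_capacity)
    {
        return csd_submit_filter(fd, start_lba, num_blocks,
                                 predicate, predicate_bytes,
                                 results, results_capacity);
    }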

Micron presented a proof-of-concept model of this at the Flash Memory Summit five years ago, in 2013, but the idea appears to have been considered for some time prior to that, with examples built out of everything from specially designed small computing systems (Carnegie Mellon University’s FAWN, the Fast Array of Wimpy Nodes) to conventional flash SD cards with modified internal firmware.

While some advocates of this approach have called this “In-Situ Processing,” others dislike the fact that this same term can legitimately be used for in-memory processors like Micron’s Automata Processor (recently spun out to Natural Intelligence Semiconductor) or the Venray TOMI.  To distinguish the two from each other they have devised a newer, more specific term for smarter SSDs: “Computational Storage.”  (The SSD Guy appreciates this, too, since TOMI and Automata fit into The Memory Guy blog, while Computational Storage fits into The SSD Guy blog!)

In the Computational Storage community there’s a difference of opinion about how much computing power belongs inside an SSD.  The Micron example used the hardware that the SSD already contained.  SSDs from NGD and ScaleFlux use more powerful processors and boost the available DRAM.  The SSD that Samsung presented at SDC used a controller that was not significantly more complex than a standard one, but that was custom-designed to manage key-value pairs.  Eideticom’s controller is very powerful and actually sits outside of the SSD, but within a JBOF, or “Just a Bunch Of Flash,” array of NVMe SSDs, controlling any or all of the JBOF’s SSDs without the intervention of the server’s CPU.  Even a single Computational Storage SSD provides an important performance improvement in all of these systems, and the boost scales proportionally as SSDs are added: five SSDs give five times the improvement, and twenty SSDs give twenty times the boost.

A completely opposite school of thought was expressed at SDC by Alibaba, CNEX, and Western Digital.  These companies argued that the processing normally done in an SSD might be better managed by the host server’s CPU.  There’s validity to this argument as well.

SSD housekeeping functions are performed asynchronously to the server’s application program, and this can create troublesome conflicts.  An SSD that encounters too much write traffic might start a garbage collection process that cannot be interrupted, right at a moment when the host requires immediate access.  SNIA’s SSD Performance Test Specification was specifically designed to trigger such events and to measure SSD performance across different write workloads so that system architects would understand when and how slowdowns might occur.
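The effect is easy to observe in roughly the way the PTS does: hammer the drive with writes and watch for latency outliers as background garbage collection kicks in.  Here is a minimal, illustrative probe (Linux, direct 4 KiB writes to a scratch device you can afford to overwrite); the 10 ms threshold is an arbitrary assumption, not part of any specification:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    /* Rough latency probe: sustained 4 KiB direct writes, flagging outliers
     * that are typically the signature of background garbage collection.
     * WARNING: this overwrites the target device -- use a scratch drive. */
    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s /dev/<scratch-device>\n", argv[0]); return 1; }

        int fd = open(argv[1], O_WRONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, 4096)) return 1;
        memset(buf, 0xA5, 4096);

        for (long i = 0; i < 1000000; i++) {
            struct timespec t0, t1;
            off_t off = (off_t)(rand() % 1000000) * 4096;   /* random 4 KiB writes */

            clock_gettime(CLOCK_MONOTONIC, &t0);
            if (pwrite(fd, buf, 4096, off) != 4096) { perror("pwrite"); break; }
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
            if (us > 10000.0)   /* >10 ms for a 4 KiB write: likely a GC stall */
                printf("write %ld stalled for %.1f ms\n", i, us / 1000.0);
        }
        close(fd);
        return 0;
    }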

A lot of effort has gone into addressing this problem, including the Trim command, which lets the host tell the SSD which blocks are obsolete, and other controls that can be used to disallow garbage collection at specific times.  Baidu did some groundbreaking work in moving control of a number of basic SSD functions out of the drive so that application programs could control exactly when an SSD performs its housekeeping.  This lets the application program choose the least painful time for such functions.  It only works in systems where the hardware and software can be designed to complement each other, a luxury available to few entities other than hyperscale Internet firms.
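On Linux, for example, the Trim hint can be handed to the drive over any block range the host knows to be obsolete.  A minimal sketch using the standard BLKDISCARD ioctl looks like this (the data in the range is gone afterwards, so only use ranges you know are unused):

    #include <fcntl.h>
    #include <linux/fs.h>      /* BLKDISCARD */
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Tell the SSD that a byte range is obsolete so it need not copy that
     * data forward during garbage collection. */
    int trim_range(const char *dev, uint64_t offset, uint64_t length)
    {
        int fd = open(dev, O_WRONLY);
        if (fd < 0) { perror("open"); return -1; }

        uint64_t range[2] = { offset, length };   /* {start, length} in bytes */
        int rc = ioctl(fd, BLKDISCARD, &range);
        if (rc < 0) perror("ioctl(BLKDISCARD)");

        close(fd);
        return rc;
    }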

Like the other approach, this one takes advantage of the higher bandwidth within the SSD, but in a different way: the application program can be crafted to match the number of internal channels in the SSD.  SSDs typically have multiple internal flash channels (often 4 to 32, though some have more or fewer).  If the number of concurrent processes in the application program matches the number of flash channels, and each I/O stream in the application is assigned its own channel, the performance of the entire system can be optimized.
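As a sketch of that stream-per-channel idea (the channel count below is an assumption, and the step of actually steering each stream to its own channel through whatever channel-aware interface the drive exposes is glossed over), an application might simply run one worker per flash channel:

    #include <pthread.h>
    #include <stdio.h>

    /* Illustrative only: one application I/O stream per SSD flash channel.
     * CHANNELS is assumed; a real host would query the drive's geometry and
     * direct each stream's I/O to its own parallel unit. */
    #define CHANNELS 8

    static void *stream_worker(void *arg)
    {
        long channel = (long)arg;
        /* ... issue this stream's reads and writes against the ranges that
         *     map to `channel`, so the streams never contend for a channel ... */
        printf("stream %ld bound to flash channel %ld\n", channel, channel);
        return NULL;
    }

    int main(void)
    {
        pthread_t workers[CHANNELS];

        for (long c = 0; c < CHANNELS; c++)
            pthread_create(&workers[c], NULL, stream_worker, (void *)c);
        for (long c = 0; c < CHANNELS; c++)
            pthread_join(workers[c], NULL);

        return 0;
    }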

This approach is generally known as the Open-Channel SSD.  It is favored by hyperscale companies, not only because they have the ability to optimize their applications around specific hardware, but also because it reduces the cost of the SSD controller, which matters greatly to companies that deploy millions of new SSDs every year.

So the question now is, can there be a market for both?

Well, I already explained why the Open-Channel SSD would appeal to hyperscale data centers, and why it may not be interesting to companies that don’t control their application software but instead purchase it from companies like Oracle, SAP, or Microsoft.  I didn’t, however, explain the market for Computational Storage.

The Computational Storage approach also requires some change to the application software, but this will be supported by libraries supplied by the SSD manufacturers.  The libraries provide calls that invoke the SSD’s help as an alternative to performing certain functions on the host processor.  This approach to streamlining the application is significantly simpler than dividing the application into multiple streams.  While there’s no standard way to do this yet, leading advocates of Computational Storage have joined forces to create standards that will allow a hardware-plus-software ecosystem to develop around this new architecture.  Even without this support, many eager buyers are already using these companies’ products and are willing to work with non-standard hardware to gain the competitive edge that they offer.
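In practice such a library makes the offload look like an ordinary host function: the application calls, say, a sort routine, and the library decides whether to push the work down to the drive or fall back to the host CPU.  Everything in the sketch below, including the csx_ names, is hypothetical and is only meant to show the general shape of such a call:

    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical vendor-library calls -- the csx_ names are illustrative
     * placeholders, not any shipping API. */
    #define CSX_OP_SORT 1
    extern int csx_device_supports(int fd, int opcode);
    extern int csx_sort_records(int fd, unsigned long long start_lba,
                                unsigned long long num_blocks, size_t record_bytes);

    /* Placeholder comparator for the host-side fallback path. */
    static int compare_records(const void *a, const void *b)
    {
        (void)a; (void)b;
        return 0;
    }

    /* Sort a table that lives on the drive: offload to the SSD when it offers
     * the capability, otherwise fall back to sorting a host-side copy with
     * qsort(), exactly as the unmodified application would have done. */
    int sort_table(int fd, unsigned long long start_lba, unsigned long long num_blocks,
                   size_t record_bytes, void *host_copy, size_t num_records)
    {
        if (csx_device_supports(fd, CSX_OP_SORT))
            return csx_sort_records(fd, start_lba, num_blocks, record_bytes);

        /* Fallback: the data has already crossed the interface into host_copy. */
        qsort(host_copy, num_records, record_bytes, compare_records);
        return 0;
    }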

Over the long run it would seem that three very different SSD markets will coexist: one for standard SSDs in general-purpose computing and PCs, one for Computational Storage in competitive enterprise applications, and one for Open-Channel SSDs in hyperscale data centers.

Objective Analysis will closely follow these markets over the coming years and will help our clients benefit from a deeper understanding of the way they behave.  Please visit our website (https://Objective-Analysis.com) to learn how your company can benefit from information related to your market.

4 thoughts on “SSDs Need Controllers with More, NO! Less Power”

  1. I was wondering if you had noticed what Apple has done with their flash drives in the iMac Pro that was released in 2018? Instead of using a common flash drive and inserting it into an M.2 PCIe 3 slot, Apple has used two controllerless flash modules of 2 TB each. Apple then decided to use their T2 SoC as an external controller, giving it the added responsibility of encryption and speed optimization.

    I realize Apple is a consumer-based company, but I thought that was potentially a very smart way to approach part of the balancing act between the CPU, RAM, flash drives, and controllers.

    If Apple wanted to, I suppose the T2 could act as a RAID controller (in the way SSDs are used in striped RAIDs), besides its encryption tasks. I suppose, depending on exactly where the T2 is located in relation to memory and the CPU, Apple could dramatically speed up how data flows between vital components, and ultimately the CPU.

    What do you think of this?

    1. Steve,

      I didn’t know anything about this so I did some checking with people who know a lot more than I do. This doesn’t include Apple, because I find it nearly impossible to learn from either Apple or Apple employees anything except what Apple specifically wants me to know, which is usually not much.

      One Open Channel SSD (OCSSD) expert told me that Apple uses a proprietary architecture that is not OCSSD but that has similarities.

      It’s smart to use the system’s host processor to manage NAND flash, but there are many people who don’t like the idea of diverting host processor resources to flash management.

      You’re probably right about using the T2 as a RAID controller. I suspect that a carefully-configured system could get a lot more speed out of the flash/DRAM duo than most current systems do.

      Thanks for the comment,

      Jim
