Computational Storage Hits the Mainstream

Chart showing two lines on a graph of performance vs. number of SSDs. With "Scale In" the performance is proportional to the number of SSDs. With a standard server the performance doesn't change. With 16 SSDs the performance is 4 times as much, and with 32 SSDs it's 8 times as much.

There’s an idea that has been kicking around for a number of years, and it now seems to be gaining traction.  The idea is to use the inherent smarts and high available bandwidth within an SSD to perform functions that would normally be done by a server’s processor, thereby reducing the load on the processor while minimizing the amount of data that needs to make a round trip from the SSD to the processor and back for some trivial function.

Such data movement is said to consume a measurable share of the world’s energy supply, and that is simply wasteful.

The SSD Guy first wrote about the idea in 2013, when Ed Doller, then at Micron, delivered a keynote presentation at the Flash Memory Summit that showed how the processors within SSDs could be reprogrammed to perform low-level tasks.  This allowed simple functions to be performed within the SSD itself, rather than making the data take a long excursion to the server processor and back so that the server could perform the same simple function.  Most importantly, the performance of his system scaled linearly: it was directly proportional to the number of smart SSDs you used.  This is no small achievement!

This post’s first graphic is Doller’s most compelling slide from that presentation.  It shows that performance in Micron’s proof-of-concept (PoC) study scaled linearly with the number of drives applied, and that it reached 8 times the server’s performance when using 32 drives, which is pretty impressive when you compare the tiny processors used in SSDs against the high-performance processors typical of servers.
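For a rough sense of what that linear scaling implies, here is a back-of-the-envelope sketch.  It assumes the two points described in the chart (16 SSDs at 4 times and 32 SSDs at 8 times the server's performance) lie on a straight line through the origin.  That is my reading of the slide rather than a figure Micron stated, and the function names are made up for illustration.

```python
# Points read from the chart: (number of SSDs, multiple of server performance).
CHART_POINTS = [(16, 4.0), (32, 8.0)]


def per_drive_contribution(points):
    """Per-SSD share of server performance, assuming a line through the origin."""
    ratios = [speedup / drives for drives, speedup in points]
    # Under linear scaling every point must give the same ratio.
    assert all(abs(r - ratios[0]) < 1e-9 for r in ratios), "points are not proportional"
    return ratios[0]


def relative_performance(drives, per_drive):
    """Projected performance, as a multiple of the server, for a given drive count."""
    return drives * per_drive


per_drive = per_drive_contribution(CHART_POINTS)
break_even_drives = 1 / per_drive  # drive count at which the SSDs match the server

print(per_drive)          # 0.25
print(break_even_drives)  # 4.0
```

Under that assumption each smart SSD contributes about a quarter of the server's performance on this workload, so four drives would match the server and everything beyond that is gain.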

We now zoom ahead to last week’s Persistent Memory and Computational Storage Summit hosted by the Storage Networking Industry Association (SNIA).  Eight years later, established vendors are explaining how they have worked together to produce a standard interface that lets computing systems use this new resource.  Applications can be written to use any of a variety of such devices, or even to perform their tasks with no such devices at all.
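The "works with or without such devices" property is worth a small illustration.  The sketch below is not the SNIA interface; every class and function name in it is hypothetical.  It only shows the general pattern of trying a capable device first and falling back to the host processor when none is present.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Routines the host can always run itself.
HOST_FUNCTIONS: Dict[str, Callable[[List[int]], List[int]]] = {
    "sort": sorted,
    "reverse": lambda data: list(reversed(data)),
}


class HypotheticalDevice:
    """Toy stand-in for a smart storage device; not a real driver or SNIA API."""

    def __init__(self, name: str, functions: Sequence[str]):
        self.name = name
        self._functions = set(functions)

    def supports(self, function: str) -> bool:
        return function in self._functions

    def run(self, function: str, data: List[int]) -> List[int]:
        # A real device would do this work next to the flash.  To keep the
        # sketch runnable we simply reuse the host routine.
        return HOST_FUNCTIONS[function](data)


def execute(function: str, data: List[int],
            devices: Sequence[HypotheticalDevice]) -> Tuple[List[int], str]:
    """Run `function` on the first device that supports it, else on the host."""
    for device in devices:
        if device.supports(function):
            return device.run(function, data), device.name
    return HOST_FUNCTIONS[function](data), "host"


drive = HypotheticalDevice("smart-ssd-0", ["sort"])
print(execute("sort", [3, 1, 2], [drive]))     # ([1, 2, 3], 'smart-ssd-0')
print(execute("sort", [3, 1, 2], []))          # ([1, 2, 3], 'host')
print(execute("reverse", [3, 1, 2], [drive]))  # ([2, 1, 3], 'host')
```

The application code calls `execute` the same way in all three cases, which is the point: the result is identical whether or not a suitable device is installed.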

The market has moved from being a novelty to becoming something sophisticated enough that it could change the structure of all future computing systems.

The people who created these standards even went so far as to give the technology a new name, since they were all previously using different names for their products.  This name is “Computational Storage”.  It simply means that a storage device can be given commands from a server, commands like “Sort” or “Compress” or “Encrypt”, that it can carry out internally without requiring the server’s processor to fetch the data, process it, then return it to where it originally resided.
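To make the difference in data movement concrete, here is a toy model of a “Compress” command.  It is only a sketch: `HypotheticalSmartDrive` is an in-memory stand-in I made up, not any vendor's API, and it counts only payload bytes crossing the bus while ignoring the (small) commands themselves.

```python
import zlib


class HypotheticalSmartDrive:
    """In-memory stand-in for a storage device that can also compute."""

    def __init__(self):
        self.blocks = {}         # key -> bytes held "inside" the drive
        self.bytes_over_bus = 0  # payload traffic between drive and host

    def read(self, key):
        data = self.blocks[key]
        self.bytes_over_bus += len(data)
        return data

    def write(self, key, data):
        self.bytes_over_bus += len(data)
        self.blocks[key] = data

    def compress_in_place(self, key):
        # Offloaded command: the data never leaves the drive.
        self.blocks[key] = zlib.compress(self.blocks[key])


def compress_on_host(drive, key):
    """Conventional path: fetch the data, compress it on the host, write it back."""
    drive.write(key, zlib.compress(drive.read(key)))


payload = b"sensor reading 42\n" * 10_000

conventional = HypotheticalSmartDrive()
conventional.blocks["log"] = payload  # preload without counting bus traffic
compress_on_host(conventional, "log")

offloaded = HypotheticalSmartDrive()
offloaded.blocks["log"] = payload
offloaded.compress_in_place("log")

print(conventional.bytes_over_bus)  # original size plus compressed size
print(offloaded.bytes_over_bus)     # 0
```

Both drives end up holding the same compressed bytes.  The only difference is how much data had to cross the bus to get there.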

Types of Computational Storage Devices

Since there were different vendors offering products under different names, it should not be a surprise to learn that they each had different ideas about how to do this job.  Computational Storage has gone by many names: Scale-In, In-Situ Processing, Compute to Data, In-Data Processing, No-Load Acceleration.  The architectures are equally diverse.  This made the job of standardization more difficult than many other standardization efforts.

Perhaps the greatest advance that this group made was to find a way to sort the different approaches into rational groups.  Computational Storage (CS) has now been broken into three fundamental categories, all of which fall under the umbrella term “Computational Storage Device” (CSx):

  • CSD: Computational Storage Drive.  CSDs resemble the first devices shown by Micron in 2013.  Either a microprocessor, an ASIC, or an FPGA within an otherwise-ordinary SSD performs the computation.  Three different CSD architectures appear in the figure below, courtesy of SNIA.  On the left the computation is performed by an FPGA within a bridge that manages a number of SSDs.  In the center the computation is performed by an FPGA that resides within an SSD itself.  On the right the computation is performed, as in Doller’s presentation, by the processor that normally manages the functions of the SSD.

Three side-by-side block diagrams explained in the text

  • CSP: Computational Storage Processor.  A CSP is the team leader of an array of SSDs, manipulating data within the storage array upon command from the server.  Think of this team as something similar to a CSD but as an array, rather than as a single SSD.  Today’s CSPs take advantage of the fact that a device other than the server processor can take charge of a PCIe/NVMe channel and tell NVMe SSDs what to do.  This is illustrated in the diagram below:

Processor, SSDs, and Host all tied to the same bus

  • CSA: Computational Storage Array.  This is an array of SSDs or of computational storage devices that performs tasks within the array.  The system is self-contained and self-managed.

Each does certain tasks better than others, but each is an example of a product that can offload significant busywork from the server processor.

Companies in this Space

  • NGD Systems is the only CS company that The SSD Guy has previously outlined in a blog post (shame on me!)  The company makes an ASIC-based CSD that has received good acceptance in hyperscale data centers.  The ASIC can be programmed by the user to perform any of myriad functions.
  • ScaleFlux also produces CSDs, especially FPGA-based CSDs with very high storage capacity, and has also met with great interest in the hyperscale community.  The FPGA can be programmed by the user.
  • Eideticom has taken a different approach by creating a CSP that is a controller for arrays of NVMe SSDs.  The host server simply issues a command to the CSP and knows that the data manipulation will be taken care of without intervention.  The programs executed within the CSP can be customized by the user.
  • IBM uses a computational storage architecture in its second-generation FlashCore Module.  This SSD includes an FPGA that compresses data to improve speed and endurance of the QLC flash it uses as media.  Although IBM doesn’t flag this product as a computational storage drive, it fits those criteria.  The FPGA is not programmable in this device.
  • Netint also manufactures a dedicated device: a video transcoder in the form of an NVMe SSD that compresses video files within the device itself.  Netint’s processor is not user programmable.
  • Samsung has a SmartSSD that is a user-programmable FPGA-based CSD.  The company has teamed up with Xilinx to produce a chipset that can be used to design a variety of CS devices.  Xilinx has done a lot to promote Samsung’s SmartSSD on its website.

The Xilinx FPGA used in Samsung’s SmartSSD is the same one that IBM uses in its new FlashCore Module, and Xilinx is promoting the product and support software for use by other companies.  This means that we can anticipate more CSPs, CSDs, and CSAs in the future.

Like Xilinx, Arm supplies technology that is used in computational storage devices, but Arm does not make CSx products itself.  Still, the company sees this as such an important technology that it provides numerous resources on a dedicated computational storage webpage.

SNIA Support

The Storage Networking Industry Association has been very supportive of this new effort.  SNIA’s Computational Storage Task Group has a solid 46 participating companies, including those mentioned above.

This team has already published a Computational Storage Architecture and Programming Model that pulls all of the vendors’ specifications into a consistent format.

An idealized system that uses all three kinds of computational storage appears in the block diagram below, also from SNIA:

Block diagram showing SSDs, a CSP, two CSDs, and a CSA tied to a bus full of processors

The boxes are, from left to right:

  1. Standard SSD-based storage
  2. A CSP that manages and computes the data in those standard SSDs
  3. A CSD in which the hosts load data directly to the storage medium
  4. Another kind of CSD which allows the hosts to load data either directly to the storage medium or through the CSD’s internal processor
  5. A CSA

Through SNIA’s standards, all of these devices could reside together within the same system without conflict, as shown in the diagram.


I often quip that the road to hell is paved with technologically-superior products.  While computational storage is a stunning technology, it will require a lot of work for it to become a critical component in widespread computing.  Fortunately, the vendors are all aware of this challenge and are prepared to fight to make computational storage an important part of tomorrow’s standard computing systems.

The formation of the SNIA task group is a prime example.  Rather than entering into blind competition, all leading computational storage vendors, plus some of their key suppliers and customers, have promised to cooperate in a way that should expand usage.

New ideas always take time to settle in, though, so it may be some years before these companies’ vision is shared by the computing community at large.  For the time being, I expect to see these companies continue to push forward to convert their vision into reality.

While Objective Analysis has not yet published a report that covers the computational storage market, we intend to rectify that in the course of 2021.  Watch for news of that on the SSD Guy blog, or feel free to contact me for an update.




