Flexible Data Placement Means Better SSDs

One of the most compelling keynotes at the Flash Memory Summit last August was presented by SSD controller maker FADU Technology and social media giant Meta (Facebook).  These two companies were advocating a new way of managing SSDs called "Flexible Data Placement."  The SSD Guy believes that these two companies' findings are pretty remarkable and well worth sharing.

Ross Stenfort from Meta presented a timeline showing that since 2019 Meta and Microsoft (Azure) have been working together to develop a data center SSD specification for the Open Compute Project (OCP).  The specification has since gone through a number of important updates, leading to the Datacenter NVMe SSD Version 2.5, to be released later this year, which is the first of these standards to include Flexible Data Placement (FDP).  (He shared a link to Version 2.0 of the specification.)

FADU demonstrated a controller that supports FDP at its booth at the Flash Memory Summit.  The company claims that this is the first SSD controller to support FDP.

Flexible Data Placement

Flexible Data Placement gives the host server more control over where data resides within the SSD.  The goal is to reduce write amplification to improve performance.  The benchmarks that Meta shared show that it certainly does that!

In his first benchmark chart Mr. Stenfort plotted three SSDs' write amplification as a function of time.  (Meta's charts don't label their horizontal time axes, but similar charts later in this post suggest that each chart covers a long period, probably a full 24-hour day.)

A write amplification factor (WAF) of 1.0 means that each server write results in a single write to the NAND flash chips inside the SSD.  This is good, because the flash in the SSD wears out with a lot of writes, so you want to have the lowest possible number of writes.  A WAF larger than one means that the SSD’s internal housekeeping (like garbage collection) creates extra NAND write traffic in addition to the writes requested by the host server.  FDP is intended to dramatically reduce the SSD’s WAF.

The blue line in the chart illustrates a standard SSD that is being given endless 64KB random writes.  You can see that the WAF increases the longer the SSD is being given this workload, asymptotically approaching a number slightly higher than 3.  Each host write at this point results in three internal NAND flash writes to the SSD, so the flash chips’ lifetime is reduced to 1/3rd of its normal amount.
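The lifetime arithmetic behind that "1/3rd" claim is simple enough to sketch.  Here's a small illustration with hypothetical drive numbers (my own figures, not from the talk) showing how WAF divides directly into a drive's endurance:

```python
# Illustrative sketch with hypothetical numbers (not from the keynote):
# how the write amplification factor (WAF) eats into drive endurance.
def drive_endurance_tbw(capacity_tb, pe_cycles, waf):
    """Host terabytes writable before the flash wears out.

    The NAND can absorb roughly capacity * P/E-cycles of internal
    writes, and each host write costs WAF internal NAND writes.
    """
    return capacity_tb * pe_cycles / waf

# A hypothetical 4TB drive rated for 3,000 program/erase cycles:
print(drive_endurance_tbw(4, 3000, waf=3.0))  # 4000.0 TBW at WAF 3
print(drive_endurance_tbw(4, 3000, waf=1.0))  # 12000.0 TBW at WAF 1 -- triple the life
```

The capacity, P/E-cycle rating, and function name here are made up for illustration; the point is only that endurance scales as 1/WAF.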

[Chart: Write Amplification Factor (WAF) vs. time, WAF axis from 0 to 4, time axis unlabeled, with "Better" at the bottom and "Worse" at the top.  A blue line for a standard SSD under 64KB random writes ramps quickly from a WAF of 1 and tapers off toward an asymptote slightly above 3.  A yellow line, labeled "Log Structured 8 Writers 64KB," ramps earlier and faster to a WAF of 2.4, then wavers around that level.  A red line, labeled "Log Structured 8 Writers 64KB with FDP," starts at 1 and stays there for the remainder of the chart.]

The most common approach to this issue is to add more NAND flash inside the SSD, and make the additional flash invisible to the user.  This is called overprovisioning, and it quite naturally increases the cost of the SSD.  Can the SSD’s WAF be reduced without increasing the price?
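To put a number on that cost, overprovisioning is usually quoted as the hidden flash expressed as a percentage of the user-visible capacity.  A quick sketch, with hypothetical capacities of my own choosing:

```python
def overprovisioning_pct(raw_tb, usable_tb):
    """Hidden extra flash as a percentage of the user-visible capacity."""
    return 100 * (raw_tb - usable_tb) / usable_tb

# Hypothetical example: a drive exposing 1.0TB built from 1.28TB of raw flash.
print(round(overprovisioning_pct(1.28, 1.0)))  # 28 -- i.e. 28% overprovisioning
```

That extra 28% of flash is pure added cost to the SSD maker, which is exactly why reducing WAF without it is attractive.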

He said that some clever designers might try to improve write amplification by converting the workload into something more SSD-friendly.  This can be done by modifying the software to turn the workload into multiple streams and use a more SSD-friendly data structure.  This appears in the yellow line, which shows data for a log-structured approach with eight serial writers.  Although the WAF rises earlier and faster in this case, it reaches a level of 2.2-2.3 and then stays there from then on, which is a whole lot better than the 3.0 level that the blue line reaches.

You may not have noticed, but there’s a red line that runs along the 1.0 WAF line towards the bottom of the chart.  This is the WAF that results from teaming that same log-structured eight-writer approach with FDP.  The result is that the SSD experiences far fewer extra writes, and can provide excellent performance with significantly less overprovisioning, while allowing the NAND chips to give nearly their entire wear budget to the host server.
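The mechanism behind that red line is easy to see in a toy model.  The sketch below is my own minimal flash-translation-layer simulation, not Meta's or FADU's code: when hot (frequently rewritten) and cold data share physical blocks, garbage collection must copy the still-valid cold pages forward, inflating WAF; steering each class of data to its own blocks, which is the essence of FDP's host-directed placement, lets hot blocks die clean.  All block counts, page sizes, and the 80/20 workload split are arbitrary choices for illustration.

```python
import random
from collections import defaultdict

PAGES_PER_BLOCK = 64

class ToySSD:
    """Minimal greedy-garbage-collection flash translation layer (a sketch)."""

    def __init__(self, num_blocks, num_streams):
        self.free = list(range(num_blocks))
        self.valid = defaultdict(set)   # physical block -> live logical pages
        self.loc = {}                   # logical page -> physical block
        self.fill = {}                  # block -> pages programmed so far
        self.open = {s: self._take() for s in range(num_streams)}
        self.gc_open = self._take()     # destination block for GC copies
        self.host_writes = 0
        self.nand_writes = 0

    def _take(self):
        block = self.free.pop()
        self.fill[block] = 0
        return block

    def _program(self, lp, block):
        self.fill[block] += 1
        self.nand_writes += 1
        if lp in self.loc:                      # old copy becomes garbage
            self.valid[self.loc[lp]].discard(lp)
        self.valid[block].add(lp)
        self.loc[lp] = block

    def write(self, lp, stream):
        self.host_writes += 1
        block = self.open[stream]
        self._program(lp, block)
        if self.fill[block] == PAGES_PER_BLOCK:  # block full: open a new one
            while len(self.free) <= 2:           # keep a reserve for GC
                self._gc()
            self.open[stream] = self._take()

    def _gc(self):
        """Erase the full block with the fewest live pages, copying those
        pages forward first -- the copies are the 'amplified' writes."""
        busy = set(self.open.values()) | {self.gc_open}
        victim = min((b for b in self.fill
                      if b not in busy and self.fill[b] == PAGES_PER_BLOCK),
                     key=lambda b: len(self.valid[b]))
        for lp in list(self.valid[victim]):
            if self.fill[self.gc_open] == PAGES_PER_BLOCK:
                self.gc_open = self._take()
            self._program(lp, self.gc_open)
        del self.fill[victim]                    # erase and reclaim the block
        self.valid.pop(victim, None)
        self.free.append(victim)

def run(separate_streams, writes=200_000, logical_pages=1_000, seed=1):
    rng = random.Random(seed)
    ssd = ToySSD(num_blocks=24, num_streams=2)
    hot = logical_pages // 5        # 20% of the pages take 80% of the writes
    for _ in range(writes):
        if rng.random() < 0.8:
            lp, stream = rng.randrange(hot), 0
        else:
            lp, stream = rng.randrange(hot, logical_pages), 1
        ssd.write(lp, stream if separate_streams else 0)
    return ssd.nand_writes / ssd.host_writes

print(f"mixed WAF:     {run(False):.2f}")   # everything funneled into one stream
print(f"separated WAF: {run(True):.2f}")    # hot/cold split, FDP-style: lower WAF
```

Running it, the mixed configuration shows a noticeably higher WAF than the separated one, which is the same qualitative gap Meta's chart shows between the yellow and red lines.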

That’s great for wear, and it keeps the SSD cheap, but what does it mean to performance?

Performance Improves, Too

Mr. Stenfort’s next chart shows the write throughput for the same tests, using the same color scheme.  As you would expect, the blue line for the standard SSD provides the worst long-term performance, slowly decaying from over 3,000MB/s to asymptotically approach 1,000MB/s.  This makes sense: If every host write triggers three internal NAND writes then the SSD should indeed be expected to slow to 1/3rd of its original speed.

[Chart: Write Throughput (MB/s) vs. time, throughput axis from 0 to 3,000, time axis unlabeled, same three lines as before.  The blue standard-SSD line starts slightly above 3,000MB/s and tapers off toward an asymptote at 1,000MB/s.  The yellow "Log Structured 8 Writers 64KB" line starts at 3,000 then drops earlier and faster (very suddenly) to a level that bounces rapidly and noisily between 1,500 and 2,000.  The red "Log Structured 8 Writers 64KB with FDP" line starts at 3,000 and stays there for the remainder of the chart.]

With this in mind it should come as no surprise that the yellow line for the software approach drops faster than the blue line, but it settles at a higher write speed, bouncing between 1,500 and 2,000MB/s over the long term, which is far better than the blue line's 1,000MB/s level!  Both lines show significantly more noise than their WAF counterparts, but this may stem from the way that their data was measured.

Finally, we have the red line.  The extra processing related to FDP appears to have incurred a very small penalty to the starting bandwidth, compared to the other two, but the performance remains very solidly at 3,000MB/s for the entire period of the chart.  In a system where bandwidth is the single most important parameter, this implies that one FDP SSD can do the work of three standard SSDs, which should lead to some very impressive cost savings.

Stenfort says that bandwidth isn't the only story: QoS (quality of service) and power consumption improve as well.  This makes sense, since write amplification often causes stalls, and that's a QoS issue, while the extra internal writes mean that the SSD consumes more power to do the same amount of work.  The noise in the yellow line is one likely indicator of QoS issues.

FADU’s Part of the Story

After Stenfort’s presentation FADU’s CEO and founder, Jihyo Lee, stepped in to tell about the company’s mission (low power) and to introduce FADU’s new FDP controller.  He shared benchmark charts very similar to Stenfort’s, but which showed slightly different but still extremely impressive, improvements.  These are FADU’s charts:

Graph similar to Meta’s Write Amplification Factor (WAF) vs. Time chart, but with only 2 lines instead of 3. The title above says: “Write amplification (4~128KB write)”. The Time axis runs from 0-35,000 seconds, and the vertical WAF axis runs from 0-3. A gray line labeled “non FDP” resembles Meta’s blue line, representing a standard SSD, and it starts at 1, then ramps asymptotically to approach a WAF of about 2.2. A maroon line, labeled “FDP" starts at 1 and stays right there for the remainder of the chart. A fat arrow points from the upper gray line to the lower maroon line and is labeled “Wear”.

Graph similar to Meta’s Write Throughput (MB/s) vs. Time chart. The title above says: “Write throughput (4~128KB write)”. The Time axis runs from 0-35,000 seconds, and the vertical throughput axis runs from 0-7,000MB/s. The gray line (labeled “Non FDP”) starts at 5,500, then drops very suddenly at around 2,000 seconds to a level that bounces rapidly and noisily between 1,000-2,500 for the rest of the chart. Its shape resembles the yellow line in the Meta chart, rather than the blue one. The maroon "FDP" line starts a little lower, at 5,000, and stays there with noise of about +/-100MB/s for the remainder of the chart.Mr. Lee said that these results were from the FDP SSD that FADU was demonstrating on the Flash Memory Summit show floor during the event.

Will FDP Be Broadly Used?

At this point readers are probably wondering if flexible data placement will become broadly used, taking over the bulk of the SSD market.  This isn’t likely, although its use in large data centers should become universal.  Here’s why:

For a system to take advantage of FDP the application programs need to understand what it is and how to use it.  In a closed system this is not difficult to manage, and hyperscale data centers frequently upgrade their applications to tune them to hardware changes.  Management looks at it like this: if they spend $1 million to change the software they can reduce their hardware costs by $3 million.  That makes the decision easy.

For off-the-shelf software the argument is different.  Should the software company spend $1 million to update the code?  Not only is there no obvious return to them for doing this, but it also increases the chance that new bugs will be introduced into the code.  Because of this, it often takes a decade or longer for something like FDP to gain widespread use.

Overall, The SSD Guy expects hyperscalers, at least Meta and Microsoft, if not others, to rapidly adopt FDP, but expects the technology to be slow to ramp in other markets.

My company, Objective Analysis, regularly strategizes about new technologies to help our clients develop their best plans for the future.  If you would like your company to thrive please contact us to explore ways that we can work together.

 

A video of the keynote can be found HERE.

 
