One of the most compelling keynotes at the Flash Memory Summit last August was presented by SSD controller maker FADU Technology and social media giant Meta (Facebook). These two companies were advocating a new way of managing SSDs called “Flexible Data Placement.” The SSD Guy believes that these two companies’ findings are pretty remarkable and well worth sharing.
Ross Stenfort from Meta presented a timeline showing that since 2019 Meta and Microsoft (Azure) have been working together to develop a data center SSD specification for the Open Compute Project (OCP). The specification has gone through a number of important updates, leading to the Datacenter NVMe SSD Version 2.5, to be released later this year, which is the first of these standards to include Flexible Data Placement (FDP). (He shared a link to Version 2.0 of the specification.)
FADU was demonstrating a controller that supports FDP in their booth at the Flash Memory Summit. The company claims that this is the first SSD controller to support FDP.
Flexible Data Placement
Flexible Data Placement gives the host server more control over where data resides within the SSD. The goal is to reduce write amplification to improve performance. The benchmarks that Meta shared show that it certainly does that!
In his first benchmark chart Mr. Stenfort plotted three SSDs’ write amplification as a function of time. (Although Meta doesn’t reveal the actual time scale on the charts’ horizontal axes, other charts later in this post indicate that each chart covers a long period, probably a full 24-hour day.)
A write amplification factor (WAF) of 1.0 means that each server write results in a single write to the NAND flash chips inside the SSD. This is good: NAND flash wears out with writes, so you want the lowest possible number of internal writes. A WAF greater than 1.0 means that the SSD’s internal housekeeping (like garbage collection) creates extra NAND write traffic in addition to the writes requested by the host server. FDP is intended to dramatically reduce the SSD’s WAF.
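In code form, the metric is just a ratio. Here is a minimal sketch; the function name and the byte counts are illustrative, not figures from Meta’s tests:

```python
def write_amplification_factor(nand_bytes_written: int,
                               host_bytes_written: int) -> float:
    """WAF = total bytes programmed into NAND / bytes the host asked to write."""
    return nand_bytes_written / host_bytes_written

# A drive whose garbage collection added 2 GB of internal copies
# while servicing 1 GB of host writes has a WAF of 3.0:
waf = write_amplification_factor(nand_bytes_written=3 * 2**30,
                                 host_bytes_written=1 * 2**30)
print(waf)  # 3.0
```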
The blue line in the chart illustrates a standard SSD that is being given endless 64KB random writes. You can see that the WAF increases the longer the SSD is being given this workload, asymptotically approaching a number slightly higher than 3. Each host write at this point results in three internal NAND flash writes to the SSD, so the flash chips’ lifetime is reduced to 1/3rd of its normal amount.
The most common approach to this issue is to add more NAND flash inside the SSD, and make the additional flash invisible to the user. This is called overprovisioning, and it quite naturally increases the cost of the SSD. Can the SSD’s WAF be reduced without increasing the price?
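Overprovisioning is usually quoted as the hidden spare capacity relative to the user-visible capacity. A quick sketch of the arithmetic, with hypothetical capacities rather than figures from the talk:

```python
def overprovisioning_ratio(physical_gb: float, user_visible_gb: float) -> float:
    """Spare capacity hidden from the host, as a fraction of user capacity."""
    return (physical_gb - user_visible_gb) / user_visible_gb

# A drive with 1,024 GB of raw NAND sold as a 960 GB SSD:
print(round(overprovisioning_ratio(1024, 960), 3))  # 0.067, i.e. about 7% OP
```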
He said that some clever designers might try to improve write amplification by converting the workload into something more SSD-friendly. This can be done by modifying the software to split the workload into multiple streams and use a faster data structure. This appears in the yellow line, which shows data for a log-structured approach with eight serial writers. Although the WAF rises earlier and faster in this case, it plateaus at 2.2-2.3, which is a whole lot better than the 3.0 level that the blue line reaches.
You may not have noticed, but there’s a red line that runs along the 1.0 WAF line towards the bottom of the chart. This is the WAF that results from teaming that same log-structured eight-writer approach with FDP. The result is that the SSD experiences far fewer extra writes, and can provide excellent performance with significantly less overprovisioning, while allowing the NAND chips to give nearly their entire wear budget to the host server.
That’s great for wear, and it keeps the SSD cheap, but what does it mean for performance?
Performance Improves, Too
Mr. Stenfort’s next chart shows the write throughput for the same tests, using the same color scheme. As you would expect, the blue line for the standard SSD provides the worst long-term performance, slowly decaying from over 3,000MB/s toward an asymptote of 1,000MB/s. This makes sense: If every host write triggers three internal NAND writes then the SSD should indeed be expected to slow to 1/3rd of its original speed.
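That “slow to one-third” logic follows directly from WAF: the NAND can only program so many bytes per second, and garbage-collection traffic competes with the host for that budget. A back-of-the-envelope sketch (the bandwidth figure is illustrative, not a measured spec):

```python
def host_bandwidth(nand_program_bw_mbps: float, waf: float) -> float:
    """Sustained host write bandwidth once GC traffic shares the NAND budget."""
    return nand_program_bw_mbps / waf

print(host_bandwidth(3000, 1.0))  # 3000.0 MB/s - FDP-like, no amplification
print(host_bandwidth(3000, 3.0))  # 1000.0 MB/s - standard SSD in steady state
```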
With this in mind it should come as no surprise that the yellow line for the software approach drops faster than the blue line, but settles at a higher write speed, jumping noisily between 1,500 and 2,000MB/s over the long term, which is far better than the blue line’s 1,000MB/s level! Both lines show significantly more noise than their WAF counterparts, but this may stem from the way their data was measured.
Finally, we have the red line. The extra processing related to FDP appears to have incurred a very small penalty to the starting bandwidth, compared to the other two, but the performance remains very solidly at 3,000MB/s for the entire period of the chart. In a system where bandwidth is the single most important parameter, this implies that one FDP SSD can do the work of three standard SSDs, which should lead to some very impressive cost savings.
Stenfort says that bandwidth isn’t the only story: QoS (quality of service) and power consumption improve as well. This makes sense, since write amplification often causes stalls, and stalls are a QoS issue, while reduced performance means that the SSD takes longer, and therefore consumes more energy, to do the same amount of work. The noise in the yellow line is one likely indicator of QoS issues.
FADU’s Part of the Story
After Stenfort’s presentation FADU’s CEO and founder, Jihyo Lee, stepped in to talk about the company’s mission (low power) and to introduce FADU’s new FDP controller. He shared benchmark charts very similar to Stenfort’s, which showed slightly different, but still extremely impressive, improvements. These are FADU’s charts:
Will FDP Be Broadly Used?
At this point readers are probably wondering if flexible data placement will become broadly used, taking over the bulk of the SSD market. This isn’t likely, although its use in large data centers should become universal. Here’s why:
For a system to take advantage of FDP the application programs need to understand what it is and how to use it. In a closed system this is not difficult to manage, and hyperscale data centers frequently upgrade their applications to tune them to hardware changes. Management looks at it like this: if they spend $1 million to change the software they can reduce their hardware costs by $3 million. That makes the decision easy.
For off-the-shelf software the argument is different. Should the software company spend $1 million to update the code? Not only is there no obvious return to them for doing this, but it also increases the chance that new bugs will be introduced into the code. Because of this, it often takes a decade or longer for something like FDP to gain widespread use.
Overall, The SSD Guy expects hyperscalers, at least Meta and Microsoft, if not others, to adopt FDP rapidly, but the technology will be slow to ramp in other markets.
My company, Objective Analysis, regularly strategizes about new technologies to help our clients develop their best plans for the future. If you would like your company to thrive please contact us to explore ways that we can work together.
A video of the keynote can be found HERE.