I have to admit that it’s embarrassing when The SSD Guy misses something important in the world of flash storage, but I only recently learned of a paper that Baidu, China’s leading search engine, presented at the ASPLOS conference a year ago. The paper details how Baidu changed the way they use flash to gain significant benefits over their original SSD-based systems.
After having deployed 300,000 standard SSDs over the preceding seven years, Baidu engineers looked for ways to achieve higher performance and more efficient use of the flash they were buying. Their approach was to strip the SSD of all functions that could be better performed by the host server, and to reconfigure the application software and operating system to make the best of flash’s idiosyncrasies.
You can only do this if you have control of both the system hardware and software.
The result was SDF, or “Software-Defined Flash”, a card that appears to be based on a Huawei PCIe SSD whose FPGAs were reprogrammed to remove several internal functions. The designers omitted the internal DRAM buffer, and the logic for garbage collection, inter-chip parity coding, and static wear leveling, while exposing all 44 internal NAND channels to the system software. With these changes SDF delivers high-throughput I/O with consistent low latency to highly concurrent applications.
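To make the idea of exposed channels more concrete, here is a minimal sketch of how host software might spread block writes across 44 independently addressable channels. Only the channel count comes from the paper as summarized above. The modulo placement policy, the function names, and the stand-in `write_block` are assumptions made for illustration, not Baidu's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CHANNELS = 44  # channel count reported for SDF


def channel_for(block_id):
    # Hypothetical placement policy: simple modulo striping across channels.
    return block_id % NUM_CHANNELS


def write_block(channel, block_id):
    # Stand-in for a real device write; it only reports where the block went.
    return (channel, block_id)


def dispatch(block_ids):
    # Allow up to NUM_CHANNELS requests in flight at once, so that
    # independent channels can all be kept busy by a concurrent workload.
    with ThreadPoolExecutor(max_workers=NUM_CHANNELS) as pool:
        futures = [pool.submit(write_block, channel_for(b), b) for b in block_ids]
        return [f.result() for f in futures]


placements = dispatch(range(88))
```

With 88 blocks, each of the 44 channels receives exactly two writes. A real system would also need to track which channels are busy; this sketch leaves that out.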
The host system software was redesigned to bypass the standard Linux I/O stack, cutting a 75μs delay to 2-4μs. Writes are now consolidated, so that random 4KB writes become 2MB streaming erase/write operations that perform 7-10 times faster.
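The write consolidation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Baidu's implementation: the 4KB and 2MB sizes come from the paper as summarized here, while the class name, the `flush_block` callback, and the in-memory index are hypothetical.

```python
PAGE_SIZE = 4 * 1024           # size of one small random write
BLOCK_SIZE = 2 * 1024 * 1024   # size of one streaming erase/write unit


class WriteConsolidator:
    """Buffers 4KB writes and flushes them as full 2MB blocks (illustrative)."""

    def __init__(self, flush_block):
        self._flush_block = flush_block  # hypothetical callback: (block_no, data)
        self._buffer = bytearray()
        self._index = {}                 # logical key -> (block number, byte offset)
        self._block_no = 0

    def write(self, key, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("this sketch only accepts 4KB writes")
        # Record where the data will live, then append it to the open block.
        self._index[key] = (self._block_no, len(self._buffer))
        self._buffer += data
        # Once a full 2MB block has accumulated, emit it as one large write.
        if len(self._buffer) == BLOCK_SIZE:
            self._flush_block(self._block_no, bytes(self._buffer))
            self._buffer.clear()
            self._block_no += 1

    def locate(self, key):
        return self._index[key]


flushed = []
wc = WriteConsolidator(lambda n, blk: flushed.append((n, len(blk))))
for i in range(1024):
    wc.write(i, bytes(PAGE_SIZE))
```

Since 512 writes of 4KB fill one 2MB block, the 1,024 small writes in the usage lines reach the device as just two large writes. The index is what lets a later read find data that was written in arrival order rather than at its logical address.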
This approach paid off: in its intended environment Baidu’s SDF delivers three times the I/O bandwidth at a 50% lower hardware cost per megabyte than the company’s original SSD-based system. The bandwidth improvements stem from the large writes and from the removal of the SSD’s internal garbage collection and static wear leveling. Most of the cost benefit comes from the fact that the SSD no longer needs overprovisioning once these functions have been eliminated.
As of the paper’s publication, over 3,000 of these boards had been deployed in production servers, with a daily data processing volume reaching dozens of petabytes.
The paper provides much more detail about the design, and benchmarks SDF against the original Huawei SSD for a number of test cases. At low string counts the standard SSD outperforms SDF, but at 16 or more strings the SDF provides significant benefits over a standard SSD.
Baidu’s approach might at first appear to borrow from Fusion-io’s unique architecture, since it moves card management to the host processor and bypasses the I/O stack. In fact, Fusion-io leverages host CPU and memory resources to reduce the design cost and complexity of the flash translation layer (FTL) hardware, while Baidu leaves only a highly simplified version of the FTL within the flash card. SDF is actually an application-driven hardware/software co-design with a different design principle, software/hardware architecture, and program interface.
I highly recommend this paper to anyone interested in optimizing flash use in highly concurrent systems where hardware and software can both be used to exploit flash’s benefits.