One of the SSD Guy’s favorite subjects is caching and SSDs. This is because I wrote a book on processor caches in the early 1990s, and the advent of SSD caches in storage systems hearkens back to the technology detailed in that book.
Caching works well whenever there are two layers in the memory hierarchy, since the fast, expensive layer can hold copies of data from the slow, inexpensive layer to accelerate the processor’s performance.
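To put rough numbers on that idea, here is a minimal sketch of the usual average-access-time calculation for a two-layer hierarchy. The function name and the latencies are hypothetical values chosen for illustration, not measurements, and the model is deliberately simplified: a hit costs the fast layer's time and a miss costs the slow layer's time.

```python
def effective_access_time(hit_rate, fast_time, slow_time):
    """Average access time for a two-layer hierarchy (simplified model).

    A hit is served at fast_time; a miss is served at slow_time.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * fast_time + (1.0 - hit_rate) * slow_time


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds, for illustration only.
    ssd_ms = 0.1
    hdd_ms = 10.0
    for hit_rate in (0.0, 0.5, 0.9, 0.99):
        avg = effective_access_time(hit_rate, ssd_ms, hdd_ms)
        print(f"hit rate {hit_rate:.0%}: {avg:.2f} ms average")
```

The point of the sketch is that the average is dominated by the misses, so the payoff from a cache depends almost entirely on how often the requested data is already in the fast layer.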
One interesting point is that the cache works best if left to its own devices to determine what data should be stored in the faster layer. Yes, people are always tempted to take control of the situation themselves, but it’s amazing how easy it is to overlook certain drains on computer performance that a relatively simple algorithm can find without human intervention.
The image for this post comes from the report The Enterprise SSD: Technologies & Markets, which can be ordered for direct download from the Objective Analysis website. The data for the chart was provided by IBM researchers, who compared the performance of database queries on an all-HDD system and an all-SSD system. These are the columns on the far left and far right of the chart, respectively. The all-SSD system performed at ten times the speed of the all-HDD system.
This team then configured a system with a full complement of HDDs and a smaller set of SSDs whose capacity was about 20% that of the HDDs. First, the researchers placed the files that they understood to be the most important into the SSDs, with all other data remaining in the HDDs. This is called “Manual Data Placement.” The second column from the right shows the performance of this approach. It is, quite respectably, twice as fast as the HDD-only system.
The researchers then used the same configuration, but allowed a relatively simple algorithm to determine which data to place in the SSDs and which to leave in the HDDs. Using this “Automatic Data Placement” approach they found that the system performed eight times as fast as the HDD-only system, or 80% as fast as the SSD-only system, all with only 20% of the system’s storage replicated on SSDs.
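The report does not spell out which algorithm the researchers used, so the following is only a sketch of one well-known simple policy, least recently used (LRU) replacement, standing in for “a relatively simple algorithm.” The class name `LRUTier`, the `simulate` helper, the block counts, and the skewed workload are all hypothetical and chosen purely for illustration.

```python
from collections import OrderedDict
import random


class LRUTier:
    """A fast tier that keeps the most recently used blocks (LRU policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> True, oldest entry first
        self.hits = 0
        self.misses = 0

    def access(self, block_id):
        """Record one access; return True on a hit, False on a miss."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.blocks[block_id] = True  # copy the block into the fast tier
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(total_blocks=1000, cache_fraction=0.2, accesses=50_000, seed=1):
    """Replay a hypothetical skewed workload against a small LRU tier."""
    rng = random.Random(seed)
    tier = LRUTier(capacity=int(total_blocks * cache_fraction))
    # Assumed workload: low-numbered blocks are requested far more often.
    weights = [1.0 / (rank + 1) for rank in range(total_blocks)]
    for block_id in rng.choices(range(total_blocks), weights=weights, k=accesses):
        tier.access(block_id)
    return tier.hit_rate()


if __name__ == "__main__":
    print(f"hit rate with a 20% fast tier: {simulate():.1%}")
```

Nothing in this sketch is told which blocks matter. It simply watches the accesses and keeps whatever was used most recently, which is the kind of hands-off behavior described above. On a skewed workload like the assumed one, a fast tier holding 20% of the blocks serves far more than 20% of the requests.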
Given the significantly higher cost of SSDs compared to HDDs, and with the understanding from The SSD Guy’s previous post that this gap is unlikely to close, it is only natural to anticipate that systems will probably move to small NAND caches (in the form of an SSD, a hybrid HDD, or some other format) coupled with a spacious HDD for general storage.