This post is a continuation of a four-part series in The SSD Guy blog to help explain Intel’s two recently announced modes of accessing its Optane DIMM, formally known as the “Intel Optane DC Persistent Memory.”
App Direct Mode
Intel’s App Direct Mode is the more interesting of the two Optane operating modes since it supports in-memory persistence, which opens up a new and different approach to improving the performance of tomorrow’s standard software. While today’s software operates under the assumption that data can only be persistent if it is written to slow storage (SSDs, HDDs, the cloud, etc.), Optane under App Direct Mode allows data to persist at memory speeds, as do other nonvolatile memories like NVDIMMs under the SNIA NVM Programming Model.
App Direct Mode implements the full SNIA NVM Programming Model described in an earlier SSD Guy post and allows software to be written to use Optane as a small pool of extraordinarily fast storage. I have put a copy of the diagram I used to explain the SNIA NVM Programming Model just below the following paragraph.
Applications software can access the Optane DIMM by using standard operating system storage calls through the file system the same way that slower storage devices like SSDs and HDDs are accessed. (This is represented by the solid vertical arrows towards the left of the diagram.) Accessing Optane through the file system will provide an important speed boost to applications, but the DDR hardware interface on an Optane DIMM provides a whole lot more speed than this approach can support, simply because the software stack for the file system gets in the way. File system calls have been developed over a long period to support a very wide variety of storage devices, having originally been written with HDDs in mind, so they weren’t tuned for speed until recently. That recent tuning was put in place to help support SSDs, but SSDs are still about three orders of magnitude slower than persistent memory (Optane or NVDIMM-Ns) so file system calls are not the fastest way to access persistent data in an Optane DIMM.
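To make the two paths concrete, here is a minimal sketch of the file-system route in C, using ordinary POSIX calls. The path /mnt/pmem/log.dat is a hypothetical file on a file system backed by an Optane DIMM; note that the code is indistinguishable from code written for an SSD or HDD, which is exactly the point.

```c
/* File-system path: a persistent-memory-backed file accessed through
 * standard storage calls. The path is hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char record[] = "transaction #1";

    /* Open (or create) the file exactly as on an SSD or HDD. */
    int fd = open("/mnt/pmem/log.dat", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Each write traverses the kernel's full file-system stack. */
    if (write(fd, record, sizeof record) < 0) { perror("write"); return 1; }

    /* Durability still requires an explicit flush. */
    if (fsync(fd) < 0) { perror("fsync"); return 1; }

    close(fd);
    return 0;
}
```

Every call above drops into the kernel, and that software overhead, not the Optane media itself, dominates the access time.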
The real speed benefit is realized when software accesses Optane as a memory, reading and writing persistent data the same way that it would read or write data to a DRAM DIMM. This is represented by the vertical dashed line at the far right side of the diagram, and is often referred to as “DAX Mode” for “Direct Access.” When the Optane DIMM is treated as memory, data can be written to it in less than a microsecond. This requires the application program to understand that the system may have different kinds of memory – persistent and volatile – and that makes things more complicated for the program. This architecture has been given the name “NUMA” (Non-Uniform Memory Access) to distinguish it from the more conventional DRAM-only systems to which PC and server programmers are accustomed.
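Here, by contrast, is a minimal sketch of the DAX path under the same assumptions (the same hypothetical /mnt/pmem/log.dat, now on a DAX-mounted file system). Production code would more likely use a persistent-memory library such as PMDK’s libpmem, but plain mmap() and msync() are enough to show the idea: once the file is mapped, persistent data is written with ordinary CPU stores.

```c
/* DAX path: the persistent file is memory-mapped and written with plain
 * CPU store instructions. On a DAX mount the mapping bypasses the page
 * cache, so stores land directly in the Optane media; msync() makes the
 * dirtied range durable. Path and sizes are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_LEN 4096

int main(void)
{
    int fd = open("/mnt/pmem/log.dat", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, MAP_LEN) < 0) { perror("ftruncate"); return 1; }

    /* Map the persistent file into the address space, like DRAM. */
    char *pmem = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (pmem == MAP_FAILED) { perror("mmap"); return 1; }

    /* A plain store: no read()/write() system call in the data path. */
    strcpy(pmem, "transaction #1");

    /* Flush so the store is guaranteed persistent. */
    if (msync(pmem, MAP_LEN, MS_SYNC) < 0) { perror("msync"); return 1; }

    munmap(pmem, MAP_LEN);
    close(fd);
    return 0;
}
```

The store itself involves no system call at all; only the flush does, and libraries like libpmem can replace even that with user-space cache-flush instructions.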
Applications can’t avail themselves of persistent memory writes, though, unless they can manage the two different memory domains, and such software is just now being written. It may take a while before we see a wealth of off-the-shelf applications that support persistent memory. Proprietary software for closed systems (like SANs and hyperscale datacenters) already uses Optane and NVDIMM-Ns to its advantage, and it can do this because the programmers are in close communication with the people who specify the hardware. Both sides know exactly how much persistent and volatile memory the system has and where it is located. But broad-based hardware will have varying amounts of volatile and persistent memory at varying addresses and might even have no persistent memory at all, and the application program must be written to perform optimally in any configuration. Application programmers will have to think this through and do a lot of testing, and this will take some time.
The industry has a name for programs that can operate in App Direct Mode: They are called “PM Aware” programs, with the “PM” part standing for “Persistent Memory.”
Why is persistent memory so exciting? The excitement has a lot to do with recovery from power failures.
A gentleman from IBM once put things into perspective by telling me: “When a PC crashes the user’s vocabulary is enriched, but when a banking computer crashes it can cause a financial catastrophe!” The programmers who work on financial and other critical systems take special steps to ensure that their systems will always be able to recover from a power failure without losing a single transaction. Part of this involves saving transaction data in persistent storage through a rigorous protocol:
- Write the data into persistent storage
- Read the data back and verify that it has been correctly written
- Write a flag to persistent storage to indicate that the data is valid
- Read back and verify that the flag has been correctly written
Only after all of these steps have been completed can the transaction be closed. This means that it takes four I/Os to perform any single write. The benefit, though, is that the transaction is not final until the software is absolutely certain that it has been saved to persistent storage.
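As a rough illustration, the protocol might look like the following C sketch when implemented over ordinary storage I/O. The 64-byte record slot, the flag offset, and the file path are all hypothetical; the numbered comments correspond to the four steps above.

```c
/* Hypothetical layout: a fixed-size record slot at offset 0 followed by
 * a one-byte validity flag. The four steps mirror the protocol above. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define REC_SIZE 64           /* record slot at offset 0      */
#define FLAG_OFF REC_SIZE     /* validity flag right after it */

/* rec must point to a REC_SIZE-byte buffer; returns 0 on success. */
int commit_transaction(int fd, const char *rec)
{
    char check[REC_SIZE];
    char flag = 1, flag_check = 0;

    /* 1. Write the data into persistent storage. */
    if (pwrite(fd, rec, REC_SIZE, 0) != REC_SIZE || fsync(fd) < 0)
        return -1;

    /* 2. Read the data back and verify that it was correctly written. */
    if (pread(fd, check, REC_SIZE, 0) != REC_SIZE ||
        memcmp(check, rec, REC_SIZE) != 0)
        return -1;

    /* 3. Write a flag indicating that the data is valid. */
    if (pwrite(fd, &flag, 1, FLAG_OFF) != 1 || fsync(fd) < 0)
        return -1;

    /* 4. Read back and verify the flag. */
    if (pread(fd, &flag_check, 1, FLAG_OFF) != 1 || flag_check != 1)
        return -1;

    return 0;   /* only now is the transaction final */
}

int main(void)
{
    char rec[REC_SIZE] = "transaction #1";   /* zero-padded to REC_SIZE */
    int fd = open("/mnt/pmem/ledger.dat", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }
    int ok = commit_transaction(fd, rec);
    close(fd);
    return ok ? 1 : 0;
}
```

On persistent memory accessed through DAX, the same four steps shrink to stores, flushes, and loads, which is why the speedup is so dramatic.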
Naturally, any system that uses this protocol will see a great speed improvement by moving from HDDs to SSDs, and an even greater improvement by moving to persistent memory, especially if the persistent memory is accessed as memory rather than going through the file system to be accessed as storage. Applications like these stand to gain the most from App Direct Mode.
This four-part series, published in early 2019, explores each of Intel’s two modes to explain what they do and how they work in the following sections:
- Overview
- Memory Mode
- App Direct Mode
- Wrap-Up, Comparing the Modes
Minor nit-picky comment – NUMA is Non-Uniform Memory Access – as in, the time it takes to access any piece of memory is not uniform across the memory address space and can vary by a few orders of magnitude.
Tom,
Nice to see you here!
I stand corrected on NUMA. I have heard more than one definition before seeing yours. Yours makes most sense.
Jim
Back around 1980 IBM introduced the System/38, which was called the “Future Machine”. It sported a 48-bit address word (256TB) with the idea that *everything* was simply in memory and there was no concept of secondary storage – at least not visible to the applications. In reality, of course, the machine did have disk drives and tape drives that it used to support the large address space. But from the application’s point of view, all of its files were simply a byte-address in memory. So, here we are some 40 years later realizing the Future Machine.
Tom,
Thanks for the comment! I would have put this even further back to the development of Virtual Memory in the 1960s!
My knowledge doesn’t go back that far, but I have heard that this was the first time that any system was designed to fool application programs into thinking that there was a larger main memory (core, at that time) than really existed.
These same techniques were used to implement cache memory, and later for caching software that managed SSDs.
It’s all a shell game!
Jim
Indeed virtual memory goes way back.
See slide 20 onwards of a tutorial that Ymir Vigfusson and I have been delivering at a few different conferences in the last couple of years. We talk about the first virtual memory computer (hint: Atlas): https://drive.google.com/file/d/16CQTygo8Cgqagn3hePz2VrwJoqaf1srf/view
The 1962 paper by Kilburn et al., “One-Level Storage System” (IRE Transactions on Electronic Computers, EC-11), still reads like it’s fresh.
Irfan
Irfan,
I just today noticed this comment. Many thanks for posting it!
Your tutorial at the link is GREAT! Consider this a solid compliment since it comes from the guy who wrote The Cache Memory Book.
I found a copy of that Kilburn paper on the University of Glasgow website:
http://www.dcs.gla.ac.uk/~wpc/grcs/kilburn.pdf
Thanks for directing me to it.
Best,
Jim