The Memory/Storage Hierarchy

It recently dawned on me that one of the charts I most frequently use in my presentations has never been explained on The SSD Guy blog.  This is a serious oversight that I will correct with this post.

The Memory/Storage Hierarchy (also called the Storage/Memory Hierarchy, depending on your perspective) is a very simple way to explain why there are multiple memory and storage types within a system: Why is there a cache memory, or a DRAM, or an HDD?  The simple answer is that you can improve the system’s cost/performance ratio if you break the system down into an appropriate mix of fast & slow, expensive & cheap memory or storage.

To explain this I go way back to the 1960s and review the concept of “Virtual Memory”.  This concept was first commercialized by computer maker Burroughs, although it was first implemented by the University of Manchester in England.  The basic concept was to provide programmers with an extraordinarily large memory in which to run their programs by fooling the program into thinking that the memory was as large as the magnetic disk.

I actually look at it from two perspectives: Virtual Memory either makes your memory look as large as your disk, or it makes your disk appear to be as fast as your memory — either analogy is as good as the other.  It does this by taking advantage of the concept of locality: a program tends to loop through a certain part of its address range for a number of iterations, then move to another section and stay there for a while, rather than sweeping through its entire address space quickly.  Programs also tend to iterate on a small range of data addresses before moving to another address range.  In a Virtual Memory system (also called a “Demand Paging” system) clever management software notices where the program has come to rest and maintains a copy of that code and data in the memory, where it will run really quickly.  When the program moves on, the code or data that was previously being used is “evicted” from the memory and new code or data takes its place.
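The eviction behavior described above can be sketched as a toy demand-paging simulator. This is a minimal illustration, not any real operating system's page-replacement code; the class and the access pattern are made up, and it uses simple least-recently-used (LRU) eviction as a stand-in for the "clever management software":

```python
from collections import OrderedDict

class TinyPageCache:
    """Toy demand-paging simulator: a small fast memory backed by big slow storage.

    When the program touches a page that is not resident, the least-recently-used
    resident page is evicted to make room (illustrative only).
    """
    def __init__(self, frames):
        self.frames = frames           # number of pages that fit in fast memory
        self.resident = OrderedDict()  # resident pages, kept in LRU order
        self.hits = self.misses = 0

    def access(self, page):
        if page in self.resident:
            self.hits += 1
            self.resident.move_to_end(page)        # mark as most recently used
        else:
            self.misses += 1                       # "page fault": fetch from slow storage
            if len(self.resident) >= self.frames:
                self.resident.popitem(last=False)  # evict the least recently used page
            self.resident[page] = True

# A program with locality: it loops over a few pages, then moves on to others.
cache = TinyPageCache(frames=4)
for page in [1, 2, 3, 1, 2, 3, 1, 2, 3, 7, 8, 7, 8, 7, 8]:
    cache.access(page)
print(cache.hits, cache.misses)  # locality makes most accesses fast hits
```

Because the access pattern loops within small working sets, two thirds of the accesses are served from the small fast memory — which is exactly why a tiny expensive tier in front of a large cheap one pays off.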

A cache memory behaves the same way, with hardware deciding which code and data should take a place in the fast (but costly) cache and which should remain in the slower and cheaper DRAM memory.  (I wrote a book on Cache Memories.)

Notice that each type of memory or storage I have mentioned, cache, memory, and disk, has a similar relationship with its neighboring levels: It is either slower and cheaper, or it’s faster and more expensive.  This is the basis of the Storage/Memory hierarchy.  I have even found a few charts that depicted this idea back in the 1970s!

Most presenters use a pyramid to abstractly illustrate the Memory/Storage Hierarchy, but I use a chart that I believe provides a more concrete view of the concept.  I will describe it here.

The figure below is a rough depiction of a typical computer before SSDs became popular.  There were processor caches (labeled as L1-L3 in the orbs to the upper right), then DRAM memory, then Disks (HDD), and finally tape, although this could just as well have been a floppy disk or a CD/DVD-R/W.

Storage/Memory Hierarchy pre-SSD

The bottom axis represents the price of each of these, going from really cheap tape on the left to prohibitively expensive L1 cache on the far right.  The vertical axis is an approximation of bandwidth, going from very low-bandwidth tape at the bottom to extremely fast L1 cache at the top.  These two axes are logarithmic (shown in powers of ten) to make the chart easier to read.  Even if logarithms worry you, you will appreciate them here: if the chart had been drawn on linear axes then the red L1 orb in the upper right corner would consume nearly the entire chart, rendering it impossible to even see the other orbs.
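A quick back-of-the-envelope calculation shows why linear axes would fail here. The bandwidth figures below are rough illustrative values I've assumed, not numbers read off the chart:

```python
import math

# Rough, illustrative bandwidth figures in MB/s (assumed for this sketch).
bandwidth = {"tape": 100, "HDD": 200, "DRAM": 20_000, "L1 cache": 1_000_000}

# On a linear axis the L1 orb would dominate: it is four orders of magnitude
# above tape, so everything else collapses into an unreadable sliver.
span = bandwidth["L1 cache"] / bandwidth["tape"]
print(f"L1 is {span:,.0f}x the bandwidth of tape")

# A log10 axis turns that enormous range into a handful of evenly spaced gridlines.
for name, bw in bandwidth.items():
    print(f"{name:>8}: 10^{math.log10(bw):.1f} MB/s")
```

Four powers of ten become four gridlines on a log axis, which is what keeps every orb visible on the same chart.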

As I mentioned before, each of these orbs has a relationship with the other orbs: the orbs to its right are more expensive and the orbs to its left are cheaper.  Similarly the orbs that are higher are faster and the orbs that are lower are slower.  This means that each orb is faster and more costly than the orb to its left, and slower and cheaper than the orb on its right.

There are products that don’t fit into the neat line on this chart.  For example, today’s MRAM is about the same speed as DRAM but is significantly more costly, so its orb would appear to the right of the DRAM orb by a couple of orders of magnitude but would vertically be at the same level.  It might be in the same price range as an L3 or even an L2 cache, but would be as slow as a DRAM.  This implies that today’s MRAM is not a good choice for mainstream computing applications, although it can’t be beat for the niches that it serves where its radiation tolerance or other attributes are worth the higher price.

Ignoring such outliers, we can use the basic concept of the Memory/Storage Hierarchy to optimize the cost/performance of most computing systems.  This is how SSDs have become so popular.  I’ll explain why.

The orbs in this chart have not stood still over time.  Prices of all of these technologies have declined (moved to the left) over time and the speed of the semiconductor memories has increased (moved up on the chart) at a faster pace than that of the magnetic technologies.  The graph below depicts that.

The Growing Speed/Price Gap

The arrow at the top illustrates what has been happening with the semiconductor technologies: DRAM and the L1-L3 processor caches.  Over time they get faster and cheaper.

The lower arrow roughly depicts the motion of the magnetic technologies: HDD (disk) and magnetic tape.  Over time these technologies get cheaper, but they have not seen anywhere near the speed improvements enjoyed by the semiconductor technologies.

The end result of these disparate movements is that a gap has grown between the two groups.

Before NAND flash SSDs became cost competitive this gap was addressed by various means that have now fallen out of favor.  The most common one was to use high spindle speed “Enterprise” hard disks that would run at speeds up to 15,000 RPM.  More extreme solutions involved running multiple disks in parallel to improve their collective bandwidth and “Short Stroking” or “De-Stroking” hard drives to improve their latency by wasting a part of the storage.

The price of NAND flash fell below that of DRAM in 2004 and suddenly the Storage/Memory Hierarchy changed.  Flash-based SSDs, which already existed for niche applications, suddenly became an attractive way to fill this gap.  This is illustrated in the chart below.

Hierarchy with NAND Flash SSD

NAND flash prices have dropped sharply over the past 15 years.  NAND has gone from being more expensive than DRAM before 2004 to costing less than 1/20th the price of DRAM today, and the gap is poised to widen even more!  Over time readers should expect to see increasing use of NAND flash in computing.

SSDs have caused complex changes to the computing marketplace: Over the past decade NAND flash SSDs have been used to reduce the amount of DRAM required in servers, and, in certain cases, to reduce overall server count.  This has led to reduced software licensing fees.

Many industry participants find it difficult to comprehend the impact that this has had in the semiconductor and SSD businesses and how it will impact the industry in the future.  One of the strengths of my company, Objective Analysis, is that we accurately predict such changes, and we have helped many of our clients to plan successful strategies to deal with such changes.

Before leaving this topic I will venture to one more new layer in the Memory/Storage Hierarchy, and that’s the Intel/Micron 3D XPoint Memory, which is the subject of a report recently released by Objective Analysis.

Although DRAM and the processor caches have been speeding up, NAND flash SSDs have not been keeping pace.  In other words, even though NAND flash is a solid state technology, it has been following an arrow more similar to the one for magnetic technologies in this post’s second graphic.  That means that a new gap has grown between DRAM and SSDs.

Intel is determined to fill this gap with 3D XPoint Memory, which the company has branded “Optane”.  Both The SSD Guy and The Memory Guy have written several posts about this technology.   Optane’s place in the Storage/Memory Hierarchy appears in the chart below.

3D XPoint or Optane in the Memory/Storage Hierarchy

3D XPoint, or Optane, fits the vertical (speed) axis of this chart as long as it is faster than a NAND flash SSD but slower than DRAM.  This is certain, and is determined by the physics behind all three of these technologies.

But it must also fit the horizontal (price) axis in order to make sense.  This means that it must cost less than DRAM, although it can be more expensive than NAND flash.

Intel’s approach is to sell the technology at half the price of DRAM.  While this fits the product into the Memory/Storage Hierarchy, the cost to actually produce 3D XPoint chips is greater than the prices Intel is charging, causing Intel to lose significant sums.  This will not last forever, since, at least in theory, 3D XPoint Memory should be cheaper to produce than DRAM.  The Memory Guy tells us that once sufficient volumes ship, economies of scale will drive production costs down to where they belong, and Intel will make a profit.  In the meantime, though, Intel continues to lose money on its Optane technology.

To sum it all up, this is a very useful chart that illustrates how each layer of the Memory/Storage hierarchy fits in, and that’s why it gets so much use in my presentations.  It visually explains why any new layer in the hierarchy must be priced right and must provide the right performance in order to grow into an important market.

3 thoughts on “The Memory/Storage Hierarchy”

  1. Very interesting. Makes sense.

    And this ties into what Kioxia has recently presented about how they view 3D Xpoint, and what they see as the future….

    But I think they are just trying to blow smoke up our a$$es….

I haven’t got a copy of the Kioxia presentation, just a bit from what the Anandtech guy grabbed, but the key is a very misleading graph they show of Bit Cost of Memory Types.

    It’s on the Anandtech page.

    I reproduced the curves, and made a table of the data.

    And then de-normalised it to produce a column of Layers vs Total Cost for a given layer (you get that by multiplying the normalised cost by number of layers).

And then I added a nominal offset of 2 to the 3D SCM values.

    You can choose your own value, but I added something to the Total Cost per Layer of 3D SCM to remove the normalisation to the NAND flash.

    So I then get a table of Layers, from 1 to 16.

    And two columns of 3D SCM Total Cost, from 3 to 20.7.

And for NAND Total Cost from 1 to 1.71.

    Can graph that.

    But THEN, I normalised it BACK to what Kioxia did.

    And THAT is a BIG change.

    You see that, yes, there is a big price difference at 1 layer, but as number layers increases, the NAND Cost/layer is steady.

    But the 3D SCM Cost/layer drops to its lowest at around 10 layers, and then does NOT rise rapidly as Kioxia’s graph shows, but VERY gradually rises.

    So THAT means that:

    – The offset in price could be brought closer by the price drops expected when you develop a new process, and doesn’t need drastic improvements
    – And just a bit of overall improvement in making more layers would reduce any price difference with large number of layers to even less
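[The de-normalise/re-normalise procedure described above can be sketched roughly like this. The layer counts and cost curves below are made-up stand-ins — the actual Kioxia numbers aren't reproduced here — so only the mechanics carry over, not the conclusions:]

```python
# Hypothetical normalized cost-per-bit curves (values assumed, not Kioxia's data).
layers    = [1, 2, 4, 8, 16]
scm_norm  = [3.00, 1.80, 1.20, 1.00, 1.17]   # 3D SCM, normalized cost per bit
nand_norm = [1.00, 0.75, 0.45, 0.25, 0.107]  # NAND flash, same normalization

# De-normalize: total stack cost = normalized cost x number of layers.
scm_total  = [c * n for c, n in zip(scm_norm, layers)]
nand_total = [c * n for c, n in zip(nand_norm, layers)]

# Add a nominal offset (2, as in the steps above) to remove the
# normalization of 3D SCM to the NAND flash curve.
scm_total = [c + 2 for c in scm_total]

# Re-normalize back the way the original graph did: 3D SCM cost relative
# to NAND at the same layer count.
relative = [s / n for s, n in zip(scm_total, nand_total)]
for n, r in zip(layers, relative):
    print(f"{n:2d} layers: 3D SCM at {r:.1f}x NAND cost")
```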

    So I think this is just Kioxia trying to divert attention away from a process they haven’t spent time and money on.

    If you email me, I can send you the graphs I did.

    1. Alan,

      Thanks for a VERY THOROUGH note!

I am sure that there are issues that you, I, and Kioxia don’t know anything about, since none of us is in volume production of a stacked crosspoint array like Intel is. Frankly, I am amazed that they went from two to four layers rather than shrinking the 20nm process to 15nm. Either should double the bits on a wafer, and the folks at Micron’s Lehi fab (where 3D XPoint is manufactured) have lots of experience producing 15nm planar NAND flash in high volume.

      You are very correct to note that it’s easy for a company to throw stones at a process that they don’t make. In the early 2000s Intel and AMD used to do this frequently, each telling me that the other had a bad multi-bit approach (MLC for Intel & MirrorBit for AMD). In the end both were able to manufacture their particular processes just fine!

      It’s generous of you to offer to e-mail your graphs.

      Your e-mail isn’t visible to the readers, but they can come to me and I will forward their requests to you.

