With all the recent interest in CXL, and its ability to connect a processor to any memory, no matter the speed, it’s only natural that someone would try using it for SSDs. This notion is the basis for the Memory-Semantic SSD, or MS-SSD.
But MS-SSDs suffer from the same problem as SSDs, hard drives, and other mass storage: the basic concept requires the SSD to anticipate the processor's upcoming requests. If it guesses correctly, the SSD can serve the processor's next operation rapidly, but if it guesses wrong, that operation will be slow.
In the case of the MS-SSD, the device must anticipate the next several addresses that the processor will read from, and load them from the SSD into the MS-SSD’s DRAM.
What if the MS-SSD instead let the processor tell it what it will need for the next several memory cycles, allowing the MS-SSD to prefetch exactly what the processor is about to ask for? This is the same thinking that has been applied to many newer non-CXL SSD architectures, like the Open-Channel SSD.
CXL controller design house Wolley decided to apply this thinking, and presented the company's new architecture at November's Supercomputing conference, SC23.
They call it “NVMe over CXL” or NVMe-oC and say that it’s:
an implementation of CXL to optimize the host-device data movement where most hosts only use a fraction of data retrieved from the storage devices.
The basic idea is that many SSD reads are for far smaller chunks of data than the standard 4KB block delivered by an SSD access. Why move all of that data over CXL.io or NVMe over PCIe if the processor only needs one 64-byte cache line of it? NVMe-oC is expected to reduce both the I/O traffic and the effort the host spends moving the data itself.
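The potential savings are easy to quantify with a back-of-the-envelope sketch. This assumes a hypothetical workload in which every read wants only a single 64-byte cache line out of each 4KB block; real workloads will fall somewhere short of this best case.

```python
# Back-of-the-envelope comparison of host-interface traffic for reads
# that need only one 64-byte cache line out of each 4KB block.
# The workload (one line wanted per block read) is an assumed best case.

BLOCK_SIZE = 4096   # bytes moved per conventional NVMe block read
CACHE_LINE = 64     # bytes the host actually wants per read

def traffic_bytes(num_reads: int, transfer_size: int) -> int:
    """Total bytes crossing the host interface for num_reads requests."""
    return num_reads * transfer_size

reads = 1_000_000
nvme_traffic = traffic_bytes(reads, BLOCK_SIZE)      # whole 4KB blocks to host
nvme_oc_traffic = traffic_bytes(reads, CACHE_LINE)   # only the cache lines

print(nvme_traffic // nvme_oc_traffic)  # → 64: a 64x traffic reduction
```

The 64x figure follows directly from the 4096:64 size ratio; the point is that the bulk of a conventional block transfer is wasted whenever the host only touches a small piece of it.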
The approach uses CXL.io to access the SSD and CXL.mem to access the device's onboard DRAM. Special commands tell the SSD to move data from the NAND into that DRAM, or from the DRAM back into the NAND, without any host interaction, to reduce host-device data movement. A block diagram appears in a thumbnail at the top of this post and full sized below:
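The flow can be sketched as follows. This is a toy model, not Wolley's actual interface: the command names (`oc_load`, `oc_flush`, `read_cacheline`) and the flat address maps are illustrative inventions; only the division of labor (special commands move whole blocks inside the device, CXL.mem loads pull out single cache lines) comes from the description above.

```python
# Toy model of the NVMe-oC data path. All names are hypothetical;
# the real device exposes NVMe-style commands over CXL.io and a
# memory region over CXL.mem.

class NVMeOCDevice:
    def __init__(self):
        self.nand = {}   # LBA -> 4KB block (SSD side, reached via CXL.io)
        self.dram = {}   # device address -> 4KB block (CXL.mem side)

    def oc_load(self, lba: int, dev_addr: int) -> None:
        """Special command: copy a block from NAND into the device's own
        DRAM. The data never crosses the link to the host."""
        self.dram[dev_addr] = self.nand[lba]

    def oc_flush(self, dev_addr: int, lba: int) -> None:
        """Reverse direction: persist a DRAM-resident block to NAND,
        again without host involvement."""
        self.nand[lba] = self.dram[dev_addr]

    def read_cacheline(self, dev_addr: int, offset: int) -> bytes:
        """CXL.mem load: only the requested 64-byte line crosses the link."""
        return self.dram[dev_addr][offset:offset + 64]

dev = NVMeOCDevice()
dev.nand[7] = bytes(range(256)) * 16             # a 4KB block on "NAND"
dev.oc_load(lba=7, dev_addr=0)                   # device-internal move
line = dev.read_cacheline(dev_addr=0, offset=128)  # 64 bytes over CXL.mem
print(len(line))  # → 64
```

The key point the sketch captures is that the 4KB block motion in `oc_load` and `oc_flush` happens entirely inside the module, so the host link only ever carries the 64-byte lines the processor actually asks for.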
In a way, this approach establishes a halfway point between two CXL-attached devices that have been in discussion since the early days of CXL: Standard CXL-attached memory, and CXL-attached computational memory modules. While the CXL-attached computational memory modules offload certain computation tasks into the CXL module, the NVMe-oC device offloads a much more basic task, simple data movement, away from the host and into the module. This approach simplifies adoption by reducing the amount of application software re-work necessary to take advantage of the technology. In fact, Wolley claims that the NVMe-oC device can accelerate I/O virtualization on Virtio without requiring any changes to the application software. All that is necessary is a special NVMe-oC driver.
Wolley has built a prototype using an FPGA that contains a CXL controller, an NVMe controller for the NAND, DDR controllers for the memory, and an NVMe-oC bridge to manage the new data movement functions.
The SSD Guy is always fascinated by modest changes that solve significant problems, and this new architecture does a good job of that. As data centers shift from conventional computing toward AI, data movement becomes a bigger and bigger issue. This SSD's data movement capabilities should dramatically reduce the amount of unnecessary data moved to the host as 4KB blocks when 64-byte memory transfers would suffice, while offloading wasteful data movement tasks from the host processor.