Marc Greenberg of Cadence explains in an ARM TechCon Paper why DRAM Latency is Getting Worse: “The DDR core timing is staying relatively constant as measured in nanoseconds and thus is increasing when measured in clock cycles. The doubling of frequency and bandwidth while keeping DRAM core timing constant is achieved in DRAM by exploiting parallelism within the DRAM array.”
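To make the quote concrete, here is a minimal sketch of how a fixed latency in nanoseconds grows when counted in clock cycles. It assumes the roughly 13ns core latency figure used in this article; the clock rates and the function name `latency_in_cycles` are illustrative choices, not values taken from the paper.

```python
import math

# Assumption: approximate DDR core latency in nanoseconds (the figure used in this article).
CORE_LATENCY_NS = 13.0


def latency_in_cycles(latency_ns, clock_mhz):
    """Return the whole number of clock cycles needed to cover latency_ns."""
    # cycles = latency / period, and period_ns = 1000 / clock_mhz
    return math.ceil(latency_ns * clock_mhz / 1000.0)


# Illustrative clock rates only; they are not tied to specific DRAM parts.
for clock_mhz in (200, 400, 800, 1600):
    cycles = latency_in_cycles(CORE_LATENCY_NS, clock_mhz)
    print(f"{clock_mhz} MHz clock: {cycles} cycles")
```

Each doubling of the clock roughly doubles the cycle count, even though the latency in nanoseconds has not moved.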

The fundamental bottleneck is the DDR core latency of around 13ns, which hasn't changed much in recent years, even as systems have gotten faster and gone multi-core, adding more demand on DDR. It's been a long-term trend that DRAM performance has improved slowly, at around 9%/yr, less than microprocessor performance (see this PPT on CPU and memory trends). The CPU-to-DRAM bus bottleneck has been relieved by parallelism, but parallelism only raises bandwidth, so it widens the mismatch between bandwidth and latency. Bandwidth has improved by 32X in 10 years, while access time has been reduced by only a factor of 2.5X. We tolerate this because DRAM is cheap per bit and we have cache memory to keep systems (more or less) efficient, except when they suffer too many cache misses. Memory controllers are now integrated on chip in order to reduce latency, and to do better we need to consider the next step.
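As a rough check on those figures, the ten-year factors can be turned into compound annual rates. This is only a sketch of the arithmetic, assuming steady compound growth over the decade; the helper name `annual_rate` is made up for illustration.

```python
def annual_rate(total_factor, years):
    """Compound annual improvement rate implied by a total factor over `years`."""
    return total_factor ** (1.0 / years) - 1.0


# Figures from the text: 32X bandwidth and 2.5X access time over 10 years.
bandwidth_rate = annual_rate(32.0, 10)
latency_rate = annual_rate(2.5, 10)

print(f"bandwidth: {bandwidth_rate:.1%} per year")   # about 41% per year
print(f"access time: {latency_rate:.1%} per year")   # about 9.6% per year

# How far bandwidth has pulled ahead of access time over the decade.
print(f"gap after 10 years: {32.0 / 2.5:.1f}X")
```

The access-time rate comes out close to the 9%/yr quoted above, while bandwidth compounds at roughly four times that pace.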

There is a better way: multi-chip-package-based DDR memory. Faster access and higher bandwidth are possible if you put the memory chip face-to-face with the CPU in a multi-chip package and connect them via copper pillars. With that configuration, you could redesign the interface for wide (~256 bits) and fast (2ns) access, bringing real access time closer to the theoretical read time limits. This would be on the order of 256Gbit/sec, or 10X what current DDR is doing. It might be used as another level of memory cache, huge but reasonably fast, or all the memory could be added in a stackable package format.
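The bandwidth estimate can be sketched as interface width divided by access period. One 256-bit transfer every 2ns works out to 128Gbit/sec; the 256Gbit/sec figure matches if two transfers happen per 2ns period, as in a double-data-rate interface. That second reading is an assumption made here, not something the text states, and the function and parameter names are hypothetical.

```python
def peak_bandwidth_gbit_s(width_bits, period_ns, transfers_per_period=1):
    """Peak bandwidth in Gbit/s; one bit per nanosecond equals one Gbit/s."""
    return width_bits * transfers_per_period / period_ns


# One 256-bit transfer per 2 ns access.
single = peak_bandwidth_gbit_s(256, 2.0)

# Assumption: two transfers per 2 ns period (double data rate).
double = peak_bandwidth_gbit_s(256, 2.0, transfers_per_period=2)

print(f"single transfer per period: {single:.0f} Gbit/s")
print(f"two transfers per period:   {double:.0f} Gbit/s")
```

Either way, the gain comes from the much wider interface that short face-to-face connections make practical.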

SSDs (flash-based solid-state drives) are displacing hard drives in high-performance systems because they improve access time (and make laptops lighter). A fix for the DRAM latency problem would similarly be a boon to system performance, so multi-chip-package DRAM solutions and other techniques may be used to reduce memory latency in systems.

By Patrick