Building a Memory System

Enes Harman
4 min read · Nov 11, 2024


Computer Memory

In modern computers, memory remains one of the primary bottlenecks to achieving optimal performance. While DRAM has become significantly faster than in previous decades, it still falls short of keeping up with processor speeds. To tackle this challenge, computer architects have designed multi-level memory systems, which help bridge the speed gap between memory and processors.

In this article, I’ll explain the fundamental structures of memory units and the different levels within a modern memory system.

General Memory Structure

Memory can be constructed with various technologies, such as flip-flops, SRAM, and DRAM. DRAM is the most cost-effective option due to its minimal circuitry requirements for storing each bit, making it the primary choice for main memory in modern systems.

However, DRAM alone is not used to build large memory systems. Increasing the capacity of a single memory unit often makes it slower and more cumbersome. To address this, a hierarchy is introduced within main memory itself.

Components of Hierarchy

Memory is organized into several layers to optimize both access speed and storage density. Here’s an overview of these components, from the most general to the deepest levels within the hierarchy:

  1. Channel: A memory channel serves as a communication path connecting the memory controller to the memory modules. Multiple channels, like in dual-channel or quad-channel setups, enable parallel data transfers, effectively increasing the system’s memory bandwidth. For instance, a dual-channel configuration doubles the data transfer capability by using two channels simultaneously, enhancing performance over a single-channel setup.
  2. Rank: A rank is a group of DRAM chips on a memory module that can be accessed together by the memory controller. It’s often defined by the width of the data bus (64-bit on most modern systems).
  3. Bank: A bank is a subdivision within a DRAM rank and allows for simultaneous access to different rows of data within a rank. By using multiple banks, DRAM can overlap operations. For example, while one bank is busy refreshing its data, another bank can be accessed, thereby improving overall throughput.
  4. Subarray: Subarrays are smaller, independently accessible sections within a bank. Each subarray consists of multiple rows and columns of DRAM cells and is organized to reduce the area required for wiring within a bank.
  5. Mats: Mats are the smallest addressable units within a DRAM subarray. Each mat is essentially a mini-array of DRAM cells that stores data in rows and columns.
(Figure: General Memory Structure)
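The hierarchy above is what a memory controller navigates when it maps a physical address onto channel, rank, bank, row, and column coordinates. The sketch below shows one simple way to do that by slicing bit fields off the address; the field widths are illustrative assumptions, not the map used by any particular controller:

```python
# Sketch: splitting a physical address into memory-hierarchy fields.
# The bit widths below are illustrative assumptions, not a real
# controller's address map.
FIELD_BITS = {"column": 10, "bank": 3, "rank": 1, "channel": 1}  # row gets the rest

def decode_address(addr: int, row_bits: int = 16) -> dict:
    """Peel fields off the low bits of `addr`, lowest field first."""
    fields = {}
    for name, bits in FIELD_BITS.items():
        fields[name] = addr & ((1 << bits) - 1)  # mask out this field
        addr >>= bits                            # shift it away
    fields["row"] = addr & ((1 << row_bits) - 1)
    return fields

print(decode_address(0x12345678))
```

Real controllers often choose the bit positions carefully (for example, putting bank bits low so consecutive cache lines spread across banks), which is exactly the banking idea discussed later in the article.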

Array Organization

Many memory systems use an array organization to increase efficiency, enabling them to handle multiple bits of data simultaneously. In this setup, the system is configured to accept N bits of input, such as addresses or commands, and return M bits of output data at once.

(Figure: Example of a 2x3 Memory Array)

In the example above, we have a 2x3 memory array with 2-bit addressing and 3-bit data output. When the address 10 is provided, the decoder identifies the correct path to activate, allowing data stored in memory cells (DRAM, SRAM, etc.) along that path to be accessed. Each bit of data stored in this row is then forwarded to the output, yielding the result 100. This setup allows multiple bits to be accessed in parallel, enhancing data retrieval efficiency.
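The decoder-and-array behavior described above can be sketched as a small lookup: a 2-bit address activates one row, and all 3 bits of that row appear at the output in parallel. Only the row at address 10 (output 100) comes from the article; the other row contents are made-up placeholders:

```python
# Sketch of the 2x3 array example: a 2-bit address selects one row,
# and every bit stored in that row is driven to the output at once.
MEMORY = {
    0b00: "011",  # placeholder contents (assumption)
    0b01: "101",  # placeholder contents (assumption)
    0b10: "100",  # address 10 -> output 100, as in the text
    0b11: "110",  # placeholder contents (assumption)
}

def read(address: int) -> str:
    # The decoder activates exactly one wordline; each cell on that
    # row drives its bitline, so all 3 output bits arrive in parallel.
    return MEMORY[address]

print(read(0b10))  # -> 100
```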

Banking

The concept here is to divide memory into smaller arrays, or “banks,” because a single large array is too slow to meet performance demands. Memory banks can be accessed independently, even in the same or consecutive cycles, which is especially beneficial for SIMD (Single Instruction, Multiple Data) processors that require rapid, parallel data access. This approach optimizes both speed and efficiency, enabling high-performance computation across multiple data streams.
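A common way to exploit banking is interleaving: consecutive addresses are spread across banks so that a stride-1 access stream touches a different bank each time. A minimal sketch, with the bank count chosen only for illustration:

```python
# Sketch: interleaving consecutive addresses across banks. The bank
# count is an assumption for illustration.
NUM_BANKS = 8

def bank_of(addr: int) -> int:
    # Low-order interleaving: neighbors land in neighboring banks.
    return addr % NUM_BANKS

# Eight consecutive words land in eight different banks, so a SIMD
# load of all eight could be serviced in parallel, one per bank.
banks = [bank_of(a) for a in range(8)]
print(banks)  # -> [0, 1, 2, 3, 4, 5, 6, 7]
```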

Banks and Buses

Address and data buses are shared across all memory banks, so in any given cycle only one bank can drive the bus, even though the banks themselves can work in parallel internally. This setup, while economical, requires careful management of memory access patterns to optimize performance and minimize delays.
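This interaction can be sketched with a toy timing model: several banks start their accesses in parallel, but their results must queue up for the shared data bus, one transfer per cycle. The latencies are invented round numbers, not real DRAM timings:

```python
# Sketch: banks overlap their internal access, but the shared data bus
# returns one result per cycle. Latency values are assumptions.
def timeline(requests, access_latency=3):
    """Return the cycle at which each request's data crosses the bus.

    Assumes every request targets a different bank and all banks
    start working at cycle 0.
    """
    bus_free = 0   # first cycle at which the bus is available
    finish = {}
    for i, bank in enumerate(requests):
        ready = access_latency            # bank has the data ready
        transfer = max(ready, bus_free)   # but must wait for the bus
        finish[i] = transfer
        bus_free = transfer + 1           # bus busy for one cycle
    return finish

print(timeline(["bank0", "bank3", "bank1"]))  # -> {0: 3, 1: 4, 2: 5}
```

Even though all three banks finish their internal access at cycle 3, the shared bus serializes the transfers across cycles 3, 4, and 5.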

Inside of a Bank

Each bank contains a buffer known as the row buffer. When accessing a specific memory cell, the entire row containing that cell is read into the row buffer. This approach significantly improves access times for consecutive accesses within the same row, as the data is already available in the row buffer. However, accessing data from a different row will still incur additional latency since it requires loading a new row into the buffer.
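The row-buffer behavior can be sketched as a tiny simulator: a hit to the currently open row is cheap, while touching a different row forces a new row load into the buffer. The latency values are illustrative assumptions, not real timings:

```python
# Sketch: a per-bank row buffer. Hits to the open row are fast; a
# different row forces loading a new row. Latencies are assumed values.
ROW_HIT, ROW_MISS = 1, 10  # cycles (illustrative)

class Bank:
    def __init__(self):
        self.open_row = None           # no row loaded yet

    def access(self, row: int) -> int:
        if row == self.open_row:
            return ROW_HIT             # data already in the row buffer
        self.open_row = row            # load the new row into the buffer
        return ROW_MISS

bank = Bank()
print([bank.access(r) for r in (5, 5, 5, 7)])  # -> [10, 1, 1, 10]
```

The access pattern matters: three consecutive accesses to row 5 cost one miss plus two cheap hits, while switching to row 7 pays the full miss latency again.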

Resources

  1. Digital Design and Computer Architecture, 2nd Edition, by David Harris & Sarah Harris (all figures in this article are taken from this book)
  2. Onur Mutlu Lectures
