Pipelined Processors
Today, pipelining is one of the most widely used techniques for improving processor performance. Nearly all general-purpose processor manufacturers utilize pipelining to enhance the performance of their products. In this article, I will explain how pipelining works, using a MIPS processor as a basis. If you want to learn the fundamental concepts of processors, please refer to my previous article.
Pipelined processors are designed by subdividing single-cycle or multi-cycle processors into several stages. The number of stages determines how many instructions can run concurrently on the processor. However, like every technique in computer science, pipelining comes with its own costs and challenges that need to be addressed. Despite these challenges, pipelining ultimately increases the system’s throughput.
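As a rough illustration of this throughput gain, here is a minimal sketch (my own, not from the article) comparing an ideal five-stage pipeline with a non-pipelined design whose cycle time equals the sum of all stage delays:

```python
def pipelined_cycles(n, stages=5):
    """Cycles for n independent instructions on an ideal k-stage pipeline:
    k cycles to fill the pipeline, then one instruction completes per cycle."""
    return stages + (n - 1)

def speedup(n, stages=5, stage_time=1.0):
    """Speedup over a non-pipelined design whose single cycle takes as long
    as all pipeline stages combined (stages * stage_time)."""
    unpipelined_time = n * stages * stage_time
    pipelined_time = pipelined_cycles(n, stages) * stage_time
    return unpipelined_time / pipelined_time
```

For a long run of independent instructions, the speedup approaches the number of stages (five here), which is why throughput improves even though each individual instruction takes no less time.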
To keep this article simple and easy to understand, I will use a pipelined MIPS processor based on a single-cycle architecture as an example.
Datapath
A pipelined datapath is created by adding additional registers to a non-pipelined design. These registers are responsible for holding data passed from one stage to the next. In our example, there are four such registers. We divide our datapath into five stages: Fetch, Decode, Execute, Memory, and Writeback.
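To make the idea concrete, here is a hypothetical Python sketch (the names and representation are mine, not the book's) that models instructions advancing one stage per cycle, with the pipeline registers represented as the slots between consecutive stages:

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]

def run_pipeline(program, n_cycles):
    """Advance instructions through a five-stage pipeline, one stage per
    cycle. pipeline[i] holds the instruction currently in stage i; shifting
    the list models latching each stage's output into the next pipeline
    register. Returns a snapshot of the pipeline after every cycle."""
    pipeline = [None] * len(STAGES)
    fetch_queue = list(program)
    history = []
    for _ in range(n_cycles):
        # One instruction enters Fetch; the one in Writeback retires.
        next_fetch = fetch_queue.pop(0) if fetch_queue else None
        pipeline = [next_fetch] + pipeline[:-1]
        history.append(list(pipeline))
    return history
```

Running three instructions through this model shows the first one reaching Writeback on cycle five, after which one instruction completes per cycle.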
Control Logic
The pipelined processor uses the same control signals as the single-cycle machine, but the signals themselves are pipelined along with the data.
The picture above demonstrates how control signals are pipelined. As you can see, the pipeline registers are extended to carry the control signals. This is crucial because each instruction's control signals must travel with it through the pipeline, so that every stage operates according to the instruction it currently holds.
Hazards and Challenges
Let’s start by considering an ideal pipeline. In this ideal scenario, the pipeline stays filled with instructions, allowing us to execute multiple instructions simultaneously. Specifically, if each instruction is completely independent of the others, up to five instructions can be in flight at once, one per stage.
In the example above, there are no dependencies between instructions, allowing the pipeline to be fully utilized. As a result, the processor can run at maximum performance.
However, in real-life scenarios, achieving an ideal pipeline is not always possible. There are usually dependencies between instructions, often involving registers used for calculations. For example:
In this case, the `and` instruction uses the `$s0` register as an input. However, by the third cycle, `$s0` has not been calculated yet, meaning the result of the `and` operation stored in the `$t0` register will not be correct. These types of dependencies are known as RAW (Read After Write) hazards.
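The timing behind this hazard can be sketched numerically. Assuming operands are read in Decode (stage 2), results are written in Writeback (stage 5), and a register written and read in the same cycle is visible to the reader (a common register-file convention, not stated in the article), a simple hazard check might look like:

```python
def read_cycle(issue_cycle):
    """Operands are read in Decode, the 2nd pipeline stage."""
    return issue_cycle + 1

def write_cycle(issue_cycle):
    """Results are written in Writeback, the 5th pipeline stage."""
    return issue_cycle + 4

def has_raw_hazard(producer_issue, consumer_issue):
    """RAW hazard: the consumer reads a register before the producer has
    written it back. A same-cycle write/read is assumed safe (the register
    file writes in the first half of the cycle and reads in the second)."""
    return read_cycle(consumer_issue) < write_cycle(producer_issue)
```

An instruction issued immediately after its producer reads in cycle 3 while the producer writes in cycle 5, so a hazard exists; an instruction issued three cycles later reads exactly when the write completes and is safe.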
Hazards are classified into two main types: data hazards and control hazards. A data hazard occurs when an instruction attempts to read a register that has not yet been written back by a previous instruction. On the other hand, a control hazard occurs when the decision about which instruction to fetch next has not been made by the time the fetch takes place.
Data Forwarding
One way to resolve data hazards is data forwarding. This technique forwards a result from the Memory or Writeback stage of an earlier instruction directly to the Execute stage of the dependent instruction, instead of waiting for the register file to be updated. Implementing data forwarding requires adding multiplexers in front of the ALU to select each operand from the register file, the Memory stage, or the Writeback stage. Additionally, a new component called a hazard detection unit is added to the processor to control these multiplexers.
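As an illustration, here is a simplified Python model of the forwarding multiplexer in front of the ALU (a sketch with my own naming, not the book's exact logic). The more recent producer, still in the Memory stage, takes priority over the older one in Writeback:

```python
def forward_select(src_reg, ex_mem, mem_wb, regfile_value):
    """Forwarding mux for one ALU operand. ex_mem and mem_wb are
    (dest_reg, value) pairs taken from the Memory- and Writeback-stage
    pipeline registers, or None if no result is in flight there."""
    if ex_mem is not None and ex_mem[0] == src_reg:
        return ex_mem[1]      # forward the newest result (Memory stage)
    if mem_wb is not None and mem_wb[0] == src_reg:
        return mem_wb[1]      # forward the older result (Writeback stage)
    return regfile_value      # no hazard: read from the register file
```

Note the priority order: if both in-flight instructions write the same register, only the newer value is correct.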
Stalls
Stalls are another method to resolve hazards. In some situations, data forwarding is not feasible. For instance, if the dependent instruction follows a `lw` (load word) instruction, it must wait until the result of the `lw` is written to the register file. In such cases, operations are paused until the required source registers are available.
Stalling a stage is achieved by disabling the pipeline registers, preventing their contents from changing. When a stage is stalled, all preceding stages must also be stalled to ensure no subsequent instructions are lost. While stalls help manage hazards, they degrade performance and should be used only when absolutely necessary.
You can see that the `IM` (Instruction Memory) and `RF` (Register Fetch) steps of execution are repeated for the `and` and `or` instructions due to stalls. These instructions resume execution only when the necessary data is available.
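The load-use check performed by a hazard detection unit can be sketched as follows (an illustrative model with hypothetical field names, not the book's actual signal names). It stalls when the instruction currently in Execute is a load whose destination matches a source of the instruction in Decode:

```python
def needs_load_use_stall(ex_instr, id_instr):
    """Return True when the pipeline must insert a bubble: the instruction
    in the Execute stage is a load (lw) whose destination register is a
    source of the instruction in the Decode stage. Instructions are modeled
    as dicts like {"op": "lw", "dest": "$s0", "srcs": ["$t0"]}."""
    if ex_instr is None or ex_instr["op"] != "lw":
        return False
    return ex_instr["dest"] in id_instr["srcs"]
```

When this condition fires, the Fetch and Decode pipeline registers are held (disabled) and a bubble is injected into Execute, exactly the behavior described above.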
Control Hazards
Control hazards occur during branch or jump operations. In a pipelined architecture, the stages are typically filled with instructions in sequence. This process works smoothly as long as there are no branch or jump operations. However, when such an operation occurs, the next instruction to place into the pipeline must be chosen correctly. If not, the pipeline may need to be flushed, which can lead to a decrease in performance.
To address these hazards, we can use a branch prediction algorithm. Additionally, resolving branches earlier with a dedicated comparison unit, separate from the ALU, reduces the penalty of branch mispredictions. However, this solution might introduce a RAW (Read After Write) hazard, since the early comparison may need a register whose value has not yet been computed. To handle this issue, we can apply one of the solutions mentioned above, such as data forwarding or stalling.
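Since the article does not specify a particular prediction algorithm, here is one common scheme as an illustration: a 2-bit saturating-counter predictor. It takes two consecutive mispredictions to flip its prediction, which tolerates the occasional odd iteration of a loop:

```python
class TwoBitPredictor:
    """2-bit saturating-counter branch predictor (one common scheme, chosen
    here as an example). States 0-1 predict not-taken, states 2-3 predict
    taken; each outcome nudges the counter one step toward that outcome."""

    def __init__(self):
        self.state = 1  # start weakly not-taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)
```

After two taken branches the predictor says "taken", and a single not-taken outcome afterwards does not flip it back, which is the hysteresis that makes the 2-bit scheme more stable than a 1-bit one.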
The new units designed to address these hazards are shown in the picture above. Although implementing these solutions adds extra complexity to the architecture, the overall improvement in throughput makes it worthwhile. As a result, nearly all modern processors are designed with pipelining.
Resources
- Digital Design and Computer Architecture 2nd Edition by David Harris & Sarah Harris (All pictures used in this article are taken from this book)
- Onur Mutlu Lectures