Branch Prediction Algorithms
Branch prediction is one of the most important techniques for improving performance in pipelined processors. It allows the front end to keep the pipeline filled with useful instructions instead of stalling at every branch. In this article, I will explain several branch prediction algorithms and provide examples to illustrate them.
Branch prediction approaches can be divided into two categories: Static Branch Prediction and Dynamic Branch Prediction. Let’s begin with the static techniques.
Static Branch Prediction
Static branch prediction techniques are methods used in computer processors to predict the outcome of conditional branches during the fetch stage of instruction execution, without relying on runtime information.
Always Not Taken: Assumes that a branch will never be taken, so fetch simply continues along the sequential instruction path. Its accuracy is low, roughly 30–40% for conditional branches.
Always Taken: Assumes that a branch will always be taken, directing the processor to follow the branch path immediately. It has better accuracy compared to the “Always Not Taken” approach because branches are typically taken in common code structures, especially within loops.
Backward Taken, Forward Not Taken: Uses the direction of the branch to make predictions. If the branch is backward (e.g., loops), it’s predicted to be taken; if forward, it’s predicted not to be taken.
Profile-Guided Prediction: Utilizes compile-time profiling data to predict branches. The compiler collects statistics on branch behavior from previous program runs and embeds these predictions into the code. Accuracy depends on how closely the profiled runs match the program’s actual dynamic branch behavior.
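To make the first three policies concrete, here is a minimal sketch in Python; the function names and the `(pc, target)` representation of a branch are illustrative assumptions, not part of any real ISA.

```python
# Minimal sketch of the static policies above; the branch is represented
# only by its own address (pc) and its target address.

def predict_always_not_taken(pc, target):
    return False  # always fall through to the next sequential instruction

def predict_always_taken(pc, target):
    return True   # always redirect fetch to the branch target

def predict_btfn(pc, target):
    # Backward Taken, Forward Not Taken: a target below the branch's own
    # address usually indicates a loop back-edge, so predict taken.
    return target < pc
```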
All of these techniques share a common drawback: they cannot adapt to dynamic changes in program behavior. While this limitation can be mitigated by a dynamic compiler (e.g., JIT), the adaptation is not at a fine granularity. Additionally, dynamic compilers introduce their own overheads.
Dynamic Branch Prediction
The idea behind dynamic branch prediction is to predict branches based on information collected at run-time. This approach has both advantages and disadvantages. On the positive side, predictions are based on the execution history of branches, allowing the system to adapt to changes in branch behavior. However, these techniques are more complex than static predictions and require additional hardware to implement.
Here is the simplest illustration of the required hardware:
The slim box at the top represents a table used by the branch predictor. The program counter is used as an index into this table, and the entry read from it determines whether the branch is predicted taken or not taken.
The larger box below it is called the Branch Target Buffer (BTB). BTB is a cache used to improve the efficiency of branch prediction. It stores the target addresses of recently executed branches, allowing the CPU to quickly retrieve the destination address of a branch instruction when it is encountered again.
Depending on the selected prediction technique, these components can be extended or modified.
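As a rough software analogy, the sketch below models the two structures just described: a direction table and a Branch Target Buffer, both indexed by low-order PC bits. The table sizes, direct-mapped indexing, tagging, and 4-byte instruction size are simplifying assumptions for illustration.

```python
# Toy model of a predictor front end: a direction table plus a BTB.

class SimpleFrontEnd:
    def __init__(self, entries=1024):
        self.entries = entries
        self.direction = [False] * entries   # predicted taken / not taken
        self.btb = [None] * entries          # (tag, target) or None

    def _index(self, pc):
        return pc % self.entries

    def predict(self, pc):
        """Return the predicted next fetch address for the branch at pc."""
        i = self._index(pc)
        entry = self.btb[i]
        if self.direction[i] and entry is not None and entry[0] == pc:
            return entry[1]                  # predicted taken: fetch from the stored target
        return pc + 4                        # predicted not taken: next sequential PC (4-byte instructions assumed)

    def update(self, pc, taken, target):
        """Train both structures once the branch actually resolves."""
        i = self._index(pc)
        self.direction[i] = taken
        if taken:
            self.btb[i] = (pc, target)
```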
Last Time Predictor
This is the most basic dynamic branch prediction approach: the prediction is simply the direction the branch took the last time it executed. This per-branch outcome can be stored in the Branch Target Buffer (BTB). The number of bits used for this purpose can be adjusted: tracking only the last outcome requires a single bit, while retaining information about earlier instances to increase accuracy requires more bits.
The accuracy of this approach is quite good for loops, as it typically mispredicts only the first and last iterations of a loop.
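Here is a minimal sketch of a last-time (1-bit) predictor; the table size and PC-modulo indexing are illustrative assumptions.

```python
# Sketch of a last-time predictor: each entry simply remembers whether
# the branch was taken the last time it executed.

class LastTimePredictor:
    def __init__(self, entries=1024):
        self.entries = entries
        self.last_taken = [False] * entries

    def predict(self, pc):
        return self.last_taken[pc % self.entries]

    def update(self, pc, taken):
        self.last_taken[pc % self.entries] = taken
```

A common extension replaces the single bit with a 2-bit saturating counter per entry, so the prediction only flips after two consecutive mispredictions rather than after a single anomalous outcome.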
Two-Level Prediction
This prediction type uses two levels of history, and there are two different implementations.
Global Branch Correlation
In this approach, the outcomes of branches are associated with a global register known as the Global History Register (GHR). Predictions are then made based on this history for all branches. To track the history and necessary actions, an additional table called the Pattern History Table (PHT) is used, which is indexed by potential values of the GHR. The size of the GHR can be adjusted as needed; if more branch history is to be tracked, more bits are required. However, this will also increase the size of the PHT.
To improve accuracy, additional information can be provided to the PHT. For example, the GHR can be hashed with the program counter (PC) of the current branch. This will offer more contextual information to enhance the prediction process.
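A sketch of such a global two-level predictor using a gshare-style index (the GHR XORed with PC bits) might look like the following; the history length, 2-bit counter width, and table size are assumptions chosen for illustration.

```python
# Sketch of a global two-level predictor: a GHR indexes (together with the
# PC) a Pattern History Table of 2-bit saturating counters.

class GlobalTwoLevelPredictor:
    def __init__(self, history_bits=12):
        self.history_bits = history_bits
        self.ghr = 0                               # Global History Register
        self.pht = [1] * (1 << history_bits)       # 2-bit counters, initialized weakly not-taken

    def _index(self, pc):
        mask = (1 << self.history_bits) - 1
        return (self.ghr ^ pc) & mask              # gshare-style hash of GHR with the PC

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2      # counter >= 2 means predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # shift the resolved outcome into the global history
        self.ghr = ((self.ghr << 1) | int(taken)) & ((1 << self.history_bits) - 1)
```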
Local Branch Correlation
The second implementation of Two-Level Prediction is quite similar to Global Branch Correlation. However, in this technique, branch history is maintained separately for each branch, allowing for distinct tracking.
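Under the same simplifying assumptions as the global sketch above, a local variant keeps one history register per branch (approximated here by PC-indexed entries) and uses that per-branch history to index the PHT.

```python
# Sketch of a local two-level predictor: per-branch history registers
# feed a shared Pattern History Table of 2-bit counters.

class LocalTwoLevelPredictor:
    def __init__(self, history_bits=8, branch_entries=1024):
        self.history_bits = history_bits
        self.branch_entries = branch_entries
        self.local_history = [0] * branch_entries   # one history register per branch entry
        self.pht = [1] * (1 << history_bits)        # 2-bit counters

    def predict(self, pc):
        hist = self.local_history[pc % self.branch_entries]
        return self.pht[hist] >= 2

    def update(self, pc, taken):
        b = pc % self.branch_entries
        hist = self.local_history[b]
        self.pht[hist] = min(3, self.pht[hist] + 1) if taken else max(0, self.pht[hist] - 1)
        # shift the outcome into this branch's own history
        self.local_history[b] = ((hist << 1) | int(taken)) & ((1 << self.history_bits) - 1)
```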
Hybrid Branch Predictors
There is significant heterogeneity in the predictability behavior of branches, making a one-size-fits-all branch prediction algorithm ineffective. The hybrid approach utilizes multiple types of predictors and selects the best one for processing each instruction.
Advantages of this approach:
- Better accuracy
- Reduced warmup time (Dynamic predictors require “previous instances” for their history, and the process of collecting this data is referred to as warmup.)
Disadvantages:
- Requires a selector (meta-predictor) to choose among the component predictors
- Larger latency (longer critical path and longer clock cycle)
- More hardware is needed
There are various prediction types, which we will not cover here, designed for specific purposes, such as optimizing for loops. These predictors can be very effective when used with this approach.
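To illustrate the selection mechanism, the sketch below combines the global and local examples from the earlier sections with a per-branch chooser counter. The chooser training rule shown (train only when the two components disagree) is one common tournament-style scheme, used here as an illustration rather than as any specific processor's design.

```python
# Sketch of a tournament-style hybrid built from the two predictors
# defined in the sketches above.

class HybridPredictor:
    def __init__(self, entries=1024):
        self.global_pred = GlobalTwoLevelPredictor()
        self.local_pred = LocalTwoLevelPredictor()
        self.entries = entries
        self.chooser = [1] * entries                 # 2-bit counters: >= 2 favors the global predictor

    def predict(self, pc):
        g = self.global_pred.predict(pc)
        l = self.local_pred.predict(pc)
        return g if self.chooser[pc % self.entries] >= 2 else l

    def update(self, pc, taken):
        # Record what each component would have predicted before training them.
        g = self.global_pred.predict(pc)
        l = self.local_pred.predict(pc)
        i = pc % self.entries
        if g != l:                                   # train the chooser only on disagreement
            if g == taken:
                self.chooser[i] = min(3, self.chooser[i] + 1)
            else:
                self.chooser[i] = max(0, self.chooser[i] - 1)
        self.global_pred.update(pc, taken)
        self.local_pred.update(pc, taken)
```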
Perceptron Branch Predictor
The Perceptron Branch Predictor is an advanced branch prediction technique that uses a machine learning model, specifically a perceptron (a type of neural network), to predict whether a branch will be taken or not. Unlike traditional predictors that rely on simple history patterns, the perceptron predictor learns and adapts based on past branch outcomes and global branch history. It uses weighted inputs from branch history to predict the branch outcome, similar to how a perceptron in machine learning works.
Each branch is associated with a perceptron containing a set of weights, and each weight corresponds to a bit in the GHR. The magnitude of a weight represents how strongly that history bit is correlated with the direction of the branch.
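The sketch below shows the core idea: the prediction is the sign of a dot product between a per-branch weight vector and the global history encoded as +1/-1. The training threshold follows the commonly cited heuristic from the perceptron-predictor literature; the history length and table size are illustrative assumptions.

```python
# Sketch of a perceptron predictor: weights are trained toward the
# resolved outcome whenever the prediction was wrong or not confident.

class PerceptronPredictor:
    def __init__(self, history_bits=16, entries=256):
        self.history_bits = history_bits
        self.entries = entries
        self.history = [1] * history_bits            # global history as +1 (taken) / -1 (not taken)
        self.weights = [[0] * (history_bits + 1) for _ in range(entries)]  # index 0 is the bias weight
        self.threshold = int(1.93 * history_bits + 14)  # common training-threshold heuristic

    def _output(self, pc):
        w = self.weights[pc % self.entries]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        return self._output(pc) >= 0

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        w = self.weights[pc % self.entries]
        # Train on a misprediction or when the output magnitude is below the threshold.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w[0] += t
            for j, h in enumerate(self.history):
                w[j + 1] += t * h
        self.history = self.history[1:] + [t]        # shift in the new outcome
```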
TAGE
Different branches require varying history lengths for optimal prediction accuracy. The idea behind TAGE (Tagged Geometric History Length) is to have multiple Pattern History Tables (PHTs) indexed by Global History Registers (GHRs) of different lengths, and to intelligently allocate PHT entries to different branches based on their history requirements.
Advantages of TAGE:
- Chooses the best history length to predict each branch
- Enables long branch history lengths
Disadvantages of TAGE:
- Hardware complexity
- Requires choosing good hash functions and table sizes to maximize accuracy and minimize latency
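To make the idea behind TAGE concrete, here is a heavily simplified sketch: a base bimodal table plus several tagged tables indexed with geometrically increasing history lengths, where the longest-history table with a matching tag provides the prediction. Real TAGE also maintains "useful" counters, alternate predictions, and careful allocation policies, all omitted here; the table sizes, hash, and tag function are illustrative assumptions.

```python
# Heavily simplified TAGE-like sketch: predict with the longest matching
# tagged table, and allocate into a longer-history table on a misprediction.

class SimplifiedTAGE:
    def __init__(self, history_lengths=(4, 8, 16, 32), table_bits=10):
        self.history_lengths = history_lengths
        self.size = 1 << table_bits
        self.base = [1] * self.size                            # bimodal 2-bit counters
        self.tables = [[None] * self.size for _ in history_lengths]  # entries: [tag, 2-bit counter]
        self.ghr = 0

    def _hash(self, pc, hist_len):
        h = self.ghr & ((1 << hist_len) - 1)
        return (pc ^ h ^ (h >> 3)) % self.size

    def _tag(self, pc, hist_len):
        h = self.ghr & ((1 << hist_len) - 1)
        return (pc ^ (h << 1)) & 0xFF

    def predict(self, pc):
        # Use the longest-history table whose tag matches; fall back to the base table.
        for t in reversed(range(len(self.tables))):
            entry = self.tables[t][self._hash(pc, self.history_lengths[t])]
            if entry is not None and entry[0] == self._tag(pc, self.history_lengths[t]):
                return entry[1] >= 2
        return self.base[pc % self.size] >= 2

    def update(self, pc, taken):
        predicted = self.predict(pc)
        provider = None
        for t in reversed(range(len(self.tables))):
            i = self._hash(pc, self.history_lengths[t])
            entry = self.tables[t][i]
            if entry is not None and entry[0] == self._tag(pc, self.history_lengths[t]):
                provider = t
                entry[1] = min(3, entry[1] + 1) if taken else max(0, entry[1] - 1)
                break
        if provider is None:
            i = pc % self.size
            self.base[i] = min(3, self.base[i] + 1) if taken else max(0, self.base[i] - 1)
        if predicted != taken:
            # On a misprediction, allocate an entry in the next longer-history table.
            start = 0 if provider is None else provider + 1
            if start < len(self.tables):
                i = self._hash(pc, self.history_lengths[start])
                self.tables[start][i] = [self._tag(pc, self.history_lengths[start]),
                                         2 if taken else 1]
        self.ghr = ((self.ghr << 1) | int(taken)) & ((1 << 64) - 1)
```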
These are some of the common and basic branch prediction techniques. Modern processors employ much more complex methods. This article serves as an introduction to the topic. For further reading, please refer to the resource section.
Resources
- Digital Design and Computer Architecture 2nd Edition by David Harris & Sarah Harris (All pictures used in this article are taken from this book)
- Onur Mutlu Lectures