Cache Management Overhead
\(A,B,C)\text{number of sets } S = \frac{C}{A\cdot B}$
The lines of cache are given by: .
Cache payload = , bits
Management overhead:
- Valid bits:
- Tag bits: , where
- Dirty bits:
- Replacement bits: Approximately
Understanding Cache Misses
Each memory access can be defined as either a cache hit or cache miss
Why do Cache Misses Happen
- Are cache misses happening because it is somehow unavoidable?
- Are cache misses happening because of the limited capacity of the cache?
- Are cache misses happening because of the limited associativity of the cache?
This leads to a cache miss classification model called the 3C model
The 3C Cache Miss Model
Our cache is defined as \_{1}=$(A,B,C;S)$.
Based on our cache defined above, let’s consider a fully-associative cache of the same block size and the same capacity: \_{3}=$\left( \frac{C}{B}, B, C;1 \right)$
Compulsory Miss
Since we start with all the data in and nothing in $, the first access to any memory block is necessarily a miss. This is called a compulsory miss.
- This access would be a miss even in an infinite-capacity cache \_{2}=$(A,B,\infty)$
- The number of compulsory misses is a tight lower bound on the minimum possible number of cache misses.
Capacity Miss
If a memory reference that is a non-compulsory miss in \{1}${3}$, then limited associativity can’t be the cause of the miss, and it must be because of limited capacity. This is called a capacity miss.
Conflict Miss
If a memory reference that is a non-compulsory miss in \{1}${3}$, then limited associativity is the cause of the miss. This is called a conflict miss.
Memory Performance Equation
Let be the total number of memory references and let be the number of these references that miss in $. Define to be the miss rate and to be the hit rate. We want to be large and to be small.
Let be the hit time and let be the additionally miss penalty (both in processor cycles)
- Typically, .
Average memory access time (AMAT): is
- e.g., if , AMAT cycles
Memory Effects Dominate Performance
Compare the magnitudes of the numbers in the memory performance equation with the corresponding magnitudes in the Pipeline Performance Equation.
The AMAT term would totally swamp out the load-use-penalty (lp), misprediction penalty (mp), and return penalty (rp) terms.
Furthermore, this is an average. An individual memory access takes between 1 to 101 cycles.
Dealing with this in hardware involves a complete rethinking of the processor implementation (Advanced Computer Architecture)
There are good programming practices to reduce the miss rate , we talk about this in Writing Cache-Friendly Code.