# 7.2: Cache performance

The solution to this problem, or at least a partial solution, is caching. A “cache” is a small, fast memory that is physically close to the CPU, usually on the same chip.

Actually, current computers typically have several levels of cache: the Level 1 cache, which is the smallest and fastest, might be 1–2 MiB with a access times near 1 ns; the Level 2 cache might have access times near 4 ns, and the Level 3 might take 16 ns.

When the CPU loads a value from memory, it stores a copy in the cache. If the same value is loaded again, the CPU gets the cached copy and doesn’t have to wait for memory.

Eventually the cache gets full. Then, in order to bring something new in, we have to kick something out. So if the CPU loads a value and then loads it again much later, it might not be in cache any more.

The performance of many programs is limited by the effectiveness of the cache. If the instructions and data needed by the CPU are usually in cache, the program can run close to the full speed of the CPU. If the CPU frequently needs data that are not in cache, the program is limited by the speed of memory.

The cache “hit rate”, $$h$$, is the fraction of memory accesses that find data in cache; the “miss rate”, $$m$$, is the fraction of memory accesses that have to go to memory. If the time to process a cache hit is $$T_{h}$$ and the time for a cache miss is $$T_{m}$$, the average time for each memory access is

$h T_{h} + m T_{m} \nonumber$

Equivalently, we could define the “miss penalty” as the extra time to process a cache miss, $$T_{p} = T_{m} - T_{h}$$. Then the average access time is

$T_{h} + m T_{p} \nonumber$

When the miss rate is low, the average access time can be close to $$T_{h}$$. That is, the program can perform as if memory ran at cache speeds.