Fetch Issue Delay Problem
The Fetch Issue Delay Problem occurs when an unlimited write buffer causes the L2 cache (unified versions only, and by extension, any unified non-L1 cache) to accumulate replacement delays that exceed the commit timeout of the simulator.
Exceeding the commit timeout is only a consequence of the problem, not the problem itself.
Using the default settings, bzip2 and gzip would exceed the commit timeout if simulated (not fast-forwarded) during their initialization phases.
Since there was an unlimited number of write buffers, store instructions could evict blocks from the cache without incurring the latency penalty (in selection, stores commit in essentially zero time; the latency of this is never used anywhere). Therefore, a set within the L2 cache could become so backlogged with replacements that a non-store access would be forced to wait for all of them to complete. While this could occur with a load to the same set, in bzip2 and gzip the problem occurred at instruction fetch.
From the log:
Fetch (il1) at Cycle 220142. Access type: Read to Set 231, Tag 294916. Current Tags (0,0). Miss.
Propagate to next level: UL2 at Cycle 220142. Access type: Read to Set 57, Tag 73729. Current Tags (82849,82848,82847,82846,82845,82844,82843,82842). Miss.
Propagate to Main Memory: Evict 82842 (LRU). 82842 ready at 708165. Latency is 708165 - 220142 = 488023.
Bus from UL2 to Main Memory is tied up for 493430 cycles.
Net Latency at UL2 is 493792 cycles.
Net Latency at il1 is 493792 cycles.
This would cause the fetch issue delay for the context running bzip2/gzip to reach hundreds of thousands of cycles, as in the log above, and ultimately stall the processor.
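The backlog mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the simulator's actual code: the function names, the fixed 300-cycle replacement latency, and the one-store-per-cycle rate are all hypothetical. It shows how stores that commit at no cost can push the time at which the bus becomes free far into the future, so that a later fetch miss absorbs the entire backlog.

```python
def backlog_after_stores(num_stores, mem_latency=300, store_interval=1):
    """Toy model: stores commit instantly, but each queues one replacement.

    Returns (cycle of the last store commit, cycle at which the bus is free).
    All parameters are illustrative assumptions, not simulator defaults.
    """
    cycle = 0
    bus_free_at = 0
    for _ in range(num_stores):
        cycle += store_interval              # the store commits with no latency charged
        start = max(cycle, bus_free_at)      # its replacement waits behind earlier ones
        bus_free_at = start + mem_latency    # the bus stays busy until this one finishes
    return cycle, bus_free_at


def fetch_latency(fetch_cycle, bus_free_at, mem_latency=300):
    """Latency seen by a fetch miss that must wait for the queued replacements."""
    start = max(fetch_cycle, bus_free_at)
    return start + mem_latency - fetch_cycle


last_commit, bus_free_at = backlog_after_stores(1000)
print(last_commit, bus_free_at)                      # 1000 300001
print(fetch_latency(last_commit + 1, bus_free_at))   # 299300
```

In this model, 1000 stores commit within 1000 cycles, yet a fetch issued on the next cycle waits roughly 300,000 cycles, the same order of magnitude as the delay in the log.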
This has been partially dealt with by the May 5th, 2009 introduction of a CPU-to-L1 write buffer. Additional write buffers would be needed to fully solve this problem. The default setting is 16 write buffer entries per core. These entries retain the latency incurred by stores and prevent additional stores from being committed when there is no room left in the buffer.
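The effect of a bounded write buffer can be sketched with a toy model. This is a minimal illustration under assumed values, not the simulator's implementation: the function name, the 300-cycle write latency, and the single shared bus are hypothetical; only the 16-entry default comes from the text above. A store that finds the buffer full stalls until the oldest entry drains, so the outstanding backlog can never exceed the buffer size times the write latency.

```python
from collections import deque


def simulate_bounded(num_stores, buffer_entries=16, mem_latency=300, store_interval=1):
    """Toy model of a bounded write buffer that stalls store commit when full.

    Returns (cycle of the last store commit, cycle at which the bus is free).
    Latency and store rate are illustrative assumptions.
    """
    pending = deque()   # completion cycles of writes still held in the buffer
    bus_free_at = 0
    cycle = 0
    for _ in range(num_stores):
        cycle += store_interval
        # Free the entries whose writes have already completed.
        while pending and pending[0] <= cycle:
            pending.popleft()
        # Buffer full: the store cannot commit until the oldest write drains.
        if len(pending) >= buffer_entries:
            cycle = pending.popleft()
        start = max(cycle, bus_free_at)
        bus_free_at = start + mem_latency
        pending.append(bus_free_at)
    return cycle, bus_free_at


last_commit, bus_free_at = simulate_bounded(1000)
backlog = bus_free_at - last_commit
print(last_commit, backlog)
```

The latency is now charged to the stores themselves: the last commit lands far later than cycle 1000, while the backlog a subsequent fetch could encounter stays at or below 16 * 300 cycles in this model.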