05-05-09-Cache

From Msim


Various fixes were made to the cache code. See the table below.


Variable Types

These reflect changes made to both cache.h and cache.c

  • int nsets (number of sets) is now an unsigned int
  • int bsize (block size) is now an unsigned int
  • int balloc (allocate blocks for real?) is now a bool
  • int assoc (cache associativity) is now an unsigned int
  • in the blk_access_fn function pointer
    • parameter int bsize (block size) is now unsigned int
    • returns an unsigned long long rather than unsigned long.
  • unsigned int cache_flush(...) now returns an unsigned long long.
  • unsigned long long cache_access(...)
    • parameter int nbytes (number of bytes requested) is now an unsigned int
  • int cache_probe(...) now returns a bool
  • in cache explicit constructor:
    • parameter int nsets is now an unsigned int
    • parameter int bsize is now an unsigned int
    • parameter int assoc is now an unsigned int
    • parameter int balloc is now a bool
    • parameter unsigned int (*blk_access_fn) now returns unsigned long long.
      • blk_access_fn parameter int bsize is now unsigned int
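Taken together, the cache.h changes above amount to an interface roughly like the following. This is a condensed, hypothetical sketch: the real cache_t carries many more members, the bodies here are stubs, and the exact blk_access_fn parameter list is an assumption based on the calls shown later on this page.

```cpp
#include <cassert>

typedef unsigned long long tick_t;
enum mem_cmd { Read, Write };

// Condensed, hypothetical sketch of the revised cache interface.
struct cache_t {
    unsigned int nsets;   // was int
    unsigned int bsize;   // was int
    unsigned int assoc;   // was int
    bool balloc;          // was int
    // bsize parameter is now unsigned int; the returned latency is
    // unsigned long long rather than unsigned long
    unsigned long long (*blk_access_fn)(mem_cmd cmd, unsigned long long baddr,
                                        unsigned int bsize, void *blk,
                                        tick_t now, int context_id);

    unsigned long long cache_flush(tick_t) { return 0; }   // was unsigned int
    bool cache_probe(unsigned long long) { return false; } // was int
};
```

The widened return types matter because accumulated latencies over a long simulation can exceed the range of a 32-bit unsigned int.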

The following changes are in sim-outorder.c

  • All *_access_fn (such as dl1, dl2)
    • return unsigned long long instead of unsigned int.
    • No longer explicitly identify mem_cmd as an enum
    • Use unsigned int for bsize instead of int.
    • Unsigned int lat is now unsigned long long lat.

Variable Types (temporaries)

  • In explicit constructor:
    • Both for loops (bindex, j) now use unsigned ints rather than int.
    • Creation of user_data blk->user_data = (usize != 0 ? new byte_t[usize] : NULL);
      • is now: blk->user_data = (usize != static_cast<unsigned int>(0) ? new byte_t[usize] : NULL);
  • In destructor:
    • All for loops (bindex, i, j) now use unsigned ints rather than int.
    • No use of temporary pointer blk. Now use delete directly on the user_data.
  • In reset_cache_stats
    • All for loops (bindex, i, j) now use unsigned ints rather than int.
    • No use of temporary pointer blk. Now set ready directly.
  • in cache_access:
    • cache_blk_t *blk is now initialized to NULL
  • In cache_flush
    • int lat is now unsigned long long lat
    • The for loop now uses an unsigned int.
  • In cache_flush_addr
    • int lat is now unsigned long long lat
    • cache_blk_t *blk is now initialized to NULL
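The constructor/destructor changes (unsigned loop indices, and deleting user_data directly with no temporary blk pointer) can be illustrated in isolation. This is a simplified, hypothetical model; byte_t and cache_blk_t here are minimal stand-ins, not the real M-Sim definitions.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned char byte_t;   // stand-in for M-Sim's byte_t

struct cache_blk_t {            // minimal stand-in for the real block
    byte_t *user_data;
};

// Constructor-style allocation: user_data only when usize != 0.
void alloc_blocks(cache_blk_t *blocks, unsigned int n, unsigned int usize)
{
    for (unsigned int i = 0; i < n; i++)   // unsigned loop index
        blocks[i].user_data =
            (usize != static_cast<unsigned int>(0) ? new byte_t[usize] : NULL);
}

// Destructor-style cleanup: no temporary cache_blk_t *blk; delete the
// user_data pointer directly (delete[] on a NULL pointer is a no-op).
void free_blocks(cache_blk_t *blocks, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        delete [] blocks[i].user_data;
        blocks[i].user_data = NULL;
    }
}
```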

Code Reduction (cache_fast_hit)

  • in cache_access:
    • The goto cache_fast_hit was removed; all it did was save one conditional (which short-circuits to not taken in the fast-hit case).

Bus usage fix (bus_free)

  • in cache_access:
    • bus_free should be updated only on a writeback, not on every replacement. The following code was moved inside the conditional if(repl->status & CACHE_BLK_DIRTY):
      • bus_free = 1 + MAX(bus_free, (tick_t)(now + lat));
      • lat += blk_access_fn(Write, CACHE_MK_BADDR(this, repl->tag, set), bsize, repl, now+lat, context_id);
    • After the //update block tags comment:
      • The data block is now read into the cache in all cases (this had been changed to depend on cmd as a hack to handle the bzip2 and gzip problem).
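The effect of the bus_free fix can be modeled with a small stand-alone function. CACHE_BLK_DIRTY and writeback_lat here are hypothetical stand-ins; in the real code the writeback latency comes from the blk_access_fn call shown above.

```cpp
#include <cassert>
#include <algorithm>

typedef unsigned long long tick_t;
const unsigned int CACHE_BLK_DIRTY = 0x1;  // stand-in for the real flag

// Model of the fixed replacement path: the bus is reserved and a
// writeback issued only when the victim block is dirty; a clean
// victim is dropped without touching bus_free.
tick_t replace_block(unsigned int repl_status, tick_t now, tick_t lat,
                     tick_t &bus_free, tick_t writeback_lat)
{
    if (repl_status & CACHE_BLK_DIRTY) {
        bus_free = 1 + std::max(bus_free, now + lat);
        lat += writeback_lat;   // stands in for blk_access_fn(Write, ...)
    }
    return lat;
}
```

Under the old code the bus_free update sat outside the dirty check, so replacing a clean block still reserved the bus even though no writeback traffic was generated.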

CPU to Data L1 Cache Buffer

Since stores commit in zero time, the selection code sends them through without any delay and the latency of the store is ignored. A later load can then be held up by pending stores that evicted blocks. This is hugely problematic for bzip2 and gzip during their initialization phases. A write buffer that blocks the stores prevents this problem.

The Write Buffer is per core and shared among the threads. It is defined by "-write_buf:size X", where X is the unsigned int that represents the size of the write buffer.

The write buffer is an STL set of tick_t values, each recording the time at which a pending store completes (i.e. the store's latency added to its issue cycle).
At commit:
* If the write buffer is full, check if any of the entries can be evicted (their time is less than the current time). Evict those.
* If it is still full, the store can't complete.
* Otherwise, send the store to the write buffer (realistically, send the store to the cache and get the latency).
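The commit steps above can be sketched as a stand-alone function. This is a simplified model, not the actual sim-outorder.c code; in particular, the real version also requires a free store port before the store may commit.

```cpp
#include <cassert>
#include <set>

typedef unsigned long long tick_t;

// Returns true if a store issued at cycle `now` with latency `lat`
// can commit; write_buf holds the finish times of pending stores.
bool commit_store(std::set<tick_t> &write_buf, unsigned int buf_size,
                  tick_t now, tick_t lat)
{
    // If the buffer is full, evict entries whose stores have finished.
    if (write_buf.size() >= buf_size)
        while (!write_buf.empty() && *write_buf.begin() < now)
            write_buf.erase(write_buf.begin());

    // Still full: the store cannot complete this cycle.
    if (write_buf.size() >= buf_size)
        return false;

    // Otherwise record the store's completion time.
    write_buf.insert(now + lat);
    return true;
}
```

Note that because std::set stores unique keys, two stores with identical finish times occupy a single entry; a std::multiset would track them separately.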

This requires various changes:

In cmp.h: 
 Add: "#include<set>"
 After "bool pred_perfect;"
  Add: unsigned int write_buf_size;
  Add: std::set<tick_t> write_buf;
In cmp.c:
 After "pred_perfect(FALSE),"
  Add: write_buf_size(16),
In sim-outorder.c
 After "opt_reg_uint(odb, "-res:fpmult",offset,"total number of floating point multiplier/dividers available",&cores[i].res_fpmult, /* default */cores[i].fu_CMP[FU_FPMULT_INDEX].quantity,/* print */TRUE, /* format */NULL);"
  Add: opt_reg_uint(odb, "-write_buf:size",offset,"write buffer size (for stores to L1, not for writeback)",&cores[i].write_buf_size, /* default */16,/* print */TRUE, /* format */NULL);

The following is a major change in sim-outorder.c:
The original code is as follows (from commit(...))

if((MD_OP_FLAGS(contexts[context_id].LSQ[contexts[context_id].LSQ_head].op) & (F_MEM|F_STORE)) == (F_MEM|F_STORE))
{
 //stores must retire their store value to the cache at commit; try to get a store port (functional unit allocation)
 res_template *fu = res_get(cores[core_num].fu_pool, MD_OP_FUCLASS(contexts[context_id].LSQ[contexts[context_id].LSQ_head].op));
 if(fu)
 {
  //reserve the functional unit
  if(fu->master->busy)
   panic("functional unit already in use");
 
  //schedule functional unit release event
  fu->master->busy = fu->issuelat;
 
  //go to the data cache
  if(cores[core_num].cache_dl1)
  {
   //Wattch -- D-cache access
   cores[core_num].power.dcache_access++;
 
   //commit store value to D-cache
   lat = cores[core_num].cache_dl1->cache_access(Write, (contexts[context_id].LSQ[contexts[context_id].LSQ_head].addr&~3),context_id, NULL, 4, sim_cycle, NULL, NULL);
 
   if(lat > cores[core_num].cache_dl1_lat)
    events |= PEV_CACHEMISS;
  }
 
  /* all loads and stores must access the D-TLB */
  if(cores[core_num].dtlb)
  {
   /* access the D-TLB */
   lat = cores[core_num].dtlb->cache_access(Read, (contexts[context_id].LSQ[contexts[context_id].LSQ_head].addr & ~3),context_id, NULL, 4, sim_cycle, NULL, NULL);
   if(lat > 1)
    events |= PEV_TLBMISS;
  }
 }
 else
 {
  //no store ports left, cannot continue to commit insts
  contexts_left.erase(contexts_left.begin()+current_context);
  if(contexts_left.empty())
   break;
  current_context%=contexts_left.size();
  continue;
 }
}

Is now:

if((MD_OP_FLAGS(contexts[context_id].LSQ[contexts[context_id].LSQ_head].op) & (F_MEM|F_STORE)) == (F_MEM|F_STORE))
{
 if(cores[core_num].write_buf.size() == cores[core_num].write_buf_size)
 {
  while(!cores[core_num].write_buf.empty() && *(cores[core_num].write_buf.begin()) < sim_cycle)
  {
   cores[core_num].write_buf.erase(cores[core_num].write_buf.begin());
  }
 }
 //stores must retire their store value to the cache at commit; try to get a store port (functional unit allocation)
 res_template *fu = res_get(cores[core_num].fu_pool, MD_OP_FUCLASS(contexts[context_id].LSQ[contexts[context_id].LSQ_head].op));
 if(fu && (cores[core_num].write_buf.size() < cores[core_num].write_buf_size))
 {
  //reserve the functional unit
  if(fu->master->busy)
   panic("functional unit already in use");
 
  //schedule functional unit release event
  fu->master->busy = fu->issuelat;
 
  tick_t write_finish = sim_cycle;
 
  //go to the data cache
  if(cores[core_num].cache_dl1)
  {
   //Wattch -- D-cache access
   cores[core_num].power.dcache_access++;
 
   //commit store value to D-cache
   lat = cores[core_num].cache_dl1->cache_access(Write, (contexts[context_id].LSQ[contexts[context_id].LSQ_head].addr&~3),context_id, NULL, 4, sim_cycle, NULL, NULL);
 
   if(lat > cores[core_num].cache_dl1_lat)
    events |= PEV_CACHEMISS;
 
   write_finish = std::max(write_finish, sim_cycle + lat);
  }
 
  /* all loads and stores must access the D-TLB */
  if(cores[core_num].dtlb)
  {
   /* access the D-TLB */
   lat = cores[core_num].dtlb->cache_access(Read, (contexts[context_id].LSQ[contexts[context_id].LSQ_head].addr & ~3),context_id, NULL, 4, sim_cycle, NULL, NULL);
   if(lat > 1)
    events |= PEV_TLBMISS;
 
   write_finish = std::max(write_finish, sim_cycle + lat);
  }
  cores[core_num].write_buf.insert(write_finish);
  assert(cores[core_num].write_buf.size() <= cores[core_num].write_buf_size);
 }
 else
 {
  //no store ports left, cannot continue to commit insts
  contexts_left.erase(contexts_left.begin()+current_context);
  if(contexts_left.empty())
   break;
  current_context%=contexts_left.size();
  continue;
 }
}


Unused Macros

Removed the following unused macros:

  • #define CACHE_DOUBLE(data, bofs)
    • __CACHE_ACCESS(double, data, bofs)
  • #define CACHE_FLOAT(data, bofs)
    • __CACHE_ACCESS(float, data, bofs)