3.0.15

This page describes the changes made in the 3.0.15 release of M-sim.

Release Date:

loader.c

"could not read `.text' from executable" has been replaced with "could not read .init or .fini from executable".

Memory Mapping

In memory.c, acquire_address(...) now handles mapping conflicts differently.
Originally, it would call acquire_address again with 0 as the first parameter, forcing the acquired address to be chosen from a high memory location. This behavior conflicted with the way /sbin/loader places objects in virtual address space: the loader expects the acquired address to be the next available sequential location.

       if(!bounds_verify(memory_map,addr,len) || !bounds_verify(internal_map,addr,len))
       {
               return acquire_address(0,len);
       }

is now:

       if(!bounds_verify(memory_map,addr,len) || !bounds_verify(internal_map,addr,len))
       {
               return acquire_address(addr+MD_PAGE_SIZE,len);
       }
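
As a rough illustration of the new behaviour, here is a minimal, self-contained sketch (not M-sim source: bounds_verify_stub, acquire_address_stub, and PAGE_SIZE stand in for bounds_verify, acquire_address, and MD_PAGE_SIZE, and the occupied range is made up). On a conflict, the request is probed forward one page at a time until a free range is found, instead of jumping to high memory.

       #include <stdio.h>

       #define PAGE_SIZE 0x1000ULL

       //true when [addr, addr+len) does not overlap the occupied range [0x10000, 0x16000)
       static int bounds_verify_stub(unsigned long long addr, unsigned long long len)
       {
               return (addr + len <= 0x10000ULL) || (addr >= 0x16000ULL);
       }

       static unsigned long long acquire_address_stub(unsigned long long addr, unsigned long long len)
       {
               if(!bounds_verify_stub(addr, len))
               {
                       //conflict: probe the next sequential page, as /sbin/loader expects
                       return acquire_address_stub(addr + PAGE_SIZE, len);
               }
               return addr;
       }

       int main(void)
       {
               //0x11000 collides and is pushed forward to the first free page, 0x16000
               printf("0x%llx\n", acquire_address_stub(0x11000ULL, 0x1000ULL));
               return 0;
       }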

sys_syscall

The parameter traceable has been removed; it had no effect in M-sim.

PALcalls

Preliminary support for PAL calls has been added, primarily a sys_palcall function similar to sys_syscall.

Bug Report

vortex (spec2K) using input set 1 may generate accesses outside of its allotted stack size. This would not be an issue as long as the corresponding virtual addresses were allocated; however, the new fetch code checks for boundaries and may flag these accesses when we would not want it to.

     could not find memory page associated with 0x11ff2f180

Disabling the assertion will remedy the issue (in the case of vortex, this is not a problem).

Fetch (SMT) Bug

If a thread fetched an instruction that missed in either the I-TLB or the I-cache, fetch was suspended for that thread. In SMT, it was possible for other threads to then evict the instruction that had missed, so when the thread resumed it would miss again.
Additionally, the flags set to indicate these misses were never used.
The fetch logic has been reorganized: we no longer "continue;" when there is an I-cache/I-TLB miss. The adjustment to fetch_issue_delay ensures that no further instructions are fetched. When we insert the instruction into the IFQ, we increase its fetched_cycle by (lat - 1). This does not affect normal instructions but ensures that the miss delay is taken into account for this instruction. Original code:

               //is this a bogus text address? (can happen on mis-spec path)
               if(mem->ld_text_base <= contexts[context_id].fetch_regs_PC
                       && contexts[context_id].fetch_regs_PC < (mem->ld_text_base+mem->ld_text_size)
                       && !(contexts[context_id].fetch_regs_PC & (sizeof(md_inst_t)-1)))
               {
                       //read instruction from memory
                       MD_FETCH_INST(inst, mem, contexts[context_id].fetch_regs_PC);

                       //address is within program text, read instruction from memory
                       int lat = cores[core_num].cache_il1_lat;
                       if(cores[core_num].cache_il1)
                       {
                               //access the I-cache
                               lat = cores[core_num].cache_il1->cache_access(Read, IACOMPRESS(contexts[context_id].fetch_regs_PC),
                                       context_id, NULL, ISCOMPRESS(sizeof(md_inst_t)), sim_cycle, NULL, NULL);
                               last_inst_missed = (lat > cores[core_num].cache_il1_lat);
                       }
                       if(cores[core_num].itlb)
                       {
                               //access the I-TLB, NOTE: this code will initiate speculative TLB misses
                               int tlb_lat = cores[core_num].itlb->cache_access(Read, IACOMPRESS(contexts[context_id].fetch_regs_PC),
                                       context_id, NULL, ISCOMPRESS(sizeof(md_inst_t)), sim_cycle, NULL, NULL);
                               last_inst_tmissed = (tlb_lat > 1);
                               //I-cache/I-TLB accesses occur in parallel
                               lat = MAX(tlb_lat, lat);
                       }

                       //I-cache/I-TLB miss? assumes I-cache hit >= I-TLB hit (assuming 1 cycle)
                       if(lat != cores[core_num].cache_il1_lat)
                       {
                               //I-cache miss, block fetch until it is resolved
                               contexts[context_id].fetch_issue_delay += lat - 1;
                               continue;
                       }
                       //else, I-cache/I-TLB hit
               }

When lat != cores[core_num].cache_il1_lat, the fetch is stopped (this also means that last_inst_missed and last_inst_tmissed are never used). Stopping fetch is necessary; however, with SMT, other threads will then access these caches and can cause a miss to occur when the original thread resumes. This was not a problem in SimpleScalar since there were no other threads: the original thread would generate a hit when it tried again (also, last_inst_missed and last_inst_tmissed were global and were retained).
If we are going to fetch, we now handle it as follows:

               if(do_fetch)
               {
                       //read instruction from memory
                       MD_FETCH_INST(inst, mem, contexts[context_id].fetch_regs_PC);

                       //Then access Level 1 Instruction cache and Instruction TLB in parallel
                       lat = cores[core_num].cache_il1_lat;
                       if(cores[core_num].cache_il1)
                       {
                               //access the I-cache
                               lat = cores[core_num].cache_il1->cache_access(Read, IACOMPRESS(contexts[context_id].fetch_regs_PC), context_id, NULL, ISCOMPRESS(sizeof(md_inst_t)), sim_cycle, NULL, NULL);
                               last_inst_missed = (lat > cores[core_num].cache_il1_lat);
                       }

                       if(cores[core_num].itlb)
                       {
                               //access the I-TLB, NOTE: this code will initiate speculative TLB misses
                               int tlb_lat = cores[core_num].itlb->cache_access(Read, IACOMPRESS(contexts[context_id].fetch_regs_PC), context_id, NULL, ISCOMPRESS(sizeof(md_inst_t)), sim_cycle, NULL, NULL);
                               last_inst_tmissed = (tlb_lat > 1);
                               lat = MAX(tlb_lat, lat);
                       }

                       //I-cache/I-TLB miss? assumes I-cache hit >= I-TLB hit (assuming 1 cycle)
                       if(lat != cores[core_num].cache_il1_lat)
                       {
                               //I-cache/I-TLB miss, block fetch until it is resolved
                               contexts[context_id].fetch_issue_delay += lat - 1;
                       }
               }

On a miss, we still complete the current fetch. However, once we place the instruction into the IFQ, we add the latency of the miss: register_rename ensures that the latency is obeyed, and fetch_issue_delay stops further fetches until the miss resolves. This allows the instruction that missed to be placed into the IFQ with a delay that reflects the cache miss. We added "+ (lat - 1)" to the IFQ insertion:

               contexts[context_id].IFQ[contexts[context_id].fetch_tail].fetched_cycle = sim_cycle + (lat - 1);
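
As a rough sketch of why this works (the struct and helper below are illustrative assumptions, not the actual IFQ entry or register_rename code from sim-outorder.c), an entry stamped with sim_cycle + (lat - 1) is simply not eligible to leave fetch until the miss latency has elapsed:

       //Illustrative sketch only: the IFQ entry and the check are assumptions, not M-sim source.
       struct ifq_entry_sketch
       {
               unsigned long long fetched_cycle;       //sim_cycle + (lat - 1) on a miss
       };

       //returns nonzero when the entry may leave fetch and be renamed
       static int can_rename_sketch(const struct ifq_entry_sketch *entry, unsigned long long sim_cycle)
       {
               //the instruction is not visible to rename until its fetch latency,
               //including any I-cache/I-TLB miss delay, has elapsed
               return sim_cycle >= entry->fetched_cycle;
       }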

Syscalls

class osf_sockaddr incorrectly used 24 bytes for sa_data instead of 14.
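
This matches the standard sockaddr layout (14 bytes of protocol address). A sketch of the corrected class, with the field type assumed for illustration:

       //Sketch of the corrected layout; the sa_family type is an assumption.
       class osf_sockaddr
       {
       public:
               unsigned short sa_family;       //address family, AF_xxx
               char sa_data[14];               //14 bytes of protocol address (was 24)
       };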

osf_sys_kill

Partial support for negative pids has been added. The kill is not propagated, but the initial child is killed.

osf_sys_getrlimit

Returns static maximum values as defined in syscall.c.

osf_sys_setrlimit

Does not allow setting; maximum values are not changeable by the thread.
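
Taken together with osf_sys_getrlimit above, the behaviour amounts to fixed, read-only limits. A rough sketch of that behaviour (the struct layout, names, and values are illustrative assumptions, not the definitions from syscall.c):

       //Illustrative sketch only; osf_rlimit_sketch and its values are assumptions.
       struct osf_rlimit_sketch
       {
               unsigned long long rlim_cur;    //soft limit
               unsigned long long rlim_max;    //hard limit
       };

       static const struct osf_rlimit_sketch static_limit = { 0x800000ULL, 0x800000ULL };

       //getrlimit: always hand back the static maximums
       static void getrlimit_sketch(struct osf_rlimit_sketch *out)
       {
               *out = static_limit;
       }

       //setrlimit: silently ignored; limits cannot be changed by the thread
       static int setrlimit_sketch(const struct osf_rlimit_sketch *in)
       {
               (void)in;
               return 0;
       }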

osf_sys_getsysinfo

Additional requests are now handled: GSI_PROC_TYPE, GSI_CPU_INFO, GSI_PLATFORM_NAME (a buffer overflow is not handled correctly; the name is truncated instead), GSI_PHYSMEM, GSI_MAX_CPU, GSI_CPUS_IN_BOX, GSI_TIMER_MAX.

osf_sys_usleep_thread

This is now partially supported. useconds is multiplied by 1000 and added to a variable number (which starts at 200 and is incremented by 1, modulo 1000, each time any thread uses the call). In fast-forward mode, the thread is skipped for that many cycles. This is not implemented in full simulation, since pipeline scheduling should prevent usleep/load_conditional/store_conditional problems.
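
A minimal sketch of the delay computation described above (the variable and function names are illustrative, not the actual code in syscall.c):

       //Illustrative sketch of the usleep_thread delay computation;
       //usleep_offset and usleep_delay are assumed names.
       static unsigned long long usleep_offset = 200;  //starts at 200

       static unsigned long long usleep_delay(unsigned long long useconds)
       {
               unsigned long long delay = useconds * 1000 + usleep_offset;

               //incremented by 1 each time any thread uses it, modulo 1000
               usleep_offset = (usleep_offset + 1) % 1000;

               //in fast-forward mode, the calling thread is skipped for this many cycles
               return delay;
       }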

For example, observe the performance of swim and wupwise:
      ../sim-outorder -fastfwd 10000000 -max:inst 1000000 -rf:size 1024 bzip2NS.1.arg swimNS.1.arg twolfNS.1.arg wupwiseNS.1.arg
      THROUGHPUT IPC: 3.12943
      IPC 0 (bzip2NS.1.arg): 0.258778
      IPC 1 (swimNS.1.arg): 0.000915008
      IPC 2 (twolfNS.1.arg): 2.86837
      IPC 3 (wupwiseNS.1.arg): 0.000915008
In this case, there is a load_conditional paired with a memory barrier and a store_conditional. During fast-forward, the two threads are lock-stepped and neither can progress. After fast-forward, they end up in a similar situation (although it is possible for pipeline conditions to allow this to end). After this syscall change:
      THROUGHPUT IPC: 5.03576
      IPC 0 (bzip2NS.1.arg): 0.253994
      IPC 1 (swimNS.1.arg): 2.22891
      IPC 2 (twolfNS.1.arg): 1.18452
      IPC 3 (wupwiseNS.1.arg): 1.36776

A sleep variable has been added to context_t. This is set by usleep_thread and checked by the fast_forwarding logic (ff_context in sim-outorder.c).
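
A rough sketch of how the fast-forward path can honour the new field (the structure below is an illustrative assumption; the real check is in ff_context in sim-outorder.c and may differ):

       //Illustrative sketch only; context_sketch and ff_skip_sleeping are assumed names.
       struct context_sketch
       {
               unsigned long long sleep;       //cycles left to sleep, set by usleep_thread
       };

       //returns nonzero if the context should be skipped this fast-forward cycle
       static int ff_skip_sleeping(struct context_sketch *ctx)
       {
               if(ctx->sleep)
               {
                       ctx->sleep--;           //consume one cycle of the sleep
                       return 1;
               }
               return 0;
       }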
