
      Cycle-, Functional- and Transaction-Accurate Cache Model

VisualSim provides a graphical modeling and simulation environment for conducting performance analysis and architecture exploration studies of complex systems prior to implementation. This example shows the use of the Functional Cache module in VisualSim. Using this detailed cache model, customers can quickly configure and experiment with a multi-level cache at implementation-accurate detail.

The cache architecture, along with the bus and processor configurations, can be modified to target power and performance metrics. Experiments by customers have shown accuracy of over 90% with this VisualSim approach. Moreover, this analysis is done prior to any implementation, so rework cost is almost entirely eliminated. This example shows a two-level cache modeled at implementation detail, while the bus and SDRAM are abstract definitions.

The VisualSim model associated with this description is provided below. You can view it, change parameter values, and run the simulation from within a web browser; no additional software is required. This shows how a pre-built VisualSim model can be used for trade studies.

To use the model at the link below, click the GO button to run the simulation. Double-click any parameter in the model window to change its value.

Click here to view the interactive VisualSim Block Diagram and model - (Model - 1)

High-Performance Functional Cache Model with ARM1136J(F)-S

This system model was constructed to design the peripherals around a complex embedded processor, such as the ARM1136J(F)-S, for an MPEG-4 decoder application. Other exercises had determined that the maximum acceleration could be gained by developing a highly optimized cache. The cache was therefore modeled with full functional accuracy to explore various strategies, including N-way set associativity, write-back, write-through, replacement strategies, pre-fetch lines, and timing. Because the MPEG decoder code was already developed and debugged, no additional software verification was needed. Moreover, the software executed on every clock cycle without any external interruption. The peripherals considered comprised two levels of cache, a bus, and memory.

This model does not include the ARM instruction set simulator (ISS), as a processor trace was sufficient to drive the model. A program trace was generated from the ARM processor running independently, and this trace was fed directly into the VisualSim model. The cache was modeled as cache lines in which matching the valid and tag bits leads to word selection, as described in Figure 1. The bus and SDRAM were abstracted using the VisualSim performance resource models and treated as active schedulers with queues and a processing delay. The specific cache operations modeled and the system model are explained below.


Figure 1.  Functional Cache Flow.
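The bus and SDRAM are abstracted above as active schedulers with a queue and a processing delay. As a rough illustration of that idea, and not VisualSim's actual resource model, the sketch below treats one such resource as a single first-in, first-out server with a fixed processing delay. The function name `fifo_server` and the fixed-delay assumption are hypothetical.

```python
def fifo_server(arrival_times, processing_delay):
    """Completion time of each request at a single FIFO server.

    Hypothetical stand-in for an abstract bus or SDRAM resource: requests
    queue in arrival order and each one occupies the server for a fixed
    processing delay.
    """
    completions = []
    server_free_at = 0
    for arrival in sorted(arrival_times):
        # A request waits in the queue until the server finishes the previous one.
        start = max(arrival, server_free_at)
        server_free_at = start + processing_delay
        completions.append(server_free_at)
    return completions


# Three back-to-back requests queue up; the fourth finds the server still busy.
print(fifo_server([0, 1, 2, 10], 5))  # [5, 10, 15, 20]
```

The queueing delay of each request is its completion time minus its arrival time minus the processing delay.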

Introduction

Cache comes from the French word “cacher”, meaning to hide, as a cache is transparent to, or hidden from, the user. The Functional Cache Model investigates the effects of memory requests submitted to a cache from a trace file. The cache locates the requested address according to the type of cache organization, performs cache line replacements using a line replacement algorithm, pre-fetches cache addresses based on spatial locality, and calculates hit/miss ratio statistics. This model can be used for either an instruction cache or a data cache; using separate instruction and data caches is sometimes referred to as a Harvard cache architecture.

Definitions

Hit Ratio: Fraction of memory accesses for which the data is available in the cache

Hit Time: Cache access time plus the time to determine a hit or miss

Miss Ratio: Fraction of memory accesses for which the data is not available in the cache

Miss Time: Time to update the cache with valid data

Equation 1.  Hit Ratio

Hit Ratio = Hits / (Hits + Misses)

Equation 2.  Miss Ratio

Miss Ratio = Misses / (Hits + Misses) = 1.0 - Hit Ratio
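Equations 1 and 2 translate directly into code. The sketch below (the function name is illustrative) computes both ratios from hit and miss counts and guards against an empty run.

```python
def hit_miss_ratios(hits, misses):
    """Return (hit_ratio, miss_ratio) per Equations 1 and 2."""
    total = hits + misses
    if total == 0:
        raise ValueError("no accesses recorded")
    hit_ratio = hits / total
    miss_ratio = misses / total  # equals 1.0 - hit_ratio, up to rounding
    return hit_ratio, miss_ratio


hit_ratio, miss_ratio = hit_miss_ratios(950, 50)
print(hit_ratio, miss_ratio)  # 0.95 0.05
```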

The Hit Time is less than the Miss Time, and in a typical system the Hit Ratio is much higher than the Miss Ratio; this combination is the advantage of using a cache memory.
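To make this advantage concrete, assume (as a simplification not stated in the model) that a hit costs Hit Time and a miss costs Miss Time in total, with no overlap between accesses. The average access time is then the ratio-weighted sum of the two. The cycle counts below are hypothetical example values, not measurements from the model.

```python
def average_access_time(hit_ratio, hit_time, miss_time):
    """Ratio-weighted access time; assumes a miss costs miss_time in total."""
    miss_ratio = 1.0 - hit_ratio  # Equation 2
    return hit_ratio * hit_time + miss_ratio * miss_time


# Hypothetical values: 1-cycle hits, 20-cycle misses, 95% hit ratio.
with_cache = average_access_time(0.95, hit_time=1, miss_time=20)
without_cache = 20  # every access goes to memory
print(round(with_cache, 2), round(without_cache / with_cache, 1))  # 1.95 10.3
```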

When the processor writes data, it can write only to the cache, and memory is updated later when the replacement algorithm evicts the line. This is referred to as a “write-back” cache. Alternatively, the processor can write to both the cache and memory at the same time, which is termed a “write-through” cache. In addition, there may be write buffers to avoid stalling on write-through operations.

Types of Caches

Direct-Mapped Cache: Many-to-one mapping of main memory onto cache lines.

The cache address is the memory line address modulo the number of cache lines; these bits are sometimes referred to as index bits. A valid bit indicates whether a cache entry is valid. The tag field holds the upper address bits and, together with the valid bit, determines whether the requested address results in a cache hit.
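A minimal sketch of a direct-mapped lookup, assuming byte addresses and a fixed line size (the class and parameter names are illustrative, not VisualSim parameters). The index is the line address modulo the number of lines, the tag is the remaining upper bits, and a hit requires both a set valid bit and a matching tag.

```python
class DirectMappedCache:
    """Direct-mapped cache: exactly one candidate line per address."""

    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.valid = [False] * num_lines
        self.tags = [0] * num_lines

    def access(self, address):
        """Return True on a hit; on a miss, fill the line and return False."""
        line_addr = address // self.line_size  # drop the word/byte offset
        index = line_addr % self.num_lines     # index bits
        tag = line_addr // self.num_lines      # upper address bits
        if self.valid[index] and self.tags[index] == tag:
            return True
        # Miss: whatever was in this line is replaced.
        self.valid[index] = True
        self.tags[index] = tag
        return False


cache = DirectMappedCache(num_lines=4, line_size=16)
print([cache.access(a) for a in (0, 4, 64, 0)])  # [False, True, False, False]
```

Addresses 0 and 64 map to the same index, so they keep evicting each other: a conflict miss that more associativity would avoid.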

Fully Associative Cache: Compares all tags in parallel, i.e., uses associative memory for the address-compare function. There are no index bits. This requires more hardware comparators in the cache memory organization, usually consumes more power, and increases cost.

N-Way Set Associative Cache: The cache index selects a “set” from the cache, and the tag field and valid bit comparisons are performed in parallel (N comparators). A valid bit indicates whether a cache entry is valid. The tag field holds the upper address bits and, together with the valid bit, determines whether the requested address results in a cache hit. Additional delay is needed to multiplex among the associative ways when determining a hit and accessing the data. Increasing set associativity typically reduces the miss rate but slightly increases the hit time.
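A sketch of an N-way set-associative lookup under stated assumptions: presence of a tag in a set stands in for the valid bit, the `in` test stands in for the N parallel comparators, and replacement within a set is least-recently-used via list order (one possible policy; all names are illustrative). With one way per set it behaves as a direct-mapped cache, and with a single set it behaves as a fully associative cache.

```python
class SetAssociativeCache:
    """N-way set-associative cache with LRU replacement inside each set."""

    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # Each set holds up to `ways` tags, least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit; on a miss, fill a way and return False."""
        line_addr = address // self.line_size
        index = line_addr % self.num_sets  # selects the set
        tag = line_addr // self.num_sets
        tags = self.sets[index]
        if tag in tags:  # models the N parallel tag comparisons
            tags.remove(tag)
            tags.append(tag)  # now the most recently used way
            return True
        if len(tags) == self.ways:
            tags.pop(0)  # evict the least recently used way
        tags.append(tag)
        return False


# The same 4-line capacity organized three ways.
for sets, ways in ((4, 1), (2, 2), (1, 4)):
    cache = SetAssociativeCache(num_sets=sets, ways=ways, line_size=16)
    print(sets, ways, [cache.access(a) for a in (0, 64, 0)])
# 4 1 [False, False, False]  (direct-mapped: 0 and 64 conflict)
# 2 2 [False, False, True]
# 1 4 [False, False, True]   (fully associative)
```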

Algorithms

Least Recently Used (LRU) Algorithm: The cache entry that has gone unused the longest is selected for eviction and, depending on the cache structure, written back to memory. This requires bits indicating the age of each cache entry, in addition to the usual valid bit, tag bits, and data bits.
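One common way to realize these age bits is counter-based LRU, sketched below for a single N-way set as an illustration (not necessarily how the VisualSim model encodes age; function names are illustrative). Each way holds an age from 0 (most recent) to N-1 (least recent), so a 4-way set needs 2 age bits per entry. On an access, every way younger than the accessed one ages by one and the accessed way becomes age 0; the victim is the way with age N-1.

```python
def lru_touch(ages, way):
    """Update the age counters of one set after `way` is accessed."""
    old_age = ages[way]
    for w in range(len(ages)):
        if ages[w] < old_age:  # only ways more recent than `way` get older
            ages[w] += 1
    ages[way] = 0  # the accessed way is now the most recently used


def lru_victim(ages):
    """Return the least recently used way (the one with age N-1)."""
    return ages.index(len(ages) - 1)


ages = [0, 1, 2, 3]  # 4-way set: way 0 newest, way 3 oldest
lru_touch(ages, 3)   # access way 3
print(ages, lru_victim(ages))  # [1, 2, 3, 0] 2
```

The ages must start as a permutation of 0 to N-1 (for example, `list(range(N))`); the update rule then keeps them a permutation.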

Round Robin: This is the simplest policy. The cache just selects the next cache line modulo the number of lines. For example, in an 8-line cache, if the previously allocated line was line 7, the next allocated line will be 0, then 1, then 2, and so on.
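The round-robin rule needs only a single pointer. The sketch below (illustrative names) reproduces the example above, assuming an 8-line cache as the wrap from line 7 to line 0 implies.

```python
class RoundRobin:
    """Round-robin line allocation: next line = (previous + 1) mod lines."""

    def __init__(self, num_lines, last_allocated):
        self.num_lines = num_lines
        self.last_allocated = last_allocated

    def next_line(self):
        self.last_allocated = (self.last_allocated + 1) % self.num_lines
        return self.last_allocated


rr = RoundRobin(num_lines=8, last_allocated=7)
print([rr.next_line() for _ in range(3)])  # [0, 1, 2]
```

Unlike LRU, this needs no per-entry age bits, at the cost of ignoring how recently each line was used.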

Write-Back Cache: Also called "copy back" cache, this policy is "full" write caching of the system memory. When a write is made to system memory at a location that is currently cached, the new data is only written to the cache, not actually written to the system memory. Later, if another memory location needs to use the cache line where this data is stored, it is saved ("written back") to the system memory and then the line can be used by the new address.

Write-Through Cache: With this method, every time the processor writes to a cached memory location, both the cache and the underlying memory location are updated. This is really sort of like "half caching" of writes; the data just written is in the cache in case it is needed to be read by the processor soon, but the write itself isn't actually cached because we still have to initiate a memory write operation each time.
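The difference between the two policies shows up in memory traffic. The sketch below counts writes that reach memory for a sequence of writes to cache lines, under simplifying assumptions: the cache holds a single line, every write allocates that line, and any dirty line remaining at the end is flushed. The function name and the single-line cache are illustrative only.

```python
def memory_writes(line_sequence, policy):
    """Count memory writes for a stream of writes to a one-line cache."""
    cached_line = None
    dirty = False
    writes_to_memory = 0
    for line in line_sequence:
        if policy == "write-through":
            writes_to_memory += 1  # cache and memory updated on every write
        elif policy == "write-back":
            if dirty and cached_line != line:
                writes_to_memory += 1  # evicted dirty line is copied back
            dirty = True  # new data is held only in the cache for now
        else:
            raise ValueError(f"unknown policy: {policy}")
        cached_line = line
    if policy == "write-back" and dirty:
        writes_to_memory += 1  # flush the last dirty line
    return writes_to_memory


writes = [5, 5, 5, 5, 9, 9]  # four writes to line 5, then two to line 9
print(memory_writes(writes, "write-through"), memory_writes(writes, "write-back"))  # 6 2
```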

Cache Pre-Fetch: The cache can pre-fetch new cache lines before it has finished with the current cache line, assuming sequential execution, which is usually a reasonable assumption.

Model Overview

The Functional Cache Model uses a Traffic_Reader block to load the trace file into the cache, as shown in Figure 2 below. The trace file is used to pre-load the cache with some random memory locations, and the model then uses the actual cache address streams from the trace.

Figure 2.  Functional Cache Model

The parameters on the right of the model can be modified to create different variations of the cache. Additional caches can be attached to create L3 or higher cache levels. The output of the L2 cache is sent to the Bus and SDRAM, and the output of the SDRAM is sent back to the L2 cache, which sends the data out on its hit port. If there is a miss, the data comes out on the miss port.
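To illustrate how a request flows through the two cache levels, the sketch below chains two direct-mapped levels in front of memory: each access is tried at L1, then L2, and otherwise is served by the bus and SDRAM, filling both levels on the way back. Level sizes, the `Level` class, the `access` helper, and the fill-both-levels behavior are assumptions for illustration; bus and SDRAM timing are not modeled.

```python
class Level:
    """One direct-mapped cache level (illustrative)."""

    def __init__(self, name, num_lines, line_size):
        self.name = name
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = {}  # index -> tag; presence means the valid bit is set

    def lookup_and_fill(self, address):
        """Return True on a hit; on a miss, fill the line and return False."""
        line_addr = address // self.line_size
        index, tag = line_addr % self.num_lines, line_addr // self.num_lines
        hit = self.tags.get(index) == tag
        self.tags[index] = tag  # no-op on a hit, fill on a miss
        return hit


def access(levels, address):
    """Return the name of the level that supplied the data, or 'SDRAM'."""
    for level in levels:
        if level.lookup_and_fill(address):
            return level.name
    return "SDRAM"  # missed every cache level: served via the bus and SDRAM


l1 = Level("L1", num_lines=2, line_size=16)
l2 = Level("L2", num_lines=8, line_size=16)
print([access([l1, l2], a) for a in (0, 32, 0, 0)])  # ['SDRAM', 'SDRAM', 'L2', 'L1']
```

Address 32 evicts address 0 from the small L1, but the larger L2 still holds it, so the third access is satisfied by L2.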

The Functional Cache Model plots a real-time hit ratio and a real-time address map of requests made to the cache. These two plots allow the user to “see” what the cache is doing at all times. The real-time address map depicts cache misses as negative values of the index address and cache hits as positive values of the index address. The plotted hit ratio is the ratio of total hits to total requests over the duration of the simulation.
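Both plotted series can be derived from the per-request results, as sketched below (the function name is illustrative). Each request contributes its index, negated on a miss, to the address map, and the hit ratio is cumulative over all requests so far. With this encoding, a miss on index 0 is indistinguishable from a hit on index 0.

```python
def plot_series(results):
    """Build address-map and running hit-ratio series.

    `results` is a sequence of (index, hit) pairs in request order.
    """
    address_map = []
    hit_ratio = []
    hits = 0
    for n, (index, hit) in enumerate(results, start=1):
        address_map.append(index if hit else -index)  # +hit, -miss
        hits += int(hit)
        hit_ratio.append(hits / n)  # total hits / total requests so far
    return address_map, hit_ratio


amap, ratio = plot_series([(5, False), (5, True), (7, False), (5, True)])
print(amap)  # [-5, 5, -7, 5]
```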

Cache Pre-Fetching

The Functional Cache Model pre-fetches cache data based on the incoming address; pre-fetching is controlled by the parameter “Check_How_Many_PreFetch_Lines”. The model examines from 0 to N cache lines from the current address to see whether their valid bits are set. Lines whose valid bits are already set are not pre-fetched.
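A minimal sketch of that check, assuming the model's `Check_How_Many_PreFetch_Lines` parameter corresponds to `check_lines` below and that the search wraps around at the end of the cache (the wrap-around and the function name are assumptions). It returns the lines that would be pre-fetched: those among the next N whose valid bit is not set.

```python
def prefetch_candidates(index, valid, check_lines):
    """Indexes among the next `check_lines` lines whose valid bit is clear."""
    num_lines = len(valid)
    candidates = []
    for k in range(1, check_lines + 1):
        i = (index + k) % num_lines  # assumed: wrap around the end of the cache
        if not valid[i]:
            candidates.append(i)  # not yet valid, so pre-fetch it
    return candidates


valid = [True, True, False, True, False, False, False, False]
print(prefetch_candidates(0, valid, 3))  # [2]
```

Setting `check_lines` to 0 disables pre-fetching, matching the lower end of the 0 to N range.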

Cache Addresses

Cache addresses are represented internally as integers for maximum performance. Each address has three fields: Addr_Bits, Tag_Bits, and Word_Bits. The Addr_Bits correspond to the index bits and are represented by 12 bits, or three hex characters in the original address trace. The length of Addr_Bits varies with the size of the cache.

The Tag_Bits are the upper sixteen bits of the complete cache address.
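The field extraction can be sketched with shifts and masks. The text fixes 16 Tag_Bits and 12 Addr_Bits; the 32-bit address width, and therefore the 4 Word_Bits, are assumptions made here so that the three fields fill one word. Twelve index bits give 4096 (2^12) entries. The constant and function names are illustrative.

```python
TAG_BITS = 16   # from the text: upper sixteen bits
ADDR_BITS = 12  # from the text: index bits, three hex characters
WORD_BITS = 4   # assumption: remaining bits of a 32-bit address


def split_address(address):
    """Split an integer address into (tag, index, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    index = (address >> WORD_BITS) & ((1 << ADDR_BITS) - 1)
    tag = (address >> (WORD_BITS + ADDR_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, word


tag, index, word = split_address(0x1234ABCD)
print(hex(tag), hex(index), hex(word))  # 0x1234 0xabc 0xd
```

In hex terms, the tag is the first four characters of the 8-character address, the index the next three, and the word the last one.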

Cache Least Recently Used

The Least Recently Used (LRU) mechanism augments the other cache operations: the cache looks for indexes that have aged and writes them back to SDRAM. The model uses an integer array for the LRU calculations, named:

DEF Cache_Age         block  {4096, 0}  ;

In addition, there is a Cache Lock array that can be used to lock certain addresses in the cache so that they cannot be replaced:

DEF Cache_Lock        block  {4096, 0}  ;
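The text does not spell out how the age and lock arrays interact, so the sketch below is one plausible reading in which every policy choice is an assumption: each access ages all other valid indexes and resets the accessed one to 0; a scan evicts unlocked indexes whose age exceeds a hypothetical `age_limit`, writing them back to SDRAM if dirty; locked indexes are never evicted. The lists of zeros mirror the `Cache_Age` and `Cache_Lock` declarations, and the class and method names are illustrative.

```python
class AgingCache:
    """Per-index aging with lockable entries (illustrative sketch)."""

    def __init__(self, num_indexes=4096, age_limit=1000):
        self.age = [0] * num_indexes   # like Cache_Age  block {4096, 0}
        self.lock = [0] * num_indexes  # like Cache_Lock block {4096, 0}; 1 = locked
        self.valid = [False] * num_indexes
        self.dirty = [False] * num_indexes
        self.age_limit = age_limit     # hypothetical eviction threshold

    def touch(self, index, is_write=False):
        """Record an access: age every other valid index, reset this one."""
        for i in range(len(self.age)):
            if self.valid[i] and i != index:
                self.age[i] += 1
        self.valid[index] = True
        self.age[index] = 0
        if is_write:
            self.dirty[index] = True

    def evict_aged(self):
        """Evict unlocked indexes older than age_limit; return those written back."""
        written_back = []
        for i in range(len(self.age)):
            if self.valid[i] and not self.lock[i] and self.age[i] > self.age_limit:
                if self.dirty[i]:
                    written_back.append(i)  # data would be sent to SDRAM
                self.valid[i] = False
                self.dirty[i] = False
                self.age[i] = 0
        return written_back


cache = AgingCache(num_indexes=8, age_limit=2)
cache.touch(1, is_write=True)
cache.touch(2, is_write=True)
cache.lock[2] = 1  # index 2 may not be replaced
for _ in range(3):
    cache.touch(3)
print(cache.evict_aged(), cache.valid[1], cache.valid[2])  # [1] False True
```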

Typical Results

The Functional Cache Model can provide real-time Hit Ratio and Address Map outputs.

Figure 3 illustrates the hit ratio for the target application and processor trace file, and Figure 4 shows the corresponding address map.

 


Figure 3.  Cache Hit Ratio


Figure 4.  Cache Address Map (+Hit, -Miss)




  Copyright 2008© Mirabilis Design Inc. All Rights Reserved.