VisualSim
provides a graphical modeling and simulation environment to conduct
performance analysis and architecture exploration studies of complex systems prior to implementation.
This example shows the use of the Functional Cache module in VisualSim. Using this
detailed cache model, customers can quickly configure and experiment with
a multi-level cache at implementation-accurate detail.
The cache architecture along with the
bus and processor configurations can be modified to target power and performance metrics.
Experiments by customers have shown accuracy of over 90% with this VisualSim approach.
Moreover, this analysis is done prior
to any implementation. Hence the rework cost is almost fully eliminated. This
example shows a two-level cache modeled at implementation detail, while the bus and SDRAM are abstract definitions.
The
VisualSim model associated with this description is provided below.
You can view the model, change parameter values, and run the simulation right from
within the Web Browser. No additional software is required. This
shows how you can use a pre-built VisualSim model for doing trade studies.
To use the models at the links,
click on the GO button to run the simulation. Double-click on any parameter in
the model window to change the parameter value.
Click here to view the interactive VisualSim Block Diagram and model (Model 1).
High-Performance Functional Cache Model with ARM1136J(F)-S
This system model was constructed to design the peripherals
around a complex embedded processor such as the ARM1136J(F)-S for an MPEG-4 decoder application. It was determined through other exercises
that the maximum acceleration can be gained by developing a highly optimized
cache. So, the cache was modeled in
complete functional accuracy to explore various strategies including N-Way
Associative, Write-Back, Write-Through, replacement strategies, Pre-fetch lines
and timing. As the MPEG decoder code was
already developed and debugged, there was no need for any additional
software verification. Moreover, the
software executed at every clock cycle without any external interruption. The peripherals considered comprised two levels
of cache, a bus, and memory.
This model does not include the ARM ISS, as a processor trace
was sufficient to trigger the model. A program trace was generated
from the ARM processor running independently.
This trace was entered directly into the VisualSim model. The cache was modeled as lines of cache with
the matching of the valid and tag bits leading to the Word selection, as
described in Figure 1. The bus and SDRAM
were abstracted using the VisualSim performance resource models and treated as
active schedulers with queues and a processing delay. The specific details of the cache operations
modeled and the system model are explained below.
Figure 1. Functional Cache Flow.
Introduction
Cache comes from the French word “cacher”, meaning to hide, as
a cache is transparent to, or hidden from, the user. The Functional Cache Model investigates the
effects of memory requests being submitted to a cache from a trace file. The cache looks up
each requested address according to the type of cache organization, performs cache
line replacements with a line replacement algorithm, pre-fetches cache
lines based on spatial locality, and calculates the hit/miss ratio statistics. This model can be used for either an
instruction cache or a data cache; having separate instruction and data caches is sometimes referred to as a Harvard cache architecture.
Definitions
Hit Ratio: Fraction of memory accesses for which the data is available in
the cache
Hit Time: Cache access time plus time to determine hit/miss
Miss Ratio: Fraction of memory accesses for which the data is not
available in the cache
Miss Time: Time to update the cache with valid data
Equation 1. Hit Ratio
Hit Ratio = Hits / (Hits + Misses)
Equation 2. Miss Ratio
Miss Ratio = Misses / (Hits + Misses) = 1.0 - Hit Ratio
The Hit Time is less than the Miss Time, and in a typical system
the Hit Ratio is much higher than the Miss Ratio; this is the advantage of
using a cache memory.
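As a rough illustration of Equations 1 and 2, the sketch below computes both ratios from hit and miss counts and combines them with the Hit Time and Miss Time. The average access time formula (Hit Time plus Miss Ratio times Miss Time) is the usual textbook approximation and is not taken from the VisualSim model; the function names and the sample numbers are assumptions made for this example.

```python
def cache_ratios(hits, misses):
    """Return (hit_ratio, miss_ratio) as defined in Equations 1 and 2."""
    total = hits + misses
    if total == 0:
        raise ValueError("no memory accesses recorded")
    hit_ratio = hits / total
    return hit_ratio, 1.0 - hit_ratio


def average_access_time(hit_ratio, hit_time, miss_time):
    """Textbook approximation (assumed, not from the model):
    every access pays the hit time, and a miss additionally pays
    the time to update the cache with valid data."""
    return hit_time + (1.0 - hit_ratio) * miss_time


# Hypothetical counts and timings, in clock cycles.
hit_ratio, miss_ratio = cache_ratios(hits=950, misses=50)
avg_cycles = average_access_time(hit_ratio, hit_time=1.0, miss_time=20.0)
print(f"hit ratio {hit_ratio:.3f}, miss ratio {miss_ratio:.3f}, "
      f"average access {avg_cycles:.2f} cycles")
```

With a high Hit Ratio the average stays close to the Hit Time, which is the benefit described above.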
On a write, the processor can write to the cache only,
and memory is updated later, when the replacement algorithm evicts the line. This is sometimes referred to as a
“write-back” cache. Alternatively, the processor can write to both the cache and memory at the same time,
sometimes termed a “write-through” cache.
In addition, there may be “write-buffers” to avoid stalling on write-through
operations.
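The difference between the two write policies can be seen by counting how many writes actually reach memory. The following is a minimal sketch, not the VisualSim implementation: it assumes a direct-mapped cache of one-word lines, allocates a line on every write, and flushes dirty lines at the end so that both policies leave memory up to date. All names are hypothetical.

```python
def count_memory_writes(write_addresses, num_lines, policy):
    """Count writes that reach memory for a stream of processor writes."""
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    tags = [None] * num_lines    # tag stored in each line (None = invalid)
    dirty = [False] * num_lines  # write-back only: line differs from memory
    memory_writes = 0
    for addr in write_addresses:
        index = addr % num_lines
        tag = addr // num_lines
        if policy == "write-through":
            tags[index] = tag
            memory_writes += 1   # every write also goes to memory
            continue
        # Write-back: memory is touched only when a dirty line is replaced.
        if tags[index] != tag:
            if dirty[index]:
                memory_writes += 1
            tags[index] = tag
        dirty[index] = True
    if policy == "write-back":
        memory_writes += sum(dirty)  # assumed final flush of dirty lines
    return memory_writes


trace = [0, 0, 0, 1, 1, 0]  # hypothetical write addresses
for policy in ("write-through", "write-back"):
    print(policy, count_memory_writes(trace, num_lines=4, policy=policy))
```

Repeated writes to the same location are where write-back saves memory traffic; a trace with no reuse would show little difference.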
Types of Caches
Direct-Mapped Cache:
Many main memory locations map to one cache line.
The cache line is selected by the memory line address modulo the number of cache lines;
these address bits are sometimes referred to as index bits.
The valid bit indicates whether the cache entry is valid. The tag field holds the upper address bits; a request
is a cache hit when the entry is valid and its tag matches the upper bits of the requested address.
Fully Associative
Cache: Compares all tag bits in parallel, i.e., uses an associative memory for the
address compare function. There are no index
bits. This implies more hardware comparators
in the cache memory organization, which usually consumes more power and increases
cost.
N-Way Set Associative
Cache: The cache index selects a “set” from the cache, and the tag
field/valid bit comparisons are performed in parallel (N comparators). The valid bit indicates
whether a cache entry is valid. The tag field
holds the upper address bits, used together with the valid bit to determine whether the requested address
results in a cache hit. Additional
delay is needed to multiplex the associative sets when determining a hit, including
data access. Increasing set
associativity reduces the miss rate but increases the hit time slightly.
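The three organizations above can be viewed as one structure with different parameters: one way per set gives a direct-mapped cache, and a single set gives a fully associative cache. The sketch below illustrates the valid/tag comparison under that view. It is an illustrative model only; the class and method names are assumptions, and addresses are treated as line addresses (word bits already removed).

```python
class SetAssociativeCache:
    """Tag/valid lookup for a cache with `num_sets` sets of `ways` entries."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # Each entry is (valid, tag); all entries start invalid.
        self.sets = [[(False, 0)] * ways for _ in range(num_sets)]

    def split(self, line_address):
        """Return (tag, index): index = address modulo number of sets."""
        return line_address // self.num_sets, line_address % self.num_sets

    def is_hit(self, line_address):
        tag, index = self.split(line_address)
        # Hardware compares all ways in parallel; here we just scan them.
        return any(valid and t == tag for valid, t in self.sets[index])

    def fill(self, line_address, way):
        tag, index = self.split(line_address)
        self.sets[index][way] = (True, tag)


# Line addresses 0 and 8 conflict in an 8-line direct-mapped cache...
direct = SetAssociativeCache(num_sets=8, ways=1)
direct.fill(0, way=0)
direct.fill(8, way=0)          # replaces line address 0
# ...but coexist in a 2-way cache of the same total size.
two_way = SetAssociativeCache(num_sets=4, ways=2)
two_way.fill(0, way=0)
two_way.fill(8, way=1)
print(direct.is_hit(0), two_way.is_hit(0))
```

The choice of which way to fill is left to the caller here; that decision belongs to the replacement algorithm.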
Algorithms
Least Recently Used
Algorithm: The cache entry that has gone unused the longest is the one evicted and,
depending on the cache structure, written back to memory. This implies
bits to indicate the age of a cache entry, added to the typical valid bit, tag bits, and
actual data bits.
Round Robin: This
is the simplest policy. The cache just selects the next cache line,
modulo the number of lines.
E.g., with 8 lines, if the previously allocated line was line 7, the next allocated
line will be 0, then 1, then 2, etc.
Write-Back Cache:
Also called "copy back" cache, this policy is "full" write
caching of the system memory. When a write is made to system memory at a
location that is currently cached, the new data is only written to the cache,
not actually written to the system memory. Later, if another memory location
needs to use the cache line where this data is stored, it is saved ("written
back") to the system memory and then the line can be used by the new
address.
Write-Through Cache:
With this method, every time the processor writes to a cached memory location,
both the cache and the underlying memory location are updated. This is really
sort of like "half caching" of writes; the data just written is in
the cache in case it is needed to be read by the processor soon, but the write
itself isn't actually cached because we still have to initiate a memory write
operation each time.
Cache Pre-Fetch:
The cache can pre-fetch new cache lines before the current cache line has been fully used,
on the assumption, usually reasonable, that execution is sequential.
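The two replacement policies described in this section differ only in how the next line to replace is chosen. The sketch below shows one possible way to express each choice; the names are hypothetical and the age convention (a larger value means a longer time since last use) is an assumption for the example.

```python
class RoundRobin:
    """Select the next line to allocate, modulo the number of lines."""

    def __init__(self, num_lines, last_allocated=None):
        self.num_lines = num_lines
        # Default so that the first allocation goes to line 0.
        self.last = num_lines - 1 if last_allocated is None else last_allocated

    def next_victim(self):
        self.last = (self.last + 1) % self.num_lines
        return self.last


def lru_victim(ages):
    """Return the index of the entry with the largest age (least recently used).
    Ties go to the lowest index."""
    return max(range(len(ages)), key=lambda i: ages[i])


# The example from the text: 8 lines, previously allocated line was 7.
rr = RoundRobin(num_lines=8, last_allocated=7)
print([rr.next_victim() for _ in range(3)])

# Hypothetical ages for a 4-entry set.
print(lru_victim([3, 9, 1, 4]))
```

Round robin needs only a single counter, while LRU needs age bits per entry, which matches the extra storage noted for the Least Recently Used algorithm.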
Model Overview
The Functional Cache Model utilizes a Traffic_Reader Block
to load the Trace file into the Cache, as shown in Figure 2,
below. The Trace File is used to pre-load the cache with some random memory locations.
The Functional Cache Model utilizes the actual cache address streams from the trace.

Figure 2. Functional Cache Model
The parameters on the right of the model can be modified to
create different variations of the cache.
Additional caches can be attached to create L3 or greater cache
levels. The output of the L2 cache is
sent to the Bus and SDRAM. The output of the SDRAM is sent back to
the L2 cache which sends the data out on the hit port. If there was a miss, the data would come out
on the miss port.
The Functional Cache Model plots a real time hit ratio and
real time address map of requests being made to the cache. These two plots allow the user to “see” what
the cache is doing at all times. The
real time address map depicts cache misses as the negative values of the index
address, while cache hits are represented as positive values of the index
address. The hit ratio
plotted is the ratio of total hits to total requests over the duration of the
model.
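The two plots described above can be derived from a stream of (index, hit or miss) results. The following sketch shows one way to produce the plotted values; it is not the VisualSim plotting code, and the function name and sample data are assumptions. Note that with this sign convention an access to index 0 looks the same for a hit and a miss.

```python
def plot_points(accesses):
    """Turn (index, was_hit) pairs into (signed_index, running_hit_ratio).

    Hits keep the positive index, misses are plotted as the negative index,
    and the hit ratio is total hits over total requests so far.
    """
    hits = 0
    points = []
    for request_number, (index, was_hit) in enumerate(accesses, start=1):
        if was_hit:
            hits += 1
        signed_index = index if was_hit else -index
        points.append((signed_index, hits / request_number))
    return points


# Hypothetical cache results: miss at 5, hit at 5, hit at 9, miss at 3.
for signed_index, ratio in plot_points([(5, False), (5, True), (9, True), (3, False)]):
    print(signed_index, round(ratio, 3))
```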
Cache Pre-Fetching
The Functional Cache Model performs pre-fetching of cache
data based on the incoming address; the number of lines examined is set by the
“Check_How_Many_PreFetch_Lines” parameter.
The model examines 0 to N cache lines from the current address to see if
they have their valid bit set true. Lines whose
valid bit is already set true are not pre-fetched.
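The valid-bit check can be sketched as follows. This is an illustration under assumptions rather than the model's own logic: it examines the lines immediately after the current index, wraps around at the end of the cache, and uses a hypothetical argument `check_how_many` to stand in for the Check_How_Many_PreFetch_Lines parameter.

```python
def lines_to_prefetch(valid, current_index, check_how_many):
    """Return the indexes, among the next `check_how_many` lines,
    whose valid bit is not set and which therefore need pre-fetching."""
    num_lines = len(valid)
    candidates = []
    for offset in range(1, check_how_many + 1):
        index = (current_index + offset) % num_lines  # assumed wrap-around
        if not valid[index]:
            candidates.append(index)
    return candidates


# Hypothetical valid bits for an 8-line cache.
valid = [True, False, True, False, False, True, True, True]
print(lines_to_prefetch(valid, current_index=0, check_how_many=3))
```

Setting the count to 0 disables pre-fetching in this sketch, since no lines are examined.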
Cache Addresses
Cache addresses are represented as integers internally to
provide for maximum performance. There
are three fields for each address: Addr_Bits, Tag_Bits, and Word_Bits. The Addr_Bits correspond to the Index bits
and are represented by 12 bits, or three HEX characters, in
the original address trace. The Addr_Bits length varies with the size of the cache.
The Tag_Bits are the upper sixteen bits of the complete cache
address.
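Splitting an integer address into these three fields is a matter of shifts and masks. The sketch below assumes a 32-bit address, which would leave 4 bits for Word_Bits after the 16 tag bits and 12 index bits; the text does not state the Word_Bits width, so that value, like the function name, is an assumption.

```python
TAG_BITS = 16    # upper sixteen bits, per the text
ADDR_BITS = 12   # index bits, three hex characters
WORD_BITS = 4    # assumed: remainder of a 32-bit address


def split_address(address):
    """Return (tag, index, word) fields of an integer address."""
    word = address & ((1 << WORD_BITS) - 1)
    index = (address >> WORD_BITS) & ((1 << ADDR_BITS) - 1)
    tag = (address >> (WORD_BITS + ADDR_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, word


# A hypothetical trace entry given as a hex string.
tag, index, word = split_address(int("12345678", 16))
print(f"tag={tag:04X} index={index:03X} word={word:X}")
```

A 12-bit index addresses 4096 entries, which agrees with the size of the arrays used elsewhere in the model.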
Cache Least Recently Used
Cache Least Recently Used augments other cache operations,
whereby the cache looks for indexes that have aged and writes them back to SDRAM.
The model contains an integer array that is used for LRU calculations, named:
DEF Cache_Age block {4096, 0} ;
In addition, there is a Cache Lock array that could be used to lock
certain addresses in the cache, so they could not be replaced:
DEF Cache_Lock block {4096, 0} ;
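How two such arrays might work together is sketched below in Python. This is not the VisualSim script: the aging rule (every access ages all entries by one and resets the accessed entry to zero) and the convention that a nonzero lock value protects an entry are assumptions, and the class and method names are hypothetical. Only the array size of 4096 and the initial value of 0 come from the definitions above.

```python
class AgeTable:
    """Age and lock arrays, both initialized to 0 as in the DEF statements."""

    def __init__(self, num_entries=4096):
        self.cache_age = [0] * num_entries
        self.cache_lock = [0] * num_entries

    def touch(self, index):
        """Record an access: age every entry, then reset the accessed one."""
        for i in range(len(self.cache_age)):
            self.cache_age[i] += 1
        self.cache_age[index] = 0

    def oldest_unlocked(self):
        """Return the index of the oldest entry that is not locked,
        or None if every entry is locked."""
        candidates = [i for i, lock in enumerate(self.cache_lock) if lock == 0]
        if not candidates:
            return None
        return max(candidates, key=lambda i: self.cache_age[i])


# Small table so the behavior is easy to follow.
table = AgeTable(num_entries=4)
for index in (0, 1, 2, 3):
    table.touch(index)
print(table.oldest_unlocked())   # entry 0 was touched longest ago
table.cache_lock[0] = 1          # lock entry 0 so it cannot be replaced
print(table.oldest_unlocked())
```

Aging every entry on each access costs time proportional to the table size; a timestamp per entry would avoid that, but the counter form is closer to the idea of age bits.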
Typical Results
The Functional Cache Model can provide real-time Hit Ratio
and Address Map outputs.
Figure 3 illustrates the hit ratio for the target application and processor trace file.
Figure 4 shows the corresponding Address Map.

Figure 3. Cache Hit Ratio

Figure 4. Cache Address Map (+Hit, -Miss)