VisualSim provides a graphical environment for exploring multi-processor architectures
and optimizing them to meet the requirements of the target applications prior to implementation.
No software code or RTL (Verilog/VHDL) is required to conduct this analysis.
The exploration can target the cache hierarchy or arrangement, pipeline details,
the selection of peripherals, and the partitioning of the target application based on
profile and performance metrics. Experiments by customers have shown accuracy
of over 90% with this VisualSim approach. Because the analysis is done prior
to any implementation, rework cost is almost fully eliminated. This
example shows the use of performance modeling with VisualSim to evaluate
the communication between processor cores for multi-threaded applications.
The VisualSim model associated with this description is provided below.
You can view the model, change parameter values, and run the simulation directly
from within the Web browser. No additional software is required. This
shows how you can use a pre-built VisualSim model for trade studies.
To use the models at the links,
click the GO button to run the simulation. Double-click any parameter in
the model window to change its value.
Click here to execute the VisualSim model and view analysis results
Introduction
The Processor Array Model investigates the effects of generating fixed
tasks, sending them over a serial virtual connection to nine processors,
processing each task, and sending the result to a task collector block.
The model compares the actual number of tasks completed against Amdahl’s
Speedup Law. Amdahl’s Speedup Law compares the time for a single process
(serial time plus parallel time) with the time for parallel processes
(serial time plus parallel time divided by the number of processors).
Amdahl’s Speedup Law is used here as a figure of merit.
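The comparison described above can be written as a short calculation. The following is a minimal sketch in Python, not part of the VisualSim model; the function name `amdahl_speedup` and the sample values are assumptions chosen for illustration only.

```python
def amdahl_speedup(t_serial: float, t_parallel: float, n_processors: int) -> float:
    """Speedup of n_processors over a single processor, per Amdahl's Law."""
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    # Single process: the serial and parallel portions run back to back.
    single = t_serial + t_parallel
    # Parallel processes: only the parallel portion is divided among processors.
    multi = t_serial + t_parallel / n_processors
    return single / multi


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the VisualSim model.
    for n in (1, 3, 9):
        print(n, amdahl_speedup(1.0, 9.0, n))
```

As the serial time grows relative to the parallel time, the ratio approaches 1 regardless of the number of processors, which is why it serves as a reference line for the simulated task counts.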
Figure 1. Processor Array Model.
Amdahl’s Law
Model Overview
The Processor Array Model has two top-level parameters:
• Tparallel
• Tserial
These parameters are used both to calculate the equation values and within
the Processor Array Model itself. In the Processor Array Model they determine:
• Tparallel: Processor execution time, in the Processor Array.
• Tserial: Task distribution time, shown in the Task Generator.
The Task Generator on the left creates Transactions that are sent to the
N Processors in the Array through a common virtual bus. Each Array Processor
then executes a Transaction and sends it to the Task Collector on the
right-hand side. The Task Collector collects statistics on the incoming
Transactions and sends them to the Plotting Block.
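The flow described above can be approximated outside VisualSim with a small event-driven sketch. This is not the VisualSim model itself. It assumes that the Task Generator releases one task every Tserial time units, that each task goes to the first processor to become free, and that every task takes exactly Tparallel to execute. The function name `simulate_array` and the `sim_time` parameter are hypothetical.

```python
import heapq


def simulate_array(t_serial, t_parallel, n_processors, sim_time):
    """Count the tasks that reach the collector within sim_time.

    Assumption: one task leaves the generator every t_serial time units and
    is handed to whichever processor becomes free first.
    """
    # Min-heap holding the time at which each processor next becomes free.
    free_at = [0.0] * n_processors
    heapq.heapify(free_at)

    completed = 0
    task_index = 1
    while task_index * t_serial <= sim_time:
        arrival = task_index * t_serial            # task leaves the generator
        start = max(arrival, heapq.heappop(free_at))
        finish = start + t_parallel                # processor execution time
        heapq.heappush(free_at, finish)
        if finish <= sim_time:
            completed += 1                         # task reached the collector
        task_index += 1
    return completed


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the VisualSim model.
    for n in (1, 3, 9):
        print(n, simulate_array(1, 9, n, 100))
```

Because the generator keeps releasing tasks while earlier ones are still executing, distribution and execution overlap in this sketch, which is the kind of pipelining the results section refers to.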
Model Results
The Processor Array Model results compare the figure-of-merit ratio with
the actual tasks completed in the model. At small or large serial values,
the model and the equation compare favorably. When the serial time equals
1/N of the parallel time, the model shows twice as many tasks completing
relative to the equation. This effect seems related to the pipelining
of generated tasks: the processors do not sit idle, since the serial
time is proportional to the parallel processor execution time.
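One way to reason about this effect is to compare the per-task time implied by the equation, where serial and parallel stages are added end to end, with the per-task time of a pipeline, where the two stages overlap and the slower one sets the pace. The sketch below is an interpretation under that assumption, not a result taken from the VisualSim model; the function name `pipelined_vs_equation` is hypothetical.

```python
def pipelined_vs_equation(t_serial, t_parallel, n_processors):
    """Ratio of pipelined task throughput to the throughput implied by
    adding the serial time and parallel/N time end to end."""
    parallel_share = t_parallel / n_processors
    # Equation view: each task costs the serial time plus its parallel share.
    per_task_equation = t_serial + parallel_share
    # Pipelined view: the stages overlap, so the slower stage sets the rate.
    per_task_pipelined = max(t_serial, parallel_share)
    return per_task_equation / per_task_pipelined


if __name__ == "__main__":
    # Illustrative values only: N = 9 and Tparallel = 9, so Tparallel/N = 1.
    for t_serial in (1.0, 5.0):
        print(t_serial, pipelined_vs_equation(t_serial, 9.0, 9))
```

Under this assumption the ratio is 2 when Tserial equals Tparallel/N and 1.2 when Tserial equals 5/N of Tparallel, which is in line with the roughly 2X and 20% differences reported for those two cases.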
In Figure 2 (see the model link at the top), the Processor Array Model
completes approximately 20% more tasks than predicted by Amdahl’s Law.
Each X-axis point corresponds to the number of processors executing, where:
• 10 corresponds to processor 1
• 90 corresponds to processor 9
Figure 3. Relative Processing with Tserial Time = 1/N Tparallel
In Figure 3, the Processor Array Model completes approximately 2X more
tasks than predicted by Amdahl’s Law. Each X-axis point corresponds to
the number of processors executing, where:
• 10 corresponds to processor 1
• 90 corresponds to processor 9
Figure 4. Relative Processing with Large Tserial Time = 5/N Tparallel
In Figure 4, the Processor Array Model completes approximately 20% more
tasks than predicted by Amdahl’s Law. Each X-axis point corresponds to
the number of processors executing, where:
• 10 corresponds to processor 1
• 90 corresponds to processor 9
Other Model Variations
• One can modify the Processor Array Model by replacing the nine processors
with a Resource_N blockset that executes in the First-Come-First-Serve mode
of operation. The results will be identical, with fewer blocks.
• One could restrict processor execution to be clock-synchronized, which
requires more uCode lines to align with a common clock boundary. This will
change the results.
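To illustrate why clock synchronization changes the results, the sketch below rounds an execution time up to the next clock boundary. It is a generic calculation, not VisualSim uCode; the function name `align_to_clock` and the clock period used are assumptions.

```python
import math


def align_to_clock(duration, clock_period):
    """Round an execution time up to the next multiple of clock_period."""
    if clock_period <= 0:
        raise ValueError("clock_period must be positive")
    cycles = math.ceil(duration / clock_period)   # whole clock cycles needed
    return cycles * clock_period


if __name__ == "__main__":
    # Illustrative values only: a 93-unit task on a 10-unit clock.
    print(align_to_clock(93, 10))
```

Any execution time that does not already fall on a boundary is stretched, so the effective processor execution time grows and fewer tasks complete in the same interval.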