Dual Processor Design


Learning Objectives

Use the framework available in VisualSim to create a multi-core processor design for the following objectives:

Design Methodology

The block diagram of this system is shown in Figure 1. The system has two identical cores on a shared bus with a cache and SRAM.



Figure 1: Dual Processor Design Block Diagram


The system supports the following possible tasks:


Task latency depends on where the data needed to process a transaction resides. The data may be available at the CPU, in the cache, or in RAM.


In this modeling exercise, we assume that the CPU has the data locally 60% of the time; for the remaining 40% of the time, it must fetch the data from the external Cache.
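The effect of this 60/40 split on average task latency can be estimated with a short calculation. The following is a minimal sketch, not part of the VisualSim model: the function name and the timing values are hypothetical, and it ignores bus contention and queueing, which the simulation itself accounts for.

```python
def expected_task_latency(cpu_time, cache_access_time, p_local=0.60):
    """Mean task latency when a fraction p_local of tasks find data at the CPU.

    Assumption: a task that does not find its data locally pays one extra
    cache access on top of the CPU processing time.
    """
    if not 0.0 <= p_local <= 1.0:
        raise ValueError("p_local must be in [0, 1]")
    return cpu_time + (1.0 - p_local) * cache_access_time


if __name__ == "__main__":
    # Hypothetical timings in seconds; not taken from the tutorial model.
    print(expected_task_latency(cpu_time=1.0e-6, cache_access_time=0.5e-6))
```

Changing p_local shows how sensitive the mean latency is to the share of tasks that must go to the external Cache.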

VisualSim Model

The key concept in modeling is to model all the elements that are relevant to the analysis and abstract out the rest. The five architectural elements (the Bus, the Cache, the SRAM, and the two CPUs) and the instruction set are relevant. Items such as the RTOS are not included in this study because they add a common overhead to all the operations.

To translate the above requirements into a model, consider the model to have five basic elements - architecture, behavior, workload, model parameters, and analysis. All of the requirements above must fit into one of these structures.

Refer to Figure 2 for the proposed VisualSim model.


Figure 2: Dual Processor Design VisualSim Model


The VisualSim model can be found at the following location:

$VS/doc/Training_Material/Tutorial/WebHelp/Tutorial/Performance_Modeling/Dual_Processor_System.xml

Description of the VisualSim Model

A brief description of the VisualSim Model is listed below:


Building of the VisualSim Model

Assumptions


The procedures outlined below make the following assumptions:


Blocks Used


The following table lists the blocks to be used in this model.

S.No. | Library Block | Description
1 | Digital Simulator | The Digital Simulator implements the discrete-event Model of Computation (MoC). The simulator maintains a notion of current time and processes events chronologically in this time. Used to model elements that change with time, such as hardware, software, and networks.
2 | Traffic | Outputs a new Data Structure (DS) at time intervals specified by the "Time_Distribution" setting. A Data Structure is also known as a transaction and contains a list of Field Names + Values.
3 | ExpressionList | Executes a sequence of expressions in order. The default block contains one input and one output. The user can add multiple input and output ports.
4 | Mapper | Works with the separation of the behavior and architecture methodology. In this methodology, the Mapper block is placed in the behavior flow at every location where timed resources are required.
5 | Text_Display | Displays the values arriving on the input port in a text display dialog.
6 | TimeDataPlotter | Plots the incoming data on the Y-axis against the current simulation time on the X-axis. Every wire connected to this block's input is considered a separate dataset and plotted separately.
7 | SystemResource_Extended | Forms the architecture part of the behavior and architecture separation methodology.
8 | BusArbiter | The arbiter for a Bus Interface.
9 | BusInterface | Connects devices to the BusArbiter and has a queue for each port.
10 | RAM | Combines the operation of a basic memory controller (delay function) and the memory array. Handles pre-fetch, read, write, refresh, and controller operations.
11 | ArchitectureSetup | Handles all the address mapping, routing, plotting, statistics, and debugging for the hardware modeling components.
12 | Cache | Emulates a cache in an architectural model. There are interfaces on both sides of the block for connectivity.
13 | ResourceStatistics | Outputs or resets the statistics for all the SystemResource, Channel, Channel_N, Server, and Queues in the model.
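The discrete-event behavior described for the Digital Simulator and the Traffic block can be illustrated outside VisualSim. The following is a minimal sketch, assuming a plain priority queue of (time, name) events; the function names are hypothetical and this is not how VisualSim is implemented internally.

```python
import heapq


def run_events(events):
    """Process (time, name) events in chronological order and return the trace."""
    queue = list(events)
    heapq.heapify(queue)  # earliest event time is always popped first
    trace = []
    while queue:
        now, name = heapq.heappop(queue)  # simulation time jumps to this event
        trace.append((now, name))
    return trace


def periodic_traffic(interval, count):
    """Create `count` transaction events spaced `interval` apart (fixed-rate source)."""
    return [(i * interval, f"task_{i}") for i in range(1, count + 1)]


if __name__ == "__main__":
    for now, name in run_events(periodic_traffic(interval=2.0, count=3)):
        print(now, name)
```

The point of the sketch is that time advances from event to event rather than in fixed steps, which is what lets one simulator cover hardware, software, and network elements.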




Initial Setup


In the initial setup, you set the model parameters and instantiate a Digital Simulator. Model parameters are used to control the system configurations.



The CPU_Time parameter is the task processing time, excluding memory reference time. The Number_of_CPUs parameter lets the designer conduct trade-off studies between single-core and multi-core architectures. The Task_Rate parameter controls the rate at which tasks are generated for an application.
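A quick sanity check on these parameters is to estimate the offered load per CPU before running the simulation. The following is a minimal sketch under stated assumptions: Task_Rate is taken to be in tasks per second, CPU_Time in seconds, and tasks are assumed to spread evenly across CPUs. The function name and the example values are hypothetical.

```python
def offered_cpu_utilization(task_rate, cpu_time, number_of_cpus):
    """Estimate the fraction of time each CPU is busy with task processing.

    Assumptions: task_rate is in tasks per second, cpu_time is in seconds,
    and the load is split evenly. Memory reference time is excluded, as it
    is from CPU_Time.
    """
    if number_of_cpus < 1:
        raise ValueError("number_of_cpus must be at least 1")
    return task_rate * cpu_time / number_of_cpus


if __name__ == "__main__":
    # Hypothetical values; not taken from the tutorial model.
    for cpus in (1, 2):
        print(cpus, offered_cpu_utilization(1000.0, 0.0005, cpus))
```

A value near or above 1.0 indicates that the CPUs cannot keep up with the task stream, so latency would grow without bound.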
 

Figure 3: Digital Simulator



The rest of the parameters remain as default settings.

Figure 4: Digital Simulator Parameters

Architectural Elements

In this stage, you configure blocks to represent the architectural elements - Bus, Cache, SRAM, and CPUs.



A statistical CPU block is selected at this stage to gather information about the processing platform and its performance at the system level. Once the architecture is selected, the statistical processor block can be replaced with a detailed processor block.



The rest of the parameters remain as default settings.



Figure 5: CPU1 Parameters

Figure 6: CPU2 Parameters


The rest of the parameters retain the default settings.
Figure 7: BusArbiter Parameters


The BusInterface sends data traffic to, and receives data traffic from, the master and slave devices according to the arbitration algorithm defined in the BusArbiter.
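The interaction between per-port queues and an arbiter can be sketched in a few lines. The tutorial does not state which arbitration algorithm the model uses, so round-robin is purely an assumption here, and the class and method names are hypothetical rather than VisualSim APIs.

```python
from collections import deque


class RoundRobinArbiter:
    """Grant bus access to per-port queues in round-robin order (assumed policy)."""

    def __init__(self, port_names):
        self.order = list(port_names)
        self.queues = {name: deque() for name in self.order}
        self.next_index = 0  # port that gets first chance at the next grant

    def request(self, port, transaction):
        """Queue a transaction on the given port."""
        self.queues[port].append(transaction)

    def grant(self):
        """Return (port, transaction) for the next non-empty port, or None."""
        for offset in range(len(self.order)):
            index = (self.next_index + offset) % len(self.order)
            port = self.order[index]
            if self.queues[port]:
                self.next_index = (index + 1) % len(self.order)
                return port, self.queues[port].popleft()
        return None
```

With two requests queued on CPU1 and one on CPU2, successive grants alternate between the ports instead of draining CPU1 first.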


The rest of the parameters retain the default settings. The first BusInterface retains all the default parameter settings.

Figure 8: BusInterface 2 Parameters


Figure 9: Cache Parameters

Figure 10: RAM Parameters


Figure 11: DeviceInterface Parameters

Note that for the second DeviceInterface block, IO_Name must be set to CPU2.

The DeviceInterface blocks handle communication between the memory subsystem and the CPUs.

Figure 12: Model A

Note that the hardware architecture definition is not yet complete, because more information on external Cache access is needed. This information is explained later in this tutorial.

Transaction Generation


To generate transactions, use the Traffic block. In addition, you use a Mapper block to send the transactions to the relevant CPU.


The rest of the parameters remain as default settings.

Figure 13: Traffic Parameters



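/* Randomly pick a processor index for this task */
input.Select_Processor = irand (0, 1)
/* Route to CPU1 only when index 0 is drawn and the model has two CPUs; otherwise route to CPU2 */
input.Task_Destination = (input.Select_Processor == 0 && 2 == Number_of_CPUs)?"CPU1":"CPU2"
/* Numeric task ID matching the destination: 1 for CPU1, 2 for CPU2 */
input.Task_Number = (input.Task_Destination=="CPU1")?1:2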

/* Data Size */
input.A_Bytes = Data_Size

/* command */
input.A_Command = "Read"

The rest of the parameters remain as default settings.
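The CPU selection expressions above can be mirrored in ordinary code to check how tasks are routed. The following is a minimal sketch that assumes irand(0, 1) returns either 0 or 1; the function name is hypothetical. Read literally, the expression sends every task to CPU2 whenever Number_of_CPUs is not 2.

```python
import random


def select_destination(number_of_cpus, rng=random):
    """Mirror the Task_Destination and Task_Number expressions.

    Assumption: irand(0, 1) yields 0 or 1, modeled here with randint.
    """
    select_processor = rng.randint(0, 1)
    if select_processor == 0 and number_of_cpus == 2:
        destination = "CPU1"
    else:
        destination = "CPU2"
    task_number = 1 if destination == "CPU1" else 2
    return destination, task_number


if __name__ == "__main__":
    rng = random.Random(0)
    print([select_destination(2, rng)[0] for _ in range(8)])
    print([select_destination(1, rng)[0] for _ in range(8)])
```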

The expressions above use simple logic to decide whether a task executes on CPU1 or CPU2. Similar logic determines whether the CPU must fetch data from the external Cache or already has the data available in its registers.



/*Select the type of transaction */
input.Get_Data = rand(0.0, 100.0)
input.Need_Data = (input.Get_Data < 60.0)?"Reg":"Cache_Hit" /*  "Reg" means Data available in Register and "Cache_Hit" means get data from cache */
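The threshold test above should classify roughly 60% of transactions as "Reg". This can be checked with a small Monte Carlo run. The following is a minimal sketch, assuming rand(0.0, 100.0) draws uniformly from that range; the function names and the seed are hypothetical.

```python
import random


def classify_access(rng=random):
    """Mirror the Need_Data expression: "Reg" below 60.0, otherwise "Cache_Hit"."""
    get_data = rng.uniform(0.0, 100.0)  # assumed equivalent of rand(0.0, 100.0)
    return "Reg" if get_data < 60.0 else "Cache_Hit"


def register_fraction(samples, seed=1):
    """Fraction of sampled transactions whose data is already in registers."""
    rng = random.Random(seed)
    hits = sum(classify_access(rng) == "Reg" for _ in range(samples))
    return hits / samples


if __name__ == "__main__":
    print(register_fraction(100_000))
```

Changing the 60.0 threshold is the simplest way to explore a different split between register and cache accesses.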

Figure 14: Decision 1 Parameters



Task_Plot_ID retains the default value.

Figure 15: Mapper Parameters


Figure 16: Model B


Now the model is functionally complete. To understand the system performance and resource statistics we must insert blocks to capture the reports. The next section talks about generating Statistics and Reports.


Resource Statistics and Reports

Multiple blocks are available to generate statistics and reports. In this tutorial, we generate the processing latency for the tasks, the resource utilization (Bus, Cache, RAM, and CPUs), and the activity profile of the CPUs.

The processing latency plot helps the designer understand the total time that the target platform takes to process a task, including memory reference time. The resource utilization statistics show how heavily the platform is used by the current set of applications and help with decisions about future task processing requirements.
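The two metrics can be stated precisely with a small calculation over logged intervals. The following is a minimal sketch, assuming each task is recorded as a (start_time, end_time) pair and that busy intervals of a resource do not overlap; the function names and data layout are hypothetical and unrelated to the VisualSim report format.

```python
def mean_latency(records):
    """Average of end_time - start_time over completed tasks."""
    latencies = [end - start for start, end in records]
    if not latencies:
        raise ValueError("at least one completed task is required")
    return sum(latencies) / len(latencies)


def utilization(busy_intervals, simulation_time):
    """Fraction of the simulation time during which a resource was busy.

    Assumption: busy_intervals are non-overlapping (start, end) pairs.
    """
    busy = sum(end - start for start, end in busy_intervals)
    return busy / simulation_time


if __name__ == "__main__":
    # Hypothetical log entries, in arbitrary time units.
    print(mean_latency([(0.0, 2.0), (1.0, 4.0)]))
    print(utilization([(0.0, 2.0), (5.0, 6.0)], simulation_time=10.0))
```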

You may use the blocks detailed in the following steps to generate the reports mentioned above.


Figure 17: Customize Ports


Figure 18: ExpressionList 2 Parameters



The Block_Name and Statistics parameters retain the default settings.

Figure 19: Resource Statistics Parameters


The completed model should look as shown below.
Figure 20: Final Model

Statistics and Reports

Statistics and reports must be analyzed to determine whether the system meets the requirements. The latency plot below indicates that external memory access is very limited, because most of the time the requested data is available in the external cache. This suggests that the designer must select a shared L2 Cache with a minimum hit rate of 95%.
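The sensitivity of memory access time to the cache hit rate can be estimated with the usual average-access-time relation. The following is a minimal sketch with hypothetical cache and RAM timings; it assumes every access pays the cache time and only misses pay the additional RAM time, and it is not derived from the simulation results.

```python
def average_access_time(hit_rate, cache_time, ram_time):
    """Average access time: cache time plus the miss fraction times RAM time."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return cache_time + (1.0 - hit_rate) * ram_time


def sweep_hit_rates(hit_rates, cache_time, ram_time):
    """Return (hit_rate, average_access_time) pairs for comparison."""
    return [(h, average_access_time(h, cache_time, ram_time)) for h in hit_rates]


if __name__ == "__main__":
    # Hypothetical timings in nanoseconds; not taken from the tutorial model.
    for hit_rate, time_ns in sweep_hit_rates([0.80, 0.90, 0.95], 10.0, 100.0):
        print(hit_rate, time_ns)
```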

The following images depict the results of the analysis.


Figure 21: Transactions Out

Figure 22: Collective Statistics

Figure 23: Processing Latency

By default, the Timing Diagram plot, which shows the timeline activity of the CPUs, does not label the timing diagram sequences, as shown in the figure below. To associate a name with each plot, select Edit > Format and define the Y-ticks as shown below.


Figure 24: Set Plot Format

Figure 25: TimeLine Usage

Further Analysis

Perform the following modifications to conduct further analysis.