Dual Processor Server Optimization

Figure 1 VisualSim Simulation Model of the Dual Processor Server

Figure 2 Analysis using VisualSim Dual Processor System

Introduction
Company X is developing a state-of-the-art dual processor server appliance to support a variety of applications in networking and security. The company has limited information on the software application but knows the arrival rate of the data and instructions to be processed. The project will deliver the lowest-cost server appliance for application rates of up to 40 Gbps.

Purpose of Analysis
A proposed system has been developed by the systems engineers. This system must be analyzed for different application rates and for the break-off point between the single-CPU and dual-CPU options. Because this is a cost-sensitive product, it is important to design the components for the worst case without over-designing the product.

Evaluation Criteria
Two unknown criteria must be evaluated:

  • Rate of arrival of instructions from the application
  • Data requirements for each instruction to be executed
This modeling effort examines the impact of these two unknowns on the number of CPUs and the bus speed.

System Block Diagram
The architect has developed a basic block diagram with the physical elements and knowledge of the various instructions and request flows through the system. The block diagram of the system is shown in Figure 3.

Figure 3 Block Diagram of Dual Processor Server

The architecture of this server blade consists of a shared bus with two CPUs, cache and SDRAM. There are three communication scenarios for the applications through the system.

  • CPU = (CPU_1, CPU_2)
  • Hit = (CPU_1, CPU_2) -> BUS -> Cache
  • Miss = (CPU_1, CPU_2) -> BUS -> Cache -> BUS -> RAM
The starting assumptions for the distribution of data requirements across instructions, and for the processing time on each architectural component, are shown in Figure 4. The CPU processes 60% of all instructions without requesting external data; the remaining 40% request data from the Cache. That 40% splits into 36% of all instructions that find a match (Hit) in the cache, which responds to the CPU with the data, and 4% that must take the additional step to the external SDRAM to get the data.

Figure 4 Starting Assumptions
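
The instruction mix above can be restated as a small calculation. The following Python sketch is an illustration only and is not part of the VisualSim model: the 60/36/4 split and the hop lists come from the text, while the names `PATH_MIX`, `PATH_HOPS`, `pick_path`, `bus_transfers_per_instruction` and `cache_hit_ratio` are hypothetical. Processing times are not included because they appear only in Figure 4.

```python
import random

# Share of all instructions on each path (from the starting assumptions).
PATH_MIX = {"CPU": 0.60, "Hit": 0.36, "Miss": 0.04}

# Resources visited on each path, following the communication scenarios.
# These names are illustrative labels, not VisualSim block names.
PATH_HOPS = {
    "CPU": ["CPU"],
    "Hit": ["CPU", "BUS", "Cache"],
    "Miss": ["CPU", "BUS", "Cache", "BUS", "RAM"],
}


def pick_path(rng):
    """Draw one communication scenario according to PATH_MIX."""
    paths = list(PATH_MIX)
    weights = [PATH_MIX[p] for p in paths]
    return rng.choices(paths, weights=weights, k=1)[0]


def bus_transfers_per_instruction():
    """Expected number of bus traversals per instruction (request direction only)."""
    return sum(PATH_MIX[p] * PATH_HOPS[p].count("BUS") for p in PATH_MIX)


def cache_hit_ratio():
    """Fraction of cache requests that hit, given the 36% / 4% split."""
    return PATH_MIX["Hit"] / (PATH_MIX["Hit"] + PATH_MIX["Miss"])


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [pick_path(rng) for _ in range(10_000)]
    for path in PATH_MIX:
        print(path, sample.count(path) / len(sample))
    print("bus traversals per instruction:", bus_transfers_per_instruction())
    print("cache hit ratio:", cache_hit_ratio())
```

Under these assumptions roughly nine in ten cache requests hit, and an average instruction crosses the bus less than once, which helps explain why the CPUs rather than the bus dominate the load.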

Model in VisualSim
The block diagram and starting assumptions are modeled and simulated in VisualSim. The VisualSim model is shown in Figure 1 and the analysis window in Figure 2 (both at the top of the page).


Methodology:
The VisualSim modeling methodology separates the workload, the physical topology and the communication flow of the instructions. The VisualSim model consists of the following parts:

  • Topology: The top left part of the diagram describes the physical elements that form the block diagram. The queue occupancy, latency, context switching, scheduling and preemption are determined at the physical elements.
  • Communication: The top right part of the diagram describes the flow of the instructions and requests through the system. It starts at the IN3 and IN4 elements.
  • Application: The bottom part is the abstraction of the application, using a stochastic process, a data structure as a traveling token, and a multiplexer to determine the CPU that the instructions need to be delivered to.
  • Based on the incoming data structure, the CPU determines if a data request needs to be sent to the Cache or the instruction can be processed internally.

Notes on the Model:

  • The Bus and CPUs are described using the hardware schedulers while the Cache and RAM use the software scheduler.
  • The Bus, Cache and SDRAM allow for preemption while the CPUs are scheduled as first-come, first-served without preemption.
  • The flow in the model starts at the workload generator at the bottom of Figure 1.
  • Based on the decision at Decision_Point, the instructions are executed in CPU1 or CPU2.
  • The data requirements for the instruction execution are randomly generated and are included as fields in the data structure. Based on the data requirements, the request can be processed entirely within the CPU or routed through to the hit or miss paths on the right.
  • The blocks in the communication flow on the right communicate in a connectionless mode with the physical elements on the left to execute the incoming instructions.
  • Statistics generators collect analysis data from the physical elements and display the results.
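
The first-come, first-served behavior of a CPU with a bounded input buffer can be sketched in a few lines of Python. This is a simplified stand-in and not the VisualSim scheduler: the function name `simulate_fcfs`, the fixed service time, and the `buffer_size` parameter are assumptions made for illustration.

```python
from collections import deque


def simulate_fcfs(arrival_times, service_time, buffer_size):
    """Simulate one non-preemptive FCFS server with a bounded waiting buffer.

    arrival_times must be sorted in ascending order. A task is dropped when
    one task is in service and buffer_size tasks are already waiting.
    Returns (finish_times_of_accepted_tasks, dropped_count).
    """
    in_system = deque()  # finish times of accepted tasks not yet completed
    finish_times = []
    dropped = 0
    for t in arrival_times:
        # Retire every task that has finished by the time this one arrives.
        while in_system and in_system[0] <= t:
            in_system.popleft()
        if len(in_system) >= buffer_size + 1:
            dropped += 1  # buffer overflow: the transaction is lost
            continue
        # FCFS without preemption: start after the last accepted task ends.
        start = in_system[-1] if in_system else t
        finish = start + service_time
        in_system.append(finish)
        finish_times.append(finish)
    return finish_times, dropped


if __name__ == "__main__":
    # Arrivals slower than service: nothing queues, nothing is dropped.
    print(simulate_fcfs([0, 1, 2, 3], service_time=1, buffer_size=2))
    # A burst of five simultaneous arrivals overflows a two-entry buffer.
    print(simulate_fcfs([0, 0, 0, 0, 0], service_time=1, buffer_size=2))
```

When arrivals outpace the service rate for long enough, the buffer fills and later tasks are dropped, which is the kind of condition the overflow exception in the scenarios reports.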

Scenarios simulated
All of the blocks in the model are parameterized for exploring different scenarios. The model also has some global parameters that can change the operation of the simulation. For the purpose of this analysis, four scenarios have been created. In Figure 2 (Top), experiment with the following:


Click on Go and execute simulation with the preset parameters.

  • Start with the Base configuration of parameters provided. The top results window shows the utilization of the various devices such as CPU, Cache, Memory and Bus.
  • Modify the Task_Rate to Task_Time*0.2. Click "GO". The simulation executes about 20% of the way and then generates an exception. Such messages are expected and mark an important configuration limit. This message indicates that the buffer to CPU1 has overflowed and is dropping transactions. This forms the upper boundary of data that can be transmitted through the system.
  • Next, modify Task_Rate back to Task_Time*1.0 and modify Number_of_CPUs to “1”. Click "GO". The utilization of CPU2 will be significantly higher than in Case 1 (the Base configuration). The utilization of CPU1 will be zero, indicating that this CPU has not been turned on.
  • Finally, keeping Number_of_CPUs at 1, modify Task_Rate to Task_Time*0.43. Click "GO". This produces an error message similar to the one in Case 2, indicating that this is the upper limit of the transaction rate that a single CPU can sustain.
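
The saturation behavior in these scenarios can be approximated with back-of-the-envelope arithmetic. The sketch below assumes that the Task_Rate multiplier scales the mean time between arriving instructions, so a smaller multiplier means a faster arrival stream; that reading is an assumption and is not confirmed by the model description. The service time and base inter-arrival time are hypothetical placeholders, not values from the model, and the function names are illustrative.

```python
def cpu_utilization(service_time, interarrival_time, number_of_cpus):
    """Offered load per CPU, assuming arrivals split evenly across the CPUs."""
    if number_of_cpus < 1 or interarrival_time <= 0:
        raise ValueError("need at least one CPU and a positive inter-arrival time")
    return service_time / (interarrival_time * number_of_cpus)


def saturation_scale(service_time, base_interarrival_time, number_of_cpus):
    """Inter-arrival multiplier at or below which per-CPU utilization reaches 1."""
    if number_of_cpus < 1 or base_interarrival_time <= 0:
        raise ValueError("need at least one CPU and a positive inter-arrival time")
    return service_time / (base_interarrival_time * number_of_cpus)


if __name__ == "__main__":
    SERVICE_TIME = 0.4       # hypothetical mean CPU time per instruction
    BASE_INTERARRIVAL = 1.0  # hypothetical mean time between instructions
    for cpus in (2, 1):
        limit = saturation_scale(SERVICE_TIME, BASE_INTERARRIVAL, cpus)
        print(f"{cpus} CPU(s): saturates at multiplier <= {limit:.2f}")
        for scale in (1.0, 0.5, 0.2):
            rho = cpu_utilization(SERVICE_TIME, BASE_INTERARRIVAL * scale, cpus)
            state = "saturated" if rho >= 1.0 else "ok"
            print(f"  multiplier {scale:.2f}: utilization {rho:.2f} ({state})")
```

In this simplified view, halving the number of CPUs doubles the multiplier at which the system saturates. The simulation captures queuing, bus and memory effects that this arithmetic ignores, so the simulated limits need not match it exactly.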

Results
The Analysis window in Figure 2 consists of three sections:

  • The left side has all the run controls associated with the simulation. The performance engineer has provided these as the simulation attributes that can be modified.
  • The top display window shows the value of the data structure as it completes processing at the CPU, Cache, Bus and SDRAM. Once the simulation has completed, the aggregated statistics are presented for each physical element: CPU, Bus, Cache and SDRAM.
  • The second graph is a timeline that displays the number of cycles consumed by each arriving instruction at the different physical elements.
As the various run control parameters are modified, you will notice that the utilization of the CPU changes dramatically, while the utilization of the Cache, SDRAM and Bus does not change considerably. This indicates that the Cache, SDRAM and Bus have been over-designed and can be optimized further. Also, the bottleneck does not occur at the Bus, as originally conceived, but at the queuing stage feeding the two CPUs.

Summary
This experiment can be used to optimize the design for performance and cost. As more detail is added to the model, other explorations can also be performed. These include optimizing the micro-code execution order, power estimation and identifying functional discrepancies. This model was created in about 2 hours, and about 2 days were spent on analysis and further refinement. This documentation took ½ day to complete.

Copyright 2008© Mirabilis Design Inc. All Rights Reserved.