Graphics Processor

Cycle-accurate model of a GPU for architecture exploration

Shader_Core_64

Browsable image of the model.

Shader_Core_64 model: block parameters

Entries are listed as Parameter = Value, with the expression in parentheses where it differs from the value. Block_Documentation holds the default placeholder ("Enter User Documentation Here") unless listed.

AckDisplay
  • rowsDisplayed = 10
  • columnsDisplayed = 40
  • suppressBlankLines = false
  • title = (blank)
  • ViewText = true
  • saveText = false
  • fileName = "Enter Filename to save text"
  • Append_Time = true
  • _flipPortsVertical = true
  • _flipPortsHorizontal = false
  • _rotatePorts = 180

ExpressList2
  • Block_Documentation = InitProgram_Idx is global memory
  • Expression_List = input.Program_Index = 2; input.Repetitions = 2
  • Output_Ports = "output"
  • Output_Values = "input"
  • Output_Conditions = "true"

Traffic2
  • Data_Structure_Name = "Header"
  • Start_Time = 1.0E-9
  • Value_1 = 1.0E-3
  • Value_2 = 2.0
  • Random_Seed = 123457L
  • Time_Distribution = Single Event

ExpressList
  • Block_Documentation = InitProgram_Idx is global memory
  • Expression_List = input.Program_Index = 1; input.Repetitions = 2
  • Output_Ports = "output"
  • Output_Values = "input"
  • Output_Conditions = "true"

Traffic
  • Data_Structure_Name = "Header"
  • Start_Time = 1.0E-9
  • Value_1 = 1.0E-3
  • Value_2 = 2.0
  • Random_Seed = 123457L
  • Time_Distribution = Single Event

SelectInst4
  • Instance_Field_Name = "PE_Idx" (expression: Instance_Field_Name)
  • Conversion_Type = DS(0)_to_DS(n)
  • _flipPortsVertical = true
  • _flipPortsHorizontal = false
  • _rotatePorts = 180

SelectInst3
  • Instance_Field_Name = "PE_Idx" (expression: Instance_Field_Name)
  • Conversion_Type = DS(n)_to_DS(0)

SelectInst2
  • Instance_Field_Name = "PE_Idx" (expression: Instance_Field_Name)
  • Conversion_Type = DS(n)_to_DS(0)
  • _flipPortsVertical = true
  • _flipPortsHorizontal = false
  • _rotatePorts = 180

SelectInst
  • Instance_Field_Name = "PE_Idx" (expression: Instance_Field_Name)
  • Conversion_Type = DS(0)_to_DS(n)

DI_64_Proc_Elements
  • nInstances = 1
  • showClones = false

Tables
  • CSV_File_Arr = {"Warp_Table.csv", "Program_Table.csv", "Trace_Sequence_Table.csv", "Memory_Address_Trace.csv"}
  • Number_Warps = 512 (expression: NumPEs*WarpsPerPE)

Plots
  • No parameters other than Block_Documentation

RCU_SDRAM
  • WarpsPerPE = 8 (expression: WarpsPerPE)
  • NumPEs = 64 (expression: NumPEs)
  • SpeedMhz = 20000.0 (expression: SpeedMhz * 20.0)
  • simTime = 1.0E-4 (expression: simTime)
  • Enable_Output = true

Warp_Sched
  • NumPEs = 64
  • SpeedMhz = 1000.0 (expression: SpeedMhz)
  • WarpsPerPE = 8 (expression: WarpsPerPE)
  • CSV_File_Arr = {"Warp_Table.csv", "Program_Table.csv", "Trace_Sequence_Table.csv"} (expression: CSV_File_Arr)
  • BusWidthBytes = 4 (expression: BusWidthBytes)
  • ThreadsPerWarp = 32 (expression: ThreadsPerWarp)

Graphics Processor Model

Overview:

If a Warp's outstanding count is not equal to zero, the Warp is in the Wait state

Cycles in PE = Maximum number of instructions for all Threads * 4 * Number of Threads / Lanes in PE
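The PE cycle formula above can be sketched as a small Python function. This is only an illustration: the function name, the parameter names and the example lane count of 8 are assumptions, and the source does not say what the constant 4 represents or how fractional results are rounded.

```python
def pe_cycles(max_instructions: int, num_threads: int, lanes: int,
              factor: int = 4) -> float:
    """Cycles a Warp spends in a PE, per the formula in the text.

    `factor` is the constant 4 from the formula; its meaning is not
    stated in the source, so it is kept as a plain multiplier.
    """
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    return max_instructions * factor * num_threads / lanes


# 10 instructions, 32 threads (ThreadsPerWarp in the model), 8 lanes (assumed)
print(pe_cycles(10, 32, 8))  # 160.0
```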

All PEs execute in parallel and are independent

All Memory and Shared Compute instructions are sent from the PE to the Hopper at the end of that Warp's delay at the PE

Shared Compute Instruction (SCI) cycles = Number of SCU instructions for all Threads in the Warp / 4
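As a rough illustration of the Shared Compute formula, the sketch below sums the SCU instruction counts over the Threads of one Warp and divides by 4. The function and parameter names are hypothetical, and the result is left as a float because the source does not specify rounding.

```python
from typing import Sequence


def sci_cycles(scu_instructions_per_thread: Sequence[int],
               divisor: int = 4) -> float:
    """Shared Compute cycles for one Warp: total SCU instructions / 4."""
    return sum(scu_instructions_per_thread) / divisor


# A Warp of 32 threads, each issuing 2 SCU instructions (example values)
print(sci_cycles([2] * 32))  # 16.0
```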

Memory Instruction delay = time for the request to pass through the L1/L2/Memory sequence

The address, size and type are attached to the execution Data Structure at the Hopper

When all SCI and Memory instructions are done, the Outstanding Count is cleared and the information is sent via an event to Group_Warp_x
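The completion rule can be sketched as a per-Warp tracker. This is a minimal sketch, assuming that the Outstanding Count being "done" means both the memory count and the SCI count have returned to zero. The class name, method names and the callback standing in for the event to Group_Warp_x are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WarpOutstanding:
    warp_id: int
    on_done: Callable[[int], None]  # stands in for the event to Group_Warp_x
    memory_count: int = 0
    sci_count: int = 0

    @property
    def outstanding(self) -> int:
        return self.memory_count + self.sci_count

    def issue(self, memory: int = 0, sci: int = 0) -> None:
        """Record instructions sent from the PE to the Hopper."""
        self.memory_count += memory
        self.sci_count += sci

    def complete(self, memory: int = 0, sci: int = 0) -> None:
        """Record finished instructions; fire the event when none remain."""
        if memory == 0 and sci == 0:
            return
        if memory > self.memory_count or sci > self.sci_count:
            raise ValueError("completed more instructions than were issued")
        self.memory_count -= memory
        self.sci_count -= sci
        if self.outstanding == 0:
            self.on_done(self.warp_id)


events: List[int] = []
warp = WarpOutstanding(warp_id=3, on_done=events.append)
warp.issue(memory=2, sci=1)
warp.complete(memory=2)   # one SCI still outstanding, no event yet
warp.complete(sci=1)      # count reaches zero, event fires
print(events)  # [3]
```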

Details of the specification

Table 1 is used to create the baseline Data Structure

Tables 2 and 3 are used to create an executable Data Structure

Warp states: 0 = Invalid or empty, 1 = Ready, 2 = Execute, 3 = Wait
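The four state codes map naturally onto an enumeration. The sketch below uses hypothetical names, and the helper encodes only the rule that a non-zero outstanding count puts the Warp in Wait. Returning Ready when the count is zero is an assumption, since the source does not state the state taken in that case.

```python
from enum import IntEnum


class WarpState(IntEnum):
    INVALID = 0   # invalid or empty
    READY = 1
    EXECUTE = 2
    WAIT = 3


def state_after_trace(outstanding_count: int) -> WarpState:
    """Wait while instructions are outstanding; Ready otherwise (assumed)."""
    if outstanding_count != 0:
        return WarpState.WAIT
    return WarpState.READY


print(state_after_trace(2).name)  # WAIT
print(state_after_trace(0).name)  # READY
```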

Counters: outstanding memory instructions (memory count) and outstanding shared compute instructions (SCI count) per Warp. The format is an array.

The data structures are placed in their respective Warps based on the number in the Warp Table. This is performed in Warp_Group_x. The order is not important.

Each Warp runs a program that is made up of multiple traces.

Warps are assigned to the PEs in equal numbers
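With the model's parameter values (NumPEs = 64, WarpsPerPE = 8, so Number_Warps = 512), an equal assignment could look like the sketch below. Giving each PE a contiguous block of Warp indices is an assumption made for illustration. The source only says the quantities are equal, and the function name is hypothetical.

```python
from typing import Dict, List

NUM_PES = 64        # NumPEs in the model
WARPS_PER_PE = 8    # WarpsPerPE in the model


def assign_warps(num_pes: int, warps_per_pe: int) -> Dict[int, List[int]]:
    """Give every PE the same number of Warps (contiguous blocks, assumed)."""
    return {
        pe: list(range(pe * warps_per_pe, (pe + 1) * warps_per_pe))
        for pe in range(num_pes)
    }


assignment = assign_warps(NUM_PES, WARPS_PER_PE)
print(sum(len(w) for w in assignment.values()))  # 512
print(assignment[1])  # [8, 9, 10, 11, 12, 13, 14, 15]
```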

When a Warp finishes all the instructions in a trace, a round-robin selection starting from the next Warp finds the next “ready” Warp for execution at the PE
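The round-robin selection can be sketched as a circular scan over the Warps of one PE. This is an illustrative sketch, not the model's implementation: the function name is hypothetical, state codes follow the list above (1 = Ready), and letting the scan wrap back to the Warp that just finished is an assumption.

```python
from typing import Optional, Sequence

READY = 1  # Warp state code for "Ready"


def next_ready_warp(states: Sequence[int], current: int) -> Optional[int]:
    """Scan circularly from the Warp after `current`; return the first Ready one."""
    n = len(states)
    for offset in range(1, n + 1):
        candidate = (current + offset) % n
        if states[candidate] == READY:
            return candidate
    return None  # no Warp on this PE is Ready


# Eight Warps on one PE: 0 = Invalid, 1 = Ready, 2 = Execute, 3 = Wait
states = [3, 2, 0, 1, 1, 3, 1, 0]
print(next_ready_warp(states, current=1))  # 3
print(next_ready_warp(states, current=6))  # 3
```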
