Benefits

  • Enables early GPU architecture decisions without hardware
  • Reduces risk in selecting the right GPU for performance, power, and cost
  • Identifies bottlenecks in compute, memory, and interconnects
  • Supports realistic workload modeling using task graphs
  • Scales seamlessly from embedded to AI data center designs
  • Accelerates time-to-market by avoiding late-stage redesigns

The GPU library in VisualSim Architect enables system-level modeling and exploration of modern graphics and compute accelerators, from edge-class GPUs to large-scale data center and AI GPUs. The library supports commercial GPUs, including NVIDIA architectures (Maxwell, Pascal, Ampere, Blackwell) and platforms such as DRIVE PX, AMD Instinct-class data center GPUs, and ARM Mali, as well as a GPU Builder for proprietary, in-house architectures.

These models allow architects to study performance, power, memory behavior, and interconnect scaling long before hardware is selected or deployed.

Overview

The VisualSim GPU library provides detailed architectural representations of GPU subsystems and execution behavior, including:

  • Streaming Multiprocessors (SMs) / Compute Units
  • Warp- and wavefront-based execution models
  • Tensor units, vector units, and general-purpose pipelines
  • Multi-level cache hierarchies and shared memory
  • Configurable memory interfaces and coherency mechanisms
  • Dynamic instantiation of hundreds to thousands of GPU cores
  • Support for graphics, AI, and general-purpose compute execution modes

The GPU models are fully integrated with VisualSim task graphs, schedulers, memory, and interconnect libraries, enabling end-to-end system exploration.
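As a rough illustration of the execution-level quantities such warp-based models expose, the sketch below estimates SM occupancy from warp size and per-thread resource limits. This is a generic back-of-envelope calculation, not a VisualSim API; the function name and all resource limits are illustrative assumptions.

```python
# Hypothetical back-of-envelope SM occupancy estimate; not a VisualSim API.
# The register-file, shared-memory, and warp limits below are illustrative
# placeholders, not any specific GPU's published specification.

def sm_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                 warp_size=32, max_warps=64, reg_file=65536, smem_bytes=102400):
    """Fraction of the SM's warp slots a kernel can keep resident."""
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division
    # How many blocks fit under each resource limit
    by_regs = reg_file // (regs_per_thread * threads_per_block)
    by_smem = smem_bytes // smem_per_block if smem_per_block else max_warps
    by_warps = max_warps // warps_per_block
    blocks = min(by_regs, by_smem, by_warps)
    resident_warps = blocks * warps_per_block
    return min(resident_warps, max_warps) / max_warps

# 256-thread blocks, 32 registers/thread, 16 KB shared memory per block
print(f"occupancy: {sm_occupancy(256, 32, 16384):.0%}")  # -> occupancy: 75%
```

Here shared memory is the binding limit (6 resident blocks of 8 warps each), the kind of bottleneck a system-level model surfaces before any silicon exists.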

Supported Architectures and Platforms

This System Modeling Component Library supports:

  • NVIDIA GPUs: Maxwell, Pascal, Volta, Turing, Ampere, Hopper, and Blackwell architectures, plus Jetson platforms
  • AMD GPUs: Instinct-class and other data center GPUs
  • ARM Mali GPUs for mobile and embedded platforms
  • Custom / Proprietary GPUs using the GPU Builder
  • Hybrid CPU–GPU systems with cache coherency and shared memory

The models scale from single-GPU embedded designs to multi-GPU clusters interconnected through high-speed fabrics.
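A minimal sketch of the kind of scaling question these multi-GPU models answer: how speedup flattens as a workload is split across N GPUs when each step also pays a fixed-size interconnect exchange. The cost model (ring all-reduce over a single link rate) and every parameter value are illustrative assumptions, not VisualSim results.

```python
# Hypothetical multi-GPU strong-scaling estimate: per-GPU compute shrinks
# with N, but each step pays an interconnect cost to exchange results.
# A simple ring all-reduce model: each GPU moves 2*(N-1)/N of the payload.

def speedup(n_gpus, compute_ms=100.0, payload_gb=1.0, link_gbps=50.0):
    """Estimated speedup vs. one GPU under the assumed cost model."""
    if n_gpus == 1:
        return 1.0
    comm_ms = 2 * (n_gpus - 1) / n_gpus * payload_gb / link_gbps * 1e3
    return compute_ms / (compute_ms / n_gpus + comm_ms)

for n in (1, 2, 4, 8):
    print(f"{n} GPUs -> {speedup(n):.2f}x")
```

Under these assumptions the curve bends well below linear by 8 GPUs, which is exactly the trade-off (faster links vs. more GPUs) the interconnect libraries let an architect quantify.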

Key Parameters

Typical configurable parameters include (non-exhaustive):

  • Number of SMs / compute units
  • Warp / wavefront size
  • Tensor unit configuration
  • Cache sizes and policies
  • Memory bandwidth and latency
  • Pipeline depth and execution width
  • Interconnect bandwidth and topology

Applications

  • GPU selection and sizing for edge, workstation, and data center systems
  • AI training and inference architecture exploration
  • Image and video processing pipeline design
  • Heterogeneous CPU–GPU system analysis
  • Multi-GPU scaling and workload partitioning
  • Data movement and interconnect evaluation
  • Power, cooling, and cost trade-off studies

Interconnect and System Integration

The GPU library integrates with system-level interconnects to evaluate:

  • PCIe (Gen1–Gen6 and beyond)
  • CXL-based memory and device sharing
  • NVLink and NVSwitch fabrics
  • 800 Gb/s Ethernet for disaggregated and data center systems

This enables analysis of GPU-to-GPU, GPU-to-CPU, and GPU-to-memory performance at scale.
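For a feel of the numbers involved, the sketch below estimates peak unidirectional bandwidth of a x16 link for each PCIe generation. Per-lane raw rates come from the PCIe specifications; encoding overhead is simplified (Gen6 actually uses PAM4 signaling with FLIT-based framing, approximated here by the Gen3+ 128b/130b factor).

```python
# Approximate peak one-way PCIe x16 bandwidth per generation.
# Per-lane raw rates in GT/s; Gen1/2 use 8b/10b encoding, Gen3+ 128b/130b.
# Gen6's PAM4 + FLIT overhead is simplified to the 128b/130b factor.

LANE_GTS = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0, 6: 64.0}

def x16_bandwidth_gbs(gen, lanes=16):
    """Approximate peak unidirectional bandwidth in GB/s."""
    efficiency = 8 / 10 if gen <= 2 else 128 / 130
    return LANE_GTS[gen] * efficiency * lanes / 8  # GT/s per lane -> GB/s

for gen in range(1, 7):
    print(f"PCIe Gen{gen} x16 ~ {x16_bandwidth_gbs(gen):.1f} GB/s")
```

This yields roughly 4 GB/s for Gen1 x16 up to about 63 GB/s for Gen5 x16, the baseline against which NVLink-class fabrics and 800 Gb/s Ethernet are compared in system-level studies.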
