Benefits

  • Enables early NPU architecture decisions without RTL
  • Guides cache, SRAM, and DRAM sizing decisions
  • Identifies optimal memory placement for AI workloads
  • Helps partition AI code across NPU cores efficiently
  • Evaluates power savings from dynamic core shutdown
  • Reduces risk in performance, power, and cost trade-offs
  • Accelerates time-to-market for AI-enabled systems

The NPU library in VisualSim Architect enables system-level modeling and exploration of AI and machine-learning accelerators used across edge devices, embedded platforms, automotive systems, mobile SoCs, and data centers. VisualSim supports commercial NPU models and includes a powerful NPU Builder that allows architects to construct proprietary, in-house NPU designs.

These models are used to size internal resources, optimize memory hierarchies, define interconnect topologies, and guide software partitioning decisions—well before RTL or silicon exists.
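
As a simple illustration of the sizing questions these models answer, the sketch below checks whether a single convolution layer's weights and activations fit in a candidate scratchpad. It is a back-of-the-envelope Python calculation with hypothetical layer dimensions and SRAM sizes, not the VisualSim model itself, which simulates the full traffic pattern:

    # Hypothetical back-of-the-envelope SRAM sizing for one conv layer.
    # None of these numbers come from VisualSim; they only illustrate the
    # kind of question the NPU library answers with full simulation.

    BYTES_PER_ELEM = 1  # int8 inference

    def conv_footprint_bytes(h, w, c_in, c_out, k):
        """On-chip bytes to hold one layer's weights plus input/output tiles."""
        weights = k * k * c_in * c_out
        in_act = h * w * c_in
        out_act = h * w * c_out
        return (weights + in_act + out_act) * BYTES_PER_ELEM

    # Example layer: 56x56 feature map, 64 -> 128 channels, 3x3 kernel
    need = conv_footprint_bytes(56, 56, 64, 128, 3)

    for sram_kib in (256, 512, 1024, 2048):
        verdict = "fits on-chip" if need <= sram_kib * 1024 else "spills to DRAM"
        print(f"{sram_kib:5d} KiB scratchpad: {verdict} (need {need / 1024:.0f} KiB)")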

Overview

The VisualSim NPU library captures the architectural and execution behavior of modern neural accelerators, including:

  • Compute cores and vector / matrix engines
  • Local scratchpads, SRAM clusters, and shared memories
  • Instruction pipelines and execution latency
  • Internal interconnects and data movement fabrics
  • Address maps and memory placement strategies
  • Power-aware execution and core shutdown behavior

The NPU models integrate seamlessly with VisualSim’s task graphs, memory systems, interconnects, and power modeling, enabling full end-to-end AI system exploration.
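
To make the compute-versus-data-movement trade-off behind these behaviors concrete, the following roofline-style estimate classifies a matrix multiply as compute-bound or memory-bound. The peak MAC rate and DRAM bandwidth are placeholder figures, not parameters from any VisualSim library model:

    # Minimal roofline-style latency estimate for an M x K x N matmul.
    # peak_macs and dram_bw are hypothetical placeholders; a VisualSim
    # model would replace them with simulated pipeline and fabric timing.

    def matmul_latency_s(M, K, N, peak_macs=32e12, dram_bw=50e9, bpe=1):
        macs = M * K * N
        traffic = (M * K + K * N + M * N) * bpe   # bytes, each operand moved once
        t_compute = macs / peak_macs              # runtime if compute-bound
        t_memory = traffic / dram_bw              # runtime if memory-bound
        bound = "compute" if t_compute >= t_memory else "memory"
        return max(t_compute, t_memory), bound

    for shape in [(1, 4096, 4096),       # skinny GEMV: memory-bound
                  (1024, 4096, 4096)]:   # large GEMM: compute-bound
        t, bound = matmul_latency_s(*shape)
        print(f"{shape}: {t * 1e6:8.1f} us ({bound}-bound)")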

Supported Platforms

VisualSim provides NPU models based on:

  • AMD AI accelerators and proprietary NPUs
  • Xilinx / AMD adaptive compute platforms
  • Qualcomm AI engines for mobile and automotive
  • Samsung NPUs for SoC and edge devices
  • Custom and in-house NPUs using the NPU Builder

These models scale from single-NPU embedded systems to multi-NPU clusters connected to shared memory and external fabrics.
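
To give a flavor of the partitioning problem at multi-NPU scale, the sketch below assigns layers to cores with a greedy heuristic so that the busiest core is loaded as lightly as possible. The per-layer costs are invented; in VisualSim the equivalent exploration is driven by simulated task graphs rather than a static heuristic:

    import heapq

    # Greedy longest-processing-time partitioning: repeatedly hand the
    # largest remaining layer to the least-loaded core. Layer costs
    # (in MMACs) are made-up illustrative numbers.
    layer_macs = {"conv1": 118, "conv2": 924, "conv3": 462,
                  "conv4": 231, "fc1": 103, "fc2": 17}

    def partition(costs, n_cores):
        heap = [(0, core, []) for core in range(n_cores)]
        heapq.heapify(heap)
        for name, macs in sorted(costs.items(), key=lambda kv: -kv[1]):
            load, core, layers = heapq.heappop(heap)
            heapq.heappush(heap, (load + macs, core, layers + [name]))
        return sorted(heap, key=lambda t: t[1])

    for load, core, layers in partition(layer_macs, n_cores=2):
        print(f"core {core}: {load:5d} MMACs  {layers}")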

Key Features

  • NPU Builder for proprietary and next-generation designs
  • Configurable NPU cores, pipelines, and instruction latency
  • Flexible internal SRAM and scratchpad sizing
  • Configurable address maps and memory placement
  • Evaluation of local vs cluster-level vs external memory
  • Power-aware modeling with selective core shutdown (see the sketch after this list)
  • Integration with AI task graphs and software workloads
  • Seamless scaling from edge to data center systems
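
As a first-order view of the core shutdown trade-off noted above, this sketch compares the energy of running a fixed workload on all cores versus fewer cores with the remainder power-gated. All power numbers are assumed for illustration; the library derives them from per-state power tables and simulated activity:

    # First-order energy model for selective core shutdown. All power
    # numbers are hypothetical; VisualSim models use per-state power
    # tables and simulated activity instead of these constants.

    P_ACTIVE = 2.0     # W per core while computing
    P_IDLE = 0.5       # W per core clocked but idle
    P_GATED = 0.02     # W per core when power-gated
    TOTAL_CORES = 8
    WORK_CORE_S = 8.0  # total work: 8 core-seconds

    def energy_j(active_cores, gate_unused):
        t = WORK_CORE_S / active_cores   # runtime with this many cores
        p_unused = P_GATED if gate_unused else P_IDLE
        return t * (active_cores * P_ACTIVE
                    + (TOTAL_CORES - active_cores) * p_unused)

    for n in (8, 4, 2):
        print(f"{n} cores | others idle: {energy_j(n, False):6.2f} J | "
              f"others gated: {energy_j(n, True):6.2f} J")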

Applications

  • NPU architecture sizing for edge AI and embedded systems
  • AI inference and real-time analytics platforms
  • Automotive ADAS and autonomous systems
  • Mobile and low-power AI SoCs
  • AI accelerators for industrial and defense applications
  • Early evaluation of proprietary NPU designs

Integrations

NPU systems integrate with VisualSim interconnect libraries to evaluate:

  • Internal NPU fabrics
  • NPU-to-memory interfaces
  • NPU-to-CPU and NPU-to-GPU connections
  • PCIe, CXL, Ethernet, and custom fabrics (rough transfer-time sketch below)
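
To ground the fabric comparison, the sketch below estimates transfer time for a tensor moved between host memory and an NPU over several links. The effective bandwidths and latencies are rounded assumptions, not vendor specifications or VisualSim parameters:

    # Rough tensor transfer-time estimate across candidate fabrics.
    # Effective bandwidths and base latencies are assumed round numbers,
    # not vendor specs or VisualSim library parameters.

    FABRICS = {                       # (effective GB/s, base latency us)
        "PCIe Gen4 x16": (25.0, 1.5),
        "PCIe Gen5 x16": (50.0, 1.5),
        "CXL (Gen5 PHY)": (50.0, 0.8),
        "100G Ethernet": (10.0, 5.0),
    }

    def transfer_us(nbytes, gbps, base_us):
        return base_us + nbytes / (gbps * 1e9) * 1e6

    payload = 64 * 1024 * 1024        # 64 MiB activation tensor
    for name, (bw, lat) in FABRICS.items():
        print(f"{name:15s}: {transfer_us(payload, bw, lat):8.1f} us")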
