ARM AMBA CoreLink

The CoreLink CMN-600 Coherent Mesh Network is designed for extremely high data-transfer rates.

Quick Explanation

  • Supports modeling the interconnect as a series of AHB/AXI buses and as a Network-on-Chip
  • Supports RN-F, RN-I, HN-F, HN-I, and SN-F nodes
  • Supports two devices per node and four network interfaces for CMN-600
  • Supports bridges and DMA interfaces
  • Supports up to 8 memory controllers
  • Supports cache coherency using Snoop messages for shared memory addresses

Protocol

  • ARM CMN-600
  • CoreLink NiC400
  • CoreLink NiC550

VisualSim ARM AMBA CoreLink CMN-600 Coherent Mesh Network

The Arm CoreLink CMN-600 Coherent Mesh Network is designed for intelligent connected systems across a wide range of applications including networking infrastructure, storage, server, HPC, automotive, and industrial solutions. The highly scalable mesh is optimized for Armv8-A processors and can be customized across a wide range of performance points.

The CMN-600 model uses the baseline NoC library shipped with VisualSim Architect. This library models the routing and a separate crossbar for each traffic type: Request, Response, Data, and Snoop. Virtual channels are supported on the egress port. All the standard Request Nodes, Slave Nodes, Bridges, and Home Nodes are supported. The snooping protocol is implemented, and the XP can connect to AXI buses, memory controllers, and other slaves.

Overview

The base model demonstrates the implementation of a NoC on a semiconductor. Traffic originates at a NoC_TG block and traverses the network to arrive at a destination. The source and destination are set in the TG block, and the link configuration is set in a lookup table. The routers are connected to each other in a mesh pattern.

Router: One device is connected to each node of the router. The route from a device to a destination is determined by the path provided in the Configuration Table. There must be no duplicate paths for a source-destination pair, and every path defined in the Configuration Table must exist in the mesh.
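A minimal sketch of the Configuration Table rules, assuming the table is given as (source, destination, path) rows and the mesh is described by a set of router-to-router links. The names `build_table`, `validate_table`, and `next_hop`, and the XP and device labels, are hypothetical and do not reflect the VisualSim table format.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical layout: (source device, destination device) -> ordered list of XPs.
Route = Tuple[str, str]
ConfigTable = Dict[Route, List[str]]


def build_table(rows: Iterable[Tuple[str, str, List[str]]]) -> ConfigTable:
    """Build the Configuration Table, rejecting duplicate source-destination paths."""
    table: ConfigTable = {}
    for src, dst, path in rows:
        if (src, dst) in table:
            raise ValueError(f"duplicate path for {src}->{dst}")
        table[(src, dst)] = list(path)
    return table


def validate_table(table: ConfigTable, links: Set[Tuple[str, str]]) -> None:
    """Raise if any configured path uses a router-to-router link that does not exist."""
    for (src, dst), path in table.items():
        for a, b in zip(path, path[1:]):
            if (a, b) not in links and (b, a) not in links:
                raise ValueError(f"path {src}->{dst} uses missing link {a}-{b}")


def next_hop(table: ConfigTable, src: str, dst: str, current_xp: str) -> Optional[str]:
    """Return the next XP on the path, or None when the flit is at the last XP."""
    path = table[(src, dst)]
    i = path.index(current_xp)
    return path[i + 1] if i + 1 < len(path) else None


# 2x2 mesh: XP0-XP1 and XP2-XP3 horizontally, XP0-XP2 and XP1-XP3 vertically.
mesh_links = {("XP0", "XP1"), ("XP2", "XP3"), ("XP0", "XP2"), ("XP1", "XP3")}
routes = build_table([("CPU0", "MEM0", ["XP0", "XP1", "XP3"])])
validate_table(routes, mesh_links)
print(next_hop(routes, "CPU0", "MEM0", "XP0"))  # XP1
```

Keying the table by (source, destination) makes a second path for the same pair impossible to store silently, which is the "no duplicate paths" rule expressed as a data-structure constraint.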

Router Traffic Management: Virtual links can be defined for flow control within the router. These let the router control the bandwidth provided to each source-destination stream.

Traffic: The TG block generates requests or data at a configured rate to a fixed destination. The packet is divided into flits at the TG and sent to the connected router. The router selects the output direction for each flit based on the Configuration Table. At the destination, the device discards all flits except the last one. The last flit signals that all flits have arrived, and the destination then sends the assembled packet out on its output port.
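The TG-side segmentation and the destination-side "keep only the last flit" behavior can be sketched as follows. This assumes a fixed flit size in bytes and in-order delivery, so the last flit's index tells the destination how many flits made up the packet; `Flit`, `packetize`, and `Destination` are hypothetical names, not VisualSim blocks.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Flit:
    packet_id: int
    index: int        # position of this flit within its packet
    is_last: bool     # True only for the final flit of the packet
    payload: bytes


def packetize(packet_id: int, data: bytes, flit_bytes: int) -> List[Flit]:
    """Split a packet into fixed-size flits at the traffic generator."""
    if flit_bytes <= 0:
        raise ValueError("flit_bytes must be positive")
    chunks = [data[i:i + flit_bytes] for i in range(0, len(data), flit_bytes)] or [b""]
    return [Flit(packet_id, i, i == len(chunks) - 1, c) for i, c in enumerate(chunks)]


class Destination:
    """Drops every flit except the last, which marks the packet as complete."""

    def __init__(self) -> None:
        self.completed: List[Tuple[int, int]] = []  # (packet_id, flit count)

    def receive(self, flit: Flit) -> Optional[Tuple[int, int]]:
        if not flit.is_last:
            return None  # intermediate flits are discarded
        done = (flit.packet_id, flit.index + 1)
        self.completed.append(done)  # stands in for sending the assembled packet out
        return done


dest = Destination()
for f in packetize(7, bytes(64), flit_bytes=32):
    result = dest.receive(f)
print(result)  # (7, 2)
```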

Statistics: VisualSim captures the latency of the requests and buffer occupancy at each port of each Router.

The router is called an XP, or CrossPoint. Each router supports two devices per node and four network connections. Flows are based on message type and Quality of Service (QoS).

  • Support for Retry
  • Flow control at the Router
  • Support for Request Node, Home Node and Slave Node
  • Requesting Nodes: There are two types of Requesting Nodes. RN-F nodes are fully featured processors with caches, and RN-I nodes are masters that connect to I/O. An RN-F is connected directly to the mesh through an RN-F controller, which handles the QPV (QoS Priority Value). An RN-I is not directly connected to the mesh; instead, it connects to the mesh through an ACE-Lite interface and an RN-I bridge, and the QPV is added between the ACE-Lite interface and the bridge.
  • Home Nodes: There are two primary types, HN-F and HN-I. All memory requests from RN-Fs are sent to an HN-F. There can be multiple HN-Fs, but each RN is assigned a single HN-F, and all nodes around an HN-F are considered a single cluster. The HN-I sends requests out to the ACE master port; there can be multiple ports in the ACE setup, but the HN-I sees only one. The HN-F snoop filter reduces snoop traffic in the system by tracking the cache lines that are present in the RN-Fs, generally favoring directed snoops over snoop broadcasts when possible. This substantially reduces the snoop response traffic that might otherwise be required.
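A simplified snoop filter illustrates why directed snoops cut traffic: only RN-Fs recorded as holding a line are snooped, instead of every other RN-F. This sketch assumes 64-byte cache lines and a precise, unbounded per-line set of holders; real snoop filters have finite capacity and may track lines imprecisely. `SnoopFilter` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set


class SnoopFilter:
    """Tracks which RN-Fs may hold each cache line so snoops can be directed."""

    def __init__(self, rnf_ids: Iterable[int], line_bytes: int = 64) -> None:
        self.rnf_ids = sorted(rnf_ids)
        self.line_bytes = line_bytes
        self.holders: DefaultDict[int, Set[int]] = defaultdict(set)

    def _line(self, addr: int) -> int:
        return addr // self.line_bytes

    def record_fill(self, rnf: int, addr: int) -> None:
        self.holders[self._line(addr)].add(rnf)

    def record_evict(self, rnf: int, addr: int) -> None:
        line = self._line(addr)
        self.holders[line].discard(rnf)
        if not self.holders[line]:
            del self.holders[line]  # no RN-F holds the line any more

    def directed_targets(self, requester: int, addr: int) -> List[int]:
        """RN-Fs that must be snooped: recorded holders other than the requester."""
        return sorted(self.holders.get(self._line(addr), set()) - {requester})

    def broadcast_targets(self, requester: int) -> List[int]:
        """Without a snoop filter, every other RN-F would be snooped."""
        return [r for r in self.rnf_ids if r != requester]


sf = SnoopFilter(rnf_ids=range(8))
sf.record_fill(1, 0x1000)
sf.record_fill(2, 0x1010)  # same 64-byte line as 0x1000
print(sf.directed_targets(0, 0x1000), len(sf.broadcast_targets(0)))  # [1, 2] 7
```

In this example, a request from RN-F 0 produces two snoops instead of seven, and a line with no recorded holders produces none.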

The following nodes are implemented:

  • RN-I (Request Node I/O): A non-caching request node, bridging I/O master requests from 1-3 AXI or ACE-Lite interfaces
  • RN-D (DVM Request Node): An RN-I node that can accept DVM messages on the snoop channel (not implemented in this model)
  • RNF_CHIA (Request Node Full with built-in SAM, CHI Issue A compliant): A CHI Issue A compliant processor, cluster, GPU, or other request node with a coherent cache and a built-in SAM
  • RNF_CHIA_ESAM (Request Node Full without a built-in SAM, CHI Issue A compliant): A CHI Issue A compliant processor, cluster, GPU, or other request node with a coherent cache but without a built-in SAM
  • RNF_CHIB (Request Node Full with built-in SAM, CHI Issue B compliant): A CHI Issue B compliant processor, cluster, GPU, or other request node with a coherent cache and a built-in SAM
  • RNF_CHIB_ESAM (Request Node Full without a built-in SAM, CHI Issue B compliant): A CHI Issue B compliant processor, cluster, GPU, or other request node with a coherent cache but without a built-in SAM
  • HN-F (Home Node Full): A fully coherent home node, typically configured with a System Level Cache (SLC) and/or a Snoop Filter (SF)
  • HN-I (Home Node I/O): A non-coherent home node, bridging I/O slave requests to an ACE-Lite interface
  • HN-T (Home Node I/O with debug trace control): An HN-I with a built-in debug trace controller
  • HN-D (DVM Home Node): An HN-I with a built-in debug trace controller, DVM Node (DN), Configuration Node (CFG), Global Configuration Slave, and the Power/Clock Control Block (PCCB)
  • SN-F (Slave Node): A memory controller with a native CHI Issue B SN interface
  • SBSX (CHI to AXI bridge): A CHI-to-AXI bridge that allows an AXI memory controller to be connected to CMN-600
  • CXG (CHI to CXS bridge for the CCIX port): A CHI-to-CXS (CCIX port) bridge that enables CML

The following transactions are supported:

ReadNoSnp, WriteNoSnp, ReadShared, ReadClean, ReadNotSharedDirty, and CleanUnique

The RN-F generates ReadNoSnp and WriteNoSnp Exclusives for memory locations that are marked Non-cacheable or Device.

ReadShared, ReadClean, and CleanUnique exclusives are used for shareable and coherent memory locations.
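The memory-type rule for exclusive accesses can be written as a small lookup. The `MemType` enum and `exclusive_opcodes` function are hypothetical and encode only the two statements above, not the full CHI exclusive-access rules.

```python
from enum import Enum
from typing import Tuple


class MemType(Enum):
    DEVICE = "Device"
    NON_CACHEABLE = "Non-cacheable"
    SHAREABLE_COHERENT = "Shareable, coherent"


def exclusive_opcodes(mem: MemType) -> Tuple[str, ...]:
    """Opcodes an RN-F uses for exclusive accesses, per the rules stated above."""
    if mem in (MemType.DEVICE, MemType.NON_CACHEABLE):
        return ("ReadNoSnp", "WriteNoSnp")  # non-snooped transactions
    return ("ReadShared", "ReadClean", "CleanUnique")  # coherent transactions


print(exclusive_opcodes(MemType.DEVICE))  # ('ReadNoSnp', 'WriteNoSnp')
```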

Broadcasts: Messages are broadcast from a processor to all processors when there is a flush at the Home Nodes.

Channels

There are four channels. In this model, the only place where delays occur is the crossbar. The router checks the message type and selects the correct channel. There are multiple message types, and the meaning of a message changes depending on whether it comes from a requesting node, home node, responding node, or slave node.

  • Request- 34, 44, 48 bits
  • Response- 32 bits
  • Data- 256 bits
  • Snoop- 32 bits
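Channel selection and flit counting can be sketched with the widths listed above, assuming the largest Request width (48 bits). The message-to-channel mapping uses a few CHI message names for illustration and is not a complete opcode table; `select_channel` and `flits_needed` are hypothetical helpers. With 256-bit Data flits, a 64-byte cache line (512 bits) needs two flits.

```python
import math

# Flit widths in bits from the channel list; Request assumes the 48-bit option.
CHANNEL_BITS = {"Request": 48, "Response": 32, "Data": 256, "Snoop": 32}

# Illustrative subset of CHI message types mapped to their channel.
MSG_CHANNEL = {
    "ReadShared": "Request",
    "WriteNoSnp": "Request",
    "CompAck": "Response",
    "CompData": "Data",
    "SnpShared": "Snoop",
}


def select_channel(msg_type: str) -> str:
    """Pick the channel (and therefore the crossbar) for a message type."""
    try:
        return MSG_CHANNEL[msg_type]
    except KeyError:
        raise ValueError(f"unknown message type: {msg_type}") from None


def flits_needed(msg_type: str, payload_bits: int = 0) -> int:
    """Number of flits on the selected channel; at least one per message."""
    width = CHANNEL_BITS[select_channel(msg_type)]
    return max(1, math.ceil(payload_bits / width))


print(select_channel("CompData"), flits_needed("CompData", 64 * 8))  # Data 2
```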

Quality of Service

CMN-600 includes end-to-end QoS capabilities which support latency and bandwidth requirements for different types of devices. The QoS device classes are:

Devices with bounded latency requirements: These are primarily real-time or isochronous devices that require some or all of their transactions to be complete within a specific time period to meet overall system requirements. These devices are typically highly latency-tolerant within the bounds of their maximum latency requirement. Examples of this class of device include networking I/O devices and display devices.

Latency-sensitive devices: These are devices whose performance is highly impacted by the response latency incurred by their transactions. Processors are traditionally highly latency-sensitive devices, although a processor can also be a bandwidth-sensitive device depending on its workload.

Bandwidth-sensitive devices: These are devices that have a minimum bandwidth requirement to meet system requirements. An example of this class of device is a video codec engine, which requires a minimum bandwidth to sustain real-time video encode and decode throughput.

Bandwidth-hungry devices: These are devices that have significant bandwidth requirements and can use as much system bandwidth as is made available, to the limits of the system. These devices determine the overall scalability limits of a system, with the devices and system scaling until all available bandwidth is consumed.

QoS regulator operation

The base QPV values (AxQOS for ACE-Lite/AXI interfaces, or RXREQFLIT.QOS for AMBA 5 CHI ports) are inputs to the QoS sub-block.

When latency regulation or period regulation is enabled, these values are replaced by the value generated by the regulators. For an RN-F, a single QoS regulator monitors CHI transactions that return data to the RN-F such as reads, atomics and snoop stash responses. The regulated QPV is applied to all CHI requests from that RN-F. For an RN-I or RN-D, separate QoS regulators exist for AR and AW channels.

The QoS regulators can operate in either latency regulation mode or period regulation mode. The registers to configure the QoS regulators exist in each RN-I, RN-D, and XP.

Latency regulation mode

When configured for latency regulation, the QoS regulator increases the QPV whenever actual transaction latency is higher than the target, and decreases the QPV when it is lower:

  • For every cycle that the latency of a transaction is more than the target latency, the QPV is increased by a fractional amount, determined by the scale factor Ki.
  • For every cycle that the latency of a transaction is less than the target latency, the QPV is decreased by the same fractional amount, determined by the scale factor Ki.

The QoS Latency Target register specifies the target transaction latency in cycles. The QoS Latency Scale register specifies the scale factor Ki. It is coded in powers of two, so that a programmed value of 0x0 = $2^{-3}$ and a programmed value of 0x7 = $2^{-10}$.
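Because the QPV rises by Ki for every cycle over target and falls by Ki for every cycle under it, each completed transaction changes the QPV by Ki * (latency - target). The scale encoding (0x0 gives 2^-3, 0x7 gives 2^-10) means a programmed value n gives Ki = 2^-(n+3). The sketch below also assumes a clamp to the 4-bit range 0 to 15 and that the integer part of the fractional QPV is what gets applied; `ki_from_scale` and `LatencyRegulator` are hypothetical names, not CMN-600 register fields.

```python
def ki_from_scale(scale: int) -> float:
    """QoS Latency Scale encoding: 0x0 -> 2**-3 ... 0x7 -> 2**-10."""
    if not 0 <= scale <= 7:
        raise ValueError("scale must be 0x0..0x7")
    return 2.0 ** -(scale + 3)


class LatencyRegulator:
    """Fractional QPV driven by the error between observed and target latency."""

    QPV_MAX = 15  # 4-bit QPV

    def __init__(self, target_cycles: int, scale: int, initial_qpv: float = 0.0) -> None:
        self.target = target_cycles
        self.ki = ki_from_scale(scale)
        self.qpv = float(initial_qpv)

    def observe(self, latency_cycles: int) -> int:
        """Apply +Ki per cycle over target and -Ki per cycle under; return the QPV used."""
        self.qpv += self.ki * (latency_cycles - self.target)
        self.qpv = min(max(self.qpv, 0.0), float(self.QPV_MAX))
        return int(self.qpv)  # assumption: integer part drives the request QoS field


reg = LatencyRegulator(target_cycles=100, scale=0x0, initial_qpv=8)
print(reg.observe(116))  # 16 cycles late: 8 + 16 * 2**-3 = 10
```

A larger scale value makes Ki smaller, so the QPV reacts more slowly to individual slow or fast transactions.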

The QoS regulator can be programmed to operate in latency regulation mode by programming the following bits in the QoS Control register:

  • Set the qos_override_en bit to 1.
  • Set the lat_en bit to 1.
  • Set the reg_mode bit to 0.
  • Set the pqv_mode bit to 0.

Period regulation mode for bandwidth regulation

When configured for period regulation, the QoS regulator increases the QPV whenever the period between transactions is larger than the target, and decreases the QPV when it is lower:

  • For every cycle that the period between transactions (as measured at dispatch time) is more than the target period, the QPV is increased by a fractional amount, the scale factor Ki.
  • For every cycle that the period between transactions is less than the target period, the QPV is decreased by the same fractional amount, the scale factor Ki.

The QoS Latency Target register specifies the target period in cycles. The QoS Latency Scale register specifies the scale factor Ki. It is coded in powers of two, so that a programmed value of 0x0 = $2^{-3}$ and a programmed value of 0x7 = $2^{-10}$. The QoS regulator can be programmed to operate in period regulation mode by programming the following bits in the QoS Control register:

  • Set the qos_override_en bit to 1.
  • Set the lat_en bit to 1.
  • Set the reg_mode bit to 1.

There are two modes of period regulation:

  • In normal mode, the QPV neither increases nor decreases when there are zero outstanding transactions.
  • In quiesce high mode, the QPV increases by a fractional amount, determined by the scale factor Ki, in every cycle where there are zero outstanding transactions.

The mode of period regulation can be selected by programming the pqv_mode bit in the QoS Control register.
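A cycle-level sketch of period regulation follows. It assumes the period is measured between consecutive dispatches, that idle cycles (no outstanding transactions and no dispatch) are excluded from the period and leave the QPV unchanged in normal mode, and that quiesce high mode adds Ki on every idle cycle. These modeling choices and the `PeriodRegulator` name are assumptions; which pqv_mode value selects which mode is not stated above and is not encoded here.

```python
def ki_from_scale(scale: int) -> float:
    """QoS Latency Scale encoding: 0x0 -> 2**-3 ... 0x7 -> 2**-10."""
    if not 0 <= scale <= 7:
        raise ValueError("scale must be 0x0..0x7")
    return 2.0 ** -(scale + 3)


class PeriodRegulator:
    """Raises the QPV when transactions are dispatched less often than the target period."""

    QPV_MAX = 15

    def __init__(self, target_period: int, scale: int,
                 quiesce_high: bool = False, initial_qpv: float = 0.0) -> None:
        self.target = target_period
        self.ki = ki_from_scale(scale)
        self.quiesce_high = quiesce_high
        self.qpv = float(initial_qpv)
        self.elapsed = 0  # cycles since the previous dispatch (idle cycles excluded)

    def _adjust(self, delta: float) -> None:
        self.qpv = min(max(self.qpv + delta, 0.0), float(self.QPV_MAX))

    def tick(self, dispatch: bool, outstanding: int) -> None:
        """Advance one cycle; `outstanding` is the number of transactions in flight."""
        if outstanding == 0 and not dispatch:
            if self.quiesce_high:
                self._adjust(self.ki)  # quiesce high: creep upward while idle
            return  # normal mode: QPV held constant while idle
        self.elapsed += 1
        if dispatch:
            # +Ki per cycle the period exceeds the target, -Ki per cycle it falls short.
            self._adjust(self.ki * (self.elapsed - self.target))
            self.elapsed = 0


reg = PeriodRegulator(target_period=4, scale=0x0, initial_qpv=8)
for cycle in range(8):
    reg.tick(dispatch=(cycle % 2 == 1), outstanding=1)
print(reg.qpv)  # dispatching every 2 cycles against a target of 4: 7.0
```

A device that issues faster than its target period loses priority, while one that issues slower gains priority until it reaches its bandwidth target.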

RN-I/RN-D bridge QoS support

In addition to the QoS regulators, the RN-I/RN-D bridge provides QoS-aware arbitration mechanisms. To simplify arbitration logic, all transactions are split into two QoS Priority Classes (QPCs), high and low. QoS-15 transactions make up the high class. All other transactions are considered to be in the low class.

Port multiplexer arbitration

An RN-I/RN-D bridge includes three ACE-Lite/ACE-Lite+DVM ports. The RN-I/RN-D bridge selects between these ports for allocation into its transaction tracker, which makes the allocated transaction a candidate for issuing to a home node. The port multiplexer is arbitrated using the following strategy:

  • High QPC first, then the low QPC.
  • Round-robin arbitration among the AMBA ports within a QPC.
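The two-level policy (high QPC first, then round-robin within a QPC) can be sketched as below. It assumes each of the three ports is a FIFO where only the head transaction competes, with a separate round-robin pointer per class; `TwoClassRoundRobin` and `qpc` are hypothetical names. The same arbiter could also model tracker scheduling, since the tracker follows the same strategy.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

HIGH, LOW = 0, 1  # QoS Priority Classes, checked in this order


def qpc(qos: int) -> int:
    """QoS-15 transactions form the high class; all others are low."""
    return HIGH if qos == 15 else LOW


class TwoClassRoundRobin:
    def __init__(self, num_ports: int = 3) -> None:
        self.ports: List[Deque[Tuple[str, int]]] = [deque() for _ in range(num_ports)]
        self.rr_next = [0, 0]  # separate round-robin pointer for each QPC

    def push(self, port: int, txn_id: str, qos: int) -> None:
        self.ports[port].append((txn_id, qos))

    def grant(self) -> Optional[Tuple[int, str]]:
        """Grant the head of one port: high QPC first, round-robin within the class."""
        n = len(self.ports)
        for cls in (HIGH, LOW):
            for k in range(n):
                p = (self.rr_next[cls] + k) % n
                if self.ports[p] and qpc(self.ports[p][0][1]) == cls:
                    txn_id, _ = self.ports[p].popleft()
                    self.rr_next[cls] = (p + 1) % n  # start after the winner next time
                    return p, txn_id
        return None


arb = TwoClassRoundRobin()
arb.push(0, "a", qos=0)
arb.push(1, "b", qos=15)
arb.push(2, "c", qos=3)
arb.push(0, "d", qos=15)
print([arb.grant() for _ in range(5)])
# [(1, 'b'), (0, 'a'), (0, 'd'), (2, 'c'), None]
```

Because only port heads compete in this sketch, the QoS-15 transaction "d" waits behind the low-class "a" on port 0; a model that lets any queued transaction compete would grant "d" second.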

Tracker allocation

When transactions are allocated into the tracker, they are scheduled for issuance to a home node based on QPC, following the same strategy as port mux arbitration.

  • High QPC first, then the low QPC.
  • Round-robin arbitration in a QPC among the issuable transactions.

HN-F QoS support

The HN-F is a key shared system resource used for system caching and for communication with the memory controller for external memory access.

The HN-F includes the following QoS support mechanisms:

QoS decoding in HN-F

The HN-F interprets the 4-bit QPV at a coarser granularity, as the following table shows.

CoreLink: Implements a Network-on-Chip model