OpenVPX

OpenVPX replaces the VME and VXS standards in defense designs.

Quick Explanation

  • System-level VPX specification
  • Provides profiles for slots, backplanes and modules

Protocol

  • ANSI/VITA 65-2017 is the OpenVPX system standard, based on the ANSI/VITA 46.0 and 46.1 VPX standards

OpenVPX systems support complex avionics and spacecraft applications that require significant processing, storage and communication bandwidth. With the advent of the Integrated Modular Architecture (IMA), the sharing of resources across multiple concurrent tasks, the introduction of multi-core processors and OpenVPX-based systems interconnected by high-speed serial interfaces make the computation of latency a challenge.

There are multiple ways to evaluate the performance impact of a target hardware architecture: analytical methods, physical tests and discrete-event simulation. The analytical method can provide a view of the worst-case execution time (WCET). The problem is that this WCET is not bounded by any probability, and 90% of that range will never be reached in real life. With a physical test, the system can only be exercised for a small, finite set of use cases and workloads. Discrete-event simulation can create a realistic model of the end-to-end system including the workload, use cases, Real-Time Operating System (RTOS), processing sub-system, storage, communication, FPGA clusters and the interconnect network. The user can run hundreds of scenarios to quickly identify the bottlenecks and the best-performing areas.
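To make the discrete-event approach concrete, the following is a minimal Python sketch of periodic requesters sharing one storage device over a link, with end-to-end latency recorded per requester. The generator periods, link delay and service time are illustrative assumptions, not the VisualSim model used later in this paper.

    import heapq

    # Minimal discrete-event sketch: periodic requesters share one storage
    # device over a link. All names and numbers are illustrative assumptions.
    LINK_DELAY_NS   = 50        # assumed one-way network delay
    SERVICE_TIME_NS = 200       # assumed storage service time per request
    SIM_END_NS      = 1_000_000

    # (request period in ns, requester name)
    GENERATORS = [(249, "FPGA"), (500, "GPU"), (1000, "Comm")]

    def simulate():
        events = []                       # min-heap of (time, seq, kind, payload)
        seq = 0
        storage_free_at = 0               # when the storage device is next idle
        latencies = {name: [] for _, name in GENERATORS}

        # Seed the first request of every generator
        for period, name in GENERATORS:
            heapq.heappush(events, (period, seq, "request", (period, name)))
            seq += 1

        while events:
            t, _, kind, payload = heapq.heappop(events)
            if t > SIM_END_NS:
                break
            if kind == "request":
                period, name = payload
                # Request crosses the link, queues at storage, is serviced,
                # and the response crosses the link back.
                start = max(t + LINK_DELAY_NS, storage_free_at)
                storage_free_at = start + SERVICE_TIME_NS
                done = storage_free_at + LINK_DELAY_NS
                heapq.heappush(events, (done, seq, "response", (t, name)))
                seq += 1
                # Schedule this generator's next periodic request
                heapq.heappush(events, (t + period, seq, "request", (period, name)))
                seq += 1
            else:                         # response: record end-to-end latency
                issued, name = payload
                latencies[name].append(t - issued)

        for name, vals in latencies.items():
            print(f"{name}: mean latency {sum(vals) / len(vals):.1f} ns "
                  f"over {len(vals)} requests")

    simulate()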

For this paper, we have constructed a fairly common system design, shown in Figure 1. The system consists of an FPGA cluster, a GPU cluster, a solid-state drive for storage and a communication sub-system that provides the telemetry for uplink and downlink communication. These sub-systems have an OpenVPX interface to the network, and the OpenVPX systems are interconnected using RapidIO. The topology is constructed with double redundancy: each sub-system is connected to two switches in parallel. The RapidIO links run at 5 Gbps over 4 lanes, providing a 20 Gbps link for each of the systems. The FPGA cluster processing is excluded from the analysis to keep the focus on the OpenVPX + RapidIO network. The FPGA cluster generates read and write requests every 249 ns. The image frames are full HD at 20 frames per second. The GPU cluster processes ultra HD 4K MPEG4. The use case is for the FPGA, GPU and communication sub-systems to send read/write requests to the storage. Our goal is to get maximum efficiency out of the storage and to size the RapidIO network for this system design.
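As a quick sanity check on the link sizing described above, the following back-of-envelope calculation compares the 4-lane, 5 Gbps RapidIO link capacity against rough traffic estimates. The frame sizes assume uncompressed full-HD and 4K frames at 3 bytes per pixel, and the FPGA request payload is assumed to be 64 bytes; these are deliberately coarse assumptions for illustration, not the workload model used in the simulation.

    # Back-of-envelope check of offered load against one 4-lane RapidIO link.
    # Frame sizes assume uncompressed 1920x1080 and 3840x2160 frames at
    # 3 bytes per pixel; real MPEG4 traffic is far smaller, so this is a
    # deliberately pessimistic illustration, not the simulated workload.
    LANE_GBPS = 5
    LANES     = 4
    LINK_GBPS = LANE_GBPS * LANES                 # 20 Gbps per link

    HD_FRAME_BYTES  = 1920 * 1080 * 3             # one raw full-HD frame
    UHD_FRAME_BYTES = 3840 * 2160 * 3             # one raw 4K frame

    hd_gbps  = HD_FRAME_BYTES  * 8 * 20 / 1e9     # full HD at 20 fps
    uhd_gbps = UHD_FRAME_BYTES * 8 * 20 / 1e9     # 4K, assuming 20 fps here

    # FPGA cluster issues one request every 249 ns; 64-byte payload assumed
    fpga_gbps = 64 * 8 / 249e-9 / 1e9

    print(f"Link capacity       : {LINK_GBPS} Gbps")
    print(f"Raw full-HD stream  : {hd_gbps:.2f} Gbps")
    print(f"Raw 4K stream       : {uhd_gbps:.2f} Gbps")
    print(f"FPGA request stream : {fpga_gbps:.2f} Gbps")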

We have modelled this system in a commercial system-level simulation package called VisualSim Architect from Mirabilis Design Inc., using the standard VisualSim Architect libraries for the processors, DMA, memory, communication, OpenVPX and RapidIO. The system is set up with parameters for the common attributes, including the speeds, types of images, traffic rates, processing times, packet sizes and the RapidIO network attributes such as the switch speed and the number of lanes. We executed the simulation for a variety of rates and RapidIO speeds. Here is the table of the different runs, followed by a sketch of how the sweep can be expressed:

Run   RIO Speed   Lanes   HD Rate   Ultra HD Rate
1     5 Gbps      4       200 fps   300 fps
2     2 Gbps      4       100 fps   133 fps
3     1 Gbps      6       75 fps    75 fps
4     1 Gbps      4       30 fps    75 fps
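
The four runs in the table can be driven as a simple parameter sweep over the model. The sketch below shows one way to express that sweep; run_simulation() is a hypothetical stand-in for handing the parameters to the actual simulation and collecting statistics.

    from dataclasses import dataclass

    # The four runs from the table, expressed as a parameter sweep.
    # run_simulation() is a hypothetical placeholder for the real model.
    @dataclass
    class RunConfig:
        lane_gbps: int   # RapidIO speed per lane
        lanes: int       # lanes per link
        hd_fps: int      # full-HD frame rate
        uhd_fps: int     # ultra-HD frame rate

    RUNS = [
        RunConfig(5, 4, 200, 300),   # Run 1
        RunConfig(2, 4, 100, 133),   # Run 2
        RunConfig(1, 6,  75,  75),   # Run 3
        RunConfig(1, 4,  30,  75),   # Run 4
    ]

    def run_simulation(cfg: RunConfig) -> None:
        # Placeholder: a real sweep would pass these parameters to the
        # discrete-event model and collect latency/throughput statistics.
        link_gbps = cfg.lane_gbps * cfg.lanes
        print(f"link {link_gbps} Gbps, HD {cfg.hd_fps} fps, Ultra HD {cfg.uhd_fps} fps")

    for i, cfg in enumerate(RUNS, start=1):
        print(f"Run {i}: ", end="")
        run_simulation(cfg)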

The major finding was that the RapidIO network had more than sufficient capacity to handle the traffic generated by the devices. An interesting case was when the switch speed increased without an increase in the frames per second. Here we varied the packet sizes between 16 bytes and 64 bytes. You can see that the packet size does have an impact on the effective throughput; a short per-packet overhead illustration follows the run statistics below.

Run 1 Statistics (RIO_1_2)
  RIO_Input_MBps  = 95.16
  RIO_Output_MBps = 114.52

Run 2 Statistics (RIO_1_2)
  RIO_Input_MBps  = 93.04
  RIO_Output_MBps = 57.24

Run 3 Statistics (RIO_1_2)
  RIO_Input_MBps  = 86.96
  RIO_Output_MBps = 55.36

Run 4 Statistics (RIO_1_2)
  RIO_Input_MBps  = 93.36
  RIO_Output_MBps = 66.22
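
The sensitivity to packet size can be illustrated with a simple per-packet overhead calculation: each packet carries a fixed number of header and control bytes, so smaller payloads waste a larger fraction of the link. The 16-byte overhead used below is an assumed figure for illustration, not the exact RapidIO packet overhead.

    # Why smaller packets lower effective throughput: a fixed per-packet
    # overhead eats a larger fraction of small payloads. The 16-byte overhead
    # is an assumed value for illustration only.
    LINK_GBPS  = 20      # 4 lanes x 5 Gbps
    OVERHEAD_B = 16      # assumed header + control bytes per packet

    for payload in (16, 64, 256):
        efficiency = payload / (payload + OVERHEAD_B)
        print(f"{payload:>3}-byte payload: {efficiency:5.1%} efficiency, "
              f"{LINK_GBPS * efficiency:.1f} Gbps effective on {LINK_GBPS} Gbps")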

The latency plot has the simulation time on the X-axis and the end-to-end response time on the Y-axis. End-to-end response time, or latency, is measured from the completion of processing, through the request to the storage device, to the response: either the data for a read or an acknowledgement for a write. The FPGA latency covers the read and write requests sent to the storage system from the FPGA cluster. Similarly, the communication and GPU latencies are the response times from those clusters to the storage and back. The storage latency is the latency of the read request at the storage device.
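
A latency plot of this kind can be produced directly from the recorded (completion time, latency) samples. The sketch below shows only the plotting step; the sample values are placeholders, not simulation results.

    import matplotlib.pyplot as plt

    # Turn recorded (completion time, latency) samples into a latency-vs-time
    # plot. The sample values below are placeholders, not simulation results.
    samples = {
        "FPGA": [(1_000, 420), (2_000, 455), (3_000, 430)],   # (sim time ns, latency ns)
        "GPU":  [(1_500, 610), (2_500, 640)],
        "Comm": [(1_200, 380), (2_200, 395)],
    }

    for name, points in samples.items():
        times, lats = zip(*points)
        plt.plot(times, lats, marker="o", label=name)

    plt.xlabel("Simulation time (ns)")
    plt.ylabel("End-to-end response time (ns)")
    plt.legend()
    plt.show()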

The following plots are included:
1GB4LaneLatency - Latency for Run 4
1GB6LaneLatency - Latency for Run 3
1GB6Lane.png - Throughput plots for Run 3
2GB4Lane.png - Latency for Run 2
5GB4Lane.png - Latency for Run 1

Article: https://ieeexplore.ieee.org/document/7500650/

OpenVPX using RapidIO interconnect