| Università<br>della<br>Svizzera<br>italiana | Advanced<br>Learning<br>and Research<br>Institute<br>ALaRI |
|---------------------------------------------|------------------------------------------------------------|
|                                             |                                                            |

# Early power and performance optimization of algorithm implementation on ARM processors

## Francesco Regazzoni, ALaRI Institute, University of Lugano (CH)


## Outline

- Motivation of the work.
- Overview of the tool used: Mirabilis VisualSim.
- Case study: AES algorithm.
- First system.
- Second system.
- Conclusions.




### **Motivations**

- Achieving the required throughput at the lowest power consumption is a primary concern in the architecture of mobile handheld devices.
- The selection of the processor and the architecture of the full system must be determined before committing to hardware and software.
- A preliminary investigation of the performance vs. power trade-off can be performed through macro-architecture exploration.
- Using Mirabilis VisualSim, a model of the system that is accurate enough for this investigation can be constructed in a few hours.


### VisualSim Overview 1/4

- VisualSim is a graphical and platform-independent architectural analysis and exploration software tool.
- VisualSim can be used in any application that requires the design of hardware and software elements.
- VisualSim addresses all system design aspects using building blocks.
- All of the building blocks, simulation platforms, analysis and debugging required to construct a system are provided within a single framework.
- Thus models for all these analyses can be constructed quickly and easily.

### VisualSim Overview 2/4

#### Key features:

- Design with multiple abstraction levels.
- Integrated multi-simulation engines and JIT data types.
- Extensive libraries of parameterized models.
- Publish to the Web for communication and remote execution.
- Graphical entry and hierarchical modeling.
- Robust visualization and analysis capabilities.
- Import Java/C/C++ and link to Excel & MatLab.
- Automatic error checking between SmartBlocks models.
- Enable assertions for system coverage.


### VisualSim Overview 3/4

#### Applications

- Design new and custom hardware and software architectures.
- Design sub-systems such as CPU, memory controllers and DMA.
- Size CPU speed, bus width, cache, memory and pipeline stages.
- Architect embedded software.
- RTOS consideration.
- Design of new wireless and communication protocols.


### VisualSim Overview 4/4

#### Analysis

- Architecture utilization.
- Application response time.
- Functional correctness of algorithms.
- Buffer requirements.
- Implementation and design constraints generation.
- Power.


### Case Study 1/3

AES (Advanced Encryption Standard, FIPS 197):

- Block cipher: block size 128 bits.
- Key size: 128, 192 or 256 bits.
- Round operations:
  - Shift row.
  - Mix column.
  - Non-linear transformation (S-box).
  - Add round key.

The number of rounds depends on the key length.
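The dependency between key length and round count can be made concrete with a short sketch. FIPS 197 defines the number of rounds as Nr = Nk + 6, where Nk is the key length in 32-bit words; the function name `aes_rounds` is only an illustrative choice and is not part of the presented model.

```python
# Number of AES rounds as specified in FIPS 197: Nr = Nk + 6,
# where Nk is the key length expressed in 32-bit words.

def aes_rounds(key_bits: int) -> int:
    """Return the number of AES rounds for a 128-, 192- or 256-bit key."""
    if key_bits not in (128, 192, 256):
        raise ValueError(f"unsupported AES key size: {key_bits}")
    key_words = key_bits // 32  # Nk
    return key_words + 6        # Nr


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(f"{bits}-bit key: {aes_rounds(bits)} rounds")
```

Longer keys therefore mean more rounds per 128-bit block, which is why latency and power are measured separately for each key length.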


### Case Study 2/3

- The algorithm runs on ARM processors in different systems.
- Different key lengths are tested.
- Measurements of:
  - Latency.
  - Power consumed by the processor.
- The trade-off is analyzed.


### Case Study 3/3

- The AES code is an open source version taken from the web.
- The code was annotated and compiled with the gcc compiler to obtain the execution trace.
- The trace was used as input for the model of the CPU inside Mirabilis VisualSim.
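As a rough illustration of how an execution trace can drive a processor model, the sketch below turns a trace into an instruction mix and a latency estimate. Everything in it is an assumption made for illustration: the trace format (one instruction class per line), the class names, and the cycle costs are hypothetical and do not describe the annotation file or the CPU model used inside VisualSim.

```python
from collections import Counter

# Hypothetical cycle cost per instruction class (illustrative values only,
# not taken from the presentation or from the VisualSim processor model).
CYCLES_PER_CLASS = {"alu": 1, "load": 3, "store": 2, "branch": 3}


def instruction_mix(trace_lines):
    """Count instruction classes in a trace with one class name per line."""
    return Counter(line.strip() for line in trace_lines if line.strip())


def estimated_latency_s(mix, clock_mhz):
    """Estimate execution time as total cycles divided by the clock rate."""
    cycles = sum(CYCLES_PER_CLASS[name] * count for name, count in mix.items())
    return cycles / (clock_mhz * 1e6)


if __name__ == "__main__":
    # Hypothetical five-instruction trace.
    trace = ["load", "alu", "alu", "store", "branch"]
    mix = instruction_mix(trace)
    print(dict(mix))
    print(estimated_latency_s(mix, clock_mhz=133.0))
```

A real architectural model also accounts for caches, bus contention and memory access times, which is what the system models on the following slides add.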




### System one (Full system details)

- Dual processor ARM7TDMI.
- AES tasks are loaded on the two CPUs.
- Parameters checked:
  - System utilization.
  - Processor latency.
  - Processor instant power.
  - Battery power.
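The checked parameters relate to each other in a simple way, sketched below under stated assumptions: utilization is taken as busy time over simulated time, consumed energy as the integral of instantaneous power, and the battery as an idealized energy reservoir. The function names and all numeric values are hypothetical; VisualSim computes these statistics internally and its battery model may differ.

```python
def utilization(busy_time_s: float, sim_time_s: float) -> float:
    """Fraction of the simulated time during which the processor is busy."""
    return busy_time_s / sim_time_s


def energy_j(power_samples_w, step_s: float) -> float:
    """Integrate equally spaced instantaneous power samples (rectangle rule)."""
    return sum(power_samples_w) * step_s


def battery_remaining_j(capacity_j: float, consumed_j: float) -> float:
    """Energy left in an idealized battery after the run, never below zero."""
    return max(capacity_j - consumed_j, 0.0)


if __name__ == "__main__":
    sim_time = 6.0e-4                   # hypothetical simulated time, seconds
    busy = 4.5e-4                       # hypothetical busy time of one CPU
    samples = [0.10, 0.25, 0.25, 0.10]  # hypothetical power samples, watts
    used = energy_j(samples, sim_time / len(samples))
    print(utilization(busy, sim_time))
    print(used)
    print(battery_remaining_j(1.0, used))
```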




### System one (Full system in VisualSim)

#### AES Dual ARM7 Platform Model.

#### Scenarios

Scenario (1): AES Annotated Software

#### Performance Model Parameters

- Sim\_Time: 6.0E-04
- Key\_Size\_Index: 4
- Bytes\_Sent: 10
- Processor\_Speed\_Mhz: 133.0

#### Simulator Engine

DE Simulator

[Figure: VisualSim block diagram of the model. Legible block labels: HW Architecture, SW Architecture, Arch\_Setup, Dual\_ARM\_7, Instruction\_Set (ARM7\_INST), Power\_Manager, Battery, Task Setup, LineReader, Task\_Rate, DS\_Expr, DS\_Expr2, Data Structures, Utilization, State\_Plot\_Block, State\_Plot\_Block2, Display\_Text. Diagram note: read C code annotation from annotate.txt file and send to Processor.]

source: www.arm.com

www.alari.ch


### System one (ARM7TDMI in VisualSim)

#### Dual ARM7 + L2 Cache + SDRAM

Detailed processor portion of model.

#### Parameters.


- Processor\_Speed\_Mhz: Processor\_Speed\_Mhz
- I\_Cache\_KB: "16"
- D\_Cache\_KB: "8"
- Bus\_Speed\_Mhz: Processor\_Speed\_Mhz
- Cache\_Speed\_Mhz: Processor\_Speed\_Mhz
- Cache\_Size\_KB: 32
- RAM\_Speed\_Mhz: Processor\_Speed\_Mhz / 2.0
- RAM\_Size\_MB: 8
- RAM\_Access\_Time: "Read 8.0,Prefetch 8.0,Refresh 8.0,Write 7.5"
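Most of these parameters are expressions of the single top-level `Processor_Speed_Mhz` value, so changing the processor speed rescales the bus, cache and RAM clocks together. The sketch below mirrors that dependency in Python and parses the `RAM_Access_Time` string; the helper names `derive_clocks` and `parse_access_times` are illustrative assumptions, and the unit of the access times is not stated on the slide, so none is assumed here.

```python
def derive_clocks(processor_speed_mhz: float) -> dict:
    """Derive the dependent clock parameters from the processor speed."""
    return {
        "Processor_Speed_Mhz": processor_speed_mhz,
        "Bus_Speed_Mhz": processor_speed_mhz,
        "Cache_Speed_Mhz": processor_speed_mhz,
        "RAM_Speed_Mhz": processor_speed_mhz / 2.0,
    }


def parse_access_times(spec: str) -> dict:
    """Parse a string such as 'Read 8.0,Prefetch 8.0' into {name: value}."""
    times = {}
    for field in spec.split(","):
        name, value = field.split()  # split() also drops surrounding spaces
        times[name] = float(value)
    return times


if __name__ == "__main__":
    # 133.0 MHz is the Processor_Speed_Mhz of the dual ARM7 platform model.
    print(derive_clocks(133.0))
    print(parse_access_times("Read 8.0,Prefetch 8.0,Refresh 8.0,Write 7.5"))
```

With the 133.0 MHz setting of the first system, the RAM clock evaluates to 66.5 MHz.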





source: www.arm.com


### System one (Processors Utilization)




### System one (Battery and instant power)




### System two (Full system details)

- Single processor ARM8-Cortex.
- All AES tasks are loaded on the same CPU.
- Parameters checked:
  - System utilization.
  - Processor latency.
  - Processor instant power.
  - Battery power.




### System two (Full system in VisualSim)

#### AES ARM8 Platform Model.

#### Scenarios

Scenario (1) : AES Annotated Software

#### Performance Model Parameters

- Sim\_Time: 3.0E-04
- Key\_Size\_Index: 4
- Bytes\_Sent: 10
- Processor\_Speed\_Mhz: 600.0







### System two (ARM8-Cortex in VisualSim)

#### ARM8 + L2 Cache + SDRAM

Detailed processor portion of model.

#### Parameters.

- Processor\_Speed\_Mhz: Processor\_Speed\_Mhz
- I\_Cache\_KB: "16"
- D\_Cache\_KB: "8"
- Bus\_Speed\_Mhz: Processor\_Speed\_Mhz
- Cache\_Speed\_Mhz: Processor\_Speed\_Mhz
- Cache\_Size\_KB: 32
- RAM\_Speed\_Mhz: Processor\_Speed\_Mhz / 2.0
- RAM\_Size\_MB: 8
- RAM\_Access\_Time: "Read 3.0, Prefetch 3.0, Refresh 3.0, Write 2.5"





### System two (Processor Utilization)




### System two (Battery and instant power)




### **System Comparison**

- ARM8-Cortex/ARM7TDMI power consumption ratio: 9.5.
- ARM8-Cortex/ARM7TDMI speed ratio: 2.
- The ARM8-Cortex is faster.
- The two ARM7TDMI consume less power.
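The two ratios can be combined into an energy comparison. The sketch below assumes that the power figure is an average over the run and that the speed figure is the inverse of the execution-time ratio, so that energy per task scales as power times time; this derived energy ratio is not reported in the presentation, and the function name `energy_ratio` is illustrative.

```python
POWER_RATIO = 9.5  # ARM8-Cortex / ARM7TDMI power consumption (from the slide)
SPEED_RATIO = 2.0  # ARM8-Cortex / ARM7TDMI speed (from the slide)


def energy_ratio(power_ratio: float, speed_ratio: float) -> float:
    """Energy = power x time, and execution time scales as 1 / speed."""
    return power_ratio / speed_ratio


if __name__ == "__main__":
    print(energy_ratio(POWER_RATIO, SPEED_RATIO))  # prints 4.75
```

Under these assumptions the faster system spends roughly 4.75 times the energy for the same work, so it is the better choice only when the latency requirement cannot be met by the dual ARM7TDMI system.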




### Conclusions

- The performance/power trade-off is of crucial importance in embedded systems design.
- The presented case study demonstrates that, using Mirabilis VisualSim, it was possible to:
  - Model the full system in a few hours (the full experiment was completed in less than 10 hours).
  - Analyze the latency and the power consumed by the full system and its main components.
  - Explore different implementations and platforms very quickly.


# **Questions?**




# Thank you for your attention. (regazzoni@alari.ch)

Please feel free to come to Mirabilis Design (booth F-51) for a demo.