| RC0 | VFP programming |
| Objectives | |||
| This course has been designed for programmers who want to develop algorithms based on hardware floating-point calculations. |
|||
| Each instruction family is detailed, first at assembly level, and then at C level using macros. | |||
| Several tricky uses of vector instructions are presented. |
|||
| The underlying cache operation as well as the preload mechanisms (instruction and hardware prefetch) are detailed to explain how processing can be pipelined. |
|||
| The course shows how typical DSP algorithms such as FIR and FFT can be vectorized and then optimized for execution on the VFP unit. | |||
| THIS COURSE IS PROPOSED EITHER AS AN INSTRUCTOR-LED COURSE OR AS E-LEARNING. | |||
| ACSYS has developed an optimized VFP-based FFT coded in assembly language. | |||
| Its performance for 1024 complex single-precision floating-point samples is 220,000 core clock cycles (ARM11). | |||
| For more information, contact guillaume.peron@ac6.fr | |||
| Labs are run under RVDS |
|||
| A more detailed course description is available on request at info@ac6-training.com | |||
| Prerequisites | |||
| Knowledge of the ARMv4T / ARMv5TE instruction sets. |
|||
| Outline |
| IEEE754 STANDARD | |||
| Floating point number coding | |||
| Denormalized numbers | |||
| NaN utilization | |||
| Rounding modes | |||
| VFP FPEXC register | |||
| INTRODUCTION TO VFPv3 | |||
| Register bank, D registers, S registers | |||
| Instruction coding, either ARM or Thumb-2 | |||
| Related system registers | |||
| Alignment issues | |||
| Context switching | |||
| VECTOR vs SCALAR OPERATION | |||
| Length / Stride combinations | |||
| Scalar operations | |||
| Vector operations | |||
| Mixed operations | |||
| VFP LOAD / STORE INSTRUCTIONS | |||
| Addressing modes | |||
| Floating point load / store | |||
| Floating point load / store multiple | |||
| Processor acceleration mechanisms: store merging buffers | |||
| ARITHMETIC INSTRUCTIONS | |||
| Add / subtract / absolute value instructions | |||
| Multiply and multiply accumulate instructions | |||
| Divide instruction | |||
| Square root instruction | |||
| Compare instructions | |||
| Integer to FP and FP to integer convert instructions | |||
| VFP CODING EXAMPLES | |||
| FIR filter | |||
| Converting the scalar algorithm into a vector algorithm | |||
| Finding the VFP instructions to encode the vector algorithm | |||
| Optimizing the code | |||
| FFT (DFT) | |||
| Converting the scalar algorithm into a vector algorithm, understanding how circle properties can be used to process 4 angles concurrently | |||
| Finding the VFP instructions to encode the vector algorithm | |||
| Optimizing the code | |||
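
As an illustration of the FIR item in the outline above (converting the scalar algorithm into a vector algorithm), here is a minimal sketch in plain, portable C. It is not taken from the course material: the function names `fir_scalar` and `fir_vec4`, the buffer layout, and the choice of four lanes are assumptions made for this example. The second function computes four consecutive outputs at once with four independent accumulators, which is the shape of loop that a length-4 vector operation can execute; the mapping to actual VFP instructions is left out.

```c
#include <stddef.h>

/* Scalar reference FIR (hypothetical helper for this sketch).
 * x must hold n_out + taps - 1 samples.
 * y[n] = sum over k of h[k] * x[n + taps - 1 - k]            */
void fir_scalar(const float *x, const float *h, float *y,
                size_t n_out, size_t taps)
{
    for (size_t n = 0; n < n_out; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; k++)
            acc += h[k] * x[n + taps - 1 - k];
        y[n] = acc;
    }
}

/* Same filter, restructured so that four outputs are computed together.
 * Each coefficient is loaded once and applied to four adjacent input
 * samples: one "scalar times vector, accumulate into vector" step.
 * The lane count of 4 is an assumption chosen for illustration.      */
void fir_vec4(const float *x, const float *h, float *y,
              size_t n_out, size_t taps)
{
    size_t n = 0;

    for (; n + 4 <= n_out; n += 4) {
        float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

        for (size_t k = 0; k < taps; k++) {
            const float coef = h[k];                 /* scalar operand */
            const float *xs = &x[n + taps - 1 - k];  /* 4 adjacent samples */

            for (int lane = 0; lane < 4; lane++)
                acc[lane] += coef * xs[lane];        /* vector multiply-accumulate */
        }

        for (int lane = 0; lane < 4; lane++)
            y[n + lane] = acc[lane];
    }

    /* Scalar tail for the outputs that do not fill a group of four. */
    for (; n < n_out; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; k++)
            acc += h[k] * x[n + taps - 1 - k];
        y[n] = acc;
    }
}
```

Both functions accumulate the products of a given output in the same order, so they should agree to within rounding. The inner `lane` loop is the part that a vector multiply-accumulate would replace.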