| LOGICAL AND BITFIELD INSTRUCTIONS |
 |
Logical AND, Bit Clear, OR, XOR |
 |
Operations with immediate values |
 |
Bitwise insert instructions, avoiding branches |
 |
Count leading zeros, ones, sign bits |
 |
Normalizing floating-point numbers when VFP is not implemented |
 |
Scalar duplicate |
 |
Extract |
 |
Shift with possible rounding and saturation |
 |
Bitfield reverse |
|
 |
Practical lab: Transposing a matrix, shifting a large bitmap using vector instructions |
| ARITHMETIC INSTRUCTIONS |
 |
Add, modulo vs saturated arithmetic |
 |
Halving / Doubling the result |
 |
Rounding |
 |
Subtract |
 |
Multiply |
 |
Multiply accumulate / Multiply subtract |
 |
Absolute value |
 |
Min / Max |
 |
Converting floating-point numbers into fixed-point numbers |
 |
Converting fixed-point numbers into floating-point numbers |
 |
Reciprocal estimate, reciprocal square root estimate, Newton-Raphson algorithm |
 |
Pairwise instructions |
 |
Element comparison |
|
 |
Practical lab: Implementing a complex multiply accumulate with NEON |
|
 |
Practical lab: Converting fixed-point elements into single-precision floating-point values and adding the resulting elements
| NEON CODING EXAMPLES |
 |
FIR filter |
|
 |
Converting the scalar algorithm into a vector algorithm |
|
 |
Finding the NEON instructions to encode the vector algorithm |
|
 |
Optimizing the code |
|
 |
Using the performance monitor to tune the algorithm |
|
 |
FFT (DFT) |
|
 |
Converting the scalar algorithm into a vector algorithm, understanding how circle properties can be used to process 4 angles concurrently |
|
 |
Finding the NEON instructions to encode the vector algorithm |
|
 |
Optimizing the code |
|
 |
Using the performance monitor to tune the algorithm |