FPGA Optimization Training | Ac6 Formation

ac6-formation, a department of Ac6 SAS
V4: FPGA Optimization
Hardware Architecture
Objectives
  • Design and optimize FPGA-based systems for performance, area, and power
  • Understand timing, pipelining, and clock domain crossing challenges
  • Apply efficient memory and resource utilization strategies
  • Develop hardware implementations of DSP and mathematical algorithms
  • Master RTL coding techniques for synthesis and hardware reliability
  • Perform simulation, verification, and timing analysis
  • Understand physical design: floorplanning, place & route, and constraints
  • Integrate FPGA designs with embedded software systems
  • Explore hardware/software co-design and acceleration techniques
  • Use ARM NEON for software-side performance optimization
Prerequisites
  • VHDL or Verilog concepts
  • C Language knowledge (see for example our L2 training course)
  • Familiarity with FPGA concepts
Course Environment
  • Theoretical course
    • PDF course material (in English)
    • A trainer who answers trainees’ questions during the training and provides technical and pedagogical assistance
  • Practical activities
    • Practical activities represent from 40% to 50% of course duration
    • Example code, labs and solutions
    • Vivado or Libero for design, synthesis, and timing analysis; ModelSim or Vivado for simulation
Target Audience
  • Any embedded systems engineer or technician with the above prerequisites.
Evaluation
  • The prerequisites indicated above are assessed before the training by the trainee's technical supervisor in their company, or by the trainee themselves in the exceptional case of an individual trainee.
  • Trainee progress is assessed in two different ways, depending on the course:
    • For courses that lend themselves to practical exercises, the trainer checks the results of the exercises and, if necessary, helps trainees carry them out by providing additional details.
    • Quizzes are offered at the end of sections that do not include practical exercises, to verify that the trainees have assimilated the points presented.
  • At the end of the training, each trainee receives a certificate attesting that they have successfully completed the course.
    • If a problem caused by a lack of prerequisites is discovered during the course, a different or additional training is offered to the trainee, generally to reinforce their prerequisites, in agreement with their company manager if applicable.

Course Outline

  • High Throughput
  • Low Latency
  • Timing
    • Add Register Layers
    • Parallel Structures
    • Flatten Logic Structures
    • Register Balancing
    • Reorder Paths
Exercise:  Example of Optimizing a Multiply-Accumulate Block
  • Rolling Up the Pipeline
  • Control-Based Logic Reuse
  • Resource Sharing
    • Impact of Reset on Area
    • Resources Without Reset
    • Resources Without Set
    • Resources Without Asynchronous Reset
    • Resetting RAM
    • Utilizing Set/Reset Flip-Flop Pins
Exercise:  Example of analyzing, comparing and optimizing multiple designs
  • Clock Control
    • Clock Skew
    • Managing Skew
  • Input Control
  • Reducing the Voltage Supply
  • Dual-Edge Triggered Flip-Flops
  • Modifying Terminations
  • AES Architectures
    • One Stage for Sub-bytes
    • Zero Stages for Shift Rows
    • Two Pipeline Stages for Mix-Column
    • One Stage for Add Round Key
    • Compact Architecture
    • Partially Pipelined Architecture
    • Fully Pipelined Architecture
  • Performance Versus Area
  • Other Optimizations
  • Abstract Design Techniques
  • Graphical State Machines
  • DSP Design
  • Software/Hardware Codesign Thread Fundamentals
  • Crossing Clock Domains
    • Metastability
    • Solution 1: Phase Control
    • Solution 2: Double Flopping
    • Solution 3: FIFO Structure
    • Partitioning Synchronizer Blocks
  • Gated Clocks in ASIC Prototypes
  • Clocks Module
  • Gating Removal Runtime Statistics
Exercise:  Show the effects of metastability when an asynchronous signal crosses into another clock domain
Exercise:  Measure the probability of metastability by simulating with random input changes
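As an illustration of the kind of measurement the exercise above describes, here is a minimal software sketch, not the course's lab material. It assumes a simplified model in which the asynchronous input changes at a uniformly random time within one clock period, and counts a change as a potential metastability event when it falls inside the flip-flop's setup/hold window. The function name and all timing values are hypothetical.

```python
import random


def estimate_violation_probability(period_ns, setup_ns, hold_ns, trials, seed=0):
    """Monte Carlo estimate of how often a random input change violates setup/hold.

    Simplified model (an assumption): clock edges sit at t = 0 and t = period_ns,
    and the input changes once per period at a uniformly random time.
    """
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        t = rng.uniform(0.0, period_ns)
        in_hold_window = t < hold_ns                 # too soon after the edge at t = 0
        in_setup_window = t > period_ns - setup_ns   # too close to the next edge
        if in_hold_window or in_setup_window:
            violations += 1
    return violations / trials


if __name__ == "__main__":
    # Hypothetical numbers: 100 MHz clock, 100 ps setup, 50 ps hold.
    p = estimate_violation_probability(10.0, 0.1, 0.05, trials=200_000)
    print(f"estimated violation probability: {p:.4f}")
```

Under this model the estimate should approach (setup + hold) / period. A violation only makes metastability possible; whether the flip-flop output actually fails to resolve in time depends on device parameters that this sketch does not model.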
  • Hardware Division
    • Multiply and Shift
    • Iterative Division
    • The Goldschmidt Method
  • Taylor and Maclaurin Series Expansion
  • The CORDIC Algorithm
Exercise:  Example Design: I2S Versus SPDIF
Exercise:  Example Design: Floating-Point Unit
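The "Multiply and Shift" topic listed above can be sketched in software as follows. This is a hedged illustration of one common way to divide an unsigned value by a known constant without a divider, not necessarily the formulation used in the course. The function names are hypothetical, and the choice of shift amount (operand width plus the ceiling of log2 of the divisor) is an assumption that makes the result exact for every operand of that width.

```python
def magic_constant(divisor, width):
    """Return (multiplier, shift) so that (x * multiplier) >> shift == x // divisor
    for every unsigned x that fits in `width` bits."""
    if divisor < 1:
        raise ValueError("divisor must be a positive integer")
    shift = width + (divisor - 1).bit_length()     # width + ceil(log2(divisor))
    multiplier = -(-(1 << shift) // divisor)       # ceil(2**shift / divisor)
    return multiplier, shift


def divide_by_constant(x, multiplier, shift):
    """One multiplication and one right shift replace the division."""
    return (x * multiplier) >> shift


if __name__ == "__main__":
    m, s = magic_constant(divisor=10, width=8)
    for x in (0, 9, 10, 99, 255):
        print(x, divide_by_constant(x, m, s), x // 10)
```

In hardware the multiplier is a constant known at synthesis time, so the operation typically maps to a single multiplier or DSP block followed by a fixed bit selection; the shift itself costs no logic.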
  • Asynchronous Versus Synchronous
    • Problems with Fully Asynchronous Resets
    • Fully Synchronized Resets
    • Asynchronous Assertion, Synchronous Deassertion
  • Mixing Reset Types
    • Nonresettable Flip-Flops
    • Internally Generated Resets
  • Multiple Clock Domains
Exercise:  Observe the differences between async and sync resets on flip-flops
  • Testbench Architecture
    • Testbench Components
    • Testbench Flow
  • Main Thread
  • Clocks and Resets
  • Test Cases
  • System Stimulus
    • MATLAB
    • Bus-Functional Models
  • Code Coverage
  • Gate-Level Simulations
  • Toggle Coverage
  • Run-Time Traps
    • Timescale
    • Glitch Rejection
    • Combinatorial Delay Modeling
Exercise:  Understanding event bit group by synchronizing several threads
  • Design Partitioning
  • Critical-Path Floorplanning
  • Floorplanning Dangers
  • Optimal Floorplanning
    • Data Path
    • High Fan-Out
    • Device Structure
    • Reusability
  • Reducing Power Dissipation
  • Standard Analysis
  • Latches
  • Asynchronous Circuits
    • Combinatorial Feedback
  • Power Supply
    • Supply Requirements
    • Regulation
  • Decoupling Capacitors
    • Concept
    • Calculating Values
    • Capacitor Placement
  • SRC Architecture
  • Synthesis Optimizations
    • Speed Versus Area
    • Pipelining
    • Physical Synthesis
  • Floorplan Optimizations
    • Partitioned Floorplan
    • Critical-Path Floorplan
  • FPGA Memory Types
    • Flip-Flops (FF) vs LUT RAM vs Block RAM (BRAM) vs UltraRAM
  • When NOT to use Flip-Flops
    • Resource explosion and routing impact
  • Efficient Memory Mapping
    • Using BRAM for buffers and FIFOs
    • Inferring RAM in HDL
  • Distributed RAM usage strategies
  • DSP Blocks in FPGA
    • Multipliers and MAC units
  • FFT Architectures
    • Radix-2 / Radix-4 basics
    • Pipelined vs iterative FFT
    • Fixed-point vs floating-point trade-offs
    • Throughput vs resource trade-offs
Exercise:  Implement a dynamic FFT IP from the PS part

To book a training session or for more information, please contact us at info@ac6-training.com.

Registrations are accepted till one week before the start date for scheduled classes. For late registrations, please consult us.

You can also fill in and send us the registration form.

This course can be provided either remotely, in our Paris training center or worldwide on your premises.

Scheduled classes are confirmed as soon as there are two confirmed bookings. Bookings are accepted until one week before the course start.

Last update of course schedule: 23 February 2026

Booking one of our training courses is subject to our General Terms of Sales.