
PC2 PPC970FX implementation

This course covers the IBM PowerPC 970FX (PowerPC G5) CPU.

Objectives
  • The course details pipeline operation in order to derive code optimization guidelines.
  • Data and instruction paths between SDRAM, L1 caches and L2 cache are highlighted.
  • The MERSI cache coherency protocol is introduced step by step, in increasing depth.
  • The operation of the elastic bus is described.
  • Using an FFT algorithm, the instructor shows how to vectorize processing and reduce execution time through data streaming.
  • The performance monitor is used to optimize the performance of the FFT.
A more detailed course description is available on request at formation@ac6-formation.com

OVERVIEW
  • Functional units
  • Key features
PPC970 PIPELINE
  • Pipeline basics
  • Deeply pipelined design, superscalar implementation, register renaming
  • Branch prediction mechanism
  • Instruction decode and preprocessing
  • Instruction dispatch, sequencing and completion control, register renaming
  • Dispatch group organization
  • Synchronization-based instruction grouping
  • Instruction latencies and throughputs
  • Software optimization guidelines
MEMORY MANAGEMENT UNIT
  • MMU goals
  • Data address translation, 128-entry Data ERAT, ERAT Miss Queue
  • Second-level Memory Management Unit consisting of SLB and TLB
  • 1024-entry 4-way set associative TLB, 64-entry fully associative SLB
  • Large page support
  • Real memory limit register
  • Hypervisor vs supervisor
  • Support for 32-bit operating systems
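
As a rough illustration of the TLB organization listed above, the sketch below computes the number of sets in a 1024-entry, 4-way set-associative TLB and selects a set from a virtual page number. The 4 KB page size and the plain modulo index are assumptions made for illustration only; the outline does not describe how the 970FX actually hashes the virtual address, and the function names are hypothetical.

```python
TLB_ENTRIES = 1024   # from the outline
TLB_WAYS = 4         # from the outline
PAGE_SHIFT = 12      # assumption: 4 KB base pages


def tlb_sets(entries=TLB_ENTRIES, ways=TLB_WAYS):
    """Number of sets in a set-associative TLB: entries divided by ways."""
    return entries // ways


def tlb_set_index(virtual_address, sets=None):
    """Pick a set from the virtual page number.

    Assumption: a plain modulo index, not the real 970FX hash.
    """
    if sets is None:
        sets = tlb_sets()
    virtual_page_number = virtual_address >> PAGE_SHIFT
    return virtual_page_number % sets


if __name__ == "__main__":
    print("sets:", tlb_sets())
    print("set for 0x0000_1000:", tlb_set_index(0x1000))
```

With these figures, each lookup compares the 4 entries of one set out of 256, rather than all 1024 entries.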
INTERNAL DATA PATHS
  • Data paths between load / store units, instruction queue, L2 and external bus
  • Out-of-order and speculative issue of load operations
  • 32-entry real address based store queues
  • 32-entry load re-order queue, tracking of the order of loads
  • 8-entry load miss queue
  • GUS subsystem
  • Core Interface Unit
  • L2 cache controller
  • Non Cacheable Unit
  • Storage access ordering
  • Hardware controlled data prefetch
  • Prefetch startup sequence, stream detection
  • Synchronization instructions sync, lwsync, ptesync
L1 AND L2 CACHES
  • Cache basics
  • 64 kB direct-mapped instruction cache
  • 32 kB 2-way set associative data cache, FIFO replacement policy, store-through policy
  • 512 kB L2 cache, fully inclusive of the L1 data cache, MERSI coherency protocol
  • Cache coherency, MERSI cache line state, cache state transition tables
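
The cache sizes and associativities above determine how an address splits into tag, set index and line offset. The sketch below works this out for the two L1 caches. The 128-byte line size is an assumption that the outline does not state, and the helper names are hypothetical.

```python
LINE_SIZE = 128  # assumption: bytes per cache line (not given in the outline)

# Sizes and associativities taken from the outline.
ICACHE = {"size_bytes": 64 * 1024, "ways": 1}   # direct-mapped
DCACHE = {"size_bytes": 32 * 1024, "ways": 2}   # 2-way set associative


def cache_sets(size_bytes, ways, line_size=LINE_SIZE):
    """Number of sets = capacity / (ways * line size)."""
    return size_bytes // (ways * line_size)


def split_address(addr, size_bytes, ways, line_size=LINE_SIZE):
    """Split an address into (tag, set index, byte offset within the line)."""
    sets = cache_sets(size_bytes, ways, line_size)
    offset = addr % line_size
    index = (addr // line_size) % sets
    tag = addr // (line_size * sets)
    return tag, index, offset


if __name__ == "__main__":
    print("I-cache sets:", cache_sets(**ICACHE))
    print("D-cache sets:", cache_sets(**DCACHE))
    print("D-cache split of 0x12345:", split_address(0x12345, **DCACHE))
```

Under the assumed line size, two addresses that differ by a multiple of 16 kB map to the same data cache set, which is one reason data layout matters for a 2-way cache.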
PROGRAMMING
  • Branch instructions
  • The system call communication path between applications and RTOS
  • Integer load / store instructions
  • Integer arithmetic and logic instructions
  • IEEE754 basics
  • FPU operation: FPSCR register
  • Float load / store instructions, floating point exceptions
  • Float arithmetic instructions
  • The EABI
  • Code and data sections, small data areas benefits
  • 970FX specific registers
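
To make the IEEE754 basics item above concrete, the following sketch decodes a single-precision value into its sign, biased exponent and fraction fields. It is a generic illustration of the IEEE 754 binary32 layout and is not specific to the 970FX FPU; the function name is hypothetical.

```python
import struct


def decode_float32(value):
    """Return (sign, biased exponent, fraction) of an IEEE 754 binary32 value."""
    # Pack as big-endian single precision, then reinterpret as a 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, bias 127
    fraction = bits & 0x7FFFFF        # 23 bits, implicit leading 1 for normal numbers
    return sign, exponent, fraction


if __name__ == "__main__":
    for sample in (1.0, -2.5, 0.0):
        print(sample, decode_float32(sample))
```

For a normal number, the value is (-1)^sign × (1 + fraction / 2^23) × 2^(exponent - 127).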
THE PERFORMANCE MONITOR
  • Objectives
  • Event selection
  • Configuring the performance monitor bus
  • Instruction matching and sampling, the 3 stages of eligibility
EXCEPTION MECHANISM
  • Exception recognition and priorities
  • Focus on soft patch and maintenance exceptions
  • Register updates according to the exception cause
  • Requirements to support exception nesting
  • Precise processing of machine check exceptions
VMX IMPLEMENTATION
  • VMX introduction, SIMD processing
  • Intra vs inter element instructions
  • VMX registers, VSCR initialization
  • ANSI C extension to support vector operators: new C types, new casts, vector declaration and initialization
  • VMX implementation on the PPC970FX
  • Data stream management
  • EABI extension to support VMX
POWER AND THERMAL MANAGEMENT
  • Clocking, PLL design
  • Time Base and decrementer
  • Frequency and voltage scaling
  • Additional dynamic power management
HARDWARE IMPLEMENTATION
  • Unidirectional point-to-point bus segments, source synchronized transfers
  • Packet protocols
  • Snoop response
  • Pipelined transactions
  • Power-on procedure
  • Electrical interface