Parallel and Heterogeneous Computing with Microsoft PPL & AMP in C++

Computationally intensive problems are often best solved by moving to parallel processing to take advantage of multiple cores. In addition, many developers are starting to look toward different types of hardware to optimize their software further.

Audience

Experienced C++ Developers

Prior knowledge

Knowledge of multi-threaded programming using threads

Informator Training Cloud

Our training portal supports your development throughout the training. The portal saves you time and is designed to give you a more effective learning experience, leading to better results and greater knowledge retention after the course.

Today many C++ programmers are looking to Microsoft PPL (the Parallel Patterns Library) and Microsoft AMP (Accelerated Massive Parallelism) to help them write parallel algorithms that run on both the CPU and the GPU. Microsoft PPL is designed to help developers quickly get their code running on multiple cores, while the AMP libraries are intended to remove much of the complexity of heterogeneous computing (particularly around mixing hardware from different manufacturers). This course investigates both PPL and AMP, with a larger focus on the more advanced AMP functionality.
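
To give a flavour of the two libraries, here is a minimal sketch (illustrative, not taken from the course material) that doubles every element of a vector, first with PPL on the CPU cores and then with C++ AMP on the default GPU accelerator:

    #include <ppl.h>
    #include <amp.h>
    #include <vector>
    using namespace concurrency;

    int main() {
        std::vector<float> v(1024, 1.0f);

        // PPL: split the loop iterations across the available CPU cores.
        parallel_for(0, 1024, [&](int i) { v[i] *= 2.0f; });

        // AMP: wrap the vector in an array_view and run the lambda once
        // per element on the default accelerator.
        array_view<float, 1> av(1024, v);
        parallel_for_each(av.extent, [=](index<1> idx) restrict(amp) {
            av[idx] *= 2.0f;
        });
        av.synchronize(); // copy the results back into v
    }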

You’ll come out of “Parallel and Heterogeneous Computing” knowing the following:

  • How to avoid common optimization pitfalls
  • When parallelism is beneficial
  • How the underlying hardware contributes to parallelism
  • How to take advantage of multiple cores with C++ and Microsoft PPL
  • How to take advantage of GPUs from multiple manufacturers with C++ and Microsoft AMP
  • How to avoid common parallel and heterogeneous computing pitfalls
  • How to debug and dig into parallel and heterogeneous programs

Day 1

  • Introduction to Parallelism

  • C++11 Refresher

    • Move semantics
    • Deleted/Defaulted Functions
    • Lambdas
  • Measuring Performance

    • Types of performance
    • Taking good benchmarks
    • Accounting for error
  • Hardware

  • CPU Internals
  • Instruction level parallelism

    • Understanding Limitations
    • Caching (NUMA)
    • Cost of shared writes
    • Common Pitfalls
  • Floating Point Numbers

    • 0.1 + 0.2 != 0.3
    • Limitations of floating point
    • Compiler optimizations with floating point
    • Parallel processing and accuracy
  • Identifying algorithms that are parallelizable

    • CPU-bound vs. IO-bound
    • Rewriting for better parallelization
    • Amdahl’s Law
  • Parallel patterns and algorithms

    • Map
    • Reduce
    • Scan
    • Pack
    • Command queues
    • Combined examples
  • Introduction to PPL

    • Why tasks instead of threads?
    • Runtime support
    • Primitives
    • parallel_for
    • Non-determinism
  • Synchronization in PPL

    • critical_section
    • reader_writer_lock
    • scoped_lock / scoped_lock_read
    • event
    • Costs of synchronization
    • Alternatives
  • Visual Studio Debug Tools for Concurrency

    • Concurrency Visualizer
    • Parallel Stacks
    • Parallel Tasks
    • Parallel Watch
    • Identifying anti-patterns
    • Debugging PPL Algorithms
    • Event Tracing
  • Exception Handling

    • Catching exceptions at task::get / task::wait (see the sketch after this outline)
    • Exceptions inside parallel_for vs for loops
  • Similar Technologies to PPL

    • pthread
    • Intel TBB
    • Boost
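
As a taste of the exception-handling topic above: an exception thrown inside a PPL task is captured by the runtime and rethrown when the caller observes the task via task::get or task::wait. A minimal sketch (illustrative, not course material):

    #include <ppltasks.h>
    #include <iostream>
    #include <stdexcept>
    using namespace concurrency;

    int main() {
        auto t = create_task([]() -> int {
            throw std::runtime_error("failure inside the task");
        });
        try {
            std::cout << t.get() << '\n'; // the stored exception is rethrown here
        } catch (const std::exception& e) {
            std::cout << "caught: " << e.what() << '\n';
        }
    }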

Day 2

  • Introduction to Vector Programming

    • What is SIMD?
    • Masking and execution coherence
    • Memory Layout
      • Structure of Arrays (SoA)
      • Array of Structures (AoS)
    • Auto vectorization
  • Introduction to the GPU Hardware

    • Hardware
    • Memory Types and Caching
    • Cores, Threads, Tiles and Warps
    • PCIe Bus
  • Methods of writing code for the GPU

    • OpenCL
    • CUDA
    • DirectCompute
    • Microsoft C++ AMP
  • Introduction to AMP

    • AMP Syntax and Data Types
    • array, array_view
    • index
    • extent
    • grid
    • restrict
  • parallel_for_each

    • How to use
    • Optimizing Memory Move/Copy
  • Synchronizing memory with accelerators

    • Implicit synchronization
    • synchronize*()
    • data()
    • Lost Exceptions
  • Concurrency::fast_math and precise_math (see the sketch after this outline)

    • What’s inside
    • Comparison to “standard” math.
    • Precision
    • Accelerator requirements
    • Example
  • Debugging with WARP

    • Visual Studio Tools
    • GPU Threads
    • Parallel Stacks
    • Parallel Watch
  • Floating Point Numbers

    • How they are handled
    • Why they differ from the CPU
    • Performance of float/double operations
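
For the fast_math/precise_math item above, a small sketch (illustrative; gpu_sqrt is a hypothetical name, not course material): both namespaces expose the same function set, but fast_math trades precision for speed and runs on any accelerator, while precise_math requires double-precision hardware support:

    #include <amp.h>
    #include <amp_math.h>
    using namespace concurrency;

    // Take the square root of every element on the GPU.
    void gpu_sqrt(array_view<float, 1> data) {
        parallel_for_each(data.extent, [=](index<1> idx) restrict(amp) {
            data[idx] = fast_math::sqrt(data[idx]);
            // precise_math::sqrt(...) would give full precision, but only
            // on accelerators with double-precision support
        });
    }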

Day 3

  • Tiling (see the sketch after this outline)

    • Syntax
    • Determining tile size
    • Memory Coalescence
    • Memory Collisions
    • Tile Synchronization
  • AMP Atomic Operations

    • atomic_exchange()
    • atomic_fetch*()
  • Parallel patterns with AMP

    • Map
    • Reduce
    • Scan
    • Pack
  • AMP Accelerators

    • Accelerator properties
    • Shared memory
    • Using multiple accelerators
  • Concurrency::graphics

    • Exploiting the texture cache.
  • AMP Error Handling

    • Exceptions
    • Detecting/recovering from TDR (Timeout Detection and Recovery)
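
To illustrate the tiling material above, a minimal sketch (illustrative; assumes the extent is an exact multiple of the tile size, and double_tiled is a hypothetical name): each 256-thread tile stages its elements in fast tile_static memory, waits at a barrier so every thread sees the staged data, then works on the local copy:

    #include <amp.h>
    using namespace concurrency;

    void double_tiled(array_view<float, 1> data) {
        // Carve the 1-D extent into tiles of 256 threads each.
        parallel_for_each(data.extent.tile<256>(),
            [=](tiled_index<256> t) restrict(amp) {
                tile_static float staged[256];       // shared within one tile
                staged[t.local[0]] = data[t.global];
                t.barrier.wait();                    // all threads have staged
                data[t.global] = staged[t.local[0]] * 2.0f;
            });
    }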

Course ID: T1611
Length: 3 days
Price excl. VAT: SEK 26,950

Can be paid with: TRAINING CARD

Location and dates

Stockholm: 19 Dec – 21 Dec
Göteborg: 19 Dec – 21 Dec
Malmö: 19 Dec – 21 Dec
Cloud Access: 19 Dec – 21 Dec (attend the course from your home, workplace, or another location)
