De-mystifying PyTorch for ASICs

When, and why, to move your model development onto AI accelerators.

Type: Talk
Category: Accelerated Computing
Level: Intermediate
Duration: 30 min
Language: English
PyTorch, H100, TPU, Trainium, ASICs, Benchmarking

Abstract

How and why to move PyTorch development onto AI accelerators. The talk covers how ASICs and the XLA and Neuron compilers work, PyTorch/XLA on Google TPU and TorchNeuronX on AWS Trainium, the performance picture and the 'compiler tax', the common ASIC errors you'll hit, and a practical migration decision framework, with benchmarks across H100, TPU v6e, and Trainium.

Outline

  1. Deep learning computations and how ASICs and XLA work
  2. Google TPU with PyTorch/XLA
  3. AWS Trainium with TorchNeuronX
  4. Performance analysis and the 'compiler tax'
  5. Common ASIC errors (device busy, OOM) and their fixes
  6. Migration decision notes: when to move
  7. The way forward: TorchTPU, MAIA 200, TPU v7 vs Trainium3 vs NVIDIA

Key takeaways

  • PyTorch/XLA and TorchNeuronX let you keep PyTorch while targeting TPU and Trainium
  • Budget for the 'compiler tax': graph compilation and torch_xla.sync() change how you code
  • Most ASIC failures are device-busy or OOM, with known, quick fixes
  • The ASIC software stacks build on top of PyTorch, so your skills transfer
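
To make the 'compiler tax' takeaway concrete, here is a toy model in plain Python. It assumes the usual lazy-tensor picture: operations are recorded into a graph, and the graph is compiled and run only when you synchronize. It is a sketch for intuition, not torch_xla or Neuron code. The names `LazyValue`, `ToyDevice`, and `run_step` are hypothetical, and the "compile" step is only a cache lookup keyed on graph structure and shapes.

```python
# Toy model of lazy tracing with a compile cache. This is NOT torch_xla's
# implementation; LazyValue, ToyDevice, and run_step are hypothetical names.


class LazyValue:
    """Records an operation instead of running it immediately."""

    def __init__(self, device, op, inputs, shape):
        self.device = device
        self.op = op          # "input", "add", or "mul"
        self.inputs = inputs  # parent LazyValues, or raw data for "input"
        self.shape = shape

    def __add__(self, other):
        return LazyValue(self.device, "add", (self, other), self.shape)

    def __mul__(self, other):
        return LazyValue(self.device, "mul", (self, other), self.shape)

    def signature(self):
        # Graph structure plus shapes; the data values are not part of it.
        if self.op == "input":
            return ("input", self.shape)
        return (self.op,) + tuple(p.signature() for p in self.inputs)

    def evaluate(self):
        # Stand-in for running the compiled program on the device.
        if self.op == "input":
            return list(self.inputs)
        left, right = (p.evaluate() for p in self.inputs)
        if self.op == "add":
            return [a + b for a, b in zip(left, right)]
        return [a * b for a, b in zip(left, right)]


class ToyDevice:
    def __init__(self):
        self.cache = set()      # signatures that were already "compiled"
        self.compile_count = 0  # how many times the tax was paid

    def tensor(self, data):
        return LazyValue(self, "input", tuple(data), (len(data),))

    def sync(self, value):
        """Materialize a pending graph, compiling only on a cache miss."""
        sig = value.signature()
        if sig not in self.cache:
            self.cache.add(sig)
            self.compile_count += 1
        return value.evaluate()


def run_step(device, data):
    x = device.tensor(data)
    y = x * x + x          # nothing runs here; the graph is only recorded
    return device.sync(y)  # work happens at the sync point


device = ToyDevice()
first = run_step(device, [1.0, 2.0, 3.0])
second = run_step(device, [4.0, 5.0, 6.0])  # same graph and shape: cache hit
compiles_same_shape = device.compile_count
third = run_step(device, [1.0, 2.0])        # new shape: compiled again
compiles_new_shape = device.compile_count
```

In this model, two steps with the same shapes cost one compilation, while a step with a different input length triggers a second one. That is the coding habit the takeaway points at: keep shapes and graph structure stable across steps, and be deliberate about where the sync point falls.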

Delivered 2 times