De-mystifying PyTorch for ASICs
When, and why, to move your model development onto AI accelerators.

- Type: Talk
- Category: Accelerated Computing
- Level: Intermediate
- Duration: 30 min
- Language: English
Tags: PyTorch, H100, TPU, Trainium, ASICs, Benchmarking
Abstract
How and why to move PyTorch development onto AI accelerators. Covers how ASICs and the XLA/Neuron compilers work, PyTorch/XLA on Google TPU and TorchNeuronX on AWS Trainium, the performance picture and the 'compiler tax', the common ASIC errors you'll hit, and a practical migration decision framework, benchmarked across H100, TPU v6e, and Trainium.
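One rough way to reason about the 'compiler tax' is to treat graph compilation as a one-off cost that faster steps must then pay back. The sketch below assumes a single upfront compile, constant per-step times, and no later recompilation. It is a back-of-the-envelope illustration, not the migration decision framework presented in the talk. The function name and all of its inputs are hypothetical.

```python
import math


def break_even_steps(compile_ms: int,
                     baseline_step_ms: int,
                     accelerator_step_ms: int) -> float:
    """Return how many training steps it takes for faster accelerator steps
    to pay back a one-off compilation cost (hypothetical model).

    Assumes compilation happens once up front, every step takes a constant
    time, and nothing recompiles later. Returns math.inf when the accelerator
    step is not faster, because the compile cost is never recovered.
    """
    saving_per_step_ms = baseline_step_ms - accelerator_step_ms
    if saving_per_step_ms <= 0:
        return math.inf
    # Round up: a partial step does not finish paying back the compile time.
    return math.ceil(compile_ms / saving_per_step_ms)


# Illustrative inputs only, not measurements from the talk's benchmarks:
# 5 minutes of compilation and a 100 ms saving per step.
steps = break_even_steps(compile_ms=300_000,
                         baseline_step_ms=500,
                         accelerator_step_ms=400)
print(steps)  # 3000
```

Integer milliseconds are used to avoid floating-point surprises. For example, 0.5 - 0.4 is not exactly 0.1 in binary floating point, which would push the rounded-up step count off by one.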
Outline
- 01 Deep learning computations and how ASICs and XLA work
- 02 Google TPU with PyTorch/XLA
- 03 AWS Trainium with TorchNeuronX
- 04 Performance analysis and the 'compiler tax'
- 05 Common ASIC errors (device busy, OOM) and their fixes
- 06 Migration decision notes: when to move
- 07 The way forward: TorchTPU, MAIA 200, TPU v7 vs Trainium3 vs NVIDIA
Key takeaways
- PyTorch/XLA and TorchNeuronX let you keep PyTorch while targeting TPU and Trainium
- Budget for the 'compiler tax': graph compilation and torch_xla.sync() change how you code
- Most ASIC failures are device-busy or OOM, with known, quick fixes
- The TPU and Trainium software stacks build on top of PyTorch, so your skills transfer
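Graph compilation changes how you code in one concrete way. PyTorch/XLA, which TorchNeuronX also builds on, compiles a separate graph for each distinct set of input shapes, so variable-length batches can trigger repeated recompilation. A common mitigation, assumed here rather than taken from the talk, is to pad inputs up to a small, fixed set of bucket lengths. The pure-Python sketch below shows only that bucketing logic. The bucket sizes, pad id, and function names are hypothetical. In a real pipeline, the padded batch would become a tensor on the XLA device, with `torch_xla.sync()` marking the end of each step.

```python
from bisect import bisect_left
from typing import List, Sequence

# Hypothetical bucket lengths: every batch is padded up to one of these, so the
# compiler sees at most len(BUCKETS) distinct sequence lengths.
BUCKETS: Sequence[int] = (128, 256, 512, 1024)
PAD_ID = 0  # hypothetical padding token id


def bucket_length(length: int, buckets: Sequence[int] = BUCKETS) -> int:
    """Return the smallest bucket (buckets sorted ascending) that fits `length`."""
    index = bisect_left(buckets, length)
    if index == len(buckets):
        raise ValueError(f"length {length} exceeds largest bucket {buckets[-1]}")
    return buckets[index]


def pad_batch(batch: List[List[int]],
              buckets: Sequence[int] = BUCKETS) -> List[List[int]]:
    """Right-pad every sequence to the bucket of the batch's longest sequence."""
    target = bucket_length(max(len(seq) for seq in batch), buckets)
    return [seq + [PAD_ID] * (target - len(seq)) for seq in batch]


# Three batches whose longest sequences are 100, 200, and 130 tokens
# collapse to only two padded shapes.
batches = ([[1] * 100], [[1] * 200], [[1] * 130, [1] * 90])
shapes = {len(pad_batch(b)[0]) for b in batches}
print(sorted(shapes))  # [128, 256]
```

The trade-off is wasted compute on padding versus fewer compilations. Fewer, coarser buckets mean fewer graphs but more padding.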
Slides
Delivered 2 times
| Event | Organizer | Date | Reach |
|---|---|---|---|
| AI Study Group 2026 (Online) | Data Engineering Pilipinas | May 9, 2026 | 20 |
| PyTorch Conference Europe 2026 (Station F, Paris, France) | Linux Foundation | Apr 7, 2026 | 100 |