
De-mystifying PyTorch for ASICs

When, and why, to move your model development onto AI accelerators.

Type: Talk
Category: Accelerated Computing
Level: Intermediate
Duration: 30 min
Language: English

Tags: PyTorch, H100, TPU, Trainium, ASICs, Benchmarking

Abstract

How and why to move PyTorch development onto AI accelerators. The talk covers how ASICs and the XLA/Neuron compilers work, PyTorch/XLA on Google TPU and TorchNeuronX on AWS Trainium, the performance picture including the 'compiler tax', the common ASIC errors you'll hit, and a practical framework for deciding when to migrate. Benchmarks span NVIDIA H100 GPUs, Google TPU v6e, and AWS Trainium.

Outline

01 Deep learning computations and how ASICs and XLA work
02 Google TPU with PyTorch/XLA
03 AWS Trainium with TorchNeuronX
04 Performance analysis and the 'compiler tax'
05 Common ASIC errors (device busy, OOM) and their fixes
06 Migration decision notes: when to move
07 The way forward: TorchTPU, MAIA 200, TPU v7 vs Trainium3 vs NVIDIA

Key takeaways

PyTorch/XLA and TorchNeuronX let you keep PyTorch while targeting TPU and Trainium
Budget for the 'compiler tax': graph compilation and torch_xla.sync() change how you code
Most ASIC failures are device-busy or OOM, with known, quick fixes
The ASIC toolchains build on top of PyTorch, so your skills transfer
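One way to budget for the 'compiler tax' is to treat it as a break-even question: how many training steps pass before faster accelerator steps repay the up-front graph compilation time? The sketch below is illustrative and is not taken from the talk's benchmarks. The function name `breakeven_steps` and all the input numbers are hypothetical. It assumes that you have measured the total compilation time yourself, including any recompiles triggered by new input shapes, and the steady-state step times on each device.

```python
import math


def breakeven_steps(compile_s: float, gpu_step_s: float, asic_step_s: float) -> float:
    """Hypothetical helper: number of training steps needed for faster ASIC
    steps to repay the one-off compilation cost, or math.inf if the ASIC
    step is not faster than the GPU step.

    Assumptions (illustrative, not from the talk): compile_s is the total
    compilation time, including recompiles caused by new input shapes, and
    both step times are steady-state averages measured after warm-up.
    """
    saving_per_step = gpu_step_s - asic_step_s
    if saving_per_step <= 0:
        # No per-step speedup, so the compile cost is never recovered.
        return math.inf
    # Round up: a partial step does not finish paying back the compile time.
    return math.ceil(compile_s / saving_per_step)


# Hypothetical numbers: 10 minutes of compilation, 0.50 s/step on the GPU,
# 0.25 s/step on the ASIC once the graph is compiled.
print(breakeven_steps(600, 0.50, 0.25))  # 2400
```

Under these assumptions, a long training run easily absorbs the tax. A short experiment, or a workload whose changing shapes keep forcing recompiles, may never break even.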


Delivered 2 times

Event | Organizer | Date | Reach
AI Study Group 2026 (Online) | Data Engineering Pilipinas | May 9, 2026 | 20
PyTorch Conference Europe 2026 (Station F, Paris, France) | Linux Foundation | Apr 7, 2026 | 100

