sergiopaniego 
posted an update 8 days ago
new banger blog alert 🚨

@ariG23498 is starting a blog series about profiling in pytorch and part 1 just dropped

takes you from the simplest scenario to actually knowing what your gpu is doing. if you have never opened a profiler trace this is where you start

covers torch.profiler from scratch: reading tables and traces, overhead bound vs compute bound, the full dispatch chain from python to gpu kernels, and what torch.compile is actually fusing under the hood
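
for a rough idea of what "tables and traces" means in practice, here is a minimal sketch. it is not taken from the blog post: the toy model, the tensor sizes, and the `trace.json` filename are all made up for illustration, and it falls back to cpu-only profiling when no gpu is available

```python
import torch
from torch.profiler import profile, ProfilerActivity

# toy model and input, made up for illustration (not from the blog post)
model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)
x = torch.randn(32, 256)

# always profile cpu; add cuda only when a gpu is actually present
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof:
    with torch.no_grad():
        for _ in range(5):
            model(x)

# table view: per-operator stats, sorted by total cpu time
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(table)

# trace view: a json file you can open in a trace viewer such as Perfetto
prof.export_chrome_trace("trace.json")
```

the table is the quick summary, while the exported trace shows the same ops laid out on a timeline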

find it here: https://huggingface.co/blog/torch-profiler

This is a really detailed guide to PyTorch profiling, definitely a tool that many AI engineers overlook. That said, the data can still be quite overwhelming. A quick cheatsheet decoding what the core metrics mean (especially for custom kernels, Tensor Cores, or multi-GPU setups) would be a great addition.

For anyone wanting to go even lower-level on NVIDIA hardware, Nsight Compute is also worth a look for some serious profiling: https://developer.nvidia.com/nsight-compute

Edit: OK, I saw the summary at the end of the post. But a concise, self-contained cheatsheet with a few small graphics would help a lot.