NVIDIA Nsight is a family of developer tools used to profile, debug, and optimize GPU applications running on the CUDA platform. It is widely used for:

- Deep learning optimization
- HPC workloads
- CUDA kernel debugging
- GPU performance analysis

Nsight helps engineers understand what is happening inside the GPU during execution.
Nsight is actually a suite of tools, not a single program.
| Tool | Purpose |
|---|---|
| Nsight Systems | System-level performance tracing |
| Nsight Compute | CUDA kernel profiling |
| Nsight Graphics | Graphics debugging |
| Nsight Eclipse Edition / Nsight Visual Studio Edition | CUDA debugging inside the IDE |
Nsight Systems shows how the CPU, GPU, and other processes interact over time. It answers questions like:
You can visualize:
Example command:
```bash
nsys profile python train.py
```
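The timeline is easier to read when phases of the program carry names. A minimal sketch, assuming NVIDIA's `nvtx` Python package is installed (`pip install nvtx`); the `nvtx_range` helper and the `train_step` function are hypothetical stand-ins, not part of `train.py` as described here. When the package is missing, the helper falls back to a no-op so the script still runs.

```python
import contextlib

try:
    import nvtx  # NVIDIA's NVTX Python bindings (assumed to be installed)
except ImportError:
    nvtx = None  # fall back to no-op ranges when the package is unavailable


def nvtx_range(name):
    """Return a context manager that marks a named range for the profiler."""
    if nvtx is None:
        return contextlib.nullcontext()
    return nvtx.annotate(message=name)


def train_step(batch):
    # Hypothetical stand-in for a real training step; the arithmetic only
    # gives the ranges something to wrap.
    with nvtx_range("forward"):
        loss = sum(x * x for x in batch)
    with nvtx_range("backward"):
        grads = [2 * x for x in batch]
    return loss, grads


loss, grads = train_step([1, 2, 3])
print(loss, grads)
```

Running the script with `nsys profile --trace=cuda,nvtx python train.py` should record these ranges next to the CUDA activity, so "forward" and "backward" appear as labeled spans on the timeline.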
Nsight Compute analyzes individual GPU kernels in detail. It answers questions like:
Metrics include:

| Metric | Meaning |
|---|---|
| SM occupancy | Ratio of active warps to the maximum an SM can hold |
| Tensor core utilization | Matrix unit usage |
| Warp efficiency | Thread execution efficiency |
| Memory throughput | Global memory bandwidth |
| Shared memory usage | On-chip memory efficiency |
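To make the occupancy metric concrete, here is a rough sketch of how theoretical occupancy can be estimated from a launch configuration. It assumes a limit of 64 resident warps per SM and a warp size of 32; the warp limit varies by GPU architecture, and the sketch ignores register and shared-memory limits, which often reduce occupancy further. The function name and parameters are illustrative only.

```python
def theoretical_occupancy(threads_per_block, blocks_per_sm,
                          max_warps_per_sm=64, warp_size=32):
    """Estimate occupancy as active warps divided by the per-SM warp limit.

    max_warps_per_sm=64 is an assumption; check the limit for your GPU.
    """
    # A partially filled warp still occupies a full warp slot, so round up.
    warps_per_block = -(-threads_per_block // warp_size)
    # The SM cannot hold more warps than its hardware limit.
    active_warps = min(warps_per_block * blocks_per_sm, max_warps_per_sm)
    return active_warps / max_warps_per_sm


# 256 threads per block = 8 warps; 4 resident blocks = 32 of 64 warp slots.
print(theoretical_occupancy(256, 4))
```

Nsight Compute reports both theoretical and achieved occupancy; the achieved value is measured during execution and is usually lower than this kind of estimate.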
Example command:
```bash
ncu --set full python train.py
```
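Collecting the full metric set for every kernel launch in a training script can be very slow, so in practice the run is often narrowed to one kernel and a few launches. The sketch below only assembles such a command line; it does not run `ncu`. The helper `build_ncu_command` and the report name `attn_report` are hypothetical, and the kernel name is borrowed from the illustrative output in this section (real kernel names in your application will differ).

```python
import shlex


def build_ncu_command(target, kernel_name=None, launch_count=None, output=None):
    """Assemble an `ncu` command line as a list of arguments."""
    cmd = ["ncu", "--set", "full"]
    if kernel_name is not None:
        # Profile only kernels whose name matches.
        cmd += ["--kernel-name", kernel_name]
    if launch_count is not None:
        # Stop collecting after this many profiled launches.
        cmd += ["--launch-count", str(launch_count)]
    if output is not None:
        # Write a report file that can be opened in the Nsight Compute UI.
        cmd += ["-o", output]
    return cmd + list(target)


cmd = build_ncu_command(
    ["python", "train.py"],
    kernel_name="attention_forward",
    launch_count=5,
    output="attn_report",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` on a machine where `ncu` is installed.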
Example output (simplified for illustration; the actual `ncu` report is more detailed):

```
Kernel: attention_forward
SM Occupancy: 72%
Tensor Core Utilization: 90%
Memory Bandwidth: 63%
```
In distributed training (Megatron, DeepSpeed, etc.), Nsight helps analyze: