nvidia-generative-ai-notes

Performance benchmark with NCCL

nccl-tests (https://github.com/NVIDIA/nccl-tests) provides benchmarking tools for NCCL collective operations, such as all-reduce, over TCP/IP or RDMA interconnects.

Install steps

  1. git clone https://github.com/NVIDIA/nccl-tests.git
  2. cd nccl-tests
  3. make MPI=1 (builds with MPI support, which is needed for multi-node runs)
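
If MPI or CUDA live outside the default search paths, the build can be pointed at them explicitly. The sketch below wraps the install steps in a small script; `run` and `build_nccl_tests` are hypothetical helper names, the default paths are assumptions, and `MPI_HOME` / `CUDA_HOME` are the make variables described in the nccl-tests README. Setting `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Sketch: clone and build nccl-tests with MPI support.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_nccl_tests() {
  mpi_home=${1:-/usr/local}        # assumed MPI install prefix
  cuda_home=${2:-/usr/local/cuda}  # assumed CUDA toolkit location
  run git clone https://github.com/NVIDIA/nccl-tests.git &&
  run cd nccl-tests &&
  run make MPI=1 MPI_HOME="$mpi_home" CUDA_HOME="$cuda_home"
}
```

Calling `build_nccl_tests /usr/local /usr/local/cuda` should leave the benchmark binaries (for example `all_reduce_perf`) under `./build`.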

Example run

mpirun --prefix /usr/local \
  --launch-agent prted \
  -np 2 -host 10.0.0.131,10.0.0.147 -N 1 \
  ./build/all_reduce_perf -b 8 -e 1G -f 2 -g 1

Command breakdown:
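
One way to read the example run is to rebuild it one option at a time, with a comment per flag. This is a sketch, not an authoritative reference: `example_cmd` and `sweep_sizes` are hypothetical helper names, the flag descriptions follow the Open MPI `mpirun` and nccl-tests documentation as best understood here, and `prted` is assumed to be the PRRTE daemon shipped with Open MPI 5.x.

```shell
#!/bin/sh
# Rebuilds the example command piece by piece so each option can be annotated.
example_cmd() {
  set -- mpirun
  set -- "$@" --prefix /usr/local          # Open MPI install prefix to use on the remote hosts
  set -- "$@" --launch-agent prted         # daemon that mpirun starts on each remote host
  set -- "$@" -np 2                        # total number of MPI processes (ranks)
  set -- "$@" -host 10.0.0.131,10.0.0.147  # hosts to launch on
  set -- "$@" -N 1                         # processes per node
  set -- "$@" ./build/all_reduce_perf      # all-reduce benchmark binary
  set -- "$@" -b 8                         # minimum message size: 8 bytes
  set -- "$@" -e 1G                        # maximum message size: 1 GiB
  set -- "$@" -f 2                         # multiply the size by 2 between steps
  set -- "$@" -g 1                         # GPUs per thread (one thread per process by default)
  echo "$*"
}

# Lists the message sizes a sweep visits: start at $1, stop after $2, multiply by $3.
sweep_sizes() {
  size=$1
  while [ "$size" -le "$2" ]; do
    echo "$size"
    size=$((size * $3))
  done
}

example_cmd
```

`sweep_sizes 8 1073741824 2` lists the 28 message sizes (8 bytes up to 1 GiB, doubling each step) that `-b 8 -e 1G -f 2` covers, so the benchmark output should contain one result row per size.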

Common pitfalls