nvidia-generative-ai-notes

Sampling

At each step the model produces a probability distribution over the next token. Sampling means: randomly pick the next token from a filtered version of that distribution, instead of always picking the highest-probability token.
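A minimal sketch of the difference between greedy decoding and sampling, using a made-up 4-token distribution (the array values are illustrative, not from any real model):

```python
import numpy as np

# Hypothetical next-token distribution over a tiny 4-token vocabulary.
probs = np.array([0.5, 0.3, 0.15, 0.05])

# Greedy decoding: always take the highest-probability token.
greedy_token = int(np.argmax(probs))  # always token 0 here

# Sampling: draw the next token at random according to the distribution.
rng = np.random.default_rng(0)
sampled_token = int(rng.choice(len(probs), p=probs))
```

Greedy decoding is deterministic; sampling can return any token with nonzero probability, which is what the filters below then restrict.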

Temperature (temperature)

What it does: reshapes the probability distribution before sampling.

Given logits z, temperature T scales them:

z′ = z / T

Then softmax is applied to get probabilities. T < 1 sharpens the distribution (closer to greedy), T > 1 flattens it (more random), and T = 1 leaves it unchanged.
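A minimal NumPy sketch of this (the function name is hypothetical): divide the logits by T, then softmax. T < 1 sharpens the distribution, T > 1 flattens it toward uniform.

```python
import numpy as np

def apply_temperature(logits, T):
    """Scale logits by temperature (z' = z / T), then softmax."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.0])
base  = apply_temperature(logits, 1.0)   # unchanged distribution
sharp = apply_temperature(logits, 0.5)   # top token gets more mass
flat  = apply_temperature(logits, 2.0)   # mass spreads out
```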

Top-k (top_k)

What it does: keeps only the k most probable tokens at each step; sets all others to probability 0; then samples from the remaining k.

Example: if top_k=50, you sample from the 50 most likely next tokens.
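A minimal sketch of top-k filtering (helper name hypothetical): zero out everything but the k largest probabilities, then renormalize so the survivors sum to 1.

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero the rest, renormalize."""
    probs = np.asarray(probs, dtype=float)
    keep = np.argsort(probs)[-k:]        # indices of the k largest
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = np.array([0.4, 0.3, 0.2, 0.08, 0.02])
p2 = top_k_filter(probs, 2)   # only the top-2 tokens survive
```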

Top-p / nucleus sampling (top_p)

What it does: keeps the smallest set of tokens whose cumulative probability is at least p, then samples from that set.

So the number of allowed tokens adapts to the model's confidence: a peaked distribution keeps only a few tokens, a flat one keeps many.
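A minimal sketch of nucleus filtering (helper name hypothetical): sort descending, take tokens until the cumulative probability first reaches p, zero the rest, renormalize. The two calls below show how the kept set shrinks for a peaked distribution and grows for a flat one.

```python
import numpy as np

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]              # indices, most probable first
    csum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(csum, p)) + 1   # first position where csum >= p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

peaked = top_p_filter([0.9, 0.05, 0.03, 0.02], 0.9)   # model is confident
flat   = top_p_filter([0.25, 0.25, 0.25, 0.25], 0.9)  # model is uncertain
```

With p=0.9 the peaked case keeps a single token, while the flat case keeps all four.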

Typical values (commonly cited starting points, not hard rules): temperature around 0.7–1.0, top_k around 40–50, top_p around 0.9–0.95.