Prefill is the phase of LLM inference in which the entire prompt is run through the model. The products of prefill are:
- K and V vectors for every attention layer and every token in the prompt (the KV cache, which decode will reuse)
- The final hidden state of the model, from which the first output token is sampled
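The two products above can be sketched concretely. The following is a minimal, illustrative toy (random weights, attention only, no residuals or MLP — not any real model's implementation): it processes all prompt tokens at once, collects K and V per layer into a cache, and returns the final hidden states.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_layers, seq_len = 8, 2, 5

# Hypothetical random weights standing in for a trained model.
Wq = [rng.standard_normal((d_model, d_model)) for _ in range(n_layers)]
Wk = [rng.standard_normal((d_model, d_model)) for _ in range(n_layers)]
Wv = [rng.standard_normal((d_model, d_model)) for _ in range(n_layers)]

def prefill(x):
    """Run all prompt tokens through every layer in one pass.

    Returns the KV cache (one K and V matrix per layer) and the
    final hidden states for every prompt position.
    """
    kv_cache = []
    for layer in range(n_layers):
        q, k, v = x @ Wq[layer], x @ Wk[layer], x @ Wv[layer]
        kv_cache.append((k, v))  # cached so decode never recomputes them
        # Causal self-attention over the whole prompt as dense matmuls.
        scores = q @ k.T / np.sqrt(d_model)
        mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores[mask] = -np.inf  # each token attends only to itself and earlier tokens
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        x = weights @ v  # (residual connections and MLP omitted for brevity)
    return kv_cache, x

prompt_embeddings = rng.standard_normal((seq_len, d_model))
kv_cache, hidden = prefill(prompt_embeddings)
# kv_cache: n_layers entries, each holding K and V of shape (seq_len, d_model)
# hidden[-1]: the state from which the first output token would be sampled
```

Note that the prompt dimension appears in every matmul, which is what makes prefill a batch of dense matrix-matrix products rather than matrix-vector products.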
Because all prompt tokens are processed in parallel, prefill consists of dense matrix-matrix computation and is therefore compute-bound: each model weight loaded from memory is used once per prompt token. This contrasts with the next phase, decode, which processes one token at a time and is bound by memory bandwidth instead.
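A back-of-the-envelope arithmetic-intensity calculation makes the contrast concrete. The sizes below are illustrative assumptions (a 4096-dim model, a 2048-token prompt, fp16 weights), not measurements: a matrix multiply of an (n, d) activation against a (d, d) weight costs about 2·n·d² FLOPs while reading d² weights, so the FLOPs performed per byte of weights loaded scales with n.

```python
# Illustrative sizes: hypothetical 4096-dim model, 2048-token prompt, fp16 weights.
d_model, prompt_len, bytes_per_weight = 4096, 2048, 2

# One (n, d) @ (d, d) projection: ~2*n*d^2 FLOPs against d^2 weights read once.
weight_bytes = d_model * d_model * bytes_per_weight

prefill_flops = 2 * prompt_len * d_model * d_model   # n = prompt length
decode_flops  = 2 * 1 * d_model * d_model            # n = 1 token per step

prefill_intensity = prefill_flops / weight_bytes  # FLOPs per byte of weights
decode_intensity  = decode_flops / weight_bytes

print(prefill_intensity)  # 2048.0 FLOPs/byte: well above typical GPU ratios, compute-bound
print(decode_intensity)   # 1.0 FLOPs/byte: well below them, memory-bandwidth-bound
```

Modern GPUs sustain on the order of a few hundred FLOPs per byte of memory bandwidth, so prefill's intensity sits comfortably above that line while decode's sits far below it.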