Sequence length, also called context length or context window, is the maximum number of tokens a transformer can process at once. For generative models, this limit typically covers the input prompt and the generated output combined.

  • GPT-4o has a sequence length of 128K tokens
  • Llama-3.1 405B has a sequence length of 128K tokens
  • DeepSeek-R1 has a sequence length of 128K tokens
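A minimal sketch of what this limit means in practice: before sending text to a model, the token count must fit within the context window, or the input has to be truncated. The whitespace split below is a stand-in for a real subword tokenizer, and the tiny limit is chosen for illustration only.

```python
# Illustrative only: enforce a context window by keeping the most
# recent tokens. Real tokenizers produce subword tokens, not words,
# and real limits are much larger (e.g. 128_000 for a 128K window).

CONTEXT_LENGTH = 8  # toy limit; stands in for 128K

def truncate_to_context(text: str, max_tokens: int = CONTEXT_LENGTH) -> list[str]:
    tokens = text.split()        # stand-in for subword tokenization
    return tokens[-max_tokens:]  # drop the oldest tokens to fit the window

tokens = truncate_to_context("one two three four five six seven eight nine ten")
print(len(tokens))  # 8 — the two oldest tokens were dropped
```

Keeping the *most recent* tokens mirrors how chat applications commonly handle long conversations: older turns fall out of the window first.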