Sequence length, also called context length or context window, is the maximum number of tokens a transformer can process at once. For chat models, this budget typically covers both the prompt and the generated output combined.
- GPT-4o has a sequence length of 128K tokens
- Llama-3.1 405B has a sequence length of 128K tokens
- DeepSeek-R1 has a sequence length of 128K tokens
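To make this concrete, here is a minimal sketch of checking a prompt against a 128K budget using the `tiktoken` library, whose `o200k_base` encoding is the tokenizer used by GPT-4o. The round 128,000-token figure and the example prompt are assumptions for illustration.

```python
import tiktoken

# Assumed round figure for GPT-4o's advertised 128K context window.
CONTEXT_WINDOW = 128_000

# "o200k_base" is the tiktoken encoding used by GPT-4o.
enc = tiktoken.get_encoding("o200k_base")

prompt = "Explain why attention scales quadratically with sequence length."
n_tokens = len(enc.encode(prompt))

if n_tokens > CONTEXT_WINDOW:
    print(f"Prompt exceeds the window by {n_tokens - CONTEXT_WINDOW} tokens")
else:
    print(f"{n_tokens} tokens used, {CONTEXT_WINDOW - n_tokens} remaining")
```

Note that the remaining budget must also hold the model's response, so in practice a prompt should be kept well under the full window.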