Retrieval-augmented generation (RAG) is a technique for supplying external context to an LLM at inference time, so that the model can answer queries using both what it learned during training and data from external sources (PDF files, the Internet, etc.). It was first introduced in 2020 by Meta [1] and is a major reason why vector databases have become an essential part of modern AI.
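
To make the idea concrete, here is a minimal sketch of the retrieve-then-prompt loop in Python. It is an illustration under simplifying assumptions, not a production design: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory `DOCS` list stands in for a vector database, and the names `embed`, `cosine`, `retrieve`, `build_prompt`, and the sample documents are all hypothetical. The final call to an LLM is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. A real system would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Put the retrieved text in front of the question, ready to send to an LLM."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical external data the model was never trained on.
DOCS = [
    "The warranty on the X100 blender lasts two years.",
    "Our office is closed on public holidays.",
    "Shipping to Canada takes five business days.",
]

if __name__ == "__main__":
    print(build_prompt("How long is the X100 blender warranty?", DOCS))
```

In a real deployment, the sorting step in `retrieve` is the part a vector database would take over: it stores the document embeddings and answers nearest-neighbor queries against them.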

Footnotes

  1. https://arxiv.org/abs/2005.11401