What is Retrieval-Augmented Generation (RAG)?

Large Language Models (LLMs) like GPT-4 or Mistral-7B are extraordinary in many ways, yet they come with their own set of challenges.

For now, let's focus on one specific limitation: the timeliness of their data. Because these models are trained on data up to a particular cut-off date, they aren't well suited for questions about real-time or organization-specific information.

Imagine you're a developer architecting an LLM-enabled app for Amazon. You're aiming to support shoppers as they comb through Amazon for the latest deals on sneakers. Naturally, you want to furnish them with the most current offers available. After all, nobody wants to rely on outdated information, and the same holds for data queried from your LLM.

This is where Retrieval-Augmented Generation, commonly known as RAG, significantly improves the capabilities of LLMs.

Think of RAG as a resourceful friend in an exam setting or during a speech who, figuratively speaking of course, swiftly passes you the most relevant "cue card" out of a ton of information to help you figure out what to write or say next.

With RAG, efficiently retrieving the most relevant data for your use case ensures the generated text is both current and substantiated.

RAG, as its name indicates, operates through a three-step process (sketched in code after the list):

  • Retrieval: It sources pertinent information.

  • Augmentation: This information is then added to the model's initial input.

  • Generation: Finally, the LLM utilizes this augmented input to create an informed output.
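
To make the three steps concrete, here is a minimal sketch in Python. It uses a toy in-memory document list and a naive keyword-overlap retriever, and the `generate` function is just a placeholder for whatever LLM API you actually call; all names here are illustrative, not a specific library's interface.

```python
# Minimal RAG sketch: retrieve -> augment -> generate.
# The document store, scoring method, and generate() stub are all illustrative.

documents = [
    "Nike Air Zoom sneakers are 30% off this week.",
    "Free shipping on orders over $50 through Sunday.",
    "Adidas running shoes restocked in all sizes.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Step 1 - Retrieval: rank documents by naive keyword overlap with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def augment(query: str, context: list[str]) -> str:
    """Step 2 - Augmentation: prepend the retrieved context to the user's question."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Step 3 - Generation: stand-in for a real LLM call (e.g., an API request)."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"

query = "What are the latest deals on sneakers?"
answer = generate(augment(query, retrieve(query, documents)))
print(answer)
```

In a production system the keyword matcher would typically be replaced by embedding-based vector search over a real database, but the three stages stay exactly the same.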

To put it simply, RAG empowers LLMs to include real-time, reliable data from external databases in their generated text.

For a deeper explanation, check out this video by Marina Danilevsky, Senior Research Staff Member at IBM Research, in which she shares two challenges with LLMs that Retrieval-Augmented Generation helps resolve.

(Credits: IBM Technology)
