
RAG versus Fine-Tuning and Prompt Engineering

In the rapidly evolving landscape of Large Language Models (LLMs), cost-efficiency and operational simplicity are critical, and this is where Retrieval-Augmented Generation (RAG) shines. Compared to alternatives such as fine-tuning and prompt engineering, RAG stands out for its cost-effectiveness, simplicity, and adaptability.

Let's explore each of these options to understand where RAG excels.

1. Fine-Tuning vs RAG

For those less familiar with the concept, fine-tuning involves modifying a pre-trained language model (such as GPT-3.5 Turbo, Mistral-7b, or Llama-2) with a smaller, targeted dataset to work optimally for specific use cases.
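
To make this concrete, here is a minimal sketch of kicking off such a job with the OpenAI Python SDK (v1.x). The file name, data set, and model choice are illustrative; a real run requires a carefully prepared JSONL of training examples:

```python
# Minimal sketch: launching a fine-tuning job with the OpenAI Python SDK (v1.x).
# "support_tickets.jsonl" is a hypothetical, hand-curated training set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the curated training data (every example must be prepared and vetted)
training_file = client.files.create(
    file=open("support_tickets.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning run; this step is billed per training token
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```

Note that both the data curation above and the training run itself must be repeated whenever the underlying knowledge changes, which is the root of the drawbacks discussed next.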

While fine-tuning avoids the need to build a model from scratch, it does have its drawbacks, which RAG effectively addresses.

Data Preparation Challenges

  • Having control over the training data lets you take steps to address biases, but implementing such measures is far from straightforward. Interventions like reweighting features or balancing the data distribution demand in-depth data-analysis skills.

  • Furthermore, subject-matter expertise is essential for accurately annotating data that serves specialized or research-specific purposes.

Cost Efficiency

  • Retraining and deployment are not only time-consuming but also financially taxing.

  • For instance, using a vector embeddings API in a RAG pipeline is roughly 80 times cheaper than the commonly used fine-tuning APIs from OpenAI (a sketch of such an embeddings call follows this list).

  • Now consider having to repeat the whole retraining process each time your company launches a new product, just to keep your Gen AI model from serving your teams outdated information.
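
By contrast, absorbing new information in RAG is a single, cheap embeddings call rather than a retraining run. Here is a minimal sketch; the model name and input text are illustrative, and pricing should be checked against OpenAI's current rate card:

```python
# Minimal sketch: embedding a new document for a RAG index instead of retraining.
# Embedding calls are billed per input token, which is what makes them far
# cheaper than a fine-tuning run. Model and text below are illustrative.
from openai import OpenAI

client = OpenAI()

resp = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Product launch notes: the new model ships next quarter with ...",
)
vector = resp.data[0].embedding  # list of floats, ready to store in a vector index
print(len(vector))
```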

Data Freshness

  • At the outset, it's only logical to expect that when developing an LLM application, you'd want your large language model to consistently deliver current and pertinent output.

  • With fine-tuning, the model's accuracy can decline significantly when the underlying data changes or isn't regularly refreshed. Consequently, despite the challenges above, retraining must be repeated at frequent intervals to keep the model effective.

2. Prompt Engineering vs RAG

Prompt engineering might seem like a lighter-weight alternative, but it comes with its own set of challenges: data privacy, inefficient retrieval of information, and the technical constraint of token limits.

  • Data Privacy: For organizations handling sensitive information, the act of manually copy-pasting large chunks of data to retrieve a specific piece poses a risk of unintended data exposure.

  • Inefficient Retrieval: When dealing with vast data corpora, knowing where to find the relevant data becomes crucial. Manual prompt engineering lacks the efficiency offered by automated mechanisms, such as vector indexing in RAG, which enables quick and semantically accurate data retrieval.

  • Token Limit Constraints: Language models have built-in token limits, restricting the amount of text they can process in a single prompt. This makes it challenging to include all the necessary information in one interaction (the sketch after this list shows how to measure a document against that budget).
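
To see how quickly a pasted document exhausts a prompt, here is a minimal sketch using the tiktoken tokenizer library; the file name is hypothetical, and the context limit shown is the typical window of a GPT-3.5-class model of the time:

```python
# Minimal sketch: counting how many tokens a pasted document would consume.
# "company_handbook.txt" is a hypothetical file; the 4096-token limit is
# illustrative of GPT-3.5 Turbo's original context window.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

document = open("company_handbook.txt").read()
n_tokens = len(enc.encode(document))

context_limit = 4096
print(f"{n_tokens} tokens; fits in one prompt: {n_tokens < context_limit}")
```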

In contrast, RAG's approach of storing data in efficient vector indexes circumvents these limitations by facilitating quick and semantically relevant information retrieval, making it a more viable option for dealing with large and complex data sets.
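
As an illustration of that retrieval step, here is a minimal sketch of cosine-similarity search over a toy in-memory index. The chunk names and vectors are hand-made for illustration; a real pipeline would store API-generated embeddings in a proper vector index, using techniques like kNN and LSH covered in the bonus modules:

```python
# Minimal sketch: retrieving the most relevant chunk from a toy vector index
# via cosine similarity. Only the best-matching chunk is sent to the LLM,
# which is how RAG stays inside the prompt's token budget.
import numpy as np

index = {
    "refund policy": np.array([0.9, 0.1, 0.0]),
    "launch roadmap": np.array([0.1, 0.8, 0.3]),
    "office locations": np.array([0.0, 0.2, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.85, 0.15, 0.05])  # pretend embedding of "how do refunds work?"
best = max(index, key=lambda k: cosine(query, index[k]))
print(best)  # -> "refund policy"
```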
