LLM Architecture Diagram and Various Steps

Now that we've explored the various components that make up the architecture of Large Language Model (LLM) applications, let's dive into how Retrieval-Augmented Generation (RAG) works synergistically with those components. The aim is to show you how RAG can supercharge an LLM's capabilities by seamlessly integrating real-time or static data sources into the retrieval and generation processes.

LLM Architecture

For a nuanced understanding of how Retrieval-Augmented Generation (RAG) optimizes Large Language Models, we'll delve into the essential elements and procedural steps that comprise the LLM architecture.

  1. Data Sources: Whether your starting point is cloud storage, Git repositories, or databases like PostgreSQL, the first task is to bring these varied data forms together through pre-configured connectors.

  2. Dynamic Vector Indexing: Text from these data sources is broken down into smaller segments (also called "chunks") and converted into vector representations. Models specialized for text embeddings, such as OpenAI's text-embedding-ada-002, are employed here. These vectors are continuously indexed to facilitate rapid search later on (a minimal chunking-and-embedding sketch follows this list).

  3. Query Transformation: A user’s input query is likewise transformed into a compatible vector representation, ensuring that it can be effectively matched with the indexed vectors for data retrieval.

  4. Contextual Retrieval: Algorithms like Locality-Sensitive Hashing (LSH) are applied to find the closest matches between the user query and the indexed data vectors, staying within the model's token limitations; steps 3 and 4 are illustrated together in the LSH sketch after this list.

  5. Text Generation: With the retrieved context, foundational LLMs like GPT-3.5 Turbo or Llama-2 employ techniques from the Transformer architecture, such as self-attention, to generate an appropriate response (see the generation sketch after this list).

  6. User Interface: Finally, the generated text is presented to the user via an interface like Streamlit or a ChatGPT-style chat window (a minimal Streamlit snippet closes out the sketches below).
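
To make step 2 concrete, here is a minimal sketch of chunking and embedding with the official `openai` Python package (v1+). The chunk size, the two-document toy corpus, and the in-memory `index` list are illustrative assumptions, not the exact pipeline any particular framework uses:

```python
from openai import OpenAI  # official openai package, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character-based chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(texts: list[str]) -> list[list[float]]:
    """Convert text chunks into 1536-dimensional vector representations."""
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=texts,
    )
    return [item.embedding for item in response.data]

# Hypothetical corpus standing in for your connected data sources
documents = [
    "...contents pulled from cloud storage...",
    "...rows exported from PostgreSQL...",
]
chunks = [c for doc in documents for c in chunk(doc)]
index = list(zip(chunks, embed(chunks)))  # a toy in-memory "vector index"
```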
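For steps 3 and 4, the sketch below uses random-hyperplane LSH, a classic scheme for approximate cosine-similarity search: each vector is hashed by which side of a set of random hyperplanes it falls on, so similar vectors tend to land in the same bucket. It is a toy illustration (16 hyperplanes, one hash table, no fallback for empty buckets), not what a production vector index does verbatim; `embed` and `index` come from the previous sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 1536       # dimensionality of text-embedding-ada-002 vectors
N_PLANES = 16    # more hyperplanes -> smaller, more precise buckets

planes = rng.normal(size=(N_PLANES, DIM))

def lsh_bucket(vec: np.ndarray) -> int:
    """Hash a vector by the sign pattern of its hyperplane projections."""
    bits = (planes @ vec) > 0
    return int("".join("1" if b else "0" for b in bits), 2)

# Bucket every indexed chunk once, up front
buckets: dict[int, list[tuple[str, np.ndarray]]] = {}
for text, vec in index:
    v = np.asarray(vec)
    buckets.setdefault(lsh_bucket(v), []).append((text, v))

def retrieve(query: str, k: int = 3) -> list[str]:
    """Embed the query (step 3), then rank its bucket by cosine similarity (step 4)."""
    q = np.asarray(embed([query])[0])
    candidates = buckets.get(lsh_bucket(q), [])
    scored = sorted(
        candidates,
        key=lambda tv: float(q @ tv[1]) / (np.linalg.norm(q) * np.linalg.norm(tv[1])),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]
```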
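Step 5 then grounds the model's answer in whatever was retrieved. A minimal sketch, again assuming the `openai` v1+ client from above; the system-prompt wording is just one reasonable choice:

```python
def answer(query: str) -> str:
    """Generate a response grounded in the retrieved context (step 5)."""
    context = "\n\n".join(retrieve(query))
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context.\n\n" + context},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```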
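Finally, step 6 can be as small as a single-file Streamlit app. This sketch assumes the `answer` function above is defined in (or importable into) the same file:

```python
# app.py -- launch with: streamlit run app.py
import streamlit as st

st.title("RAG over your data")
query = st.text_input("Ask a question")
if query:
    st.write(answer(query))  # `answer` from the text-generation sketch
```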

LLM Architecture Diagram showing how RAG works with a real-time or static data source