
Building the app without Dockerization



Next, you can explore a video tutorial by Richard Pelgrim, a Developer Advocate in the stream data processing space, demonstrating how he harnessed the Dropbox document sync application to create a RAG app. Let's analyze a few elements from this video together and get a good grasp of it. But remember, using Docker, which we'll cover in the next submodule, is a great way to make things smoother and avoid tricky errors. We're excited to walk you through Docker soon (hopefully in an easy-to-understand way), and it's going to be really helpful!

Link to the Project

  • The repository being referred to can be found here: https://github.com/pathway-labs/dropbox-ai-chat. Make sure to star it ⭐.

  • If you struggle to build the application with the help of the README on the GitHub repo above, the video and the walkthrough below should help you with it.

Navigating the maze of new regulations, like the EU AI Act, can be a complex challenge for founders and data practitioners. This app, which leverages the Dropbox example, aims to make understanding these regulations more straightforward. Imagine a tool that helps you dissect and comprehend these intricate policies, easing the process of staying compliant and informed.

As you explore this application, think of the diverse scenarios you can unlock with just the Dropbox AI Chat example we're seeing here.

Step 1: Cloning the Repository

git clone https://github.com/pathway-labs/dropbox-ai-chat 
cd dropbox-ai-chat

Step 2: Setting up Environment Variables

Create a .env file in the root directory and populate it with your configurations. Make sure to replace {OPENAI_API_KEY} with your actual OpenAI API key.

OPENAI_API_TOKEN={OPENAI_API_KEY}
HOST=0.0.0.0
PORT=8080
EMBEDDER_LOCATOR=text-embedding-ada-002
EMBEDDING_DIMENSION=1536
MODEL_LOCATOR=gpt-3.5-turbo
MAX_TOKENS=200
TEMPERATURE=0.0
DROPBOX_LOCAL_FOLDER_PATH="../../../mnt/c/Users/bumur/Dropbox/documents"

Make sure to replace DROPBOX_LOCAL_FOLDER_PATH with your local Dropbox folder path and, optionally, customize the other values.
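
For reference, here's a minimal sketch (not taken from the repo, which may load these values differently) of how a Python app can read and validate these variables, assuming the python-dotenv package is installed:

# Minimal sketch, not from the repo: read the .env values in Python
# using python-dotenv (pip install python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # loads variables from .env into the process environment

api_key = os.environ["OPENAI_API_TOKEN"]  # raises KeyError if missing
dropbox_path = os.environ["DROPBOX_LOCAL_FOLDER_PATH"]
embedding_dimension = int(os.getenv("EMBEDDING_DIMENSION", "1536"))

# Fail early if the Dropbox folder path is wrong
assert os.path.isdir(dropbox_path), "DROPBOX_LOCAL_FOLDER_PATH is not a valid folder"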

Step 3 - Optional: Creating a Virtual Environment

This step is optional; in the video, Richard uses Conda instead, so a venv isn't necessary there. To create an isolated environment with venv, execute:

# Setting Up a Virtual Environment

# On macOS and Linux:
# Create and activate a virtual environment
python -m venv pw-env && source pw-env/bin/activate

# On Windows:
# Step 1: Create a new virtual environment in a folder named 'pw-env'
python -m venv pw-env

# Step 2: Activate the virtual environment
pw-env\Scripts\activate

Step 4 - Installing the Dependencies

pip install --upgrade -r requirements.txt

Step 5 - Running the App

Navigate to the root directory and execute main.py.

python main.py
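
Once the pipeline is running, it listens on the HOST and PORT from your .env. As an optional sanity check, you can query it over HTTP before launching the UI. Note that the request shape below is an assumption based on similar Pathway examples; check the REST connector in the repo for the exact schema.

# Hypothetical sanity check, not from the repo: the JSON body is an
# assumed schema; verify it against the app's REST connector.
import requests

response = requests.post(
    "http://localhost:8080/",  # HOST and PORT from your .env
    json={"user": "user", "query": "What does the EU AI Act say about transparency?"},
)
print(response.text)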

Step 6 - Launching UI with Streamlit

Run the Streamlit app using the following command:

streamlit run ui.py

Access the UI at http://localhost:8501/ in your browser.

With this, your app should be up and running. 😄

Connecting the Dots

If you look closely at the repo and visit api.py, you'll be able to connect the dots from what we've learned so far. Here:

  • The user's prompt is converted into embeddings and used as embedded_query.

  • The data we're getting from our data source (i.e., Dropbox) is split into smaller chunks with the help of Pathway (pw), converted to embeddings, and stored in index.

  • Using these, we create the augmented prompt with the help of the retrieved information and feed it into GPT-3.5 Turbo.

# Real-time data coming from external unstructured data sources like a PDF file
input_data = pw.io.fs.read(
    dropbox_folder_path,
    mode="streaming",
    format="binary",
    autocommit_duration_ms=50,
)

# Chunk input data into smaller documents
documents = input_data.select(texts=extract_texts(pw.this.data))
documents = documents.select(chunks=chunk_texts(pw.this.texts))
documents = documents.flatten(pw.this.chunks).rename_columns(chunk=pw.this.chunks)

# Compute embeddings for each document using the OpenAI Embeddings API
embedded_data = embeddings(context=documents, data_to_embed=pw.this.chunk)

# Construct an index on the generated embeddings in real-time
index = index_embeddings(embedded_data)

# Generate embeddings for the query from the OpenAI Embeddings API
embedded_query = embeddings(context=query, data_to_embed=pw.this.query)

# Build prompt using indexed data
responses = prompt(index, embedded_query, pw.this.query)

# Feed the prompt to ChatGPT and obtain the generated answer.
response_writer(responses)

# Run the pipeline
pw.run()
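
To make the prompt-building step concrete: conceptually, prompt() takes the indexed chunks closest to embedded_query and splices them into a template around the user's question. The function below is only an illustrative sketch of that idea, not the repo's actual template (which lives in api.py):

# Illustrative sketch of an augmented prompt; the real template in the
# repo may differ. Retrieved chunks are prepended to the user's question.
def build_augmented_prompt(retrieved_chunks: list[str], user_query: str) -> str:
    context = "\n\n".join(retrieved_chunks)
    return (
        "Given the following documents, answer the question.\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {user_query}"
    )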

By following these steps, you should be able to get the Dropbox AI Chat tool up and running.

However, if you're facing issues downloading the dependencies or running the application on your machine, it might be worthwhile to check the next module, which gives a comprehensive guide to implementing this through Docker.

