Building the app without Dockerization

Next, you can explore a video tutorial by Richard Pelgrim, a Developer Advocate in the stream data processing space, demonstrating how he harnessed the Dropbox document sync application to create a RAG app.

Link to the Project

  • If you struggle to build the application with the help of the README in the GitHub repo above, the video and the description below should help you with it.

Navigating the maze of new regulations, like the EU AI Act, can be a complex challenge for founders and data practitioners. This app, which leverages the Dropbox example, aims to make understanding these regulations more straightforward. Imagine a tool that helps you dissect and comprehend these intricate policies, easing the process of staying compliant and informed.

As you explore this application, think of the diverse scenarios you can unlock with just the Dropbox AI Chat example we're seeing here.

Let's analyze a few elements from this video together and get a good grasp of them. But remember, using Docker, which we'll cover in the next submodule, is a great way to make things smoother and avoid tricky errors. We're excited to walk you through Docker soon (hopefully in an easy-to-understand way), and it's going to be really helpful!

Step 1: Cloning the Repository

git clone https://github.com/pathway-labs/dropbox-ai-chat 
cd dropbox-ai-chat

Step 2: Setting up Environment Variables

Create a .env file in the root directory and populate it with your configurations. Make sure to replace {OPENAI_API_KEY} with your actual OpenAI API key.

OPENAI_API_TOKEN={OPENAI_API_KEY}
HOST=0.0.0.0
PORT=8080
EMBEDDER_LOCATOR=text-embedding-ada-002
EMBEDDING_DIMENSION=1536
MODEL_LOCATOR=gpt-3.5-turbo
MAX_TOKENS=200
TEMPERATURE=0.0
DROPBOX_LOCAL_FOLDER_PATH="../../../mnt/c/Users/bumur/Dropbox/documents"

Make sure to replace DROPBOX_LOCAL_FOLDER_PATH with your local Dropbox folder path and, optionally, customize the other values.
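
To double-check that these variables are actually being picked up, here's a minimal sketch using the python-dotenv package (an assumption on our side; check api.py to see how the repo itself loads configuration):

# check_env.py - quick sanity check that the .env file is readable
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory

for key in ("OPENAI_API_TOKEN", "HOST", "PORT", "EMBEDDER_LOCATOR", "DROPBOX_LOCAL_FOLDER_PATH"):
    value = os.environ.get(key)
    # avoid echoing the API token itself
    shown = "<set>" if key == "OPENAI_API_TOKEN" and value else value
    print(f"{key} = {shown}")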

Step 3 - Optional: Creating a Virtual Environment

In the video, Richard uses Conda, so this step isn't strictly necessary there. To create an isolated environment with venv instead, execute:

# Setting Up a Virtual Environment

# On macOS and Linux:
# Create and activate a virtual environment
python -m venv pw-env && source pw-env/bin/activate

# On Windows:
# Step 1: Create a new virtual environment in a folder named 'pw-env'
python -m venv pw-env

# Step 2: Activate the virtual environment
pw-env\Scripts\activate

Step 4 - Installing the Dependencies

pip install --upgrade -r requirements.txt
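
If the installation completes without errors, an optional import check confirms the key packages are available in your environment:

python -c "import pathway, streamlit; print('imports OK')"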

Step 5 - Running the App

Navigate to the root directory and execute main.py.

python main.py
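
Once main.py is running, the pipeline listens on the HOST and PORT from your .env (8080 above). In similar Pathway demo apps, the REST connector accepts a POST request with a JSON body containing the query; the payload fields below (query and user) are an assumption based on those examples, so confirm the actual schema in api.py:

# query_api.py - a hedged sketch of querying the running pipeline directly
# requires the requests package (pip install requests)
import requests

payload = {"query": "What does the EU AI Act say about high-risk systems?", "user": "test"}
response = requests.post("http://localhost:8080/", json=payload)
print(response.text)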

Step 6 - Launching UI with Streamlit

Run the Streamlit app using the following command:

streamlit run ui.py

Access the UI at http://localhost:8501/ in your browser.

Connecting the Dots

If you look closely at the repo and visit api.py, you'll be able to connect the dots from what we've learned so far. Here:

  • The prompt is processed as embeddings and used as embedded_query.

  • The data we're getting from our data source (i.e., Dropbox) is converted into smaller chunks with the help of Pathway (pw), then converted to embeddings and stored in index.

  • Using these, we create the augmented prompt with the help of the retrieved information and feed it into GPT-3.5 Turbo.

Here's the core pipeline, excerpted from api.py:

# Real-time data coming from external unstructured data sources like a PDF file
input_data = pw.io.fs.read(
    dropbox_folder_path,
    mode="streaming",
    format="binary",
    autocommit_duration_ms=50,
)

# Chunk input data into smaller documents
documents = input_data.select(texts=extract_texts(pw.this.data))
documents = documents.select(chunks=chunk_texts(pw.this.texts))
documents = documents.flatten(pw.this.chunks).rename_columns(chunk=pw.this.chunks)

# Compute embeddings for each document using the OpenAI Embeddings API
embedded_data = embeddings(context=documents, data_to_embed=pw.this.chunk)

# Construct an index on the generated embeddings in real-time
index = index_embeddings(embedded_data)

# Generate embeddings for the query from the OpenAI Embeddings API
embedded_query = embeddings(context=query, data_to_embed=pw.this.query)

# Build prompt using indexed data
responses = prompt(index, embedded_query, pw.this.query)

# Feed the prompt to ChatGPT and obtain the generated answer.
response_writer(responses)

# Run the pipeline
pw.run()
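
To make the retrieval step concrete, here is a tiny, framework-free sketch of what indexing and querying embeddings boils down to: embed the chunks, embed the query, and rank chunks by cosine similarity. The fake_embed helper is a stand-in for the OpenAI Embeddings API, purely for illustration; it is not part of the repo:

import numpy as np

def fake_embed(text: str, dim: int = 8) -> np.ndarray:
    # stand-in for a real embedding model: a pseudo-random unit vector per text
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)

chunks = [
    "The EU AI Act classifies AI systems by risk level.",
    "High-risk systems face strict documentation requirements.",
    "Streamlit is a Python framework for data apps.",
]

# "index": one embedding per chunk (what index_embeddings builds, conceptually)
index = np.stack([fake_embed(c) for c in chunks])

# embed the query and rank chunks by cosine similarity
query_vec = fake_embed("What are the risk categories in the EU AI Act?")
scores = index @ query_vec  # unit vectors, so dot product = cosine similarity
top = scores.argsort()[::-1][:2]

# these top chunks are what gets spliced into the augmented prompt for GPT-3.5 Turbo
for i in top:
    print(f"{scores[i]:.3f}  {chunks[i]}")

In the real app, Pathway maintains this index incrementally as files in your Dropbox folder change, which is what makes the pipeline real-time.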

By following these steps, you should be able to get the Dropbox AI Chat tool up and running.

However, if you're facing issues downloading the dependencies or running the application on your machine, it might be worthwhile to check the next submodule, which gives a comprehensive guide to implementing this through Docker.
