Create Your Personalized AI Assistant

The Personalization Revolution in AI

Imagine having your own AI assistant, uniquely yours, capable of understanding your specific needs and responding intelligently. Thanks to OpenAI’s ChatGPT and tools like LangChain, this is not a far-off dream but an achievable goal. In this blog, we’ll explore a Python project inspired by Frank Nillard | AI Studio on YouTube to create a personalized AI search assistant.

This blog is a learn-along tutorial, so feel free to follow along as we dissect the code step-by-step, make improvements for broader usability, and dive into fascinating topics like embeddings and retrieval. You can find the full code in the GitHub repository.

Disclaimer

This code is not my creation but an adaptation of work by Frank Nillard | AI Studio. This blog is intended for educational purposes, helping you understand the process while building a similar project.

Getting Started: Prerequisites

Before diving into the code, ensure you have the following:

  • Python 3.8 or later
  • An OpenAI API Key
  • The langchain, openai, and chromadb Python libraries (Tkinter comes bundled with Python)
  • A text dataset for testing
  • Set up VS Code with GitHub (you can follow this post if you are doing it for the first time)

Directory Structure

To ensure compatibility on different systems, we’ll adjust the file paths dynamically later in the blog. For now, here’s how the directory should look:

my_project/
│
├── .env                # Contains your OPENAI_API_KEY
├── main.py             # Your main Python script
├── data/
│   └── data.txt        # Your data file
├── requirements.txt    # Your project dependencies
├── .gitignore          # Ensure '.env' is added to this file
└── README.md           # Documentation for your project
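
If you are starting from scratch, here is one way to scaffold this layout from the terminal (a sketch for macOS/Linux; adjust the commands for Windows):

mkdir -p my_project/data
cd my_project
touch .env main.py data/data.txt requirements.txt README.md
echo ".env" > .gitignore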

Step-by-Step Code Breakdown

Install Required Dependencies

To start, you’ll need to install several libraries required to run the application. First, make sure you have Python installed on your system, and then use the following requirements.txt to install the necessary packages:

python-dotenv
langchain
openai
chromadb
tiktoken

Run the following command to install the dependencies:

pip install -r requirements.txt
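
Optionally, to keep the project’s dependencies isolated from your system Python, you can create and activate a virtual environment first (a common setup, not required by the project):

python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -r requirements.txt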

These libraries are essential for the project:

  • Langchain: A framework to build AI-powered applications using large language models (LLMs).
  • OpenAI: This package allows interaction with OpenAI’s GPT models.
  • python-dotenv: Loads sensitive information (like your OpenAI API key) from environment variables stored in the .env file.
  • tiktoken: A tokenizer library required by Langchain and OpenAI Embeddings.
  • chromadb: The vector database backing Langchain’s Chroma store, which holds and searches the document embeddings.
  • Tkinter: Used to create the graphical user interface (GUI). It ships with Python’s standard library, so it does not appear in requirements.txt.

Set Up Your OpenAI API Key

For this project, we’ll be using OpenAI’s GPT-3.5 model. You will need an API key to interact with OpenAI’s services. Here’s how you can get your API key and keep it secure:

  1. Go to OpenAI’s API page and create an account or sign in.
  2. Navigate to the API keys section and create a new key.
  3. Copy your API key.

Now, to keep your API key secure, you’ll store it in a .env file in your project directory. This is how your .env file should look:

OPENAI_API_KEY=your_actual_api_key_here

Make sure to replace your_actual_api_key_here with your key from OpenAI.
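
As a quick sanity check (a minimal sketch, assuming the .env file sits in the directory you run from), you can confirm the key loads before wiring up the full app:

import os
from dotenv import load_dotenv

load_dotenv()
key = os.getenv("OPENAI_API_KEY")
# Print only a masked prefix; never log the full key
print(f"Key loaded: {key[:7]}..." if key else "OPENAI_API_KEY not found - check your .env file")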

Understanding the Code

Here’s the main part of the code, where we integrate Langchain with OpenAI’s GPT to process your documents and queries. We’ll break it down step-by-step:

import os
import tkinter as tk
from tkinter import ttk
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from dotenv import load_dotenv  # Import dotenv

# Load environment variables from the .env file
load_dotenv()

# Get the API key from the environment
APIKEY = os.getenv("OPENAI_API_KEY")

  1. Loading Environment Variables

    We use the python-dotenv package to load environment variables from the .env file. This ensures that your API key remains private and not hard-coded into the application.
  2. Tkinter for GUI

    The following section sets up a simple GUI using Tkinter:
class AppleStyleApp:
    def __init__(self, root):
        self.root = root
        self.root.title("SearchDoc")
        self.root.geometry("400x300")

        title_label = ttk.Label(self.root, text="Welcome to SearchDoc!", font=("Helvetica", 18, "bold"))
        title_label.pack(pady=20)

        self.search_entry = ttk.Entry(self.root, font=("Helvetica", 14))
        self.search_entry.pack(padx=20, fill=tk.X)

        search_button = ttk.Button(self.root, text="Search Some Docs", command=self.perform_search)
        search_button.pack(pady=10)

        self.search_results = tk.Text(self.root, font=("Helvetica", 14), bg="black", fg="white", padx=10, pady=10)
        self.search_results.pack(padx=20, fill=tk.BOTH, expand=True)

This creates a simple app with a search field, button, and text area to display the results. When the user enters a search query, the app loads documents, processes them using OpenAI’s GPT, and displays the results.

  3. Performing the Search

    The perform_search method handles loading your documents, processing them, and retrieving relevant results using OpenAI’s GPT. Here’s a breakdown of how this method works:
def perform_search(self):
    try:
        query = self.search_entry.get()
        loader = TextLoader(os.path.join(os.getcwd(), "data", "data.txt"))
        documents = loader.load()

        text_splitter = CharacterTextSplitter(chunk_size=10000, chunk_overlap=0)
        texts = text_splitter.split_documents(documents)

        embeddings = OpenAIEmbeddings(openai_api_key=APIKEY)
        docsearch = Chroma.from_documents(texts, embeddings)

        qa = RetrievalQA.from_chain_type(
            llm=OpenAI(model="gpt-3.5-turbo-instruct", openai_api_key=APIKEY),
            chain_type="stuff",
            retriever=docsearch.as_retriever(search_kwargs={"k": 2})  # fetch the top 2 chunks
        )

        # RetrievalQA expects a single query string; the retriever above
        # controls how many chunks are pulled into the prompt
        qa_results = qa.run(query)
        self.search_results.delete(1.0, tk.END)

        if isinstance(qa_results, list):
            formatted_results = "\n\n".join(result.strip() for result in qa_results)
        else:
            formatted_results = str(qa_results).strip()

        self.search_results.insert(tk.END, formatted_results + "\n")
    except Exception as e:
        self.search_results.delete(1.0, tk.END)
        self.search_results.insert(tk.END, f"Error: {e}\n")

  • Document Loading & Text Splitting: The TextLoader reads documents from a text file (data.txt). The documents are then split into chunks so the GPT model can handle them efficiently.
  • Embedding with OpenAI: We use OpenAI embeddings to convert the text chunks into vectors for semantic search; see the standalone sketch after this list.
  • Search and Retrieval: The RetrievalQA chain takes the search query, retrieves the most relevant chunks, and uses the model to generate a response grounded in the documents.
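
To build intuition for what the retriever hands the LLM, here is a small standalone sketch (assuming the same data/data.txt file and .env setup as the app) that prints the chunks most similar to a query, without calling the chat model at all:

import os
from dotenv import load_dotenv
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

load_dotenv()

# Load and chunk the same data file used by the app
documents = TextLoader(os.path.join("data", "data.txt")).load()
texts = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(documents)

# Embed the chunks and index them in Chroma
docsearch = Chroma.from_documents(texts, OpenAIEmbeddings(openai_api_key=os.getenv("OPENAI_API_KEY")))

# Show the two chunks closest to the query in embedding space
for doc in docsearch.similarity_search("your question here", k=2):
    print(doc.page_content[:200], "\n---")
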
  4. Running the Application
if __name__ == "__main__":
    root = tk.Tk()
    app = AppleStyleApp(root)
    root.mainloop()

The script initializes the GUI, allowing users to interact with the search assistant.
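
From the project root, you can launch the app with:

python main.py

A window titled “SearchDoc” should appear, ready to accept your query.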

Final Output

If you run the app without available API credits, you may see an error message like “Exceeded your current quota.” That message actually confirms the code is reaching OpenAI’s API as expected; you will need a paid OpenAI plan (or remaining credits) to receive responses from the GPT model.

Final Note

The application we’ve built here demonstrates how to interact with OpenAI’s GPT-3.5 model using the Langchain library. By following along, you’ve learned how to integrate embeddings, search documents, and build a personalized AI agent.
