Retrieval Augmented Generation

Building a Q&A System for YouTube Videos with Python and GPT-3.5 (Jupyter Notebook)

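This blog post walks you through building a Jupyter Notebook that uses OpenAI's GPT-3.5 model to answer questions about a YouTube video. The notebook transcribes the video, stores the transcript in a vector store, and retrieves the relevant passages as context for the model.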

Why a Jupyter Notebook?

Jupyter Notebooks provide an interactive environment for Python coding, making it easier to experiment, visualize data, and explain your code.


Prerequisites

  • Basic Python knowledge

  • An OpenAI API key (from your OpenAI account)

Tools and Libraries

  • Python 3.x

  • openai library: pip install openai

  • pytube library (optional): pip install pytube

  • whisper library (optional): pip install openai-whisper

  • LangChain packages and helpers imported in the notebook: pip install langchain langchain-openai langchain-community langchain-pinecone docarray python-dotenv

The Notebook

  1. Set up environment variables:

    Create a .env file to store your API keys (OPENAI_API_KEY and, for the Pinecone step, PINECONE_API_KEY).
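
As a rough illustration of the file format, a .env file is a list of KEY=VALUE lines. The sketch below parses such text with the standard library only, so the format is easy to see. It is not how python-dotenv works internally, the helper names (parse_env_text, missing_keys) are hypothetical, and the key value is a placeholder. In the notebook itself, load_dotenv() does this job.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Placeholder contents of a .env file; substitute your own keys.
example_env = """
# API keys used by the notebook
OPENAI_API_KEY=sk-your-key-here
PINECONE_API_KEY=
"""

env = parse_env_text(example_env)
print(missing_keys(env, ["OPENAI_API_KEY", "PINECONE_API_KEY"]))
```

Checking for missing keys early gives a clearer error than a failed API call later in the notebook.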

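  1. Import libraries and load the API key:
import os
from dotenv import load_dotenv

load_dotenv()  # read the variables defined in .env into the environment
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")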



  1. Set up the model
from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-3.5-turbo")
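  1. Generate the transcription

    We use OpenAI's Whisper model to transcribe the video's audio track.

import tempfile
import whisper
from pytube import YouTube

# YOUTUBE_VIDEO must hold the URL of the video; set it before running this cell.
# Transcribe only if the transcript file does not already exist.
if not os.path.exists("video_transcript.txt"):
    video = YouTube(YOUTUBE_VIDEO)
    # Pick the first audio-only stream
    audio = video.streams.filter(only_audio=True).first()

    # "base" is one of the smaller, faster Whisper models
    whisper_model = whisper.load_model("base")

    with tempfile.TemporaryDirectory() as tmpdir:
        # Download the audio into the temporary directory, then transcribe it
        audio_path = audio.download(output_path=tmpdir)
        transcription = whisper_model.transcribe(audio_path, fp16=False)["text"].strip()

        with open("video_transcript.txt", "w") as file:
            file.write(transcription)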
  1. Test the generated transcription
with open("video_transcript.txt") as file:
    transcription = file.read()

# Show the first 100 characters as a quick sanity check
print(transcription[:100])

  1. Load Transcription
from langchain_community.document_loaders import TextLoader

loader = TextLoader("video_transcript.txt")
text_transcription = loader.load()
  1. Split Transcription

    A transcript is usually too large to pass to the model in a single prompt, so we split it into smaller pieces. RecursiveCharacterTextSplitter splits a document into chunks of at most a fixed size, preferring natural boundaries such as paragraph and line breaks. Let's split the transcription into chunks of 1000 characters with an overlap of 20 characters.

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
text_transcription_documents = text_splitter.split_documents(text_transcription)
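
To make chunk size and overlap concrete, here is a minimal sketch of fixed-size splitting with overlap in plain Python. It is a simplification: RecursiveCharacterTextSplitter also tries to break on separators, which this hypothetical helper (split_with_overlap) does not attempt. The small sizes are chosen only so the output is easy to read.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means text cut at a chunk boundary still appears whole in at least one chunk, as long as it is no longer than the overlap.
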
  1. Set up a Vector Store

We need an efficient way to store document chunks and their embeddings, and to run similarity searches over them at scale. To do this, we'll use a vector store.
A vector store is a database of embeddings that specializes in fast similarity searches.

from langchain_openai.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(text_transcription_documents, embeddings)
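
As an illustration of what a similarity search does, the sketch below ranks stored vectors by cosine similarity to a query vector. It is a toy under stated assumptions: TinyVectorStore is a hypothetical class, not part of LangChain or DocArray, and the three-dimensional vectors are made up. Real embeddings from OpenAIEmbeddings have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs in a list."""

    def __init__(self):
        self._items = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, vector))

    def similarity_search(self, query_vector: list[float], k: int = 2) -> list[str]:
        # Score every stored vector against the query, highest first
        scored = [(cosine_similarity(query_vector, vec), text) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = TinyVectorStore()
store.add("chunk about transcription", [1.0, 0.0, 0.0])
store.add("chunk about embeddings", [0.0, 1.0, 0.0])
store.add("chunk about vector stores", [0.0, 0.9, 0.1])

results = store.similarity_search([0.0, 1.0, 0.0], k=2)
print(results)
# ['chunk about embeddings', 'chunk about vector stores']
```

This version compares the query against every stored vector, which is fine for a single transcript but slows down as the collection grows.
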
  1. Use of Pinecone

DocArrayInMemorySearch is an in-memory vector store. For real workloads, we need a vector store that can handle large amounts of data and perform similarity searches at scale. For this example, we'll use Pinecone: create an account, set up an index, get an API key, and set it as the environment variable PINECONE_API_KEY. Then we can load the transcription documents into Pinecone:

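from langchain_pinecone import PineconeVectorStore

# Placeholder: replace with the name of the index you created in Pinecone
index_name = "your-index-name"

pinecone = PineconeVectorStore.from_documents(
    text_transcription_documents, embeddings, index_name=index_name
)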
  1. Define the Chain

    The chain answers a question with the language model, using context fetched from the vector store by a retriever.

from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below.
If you can't answer the question, reply "I don't know"
Context: {context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = (
    {"context": pinecone.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)

  • The first step is a dictionary with two entries: pinecone.as_retriever() retrieves the transcript chunks most relevant to the question and supplies them as the context, while RunnablePassthrough() passes the question through unchanged.

  • The dictionary is then piped (|) to the prompt object, which fills the {context} and {question} placeholders to produce the final prompt for the language model.

  • The generated prompt is then piped to the language model (model) for processing.

  • Finally, the model's output is piped to the output parser (parser), which returns the reply as a plain string.
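
To show how piping with | composes the steps above, here is a small self-contained sketch. It is not LangChain's implementation: Step and parallel are hypothetical names, and the retriever and model are stand-in functions that return fixed text, so no API key is needed. In LangChain the dictionary is turned into a parallel step automatically; here that is done explicitly with parallel().

```python
class Step:
    """A callable wrapper that supports `a | b` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Run this step first, then feed its output to the next one
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(steps: dict) -> Step:
    """Run every step on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in steps.items()})


# Stand-ins for the real components (all hypothetical)
fake_retriever = Step(lambda question: "The video is about RAG.")
passthrough = Step(lambda value: value)
prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_model = Step(lambda text: {"content": f"ANSWER based on -> {text}"})
parser = Step(lambda message: message["content"])

chain = (
    parallel({"context": fake_retriever, "question": passthrough})
    | prompt
    | fake_model
    | parser
)

answer = chain.invoke("What is the video about?")
print(answer)
```

Each | builds a new step that runs the left side and hands its result to the right side, so the question flows through retrieval, prompt formatting, the model, and the parser in order.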

  1. Ask

    chain.invoke("Ask your question?")


This blog post demonstrates a basic framework for building a question-answering system for YouTube videos using Python and GPT-3.5. Experiment with the prompt template and explore additional functionality to enhance this system!

Did you find this article valuable?

Support Nestor Rojas by becoming a sponsor. Any amount is appreciated!