Hernando Abella
Tutorial · RAG · Python · Vector Databases

Building a RAG System with Python: Step by Step

Learn how to build an AI assistant that answers questions using your own documents, combining retrieval with generation for accurate, grounded responses.

15 min read · Intermediate
Stack: Python · OpenAI Embeddings · LangChain · Hugging Face

Large Language Models only know what they were trained on. They cannot access your company's documents, PDFs, or internal knowledge bases unless you build a RAG system.

RAG combines information retrieval with AI generation, allowing a model to search relevant documents and use that information when generating responses. In this guide, you'll learn how RAG works and build your own application with Python.


What Is RAG?

RAG stands for Retrieval-Augmented Generation. Instead of asking an AI model to answer directly, a RAG system first searches a knowledge base for relevant information.

  1. User Question: "What is the refund policy?"
  2. Question Embedding: convert the question to a vector
  3. Similarity Search: find relevant chunks
  4. Relevant Chunks: the retrieved context
  5. LLM: generate an answer
  6. Final Answer: a grounded response

Example:

Question: "What is our company's refund policy?"

Instead of guessing, the system searches the company documents, finds the refund policy, sends it to the AI, and generates an answer grounded in that text.

Without RAG
Question → LLM → Answer
• Hallucinations
• Outdated information
• No access to private data

With RAG
Question → Retriever → Documents → LLM → Answer
• More accurate responses
• Access to private knowledge
• Reduced hallucinations

Core Components of a RAG System

Documents → Chunking → Embeddings → Vector Database → Retriever → LLM

Step 1: Prepare Documents

Every RAG system begins with data. Examples include:

  • PDFs
  • Documentation
  • Knowledge bases
  • Product manuals
  • Support articles
  • Company policies

Step 2: Split Documents into Chunks

Large documents must be divided into smaller pieces for efficient retrieval.

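python · chunking.py
def chunk_text(text, chunk_size=200):
    """Split text into consecutive pieces of at most chunk_size characters."""
    chunks = []
    # Step through the text one chunk_size-wide window at a time
    for i in range(0, len(text), chunk_size):
        chunks.append(text[i:i+chunk_size])
    return chunks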

Why Chunking Matters:

A 100-page document might be split into about 500 chunks, and each query retrieves only the few chunks relevant to it. This makes retrieval faster and more precise, and keeps the context sent to the LLM short.
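To make the arithmetic concrete, here is a small sketch that runs the same fixed-size chunking on a synthetic document. The 100,000-character string is an assumed stand-in for a long document, not a measurement of a real one.

```python
import math


def chunk_text(text, chunk_size=200):
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Assumption: a synthetic 100,000-character string stands in for a long document.
document = "x" * 100_000
chunks = chunk_text(document, chunk_size=200)

# The chunk count is the document length divided by the chunk size, rounded up.
expected = math.ceil(len(document) / 200)
print(len(chunks), expected)  # prints: 500 500
```

Because the split is by character count, a chunk boundary can fall in the middle of a word or sentence, which is one reason paragraph-based chunking is often preferred.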


Step 3: Generate Embeddings

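An embedding is a list of numbers that represents the meaning of a piece of text, so that texts with similar meaning end up with similar vectors.

python · embeddings.py
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Python is a programming language."
)

# One embedding is returned per input; this model produces 1536 dimensions by default
embedding = response.data[0].embedding
print(f"Vector dimension: {len(embedding)}")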

Step 4: Store Embeddings in a Vector Database

Popular options: Chroma, FAISS, Pinecone, Weaviate, Qdrant

terminal
pip install chromadb
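python · vectorstore.py
import chromadb

# In-memory client: the data is lost when the process exits
client = chromadb.Client()
# get_or_create_collection avoids an error if the script is run twice in one session
collection = client.get_or_create_collection(name="knowledge_base")

# Add documents; Chroma embeds them with its default embedding function
collection.add(
    documents=[
        "Python is a programming language.",
        "Machine learning uses data."
    ],
    ids=["1", "2"]
)

# Search: the query text is embedded the same way and compared to the stored vectors
results = collection.query(
    query_texts=["How is Python used?"],
    n_results=2
)
# results["documents"] holds one list of matching documents per query text
print(results["documents"])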

Step 5: Retrieve Relevant Documents

When a user asks a question, the system:

  1. Creates an embedding for the query
  2. Searches the vector database
  3. Finds the most similar chunks
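The three steps above can be sketched without any external service. In this minimal sketch the three-dimensional vectors and the chunk texts are made-up toy values standing in for real embeddings and documents, and `cosine_similarity` and `retrieve` are hypothetical helper names. A vector database performs the same kind of ranking at scale, usually with an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, index, top_k=2):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy index of (chunk text, made-up embedding) pairs. Real embeddings have
# hundreds or thousands of dimensions and come from an embedding model.
index = [
    ("Refunds are available within 30 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.0, 0.2, 0.9]),
    ("Refund requests need the order number.", [0.8, 0.3, 0.1]),
]

# Step 1 would embed the question; here the query vector is also made up.
query_vector = [1.0, 0.2, 0.0]

# Steps 2 and 3: score every chunk and keep the most similar ones.
for score, text in retrieve(query_vector, index, top_k=2):
    print(f"{score:.3f}  {text}")
```

Cosine similarity is used here because it compares the direction of two vectors and ignores their length, which is a common choice for text embeddings.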

Step 6: Send Context to the LLM

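python · generate.py
from openai import OpenAI

client = OpenAI()

# Placeholder values; in the full pipeline these come from the retriever and the user
context = "Python is a programming language."
question = "What is Python?"

prompt = f"""
Context:
{context}

Question:
{question}

Answer using only the provided context.
"""

response = client.responses.create(
    model="gpt-4o",
    input=prompt
)

print(response.output_text)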
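The context in the prompt is typically filled with the chunks returned by the retriever. Here is a minimal sketch of that step, assuming the chunks arrive as a list of strings. `build_prompt` is a hypothetical helper name, the sample chunks are made up for illustration, and the extra "say so" instruction is one possible way to discourage answers from outside the context.

```python
def build_prompt(question, chunks):
    """Join retrieved chunks into a numbered context block and wrap the question."""
    context = "\n\n".join(
        f"[{number}] {chunk.strip()}" for number, chunk in enumerate(chunks, start=1)
    )
    return (
        "Context:\n"
        f"{context}\n\n"
        "Question:\n"
        f"{question}\n\n"
        "Answer using only the provided context. "
        "If the context does not contain the answer, say so."
    )


# Made-up chunks standing in for retriever output.
retrieved_chunks = [
    "Refunds are available within 30 days of purchase.",
    "Refund requests must include the order number.",
]

prompt = build_prompt("What is our company's refund policy?", retrieved_chunks)
print(prompt)
```

Numbering the chunks makes it possible to ask the model to cite which chunk supports each statement, should you want that later.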

Full RAG Pipeline

Indexing:
Documents → Chunking → Embeddings → Vector Database

Querying:
User Question → Question Embedding → Similarity Search → Relevant Chunks → LLM → Final Answer
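To see how the stages connect, here is a minimal end-to-end sketch that runs without API keys. It rests on loud assumptions: `toy_embed` is a hypothetical word-count embedding standing in for a real embedding model, a Python list stands in for the vector database, and the final LLM call is left out, so the sketch stops at the assembled prompt. The document texts are made up for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200):
    """Fixed-size character chunks, as in Step 2."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def toy_embed(text):
    """Stand-in embedding: a sparse word-count vector (NOT a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, chunk_size=200):
    """Indexing stages: chunk every document and store (chunk, vector) pairs."""
    index = []
    for doc in documents:
        for chunk in chunk_text(doc, chunk_size):
            index.append((chunk, toy_embed(chunk)))
    return index


def answer_prompt(question, index, top_k=2):
    """Query stages: embed the question, rank chunks, assemble the prompt."""
    query_vec = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    context = "\n".join(chunk for chunk, _ in ranked[:top_k])
    return (
        f"Context:\n{context}\n\n"
        f"Question:\n{question}\n\n"
        "Answer using only the provided context."
    )


# Made-up documents; each is short enough to fit in a single chunk.
documents = [
    "Refunds are issued within 30 days of purchase.",
    "The office is closed on public holidays.",
]

index = build_index(documents)
prompt = answer_prompt("When are refunds issued?", index, top_k=1)
print(prompt)  # this prompt would then be sent to the LLM
```

Swapping `toy_embed` for a real embedding model and the list for a vector database changes the quality and scale of retrieval, but not the shape of the pipeline.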

Example Project Structure

Project Structure
rag-project/
├── data/
│   └── docs/
│       ├── guide.pdf
│       └── policies.txt
├── embeddings/
│   └── build_embeddings.py
├── vectorstore/
│   └── chroma_db/
├── rag/
│   ├── retrieve.py
│   ├── generate.py
│   └── pipeline.py
├── app.py
└── requirements.txt

Improving Retrieval Quality

  • Better Chunking: Use paragraph-based or semantic chunks instead of fixed sizes.
  • Metadata Filtering: Store the source, department, and date with each chunk, and filter on them before searching.
  • Hybrid Search: Combine vector search with keyword search for better accuracy.
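As one illustration of the chunking idea, here is a minimal sketch of paragraph-based chunking. It assumes paragraphs are separated by blank lines; `chunk_by_paragraph` and its `max_chars` parameter are hypothetical names. Semantic chunking would need an embedding model on top of this and is not shown.

```python
import re


def chunk_by_paragraph(text, max_chars=500):
    """Group whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            # Still fits (or the chunk is empty), so keep growing it
            current = candidate
        else:
            # Adding this paragraph would overflow: close the chunk, start a new one
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


sample = "Refund policy.\n\nShipping policy.\n\nPrivacy policy."
print(chunk_by_paragraph(sample, max_chars=35))
```

With `max_chars=35`, the first two paragraphs fit together in one chunk and the third starts a new one, so no paragraph is split in the middle.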

Common Challenges

  • Poor Chunk Sizes: Chunks that are too large lower precision; chunks that are too small lose context.
  • Hallucinations: The model may still invent facts, so instruct it to answer only from the provided context.
  • Duplicate Results: Several retrieved chunks may carry similar information; use reranking to reduce the overlap.
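Before reaching for a full reranker, near-duplicate chunks can be filtered with a simple overlap check. This is a deliberately simpler technique than reranking: the sketch below drops any chunk whose word-set Jaccard similarity to an already kept chunk reaches a threshold. `drop_near_duplicates` and the 0.8 threshold are assumptions for illustration, not tuned values, and the sample chunks are made up.

```python
import re


def jaccard(a, b):
    """Jaccard similarity between the word sets of two strings."""
    words_a = set(re.findall(r"[a-z0-9]+", a.lower()))
    words_b = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def drop_near_duplicates(chunks, threshold=0.8):
    """Keep chunks in ranked order, skipping ones too similar to a kept chunk."""
    kept = []
    for chunk in chunks:
        if all(jaccard(chunk, other) < threshold for other in kept):
            kept.append(chunk)
    return kept


# Made-up retriever output, already sorted from most to least relevant.
ranked_chunks = [
    "Refunds are available within 30 days.",
    "Refunds are available within 30 days!",
    "Refund requests need the order number.",
]
print(drop_near_duplicates(ranked_chunks))
```

Because the input is processed in ranked order, the highest-ranked copy of each near-duplicate group is the one that survives.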


Real-World RAG Use Cases

  • Customer Support: Search product documentation before answering.
  • Enterprise KB: Access internal company documents.
  • Legal Research: Retrieve contracts and regulations.
  • Medical Systems: Search approved clinical documentation.
  • Educational Platforms: Answer questions from course materials.
  • AI Search Engines: Combine retrieval with natural language responses.

Key Takeaways

  • RAG combines document retrieval with AI generation.
  • Documents are split into chunks and converted into embeddings.
  • Embeddings are stored in a vector database for fast similarity search.
  • Retrieved chunks are sent to the LLM as context.
  • The model generates answers grounded in the retrieved information.

A well-designed RAG system is often one of the most practical and impactful AI applications you can build. It allows organizations to transform their documents into intelligent assistants that deliver accurate, context-aware answers on demand.

