RAG (Retrieval-Augmented Generation) Implementation Using Google Gemini & FAISS
RAG (Retrieval-Augmented Generation) is one of the most important concepts in modern AI applications. It combines:
Retrieval systems (searching relevant information)
Large Language Models (LLMs) (generating intelligent responses)
Instead of depending only on the LLM’s training knowledge, RAG allows the AI to search custom documents and answer questions from them.
Your project uses:
LangChain
Google Gemini API
HuggingFace Embeddings
FAISS Vector Database
to create a simple RAG chatbot inside Google Colab.
What This Project Does
The workflow is:
User Question
↓
Search Relevant Text Chunks
↓
Send Context + Question to Gemini
↓
Generate Accurate Answer
Example:
You provide Python tutorial text.
User asks:
What is Python used for?
The system:
Finds relevant chunks related to Python usage
Sends them to Gemini
Gemini answers using only retrieved context
Step-by-Step Explanation
1. Installing Required Libraries
!pip install langchain langchain-google-genai langchain-community langchain-text-splitters faiss-cpu sentence-transformers
Explanation
This command installs all required Python libraries.
Libraries Used
| Library | Purpose |
|---|---|
| langchain | Framework for building LLM apps |
| langchain-google-genai | Connects Gemini AI with LangChain |
| faiss-cpu | Vector database for similarity search |
| langchain-text-splitters | Splits large text into chunks |
| langchain-community | Community integrations |
| sentence-transformers | Embedding models |
2. Importing Required Modules
import os
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import PromptTemplate
Explanation
These imports provide all core functionalities.
Modules
| Module | Purpose |
|---|---|
| os | Access environment variables |
| ChatGoogleGenerativeAI | Gemini LLM integration |
| HuggingFaceEmbeddings | Converts text into vectors |
| FAISS | Stores vectors for searching |
| RecursiveCharacterTextSplitter | Splits large text |
| PromptTemplate | Creates custom prompts |
3. Setting Gemini API Key
os.environ["GEMINI_API_KEY"] = "your_api_key"
Explanation
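This sets the Gemini API key as an environment variable.
Gemini requires authentication to access Google's AI models.
langchain-google-genai has traditionally read the key from GOOGLE_API_KEY. If your installed version does not pick up GEMINI_API_KEY, set GOOGLE_API_KEY instead.
Replace:
"your_api_key"
with your actual API key, and avoid sharing a notebook that still contains it.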
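Hardcoding the key in a notebook cell makes it easy to leak. As a minimal sketch, the key could instead be looked up from the environment with a clear error when it is missing. The helper name `load_api_key` is hypothetical (it is not part of LangChain), and the two variable names it checks are assumptions about your setup.

```python
import os


def load_api_key(env=None, names=("GOOGLE_API_KEY", "GEMINI_API_KEY")):
    """Return the first non-empty API key found under the given variable names.

    Hypothetical helper for illustration; `names` is an assumption about
    which environment variables your setup uses.
    """
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(f"Set one of {', '.join(names)} before creating the model.")


# Example with an explicit mapping instead of the real environment.
key = load_api_key({"GEMINI_API_KEY": "demo-key"})
print(key)
```

The returned value could then be passed to ChatGoogleGenerativeAI through its google_api_key argument instead of relying on a specific environment variable name.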
4. Input Text
Your project uses text data containing Python tutorial content.
This acts as the knowledge base for the RAG system.
5. Splitting Text into Chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=50,
)
Explanation
Embedding models and LLMs have input limits, and retrieval works best on short, focused passages.
So we split the text into smaller chunks.
Parameters
| Parameter | Meaning |
|---|---|
| chunk_size=200 | Each chunk contains at most 200 characters |
| chunk_overlap=50 | Up to 50 characters are shared between consecutive chunks |
Why Overlap?
Overlap preserves context continuity.
Without overlap:
sentences may be cut in two at a chunk boundary
information that spans the boundary may be missing from both chunks
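To make the two parameters concrete, here is a minimal sketch of a fixed-size sliding window in plain Python. It is a simplification: RecursiveCharacterTextSplitter prefers to cut at separators such as paragraph and line breaks, so its real chunks will usually differ. The function name `sliding_chunks` and the sample text are made up for illustration.

```python
def sliding_chunks(text, chunk_size=200, chunk_overlap=50):
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# 500 characters of dummy text, only to show the window arithmetic.
sample = "".join(str(i % 10) for i in range(500))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])
# The tail of one chunk is repeated at the head of the next.
print(chunks[0][-50:] == chunks[1][:50])
```

Each window advances by chunk_size minus chunk_overlap characters, which is why a larger overlap produces more chunks for the same text.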
6. Creating Documents
docs = splitter.create_documents([text])
Explanation
This splits the text and wraps each chunk in a LangChain Document object.
Each document contains:
page_content (the chunk text)
metadata (a dictionary, empty here because none was supplied)
Example:
Chunk 1 → Python is a programming language...
Chunk 2 → Used in AI, automation...
7. Creating Embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
)
Explanation
Embeddings convert text into numerical vectors.
Models operate on numbers rather than raw text.
So:
"Python programming"
becomes:
[0.234, 0.872, -0.192, ...]
These vectors help measure semantic similarity.
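One common way to measure how close two embeddings are is cosine similarity. The sketch below uses made-up 3-dimensional vectors so the arithmetic stays readable. Real embeddings from the model have far more dimensions, and the numbers here are not actual model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# Invented vectors standing in for three short texts.
python_programming = [0.9, 0.8, 0.1]
coding_in_python = [0.8, 0.9, 0.2]
baking_bread = [0.1, 0.0, 0.9]

print(cosine_similarity(python_programming, coding_in_python))
print(cosine_similarity(python_programming, baking_bread))
```

The first pair scores much higher than the second, which is the property a retriever relies on when it ranks chunks against a question.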
Model Used
all-MiniLM-L6-v2
A lightweight and fast sentence transformer model that produces 384-dimensional vectors.
8. Creating FAISS Vector Store
vectorstore = FAISS.from_documents(docs, embeddings)
Explanation
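FAISS (Facebook AI Similarity Search) is a library that indexes embedding vectors for fast similarity search.
It acts like:
a search engine that matches by meaning rather than by exact keywords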
Now the system can:
search similar text
retrieve relevant chunks quickly
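Conceptually, a vector store keeps (vector, text) pairs and returns the texts whose vectors lie closest to the query vector. The class below is a teaching stand-in written in plain Python, not FAISS itself. `TinyVectorStore`, the 2-dimensional vectors, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math


class TinyVectorStore:
    """Exhaustive nearest-neighbor search over (vector, text) pairs.

    A teaching stand-in, not FAISS: it compares the query against every
    stored vector, which is only practical for small collections.
    """

    def __init__(self):
        self._items = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self._items.append((list(vector), text))

    def search(self, query, k=3):
        """Return up to k (distance, text) pairs, closest first."""
        scored = [(math.dist(vec, query), text) for vec, text in self._items]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]


store = TinyVectorStore()
store.add([1.0, 0.0], "Python is used for web development.")
store.add([0.9, 0.1], "Python is popular in data science.")
store.add([0.0, 1.0], "Bread needs flour and yeast.")

for distance, text in store.search([1.0, 0.05], k=2):
    print(round(distance, 3), text)
```

FAISS performs the same kind of lookup with optimized index structures, which is what keeps search fast as the number of chunks grows.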
9. Creating Retriever
retriver = vectorstore.as_retriever(
    search_kwargs={"k": 3},
)
Explanation
The retriever wraps the vector store and returns the chunks most similar to a query.
Parameter
"k":3
means:
return the 3 most relevant chunks
when the user asks a question.
10. Formatting Retrieved Documents
def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)
Explanation
This function combines retrieved chunks into one formatted context string.
Example:
Chunk 1
Chunk 2
Chunk 3
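The function can be tried without a vector store by passing any objects that expose a page_content attribute. In this sketch, `FakeDocument` is a hypothetical stand-in for LangChain's Document class, and the two chunk texts are invented.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing the same `page_content` attribute."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    # Join the chunk texts with a blank line between them.
    return "\n\n".join(d.page_content for d in docs)


retrieved = [
    FakeDocument("Python is a programming language."),
    FakeDocument("It is used in AI and automation."),
]
context = format_docs(retrieved)
print(context)
```

The blank line between chunks helps the model tell where one retrieved passage ends and the next begins.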
11. Creating Prompt Template
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""
Use only the context below to answer the question.
context:
{context}
question:
{question}
Answer:
""",
)
Explanation
This controls how instructions are sent to Gemini.
Important Part
Use only the context below
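This reduces hallucination by instructing Gemini to answer only from the retrieved text.
It is an instruction rather than a guarantee, so answers should still be checked against the context.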
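To see exactly what Gemini receives, it can help to print the filled template. This sketch uses Python's built-in str.format as a stand-in for PromptTemplate.format, on the assumption that the template contains only simple {name} placeholders. The helper `build_prompt` and the sample context are made up for illustration.

```python
TEMPLATE = """
Use only the context below to answer the question.
context:
{context}
question:
{question}
Answer:
"""


def build_prompt(context, question):
    # Hypothetical helper: fills the two placeholders and nothing more.
    return TEMPLATE.format(context=context, question=question)


filled = build_prompt(
    context="Python is used for web development and data analysis.",
    question="What is Python used for?",
)
print(filled)
```

Reading the filled prompt is a quick way to confirm that the retrieved chunks, and not an empty string, actually reach the model.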
12. Initializing Gemini Model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
)
Explanation
This initializes Google's Gemini Flash model.
Why Gemini Flash?
fast
lightweight
cheaper
good for RAG systems
13. Creating Main RAG Function
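def rag_answer(query):
    # Retrieve the chunks most similar to the query.
    r_docs = retriver.invoke(query)
    # Merge them into a single context string.
    context = format_docs(r_docs)
    # Fill the prompt template with the context and the question.
    full_prompt = prompt.format(context=context, question=query)
    # Ask Gemini and return only the text of its reply.
    response = llm.invoke(full_prompt)
    return response.content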
Step-by-Step Flow
Step 1
retriver.invoke(query)
Searches for the most relevant chunks.
Step 2
format_docs(r_docs)
Formats retrieved context.
Step 3
prompt.format()
Creates final prompt.
Step 4
llm.invoke()
Sends prompt to Gemini.
Step 5
Returns the generated answer text (response.content).
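The same five steps can be exercised without an API key by passing the retriever and the model in as plain functions. This is a sketch under stated assumptions, not the project's code: `rag_answer_with`, `fake_retrieve`, `fake_generate`, and the fallback reply are all hypothetical names and values introduced here.

```python
def rag_answer_with(query, retrieve, generate,
                    no_context_reply="I could not find this in the provided text."):
    """Run retrieve -> format -> prompt -> generate with injected callables.

    `retrieve` maps a query to a list of chunk strings and `generate` maps a
    prompt string to an answer string. Both are hypothetical stand-ins.
    """
    chunks = retrieve(query)
    if not chunks:
        return no_context_reply  # skip the model call when nothing was retrieved
    context = "\n\n".join(chunks)
    prompt = (
        "Use only the context below to answer the question.\n"
        f"context:\n{context}\nquestion:\n{query}\nAnswer:\n"
    )
    return generate(prompt)


# Stubs that make the flow observable without network access.
def fake_retrieve(query):
    return ["Python is used for automation."] if "python" in query.lower() else []


def fake_generate(prompt):
    return f"(model saw {len(prompt.splitlines())} prompt lines)"


print(rag_answer_with("What is Python used for?", fake_retrieve, fake_generate))
print(rag_answer_with("How do I bake bread?", fake_retrieve, fake_generate))
```

Passing the two dependencies as arguments keeps the control flow testable, and the early return shows one possible way to handle a question that the knowledge base does not cover.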
14. Asking Question
query = "What is Python used for?"
answer = rag_answer(query)
Explanation
The user asks a question.
The RAG system processes it and generates a response.
15. Printing Output
print("question :", query)
print("\n")
print("answer :", answer)
Explanation
Displays:
question
generated answer
16. Final Output
Question:
What is Python used for?
Answer:
Python is used for development, and its libraries help with a wide range of tasks, making development easier.
How RAG Works Internally
User Query
↓
Embedding Generated
↓
FAISS Similarity Search
↓
Relevant Chunks Retrieved
↓
Context Added to Prompt
↓
Gemini Generates Answer
Advantages of RAG
| Advantage | Description |
|---|---|
| Reduces hallucination | Uses actual context |
| Supports private data | Can use custom documents |
| Up-to-date knowledge | New documents can be indexed at any time |
| Faster than fine-tuning | No retraining needed |
| Scalable | Works with huge datasets |
Real-World Applications
RAG is used in:
ChatGPT-style chatbots
AI search engines
company knowledge assistants
PDF question-answering systems
customer support bots
legal document assistants
medical AI systems
Common Improvements
You can improve this project by adding:
| Upgrade | Benefit |
|---|---|
| PDF Upload | Read PDFs |
| Streamlit UI | Web interface |
| Chat History | Memory |
| Better Embeddings | More accurate search |
| Pinecone/ChromaDB | Managed or persistent vector DB |
| Multi-document support | Multiple files |
| OCR | Read images |
Final Understanding
This project demonstrates:
Generative AI
Semantic Search
Vector Databases
LLM Prompt Engineering
Embeddings
Retrieval Systems
which are core technologies behind modern AI systems like:
ChatGPT
Perplexity AI
GitHub Copilot
AI Search Engines
It makes a solid GenAI project for a portfolio and for interview discussions.