Friday, May 22, 2026

AI Retrieval System RAG_Implementation Project

 

RAG (Retrieval-Augmented Generation) Implementation Using Google Gemini & FAISS

RAG (Retrieval-Augmented Generation) is one of the most important concepts in modern AI applications. It combines:

  • Retrieval systems (searching relevant information)

  • Large Language Models (LLMs) (generating intelligent responses)

Instead of depending only on the LLM’s training knowledge, RAG allows the AI to search custom documents and answer questions from them.

This project uses:

  • LangChain

  • Google Gemini API

  • HuggingFace Embeddings

  • FAISS Vector Database

to create a simple RAG chatbot inside Google Colab.


What This Project Does

The workflow is:

User Question
↓
Search Relevant Text Chunks
↓
Send Context + Question to Gemini
↓
Generate Accurate Answer

Example:

You provide Python tutorial text.

User asks:

What is Python used for?

The system:

  1. Finds relevant chunks related to Python usage

  2. Sends them to Gemini

  3. Gemini answers using only retrieved context


Step-by-Step Explanation


1. Installing Required Libraries

!pip install langchain langchain-google-genai faiss-cpu langchain-text-splitters langchain-community sentence-transformers

Explanation

This command installs all required Python libraries.

Libraries Used

Library | Purpose
langchain | Framework for building LLM apps
langchain-google-genai | Connects Gemini AI with LangChain
faiss-cpu | Vector similarity search library (CPU build)
langchain-text-splitters | Splits large text into chunks
langchain-community | Community integrations
sentence-transformers | Embedding models
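Before moving on, it can help to confirm that the installs succeeded. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical, and the list holds the import names these packages expose (for example, faiss-cpu is imported as faiss).

```python
import importlib.util

# Import names for the packages installed above (faiss-cpu is imported as "faiss").
REQUIRED_MODULES = [
    "langchain",
    "langchain_google_genai",
    "faiss",
    "langchain_text_splitters",
    "langchain_community",
    "sentence_transformers",
]

def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    print("Missing:", missing if missing else "none")
```

If the list is not empty, re-run the install cell and restart the Colab runtime before continuing.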

2. Importing Required Modules

import os
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import PromptTemplate

Explanation

These imports provide all core functionalities.

Modules

Module | Purpose
os | Access environment variables
ChatGoogleGenerativeAI | Gemini LLM integration
HuggingFaceEmbeddings | Converts text into vectors
FAISS | Stores vectors for searching
RecursiveCharacterTextSplitter | Splits large text
PromptTemplate | Creates custom prompts

3. Setting Gemini API Key

os.environ["GEMINI_API_KEY"] = "your_api_key"

Explanation

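This stores the Gemini API key in an environment variable, so the key does not have to be passed around in code.

Gemini requires authentication to access Google's AI models. The langchain-google-genai documentation names GOOGLE_API_KEY as the variable it reads; if your installed version does not pick up GEMINI_API_KEY, set GOOGLE_API_KEY instead or pass google_api_key= to ChatGoogleGenerativeAI.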

You replace:

"your_api_key"

with your actual API key.
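A common mistake is to run the notebook with the placeholder still in place. The sketch below is one way to fail early in that case; the helper name require_api_key and the env parameter are hypothetical, introduced only for illustration.

```python
import os

PLACEHOLDER = "your_api_key"  # the dummy value shown in this step

def require_api_key(name="GEMINI_API_KEY", env=os.environ):
    """Return the key stored under `name`, or raise if it is missing or still the placeholder."""
    key = env.get(name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"{name} is not set to a real API key")
    return key
```

Calling require_api_key() right after setting the variable surfaces the problem before any request is sent to Gemini.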


4. Input Text

The project uses a block of Python tutorial text, stored in a string variable named text.

This acts as the knowledge base for the RAG system.
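The tutorial text itself is not shown in this post. The later steps only need a string named text, so a stand-in like the one below is enough to follow along. The content is a hypothetical placeholder, not the text that produced the output shown later.

```python
# Hypothetical placeholder content; replace with your own tutorial text.
text = """
Python is a high-level, general-purpose programming language.
It is known for readable syntax and a large standard library.
Python is used for web development, data analysis, automation, and AI.
Its libraries help with a wide range of tasks, which makes development easier.
""".strip()

print(len(text), "characters")
```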


5. Splitting Text into Chunks

splitter = RecursiveCharacterTextSplitter(
    chunk_size = 200,
    chunk_overlap = 50
)

Explanation

LLMs have a limited context window, and retrieval works best on small, focused passages.

So we split the text into smaller chunks.

Parameters

Parameter | Meaning
chunk_size=200 | Each chunk contains at most 200 characters
chunk_overlap=50 | Up to 50 characters are shared between consecutive chunks

Why Overlap?

Overlap preserves context continuity.

Without overlap:

  • sentences may break awkwardly

  • information may be lost
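To see what the two parameters do, here is a simplified fixed-width splitter in plain Python. It is not the algorithm RecursiveCharacterTextSplitter uses (that one prefers to cut at paragraph, line, and word boundaries); the function name split_fixed is hypothetical and exists only to illustrate size and overlap.

```python
def split_fixed(text, chunk_size=200, chunk_overlap=50):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghij" * 50  # 500 characters of filler
chunks = split_fixed(sample)
print(len(chunks), [len(c) for c in chunks])
print(chunks[0][-50:] == chunks[1][:50])  # the shared 50 characters
```

With a step of 150 characters, the tail of each chunk reappears at the head of the next one, which is what keeps a sentence cut at a boundary readable in at least one chunk.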


6. Creating Documents

docs = splitter.create_documents([text])

Explanation

This splits the text and wraps each chunk in a LangChain Document object.

Each document contains:

  • page content

  • metadata

Example:

Chunk 1 → Python is a programming language...
Chunk 2 → Used in AI, automation...

7. Creating Embeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

Explanation

Embeddings convert text into numerical vectors.

Models operate on numbers, not on raw text.

So:

"Python programming"

becomes:

[0.234, 0.872, -0.192, ...]

These vectors help measure semantic similarity.

Model Used

all-MiniLM-L6-v2

A lightweight and fast sentence-transformer model that maps each text to a 384-dimensional vector.
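Semantic similarity between two vectors is often measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings are much longer, and the variable names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors for illustration only.
python_programming = [0.9, 0.1, 0.2]
coding_in_python = [0.8, 0.2, 0.3]
baking_bread = [0.1, 0.9, 0.1]

print(cosine_similarity(python_programming, coding_in_python))  # close to 1
print(cosine_similarity(python_programming, baking_bread))      # much lower
```

Texts with related meaning end up pointing in similar directions, so their score is higher than the score for unrelated texts.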


8. Creating FAISS Vector Store

vectorstore = FAISS.from_documents(docs, embeddings)

Explanation

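FAISS is a library for fast similarity search over vectors. Here LangChain wraps it as an in-memory vector store that holds one embedding per chunk.

It acts like:

A semantic search engine over your chunks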

Now the system can:

  • search similar text

  • retrieve relevant chunks quickly


9. Creating Retriever

retriver = vectorstore.as_retriever(
    search_kwargs = {"k":3}
)

Explanation

The retriever wraps the vector store and returns the chunks most similar to a query.

Parameter

"k":3

means:

  • return top 3 most relevant chunks

when user asks a question.
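Conceptually, a top-k search compares the query vector with every stored vector and keeps the k closest. The sketch below does this as an exhaustive scan in plain Python; the vectors are made up, the function name top_k is hypothetical, and the use of Euclidean distance is an assumption (the metric a real vector store uses depends on how it is configured).

```python
import heapq
import math

def top_k(query_vec, stored_vecs, k=3):
    """Return indices of the k stored vectors closest to query_vec (Euclidean distance)."""
    def distance(i):
        return math.dist(query_vec, stored_vecs[i])
    return heapq.nsmallest(k, range(len(stored_vecs)), key=distance)

# Made-up 2-dimensional vectors standing in for chunk embeddings.
chunk_vectors = [
    [0.0, 0.0],  # chunk 0
    [1.0, 1.0],  # chunk 1
    [0.9, 1.1],  # chunk 2
    [5.0, 5.0],  # chunk 3
    [1.2, 0.8],  # chunk 4
]
query_vector = [1.0, 1.0]

print(top_k(query_vector, chunk_vectors, k=3))  # → [1, 2, 4]
```

A larger k gives Gemini more context but also more text that may be irrelevant, so k is worth tuning for your documents.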


10. Formatting Retrieved Documents

def format_docs(docs):
    return "\n\n".join(
        d.page_content for d in docs
    )

Explanation

This function combines retrieved chunks into one formatted context string.

Example:

Chunk 1

Chunk 2

Chunk 3
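The function can be tried without running a retrieval first. In this sketch, SimpleNamespace objects stand in for retrieved documents, since format_docs only reads the page_content attribute; fake_docs is a hypothetical name.

```python
from types import SimpleNamespace

def format_docs(docs):
    # Join the text of each retrieved chunk, separated by a blank line.
    return "\n\n".join(d.page_content for d in docs)

# Stand-ins for retrieved documents: only page_content is needed here.
fake_docs = [SimpleNamespace(page_content=f"Chunk {i}") for i in (1, 2, 3)]

print(format_docs(fake_docs))
```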

11. Creating Prompt Template

prompt = PromptTemplate(
    input_variables = ["context","question"],
    template = """
Use only the context below to answer the question.

context:
{context}

question:
{question}

Answer:
"""
)

Explanation

This controls how instructions are sent to Gemini.

Important Part

Use only the context below

This reduces hallucination.

Gemini is instructed to answer only from the retrieved text, although an instruction in the prompt is not a hard guarantee.
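It is useful to look at the final text that reaches the model. The sketch below fills the same two placeholders with plain str.format, which behaves like the template for simple {name} placeholders; TEMPLATE and render_prompt are hypothetical names, and the sample context is made up.

```python
TEMPLATE = (
    "Use only the context below to answer the question.\n\n"
    "context:\n{context}\n\n"
    "question:\n{question}\n\n"
    "Answer:"
)

def render_prompt(context, question):
    """Fill the two placeholders, mirroring what prompt.format(...) does in this step."""
    return TEMPLATE.format(context=context, question=question)

print(render_prompt("Python is used in AI and automation.", "What is Python used for?"))
```

Printing the rendered prompt is a quick way to check that the retrieved chunks actually contain the information the question needs.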


12. Initializing Gemini Model

llm = ChatGoogleGenerativeAI(
    model = "gemini-2.5-flash"
)

Explanation

This initializes Google's Gemini Flash model.

Why Gemini Flash?

  • fast

  • lightweight

  • cheaper than the larger Gemini Pro models

  • good for RAG systems


13. Creating Main RAG Function

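def rag_answer(query):
    # Retrieve the top-k chunks most similar to the query
    r_docs = retriver.invoke(query)
    # Merge the chunks into one context string
    context = format_docs(r_docs)
    # Fill the prompt template with the context and the question
    full_prompt = prompt.format(
        context=context,
        question=query
    )
    # Call Gemini and return only the text of its reply
    response = llm.invoke(full_prompt)
    return response.content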

Step-by-Step Flow

Step 1

retriver.invoke(query)

Searches relevant chunks.


Step 2

format_docs(r_docs)

Formats retrieved context.


Step 3

prompt.format()

Creates final prompt.


Step 4

llm.invoke()

Sends prompt to Gemini.


Step 5

Returns generated answer.
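The five steps can be exercised without an API key by swapping in stand-ins for the retriever and the model. The sketch below is an assumption-laden variant: FakeRetriever and FakeLLM are hypothetical classes, and this rag_answer takes the retriever and model as arguments instead of using the notebook's global variables.

```python
from types import SimpleNamespace

class FakeRetriever:
    """Hypothetical stand-in for the real retriever: always returns one fixed chunk."""
    def invoke(self, query):
        return [SimpleNamespace(page_content="Python is used in AI and automation.")]

class FakeLLM:
    """Hypothetical stand-in for Gemini: records the prompt and returns a canned reply."""
    def __init__(self):
        self.last_prompt = None

    def invoke(self, prompt_text):
        self.last_prompt = prompt_text
        return SimpleNamespace(content="[stubbed answer]")

def rag_answer(query, retriever, llm):
    docs = retriever.invoke(query)                       # Step 1: retrieve chunks
    context = "\n\n".join(d.page_content for d in docs)  # Step 2: format context
    full_prompt = (                                      # Step 3: build the prompt
        "Use only the context below to answer the question.\n"
        f"context: {context}\nquestion: {query}\nAnswer:"
    )
    response = llm.invoke(full_prompt)                   # Step 4: call the model
    return response.content                              # Step 5: return the text

llm = FakeLLM()
print(rag_answer("What is Python used for?", FakeRetriever(), llm))
print(llm.last_prompt)
```

Passing the dependencies in as arguments makes the wiring easy to test; the real retriever and Gemini model can be dropped in later because they expose the same invoke method.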


14. Asking Question

query = "What is python used for ?"
answer = rag_answer(query)

Explanation

The user asks a question.

The RAG system processes it and generates a response.


15. Printing Output

print("question :", query)
print("\n")
print("answer", answer)

Explanation

Displays:

  • question

  • generated answer


16. Final Output

Question:
What is python used for?

Answer:
Python is used for development, and its libraries help with a wide range of tasks, making development easier.

How RAG Works Internally

User Query
↓
Embedding Generated
↓
FAISS Similarity Search
↓
Relevant Chunks Retrieved
↓
Context Added to Prompt
↓
Gemini Generates Answer

Advantages of RAG

Advantage | Description
Reduces hallucination | Uses actual context
Supports private data | Can use custom documents
Real-time knowledge | New docs can be added
Faster than fine-tuning | No retraining needed
Scalable | Works with huge datasets

Real-World Applications

RAG is used in:

  • ChatGPT-style chatbots

  • AI search engines

  • company knowledge assistants

  • PDF question-answering systems

  • customer support bots

  • legal document assistants

  • medical AI systems


Common Improvements

You can improve this project by adding:

Upgrade | Benefit
PDF Upload | Read PDFs
Streamlit UI | Web interface
Chat History | Memory
Better Embeddings | More accurate search
Pinecone/ChromaDB | Managed or persistent vector DB
Multi-document support | Multiple files
OCR | Read images

Final Understanding

This project demonstrates:

  • Generative AI

  • Semantic Search

  • Vector Databases

  • LLM Prompt Engineering

  • Embeddings

  • Retrieval Systems

which are core technologies behind modern AI systems like:

  • ChatGPT

  • Perplexity AI

  • GitHub Copilot

  • AI Search Engines

It is a strong GenAI project for a portfolio and for interview discussions.
