Day 4 - How Retrieval Works Inside a RAG System

Hi, I am Harsha Vardhan Upadrasta, a 25-year-old web developer, UI/UX designer, and bug hunter living in Draksharama, India. I am a Computer Science engineer, currently working with awesome folks at _VOIS.

🚀 Advanced AI Engineering Series

In the previous articles of this series, we explored two powerful building blocks behind modern AI systems:

🧠 Embeddings, which convert text into numerical vectors representing meaning
🗂 Vector Databases, which store those vectors and enable similarity search

But an important question still remains:

When you ask an AI a question, how does it actually find the right information?

This is where a critical concept comes into play:

🔎 Retrieval

Retrieval is the engine that allows AI systems to search large knowledge bases and fetch relevant information before answering.

Without retrieval, even the strongest model can only answer from what it learned during training.

With retrieval, AI becomes knowledge-aware.

Let’s understand how this works.


🧠 What Is Retrieval in RAG?

RAG stands for:

Retrieval Augmented Generation

Instead of asking an AI model to generate an answer purely from what it learned during training, a RAG system first searches for relevant information from external sources.

These sources can include:

📄 company documents
📚 knowledge bases
📑 PDFs and manuals
💻 developer documentation
🏥 medical guidelines
🏢 enterprise databases

Once the relevant information is retrieved, it is sent to the language model, which uses it to generate a more accurate response.

In simple terms, the process looks like this:

User Question
      ↓
Convert question to embedding
      ↓
Search vector database
      ↓
Retrieve relevant documents
      ↓
Send context to LLM
      ↓
Generate final answer

This allows AI systems to access real, up-to-date knowledge instead of relying only on training data.


⚙️ The Retrieval Pipeline (Step by Step)

Let’s walk through how a RAG system works internally.


🧑‍💻 Step 1 - User asks a question

Everything begins with a user query.

Example:

“How can we optimize slow SQL queries?”

The system now needs to search across thousands (or even millions) of documents to find useful information.


🧠 Step 2 β€” Convert the question into an embedding

The user question is converted into a vector embedding.

This vector represents the meaning of the question.

Instead of searching using exact keywords, the system now searches using semantic similarity.

This allows the system to understand queries like:

• "Fix slow SQL queries"

• "Improve database performance"

• "Optimize database response time"

Even though the wording is different, the meaning is similar.
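
To make "semantic similarity" a bit more concrete, here is a minimal sketch of how two embeddings can be compared with cosine similarity. The three-dimensional vectors are made-up values chosen purely for illustration; a real embedding model would return vectors with hundreds or thousands of dimensions, and the names `cosine_similarity` and `toy_embeddings` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (made-up numbers, NOT the output of a real model).
toy_embeddings = {
    "Fix slow SQL queries": [0.9, 0.1, 0.0],
    "Improve database performance": [0.8, 0.2, 0.1],
    "How do I apply for leave?": [0.0, 0.1, 0.9],
}

query = toy_embeddings["Fix slow SQL queries"]
for text, vector in toy_embeddings.items():
    # Print the similarity of every sentence to the query sentence.
    print(f"{cosine_similarity(query, vector):.3f}  {text}")
```

With these toy values, the two database-related sentences score close to each other, while the leave question scores far lower, which is the behaviour a good embedding model aims for with real text.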


🔎 Step 3 - Search the vector database

The query embedding is now sent to a vector database.

The database compares the query vector with stored document vectors and finds the most similar ones.

This process is called:

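✨ Similarity Search

With an approximate nearest neighbor (ANN) index, this search typically takes only milliseconds, even when the database holds millions of vectors.

The system then returns the top k matches, where k is configurable. For example:

📄 Top 3 documents

📄 Top 5 documents

📄 Top 10 documents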

These documents become the context for the AI model.
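
The search step can be sketched as a brute-force top-k lookup. This is only an illustration under stated assumptions: the document ids and vectors are made up, `top_k` is a hypothetical helper, and the code compares the query against every stored vector. Real vector databases usually avoid that full scan by using approximate nearest neighbor indexes.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, index, k=3):
    """Score every stored vector against the query and keep the k best.

    `index` maps a document id to its embedding vector.
    Returns a list of (score, doc_id) pairs, best match first.
    """
    scored = (
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in index.items()
    )
    return heapq.nlargest(k, scored)


# Hypothetical index: made-up document ids and 3-dimensional embeddings.
index = {
    "sql-indexing-guide": [0.9, 0.1, 0.0],
    "query-plan-tuning": [0.8, 0.3, 0.1],
    "hr-leave-policy": [0.0, 0.2, 0.9],
    "vpn-troubleshooting": [0.1, 0.9, 0.2],
}

query_vector = [0.85, 0.2, 0.05]  # stands in for the embedded user question
for score, doc_id in top_k(query_vector, index, k=2):
    print(f"{score:.3f}  {doc_id}")
```

Changing `k` is all it takes to move between "top 3", "top 5", and "top 10" retrieval.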


🤖 Step 4 - Send retrieved context to the LLM

Now the retrieved documents are passed to the Large Language Model (LLM).

Instead of answering blindly, the model now reads the retrieved information first.

Think of it like giving the AI reference material before answering the question.
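
One common way to hand over that reference material is to place the retrieved text directly in the prompt. The sketch below assumes a plain-text prompt; the function name `build_prompt`, the instruction wording, and the sample snippets are all hypothetical, and real systems vary a lot in how they format context.

```python
def build_prompt(question: str, documents: list[str]) -> str:
    """Combine retrieved documents and the user question into one prompt string."""
    # Number each document so the model (and a human reader) can refer to it.
    context = "\n\n".join(
        f"[{number}] {text}" for number, text in enumerate(documents, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved snippets, written only for this example.
retrieved = [
    "Add an index on columns used in WHERE and JOIN clauses.",
    "Use EXPLAIN to inspect the query plan before rewriting a query.",
]

print(build_prompt("How can we optimize slow SQL queries?", retrieved))
```

The resulting string is what gets sent to the model in place of the bare question.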


💡 Step 5 - Generate the final response

Finally, the AI model generates an answer based on the retrieved information.

Because the response is grounded in real data, it becomes:

✅ more accurate

✅ more relevant

✅ less likely to hallucinate

This is why RAG has become a widely used architecture for enterprise AI systems.


🏢 Real-Life Example: Company Knowledge Assistant

Imagine a large company with thousands of internal documents:

📄 HR policies

📘 onboarding guides

💻 technical documentation

🛠 troubleshooting manuals

📊 internal tools documentation

Now imagine a new employee asks the company AI assistant:

“How do I apply for leave in the HR portal?”

Without RAG, the AI might respond with a generic answer based on general HR knowledge.

But with RAG, the system works differently.


Step 1

The question is converted into an embedding vector.


Step 2

The system searches the vector database containing company documents.


Step 3

It retrieves relevant documents like:

📄 HR leave policy

📄 HR portal user guide

📄 internal workflow documentation


Step 4

These documents are passed to the AI model as context.


Step 5

The AI generates a precise answer based on the company’s actual documentation.

Instead of guessing, the AI is reading the company knowledge base before answering.

That’s the power of retrieval-augmented AI.
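
The five steps of this example can be sketched end to end in a few lines. Everything here is a simplified assumption: `toy_embed` is a word-count stand-in for a real embedding model (so it matches keywords, not meaning), the document texts are invented for the example, and `echo_llm` is a placeholder that simply returns the prompt where a real system would call a language model.

```python
import math
import re
from collections import Counter
from typing import Callable


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def answer(question: str, documents: dict[str, str],
           generate: Callable[[str], str], k: int = 2) -> str:
    query_vector = toy_embed(question)                      # Step 1: embed the question
    ranked = sorted(                                        # Step 2: search the documents
        documents,
        key=lambda name: cosine(query_vector, toy_embed(documents[name])),
        reverse=True,
    )
    top_names = ranked[:k]                                  # Step 3: keep the best matches
    context = "\n".join(f"{name}: {documents[name]}" for name in top_names)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"  # Step 4: add context
    return generate(prompt)                                 # Step 5: generate the answer


# Hypothetical company documents, invented for this example.
docs = {
    "HR leave policy": "Employees apply for leave through the HR portal under the Leave tab.",
    "Onboarding guide": "New employees receive a laptop and badge on the first day.",
    "Troubleshooting manual": "Restart the VPN client if the connection drops.",
}


def echo_llm(prompt: str) -> str:
    """Placeholder for a real LLM call: returns the prompt it was given."""
    return prompt


print(answer("How do I apply for leave in the HR portal?", docs, echo_llm, k=1))
```

Passing the model in as the `generate` argument keeps the retrieval logic independent of any particular LLM provider, so the placeholder can later be swapped for a real client.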


🌍 Another Real-Life Example: Customer Support AI

Imagine you contact a bank’s support chatbot and ask:

“How do I reset my internet banking password?”

Behind the scenes, the AI does this:

🔎 Searches internal support documentation

📄 Retrieves password reset instructions

📚 Sends those instructions to the AI model

💬 Generates a clear response for the user

This is why modern customer support AI systems are becoming faster, smarter, and more accurate.


⚡ Why Retrieval Is So Important

Retrieval is a major reason modern AI assistants are more capable than traditional scripted chatbots.

It provides several major advantages.


🔐 Access to Private Knowledge

AI can retrieve information from internal company databases and documents.


🔄 Real-Time Knowledge Updates

New documents can be added to the vector database without retraining the AI model.
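
Here is a small sketch of why no retraining is needed: adding knowledge is just a write to the index. The class `InMemoryVectorIndex`, the document ids, and the two-dimensional vectors are hypothetical and exist only for illustration; a production vector database exposes its own API for inserts.

```python
import math


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorIndex:
    """Toy stand-in for a vector database: a dict of document id -> embedding."""

    def __init__(self):
        self._vectors = {}

    def add(self, doc_id, vector):
        """Store (or overwrite) one document embedding. The language model is untouched."""
        self._vectors[doc_id] = list(vector)

    def search(self, query_vector, k=1):
        """Return the ids of the k stored vectors most similar to the query."""
        def score(doc_id):
            return _cosine(query_vector, self._vectors[doc_id])
        return sorted(self._vectors, key=score, reverse=True)[:k]


index = InMemoryVectorIndex()
index.add("leave-policy", [0.9, 0.1])       # made-up embedding
query_vector = [0.2, 0.95]                  # made-up query embedding

print(index.search(query_vector))           # only one document exists so far

index.add("remote-work-policy", [0.1, 0.9])  # new knowledge arrives
print(index.search(query_vector))           # searchable immediately, no retraining
```

The second search returns the newly added document because it is now the closest stored vector to the query.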


🧠 Reduced Hallucinations

Because the AI answers using retrieved context, it is less likely to invent incorrect information.


📈 Massive Scalability

RAG systems can search across millions of documents in milliseconds, because vector databases index embeddings for fast nearest neighbor lookup.


🚀 Where Retrieval Is Used Today

Many modern AI systems rely heavily on retrieval pipelines.

Examples include:

🤖 enterprise AI assistants

📞 customer support bots

📚 internal documentation search tools

⚖️ legal research systems

🏥 medical knowledge assistants

💻 developer documentation assistants

These systems depend on fast and intelligent retrieval pipelines.


💭 Final Thoughts

Large Language Models are incredibly powerful.

But without access to the right information, their usefulness becomes limited.

Retrieval solves this problem.

By combining:

🧠 Embeddings

πŸ—‚ Vector Databases

πŸ€– Language Models

RAG systems connect AI with real-world knowledge sources.

This is why retrieval has become one of the most important components in modern AI architectures.


🚀 Coming Next

Day 5 - Chunking: The Secret to Better RAG Performance

We’ll explore how breaking documents into smaller pieces dramatically improves AI retrieval accuracy.