veda.ng
Module 6 of 7

6. RAG & Functions

RAG Pipeline

Retrieval-Augmented Generation: ground AI answers in your data

Stage 1: User Query

The natural language question or request

Stage 2: Embed

Convert query to a vector embedding using an embedding model

Stage 3: Retrieve

Search vector database for the most similar document chunks

Stage 4: Augment

Insert retrieved context into the prompt alongside the query

Stage 5: Generate

LLM generates answer grounded in the retrieved context
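The five query-time stages above can be sketched end to end with toy in-memory pieces. This is a minimal sketch, not a production implementation: the bag-of-words `embed` stands in for a real embedding model, and a plain Python list stands in for the vector database.

```python
import math
import re
from collections import Counter

# Tiny stopword list so function words don't dominate similarity.
STOPWORDS = {"what", "is", "the", "our", "a", "to", "on"}

def embed(text: str) -> Counter:
    # Stage 2: toy bag-of-words "embedding"; a real pipeline calls an embedding model.
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]
    return Counter(tokens)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Stage 3: rank stored chunks by similarity to the embedded query.
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

def augment(query: str, context: list[str]) -> str:
    # Stage 4: splice the retrieved chunks into the prompt.
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

chunks = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
    "Shipping takes 3-5 business days.",
]
query = "What is the refund policy?"
prompt = augment(query, retrieve(query, chunks))  # Stage 5 would send this prompt to the LLM
```

Swapping `embed` for a real embedding model and the list for a vector store gives the production shape of the same flow.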

LLMs are limited by their training data: they don't know about your company's documents, your personal files, or events that happened after their training cutoff. RAG and function calling are the two techniques that close this gap, making LLMs far more useful in production.
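Function calling, the second technique, is at its core a dispatch loop: the model emits a structured call, your code runs the named function, and the result is fed back into the conversation. A minimal sketch, where the JSON wire format and the `get_weather` tool are illustrative rather than any particular vendor's API:

```python
import json

def get_weather(city: str) -> dict:
    # Illustrative tool; a real one would query a live data source.
    return {"city": city, "temp_c": 21}

# Registry of functions the model is allowed to call.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    # Parse the model's structured call, run the named tool, and return
    # the JSON result that would be appended to the conversation.
    call = json.loads(model_output)
    result = TOOLS[call["tool"]](**call["arguments"])
    return json.dumps(result)

# Simulated model turn requesting a tool call.
reply = dispatch('{"tool": "get_weather", "arguments": {"city": "Lagos"}}')
```

In a real system the model's reply is parsed from its API response, and `reply` is sent back so the model can compose a final answer from the tool's result.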

RAG Pipeline Architecture

Five-stage Retrieval-Augmented Generation flow

1. Ingest

Split documents into chunks (500-1000 tokens each)

2. Embed

Convert chunks to vectors using an embedding model

3. Store

Index vectors in a vector database (Pinecone, Weaviate)

4. Query

Embed the user query, find nearest neighbors

5. Generate

Pass retrieved chunks as context to the LLM
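The ingest side of the flow (steps 1-3) can be sketched as follows, assuming whitespace-separated words as a stand-in for tokens and a plain list as the "vector database"; real pipelines count tokens with the model's tokenizer, usually overlap adjacent chunks, and upsert into a store such as Pinecone or Weaviate.

```python
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Step 2: bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def chunk(text: str, size: int = 50) -> list[str]:
    # Step 1: fixed-size word windows; real systems count tokens
    # (500-1000 per chunk) and usually overlap adjacent chunks.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def ingest(docs: list[str], store: list[dict]) -> None:
    # Steps 1-3: chunk, embed, and index each document.
    for doc in docs:
        for c in chunk(doc):
            store.append({"text": c, "vector": toy_embed(c)})

store: list[dict] = []
ingest([" ".join(f"word{i}" for i in range(120))], store)  # 120 words -> 3 chunks of <= 50
```

Steps 4-5 then embed each incoming query with the same model, rank the stored vectors by similarity, and pass the winning chunks to the LLM as context.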