6. RAG & Functions
RAG Pipeline
Retrieval-Augmented Generation: ground AI answers in your data

1. Query: the natural-language question or request
2. Embed: convert the query to a vector using an embedding model
3. Retrieve: search the vector database for the most similar document chunks
4. Augment: insert the retrieved context into the prompt alongside the query
5. Generate: the LLM produces an answer grounded in the retrieved context (sketched in code below)
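A minimal sketch of this query-time flow, assuming the OpenAI Python SDK for the embedding and chat calls; the vector-store interface (`index.query`, the returned chunks' `.text` attribute, and `top_k=3`) is a hypothetical Pinecone-style API, not a specific library:

```python
from openai import OpenAI

client = OpenAI()

def answer(query: str, index) -> str:
    # 1. Embed the query with the same model used at indexing time.
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding

    # 2. Retrieve the most similar chunks (hypothetical store API).
    chunks = index.query(vector=emb, top_k=3)

    # 3. Augment: insert the retrieved context into the prompt.
    context = "\n\n".join(c.text for c in chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"

    # 4. Generate an answer grounded in the retrieved context.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

One detail the sketch depends on: the query must be embedded with the same model used to embed the chunks, because vectors from different embedding models are not comparable.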
LLMs are limited by their training data: they don't know about your company's documents, your personal files, or events that happened after their training cutoff. RAG and function calling are the two techniques that solve this limitation, and together they make LLMs far more useful in production.
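Where RAG retrieves text, function calling lets the model request a structured action instead of answering in prose. A minimal sketch using the OpenAI chat completions tools interface; the `get_order_status` tool, its schema, and the example order ID are illustrative assumptions:

```python
import json
from openai import OpenAI

client = OpenAI()

# Describe the function so the model knows when and how to call it.
# `get_order_status` is a hypothetical backend function, not a real API.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of a customer order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Where is order 8812?"}],
    tools=tools,
)

# If the model opts to call the tool, it returns structured arguments
# instead of prose; your code runs the function and sends back the result.
message = resp.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    print(call.function.name, args)  # e.g. get_order_status {'order_id': '8812'}
```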
RAG Pipeline Architecture
Five-stage Retrieval-Augmented Generation flow

1. Chunk: split documents into chunks of 500-1000 tokens each
2. Embed: convert the chunks to vectors using an embedding model
3. Index: store the vectors in a vector database (e.g., Pinecone, Weaviate)
4. Retrieve: embed the user query and find its nearest neighbors
5. Generate: pass the retrieved chunks as context to the LLM

Stages 1-3 run offline at ingestion time; stages 4-5 run at query time, as in the pipeline above (see the indexing sketch after this list).
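A sketch of the offline stages (chunk, embed, index), assuming tiktoken for token-level splitting; the 800-token chunk size is one choice inside the 500-1000 token guideline, and the `index.upsert` record layout is a Pinecone-style assumption, not a guaranteed library signature:

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("cl100k_base")

def chunk(text: str, max_tokens: int = 800) -> list[str]:
    # Split on token boundaries so each chunk fits the 500-1000 token target.
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]

def index_document(doc_id: str, text: str, index) -> None:
    pieces = chunk(text)
    # Embed all chunks in one batched call.
    embs = client.embeddings.create(
        model="text-embedding-3-small", input=pieces
    ).data
    # Upsert (id, vector, metadata) records into the vector store
    # (Pinecone-style interface, assumed for illustration).
    index.upsert([
        (f"{doc_id}-{i}", e.embedding, {"text": p})
        for i, (p, e) in enumerate(zip(pieces, embs))
    ])
```

A common refinement is to overlap adjacent chunks by 10-20% so a sentence split across a boundary still appears whole in at least one chunk.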