6. RAG & Functions
RAG Pipeline
Retrieval-Augmented Generation: ground AI answers in your data

1. Query: the natural-language question or request
2. Embed: convert the query to a vector using an embedding model
3. Retrieve: search the vector database for the most similar document chunks
4. Augment: insert the retrieved context into the prompt alongside the query
5. Generate: the LLM produces an answer grounded in the retrieved context (sketched in code below)
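A minimal sketch of this query-time flow, assuming the OpenAI Python SDK for the embedding and chat calls; the vector-store interface (`index.query`, the returned chunks' `.text` attribute, and `top_k=3`) is a hypothetical Pinecone-style API, not a specific library:

```python
from openai import OpenAI

client = OpenAI()

def answer(query: str, index) -> str:
    # 1. Embed the query with the same model used at indexing time.
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding

    # 2. Retrieve the most similar chunks (hypothetical store API).
    chunks = index.query(vector=emb, top_k=3)

    # 3. Augment: insert the retrieved context into the prompt.
    context = "\n\n".join(c.text for c in chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"

    # 4. Generate an answer grounded in the retrieved context.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

One detail the sketch depends on: the query must be embedded with the same model used to embed the chunks, because vectors from different embedding models are not comparable.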
LLMs are limited by their training data: they don't know about your company's documents, your personal files, or events that happened after their training cutoff. RAG and function calling are the two techniques that solve this limitation, and together they make LLMs far more useful in production.
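Where RAG retrieves text, function calling lets the model request a structured action instead of answering in prose. A minimal sketch using the OpenAI chat completions tools interface; the `get_order_status` tool, its schema, and the example order ID are illustrative assumptions:

```python
import json
from openai import OpenAI

client = OpenAI()

# Describe the function so the model knows when and how to call it.
# `get_order_status` is a hypothetical backend function, not a real API.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of a customer order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Where is order 8812?"}],
    tools=tools,
)

# If the model opts to call the tool, it returns structured arguments
# instead of prose; your code runs the function and sends back the result.
message = resp.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    print(call.function.name, args)  # e.g. get_order_status {'order_id': '8812'}
```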
RAG Pipeline Architecture
Five-stage Retrieval-Augmented Generation flow

1. Chunk: split documents into chunks of 500-1000 tokens each
2. Embed: convert the chunks to vectors using an embedding model
3. Index: store the vectors in a vector database (e.g., Pinecone, Weaviate)
4. Retrieve: embed the user query and find its nearest neighbors
5. Generate: pass the retrieved chunks as context to the LLM

Stages 1-3 run offline at ingestion time; stages 4-5 run at query time, as in the pipeline above (see the indexing sketch after this list).
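A sketch of the offline stages (chunk, embed, index), assuming tiktoken for token-level splitting; the 800-token chunk size is one choice inside the 500-1000 token guideline, and the `index.upsert` record layout is a Pinecone-style assumption, not a guaranteed library signature:

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("cl100k_base")

def chunk(text: str, max_tokens: int = 800) -> list[str]:
    # Split on token boundaries so each chunk fits the 500-1000 token target.
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]

def index_document(doc_id: str, text: str, index) -> None:
    pieces = chunk(text)
    # Embed all chunks in one batched call.
    embs = client.embeddings.create(
        model="text-embedding-3-small", input=pieces
    ).data
    # Upsert (id, vector, metadata) records into the vector store
    # (Pinecone-style interface, assumed for illustration).
    index.upsert([
        (f"{doc_id}-{i}", e.embedding, {"text": p})
        for i, (p, e) in enumerate(zip(pieces, embs))
    ])
```

A common refinement is to overlap adjacent chunks by 10-20% so a sentence split across a boundary still appears whole in at least one chunk.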