# Introduction to Retrieval

Large language models like GPT-4, Claude, or Llama can write code, answer questions, generate content, and solve complex problems with remarkable sophistication. You can build chatbots, content generators, code assistants, and analysis tools by crafting good prompts and calling AI APIs.

However, as you start to build more sophisticated applications, especially those that require knowledge not available to an LLM, you'll quickly run into some fundamental limitations.

## The Knowledge Problem

Imagine you're building an AI assistant for a company's customer support team. You want the AI to answer questions about your products, policies, and procedures.

Given a simple prompt:
> What is your return policy for tooth brushes?

The AI is likely to return a generic answer, because it doesn't know about your company's specific policies:
> I'm not a retailer or company that sells toothbrushes - I'm Claude, an AI assistant created by Anthropic. I don't have products or a return policy. If you're looking for information about a toothbrush return policy, you'd need to check with the specific store or company where you purchased or are planning to purchase the toothbrush. Most retailers have their return policies listed on their websites or you can contact their customer service directly.

We can get around this problem by providing useful information, or context, to the AI. This is often called "context engineering":
> What is your return policy for tooth brushes? Use the following information for context: Tooth brushes can be returned up to 360 days after purchase if unopened.

Now the AI has the information it needs to answer the user question correctly:
> Based on the information provided, our return policy for toothbrushes allows returns up to 360 days after purchase, as long as the toothbrush remains unopened and in its original packaging. This gives you nearly a full year to return an unopened toothbrush if needed. Once a toothbrush has been opened, it would not be eligible for return under this policy.

But what happens if we want our AI chatbot to be able to answer questions about shipping, product specifications, troubleshooting, or any other topic? We will have to include our entire knowledge base in our context, which leads to several technical problems.

**Token Limits:** AI models have maximum input lengths. Even the largest models might not be able to fit an entire company's documentation in a single prompt.

**Cost:** AI APIs typically charge per token. Including thousands of words of context in every request becomes expensive quickly.

**Relevance:** When you include too much information, the AI can get confused or focus on irrelevant details instead of what actually matters for answering the user's question.

**Freshness:** Information changes constantly. Product specs update, policies change, new documentation gets written. Keeping everything in prompts means manually updating every prompt whenever anything changes.

**Hallucinations:** Without the correct information or focus for answering a user's question, LLMs may produce a wrong answer with an authoritative voice. For most business applications, where accuracy matters, hallucination is a critical problem.
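To make the token-limit and cost concerns concrete, here is a back-of-the-envelope sketch. Every number in it (knowledge base size, tokens per word, price, and request volume) is an illustrative assumption, not real pricing for any provider.

```python
# Rough cost of sending context with every request.
# All constants are hypothetical and chosen only for illustration.
WORDS_IN_KNOWLEDGE_BASE = 200_000
TOKENS_PER_WORD = 1.3                   # rough rule of thumb for English text
PRICE_PER_MILLION_INPUT_TOKENS = 3.00   # hypothetical price in USD
REQUESTS_PER_DAY = 10_000


def daily_input_cost(context_words: int) -> float:
    """Estimate the daily input cost when every request carries `context_words` words."""
    tokens_per_request = context_words * TOKENS_PER_WORD
    cost_per_request = tokens_per_request / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS
    return cost_per_request * REQUESTS_PER_DAY


# Whole knowledge base in every prompt vs. a few retrieved snippets.
full_kb_cost = daily_input_cost(WORDS_IN_KNOWLEDGE_BASE)
retrieved_cost = daily_input_cost(200)

print(f"Whole knowledge base per request: ${full_kb_cost:,.2f} per day")
print(f"Retrieved snippets only:          ${retrieved_cost:,.2f} per day")
```

Under these assumptions, each full-context request carries roughly 260,000 tokens, which may also exceed a model's input limit before cost even becomes the issue.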

## Enter Retrieval

Retrieval solves these challenges by creating a bridge between AI models and your actual data. Instead of cramming everything into prompts, a retrieval system **stores your information** in a searchable format. You search the knowledge base in natural language, typically by passing the user's question itself as the query, and get back only the entries relevant to it. This lets you build the model's context selectively.

When a retrieval system returns the results from your knowledge base relevant to the user's question, you can use them to provide context for the AI model to help it generate an accurate response.

Here's how a typical retrieval pipeline is built:
1. **Converting information into searchable formats** - this is done by using **embedding models**. They create mathematical representations of your data, called "embeddings", that capture the semantic meaning of text, not just keywords.
2. **Storing these representations** in a retrieval system, optimized for quickly finding similar embeddings for an input query.
3. **Processing user queries** into embeddings, so they can be used as inputs to your retrieval system.
4. **Combining the retrieved results** with the original user query to serve to an AI model.
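
The four steps above can be sketched in plain Python. This is a toy illustration, not how Chroma works internally: the `embed` function below is a hypothetical stand-in that counts words, whereas a real embedding model produces dense vectors that capture meaning. The names `embed`, `cosine_similarity`, `index`, and `retrieve` are all made up for this sketch.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real model captures meaning)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


documents = [
    "Toothbrushes can be returned up to 360 days after purchase if unopened.",
    "Shipping is free of charge for all orders.",
    "Shipping normally takes 2-3 business days",
]

# Steps 1 and 2: embed each document and store it next to its embedding.
index = [(doc, embed(doc)) for doc in documents]


def retrieve(query: str, n_results: int = 1) -> list:
    """Step 3: embed the query and return the most similar documents."""
    query_embedding = embed(query)
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:n_results]]


user_query = "Can unopened toothbrushes be returned?"
context = retrieve(user_query)

# Step 4: combine the retrieved result with the original query.
prompt = f"{user_query}\n\nUse this as context for answering:\n{context[0]}"
print(prompt)
```

The toy version only matches exact words. The question "What is your return policy for tooth brushes?" shares no words with the return-policy document, so this sketch would rank a shipping document first. Closing that gap is what semantic embeddings are for.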

**Chroma** is a powerful retrieval system that handles most of this process out-of-the-box. It also allows you to customize these steps to get the best performance in your AI application. Let's see it in action for our customer support example.

### Step 1: Embed our Knowledge Base and Store it in a Chroma Collection

### python

Install Chroma:

```terminal
pip install chromadb
```

### poetry

```terminal
poetry add chromadb
```

### uv

```terminal
uv pip install chromadb
```

Chroma embeds and stores information in a single operation.

```python
import chromadb

client = chromadb.Client()
# Collection names cannot contain spaces, so use an underscore.
customer_support_collection = client.create_collection(
    name="customer_support"
)

# add() embeds each document with the default embedding function and stores it.
customer_support_collection.add(
    ids=["1", "2", "3"],
    documents=[
        "Toothbrushes can be returned up to 360 days after purchase if unopened.",
        "Shipping is free of charge for all orders.",
        "Shipping normally takes 2-3 business days"
    ]
)
```

### typescript

Install Chroma:

```terminal
npm install chromadb @chroma-core/default-embed
```

### pnpm

```terminal
pnpm add chromadb @chroma-core/default-embed
```

### yarn

```terminal
yarn add chromadb @chroma-core/default-embed
```

### bun

```terminal
bun add chromadb @chroma-core/default-embed
```

Run a Chroma server locally:

```terminal
chroma run
```

Chroma embeds and stores information in a single operation.

```typescript
import { ChromaClient } from "chromadb";

const client = new ChromaClient();
// Collection names cannot contain spaces, so use an underscore.
const customer_support_collection = await client.createCollection({
    name: "customer_support"
});

// add() embeds each document with the default embedding function and stores it.
await customer_support_collection.add({
    ids: ["1", "2", "3"],
    documents: [
        "Toothbrushes can be returned up to 360 days after purchase if unopened.",
        "Shipping is free of charge for all orders.",
        "Shipping normally takes 2-3 business days"
    ]
});
```

### Step 2: Process the User's Query

Similarly, Chroma handles the embedding of queries for you out-of-the-box.

### python

```python
user_query = "What is your return policy for tooth brushes?"

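results = customer_support_collection.query(
    query_texts=[user_query],
    n_results=1
)

# "documents" holds one list of results per query text,
# so take the first result of the first (and only) query.
context = results["documents"][0][0]

print(context) # Toothbrushes can be returned up to 360 days after purchase if unopened.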
```

### typescript

```typescript
const userQuery = "What is your return policy for tooth brushes?";

const results = await customer_support_collection.query({
    queryTexts: [userQuery],
    nResults: 1
});

// documents holds one array of results per query text,
// so take the first result of the first (and only) query.
const context = results.documents[0][0];

console.log(context); // Toothbrushes can be returned up to 360 days after purchase if unopened.
```

### Step 3: Generate the AI Response

With the result from Chroma, we can build the correct context for an AI model.
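
When you retrieve more than one document, it helps to format them into a single context block before calling the model. The helper below is a minimal sketch: `build_prompt` and its instruction wording are hypothetical, and the `results` dictionary is hand-written to mimic the nested `documents` shape of a Chroma query result (one inner list per query text).

```python
def build_prompt(user_query: str, documents: list) -> str:
    """Combine the user's question and the retrieved documents into one prompt."""
    context_block = "\n".join(
        f"{i}. {doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {user_query}"
    )


# Hand-written stand-in for a query result with n_results=2 and one query text.
results = {
    "documents": [
        [
            "Shipping is free of charge for all orders.",
            "Shipping normally takes 2-3 business days",
        ]
    ]
}

prompt = build_prompt(
    "How much does shipping cost, and how long does it take?",
    results["documents"][0],
)
print(prompt)
```

Numbering the snippets and telling the model to rely only on them is one common way to keep answers grounded. The exact wording is a design choice you can tune for your application.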

### OpenAI

```python
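import os
from openai import OpenAI

# Named openai_client so it does not shadow the Chroma client created earlier.
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

prompt = f"{user_query}. Use this as context for answering: {context}"

# openai>=1.0 exposes chat completions on a client instance.
response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": prompt}
    ]
)

print(response.choices[0].message.content)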
```

### typescript

```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const prompt = `${userQuery}. Use this as context for answering: ${context}`;

const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "You are a helpful assistant" },
      { role: "user", content: prompt },
    ],
});
```

### Anthropic

```python
import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.getenv("ANTHROPIC_API_KEY")
)

prompt = f"{user_query}. Use this as context for answering: {context}"

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": prompt}
    ]
)
```

### typescript

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
    apiKey: process.env.ANTHROPIC_API_KEY,
});

const prompt = `${userQuery}. Use this as context for answering: ${context}`;

const response = await client.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    messages: [
        {
            role: 'user',
            content: prompt,
        },
    ],
});
```