Power your app with fast, semantic search — retrieve the right chunk at the right time using Chroma or Weaviate.
LLMs are powerful, but they have limits — especially when users ask questions about your content, like PDFs, docs, or product data.
That's where vector search comes in.
By combining LangChain with a vector database like Chroma or Weaviate, you can retrieve the most relevant content before asking the LLM to generate an answer — a strategy known as Retrieval-Augmented Generation (RAG).
In this chapter, you'll learn how to load and split documents, embed them, store and search them with a vector database like Chroma or Weaviate, and feed the retrieved content to an LLM for grounded answers.
You'll create a RAG-style system with this flow:
User question → Vector DB search → Top matching docs → LLM response using retrieved content
This is the backbone of tools like Chat with PDF, AI Knowledge Bases, and Enterprise GPT assistants.
Let's start with ChromaDB, an easy-to-use local vector store that works well for small to mid-scale apps.
pip install chromadb langchain openai
Set your OpenAI API key, or configure another embedding model if you prefer.
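One simple option is to export the key as an environment variable before creating the embeddings object; OpenAIEmbeddings reads OPENAI_API_KEY by default. A minimal sketch:
import os

# Either export OPENAI_API_KEY in your shell, or set it in code like this
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your actual key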
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the raw document
loader = TextLoader("my-doc.txt")
docs = loader.load()

# Split it into overlapping chunks small enough to embed and retrieve individually
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Embed each chunk and store the vectors in a local, persistent Chroma collection
embedding_model = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    chunks,
    embedding_model,
    persist_directory="./db"
)
# Retrieve the 3 chunks most semantically similar to the question
query = "How does the onboarding process work?"
results = vectorstore.similarity_search(query, k=3)

for doc in results:
    print(doc.page_content)
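Because the store was created with a persist_directory, you can reopen it in a later session without re-embedding anything. A minimal sketch, assuming the same legacy Chroma integration used above:
# Reload the persisted collection instead of rebuilding it
vectorstore = Chroma(
    persist_directory="./db",
    embedding_function=OpenAIEmbeddings()
)
results = vectorstore.similarity_search("How does the onboarding process work?", k=3)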
If you need cloud-based search or larger-scale indexing, Weaviate is a great alternative.
LangChain's Weaviate integration works almost identically: you change the vector store class and credentials, keep the same chunks and embeddings, and the rest of the pipeline stays untouched, as shown in the sketch below.
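Here is a hedged sketch of that swap, assuming the legacy langchain.vectorstores.Weaviate integration, the weaviate-client package installed, and a Weaviate instance reachable at the URL shown (the index name "Docs" is just an example):
from langchain.vectorstores import Weaviate

# Same chunks and embedding model as before; only the vector store changes
vectorstore = Weaviate.from_documents(
    chunks,
    embedding_model,
    weaviate_url="http://localhost:8080",  # point this at your Weaviate instance
    index_name="Docs"                      # example class name
)
results = vectorstore.similarity_search("How does the onboarding process work?", k=3)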
Here's what's happening under the hood: the user's question is embedded, the vector store returns the stored chunks whose embeddings are closest to it, and those chunks are placed in the prompt so the LLM answers from your content rather than from its training data alone.
This workflow dramatically improves accuracy and groundedness, especially for domain-specific answers.
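To make those steps concrete, here's a hedged sketch of the retrieve-then-generate loop written out by hand, reusing the vectorstore built above (the prompt wording and question are just illustrations):
from langchain.chat_models import ChatOpenAI

question = "How does the onboarding process work?"

# 1. Embed the question and fetch the closest chunks
docs = vectorstore.similarity_search(question, k=3)
context = "\n\n".join(doc.page_content for doc in docs)

# 2. Put the retrieved chunks into the prompt so the model answers from them
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)

# 3. Generate the grounded answer
llm = ChatOpenAI()
print(llm.predict(prompt))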
Use this setup when your app needs to answer questions grounded in your own content, whether that's PDFs, internal docs, knowledge bases, or product data.
You can also chain this with LangChain memory or tools for even richer interactions.
LangChain includes a high-level abstraction called ConversationalRetrievalChain, which combines a retriever, a chat model, and conversation memory in a single question-answering chain.
It's ideal for building production-ready document Q&A tools in minutes.
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

# The chain expects its conversation history under the key "chat_history",
# so the memory must be configured with that key
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(),
    retriever=vectorstore.as_retriever(),
    memory=memory
)

qa_chain.run({"question": "What is the refund policy?"})
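Because the chain keeps memory, a follow-up question can refer back to the previous exchange without restating it; the question below is just an illustration:
qa_chain.run({"question": "Does it apply to annual plans as well?"})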
You've now added semantic search capabilities to your LangChain app. That's a massive leap forward.
With vector search, your LLM can now answer questions from your own content instead of relying on its training data alone, keeping responses grounded and relevant.
In the next chapter, we'll bring it all together — showing you how to deploy LangChain apps in production using APIs and simple hosting strategies.