Power your app with fast, semantic search — retrieve the right chunk at the right time using Chroma or Weaviate.
LLMs are powerful, but they have limits — especially when users ask questions about your content, like PDFs, docs, or product data.
That's where vector search comes in.
By combining LangChain with a vector database like Chroma or Weaviate, you can retrieve the most relevant content before asking the LLM to generate an answer — a strategy known as Retrieval-Augmented Generation (RAG).
In this chapter, you'll learn how to load and split documents, embed them, store and search them with a vector database like Chroma or Weaviate, and feed the retrieved content to an LLM for grounded answers.
You'll create a RAG-style system with this flow:
User question → Vector DB search → Top matching docs → LLM response using retrieved content
This is the backbone of tools like Chat with PDF, AI Knowledge Bases, and Enterprise GPT assistants.
Let's start with ChromaDB, an easy-to-use local vector store that works well for small to mid-scale apps.
pip install chromadb langchain openai
Set your OpenAI API key, or configure another embedding model if you prefer.
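One simple option is to export the key as an environment variable before creating the embeddings object; OpenAIEmbeddings reads OPENAI_API_KEY by default. A minimal sketch:
import os

# Either export OPENAI_API_KEY in your shell, or set it in code like this
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your actual key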
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the raw document
loader = TextLoader("my-doc.txt")
docs = loader.load()

# Split it into overlapping chunks small enough to embed and retrieve individually
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Embed each chunk and store the vectors in a local, persistent Chroma collection
embedding_model = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    chunks,
    embedding_model,
    persist_directory="./db"
)
# Retrieve the 3 chunks most semantically similar to the question
query = "How does the onboarding process work?"
results = vectorstore.similarity_search(query, k=3)

for doc in results:
    print(doc.page_content)
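Because the store was created with a persist_directory, you can reopen it in a later session without re-embedding anything. A minimal sketch, assuming the same legacy Chroma integration used above:
# Reload the persisted collection instead of rebuilding it
vectorstore = Chroma(
    persist_directory="./db",
    embedding_function=OpenAIEmbeddings()
)
results = vectorstore.similarity_search("How does the onboarding process work?", k=3)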
If you need cloud-based search or larger-scale indexing, Weaviate is a great alternative.
LangChain's Weaviate integration works almost identically: you change the vector store class and credentials, keep the same chunks and embeddings, and the rest of the pipeline stays untouched, as shown in the sketch below.
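Here is a hedged sketch of that swap, assuming the legacy langchain.vectorstores.Weaviate integration, the weaviate-client package installed, and a Weaviate instance reachable at the URL shown (the index name "Docs" is just an example):
from langchain.vectorstores import Weaviate

# Same chunks and embedding model as before; only the vector store changes
vectorstore = Weaviate.from_documents(
    chunks,
    embedding_model,
    weaviate_url="http://localhost:8080",  # point this at your Weaviate instance
    index_name="Docs"                      # example class name
)
results = vectorstore.similarity_search("How does the onboarding process work?", k=3)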
Here's what's happening under the hood: the user's question is embedded, the vector store returns the stored chunks whose embeddings are closest to it, and those chunks are placed in the prompt so the LLM answers from your content rather than from its training data alone.
This workflow dramatically improves accuracy and groundedness, especially for domain-specific answers.
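To make those steps concrete, here's a hedged sketch of the retrieve-then-generate loop written out by hand, reusing the vectorstore built above (the prompt wording and question are just illustrations):
from langchain.chat_models import ChatOpenAI

question = "How does the onboarding process work?"

# 1. Embed the question and fetch the closest chunks
docs = vectorstore.similarity_search(question, k=3)
context = "\n\n".join(doc.page_content for doc in docs)

# 2. Put the retrieved chunks into the prompt so the model answers from them
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)

# 3. Generate the grounded answer
llm = ChatOpenAI()
print(llm.predict(prompt))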
Use this setup when your app needs to answer questions grounded in your own content, whether that's PDFs, internal docs, knowledge bases, or product data.
You can also chain this with LangChain memory or tools for even richer interactions.
LangChain includes a high-level abstraction called ConversationalRetrievalChain, which combines a retriever, a chat model, and conversation memory in a single question-answering chain.
It's ideal for building production-ready document Q&A tools in minutes.
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

# The chain expects its conversation history under the key "chat_history",
# so the memory must be configured with that key
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(),
    retriever=vectorstore.as_retriever(),
    memory=memory
)

qa_chain.run({"question": "What is the refund policy?"})
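Because the chain keeps memory, a follow-up question can refer back to the previous exchange without restating it; the question below is just an illustration:
qa_chain.run({"question": "Does it apply to annual plans as well?"})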
You've now added semantic search capabilities to your LangChain app. That's a massive leap forward.
With vector search, your LLM can now answer questions from your own content instead of relying on its training data alone, keeping responses grounded and relevant.
In the next chapter, we'll bring it all together — showing you how to deploy LangChain apps in production using APIs and simple hosting strategies.