LangChain RAG Pipeline: Complete Production Tutorial

Category: AI Coding | Difficulty: Intermediate | Updated: 2026-05-28

Build a production-ready RAG pipeline with LangChain. Covers document loading, chunking, vector stores, retrieval chains, and streaming responses with OpenAI embeddings and ChromaDB.

Why LangChain for RAG?

LangChain is one of the most widely used frameworks for building RAG systems. It provides reusable components for document loading, text splitting, vector stores, and retrieval chains, so you can focus on application logic instead of boilerplate integration code.

1. Setup

pip install langchain langchain-community langchain-openai langchain-chroma langchain-text-splitters chromadb "unstructured[md]"

2. Load & Split Documents

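# DirectoryLoader parses files with the `unstructured` package by default
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Recursively load every Markdown file under ./docs/
loader = DirectoryLoader("./docs/", glob="**/*.md")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # measured in characters by default, not tokens
    chunk_overlap=200,    # characters shared between neighbouring chunks
    separators=["\n\n", "\n", " ", ""]  # tried in order: paragraphs, lines, words, characters
)
chunks = text_splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")
# Example output (the count depends on your documents): Split into 47 chunks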
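
To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal sketch of a fixed-size sliding window over characters. It is a simplified illustration only, not how RecursiveCharacterTextSplitter works internally (the real splitter tries the separators first and only falls back to smaller units when a piece is still too long). The function name sliding_window_chunks and the sample string are made up for this example.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks_demo = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks_demo)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each window repeats the last 4 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.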

3. Create Vector Store

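from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# OpenAIEmbeddings reads the API key from the OPENAI_API_KEY environment variable
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Embed every chunk and persist the index to disk
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
# _collection is a private attribute and may change between releases
print(f"Indexed {vectorstore._collection.count()} vectors")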
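
Conceptually, a vector store answers a query by comparing the query embedding against the stored chunk embeddings and returning the closest ones. The sketch below shows that idea with cosine similarity in plain Python. The three-dimensional vectors and file names are hand-made toy values, not real embeddings, and the distance metric Chroma actually uses depends on how the collection is configured, so treat this as an illustration of the principle rather than of Chroma's internals.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy index: hypothetical chunk names mapped to made-up 3-d "embeddings"
toy_index = {
    "install.md": [1.0, 0.0, 0.0],
    "requirements.md": [0.0, 1.0, 0.0],
    "faq.md": [0.0, 0.6, 0.8],
}
query_vec = [0.0, 1.0, 0.2]
print(top_k(query_vec, toy_index, k=2))
# ['requirements.md', 'faq.md']
```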

4. Build the RAG Chain

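# These imports match the LangChain 0.2/0.3 series. If you are on LangChain 1.x and
# they fail, check the langchain-classic package, where legacy chains were moved.
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate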

llm = ChatOpenAI(model="gpt-4o", temperature=0)

system_prompt = (
    "You are a helpful assistant. Answer the question using ONLY the provided context. "
    'If you cannot answer from the context, say "I don\'t have enough information." '
    "Always cite the source document names.\n\n"
    "Context: {context}"
)

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}")
])

document_chain = create_stuff_documents_chain(llm, prompt)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
rag_chain = create_retrieval_chain(retriever, document_chain)

# Query
response = rag_chain.invoke({"input": "What are the system requirements?"})
print(response["answer"])
print(f"Sources: {[d.metadata['source'] for d in response['context']]}")
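
The "stuff" strategy used above amounts to concatenating the retrieved chunks and placing the result in the {context} slot of the prompt. The sketch below mimics that step in plain Python and labels each chunk with its source, which matters here because the system prompt asks the model to cite source names and it can only cite what appears in the context. The Doc class, the build_context helper, and the sample texts are hypothetical stand-ins for illustration, not LangChain objects. In LangChain itself, create_stuff_documents_chain accepts a document_prompt argument for this kind of per-chunk formatting; check the API reference for your version before relying on it.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_context(docs: list[Doc], separator: str = "\n\n") -> str:
    """Join retrieved chunks into one string, labelling each with its source."""
    parts = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown")
        parts.append(f"[source: {source}]\n{doc.page_content}")
    return separator.join(parts)


# Made-up retrieved chunks, for illustration only
retrieved = [
    Doc("Requires Python 3.10 or newer.", {"source": "docs/install.md"}),
    Doc("At least 8 GB of RAM is recommended.", {"source": "docs/hardware.md"}),
]
context_demo = build_context(retrieved)
print(context_demo)
```

With the source label inside the context string, the instruction to cite document names has something concrete to draw on.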

5. Key Parameters to Tune

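chunk_size: 500-1500 characters (RecursiveCharacterTextSplitter counts characters by default, not tokens). Smaller chunks give more precise matches; larger chunks carry more surrounding context
chunk_overlap: 10-20% of chunk_size. Prevents information loss at chunk boundaries
k (retrieved chunks): 3-5 for most use cases. Higher values give richer context but consume more prompt tokens
temperature: 0 for factual Q&A, 0.3-0.7 for creative synthesis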
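
The trade-off between chunk_size and k can be estimated with simple arithmetic: the retrieved context is at most k chunks of chunk_size characters each. The sketch below converts that to an approximate token count. The figure of about 4 characters per token is an assumption (a common rough heuristic for English text), not a measured value; use a real tokenizer when the number matters. The function name estimate_context_tokens is made up for this example.

```python
def estimate_context_tokens(chunk_size_chars: int, k: int, chars_per_token: float = 4.0) -> int:
    """Upper-bound estimate of prompt tokens used by the retrieved context.

    chars_per_token=4.0 is an assumed rough average, not an exact figure.
    """
    return round(chunk_size_chars * k / chars_per_token)


# Values from this tutorial: chunk_size=1000 characters, k between 3 and 5
for k in (3, 4, 5):
    print(k, estimate_context_tokens(chunk_size_chars=1000, k=k))
# 3 750
# 4 1000
# 5 1250
```

Under that assumption, the settings used in this tutorial (chunk_size=1000, k=4) add roughly 1,000 tokens of context to every query, before the system prompt and the question itself.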