Intro to LangChain, LangGraph
By: Sid
Well, I've already spoken about how I've used MCPs at work to make a smart LLM that can do much more.
LLMs at a workplace are generally restricted, so I made them smarter by building my own MCP client and MCP server, letting them use tools to handle company-specific tasks much more efficiently.
Now I'll talk about how I combined LangChain, LangGraph, and the Model Context Protocol (MCP) to close the gap between cost-effective models such as GPT‑4o mini and top-tier models like GPT‑5, significantly reducing both AI spend and latency while boosting reliability ٩(^ᗜ^ )و
The issue with newer models
The whole issue with "smarter" models is the amount of time they actually waste in thinking and analyzing the question T~T
Newer models like GPT-5 are no doubt much smarter at determining which tools to use and how to use them, but as far as I've tested they're almost 2x slower than GPT-4o mini (at least on what I'm working on).
By just using an orchestrated workflow (langchain/langgraph), we can turn the output of a smaller LLM into production-quality results. Streaming, validation, retries, and checkpoints don’t just fix errors—they let a compact model hit the same task KPIs as a much larger one.
Introduction to LangChain
LangChain is a framework for building LLM-powered applications.
A good way to actually get what LangChain does:
- LangChain = LEGO blocks for LLM apps.
- Want GPT to read your PDFs? There's a block for that.
- Search Google? There's a block for that.
- Remember conversations? Another block for that.
You chain these together to build AI workflows without reinventing the wheel each time. It's basically like a middleware for LLMs.
It wraps your LLM, memory, prompts, and tools into chains or agents.
It decides when and why to use a tool.
Reference Architecture:
User → LangChain Agent → MCP Tools → Vendor APIs/Datastore
Underneath, a LangGraph StateGraph drives the flow: nodes for retrieval, planning, tool calls, validation, and summarization, with conditional edges for error recovery, retries, and human approvals.
The different components that LangChain brings to the table
I. LLM Abstraction
LangChain provides a unified interface to talk to any language model (OpenAI, Anthropic, etc.), so you don’t have to write custom code for each provider.
Want to change the model at any time? Just swap out the OpenAI block and put in the Anthropic block (see the sketch after this list).
The advantage of this?
- You can switch between LLM providers easily.
- You get retry handling, timeouts, streaming, etc., built-in.
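A rough sketch of what that swap looks like (the model names are placeholders, swap in whatever your org actually runs):

```python
# Minimal sketch of LangChain's chat-model abstraction.
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Switching providers really is just swapping the block; the rest of the
# chain (prompts, tools, parsers) stays untouched.
# llm = ChatAnthropic(model="claude-3-5-sonnet-latest", temperature=0)

response = llm.invoke("Summarize what an MCP server does in one sentence.")
print(response.content)
```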
II. Prompt Abstraction
A prompt is the instruction you give the LLM. LangChain treats prompts as structured, reusable templates.
This ensures that anything you want to say to an LLM is phrased in a single, uniform way.
It's a wonderful guardrail: the LLM understands various types of questions consistently, which reduces the potential for hallucination. (A small template sketch follows the list below.)
Why it’s useful:
- Prompts are cleanly separated from logic.
- Easy to reuse and test.
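For example, a minimal reusable template (the support-assistant wording here is just an illustration, not my actual prompt):

```python
from langchain_core.prompts import ChatPromptTemplate

# The wording lives in one place; variables get filled in at call time,
# so every request hits the LLM in the same uniform shape.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a support assistant. Answer only from the provided context."),
    ("human", "Context:\n{context}\n\nQuestion: {question}"),
])

messages = prompt.format_messages(
    context="Ticket #123 was resolved by restarting the ingestion job.",
    question="What fixed ticket #123?",
)
```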
Other Abstractions
| Abstraction | What it means |
|---|---|
| Tool | Wrapper around an external function/API |
| Memory | Stores previous messages or data |
| Retriever | Fetches relevant documents from a database or vector store |
| Output Parser | Converts LLM output to structured format |
| Document Loader | Loads data from files or databases |
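To make the Tool row concrete, here's a sketch of wrapping a (completely made-up) internal lookup as a LangChain tool:

```python
from langchain_core.tools import tool

# Hypothetical tool wrapping an internal API; the lookup is faked here.
@tool
def get_ticket_status(ticket_id: str) -> str:
    """Return the current status of an internal support ticket."""
    # In the real setup this would go through the MCP server / vendor API.
    return f"Ticket {ticket_id}: resolved"
```

The docstring matters: it's the description the LLM reads when deciding whether to call the tool.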
Chains in LangChain
A Chain connects multiple components (LLM, prompt, tool, memory) into a sequence or pipeline to solve a task.
There are various types of chains.
Simple Sequential Chains
Each step takes input → passes it to the next → and so on.
Issue: you can't control branching or logic, every step runs no matter what, and the final output is only a single value.
What if we need MIMO (multiple in, multiple out)? We can use sequential chains. These can process multiple inputs and give out multiple outputs, not just a single output (boringgg). A sketch of a simple two-step chain follows.
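Here's a sketch of a simple two-step chain using LangChain's pipe (LCEL) syntax, where each step's output feeds the next (the prompts are toy examples):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# Step 1 drafts an answer, step 2 tightens it; each step's output feeds the next.
draft = ChatPromptTemplate.from_template("Answer briefly: {question}") | llm | StrOutputParser()
polish = ChatPromptTemplate.from_template("Rewrite this in one sentence: {draft}") | llm | StrOutputParser()

chain = draft | (lambda text: {"draft": text}) | polish
print(chain.invoke({"question": "What does an MCP server expose?"}))
```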
Introduction to Agents
An Agent is a runtime LLM process that decides what actions (tools) to take next, based on user input and its memory/state.
- Chain: a fixed sequence (A → B → C)
- Agent: dynamic reasoning; the LLM decides if it should do A, skip B, or call a tool twice.
LangChain has built-in agent classes like:
`initialize_agent`, `AgentExecutor`, `ConversationalAgent`, `ToolCallingAgent` (a minimal sketch follows after the list below).
They typically:
- Accept a list of Tools (your functions or APIs)
- Pass user input to the LLM
- Let the LLM decide which tool to use, in what order
- Optionally store history or “memory”
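A minimal tool-calling agent sketch (the tool and the prompt wording are placeholders):

```python
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def get_ticket_status(ticket_id: str) -> str:
    """Return the current status of an internal support ticket."""
    return f"Ticket {ticket_id}: resolved"

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful internal assistant."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # where the agent's tool calls and results go
])

# The LLM decides whether (and how often) to call the tool; AgentExecutor runs the loop.
agent = create_tool_calling_agent(llm, [get_ticket_status], prompt)
executor = AgentExecutor(agent=agent, tools=[get_ticket_status])
print(executor.invoke({"input": "Is ticket 123 resolved?"}))
```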
Memory
This was another useful thing the orchestration frameworks provided. It made managing context and conversations much easier, helping the LLMs hallucinate less and understand more of what the user wanted compared to plain API calls.
Short-term memory (conversation context)
- Keeps a running transcript or summary of the conversation.
- Used in chatbots to maintain coherence across turns.
- In LangChain, classes like `ConversationBufferMemory` and `ConversationSummaryMemory`.
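A tiny sketch of the buffer style (newer LangChain versions push you toward LangGraph state for this, but the idea is the same):

```python
from langchain.memory import ConversationBufferMemory

# Keeps the raw turn-by-turn transcript; ConversationSummaryMemory would keep
# a rolling LLM-generated summary instead.
memory = ConversationBufferMemory(return_messages=True)
memory.save_context({"input": "My name is Sid."}, {"output": "Nice to meet you, Sid!"})
print(memory.load_memory_variables({}))  # previous turns get fed back into the next prompt
```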
Long-term or Graph Memory (LangGraph State)
- LangGraph expands “memory” into graph state — meaning:
- Each node can store inputs and outputs.
- The whole graph can persist across sessions.
- You can checkpoint, resume, or replay workflows.
- We can maintain long-term graph memory using open-source graph DBs. (A checkpointing sketch follows below.)
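Here's roughly what checkpointing looks like with LangGraph's in-memory saver (in production you'd point the checkpointer at a persistent store; the node itself is a stand-in):

```python
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    question: str
    answer: str

def answer_node(state: State) -> dict:
    # Stand-in: the real node would call the LLM / an MCP tool.
    return {"answer": f"echo: {state['question']}"}

builder = StateGraph(State)
builder.add_node("answer", answer_node)
builder.add_edge(START, "answer")
builder.add_edge("answer", END)

# The checkpointer persists state per thread_id, so a run can be resumed
# or replayed instead of started from scratch.
graph = builder.compile(checkpointer=MemorySaver())
result = graph.invoke(
    {"question": "status of ticket 123?"},
    config={"configurable": {"thread_id": "demo-1"}},
)
```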
* poof * LangGraph Enters the Chat
LangGraph is newer and more powerful for agentic workflows. Think of it as:
- A stateful, graph-based LangChain for LLMs with dynamic tool use and control flow.
What does it do better than LangChain?
1. Stateful Control: LangGraph lets you define agent flows as state graphs, with branch points, retries, validation, and approval gates (see the sketch after this list).
2. Reliability: Features like streaming, checkpointing, and human-in-the-loop reduce downtime and enable fast error recovery instead of full reruns.
3. Observability: Built-in tracing reveals token usage (super useful for cost management!!), run steps, and failure points for rapid debugging and optimization.
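To show what a branch point plus retry looks like, here's a stripped-down sketch (the validator just checks for valid JSON and gives the generation step three attempts before moving on):

```python
import json
from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    draft: str
    attempts: int

def generate(state: State) -> dict:
    # Stand-in for the cheap model's generation / tool-call step.
    return {"draft": '{"status": "resolved"}', "attempts": state["attempts"] + 1}

def route_after_validation(state: State) -> str:
    # Branch point: retry the generation a couple of times, otherwise move on.
    try:
        json.loads(state["draft"])
        return "ok"
    except ValueError:
        return "retry" if state["attempts"] < 3 else "ok"

builder = StateGraph(State)
builder.add_node("generate", generate)
builder.add_edge(START, "generate")
builder.add_conditional_edges("generate", route_after_validation, {"retry": "generate", "ok": END})
graph = builder.compile()
graph.invoke({"draft": "", "attempts": 0})
```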
How do they enhance the whole MCP Setup ?
MCP already defines and manages the tools. The tools get called, but the decision of which tool to call is made by the LLM based on each tool's description.
Where LangChain/LangGraph can help:
- Complex multi-step reasoning or data transformation between tool calls. (reduces the LLM's work 😎)
- Adding structured output parsing (JSON schemas for the data given by the backend; see the sketch after this list)
- Multi-model orchestration → complex tasks go to costlier LLMs, easier tasks to a cheaper & faster one (COST SAVINGGG)
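The structured-output bit in practice looks something like this (the schema is a made-up example of what backend data might map to):

```python
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

# Hypothetical schema for data coming back from the backend via MCP.
class TicketSummary(BaseModel):
    ticket_id: str
    status: str
    resolution: str

llm = ChatOpenAI(model="gpt-4o-mini")
structured_llm = llm.with_structured_output(TicketSummary)

# The model is constrained to return fields matching the schema, so downstream
# nodes can validate instead of parsing free text.
summary = structured_llm.invoke("Ticket 123 was resolved by restarting the ingestion job.")
```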
Cost and Latency: Real Savings
GPT‑4o mini is priced at roughly $0.15 per million input tokens and $0.30 per million output tokens, a fraction of what GPT‑5 costs.
Orchestration means expensive retries and invalid outputs are out, since LangGraph checkpoints and structured validation gate each step.
| Model | Input price (per 1M tokens) | Output price (per 1M tokens) | Speed | Reliability |
|---|---|---|---|---|
| GPT‑5 | $0.60 | $1.20 | High | High |
| GPT‑4o mini | $0.15 | $0.30 | Highest | Robust with LangGraph |
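A back-of-the-envelope comparison using the prices from the table (the token counts are made up purely for illustration; the small-model run gets extra tokens to account for retries and validation prompts):

```python
# (input, output) prices in $ per 1M tokens, taken from the table above
PRICES = {"gpt-5": (0.60, 1.20), "gpt-4o-mini": (0.15, 0.30)}

def cost_per_task(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

print(cost_per_task("gpt-5", 8_000, 2_000))         # single-shot on the big model
print(cost_per_task("gpt-4o-mini", 12_000, 3_000))  # small model + orchestration overhead
```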
How I did it:
- Expose capabilities via MCP servers. Use JSON‑RPC to wrap API access, retrieval, verifiers, etc. for consistent contracts.
- Wrap tools with LangChain interfaces. Create structured output parsers and validators, so your agents aren’t just guessing—they’re verifying.
- Define LangGraph StateGraphs. Map nodes for tool calls, branching fallbacks, user approval, and output validation. On error, checkpoint and skip reruns.
- Use LangGraph persistent state and checkpointing. Long runs can resume or recover instantly, instead of starting from scratch.
- Apply "small-first" orchestration. Use GPT‑4o mini for most agent work; escalate to a larger model only if validation or a human gate fails (sketch below).
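The escalation step, boiled down (model names and the validator are placeholders; in the real graph this is a conditional edge rather than a plain function):

```python
from langchain_openai import ChatOpenAI

small = ChatOpenAI(model="gpt-4o-mini")
large = ChatOpenAI(model="gpt-5")  # escalation target

def is_valid(answer: str) -> bool:
    # Placeholder validation gate (schema check, verifier tool, human approval, ...).
    return len(answer.strip()) > 0

def answer(question: str) -> str:
    draft = small.invoke(question).content
    if is_valid(draft):
        return draft
    # Escalate only on failure, so most traffic stays on the cheap model.
    return large.invoke(question).content
```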
Results:
When tested on my deployment, GPT‑4o mini combined with orchestration hit the same accuracy, structured-output rates, and reliability as GPT‑5, at under half the cost and with lower latency :D
- Accuracy Parity: Through retries, validation, and external tool-calls, small models deliver high accuracy for scoped workloads.
- Token Spend: Total token cost per resolved task fell by ~50-60% in production.
- Latency: StateGraph streaming and parallelization consistently dropped P95 completion times.
Something super small, but it saved us a shit ton of money and cut a hell of a lot of waiting (screw you, thinking mode) :D