As organizations increasingly rely on AI to interact with structured data, integrating LLMs with databases like MySQL is becoming a critical capability. LangChain, a powerful framework for orchestrating LLM applications, enables developers to create agents that can dynamically plan, query, and interpret data. In this guide, we’ll walk through how to build a multi-stage LangChain agent capable of interacting with a MySQL database—transforming natural language questions into SQL queries, executing them, and returning meaningful results.
What Is a Multi-Stage Agent in LangChain?
A multi-stage LangChain agent is designed to handle tasks in sequential phases or “stages.” For example:
- Understanding Intent – Parse the user’s query.
- Planning – Decide what actions (e.g., SQL queries) are needed.
- Execution – Perform the queries.
- Postprocessing – Summarize or analyze the output.
This architecture allows for modular, interpretable pipelines that combine the strengths of LLM reasoning and traditional database access.
Prerequisites
Before diving into the code, ensure you have the following installed:
- Python 3.10+
- MySQL database (local or remote)
- mysql-connector-python
- langchain
- openai
- python-dotenv (for environment variable management)

Also, make sure you have your OpenAI API key stored securely in a .env file:
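A minimal .env might look like the following (the variable names other than OPENAI_API_KEY are assumptions here; use whatever names your configuration code expects):

```
OPENAI_API_KEY=your-openai-api-key
MYSQL_HOST=localhost
MYSQL_USER=root
MYSQL_PASSWORD=your-password
MYSQL_DATABASE=company
```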
Sample MySQL Database Setup
Let’s use a basic employee database for demonstration:
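The exact schema isn’t critical; a minimal sketch of a company database with a single employees table (the table, column names, and seeded rows below are illustrative assumptions) could be:

```sql
CREATE DATABASE IF NOT EXISTS company;
USE company;

-- A small employees table: enough structure for natural-language queries
-- about names, departments, and salaries.
CREATE TABLE employees (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    department VARCHAR(50) NOT NULL,
    salary DECIMAL(10, 2) NOT NULL
);

-- Illustrative sample rows.
INSERT INTO employees (name, department, salary) VALUES
    ('Alice Johnson', 'Engineering', 118000.00),
    ('Raj Patel', 'Engineering', 105000.00),
    ('Bob Smith', 'Engineering', 95000.00),
    ('Carol White', 'Marketing', 88000.00);
```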
Environment and Database Setup
Create a file config.py to manage your environment and database connection:
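A sketch of config.py, assuming the environment variable names shown in the .env example (MYSQL_HOST, MYSQL_USER, MYSQL_PASSWORD, and MYSQL_DATABASE are illustrative defaults, not requirements):

```python
import os

try:
    # python-dotenv loads variables from .env into the process environment.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # Fall back to plain environment variables if python-dotenv is absent.
    pass

# Connection settings, with local-development defaults.
DB_CONFIG = {
    "host": os.getenv("MYSQL_HOST", "localhost"),
    "user": os.getenv("MYSQL_USER", "root"),
    "password": os.getenv("MYSQL_PASSWORD", ""),
    "database": os.getenv("MYSQL_DATABASE", "company"),
}


def get_connection():
    """Open a new MySQL connection. mysql.connector is imported lazily
    so this module can be loaded without a running database."""
    import mysql.connector

    return mysql.connector.connect(**DB_CONFIG)
```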
Defining the SQL Execution Tool
LangChain agents work with tools. Here we define a tool that allows an agent to run SQL queries:
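A sketch of such a tool, assuming the get_connection helper from config.py and the classic langchain Tool interface; both are imported lazily so the read-only guard can be exercised on its own, and the guard itself is a deliberately cheap safety check, not a full SQL validator:

```python
READ_ONLY_PREFIXES = ("select", "show", "describe", "explain")


def is_safe_query(query: str) -> bool:
    """Allow only read-only statements; a cheap guard against destructive SQL."""
    return query.strip().lower().startswith(READ_ONLY_PREFIXES)


def run_sql_query(query: str) -> str:
    """Execute a read-only SQL query and return the rows as text."""
    if not is_safe_query(query):
        return "Refused: only SELECT/SHOW/DESCRIBE/EXPLAIN queries are allowed."
    from config import get_connection  # lazy: needs a live MySQL server

    conn = get_connection()
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        rows = cursor.fetchall()
        return "\n".join(str(row) for row in rows) or "(no rows)"
    finally:
        conn.close()


def make_sql_tool():
    """Wrap run_sql_query as a LangChain Tool the agent can call."""
    from langchain.agents import Tool  # lazy: needs langchain installed

    return Tool(
        name="execute_sql",
        func=run_sql_query,
        description="Execute a read-only SQL query against the employee database.",
    )
```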
Designing the Prompt Templates
LangChain uses PromptTemplates to guide LLM behavior. We’ll create one for converting user questions to SQL and another for summarizing the results:
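One way to sketch these two prompts; the template wording and variable names are assumptions, and the PromptTemplate objects are built lazily so the raw templates can be inspected without LangChain installed:

```python
# Stage 1-2 prompt: turn a natural-language question into a single SQL query.
SQL_TEMPLATE = """You are a MySQL expert. Given the table below, write a single
read-only SQL query that answers the user's question. Return only the SQL.

Table: employees(id, name, department, salary)

Question: {question}
SQL:"""

# Stage 4 prompt: turn raw rows back into a plain-English answer.
SUMMARY_TEMPLATE = """Given the user's question and the raw rows returned by
the database, write a short, plain-English answer.

Question: {question}
Rows:
{rows}

Answer:"""


def build_prompts():
    """Wrap the raw template strings in LangChain PromptTemplates."""
    from langchain.prompts import PromptTemplate  # lazy: needs langchain

    sql_prompt = PromptTemplate(input_variables=["question"], template=SQL_TEMPLATE)
    summary_prompt = PromptTemplate(
        input_variables=["question", "rows"], template=SUMMARY_TEMPLATE
    )
    return sql_prompt, summary_prompt
```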
Wiring the Stages Together
Here’s how we wire the stages together using LangChain’s LLMChain and AgentExecutor:
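A sketch of the orchestration, assuming run_sql_query and build_prompts from the earlier steps live in hypothetical sql_tool and prompts modules; for clarity this version chains the stages explicitly with LLMChain rather than delegating planning to an AgentExecutor, which works equally well if you prefer tool-driven planning. strip_code_fences is a small hypothetical helper for cleaning fenced SQL out of the model’s reply:

```python
def strip_code_fences(text: str) -> str:
    """LLMs often wrap SQL in ``` fences; remove them if present."""
    text = text.strip()
    if text.startswith("```"):
        lines = text.splitlines()[1:]          # drop opening fence (and language tag)
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]                 # drop closing fence
        text = "\n".join(lines).strip()
    return text


def answer_question(question: str) -> str:
    """Run the full pipeline: question -> SQL -> rows -> summary."""
    from langchain.chains import LLMChain           # lazy: needs langchain
    from langchain.chat_models import ChatOpenAI    # lazy: needs langchain + openai
    from prompts import build_prompts               # hypothetical module from the prompt step
    from sql_tool import run_sql_query              # hypothetical module from the tool step

    llm = ChatOpenAI(model="gpt-4", temperature=0)
    sql_prompt, summary_prompt = build_prompts()

    # Stages 1-2: understand the question and plan a SQL query.
    raw_sql = LLMChain(llm=llm, prompt=sql_prompt).run(question=question)
    sql = strip_code_fences(raw_sql)
    # Stage 3: execute the query.
    rows = run_sql_query(sql)
    # Stage 4: summarize the result for the user.
    return LLMChain(llm=llm, prompt=summary_prompt).run(question=question, rows=rows)
```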
Add Logging and Error Handling (Optional but Recommended)
To make the agent production-grade, wrap calls in try-except blocks and optionally log input/output:
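A minimal wrapper using the standard logging module; agent_fn stands in for whatever pipeline entry point you built above:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("sql_agent")


def safe_answer(question: str, agent_fn) -> str:
    """Call agent_fn(question), logging input and output, and convert any
    failure into a friendly message instead of a traceback."""
    logger.info("question: %s", question)
    try:
        answer = agent_fn(question)
        logger.info("answer: %s", answer)
        return answer
    except Exception:
        # logger.exception records the full traceback for later auditing.
        logger.exception("agent failed for question: %s", question)
        return "Sorry, I couldn't answer that question. Please try rephrasing it."
```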
Sample Output
Here’s a sample output for the question: “Which employees in the Engineering department earn more than $100,000?”
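With illustrative sample data, the exchange might look along these lines (both the generated SQL and the names and figures below are hypothetical, tied to whatever rows you seeded):

```
SQL generated:
  SELECT name, salary FROM employees
  WHERE department = 'Engineering' AND salary > 100000;

Answer:
  Two employees in Engineering earn more than $100,000:
  Alice Johnson ($118,000) and Raj Patel ($105,000).
```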
Optional Enhancements
- Add SQL validation using libraries like sqlparse.
- Use vector-based context augmentation (e.g., for complex schemas).
- Replace MySQL with PostgreSQL or SQLite with minor adjustments.
- Add an interface using Streamlit or FastAPI for real-time querying.
Conclusion
In today’s data-driven era, where stakeholders across business functions demand real-time access to insights, the ability to converse with databases using natural language has become a game-changer. The rise of large language models (LLMs) like OpenAI’s GPT-4, combined with orchestration frameworks such as LangChain, enables developers to build powerful agents that bridge the gap between complex data systems and human interaction. A multi-stage LangChain agent for MySQL is a practical, extensible solution that embodies this evolution.
By architecting the agent in clearly defined stages—intent interpretation, SQL generation, query execution, and result summarization—we create a robust pipeline that is both explainable and flexible. Each stage can be independently developed, tested, optimized, or replaced, offering modularity that simplifies long-term maintenance. For instance, the SQL generation component can be fine-tuned for accuracy, the query execution layer can be optimized for performance, and the summarization stage can be enhanced with domain-specific knowledge or formatting tailored to business needs.
This layered approach also provides significant advantages for debugging and transparency. If something goes wrong—say, an incorrect SQL query or unexpected result—the issue can be traced to a specific module in the chain. This makes it far easier to audit the system, enforce security protocols (like query validation or sandboxing), and monitor its real-world performance in production environments.
Moreover, using LangChain’s agent architecture, you can integrate additional tools beyond SQL execution. You could add capabilities like email generation, PDF report creation, Slack notifications, or even integrations with business intelligence dashboards—all driven by natural language prompts. The foundation you build here is not just for database interaction but a template for enterprise-level LLM-enabled workflows.
Importantly, the use of LLMs with structured data must be approached with responsibility. It’s essential to implement safeguards such as query validation, result filtering, and access control to prevent misuse or leakage of sensitive data. Prompt engineering also plays a critical role in ensuring that the LLM’s instructions are precise, context-aware, and tuned to the data model’s structure and constraints.
The MySQL database used in this walkthrough can easily be swapped for other relational databases like PostgreSQL, Oracle, or even cloud-native solutions such as Amazon RDS or Google Cloud SQL. With minimal adjustments, the agent’s SQL execution and connection layers can be adapted to fit your organization’s tech stack, making this solution highly portable.
Ultimately, building a multi-stage LangChain agent for MySQL is more than a cool tech demo—it’s a foundational capability for modern, intelligent applications. Whether you’re in finance, healthcare, logistics, or e-commerce, the ability to empower users with conversational data access unlocks enormous business value. By combining the precision of SQL, the accessibility of natural language, and the intelligence of LLMs, this architecture offers a powerful blueprint for the future of human-computer interaction in data-rich environments.
As you continue to develop your own LangChain-powered agents, consider how you can:
- Expand the agent’s domain to include analytics or forecasting.
- Integrate with other enterprise APIs for richer automation.
- Apply reinforcement learning or prompt tuning for better outputs.
The journey toward intelligent, conversational data agents has just begun—and with LangChain and MySQL as your allies, you’re already well on your way.