Data Agents and the Multi-Model Future

The last post laid out why connecting an LLM directly to your warehouse ends badly. The model reasons from syntax, not meaning. It has no business context, no entity definitions, no guardrails. The output is fluent and wrong.

So what works?

Why data agents change the math

The data platform is the right place to build intelligence. Not because any one vendor’s models are the best. Because the governed data is already there.

A data agent operates directly on the data in your warehouse. Semantic search, classification, summarization, extraction. All of it running on data that is already governed, already access-controlled, already defined by the semantic models your team built. The intelligence inherits the governance. It sees tables as your semantic layer defines them. It respects the role-based access model. When you ask it about customers, it is operating on your governed definition of customer, not its best guess from parsing a schema.
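To make "the intelligence inherits the governance" concrete, here is a minimal sketch of what resolving an entity through a semantic layer could look like. Everything in it is an assumption for illustration: the `EntityDefinition` and `SemanticLayer` classes, the table name, the filter, and the role names are hypothetical and do not correspond to any vendor's API. The point is only that the agent asks for "customer" and gets back the governed definition, or a refusal, rather than guessing from a schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityDefinition:
    """A governed definition of a business entity (hypothetical structure)."""
    name: str
    table: str
    filter_sql: str
    allowed_roles: frozenset


@dataclass
class SemanticLayer:
    """Toy stand-in for a semantic layer; not a real platform API."""
    entities: dict = field(default_factory=dict)

    def register(self, entity: EntityDefinition) -> None:
        self.entities[entity.name] = entity

    def resolve(self, entity_name: str, role: str) -> str:
        """Return the governed query for an entity, or refuse."""
        entity = self.entities.get(entity_name)
        if entity is None:
            # No definition means no answer, not a best guess.
            raise KeyError(f"No governed definition for '{entity_name}'")
        if role not in entity.allowed_roles:
            raise PermissionError(f"Role '{role}' may not read '{entity_name}'")
        return f"SELECT * FROM {entity.table} WHERE {entity.filter_sql}"


layer = SemanticLayer()
layer.register(EntityDefinition(
    name="customer",
    table="analytics.dim_customer",          # assumed table name
    filter_sql="is_test_account = FALSE",     # assumed business rule
    allowed_roles=frozenset({"analyst", "data_agent"}),
))

print(layer.resolve("customer", role="data_agent"))
```

A role outside `allowed_roles` gets a `PermissionError` instead of data, which is the behavior you want the agent to inherit rather than reimplement.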

This does not make the agent problem trivial. But it removes the single largest source of failure: the translation gap between what the model thinks the data means and what the data means in your business.

Tools first, agents later

The iteration path matters more than the destination. I have watched too many teams try to build an autonomous agent on day one. They skip the intermediate steps and wonder why the output is unreliable.

Start with AI functions as tools. Let your analysts use natural language search over structured data. Let your BI consumers ask questions grounded in the governed tables. None of this is an agent. It is a set of tools. And it is valuable because it compresses the time between question and answer without sacrificing data quality.

Then build workflows. A function that monitors a metric daily and fires an alert when something anomalous happens. A summarizer that flags data quality issues before anyone opens a dashboard. Still not an agent. Still useful. And every one of these tools validates your semantic model, your entity definitions, and your governance layer under real usage.
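A daily metric monitor of the kind described above can be very small. The sketch below uses a plain z-score check as a stand-in for whatever anomaly detection your platform offers; the function names, the threshold of 3 standard deviations, and the sample numbers are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, z_threshold=3.0):
    """Flag `latest` if it is more than `z_threshold` standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is unusual
    return abs(latest - mu) / sigma > z_threshold


def check_metric(name, history, latest):
    """Return an alert message for a human, or None if the value looks normal."""
    if is_anomalous(history, latest):
        return f"ALERT: {name} = {latest}, outside the usual range (mean {mean(history):.1f})"
    return None


# Hypothetical last seven days of a governed metric.
daily_active_customers = [1020, 1005, 998, 1012, 1030, 1009, 1017]

print(check_metric("daily_active_customers", daily_active_customers, 640))
print(check_metric("daily_active_customers", daily_active_customers, 1015))
```

The workflow only notices and reports. Deciding what to do about the alert stays with a person, which is what keeps this on the tool side of the line.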

By the time you are ready to build a real agent, you have already pressure-tested the foundation with hundreds of tool-level interactions. You know which entity definitions hold up and which break. You know where the boundaries need to be tight and where they can be loose. Every tool-level interaction is a test case for the agent you will build later. Skip the tools, and you are testing the agent in production.

The multi-model reality

Your data platform’s native AI is not the only model your organization will use. Claude, ChatGPT, Gemini, open-source models: they all have strengths. The future is not one model. The future is an orchestrated set of models, each doing what it does best.

But every one of them has the same problem: they need grounded data to produce grounded output. If the context you hand them is raw table dumps, you get the catastrophe from the last post. If that context is governed, entity-resolved, semantically modeled data, you get answers that align with how your business operates.

The architecture: your data platform is the governed layer. Data agents handle the functions that run close to the data (search, classification, anomaly detection, summarization at scale). External models like Claude or ChatGPT handle the tasks that benefit from their specific strengths (complex reasoning, long-form generation, multi-step planning). Both consume the same semantic model. Both operate on the same entity definitions. Both respect the same access controls.
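One way to picture this architecture is a thin router that decides where a task runs but hands every handler the same definitions. The sketch below is illustrative only: the task categories come from the paragraph above, while `SEMANTIC_CONTEXT`, `Request`, `route`, and the handler names are hypothetical, and no real model or platform API is called.

```python
from dataclasses import dataclass

# Shared, governed definitions that every model receives (assumed content).
SEMANTIC_CONTEXT = {
    "customer": "One row per billing account, test accounts excluded",
    "active": "At least one paid transaction in the last 30 days",
}

# Tasks that run close to the data vs. tasks sent to an external model.
NEAR_DATA_TASKS = {"search", "classification", "anomaly_detection", "summarization"}
EXTERNAL_TASKS = {"complex_reasoning", "long_form_generation", "multi_step_planning"}


@dataclass
class Request:
    task_type: str
    question: str
    role: str


def route(request: Request) -> dict:
    """Pick a handler for the task; attach the same definitions either way."""
    if request.task_type in NEAR_DATA_TASKS:
        handler = "data_agent"
    elif request.task_type in EXTERNAL_TASKS:
        handler = "external_model"
    else:
        raise ValueError(f"Unknown task type: {request.task_type}")
    return {
        "handler": handler,
        "role": request.role,             # access control travels with the request
        "question": request.question,
        "definitions": SEMANTIC_CONTEXT,  # same semantic model for every handler
    }


near = route(Request("classification", "Tag churn-risk customers", "analyst"))
far = route(Request("multi_step_planning", "Plan a retention campaign", "analyst"))
print(near["handler"], far["handler"])
```

The routing rule is the least interesting part. What matters is that `definitions` and `role` are identical regardless of which model ends up doing the work.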

The error bars shrink because the foundation is shared. When Claude, your data agents, and your dashboards all agree on what a customer is, what “active” means, and how revenue is calculated, the AI output converges with the human-trusted output. Not because the models got smarter, but because you stopped asking them to guess what your data means.

The trust gradient

Not every use case needs a fully autonomous agent. Most do not. The right question is not “how do we build an agent?” It is “how much autonomy does this use case need?”

Most organizations should be building alerts and recommendations right now. The system notices something and tells a human. The system evaluates a situation and suggests an action. Low risk, high value, and every interaction builds the evidence base for whether you can eventually trust the system to act on its own.

When your entity model has been tested by a thousand tool-level queries and the observed error rate sits below a threshold you set in advance, then you move to autonomous action. Not before.
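The gate itself can be written down as a simple rule. In the sketch below, the thousand-query minimum echoes the figure above, but the 1% error ceiling, the `Autonomy` levels, and the function name are assumptions chosen for illustration; pick your own thresholds per use case.

```python
from enum import Enum


class Autonomy(Enum):
    HUMAN_IN_LOOP = "alerts and recommendations"
    AUTONOMOUS = "autonomous action"


def allowed_autonomy(total_queries, errors, min_queries=1000, max_error_rate=0.01):
    """Decide how much autonomy the evidence supports.

    `min_queries` and `max_error_rate` are illustrative defaults,
    not recommendations.
    """
    if total_queries < min_queries:
        return Autonomy.HUMAN_IN_LOOP  # not enough evidence yet
    error_rate = errors / total_queries
    if error_rate > max_error_rate:
        return Autonomy.HUMAN_IN_LOOP  # evidence exists, but it is not good enough
    return Autonomy.AUTONOMOUS


print(allowed_autonomy(total_queries=250, errors=1))
print(allowed_autonomy(total_queries=1200, errors=40))
print(allowed_autonomy(total_queries=1200, errors=6))
```

Both conditions have to hold. A low error rate over a handful of queries is not evidence, and a large volume of queries with a high error rate is evidence against.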

The line

The last post said that connecting LLMs directly to data leaves them reasoning from syntax. This post is the alternative: build the governed layer, let data agents mediate between the models and the meaning, and earn trust incrementally.

A data agent is not magic. It is a set of AI functions running on governed data. That is what makes it valuable. Not the model sophistication. The proximity to truth.

Build the tools. Validate the foundation. Earn the trust. Then, and only then, let the agents run.

