The analyst who just wanted one number
Imagine you’re the first data hire at a startup. On day one, things are beautiful. There’s a single Postgres database, a handful of tables, and a CEO who asks straightforward questions: “How many users signed up last week?” You write a query and get a number. Everyone’s happy.
Fast-forward eighteen months.
The company has grown. There’s a Snowflake warehouse now, plus a legacy MySQL instance nobody wants to touch but everyone still reads from. Marketing runs their own Redshift cluster. Someone in finance built a “data pipeline” that is, in fact, a Google Sheet with seventeen tabs and an IMPORTRANGE formula held together by hope alone.
You’re still being asked the same question as on day one – “How many users signed up last week?” – but now it takes you half a day to produce the number. Not because the query is hard, but because you have to figure out which database has the canonical answer, whether the ETL ran last night, and why the users table in Snowflake has 15% more rows than the one in Postgres.
This isn’t a failure of technology. It’s what happens when every team solves its own data problem locally, and no one keeps an eye on the big picture.
How “just one more fix” becomes permanent architecture
If you’ve spent any time inside a production warehouse, you’ve probably seen the naming graveyard: users_active, users_active_v2, users_active_FINAL, users_active_FINAL_use_this_one. Every one of those columns was someone’s fix to someone else’s fix. And somewhere, a dashboard is still pointing at the wrong one.
Here’s the pattern. A company starts clean, but then growth introduces pressure – a new product line, an acquisition, a pivot. Each event spawns a “temporary” workaround. A quick-fix table that becomes the source of truth for half the company. A one-off script that becomes a nightly cron job. A dashboard filter that encodes a business rule nobody documented.
These workarounds compound. Within a couple of years, the data stack isn’t a stack anymore – it’s an archaeological dig site. Every layer tells you something about what the company used to care about, but not necessarily what it cares about now.
The real damage isn’t technical debt – it’s trust debt. When analysts can’t confidently explain where a number comes from, decision-makers stop trusting data and go back to gut instinct. You’ve built a million-dollar analytics infrastructure that people treat like a weather forecast – interesting, but not something you’d bet the quarter on.
It’s not a naming problem – it’s a people problem
The root cause is almost never sloppy engineering. It’s organizational. Sales counts revenue at the moment a deal closes. Finance recognizes it when the invoice is paid. Product attributes it to the month the customer first activated. Each team has a defensible reason for their definition – and all three end up in the same warehouse under variants of the same column name.
The industry calls this a semantic gap – the distance between what a column is named and what it actually means. Documentation helps, but data dictionaries go stale the moment someone adds a new column without updating the wiki. Business definitions evolve in Slack threads and never make it into the codebase.
The self-service fantasy
For the past decade, every BI vendor has promised the same dream: Give business users a tool, and they’ll answer their own questions. No more ticket queues. No more waiting on the data team. Just drag, drop, and discover.
It hasn’t worked. Not because the tools are bad – modern BI platforms are genuinely impressive – but because self-service assumes shared understanding. When a marketing director drags “revenue” onto a dashboard, they expect it to mean what they mean by revenue. If it doesn’t, they don’t file a bug report. They export to Excel and fix it themselves.
Now multiply that across fifty people in ten departments. You end up with fifty slightly different versions of truth, each living in someone’s laptop. Everyone has access to the data, but nobody shares the same understanding of what it means.
Enter AI – and the problem gets worse before it gets better
Here’s where things get interesting. We’re now in 2026, and AI agents are starting to replace dashboards for ad-hoc questions. “Just ask the AI” is the new “just check the dashboard”.
But an AI agent querying a messy warehouse will confidently produce messy answers. It won’t tell you that the users table has a known deduplication issue. It won’t flag that revenue in the finance schema means something different from revenue in the product schema. It will do exactly what you asked, give you a number, and move on. Text-to-SQL is powerful, but it’s only as good as the foundation it’s built on.
The risk isn’t that AI gives wrong answers – it’s that it gives wrong answers that look right. A hallucinated metric in a polished chart is harder to catch than a broken query that throws an error.
What actually fixes this
Is this just the way it is? Not necessarily. The fix isn’t another tool on top of the pile. It’s a layer underneath – what data engineers call a semantic layer. Think of it as a single source of truth for business definitions, written in code. “Churn” means one thing. “MRR” is calculated one way. Every query – whether from a human, a dashboard, or an AI agent – goes through this layer.
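To make the idea concrete, here is a deliberately minimal sketch of a semantic layer in Python. It is an illustration of the concept, not Databao’s implementation or any product’s API: the metric names, table names, and SQL in it are invented for the example. The point is that every definition lives in exactly one place, and anything asking for a metric goes through that place or fails loudly.

```python
# A toy semantic layer: each business metric has exactly one canonical
# definition, stored in code. All table and column names here are
# hypothetical examples.

METRICS = {
    "mrr": {
        "description": "Monthly recurring revenue: sum of active subscription amounts.",
        "sql": "SELECT SUM(amount) FROM subscriptions WHERE status = 'active'",
    },
    "churn": {
        "description": "Subscriptions cancelled on or after the period start.",
        "sql": "SELECT COUNT(*) FROM subscriptions WHERE cancelled_at >= :period_start",
    },
}

def resolve(metric: str) -> str:
    """Return the one agreed SQL definition for a metric, or fail loudly.

    Failing loudly matters: an undefined metric should be a visible error,
    not a silent guess about which revenue column to use.
    """
    if metric not in METRICS:
        raise KeyError(f"No agreed definition for {metric!r} - define it before querying.")
    return METRICS[metric]["sql"]
```

A dashboard, a notebook, or an AI agent would all call `resolve("mrr")` and get the same SQL; asking for an undefined metric like `"revenue"` raises an error instead of picking a column at random.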
This is the approach behind Databao, a new data product from JetBrains. Rather than asking analysts to maintain yet another wiki or expecting AI to magically infer business context from raw tables, Databao lets teams build a shared semantic context that both humans and AI agents can use. Business users ask questions in plain English; the agent translates them into queries anchored to verified definitions, not guesses about which revenue column to use.
This matters because the bottleneck in analytics was never SQL syntax – it was meaning. When an AI agent has access to a semantic layer, it stops guessing and starts reasoning within the guardrails your team defines.
What you can do today
If you’re a data engineer or analyst reading this and nodding along, here are three things that help, regardless of what tools you use:
Audit your naming. Spend a day cataloging every column that represents the same concept under different names. Just seeing the list is sobering and motivating.
Define before you build. Before creating a new metric or dashboard, write down the definition in plain language. Get sign-off from stakeholders. This one habit prevents more confusion than any governance tool.
Give your AI a foundation. If you’re adopting AI-driven analytics, don’t point it at raw tables and hope for the best. Give it structured context. Databao by JetBrains is built specifically for this – you can sign up for a demo session with the team and see how it works on your own data.
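The naming audit in the first step can be partly automated. Below is a rough Python sketch that groups column names by a normalized stem so that version-suffixed variants surface together; the column list and the suffix patterns are assumptions for the example – in a real warehouse you would pull names from `information_schema.columns` and tune the regex to your team’s habits.

```python
import re
from collections import defaultdict

# Hypothetical column names; in practice, query information_schema.columns.
columns = [
    "users_active", "users_active_v2", "users_active_FINAL",
    "users_active_FINAL_use_this_one", "revenue", "revenue_recognized",
]

def stem(name: str) -> str:
    """Strip version-style suffixes (v2, FINAL, use_this_one) to expose the base concept."""
    return re.sub(r"_(v\d+|final|use_this_one)", "", name, flags=re.IGNORECASE)

# Group columns that share a stem; more than one name per stem is a red flag.
groups = defaultdict(list)
for col in columns:
    groups[stem(col)].append(col)

duplicates = {base: names for base, names in groups.items() if len(names) > 1}
for base, names in sorted(duplicates.items()):
    print(f"{base}: {len(names)} variants -> {', '.join(names)}")
```

Even this crude pass turns the naming graveyard into a concrete worklist: every stem with more than one variant is a candidate for a single agreed definition.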
The path forward
The companies that will get the most value from AI in analytics aren’t the ones with the most data or the fanciest models. They’re the ones that did the unglamorous work of aligning on definitions, cleaning up naming, and building a semantic foundation.
Data is only as useful as the shared understanding behind it. The tools are finally catching up to that reality, but they can’t skip the human work of agreeing on what things mean. The good news? Once that foundation exists, everything built on top of it – dashboards, reports, AI agents – just works. And for the first time, “just ask the data” might actually mean what it says.
About Databao
If you’d rather use agents than develop them, you can build Databao into your workflow and join us in building a proof of concept together. We’ll work with you to understand your use case, define a context-building process, and give a select group of business users access to the agent. Together, we’ll evaluate the quality of the responses and overall satisfaction with the results.