2026 Predictions: Why Semantics Will Determine AI Success
As we reach the end of 2025, a pattern has emerged that is hard to ignore. AI adoption is everywhere, but companies are struggling to turn it into large-scale, repeatable value.
In its 2025 State of AI survey, McKinsey reported that while enterprise use of AI tools was widespread, most organizations were still stuck in piloting and experimentation, with only a small fraction successfully scaling AI in ways that delivered enterprise-wide value. The models worked. The demos impressed executives. But production outcomes lagged.
We’ve seen the same thing repeatedly in practice.
Most enterprise AI pilots in 2025 never made it to production. Not because the models failed, but because when teams tried to move from proof of concept to deployment, they hit the same wall: the AI could not understand what the data meant.
At Carrefour Brazil, for example, data practitioners had no visibility into what data was available or what the "official" data source was across multiple systems, and even within a single data source, it was unclear which columns could be used. They were processing 133 petabytes daily across 33,000+ tables, but their initiatives couldn’t move forward until they solved the data meaning problem. Once they did, by using OpenMetadata to establish a single source of truth, 500+ users could finally find and trust the data they needed, and the project became a model for Carrefour’s global subsidiaries.
The conversation around AI has been dominated by models for years, and that phase is ending. Frontier models will keep improving, but the gains are incremental; the fundamental constraint now is whether enterprises have AI-ready data.
Metadata helps people find data. Semantics helps AI reason, act, and govern correctly.
In 2026, the enterprises that close this gap will be the ones that treat shared semantic meaning as a data infrastructure priority. Based on what we've seen working with data teams over the past year, here are five predictions for how this shift will unfold.
Prediction 1: Descriptive Metadata Is No Longer Sufficient for AI
We've spent a decade building metadata systems at Uber, at Hortonworks, and now at OpenMetadata, and they were designed to help humans find and understand data through ownership, descriptions, lineage, and freshness indicators. These were appropriate at the time but AI requires something different.
AI systems need to reason over data, and reasoning requires executable, machine-enforceable semantics. Consider what this means in practice: metadata tells you that a column is called revenue, while semantics tells the AI that revenue means gross revenue in USD, calculated as unit_price × quantity minus refunds, reported at the transaction level, and governed by finance policy FP-2024-03. Metadata provides labels, but semantics provides contracts that machines can interpret and enforce.
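To make the distinction concrete, here is a minimal sketch of what an executable semantic contract could look like, using the revenue example above. The class and field names are illustrative, not a real OpenMetadata schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticContract:
    """A machine-readable definition an AI agent can interpret and enforce."""
    term: str
    definition: str
    unit: str
    formula: str           # how the value is derived
    grain: str             # the level at which it is reported
    governing_policy: str  # the policy that constrains its use

revenue = SemanticContract(
    term="revenue",
    definition="Gross revenue recognized per transaction",
    unit="USD",
    formula="unit_price * quantity - refunds",
    grain="transaction",
    governing_policy="FP-2024-03",
)

# A descriptive metadata label, by contrast, carries none of this:
metadata_label = {"column": "revenue", "type": "DECIMAL(18,2)"}
```

The contract is something a machine can validate a query against; the label is something a human has to go ask about.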
When meaning is implicit, humans understand context by reading the wiki, messaging the data engineer, and making educated guesses. AI cannot do this. If definitions are ambiguous or policies are buried in documents, AI systems fail silently or produce unpredictable results.
Most data friction inside organizations comes from semantic misalignment rather than technical limitations. At Kansai Airports, a key metric such as "passenger count" had multiple conflicting definitions across departments. Some teams included crew and infants; others excluded transfers; others counted only adult passengers. Everyone built dashboards that worked, but the dashboards told different stories.
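A toy sketch shows how quietly this kind of divergence happens (the data and department rules here are hypothetical, not Kansai’s):

```python
# Four people on a flight, three implicit definitions of "passenger count".
manifest = [
    {"type": "adult",  "transfer": False},
    {"type": "adult",  "transfer": True},
    {"type": "infant", "transfer": False},
    {"type": "crew",   "transfer": False},
]

ops      = len(manifest)                                     # everyone on board
finance  = sum(1 for p in manifest if not p["transfer"])     # excludes transfers
planning = sum(1 for p in manifest if p["type"] == "adult")  # adults only

print(ops, finance, planning)  # 4 3 2 -- three dashboards, three "correct" answers
```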
When meaning is shared and enforced rather than merely documented, those mismatches disappear. Just look at the Kansai team; they no longer debate what "passenger count" means in every meeting.
By 2026, model choice will matter less than most people think. The difference between a good AI deployment and a failed one will come down to whether the data had meaning that the AI could use.
Prediction 2: AI Treats Structured and Unstructured Data as One Problem
Most enterprises still treat structured and unstructured data as separate domains, with data warehouses in one place and documents, PDFs, and logs in another, managed by different teams with different tools under different governance models. AI does not respect that boundary.
When an AI agent answers a question, it pulls from tables, documents, APIs, and knowledge bases simultaneously, and it does not know or care that your org chart puts them in different silos. Structured data without semantics is opaque to AI because it sees columns and values but not meaning, and unstructured data without structure is ungovernable because the AI can retrieve it but cannot verify whether it is current, compliant, or correct. Semantics bridges these worlds.
At inDrive, tracing data flows from source systems to Tableau dashboards was complex and mostly manual, and there was no single source of truth for hundreds of critical business metrics. The team needed unified semantics across their entire data estate before AI could work with it.
When the same definitions apply to both your data warehouse and your document repository, AI agents can answer questions that span both. Without that, they are limited to whichever silo they happen to query, or worse, they combine data that should not be combined.
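As a sketch of the idea, assume a hypothetical shared glossary that both retrieval paths consult; the entry below and its fields are invented for illustration:

```python
# One glossary entry governs both the structured and the unstructured path.
GLOSSARY = {
    "churn": {
        "definition": "Customer with no completed order in the last 90 days",
        "sql_filter": "last_order_date < CURRENT_DATE - INTERVAL '90' DAY",
        "doc_tags": {"churn-policy", "retention"},
    }
}

def plan_answer(term: str) -> dict:
    """Build both retrieval paths from the same definition, so they cannot drift."""
    entry = GLOSSARY[term]
    return {
        "warehouse_query": f"SELECT count(*) FROM customers WHERE {entry['sql_filter']}",
        "document_filter": entry["doc_tags"],  # retrieve only docs tagged with the same concept
    }

print(plan_answer("churn"))
```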
By 2026, the conversation will shift. The best RAG pipelines and the largest connector catalogs will not matter if meaning is fragmented across silos.
Prediction 3: Governance Moves from Auditing to Runtime Enforcement
Most AI and data governance today is reactive: something breaks, someone investigates, a post-mortem identifies the gap, and a new policy gets added to the wiki. That cycle is too slow for AI.
AI agents operate continuously and autonomously. They don’t pause for quarterly reviews or manual governance checkpoints. If meaning and policy are not enforced at runtime, at the moment an agent queries data or takes action, they are effectively not enforced at all.
At FREENOW, the team manages 17,000+ tables across 400+ schemas, 300+ data pipelines, and over 4,000 reports, and they discovered that one data asset could have 300+ downstream connections. Manual governance was impossible at that scale, so they built automated announcement workflows using OpenMetadata's APIs, ensuring that when an asset was deprecated or changed, every affected downstream owner was notified without human intervention.
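The underlying pattern is simple enough to sketch. Here the lineage graph and notify function are stand-ins for whatever your metadata platform and messaging stack provide; this is not the actual FREENOW workflow or the OpenMetadata API:

```python
from dataclasses import dataclass

@dataclass
class Asset:
    id: str
    owner: str

# Hypothetical lineage: each asset maps to its direct downstream dependents.
LINEAGE = {
    "orders_raw":   [Asset("orders_clean", "team-a"), Asset("orders_agg", "team-b")],
    "orders_clean": [Asset("revenue_report", "team-c")],
}

def notify(owner: str, message: str) -> None:
    print(f"[notify {owner}] {message}")  # stand-in for a Slack or email integration

def announce_deprecation(asset_id: str) -> None:
    """Walk the lineage graph and alert every downstream owner, with no manual fan-out."""
    seen, frontier = set(), [asset_id]
    while frontier:
        for child in LINEAGE.get(frontier.pop(), []):
            if child.id not in seen:
                seen.add(child.id)
                notify(child.owner, f"{asset_id} is being deprecated; {child.id} is affected")
                frontier.append(child.id)

announce_deprecation("orders_raw")  # three owners notified, zero humans in the loop
```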
In 2026, the shift looks like this:
| Today | 2026 |
|---|---|
| Governance reviews happen quarterly | Policies are enforced when data is accessed |
| Quality issues are caught downstream | Constraints are validated at ingestion |
| Documentation is written after the fact | Descriptions are generated and verified as data changes |
| Trust is audited periodically | Trust is computed and visible in real time |
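The first row is the most fundamental change, and it is easy to sketch: the check runs at the moment the agent touches the data. The policy store and masking rule below are invented for illustration:

```python
# Illustrative runtime guard: policy is evaluated at access time, not in review.
POLICIES = {
    "customers.email": {"allowed_purposes": {"support", "fraud"}, "mask_otherwise": True},
}

def enforce(column: str, purpose: str) -> str:
    """Return the expression an agent's query is allowed to use for this column."""
    policy = POLICIES.get(column)
    if policy is None:
        return column                     # no policy attached: pass through
    if purpose in policy["allowed_purposes"]:
        return column                     # permitted purpose: raw access
    if policy["mask_otherwise"]:
        return f"mask({column})"          # otherwise rewrite the query to mask
    raise PermissionError(f"{column} is not allowed for purpose '{purpose}'")

print(enforce("customers.email", "marketing"))  # mask(customers.email)
print(enforce("customers.email", "fraud"))      # customers.email
```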
This changes what humans do. Stewards focus on defining which policies should be in place rather than chasing violations after the fact. Engineers review AI-drafted documentation rather than writing it from scratch. The work becomes judgment, not merely execution.
Organizations that make this shift will ship AI features while maintaining governance. Those that keep humans in every loop will find their AI initiatives bottlenecked by the very processes meant to protect them.
Prediction 4: Semantic Fragmentation Becomes the Next Enterprise Crisis
Every major vendor is now shipping its own semantic layer: Databricks has Unity Catalog, Snowflake has Horizon, dbt has a semantic layer, and Cube, AtScale, and Looker each have their own. Each layer is optimized for its own engine, users, and workflows, instead of shared meaning across the enterprise. Your AI platform is probably building one too, and each promises to be the single source of truth. The outcome is predictable.
Enterprises end up with multiple, conflicting definitions of the same concepts spread across tools and agents, where "customer," "revenue," and "churn" mean different things in different systems. Each system is internally consistent, but across systems, nothing agrees. Schema chaos gave way to schema standards over time, but semantic chaos has not yet found such a resolution.
At Carrefour Brazil, the lack of a standardized glossary and inconsistent data definitions hindered communication and collaboration across teams, and they ultimately had to create 300+ glossary terms defining business concepts and rules. That was a massive effort to undo the fragmentation that had accumulated across their data stack.
When an AI agent queries multiple systems with conflicting definitions, it does not flag the inconsistency but simply picks one or combines them, resulting in confident, wrong answers that are nearly impossible to debug.
In 2026, organizations will recognize this as a structural risk. The solution is not another tool-specific semantic layer, but a unifying semantic foundation that normalizes meaning across systems, resolves conflicts, and preserves intent. Business semantics (what "passenger count" means to operations) and technical semantics (how it's calculated in SQL) cannot live in isolation from one another. AI must connect via a shared semantic metadata graph spanning business definitions, technical logic, and governance rules. Otherwise, consistency breaks down the moment AI tries to reason across tools.
Owning data is no longer enough. Owning meaning becomes essential.

Without a unifying semantic metadata graph, AI agents inherit conflicting definitions from every tool they query. With a shared semantic foundation, meaning is consistent across the entire data estate.
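At minimum, an agent backed by a shared graph can surface the conflict instead of swallowing it. A toy version, with made-up tools and definitions:

```python
# Three tools, each internally consistent, with no agreement between them (illustrative).
DEFINITIONS_OF_CUSTOMER = {
    "warehouse": "Any account that has ever placed an order",
    "crm":       "Any account with an active subscription",
    "bi_tool":   "Any account that has ever placed an order",
}

def resolve(term: str, definitions: dict[str, str]) -> str:
    """With one shared definition this returns it; with several, refuse to guess."""
    distinct = set(definitions.values())
    if len(distinct) > 1:
        sources = ", ".join(sorted(definitions))
        raise ValueError(f"'{term}' has {len(distinct)} conflicting definitions across {sources}")
    return distinct.pop()

try:
    resolve("customer", DEFINITIONS_OF_CUSTOMER)
except ValueError as conflict:
    print(conflict)  # the agent flags the inconsistency instead of silently picking one
```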
Prediction 5: Open Source Becomes a Trust Strategy
As AI systems take on more responsibility for answering questions, enforcing policies, and triggering actions, trust becomes non-negotiable. Trust in AI requires trust in the meaning AI relies on, and that raises a question most enterprises have not asked: Who owns your definitions?
With closed, proprietary semantic layers, you cannot inspect how meaning is defined, verify how policies are enforced, or extend the model as your business changes. You are locked into a vendor's interpretation of your own data.
I've spent my career in open source, co-founding Hortonworks and now leading OpenMetadata and Collate, and the lesson is always the same: when infrastructure is open, organizations can inspect it, extend it, and own it. But when infrastructure is closed, organizations are simply renting it.
With OpenMetadata, we've seen what happens when meaning becomes a shared asset rather than a vendor dependency. Close to 400 contributors have made significant code contributions, more than 11,600 users collaborate in our Slack community, and thousands of organizations have deployed the platform, from startups to enterprises like Carrefour, Mango, and inDrive.
Open metadata changes the trust equation in four ways:
- Inspection. You can see precisely how meaning is defined and enforced.
- Extension. You can adapt semantics as your business evolves.
- Portability. Definitions move across tools and AI agents without re-creation.
- Collaboration. Meaning is shared across teams and systems rather than siloed by vendors.
In 2026, open source shifts from a cost discussion to a trust discussion. Enterprises will not just ask what a tool costs but whether they can see inside it, extend it, and take their definitions with them if they leave.
What Comes Next
The next phase of AI requires stronger foundations. Meaning cannot remain an afterthought.
This is the problem we've been solving for the past decade, first at Uber, then with OpenMetadata, and now with Collate. OpenMetadata provides the open-source foundation, a single knowledge graph for metadata and semantics that both humans and AI can rely on, and Collate builds on that foundation with the enterprise capabilities teams need to operationalize it at scale, including AI agents that automate documentation, quality testing, and governance workflows.
We built it on open source because you cannot trust what you cannot inspect. We built it on standards because definitions locked in one tool aren't really yours. And we built it for where the industry is heading: from people-ready data to AI-ready data.
If you're navigating this shift, we'd like to help:
- Explore OpenMetadata at docs.open-metadata.org
- Join the community at slack.open-metadata.org
- See what Collate can do at getcollate.io
The gap between where AI is and where enterprises need it to be will not close by waiting for better models. It will close by getting the meaning right.
Let's build that foundation together.