Snowflake metadata: the three layers and how to manage them

What is Snowflake metadata?

Metadata is the record that lets a data team answer basic questions about their Snowflake estate. These questions include what tables exist, how they are structured, who touched them, and how much they cost to run. Without it, the work of modern data teams gets expensive fast. Engineers rebuild tables that already exist, auditors discover too late that a permission was revoked, and AI assistants return answers that contradict each other because no two tools agree on what a column means.

Snowflake metadata refers to information about Snowflake objects. Databases, schemas, tables, columns, and views carry structural details such as data types, row counts, and timestamps. Usage information, access history, and governance policies round out the picture. Snowflake captures most of this automatically and exposes it through system views that anyone with the right privileges can query. The platform also supports richer metadata through object tagging, classifications, and masking policies, all of which let you attach business context, ownership, and sensitivity labels to existing structures.

What Snowflake does not do is decide what any of this means for your business. A modern data team relies on three metadata layers. The technical layer captures what the data looks like. Business metadata explains what the data means for the organization. Operational metadata shows how the data is used, by whom, and at what cost. Of the three, only the technical layer is automatic. Getting useful business and operational metadata requires deliberate work from you and the tools you choose. Good metadata management starts with understanding that difference.

Article Contents

The three layers of Snowflake metadata
Where Snowflake metadata lives and how to access it
Best practices for Snowflake metadata management
When Snowflake's built-in metadata hits its limit
Frequently asked questions

The three layers of Snowflake metadata

Snowflake metadata divides into three groups. Technical, business, and operational metadata each address a different question and need different kinds of upkeep. Projects that flatten all three into one generic "metadata" bucket tend to push responsibility onto the wrong tool or the wrong team, and they stall before they produce much value.

Technical metadata

Technical metadata describes the physical shape of your data. Object names, schemas, column data types, row counts, storage size, and timestamps all fall into this category. Snowflake captures technical metadata automatically and exposes it through system views such as INFORMATION_SCHEMA.COLUMNS and INFORMATION_SCHEMA.TABLES. A new table created at two in the morning shows up in these views without any configuration. Most of what a developer needs to inspect the shape of the warehouse lives here.

Business metadata

Business metadata captures what the data means. Definitions, owners, glossary terms, sensitivity classifications, and data contracts all belong in this layer. Snowflake supports the layer through primitives like object tags, classifications, and masking policies. The primitives do not write themselves. Someone has to decide that customer_id in the orders database refers to the same customer as cust_pk in the CRM feed, tag both columns as PII, and record the owner who signs off on changes. Business metadata is how your data becomes something your organization can actually use.

Operational metadata

Operational metadata tracks how the data gets used. Query patterns, warehouse utilization, cost attribution, access history, and lineage all live in this category. Snowflake emits operational metadata through ACCOUNT_USAGE views such as QUERY_HISTORY, ACCESS_HISTORY, and WAREHOUSE_METERING_HISTORY. Those views carry a maximum latency of about three hours and a 365-day retention window. Raw event rows can answer ad hoc questions on their own, and they deliver the most value at scale once someone aggregates, models, and surfaces them for the people who make budget and policy decisions.

Every modern data program depends on all three layers working together. Governance, lineage, and AI-ready context rely on trustworthy business and operational metadata, not just the technical information Snowflake already ships with. The rest of this article walks through where each kind of metadata lives, how to get at it, and what to do with it once you have it.

Where Snowflake metadata lives and how to access it

Snowflake keeps metadata in the cloud services layer, not inside virtual warehouses. That separation matters because metadata queries do not touch your production data and never compete with analytical workloads for storage or I/O. Most metadata queries still run through a warehouse, so they consume credits while the warehouse is awake. The practical question is which metadata surface fits the query you are about to run. Four surfaces cover the vast majority of cases.

INFORMATION_SCHEMA queries

Every Snowflake database automatically includes a read-only schema called INFORMATION_SCHEMA. The views and table functions inside cover the database they live in and give you near real-time visibility into its structure. They are the right tool when you want the current state of something.

SELECT table_schema, table_name, row_count, bytes
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'SALES'
ORDER BY bytes DESC;

Queries against INFORMATION_SCHEMA are usually fast and light on compute because the views cover one database and return fresh data from the metadata store. Use the schema for current-state checks, catalog building within a database, and ad hoc structural inspection. The one caveat is that several INFORMATION_SCHEMA table functions (such as QUERY_HISTORY) carry shorter retention than their ACCOUNT_USAGE counterparts, so reaching for the account-wide view is the right move when you need more than a few days of history.

ACCOUNT_USAGE queries

ACCOUNT_USAGE is a shared database that Snowflake provides to every account. It covers the entire account rather than a single database, retains up to 365 days of history, and carries a latency of 45 minutes to roughly three hours depending on the view. The views range from QUERY_HISTORY and ACCESS_HISTORY to WAREHOUSE_METERING_HISTORY and POLICY_REFERENCES, giving a governance team or platform engineer one place to see what actually happened across the account.

SELECT query_type, user_name, warehouse_name, start_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time > DATEADD(day, -30, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;

Because ACCOUNT_USAGE scans historical metadata across the whole account, queries against it tend to be slower and more compute-intensive than equivalent INFORMATION_SCHEMA queries. Snowflake's own documentation calls out the distinction. Audits, governance reviews, and anything that needs historical depth belong in ACCOUNT_USAGE. For current-state checks where you need an answer now, stay with INFORMATION_SCHEMA.

SHOW and DESCRIBE commands

SHOW and DESCRIBE are quick inspection commands that return metadata about a single object without writing a full SQL statement. They are the fastest way to answer questions like "what columns does this table have?" or "what users can access this database?" Both commands work at any Snowflake tier, and the results are convenient for ad hoc inspection during development or incident response.

SHOW TABLES IN SCHEMA sales;
DESCRIBE TABLE sales.customers;

The output is a result set you can inspect, but you cannot add a WHERE clause directly to the command. To filter, wrap the command in RESULT_SCAN after execution. Use SHOW and DESCRIBE when a one-line answer is enough. For anything that needs filtering, joining, or reuse in downstream logic, INFORMATION_SCHEMA is the better fit.

Metadata functions

Snowflake also exposes a set of metadata functions that return system information inline within other SQL. Common examples include CURRENT_DATABASE(), CURRENT_SCHEMA(), and LAST_QUERY_ID(), plus staged-file functions such as METADATA$FILENAME that expose file-level context when loading external stages. These functions pair well with the three surfaces above for filtering, correlating, and tagging records during ingestion.

Knowing which surface to query is table stakes for productive Snowflake work. Together, INFORMATION_SCHEMA (current state), ACCOUNT_USAGE (historical context), SHOW and DESCRIBE (quick inspection), and metadata functions (inline lookups) cover the vast majority of metadata needs. Collate's Snowflake learning center goes deeper on the access patterns with more examples and edge cases.

Best practices for Snowflake metadata management

The practices below show up in data teams that have scaled Snowflake without losing control. They are low-drama and easy to adopt. Each one addresses a failure mode that costs real time or money when ignored.

Standardize naming conventions across all objects

Start with a naming standard that covers databases, schemas, tables, columns, and tags. Encode environment (prod, stage, dev), domain (sales, finance, marketing), and owner where it fits. Without a standard, the same customer table shows up under three spellings across schemas, and no one can tell which version is authoritative. A naming convention is the cheapest metadata investment you will make, and the payoff runs for years.

Query INFORMATION_SCHEMA for real-time checks

Current-state questions belong in INFORMATION_SCHEMA. What columns does this table have today? Which schemas currently exist? INFORMATION_SCHEMA returns fresh answers at low compute cost. Teams that default to ACCOUNT_USAGE for every metadata question pay compute they did not need to pay and receive data that is up to three hours out of date.

Reach for ACCOUNT_USAGE for historical governance reviews

Audits, quarterly access reviews, and compliance reporting all need history rather than current state. ACCOUNT_USAGE retains up to 365 days of events, which makes it the right surface for governance work. A team that runs access reviews against INFORMATION_SCHEMA misses every revoked permission and every deleted object, because that history only lives in ACCOUNT_USAGE.

Mine query history to find performance problems

When a warehouse starts burning credits, the answer is in QUERY_HISTORY. Sort recent queries by execution time, look at the users and workloads driving them, and spot the patterns before they turn into a budget conversation. Teams that skip the metadata step end up upsizing warehouses to mask symptoms rather than fix the queries that cause them.

Build lineage from metadata, not word of mouth

Lineage is how you answer "if I change this column, what breaks?" Within Snowflake, ACCESS_HISTORY provides column-level lineage through its BASE_OBJECTS_ACCESSED and DIRECT_OBJECTS_ACCESSED columns (with column-level write lineage in the baseSources and directSources fields nested inside OBJECTS_MODIFIED), which lets you trace a column's origin through the queries that built it. For lineage across BI tools, pipelines, or ML features, you need a catalog layer that ingests Snowflake metadata alongside the rest of the stack. Collate's data lineage solution is designed for exactly that case.

The five practices compound on each other. Standard naming makes INFORMATION_SCHEMA and ACCOUNT_USAGE queries readable. Access history feeds both performance reviews and lineage. Governance reviews get easier when lineage already exists. Treat Snowflake metadata management as a program rather than a series of one-off queries, and every later practice delivers more value.

When Snowflake's built-in metadata hits its limit

Snowflake covers the technical metadata layer well. The access surfaces (INFORMATION_SCHEMA, ACCOUNT_USAGE, ACCESS_HISTORY, plus SHOW and DESCRIBE) give platform engineers and data teams a comprehensive view of what their Snowflake estate looks like and how it is used. For many teams, the built-in surfaces are enough for day-to-day operations inside the warehouse.

The limits show up at the edges of Snowflake itself. Business metadata, the layer that turns raw objects into something a business analyst can act on, still needs humans to define terms, decide ownership, and classify sensitivity. Operational metadata arrives in ACCOUNT_USAGE as raw event rows that need aggregation and modeling before they can drive a budget conversation. Cross-platform lineage (Snowflake to BI, Snowflake to pipelines, Snowflake to ML features) sits outside Snowflake's field of view. Semantic consistency across tools and trustworthy context for AI systems that pull from your entire estate both require tooling that spans the stack rather than living inside a single warehouse.

That is the work Collate is built for. The Collate’s AI for Data semantic intelligence platform builds on top of Snowflake metadata and the rest of your stack to produce the business and operational layers that Snowflake alone does not provide. With those layers in place, your team can trust the data, govern it, and feed it into AI systems that depend on clean context.

Frequently asked questions

What is Snowflake metadata?

Snowflake metadata is information about the objects inside your Snowflake account, including their structure, usage, and governance. Technical details like column types and row counts, business context like ownership and sensitivity classifications, and operational signals like query history and access logs all fall under the term. Snowflake captures the technical details automatically and exposes primitives (tags, classifications, raw event rows) that data teams and catalog tools use to build the business and operational layers on top.

Where is Snowflake metadata stored?

Snowflake stores metadata in the cloud services layer, separate from virtual warehouses that run your analytical workloads. Users access that metadata through four main surfaces, which include the INFORMATION_SCHEMA of each database, the account-wide SNOWFLAKE.ACCOUNT_USAGE shared database, the SHOW and DESCRIBE commands, and metadata functions like CURRENT_DATABASE() and METADATA$FILENAME. Most metadata queries run through a warehouse and consume credits while the warehouse is awake, though the compute footprint is usually much smaller than analytical workloads.

When should I use INFORMATION_SCHEMA versus ACCOUNT_USAGE?

Use INFORMATION_SCHEMA for current-state questions about a single database, such as what columns a table has right now or how many rows it contains. Queries against the schema are fast and cheap because they return fresh metadata without scanning large amounts of history. For historical questions across the entire account, like audits, access reviews, and usage trends, reach for SNOWFLAKE.ACCOUNT_USAGE instead. ACCOUNT_USAGE retains up to 365 days of events and carries a latency of 45 minutes to about three hours. The extra historical depth costs more compute, so match the surface to the question rather than defaulting to one or the other.

Can Snowflake track data lineage on its own?

Snowflake tracks column-level lineage inside the platform through the ACCESS_HISTORY view, which records the direct and base objects each query touches. Teams can reconstruct how a column was populated and trace dependencies between Snowflake objects without an external tool. What Snowflake does not track is lineage across tools outside the warehouse, such as BI dashboards, data pipelines, and ML feature stores. For full end-to-end lineage across the data stack, you need a catalog or metadata platform that ingests Snowflake lineage alongside metadata from your other tools.

What are the best practices for Snowflake metadata management?

Five practices cover the majority of the work. Standardize naming across databases, schemas, and tables so teams agree on which object is authoritative. Match the access surface to the question by using INFORMATION_SCHEMA for real-time state and ACCOUNT_USAGE for historical work. Mine QUERY_HISTORY when warehouse costs spike. Build lineage from ACCESS_HISTORY rather than word of mouth, and extend it across tools with a catalog layer. Collate's Snowflake learning center has more on each pattern.

Does querying Snowflake metadata consume warehouse credits?

In most cases, yes. Queries against INFORMATION_SCHEMA and ACCOUNT_USAGE views run through a virtual warehouse and consume credits for as long as the warehouse is running. The practical difference is volume. INFORMATION_SCHEMA queries scan a single database and return fresh data at low compute cost. ACCOUNT_USAGE queries scan historical metadata across the entire account and typically use more compute. SHOW and DESCRIBE commands typically run without a warehouse, returning metadata from the cloud services layer at no warehouse credit cost. The rule of thumb is to reach for the lightest surface that answers your question.