# Data Discovery

Find and understand your data by leveraging AI and rich semantics

[Get Collate Free](/welcome)

![Data Discovery](/images/data-discovery/data-discovery-header.webp)

![Find Anything Instantly](/images/platform/build-trust.svg)

Find Anything Instantly

Spend less time searching for the right data and more time using it

![Trust What You Find](/images/platform/speed-up-insight.svg)

Trust What You Find

Ensure the data you find is tested, healthy, reliable, and ready for use

![Act with Confidence](/images/data-quality/ai-icon.svg)

Act with Confidence

Get the information you need to make informed, confident decisions

## Find Anything Instantly

Leverage discovery tools and methods in the platform to find the data you need

Over 120 native connectors

Capture metadata from all your assets including Snowflake, Databricks, and many other data platforms

Fast, intelligent search

Elasticsearch-based full-text search across all assets and metadata, AI-powered natural language search, and browsing with filters

Search relevancy settings

Customizable search relevance lets you tune results to boost the ranking of the data most relevant to you

Discovery via lineage and ER diagrams

Lineage visualization helps you find related assets within a pipeline, and full ER diagramming capabilities let you discover data via relationships

![Find Anything Instantly](/images/data-discovery/find-anything-instantly.png)

## Trust What You Find

View metrics, data quality test results, and usage patterns to validate your data

![Trust What You Find](/images/data-discovery/trust-what-you-find.png)

Quality metrics

System-, table-, and column-level metrics such as value count, value percentage, null percentage, mean, and standard deviation

Data quality test results

View data quality test results to understand the quality of the data you found, and use AI to suggest tests and run them

Rich metadata

Understand data with complete documentation including tags, descriptions, tiers, and owners; AI-based auto-classification accelerates the tagging effort

Certification and social validation

Conversation threads to encourage collaboration, activity feeds to display updates like schema changes

## Act with Confidence

Leverage the query UI, collaboration, and impact analysis to take action confidently

SQL Studio

Run queries within the Collate UI to explore data in one place instead of jumping to separate query interfaces

Peer collaboration tools

Peruse discussions, activity feeds, and usage patterns to better understand how data is used

Self-service usage (with access controls)

Persona-based access controls to let all types of users find and use data

Impact analysis

Proactively notify owners of upstream and downstream impact to avoid broken pipelines

![Act with Confidence](/images/data-discovery/act-with-confidence.png)

## Built for modern data & AI practices

Designed for the changing needs of data & AI teams

AI-Driven Automation

Improve productivity, enforce governance, and reduce costs with AI-driven automation

Unified Platform

One platform for all your teams, covering data discovery, observability, and governance

Collaborate Around Data

Accelerate development of data assets with social workspaces and knowledge centers

Get started with Collate today for free

[Get Collate Free](/welcome)

Managed Service for Production Data Teams

[Book a Demo](/contact-sales)

## FAQs

What types of data discovery techniques are in Collate?

Collate supports full-text search, AI-based natural language search, browsing, filtering, and discovery via relationships, i.e., through lineage graphs and through entity-relationship diagrams (ERDs).

What are the key advantages of the Collate Semantic Intelligence Platform for data discovery?

First, Collate provides all the search capabilities you need to find or explore data, including an AI-based natural language search assistant. Combined with a rich metadata specification, the platform lets you find data that is relevant to your query without necessarily matching the exact terms (e.g., search for “clients” and also get “customers” data). Second, Collate provides a comprehensive data quality testing environment that shows you the health of your data, so you can confirm that the data you find is reliable and ready for use. Third, Collate provides querying and collaboration tools so you can further validate the usefulness of the data you find.

Why is Collate data discovery powerful?

Collate data discovery is powerful because the platform captures all the information you need, along with relationships between metadata (in a “unified semantic graph”), to provide a complete understanding of your data. This makes discovery more effective, as the system lets you find information that is not explicitly in your search terms but is still relevant. Collate also supports Search Relevancy Settings that let you configure the weighting of queries at a per-user level, allowing the system to return the data that is most relevant to that particular user.
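To illustrate the general idea of weighted relevance, here is a minimal Python sketch. It is not Collate's implementation or API: the field names, weights, and scoring rule are all hypothetical, chosen only to show how different weightings can reorder the same results for different users.

```python
# Illustrative only: field names, weights, and the scoring rule are assumptions,
# not Collate's Search Relevancy Settings.

def score(asset, query, weights):
    """Sum the weights of every field whose text contains the query."""
    q = query.lower()
    return sum(w for field, w in weights.items() if q in asset.get(field, "").lower())

def rank(assets, query, weights):
    """Return names of matching assets, highest score first (ties by name)."""
    scored = [(score(a, query, weights), a["name"]) for a in assets]
    return [name for s, name in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]

# Hypothetical catalog entries.
ASSETS = [
    {"name": "orders_raw", "description": "customer orders"},
    {"name": "customer_dim", "description": "dimension table"},
]

# Two hypothetical users who weight the same fields differently.
NAME_FIRST = {"name": 3.0, "description": 1.0}
DESCRIPTION_FIRST = {"name": 1.0, "description": 5.0}

print(rank(ASSETS, "customer", NAME_FIRST))
print(rank(ASSETS, "customer", DESCRIPTION_FIRST))
```

The same query returns the two assets in opposite order depending on which weights apply, which is the effect per-user relevance tuning is meant to achieve.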

What is data discovery?

Data discovery is the practice of looking for data you need via search, browse, and other exploration methods. It is typically only one part of a bigger data management strategy that also includes capabilities around data quality, data observability, lineage, and governance.

Why is data discovery hard?

Data discovery is hard because data requires a lot of context to be effectively searchable by any user. Tagging is one way to add context so that search engines can find the data that users seek. But even more important is including data meaning in the equation. The Collate Semantic Intelligence Platform addresses this with a unified semantic graph, which captures relationships between terms.

For example, you might have data tagged as “PII” and a semantic graph that links PII with GDPR. If a user searches for GDPR, they will get the data tagged with PII because the system sees the association between PII and GDPR. The data did not have to be explicitly tagged with GDPR, which saves the significant effort of adding a GDPR tag to every data set already tagged with PII. This is just one simple example of how a semantic graph can be useful in discovery. Some relationships might be very complex, but in Collate you only need to define a relationship once and all relevant assets inherit it.

Another difficulty is that finding data that matches your search criteria doesn’t mean the data is ready for use. You might find data that hasn’t been tested or is not production ready. Without appropriate metadata, you might end up with a garbage-in-garbage-out situation. Data discovery capabilities therefore need to incorporate data health and quality information, along with usage patterns, to validate that a data set is ready for production use. Collate provides this information as part of its data discovery capabilities to ensure that the data you find is trusted, reliable, and ready for use.
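The PII-to-GDPR association described in this answer can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Collate's data model: the term links, asset names, and function names are hypothetical, and the "semantic graph" is reduced to a dictionary of term associations.

```python
from collections import deque

# Hypothetical term associations: each term points to the terms it is linked to.
TERM_LINKS = {
    "GDPR": {"PII"},
    "CCPA": {"PII"},
}

# Hypothetical catalog: asset name -> tags applied directly to it.
ASSET_TAGS = {
    "crm.customers": {"PII"},
    "web.page_views": {"Analytics"},
}

def expand_term(term, links):
    """Return the term plus every term reachable from it through the links."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for neighbor in links.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def search_by_term(term, links, asset_tags):
    """Find assets whose tags match the search term or any associated term."""
    wanted = expand_term(term, links)
    return sorted(name for name, tags in asset_tags.items() if tags & wanted)

print(search_by_term("GDPR", TERM_LINKS, ASSET_TAGS))
```

Here `crm.customers` is found by a GDPR search even though it carries only the PII tag, because the relationship is defined once on the terms rather than on each asset.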

What is self-service data access?

Self-service data access is the practice of giving tools to any type of user so they can take advantage of data, as long as they’re given the appropriate permissions. This practice frees up data teams from searching for data on behalf of users, allowing them to focus on more strategic tasks.

What is impact analysis?

Impact analysis is the practice of understanding how changes you make will affect other assets. It safeguards against the risk of breaking your data operations because of a change. It is also an important aspect of data discovery: if you find data that you need to modify or transform, you want to make sure that your changes are safe and won’t break anything downstream.
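As a rough illustration, downstream impact can be modeled as a traversal of a lineage graph. The sketch below is generic Python, not Collate's lineage API; the asset names and the `downstream_impact` helper are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that read from it directly.
DOWNSTREAM = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.order_counts"],
    "marts.revenue": ["dashboard.finance"],
}

def downstream_impact(asset, lineage):
    """Return every asset reachable downstream of `asset`, in breadth-first order."""
    impacted = []
    seen = {asset}
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted

print(downstream_impact("staging.orders", DOWNSTREAM))
```

Anything in the returned list could be affected by a change to `staging.orders`, so its owners are the ones to notify before the change ships.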

How can I identify sensitive data for compliance?

The ability to find sensitive data for compliance depends on proper tagging of your data. This tagging often starts as a manual process, but automated techniques involving AI tagging, metadata propagation, and reverse metadata help to tag data across its journey. In a typical system, data needs to be tagged with all the right terms for it to be properly findable. For example, you might need to tag a data set with GDPR, CCPA, PCI, HIPAA, and DPDP. But with Collate, you can ask AI to tag sensitive data with PII and then create a semantic graph that connects PII to all of the privacy regulations. And then you can use metadata propagation to further label related data sets as PII. Now any user can search for PII and get all the data sets that are covered by the various regulations.

How do you include data sources in the platform?

In Collate, all assets are first run through a “metadata ingestion” process that involves pointing the native connector to the data asset. Then a profiler is run to collect metrics. From there, all collected metadata is useful not only for discovery, but also as a starting point for data quality, lineage, and governance. In other words, the process for setting up assets for discovery also gets you most of the data you need to run other data management practices in Collate, so there’s no added complexity for doing more.
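To make the profiling step more concrete, the following Python sketch computes a few of the column-level metrics named earlier on this page (value count, null percentage, mean, standard deviation). It is only an illustration: the function name and output keys are hypothetical, and the use of population standard deviation is an assumption rather than a statement about how Collate's profiler works.

```python
import statistics

def profile_column(values):
    """Compute basic profile metrics for one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    total = len(values)
    return {
        "values_count": len(present),
        "null_percentage": 100.0 * (total - len(present)) / total if total else 0.0,
        "mean": statistics.fmean(present) if present else None,
        # Population standard deviation; a real profiler may use the sample version.
        "std_dev": statistics.pstdev(present) if present else None,
    }

# Hypothetical column with one missing value.
print(profile_column([10, 20, None, 30]))
```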