Connect Your MySQL Database to Collate: A Step-by-Step Guide

Introduction

Collate is an AI-enabled platform designed to help data teams organize, govern, and optimize their data assets. It automates tasks such as data discovery, quality assurance, observability, and compliance to boost productivity and reduce costs. With over 90 connectors, Collate lets you quickly find and collaborate on key data assets across many sources. You can also generate secure, permission-aware insights from a unified knowledge graph, ensure regulatory compliance (for example, with GDPR), and build a self-service data culture that speeds up development and problem resolution.

One of the most popular Collate connectors is the one for MySQL, the widely used open-source database. In the ever-evolving landscape of data management, connecting and observing your databases has become crucial for organizations seeking to maximize their data insights, and Collate simplifies metadata management, governance, and observability for MySQL. This blog gives you a quick primer on connecting Collate to MySQL and generating an Entity Relationship Diagram (ERD). A companion video to this blog is available on YouTube.

Initial Connection Setup

Starting from Collate:

1. Navigate to Settings: Begin by accessing the services section in Collate's settings.

2. Add New Database Service: Select "Services", then "Databases", then "Add New Service", and search for MySQL in the service list.

3. Configure Connection Details:

  • Enter a descriptive name for your database service
  • Provide database credentials (username, password)
  • Specify the host endpoint and port
  • Authentication: For basic auth, provide the MySQL username and password. These are database credentials, not cloud provider credentials (e.g., AWS IAM), unless you select that authentication type. Enter the host and port, for example an AWS RDS endpoint such as "mysql-database.us-west-2.rds.amazonaws.com:3306".

  • Granular Database Selection: Ability to connect to specific databases or schemas from a single endpoint.
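
The example endpoint above combines the host and port into a single host:port value. If you script your setup or want to sanity-check an endpoint before pasting it in, a minimal sketch of splitting and validating that value might look like this. The helper name `split_host_port` is hypothetical, falling back to MySQL's default port 3306 when no port is given is an assumption about what you want, and IPv6 literals are not handled.

```python
from typing import Tuple

MYSQL_DEFAULT_PORT = 3306  # MySQL's standard port


def split_host_port(endpoint: str, default_port: int = MYSQL_DEFAULT_PORT) -> Tuple[str, int]:
    """Split a 'host:port' string into (host, port), using default_port if none is given.

    Hypothetical helper for illustration; it does not handle IPv6 literals.
    """
    endpoint = endpoint.strip()
    if "://" in endpoint:
        # This sketch rejects URLs such as 'mysql://...'; it expects a bare host:port.
        raise ValueError(f"expected host:port, got a URL: {endpoint!r}")
    host, sep, port_text = endpoint.rpartition(":")
    if not sep:
        # No colon at all: the whole string is the host.
        return endpoint, default_port
    if not host or not port_text.isdigit():
        raise ValueError(f"malformed host:port value: {endpoint!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

For the RDS example above, this returns the hostname and 3306; a value such as "db.example.com:abc" raises a `ValueError` instead of failing later at connection time.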

Press the Test Connection button to confirm that Collate can reach the database before you continue; testing the connection is always good practice. Note, however, that if the database service isn't running and the test connection is what wakes it up, the first attempt might fail because the service doesn't start quickly enough. If that happens, wait a couple of minutes and run the test again; it should then succeed. Once the test passes, click Next.
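
The same "wait and retry" advice can be applied outside Collate, for example in a script that checks reachability before you run the test. This is a minimal sketch that assumes a plain TCP connection is enough to wake the database. It only proves the endpoint accepts connections, not that the credentials are valid. The function name, the three attempts, and the two-minute wait (standing in for "a couple of minutes") are all assumptions, and `connect` is injectable so the retry logic can be exercised without a real database.

```python
import socket
import time
from typing import Callable


def check_connection_with_retry(
    host: str,
    port: int,
    attempts: int = 3,
    wait_seconds: float = 120.0,
    connect: Callable[..., socket.socket] = socket.create_connection,
) -> bool:
    """Return True as soon as a TCP connection to host:port succeeds, else False.

    Checks reachability only; it does not log in or validate credentials.
    """
    for attempt in range(1, attempts + 1):
        try:
            sock = connect((host, port), timeout=10)
        except OSError:
            # The first attempt may be what wakes the database, so wait before retrying.
            if attempt < attempts:
                time.sleep(wait_seconds)
            continue
        sock.close()
        return True
    return False
```

Calling `check_connection_with_retry("mysql-database.us-west-2.rds.amazonaws.com", 3306)` would try up to three times, waiting between failures. A `False` result after the retries usually points to the host, port, or network rules rather than a slow start.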

Collate uses filters, based on names or regular expressions (regex), to control which databases, schemas, or tables are ingested. Out of the box, it excludes system schemas, such as "information_schema" or "performance_schema", to focus on user data.

Accept the defaults for a full ingest, or customize the filters to keep unnecessary bloat out of your catalog: for instance, include only schemas matching "^prod_.*" to target production data.
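
To reason about which names a pattern such as "^prod_.*" will keep, here is a small sketch of include/exclude filtering. It is not Collate's implementation. The function name and sample schema names are hypothetical, and the sketch assumes that include patterns take precedence over excludes and that patterns are matched from the start of each name (Python's `re.match`). Check Collate's documentation for the exact semantics.

```python
import re
from typing import Iterable, List, Optional

# Examples of system schemas named in this post; the real default list may be longer.
DEFAULT_EXCLUDES = ["^information_schema$", "^performance_schema$"]


def filter_names(
    names: Iterable[str],
    includes: Optional[List[str]] = None,
    excludes: Optional[List[str]] = None,
) -> List[str]:
    """Keep names matching any include pattern; with no includes, drop names matching an exclude.

    Assumption: includes take precedence over excludes, and re.match anchors at the start.
    """
    excludes = DEFAULT_EXCLUDES if excludes is None else excludes
    kept = []
    for name in names:
        if includes:
            if any(re.match(pattern, name) for pattern in includes):
                kept.append(name)
        elif not any(re.match(pattern, name) for pattern in excludes):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical schema names for illustration.
    schemas = ["prod_sales", "prod_hr", "dev_sales", "information_schema", "performance_schema"]
    print(filter_names(schemas, includes=["^prod_.*"]))  # ['prod_sales', 'prod_hr']
    print(filter_names(schemas))  # ['prod_sales', 'prod_hr', 'dev_sales']
```

Testing patterns like this before an ingest run is cheaper than discovering afterwards that a schema was skipped or pulled in by mistake.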

Metadata Ingestion and Agents

One of Collate's standout features is its comprehensive agent ecosystem. After establishing the connection, you can use multiple agents to extract insights, but the Metadata Agent must run first. Here is a brief overview of the available agents before we cover the Metadata Agent:

Available Agents

  • Metadata Agent: Brings in database structure and metadata
  • Usage Agent: Captures query patterns and data popularity statistics
  • Lineage Agent: Tracks data lineage and relationships
  • Profiler Agent: Provides detailed table metrics (e.g., row counts)
  • Auto Classification Agent: Automatically categorizes tables
  • dbt Agent: Integrates with dbt for enhanced data transformation insights

To check the status of the Metadata Agent, navigate to the database service you just created and select Agents.

Here you can see the running agents and their status. Once the Metadata Agent completes successfully, you have enough information to begin performing other tasks.
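
In the UI, you simply watch the Agents tab. If you automate this step, the rule "wait for the Metadata Agent to succeed before starting the others" can be sketched as a polling loop. `get_status` is a hypothetical callback that you would connect to whatever status source you use. The state names "success" and "failed" are placeholders, not Collate's actual values.

```python
import time
from typing import Callable


def wait_for_agent(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll get_status() until it reports a finished state, then return that state.

    "success" and "failed" are placeholder state names, not Collate's actual values.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in ("success", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"agent still {status!r} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


def ready_for_other_agents(get_metadata_status: Callable[[], str], **poll_options: float) -> bool:
    """The Metadata Agent must run first, so only proceed to other agents once it succeeded."""
    return wait_for_agent(get_metadata_status, **poll_options) == "success"
```

The timeout matters. Without one, a stuck run would block everything downstream indefinitely, whereas a `TimeoutError` surfaces the problem.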

What Gets Imported

Importantly, Collate emphasizes data privacy by importing only metadata. No sensitive data or sample rows are brought into the system.
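
To make the distinction concrete: metadata describes structure, such as table names, column names, and data types, which MySQL exposes through `information_schema`. The sketch below is not how Collate's connector is implemented. It only illustrates what counts as metadata, assuming direct access through a DB-API driver that uses `%s` placeholders (PyMySQL, for example). The helper name is hypothetical. The query reads catalog information only, never rows from your own tables.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Catalog query: names and types come from information_schema, not from user tables.
COLUMNS_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = %s
ORDER BY TABLE_NAME, ORDINAL_POSITION
"""


def group_columns(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (table, column, data_type) rows into {table: [(column, data_type), ...]}."""
    tables: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for table, column, data_type in rows:
        tables[table].append((column, data_type))
    return dict(tables)


# Usage with a DB-API connection such as PyMySQL (not executed here):
#   with conn.cursor() as cur:
#       cur.execute(COLUMNS_QUERY, ("openmetadata",))
#       structure = group_columns(cur.fetchall())
```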

Exploring Ingested Data

Post-ingestion, navigate to the Databases tab within your service.

You'll see the ingested databases (e.g., "default"). Drill down to schemas: in our demo, an "openmetadata" schema with 49 tables appeared.

Key features to explore:

  • ER Diagram: Click to visualize table relationships. Collate auto-generates the diagram from foreign keys, so tables with no connections have no declared foreign keys. This can point to missing constraints or potential data silos. This observability tool is invaluable for quick audits.
  • Table Details: Select a table such as "change_event" to view its columns, with properties such as data types, constraints, tags, and glossary terms. Tags can be added manually (e.g., right-click on a column) or in bulk via automations in Collate's governance module.
  • Collaboration via Activity Feeds: Track changes such as tag additions or description edits. Tag users to start discussions and foster team collaboration, e.g., "@JohnDoe, why this description?"
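
The ER diagram logic described above can be approximated directly against MySQL. Declared foreign keys become edges, and tables with no edges are the unconnected ones. This is a sketch of the idea, not Collate's implementation. The helper name is hypothetical, and it treats table names as unqualified, so foreign keys that point into another schema are not distinguished.

```python
from typing import Iterable, List, Set, Tuple

# information_schema.KEY_COLUMN_USAGE lists key columns; rows with a non-NULL
# REFERENCED_TABLE_NAME are columns that belong to a declared foreign key.
FOREIGN_KEYS_QUERY = """
SELECT TABLE_NAME, REFERENCED_TABLE_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = %s AND REFERENCED_TABLE_NAME IS NOT NULL
"""


def unconnected_tables(tables: Iterable[str], fk_edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Return, sorted, the tables that neither reference nor are referenced by another table."""
    connected: Set[str] = set()
    for child, parent in fk_edges:
        # A self-referencing key (e.g. a parent_id column) links a table only to itself.
        if child != parent:
            connected.update((child, parent))
    return sorted(t for t in tables if t not in connected)


# Usage with a DB-API cursor (not executed here):
#   cur.execute(FOREIGN_KEYS_QUERY, ("openmetadata",))
#   isolated = unconnected_tables(all_table_names, cur.fetchall())
```

An empty result means every table has at least one relationship. A long list is a prompt to check whether those tables really stand alone or just lack constraints.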

Once the Lineage and Usage Agents have run, lineage views show upstream and downstream dependencies, while usage statistics reveal popular queries. If you enabled the Profiler, expect metrics such as null percentages and distinct values.
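
For reference, the two profiler metrics mentioned above have simple definitions. This sketch computes them in Python over a list of column values. Collate's Profiler computes its own metrics, so treat this only as an illustration of what the numbers mean. The function name is hypothetical. The sketch assumes NULLs arrive as `None` and that values are hashable, and, like SQL's `COUNT(DISTINCT ...)`, it leaves NULLs out of the distinct count.

```python
from typing import Any, Dict, Optional, Sequence


def column_profile(values: Sequence[Optional[Any]]) -> Dict[str, float]:
    """Compute row count, null percentage, and distinct count for one column's values.

    None stands for NULL and is excluded from the distinct count.
    """
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    distinct = len({v for v in values if v is not None})
    return {
        "row_count": total,
        "null_percentage": 100.0 * nulls / total if total else 0.0,
        "distinct_count": distinct,
    }


if __name__ == "__main__":
    print(column_profile([1, None, 2, 2, None]))
    # {'row_count': 5, 'null_percentage': 40.0, 'distinct_count': 2}
```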

Conclusion

Collate's MySQL connector demonstrates how modern data observability platforms can simplify database management. By providing a user-friendly, comprehensive approach to metadata ingestion and exploration, it empowers teams to gain deeper insights with minimal configuration.

Whether you're managing a single database or complex multi-database environments, Collate offers a scalable solution for understanding and governing your data infrastructure. Ready to get started? Sign up for the Collate Free Tier of our managed OpenMetadata Service, or visit the Product Sandbox to try out Collate with demo data.
