Bringing Order to the Google Maze: Collate's New Google Drive Connector

Introduction

Most data platforms are built for data engineers. That makes sense: they build pipelines, write SQL, and manage warehouses. However, data engineers make up only about 20% of the people working with data. The other 80% mostly work in spreadsheets and documents, and the data assets they create stay out of view of the broader data ecosystem. Collate's Google Drive connector addresses this gap by bringing data governance and discovery to where most business users actually work.

Why Connect a Data Catalog to Google Drive?

Even in companies with robust data infrastructure, information frequently gets exported to spreadsheets. Someone pulls data from HubSpot or LinkedIn into Google Sheets because it's easier to manipulate and analyze there. Finance teams build complex models with multiple worksheets handling inputs, calculations, and outputs. Sales teams track prospects in shared spreadsheets that evolve into customer records.

These spreadsheets become disconnected from the source systems, creating shadow data assets that the data team can't track or govern. The Google Drive connector brings them back into view.

Setting Up the Connection

The setup process follows Collate's standard connector pattern. Navigate to Settings > Services, where you'll find a new "Drives" category. Google Drive is available today, with OneDrive, SharePoint, and Microsoft Teams on the roadmap. The process looks like this:

1. Navigate to Settings: Begin by accessing the services section in Collate's settings.

2. Add New Drives Service: Select "Services", then "Drives", click "Add New Service", and choose GoogleDrive from the service list.

3. Configure Connection Details:

The connection requires a service account created in the Google Cloud Console, specifically its project ID, private key, and X.509 certificate details. Collate's built-in secrets manager automatically masks these credentials, so they stay hidden even during screen shares or in screenshots taken for documentation.
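Before pasting a key into Collate, it can help to sanity-check the JSON key file downloaded from the Google Cloud Console. The sketch below is not part of Collate; it assumes the standard Google service account key format, whose `project_id`, `private_key`, and `client_x509_cert_url` fields correspond to the project ID, private key, and X.509 certificate mentioned above. The `missing_key_fields` and `masked` helpers are hypothetical names used only for illustration.

```python
import json

# Fields of a standard Google service account key file that map to the
# credentials discussed above (project ID, private key, X.509 certificate),
# plus the identity fields that come with them.
REQUIRED_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_x509_cert_url",
)


def missing_key_fields(raw_json: str) -> list:
    """Return the names of required fields that are absent or empty."""
    info = json.loads(raw_json)
    missing = [name for name in REQUIRED_FIELDS if not info.get(name)]
    # Other credential files (for example, user credentials) carry a different "type".
    if info.get("type") and info["type"] != "service_account":
        missing.append("type (expected 'service_account')")
    return missing


def masked(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters, e.g. for logs (hypothetical helper)."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

An empty list from `missing_key_fields` only means the file is well formed; whether the service account has access to the right drives is a separate question.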

For organization-level access to shared drives, you'll need to configure delegated email permissions through your Google Cloud admin. This allows Collate to associate changes and ingestions with specific users while respecting your existing access controls.
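Delegated access of this kind is typically built on Google's domain-wide delegation, where a service account requests tokens on behalf of a Workspace user. Assuming Collate's delegated email plays that role, the sketch below shows the unsigned JWT claim set a service account sends to Google's OAuth 2.0 token endpoint, with the delegated user in the `sub` claim. The read-only scopes are an assumption about what a metadata crawler needs, `delegated_claims` is a hypothetical helper, and signing the assertion with the private key is deliberately left out.

```python
import time

TOKEN_URI = "https://oauth2.googleapis.com/token"

# Assumption: a catalog crawler only needs read access to Drive and Sheets.
READ_ONLY_SCOPES = (
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/spreadsheets.readonly",
)


def delegated_claims(client_email, delegated_user, scopes=READ_ONLY_SCOPES,
                     now=None, lifetime_s=3600):
    """Build the (unsigned) JWT claims for a delegated service account token."""
    if not 0 < lifetime_s <= 3600:
        raise ValueError("Google caps the assertion lifetime at one hour")
    issued_at = int(time.time() if now is None else now)
    return {
        "iss": client_email,        # the service account itself
        "sub": delegated_user,      # the user whose access is delegated
        "scope": " ".join(scopes),  # space-delimited scope list
        "aud": TOKEN_URI,
        "iat": issued_at,
        "exp": issued_at + lifetime_s,
    }
```

Because the token is issued for the `sub` user, everything the connector reads is bounded by that user's existing Drive permissions.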

The connector includes filtering options to limit ingestion to specific directories, spreadsheets, or even individual worksheets within a workbook. You can hide particular sheets from discovery if needed, maintaining appropriate access boundaries.
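Connector filters of this kind are commonly expressed as include and exclude regular expressions. As a hypothetical sketch (the `FilterPattern` class and its matching rules are assumptions, not Collate's documented configuration), an exclude match always wins, and an empty include list admits everything that isn't excluded:

```python
import re
from dataclasses import dataclass, field


@dataclass
class FilterPattern:
    """Hypothetical include/exclude filter applied to asset names."""

    includes: list = field(default_factory=list)
    excludes: list = field(default_factory=list)

    def allows(self, name: str) -> bool:
        # Any matching exclude pattern removes the asset outright.
        if any(re.search(pattern, name) for pattern in self.excludes):
            return False
        # With no include patterns, everything not excluded is kept.
        if not self.includes:
            return True
        return any(re.search(pattern, name) for pattern in self.includes)


# Example: keep a workbook's worksheets except scratch tabs and salary data.
worksheet_filter = FilterPattern(excludes=[r"^_", r"(?i)salary"])
sheets = ["Inputs", "_scratch", "Salary Bands", "Outputs"]
kept = [name for name in sheets if worksheet_filter.allows(name)]
```

Here `kept` holds only "Inputs" and "Outputs", which is the "hide particular sheets" behavior described above.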

What Gets Ingested

Once connected, Collate's worker agent can run on demand or on a schedule to ingest metadata from Google Drive. The connector captures the complete hierarchy: directories (folders), files, spreadsheets (workbooks), and worksheets (individual sheets).
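One way to picture the ingested hierarchy is as a tree in which every node knows its kind. The sketch below is an illustrative model, not Collate's internal entity schema; the `DriveNode` type, the nesting rules in `ALLOWED_CHILDREN`, and the slash-separated paths are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, Tuple

# Assumed nesting rules: folders hold folders, files and spreadsheets;
# only spreadsheets (workbooks) hold worksheets.
ALLOWED_CHILDREN = {
    "directory": {"directory", "file", "spreadsheet"},
    "spreadsheet": {"worksheet"},
    "file": set(),
    "worksheet": set(),
}


@dataclass
class DriveNode:
    name: str
    kind: str  # "directory", "file", "spreadsheet", or "worksheet"
    children: list = field(default_factory=list)


def walk(node: DriveNode, parent_path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, kind) for a node and all its descendants, depth first."""
    path = f"{parent_path}/{node.name}" if parent_path else node.name
    yield path, node.kind
    for child in node.children:
        if child.kind not in ALLOWED_CHILDREN[node.kind]:
            raise ValueError(f"{child.kind} cannot live inside a {node.kind}")
        yield from walk(child, path)


root = DriveNode("Finance", "directory", [
    DriveNode("FY25 Model", "spreadsheet", [
        DriveNode("Inputs", "worksheet"),
        DriveNode("Outputs", "worksheet"),
    ]),
    DriveNode("Notes.pdf", "file"),
])
```

Walking `root` lists the folder, the workbook, each of its worksheets, and the loose file, each tagged with its kind.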

For Google Sheets specifically, the connector extracts column-level metadata, including column names and inferred data types. This matters because spreadsheets typically carry no schema information until you convert them into proper tables. Collate performs this inference automatically, adding structure to loosely structured data.

Users on Google Workspace enterprise licenses can also ingest existing tags and labels, pre-populating metadata that has already been defined in the Google ecosystem. For assets that lack descriptions or tags, Collate's AI capabilities can generate them.
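Collate's exact inference rules aren't described here, but the general idea can be sketched. The snippet below assumes cells arrive as displayed strings and that the first row holds column names; the type names and the widening rule (INT mixed with DOUBLE becomes DOUBLE, any other mix becomes STRING) are assumptions for illustration.

```python
from datetime import date
from typing import List, Optional, Tuple


def cell_type(value: str) -> Optional[str]:
    """Guess the type of one cell's displayed value; None for blanks."""
    text = value.strip()
    if not text:
        return None  # blank cells don't vote
    if text.upper() in ("TRUE", "FALSE"):
        return "BOOLEAN"
    # Note: strings such as "nan" or "inf" also parse as floats here.
    for caster, type_name in ((int, "INT"), (float, "DOUBLE")):
        try:
            caster(text)
            return type_name
        except ValueError:
            pass
    try:
        date.fromisoformat(text)
        return "DATE"
    except ValueError:
        return "STRING"


def infer_schema(rows: List[List[str]]) -> List[Tuple[str, str]]:
    """Treat the first row as column names and infer one type per column."""
    header, *body = rows
    schema = []
    for index, name in enumerate(header):
        seen = set()
        for row in body:
            if index < len(row):  # rows may be ragged
                kind = cell_type(row[index])
                if kind is not None:
                    seen.add(kind)
        if len(seen) == 1:
            column_type = seen.pop()
        elif seen == {"INT", "DOUBLE"}:
            column_type = "DOUBLE"  # mixed whole and decimal numbers widen
        else:
            column_type = "STRING"  # empty or conflicting columns fall back
        schema.append((name, column_type))
    return schema


rows = [
    ["Company", "Employees", "ARR", "Active", "Signed"],
    ["Acme", "120", "1.5e6", "TRUE", "2024-01-15"],
    ["Globex", "80", "900000", "FALSE", ""],
]
```

For these rows, `infer_schema` yields Company as STRING, Employees as INT, ARR as DOUBLE, Active as BOOLEAN, and Signed as DATE.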

Discovery and Context

Now, when you search Collate's catalog, Google Drive assets appear alongside database tables, dashboards, and other data assets. Each item displays its type (directory, file, spreadsheet, worksheet) and hierarchical relationships.

The "View in Google Drive" button links directly back to the source and appears on directories, spreadsheets, and individual worksheets. Other Collate connectors follow the same pattern; Snowflake tables and Power BI reports, for example, have similar quick-access links.

For spreadsheets, drilling into a worksheet reveals the extracted schema: every column with its inferred data type. This metadata lives in Collate rather than cluttering the spreadsheet with comments and notes that are easily overlooked.
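Links like these usually follow Google's public web URL formats. The sketch below assumes those formats (folders under `drive.google.com/drive/folders/`, spreadsheets under `docs.google.com/spreadsheets/d/`, and a specific worksheet selected by its numeric `gid`); `view_in_drive_url` is a hypothetical helper, not a description of how Collate builds its links.

```python
from typing import Optional


def view_in_drive_url(kind: str, file_id: str, sheet_id: Optional[int] = None) -> str:
    """Build a browser link for a Drive asset (hypothetical helper)."""
    if kind == "directory":
        return f"https://drive.google.com/drive/folders/{file_id}"
    if kind == "spreadsheet":
        return f"https://docs.google.com/spreadsheets/d/{file_id}/edit"
    if kind == "worksheet":
        # A worksheet is addressed through its parent spreadsheet plus its gid.
        if sheet_id is None:
            raise ValueError("a worksheet link needs the sheet's numeric gid")
        return f"https://docs.google.com/spreadsheets/d/{file_id}/edit#gid={sheet_id}"
    if kind == "file":
        return f"https://drive.google.com/file/d/{file_id}/view"
    raise ValueError(f"unknown asset kind: {kind}")
```

Linking at the worksheet level (rather than only the workbook) is what lets a catalog entry open the exact tab a user was reading about.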

Tracking Changes and Deletions

The connector marks assets as deleted when they're removed from Google Drive, while preserving their metadata and version history. This solves a common problem: finding outdated spreadsheets that should no longer be used.

Instead of hunting through folders trying to determine which version is current, you can see the deletion status directly in the catalog. The version history shows when the asset was first ingested and when it was marked deleted, helping users understand what happened and find the correct replacement.
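This deletion tracking amounts to a soft delete: each ingestion run compares what it finds in Drive with what the catalog already holds. The sketch below models the idea with hypothetical names (`AssetRecord`, `reconcile`) and plain strings as version entries; Collate's actual versioning is richer, and the "restored" case is an assumption added for completeness.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AssetRecord:
    name: str
    deleted: bool = False
    history: list = field(default_factory=list)  # human-readable version entries


def reconcile(catalog: Dict[str, AssetRecord], listing: Dict[str, str], run: str) -> None:
    """Sync catalog entries with one run's Drive listing (file ID -> name).

    Missing assets are soft-deleted: their record and history are kept.
    """
    for file_id, name in listing.items():
        record = catalog.get(file_id)
        if record is None:
            catalog[file_id] = AssetRecord(name, history=[f"{run}: ingested"])
        elif record.deleted:
            record.deleted = False
            record.history.append(f"{run}: restored")
    for file_id, record in catalog.items():
        if file_id not in listing and not record.deleted:
            record.deleted = True
            record.history.append(f"{run}: marked deleted")


catalog: Dict[str, AssetRecord] = {}
reconcile(catalog, {"id-1": "Pipeline Q1 (old)", "id-2": "Pipeline Q1"}, "run-1")
reconcile(catalog, {"id-2": "Pipeline Q1"}, "run-2")
```

After the second run, the old spreadsheet is still in the catalog, flagged as deleted, with a history showing when it was ingested and when it disappeared.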

Lineage for Spreadsheets

One of the more powerful features is lineage mapping for spreadsheet workflows. You can manually connect worksheets to show data flow; for example, linking a "prospects" sheet to a "customers" sheet, then to a downstream dashboard.

This proves especially valuable in finance, where complex models involve multiple interconnected worksheets with inputs feeding calculations that produce financial statements. The lineage view makes these dependencies explicit and auditable.

In traditional database workflows, Collate generates lineage automatically from SQL queries and ETL processes. Spreadsheet lineage has to be mapped by hand, but even manually documented lineage for spreadsheet-based workflows is a significant improvement over having none.
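Manual lineage boils down to a directed graph of upstream-to-downstream edges. The sketch below models the prospects, customers, and dashboard example, plus a finance model, as such a graph and answers "what is affected if this sheet changes?" with a breadth-first traversal. The asset names and the `LineageGraph` class are hypothetical; in Collate the edges are added manually as described above, not through this class.

```python
from collections import defaultdict, deque
from typing import List


class LineageGraph:
    """Hypothetical store of manually declared upstream -> downstream edges."""

    def __init__(self) -> None:
        self._downstream = defaultdict(set)

    def add_edge(self, upstream: str, downstream: str) -> None:
        self._downstream[upstream].add(downstream)

    def downstream_of(self, asset: str) -> List[str]:
        """Every asset reachable from `asset`, in breadth-first order."""
        seen = {asset}
        order = []
        queue = deque([asset])
        while queue:
            node = queue.popleft()
            # sorted() only makes the output order deterministic
            for child in sorted(self._downstream.get(node, ())):
                if child not in seen:
                    seen.add(child)
                    order.append(child)
                    queue.append(child)
        return order


graph = LineageGraph()
graph.add_edge("Leads/prospects", "Leads/customers")
graph.add_edge("Leads/customers", "Revenue dashboard")
graph.add_edge("FY25 Model/Inputs", "FY25 Model/Calculations")
graph.add_edge("FY25 Model/Calculations", "FY25 Model/Statements")
```

Asking for everything downstream of the inputs worksheet returns the calculations and the statements, which is exactly the dependency an auditor wants to see made explicit.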

Governance Without Friction

The Google Drive connector extends governance to tools people already use widely. Rather than forcing everyone into databases and BI tools, it recognizes that spreadsheets serve legitimate purposes and brings them into the governance framework.

Users continue working with familiar tools that offer self-service capabilities. The data team gains visibility into what data exists, where it lives, who owns it, and how it flows between systems. Metadata, domains, and tier classifications are inherited down through the hierarchy, maintaining consistency without manual duplication.
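Inheritance can be thought of as resolving each asset's effective metadata by walking from the root of its path down to the asset itself. The sketch assumes that an assignment made deeper in the tree overrides one inherited from a parent; `effective_metadata`, the slash-separated paths, and the tier labels are illustrative assumptions, not Collate's API.

```python
from typing import Dict


def effective_metadata(path: str, assigned: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge metadata from the root down to `path`; deeper assignments win."""
    parts = path.split("/")
    merged: Dict[str, str] = {}
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        merged.update(assigned.get(prefix, {}))
    return merged


# Metadata set explicitly on two levels of the hierarchy.
assigned = {
    "Finance": {"domain": "Finance", "tier": "Tier2"},
    "Finance/FY25 Model": {"tier": "Tier1"},
}
```

A worksheet inside the model inherits the Finance domain from its folder and the Tier1 classification from its workbook, without either value being copied onto the worksheet by hand.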

Conclusion

The Drives category represents a broader vision: bringing cataloging and governance to the collaboration platforms where knowledge work happens. With OneDrive and SharePoint coming soon, Collate aims to cover the major productivity suites.

For organizations navigating the maze of documents and spreadsheets scattered across these platforms, having a unified catalog that treats them as first-class data assets alongside databases and warehouses makes increasingly good sense. The 80% of users working outside traditional data tools deserve governance capabilities too.

To explore further, consider the Collate Free Tier for managed OpenMetadata or the Product Sandbox with demo data.