Best Metadata Management Software for Apache Spark

Compare the Top Metadata Management Software that integrates with Apache Spark as of December 2025

This is a list of Metadata Management software that integrates with Apache Spark. Use the filters on the left to further narrow the products that integrate with Apache Spark. View the products that work with Apache Spark in the table below.

What is Metadata Management Software for Apache Spark?

Metadata management software enables users and organizations to manage, identify, fetch, and analyze metadata. Metadata management software streamlines the processes involved with managing metadata. Compare and read user reviews of the best Metadata Management software for Apache Spark currently available using the table below. This list is updated regularly.

  • 1
    DataHub

    DataHub

    DataHub Cloud is an event-driven AI & Data Context Platform that uses active metadata for real-time visibility across your entire data ecosystem. Unlike traditional data catalogs that provide outdated snapshots, DataHub Cloud instantly propagates changes, automatically enforces policies, and connects every data source across platforms with 100+ pre-built connectors. Built on an open source foundation with a thriving community of 13,000+ members, DataHub gives you unmatched flexibility to customize and extend without vendor lock-in. DataHub Cloud is a modern metadata platform with REST and GraphQL APIs that optimize performance for complex queries, essential for AI-ready data management and ML lifecycle support.
    Starting Price: $75,000
  • 2
    Inferyx

    Inferyx

    Move past application silos, cost overruns, and skill obsolescence to scale faster with our intelligent data and analytics platform. It is built to perform data management and advanced analytics and helps you scale across the technology landscape. Our architecture understands how data flows and transforms throughout its lifecycle, enabling the development of future-proof enterprise AI applications. The platform is highly modular and extensible, handles many kinds of components, and is designed to scale with a multi-tenant architecture. Advanced data visualization makes analyzing complex data structures easy, resulting in enhanced enterprise AI app development in an intuitive, low-code predictive platform. Our unique hybrid multi-cloud platform is built using open source community software, which makes it immensely adaptive, highly secure, and essentially low-cost.
    Starting Price: Free
  • 3
    Kylo

    Teradata

    Kylo is an open source, enterprise-ready data lake management software platform for self-service data ingest and data preparation, with integrated metadata management, governance, security, and best practices inspired by Think Big's 150+ big data implementation projects. It offers self-service data ingest with data cleansing, validation, and automatic profiling. Wrangle data with visual SQL and interactive transforms through a simple user interface. Search and explore data and metadata, view lineage, and profile statistics. Monitor the health of feeds and services in the data lake. Track SLAs and troubleshoot performance. Design batch or streaming pipeline templates in Apache NiFi and register them with Kylo to enable user self-service. Organizations can expend significant engineering effort moving data into Hadoop yet struggle to maintain governance and data quality. Kylo dramatically simplifies data ingest by shifting it to data owners through a simple guided UI.