Showing 76 open source projects for "metadata"

  • 1
    CSV Lint

    CSV Lint plug-in for Notepad++ for syntax highlighting

    CSV Lint plug-in for Notepad++ for syntax highlighting, CSV validation, automatic column and datatype detection, fixed-width datasets, changing datetime formats and decimal separators, sorting data, counting unique values, and converting to XML, JSON, SQL, etc. It is a plugin for data cleaning and working with messy data files. Use CSV Lint for metadata discovery, technical data validation, and reformatting of tabular data files. It is not meant to replace spreadsheet programs like Excel or SPSS; rather, it is a quality-control tool to examine, verify, or polish up a dataset before further processing.
    Downloads: 22 This Week
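    Automatic datatype detection of a column can be pictured as trying progressively looser types until one fits every value. The sketch below is not CSV Lint's code; the type ladder (integer, decimal, ISO date, string) and the `infer_column_type` helper are hypothetical, chosen only to illustrate the idea.

```python
from datetime import datetime


def infer_column_type(values):
    """Return the narrowest (hypothetical) type name that fits every non-empty value."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "empty"

    def fits(parse):
        # A type fits only if every value parses without error.
        for v in non_empty:
            try:
                parse(v)
            except ValueError:
                return False
        return True

    if fits(int):
        return "integer"
    if fits(float):
        return "decimal"
    if fits(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


rows = [["id", "price", "sold_on", "name"],
        ["1", "9.99", "2024-01-05", "pen"],
        ["2", "12.50", "2024-02-11", "ink"]]
header, *data = rows
for i, col in enumerate(header):
    print(col, infer_column_type([r[i] for r in data]))
```

    A real detector would also handle locale-specific decimal separators and other date formats, which this sketch deliberately ignores.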
  • 2
    DataHub

    The Metadata Platform for your Data and AI Stack

    DataHub is an open source metadata platform that helps organizations discover, understand, and trust their data assets at scale. It models data as a richly connected graph spanning datasets, dashboards, pipelines, ML features, and services, so users can explore relationships like lineage and ownership across tools and domains. The platform focuses on continuous metadata ingestion from many sources, treating metadata as a stream that stays fresh as systems change. ...
    Downloads: 0 This Week
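    Modeling metadata as a connected graph makes questions like "what breaks if this table changes?" a graph traversal. This is a toy sketch, not DataHub's API: the asset names and the `LINEAGE` adjacency map are hypothetical, and the walk is a plain breadth-first search over downstream edges.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it feeds.
LINEAGE = {
    "kafka.orders": ["hive.orders_raw"],
    "hive.orders_raw": ["hive.orders_clean"],
    "hive.orders_clean": ["looker.revenue_dashboard", "feast.order_features"],
}


def downstream(asset, edges):
    """Breadth-first walk returning every asset that (transitively) depends on `asset`."""
    seen, queue, order = {asset}, deque([asset]), []
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream("hive.orders_raw", LINEAGE))
```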
  • 3
    Apache Polaris

    Apache Polaris, the interoperable, open source catalog

    Apache Polaris is an open-source metadata catalog and data management service designed to manage Apache Iceberg tables in modern data lakehouse environments. It provides a centralized catalog that allows multiple compute engines and analytics systems to interact with the same datasets through a standardized interface. By implementing the Iceberg REST catalog API, Polaris enables distributed data platforms to access shared table metadata without tightly coupling storage systems and query engines. ...
    Downloads: 1 This Week
  • 4
    Link-Preview-JS

    Extract web links information: title, description, images, videos, etc

    link-preview-js is a lightweight TypeScript library that extracts metadata from URLs or HTML content to generate rich link previews. By parsing Open Graph tags and other metadata, it retrieves information such as titles, descriptions, images, and videos. Designed primarily for Node.js and mobile environments, it facilitates the creation of link previews similar to those found on social media platforms.
    Downloads: 0 This Week
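    Open Graph metadata lives in `<meta property="og:..." content="...">` tags, so a preview is mostly a matter of collecting those pairs. The following is a Python sketch using the standard-library `html.parser`, not link-preview-js's TypeScript API; the `preview` helper and its fallback to `<title>` are illustrative assumptions.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects og:* meta properties and the <title> text as a fallback."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            prop = attrs.get("property") or ""
            if prop.startswith("og:") and attrs.get("content") is not None:
                self.og[prop[3:]] = attrs["content"]  # strip the "og:" prefix
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def preview(html):
    """Hypothetical helper: build a minimal link preview from raw HTML."""
    p = OpenGraphParser()
    p.feed(html)
    return {
        "title": p.og.get("title", p.title),
        "description": p.og.get("description"),
        "image": p.og.get("image"),
    }
```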
  • 5
    nb-clean

    Clean Jupyter notebooks of outputs, metadata, and empty cells

    nb-clean cleans Jupyter notebooks of cell execution counts, metadata, outputs, and (optionally) empty cells, preparing them for committing to version control. It provides both a Git filter and a pre-commit hook to automatically clean notebooks before they're staged, and can also be used with other version control systems, as a command line tool, and as a Python library. It can determine whether a notebook is clean, which can be used as a check in your continuous integration pipelines. ...
    Downloads: 0 This Week
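    Because a notebook is a JSON document, "cleaning" amounts to resetting a few per-cell fields. This is a minimal sketch assuming the nbformat 4 cell layout (`cells`, `metadata`, `outputs`, `execution_count`); it is not nb-clean's implementation, and `clean_notebook`, `is_clean`, and `clean_file` are hypothetical names. Notebook-level metadata is left untouched here.

```python
import json


def clean_notebook(nb, remove_empty_cells=False):
    """Strip cell metadata, outputs and execution counts from an nbformat-4 dict."""
    cells = []
    for cell in nb.get("cells", []):
        # `source` may be a string or a list of strings; join handles both.
        if remove_empty_cells and not "".join(cell.get("source", [])).strip():
            continue
        cell["metadata"] = {}
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
        cells.append(cell)
    nb["cells"] = cells
    return nb


def is_clean(nb):
    """True when no cell carries metadata, outputs, or an execution count."""
    return all(
        not c.get("metadata")
        and c.get("outputs", []) == []
        and c.get("execution_count") is None
        for c in nb.get("cells", [])
    )


def clean_file(path):
    """Clean a .ipynb file in place."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(clean_notebook(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```

    A CI check in this style would simply fail when `is_clean` returns `False`.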
  • 6
    Symfony PropertyInfo

    Extracts information about PHP class properties using metadata

    Symfony PropertyInfo is a component that extracts information about the properties of PHP classes, such as their names, types, visibility, and documentation. It is particularly useful in scenarios like serialization, form generation, and validation, where understanding the structure of an object is essential. PropertyInfo can fetch data from PHPDoc annotations, reflection, and type hints, offering flexible integration with Symfony and other systems.
    Downloads: 0 This Week
  • 7
    JDF.jl

    Julia DataFrames serialization format

    JDF is a DataFrames serialization format with the following goals: fast save and load times, compressed storage on disk, enabling disk-based data manipulation (not yet achieved), and support for machine learning workloads, e.g. mini-batch and sampling (not yet achieved). JDF stores a DataFrame in a folder with each column stored as a separate file. There is also a metadata.jls file that stores metadata about the original DataFrame. Collectively, the column files, the metadata file, and the folder are called a JDF "file". JDF.jl is a pure-Julia solution, and there are many ways to do nifty things, like compression and encapsulating the underlying structure of the arrays, that are hard to do in R and Python. For example, Python's numpy arrays are C objects, but all the vector types used in JDF are Julia data types.
    Downloads: 1 This Week
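    The one-file-per-column layout means a reader can load just the columns it needs. The Python sketch below mimics that layout only; it uses JSON instead of JDF's compressed Julia serialization, and the file names `col_<i>.json` and `metadata.json` (standing in for JDF's metadata.jls) are assumptions for illustration.

```python
import json
import os
import tempfile


def save_columnar(table, folder):
    """Write each column of `table` (dict: name -> list) to its own file, plus a metadata file."""
    os.makedirs(folder, exist_ok=True)
    names = list(table)
    for i, name in enumerate(names):
        # Index-based file names avoid problems with unusual column names.
        with open(os.path.join(folder, f"col_{i}.json"), "w") as f:
            json.dump(table[name], f)
    meta = {"names": names, "nrows": len(table[names[0]]) if names else 0}
    with open(os.path.join(folder, "metadata.json"), "w") as f:
        json.dump(meta, f)


def load_columnar(folder, columns=None):
    """Read back every column, or only the requested subset."""
    with open(os.path.join(folder, "metadata.json")) as f:
        meta = json.load(f)
    out = {}
    for name in columns or meta["names"]:
        i = meta["names"].index(name)
        with open(os.path.join(folder, f"col_{i}.json")) as f:
            out[name] = json.load(f)
    return out


with tempfile.TemporaryDirectory() as tmp:
    save_columnar({"a": [1, 2], "b": ["x", "y"]}, tmp)
    print(load_columnar(tmp, columns=["b"]))  # {'b': ['x', 'y']}
```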
  • 8
    Metacrafter

    Metadata and data identification tool and Python library

    A Python command-line tool and Python engine to label table fields and fields in data files. It can help find meaningful data in your tables and data files, or find personally identifiable information (PII). Metacrafter is a rule-based tool that labels the fields of tables in databases. It scans tables and finds person names, surnames, middle names, PII data, and basic identifiers like UUID/GUID. The rules are written as .yaml files and can be easily extended.
    Downloads: 0 This Week
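    Rule-based field labeling can be sketched as matching each rule's pattern against a sample of a column's values and keeping the rules that match most of them. The rules below are hypothetical inline dictionaries standing in for Metacrafter's .yaml rule files, and the 80% threshold is an arbitrary assumption; this is not Metacrafter's engine.

```python
import re

# Hypothetical rules; Metacrafter itself stores its rules in .yaml files.
RULES = [
    {"label": "uuid",
     "pattern": r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"},
    {"label": "email", "pattern": r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"},
]


def label_field(values, rules=RULES, threshold=0.8):
    """Return labels of rules that fully match at least `threshold` of the non-empty values."""
    sample = [v for v in values if v]
    labels = []
    for rule in rules:
        rx = re.compile(rule["pattern"])
        hits = sum(1 for v in sample if rx.fullmatch(v))
        if sample and hits / len(sample) >= threshold:
            labels.append(rule["label"])
    return labels
```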
  • 9
    nw_wrld

    nw_wrld is an event-driven sequencer for triggering visuals

    ...The system is designed to be extensible, letting developers plug in new generation rules or tweak parameters with real-time previews so they can iterate rapidly on world design. It also includes utilities to derive metadata from worlds, such as climate distributions, strategic points of interest, and navigable paths, which can be consumed by gameplay systems or AI agents. For teams building games or simulations, nw_wrld provides a reusable foundation that reduces upfront world design costs while enabling endless variety.
    Downloads: 1 This Week
  • 10
    HDF5.jl

    Save and load data in the HDF5 file format from Julia

    HDF5 stands for Hierarchical Data Format v5 and is closely modeled on file systems. In HDF5, a "group" is analogous to a directory, a "dataset" is like a file. HDF5 also uses "attributes" to associate metadata with a particular group or dataset. HDF5 uses ASCII names for these different objects, and objects can be accessed by Unix-like pathnames, e.g., "/sample1/tempsensor/firsttrial" for a top-level group "sample1", a subgroup "tempsensor", and a dataset "firsttrial". For simple types (scalars, strings, and arrays), HDF5 provides sufficient metadata to know how each item is to be interpreted. ...
    Downloads: 0 This Week
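    The group/dataset/attribute model described above behaves like a small file system. The toy Python model below is only an analogy, not the HDF5.jl (or any HDF5) API; the `Group`, `Dataset`, and `resolve` names and the `units` attribute are hypothetical.

```python
class Group(dict):
    """A group maps names to subgroups or datasets, like a directory."""

    def __init__(self):
        super().__init__()
        self.attrs = {}  # metadata attached to the group


class Dataset:
    """A dataset holds data, like a file, plus its own attributes."""

    def __init__(self, data):
        self.data = data
        self.attrs = {}


def resolve(root, path):
    """Follow a Unix-like path such as "/sample1/tempsensor/firsttrial"."""
    node = root
    for part in path.strip("/").split("/"):
        if part:
            node = node[part]
    return node


root = Group()
root["sample1"] = Group()
root["sample1"]["tempsensor"] = Group()
trial = Dataset([20.1, 20.4, 20.9])
trial.attrs["units"] = "degC"
root["sample1"]["tempsensor"]["firsttrial"] = trial
```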
  • 11
    pointblank

    Data quality assessment and metadata reporting for data frames

    ...Sometimes, we want to maintain table information and update it when the table goes through changes. For that, we can use an informant object plus associated functions to help define the metadata entries and present them as a data dictionary.
    Downloads: 1 This Week
  • 12
    Semantic Type Detection

    Metadata/data identification Java library

    Metadata/data identification Java library. Identifies Base Type (e.g. Boolean, Double, Long, String, LocalDate, LocalTime, ...) and Semantic Type information (e.g. Gender, Age, Color, Country, ...). Extensive country/language support. Extensible via user-defined plugins. Comprehensive Profiling support. Large set of built-in Semantic Types (extensible via JSON defined plugins).
    Downloads: 0 This Week
  • 13
    CKAN

    CKAN is an open-source DMS for powering data hubs

    CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work with data. It's a data management system that provides a powerful platform for cataloging, storing and accessing datasets with a rich front-end, full API (for both data and catalog), visualization tools and more. CKAN is used by national and regional government organizations throughout the European Union, the Americas, Asia, and Oceania to power a variety of official and community data...
    Downloads: 17 This Week
  • 14
    Backstage

    Backstage is an open platform for building developer portals

    Powered by a centralized software catalog, Backstage restores order to your infrastructure and enables your product teams to ship high-quality code quickly, without compromising autonomy. At Spotify, we've always believed in the speed and ingenuity that comes from having autonomous development teams. But as we learned firsthand, the faster you grow, the more fragmented and complex your software ecosystem becomes. And then everything slows down again. By centralizing services and...
    Downloads: 6 This Week
  • 15
    Krylov.jl

    A Julia Basket of Hand-Picked Krylov Methods

    If you use Krylov.jl in your work, please cite it using the metadata given in CITATION.cff.
    Downloads: 0 This Week
  • 16
    GoldenCheetah

    Performance Software for Cyclists, Runners, Triathletes and Coaches

    ...Upload and download with many cloud services including Strava, Withings, and Today's Plan. Import and export data to and from a wide range of bike computers and file formats. Track body measures and equipment use, and set your own metadata to track. GoldenCheetah provides tools for users to develop their own metrics, models, and charts. We believe that cyclists and triathletes should be able to download their power data to the computer of their choice, analyze it in whatever way they see fit, and share their methods of analysis with others.
    Downloads: 6 This Week
  • 17
    Genie

    Distributed Big Data Orchestration Service

    Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.
    Downloads: 0 This Week
  • 18
    NCDatasets.jl

    Load and create NetCDF files in Julia

    NCDatasets allows one to read and create netCDF files. NetCDF data set and attribute list behave like Julia dictionaries and variables like Julia arrays. This package implements the CommonDataModel.jl interface, which means that the datasets can be accessed in the same way as GRIB files opened with GRIBDatasets.jl.
    Downloads: 0 This Week
  • 19
    Literate

    Simple package for literate programming in Julia

    ...Literate can generate markdown pages (for e.g. Documenter.jl), and Jupyter notebooks, from the same source file. There is also an option to "clean" the source from all metadata, and produce a pure Julia script. Using a single source file for multiple purposes reduces maintenance, and makes sure your different output formats are synced with each other.
    Downloads: 0 This Week
  • 20
    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    ...Most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic). File sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata. Mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint). Comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others).
    Downloads: 3 This Week
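    The dataset-level overview (records, variables, missingness, duplicates) and a simple quality check such as constant columns can be computed in a few passes over the data. This stdlib-only sketch works on a list of dict rows rather than a pandas DataFrame, assumes hashable scalar values, and is not the library's implementation; `summarize` is a hypothetical name.

```python
def summarize(rows):
    """Global overview: record/variable counts, missing fraction, duplicate rows, constant columns."""
    columns = sorted({k for r in rows for k in r})
    n_cells = len(rows) * len(columns)
    # A cell counts as missing when the key is absent or its value is None.
    missing = sum(1 for r in rows for c in columns if r.get(c) is None)

    seen, dupes = set(), 0
    for r in rows:
        key = tuple(r.get(c) for c in columns)
        if key in seen:
            dupes += 1
        else:
            seen.add(key)

    constant = [c for c in columns if len({r.get(c) for r in rows}) == 1]
    return {
        "records": len(rows),
        "variables": len(columns),
        "missing_fraction": missing / n_cells if n_cells else 0.0,
        "duplicate_rows": dupes,
        "constant_columns": constant,
    }
```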
  • 21
    GeoNode

    GeoNode is an open source platform for geospatial data

    ...It brings together mature and stable open-source software projects under a consistent and easy-to-use interface allowing non-specialized users to share data and create interactive maps. Data management tools built into GeoNode allow for integrated creation of data, metadata, and map visualization. Each dataset in the system can be shared publicly or restricted to allow access to only specific users. Social features like user profiles and commenting and rating systems allow for the development of communities around each platform to facilitate the use, management, and quality control of the data the GeoNode instance contains. ...
    Downloads: 1 This Week
  • 22
    Querybook

    Big Data Querying UI, combining collocated table metadata

    Querybook is Pinterest’s open-source big data IDE via a notebook interface. Querybook’s core focus is to make composing queries, creating analyses, and collaborating with others as simple as possible. Organize rich text, queries, and charts into a notebook to easily document your analyses. Work collaboratively with others in a DataDoc and get real-time updates. The Query Editor is aware of your tables and their columns, as such it provides autocompletion, syntax highlighting, and the ability...
    Downloads: 0 This Week
  • 23
    Datumaro

    Dataset Management Framework, a Python library and a CLI tool to build

    ...Datumaro makes it easy to merge datasets, split them into training/validation/test subsets, filter or transform annotations, and validate annotation quality — all while preserving metadata and supporting detailed statistics. It’s especially useful when you’re dealing with heterogeneous data sources or need to prepare complex datasets for machine learning workflows, freeing you from writing custom scripts for every format conversion.
    Downloads: 0 This Week
  • 24
    targets

    Function-oriented Make-like declarative workflows for R

    The targets package is a pipeline / workflow management tool in R, designed to coordinate multi‐step computational workflows in data science / statistics. It tracks dependencies between “targets” (computational steps), skips steps whose upstream data or code hasn’t changed, supports parallel computation, branching (dynamic generation of sub‐targets), file format abstractions, and encourages reproducible and efficient analyses. It’s something like GNU Make for R, but more integrated. Skipping...
    Downloads: 0 This Week
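    The "skip steps whose upstream data or code hasn't changed" behavior can be illustrated with fingerprinting: hash each step's inputs and code version, and reuse the stored result when the hash matches. This Python toy is not the targets R API; the `Pipeline` class and its `target` method are hypothetical, and the explicit `version` string is a stand-in for the automatic code tracking that targets performs.

```python
import hashlib
import json


class Pipeline:
    """Toy Make-like runner: a step reruns only when its version or upstream values change."""

    def __init__(self):
        self.cache = {}  # step name -> (fingerprint, value)
        self.runs = []   # names of steps actually executed, for inspection

    def target(self, name, func, *deps, version="1"):
        dep_values = [self.cache[d][1] for d in deps]  # upstream results must exist
        fingerprint = hashlib.sha256(
            json.dumps([version, dep_values], sort_keys=True, default=repr).encode()
        ).hexdigest()
        cached = self.cache.get(name)
        if cached and cached[0] == fingerprint:
            return cached[1]  # nothing upstream changed: skip the work
        value = func(*dep_values)
        self.runs.append(name)
        self.cache[name] = (fingerprint, value)
        return value
```

    Running the same two steps twice executes each once; bumping the version of `raw` reruns it, and because its output changed, `sorted` reruns too.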
  • 25
    errsole.js

    Collect, Store, and Visualize Logs with a Single Module

    Errsole is an open-source logger for Node.js. It has a built-in web dashboard to view, filter, and search your app logs.
    Downloads: 0 This Week