Showing 61 open source projects for "metadata"

  • 1
    CSV Lint

    CSV Lint plug-in for Notepad++ for syntax highlighting

    CSV Lint plug-in for Notepad++ for syntax highlighting, CSV validation, automatic column and datatype detection, fixed-width datasets, changing datetime format and decimal separator, sorting data, counting unique values, converting to XML, JSON, SQL, etc. A plugin for data cleaning and working with messy data files. Use CSV Lint for metadata discovery, technical data validation, and reformatting of tabular data files. It is not meant to be a replacement for spreadsheet programs like Excel or SPSS; rather, it is a quality control tool to examine, verify, or polish up a dataset before further processing.
    Downloads: 18 This Week
  • 2
    Link-Preview-JS

    Extract web links information: title, description, images, videos, etc

    link-preview-js is a lightweight TypeScript library that extracts metadata from URLs or HTML content to generate rich link previews. By parsing Open Graph tags and other metadata, it retrieves information such as titles, descriptions, images, and videos. Designed primarily for Node.js and mobile environments, it facilitates the creation of link previews similar to those found on social media platforms.
    Downloads: 2 This Week
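The Open Graph parsing mentioned above can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not link-preview-js's actual code or API; the names `OpenGraphParser` and `extract_preview` are hypothetical, and the fallback to the `<title>` element is an assumption about what "other metadata" might include.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:*" content="..."> tags and the <title> text."""

    def __init__(self):
        super().__init__()
        self.og = {}            # Open Graph values keyed without the "og:" prefix
        self.title = ""         # text of <title>, used only as a fallback
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = attrs.get("property") or ""
            content = attrs.get("content")
            if prop.startswith("og:") and content is not None:
                # Keep the first value seen for each property.
                self.og.setdefault(prop[3:], content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_preview(html):
    """Return preview fields for an HTML string, preferring Open Graph values."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.og.get("title") or parser.title.strip() or None,
        "description": parser.og.get("description"),
        "image": parser.og.get("image"),
        "video": parser.og.get("video"),
    }
```

Calling `extract_preview(page_html)` on a fetched page returns a dictionary whose missing fields are `None`, so a caller can decide how to render a partial preview.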
  • 3
    Apache Polaris

    Apache Polaris, the interoperable, open source catalog

    Apache Polaris is an open-source metadata catalog and data management service designed to manage Apache Iceberg tables in modern data lakehouse environments. It provides a centralized catalog that allows multiple compute engines and analytics systems to interact with the same datasets through a standardized interface. By implementing the Iceberg REST catalog API, Polaris enables distributed data platforms to access shared table metadata without tightly coupling storage systems and query engines. ...
    Downloads: 0 This Week
  • 4
    nb-clean

    Clean Jupyter notebooks of outputs, metadata, and empty cells

    nb-clean cleans Jupyter notebooks of cell execution counts, metadata, outputs, and (optionally) empty cells, preparing them for committing to version control. It provides both a Git filter and pre-commit hook to automatically clean notebooks before they're staged, and can also be used with other version control systems, as a command line tool, and as a Python library. It can determine if a notebook is clean or not, which can be used as a check in your continuous integration pipelines. ...
    Downloads: 0 This Week
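The cleaning step described above can be approximated with a short sketch. It operates on the parsed notebook JSON and assumes the nbformat 4 layout (a top-level `cells` list, with `outputs` and `execution_count` on code cells). It is not nb-clean's implementation; `clean_notebook` and `is_clean` are hypothetical names, and leaving notebook-level metadata untouched is a choice made here for brevity.

```python
import copy


def clean_notebook(nb, remove_empty_cells=False):
    """Return a cleaned copy of a notebook dict (nbformat 4 layout assumed)."""
    nb = copy.deepcopy(nb)  # never mutate the caller's notebook
    cells = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines
            source = "".join(source)
        if remove_empty_cells and not source.strip():
            continue  # optionally drop cells with no content
        cell["metadata"] = {}  # clear cell-level metadata
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop stored outputs
            cell["execution_count"] = None  # drop execution counter
        cells.append(cell)
    nb["cells"] = cells
    return nb


def is_clean(nb, remove_empty_cells=False):
    """A notebook is clean if cleaning it would change nothing."""
    return clean_notebook(nb, remove_empty_cells) == nb
```

A check like `is_clean` is the kind of test a continuous integration job could run: load each `.ipynb` file with `json.load` and fail the build if any notebook is not clean.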
  • 5
    Symfony PropertyInfo

    Extracts information about PHP class' properties using metadata

    Symfony PropertyInfo is a component that extracts information about the properties of PHP classes, such as their names, types, visibility, and documentation. It is particularly useful in scenarios like serialization, form generation, and validation, where understanding the structure of an object is essential. PropertyInfo can fetch data from PHPDoc annotations, reflection, and type hints, offering flexible integration with Symfony and other systems.
    Downloads: 0 This Week
  • 6
    JDF.jl

    Julia DataFrames serialization format

    JDF is a DataFrames serialization format with the following goals: fast save and load times, compressed storage on disk, disk-based data manipulation (not yet achieved), and support for machine learning workloads, e.g. mini-batch, sampling (not yet achieved). JDF stores a DataFrame in a folder with each column stored as a separate file. There is also a metadata.jls file that stores metadata about the original DataFrame. Collectively, the column files, the metadata file, and the folder are called a JDF "file". JDF.jl is a pure-Julia solution, and there are a lot of ways to do nifty things, like compression and encapsulating the underlying structure of the arrays, that are hard to do in R and Python. E.g. Python's numpy arrays are C objects, but all the vector types used in JDF are Julia data types.
    Downloads: 1 This Week
  • 7
    Metacrafter

    Metadata and data identification tool and Python library

    Python command line tool and Python engine to label table fields and fields in data files. It can help find meaningful data in your tables and data files, or find personally identifiable information (PII). Metacrafter is a rule-based tool that helps label the fields of tables in databases. It scans tables and finds person names, surnames, middle names, PII data, and basic identifiers like UUID/GUID. The rules are written as .yaml files and can be easily extended.
    Downloads: 0 This Week
  • 8
    HDF5.jl

    Save and load data in the HDF5 file format from Julia

    HDF5 stands for Hierarchical Data Format v5 and is closely modeled on file systems. In HDF5, a "group" is analogous to a directory, a "dataset" is like a file. HDF5 also uses "attributes" to associate metadata with a particular group or dataset. HDF5 uses ASCII names for these different objects, and objects can be accessed by Unix-like pathnames, e.g., "/sample1/tempsensor/firsttrial" for a top-level group "sample1", a subgroup "tempsensor", and a dataset "firsttrial". For simple types (scalars, strings, and arrays), HDF5 provides sufficient metadata to know how each item is to be interpreted. ...
    Downloads: 0 This Week
  • 9
    pointblank

    Data quality assessment and metadata reporting for data frames

    ...Sometimes, we want to maintain table information and update it when the table goes through changes. For that, we can use an informant object plus associated functions to help define the metadata entries and present them as a data dictionary.
    Downloads: 0 This Week
  • 10
    Semantic Type Detection

    Metadata/data identification Java library

    Metadata/data identification Java library. Identifies Base Type (e.g. Boolean, Double, Long, String, LocalDate, LocalTime, ...) and Semantic Type information (e.g. Gender, Age, Color, Country, ...). Extensive country/language support. Extensible via user-defined plugins. Comprehensive Profiling support. Large set of built-in Semantic Types (extensible via JSON defined plugins).
    Downloads: 0 This Week
  • 11
    Ahoy

    Simple, powerful, first-party analytics for Rails

    Ahoy is a first-party analytics library built primarily for Ruby on Rails, designed to let applications track visits and events in a clean, integrated way rather than relying on third-party tooling. It stores data in your own database by default, which gives developers full control over what data is captured, how it's processed, and how it’s used, sidestepping privacy concerns of external analytics providers. The library supports Rails, JavaScript, and native apps, making it flexible across...
    Downloads: 9 This Week
  • 12
    Backstage

    Backstage is an open platform for building developer portals

    Powered by a centralized software catalog, Backstage restores order to your infrastructure and enables your product teams to ship high-quality code quickly, without compromising autonomy. At Spotify, we've always believed in the speed and ingenuity that comes from having autonomous development teams. But as we learned firsthand, the faster you grow, the more fragmented and complex your software ecosystem becomes. And then everything slows down again. By centralizing services and...
    Downloads: 5 This Week
  • 13
    Krylov.jl

    A Julia Basket of Hand-Picked Krylov Methods

    If you use Krylov.jl in your work, please cite it using the metadata given in CITATION.cff.
    Downloads: 0 This Week
  • 14
    GoldenCheetah

    Performance Software for Cyclists, Runners, Triathletes and Coaches

    ...Upload and download with many cloud services including Strava, Withings, and Today's Plan. Import and export data to and from a wide range of bike computers and file formats. Track body measures and equipment use, and set your own metadata to track. GoldenCheetah provides tools for users to develop their own metrics, models, and charts. We believe that cyclists and triathletes should be able to download their power data to the computer of their choice, analyze it in whatever way they see fit, and share their methods of analysis with others.
    Downloads: 6 This Week
  • 15
    CKAN

    CKAN is an open-source DMS for powering data hubs

    CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work with data. It's a data management system that provides a powerful platform for cataloging, storing and accessing datasets with a rich front-end, full API (for both data and catalog), visualization tools and more. CKAN is used by national and regional government organizations throughout the European Union, the Americas, Asia, and Oceania to power a variety of official and community data...
    Downloads: 6 This Week
  • 16
    Genie

    Distributed Big Data Orchestration Service

    Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.
    Downloads: 0 This Week
  • 17
    iRODS

    Open Source Data Management Software

    The Integrated Rule-Oriented Data System (iRODS) is open-source data management software used by research, commercial, and governmental organizations worldwide. iRODS is released as a production-level distribution aimed at deployment in mission-critical environments. It virtualizes data storage resources, so users can take control of their data, regardless of where and on what device the data is stored. The development infrastructure supports exhaustive testing on supported platforms; plugin...
    Downloads: 4 This Week
  • 18
    NCDatasets.jl

    Load and create NetCDF files in Julia

    NCDatasets allows one to read and create NetCDF files. NetCDF datasets and attribute lists behave like Julia dictionaries, and variables behave like Julia arrays. This package implements the CommonDataModel.jl interface, which means that the datasets can be accessed in the same way as GRIB files opened with GRIBDatasets.jl.
    Downloads: 0 This Week
  • 19
    Literate

    Simple package for literate programming in Julia

    ...Literate can generate markdown pages (for e.g. Documenter.jl), and Jupyter notebooks, from the same source file. There is also an option to "clean" the source from all metadata, and produce a pure Julia script. Using a single source file for multiple purposes reduces maintenance, and makes sure your different output formats are synced with each other.
    Downloads: 0 This Week
  • 20
    GeoNode

    GeoNode is an open source platform for geospatial data

    ...It brings together mature and stable open-source software projects under a consistent and easy-to-use interface allowing non-specialized users to share data and create interactive maps. Data management tools built into GeoNode allow for integrated creation of data, metadata, and map visualization. Each dataset in the system can be shared publicly or restricted to allow access to only specific users. Social features like user profiles and commenting and rating systems allow for the development of communities around each platform to facilitate the use, management, and quality control of the data the GeoNode instance contains. ...
    Downloads: 1 This Week
  • 21
    Querybook

    Big Data Querying UI, combining collocated table metadata

    Querybook is Pinterest’s open-source big data IDE via a notebook interface. Querybook’s core focus is to make composing queries, creating analyses, and collaborating with others as simple as possible. Organize rich text, queries, and charts into a notebook to easily document your analyses. Work collaboratively with others in a DataDoc and get real-time updates. The Query Editor is aware of your tables and their columns, as such it provides autocompletion, syntax highlighting, and the ability...
    Downloads: 0 This Week
  • 22
    Fondant

    Production-ready data processing made easy and shareable

    Fondant is a modular, pipeline-based framework designed to simplify the preparation of large-scale datasets for training machine learning models, especially foundation models. It offers an end-to-end system for ingesting raw data, applying transformations, filtering, and formatting outputs—all while remaining scalable and traceable. Fondant is designed with reproducibility in mind and supports containerized steps using Docker, making it easy to share and reuse data processing components....
    Downloads: 0 This Week
  • 23
    Apache Hudi

    Upserts, Deletes And Incremental Processing on Big Data

    Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides...
    Downloads: 1 This Week
  • 24
    Datumaro

    Dataset Management Framework, a Python library and a CLI tool to build

    ...Datumaro makes it easy to merge datasets, split them into training/validation/test subsets, filter or transform annotations, and validate annotation quality — all while preserving metadata and supporting detailed statistics. It’s especially useful when you’re dealing with heterogeneous data sources or need to prepare complex datasets for machine learning workflows, freeing you from writing custom scripts for every format conversion.
    Downloads: 0 This Week
  • 25
    targets

    Function-oriented Make-like declarative workflows for R

    The targets package is a pipeline / workflow management tool in R, designed to coordinate multi‐step computational workflows in data science / statistics. It tracks dependencies between “targets” (computational steps), skips steps whose upstream data or code hasn’t changed, supports parallel computation, branching (dynamic generation of sub‐targets), file format abstractions, and encourages reproducible and efficient analyses. It’s something like GNU Make for R, but more integrated. Skipping...
    Downloads: 0 This Week
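The idea of skipping steps whose upstream data or code has not changed can be sketched with content fingerprints. The example below is in Python for illustration only; targets itself is an R package and this is not its algorithm or API. The names `fingerprint` and `run_pipeline`, the explicit version string standing in for "the code of a step", and the in-memory `store` are all assumptions.

```python
import hashlib


def fingerprint(*parts):
    """Hash the repr of each part into one hex digest."""
    h = hashlib.sha256()
    for part in parts:
        h.update(repr(part).encode("utf-8"))
    return h.hexdigest()


def run_pipeline(steps, store):
    """Run steps in order, skipping those whose inputs and code are unchanged.

    steps: list of (name, func, code_version, dependency_names), already
           sorted so that dependencies come before the steps that use them.
    store: dict mapping name -> {"hash": ..., "value": ...}, kept between runs.
    Returns the names of the steps that were rebuilt.
    """
    rebuilt = []
    for name, func, version, deps in steps:
        # A step's fingerprint covers its own code version and the
        # fingerprints of everything it depends on.
        key = fingerprint(version, [store[d]["hash"] for d in deps])
        cached = store.get(name)
        if cached is not None and cached["hash"] == key:
            continue  # up to date: nothing upstream changed
        inputs = [store[d]["value"] for d in deps]
        store[name] = {"hash": key, "value": func(*inputs)}
        rebuilt.append(name)
    return rebuilt
```

Because each fingerprint includes the fingerprints of its dependencies, changing one step invalidates everything downstream of it, while a second run with no changes rebuilds nothing.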