Data Integration Tools

  • 1
    Apache Hudi

    Upserts, Deletes And Incremental Processing on Big Data

    Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (cloud stores, HDFS, or any Hadoop FileSystem-compatible storage). It is a transactional data lake platform that brings database and data warehouse capabilities to the data lake, replacing slow batch processing with an incremental processing framework for low-latency, minute-level analytics. Hudi provides efficient upserts by using an index to map a given hoodie key (record key + partition path) consistently to a file ID. This mapping between record key and file group/file ID never changes once the first version of a record has been written to a file. In short, the mapped file group contains all versions of a group of records.
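
    The key-to-file-group mapping described in the Apache Hudi entry above can be illustrated with a toy model. This is a minimal sketch, not Hudi's implementation: the class `ToyIndex`, the `max_keys_per_group` parameter, and the `fg-N` file IDs are all hypothetical names chosen for illustration. It only shows the idea that a hoodie key is assigned to a file group once and every later version of that record lands in the same group.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy key-to-file-group index (illustrative only, not Hudi's code)."""

    max_keys_per_group: int = 2  # hypothetical knob: distinct keys per file group
    # (record_key, partition_path) -> file id; an entry is never reassigned
    mapping: dict = field(default_factory=dict)
    # file id -> list of (hoodie_key, payload), one element per written version
    file_groups: dict = field(default_factory=dict)

    def upsert(self, record_key, partition_path, payload):
        hoodie_key = (record_key, partition_path)
        file_id = self.mapping.get(hoodie_key)
        if file_id is None:
            # First version of this record: pick a file group once and remember it.
            file_id = f"fg-{len(self.mapping) // self.max_keys_per_group}"
            self.mapping[hoodie_key] = file_id
        # Later versions are appended to the same file group.
        self.file_groups.setdefault(file_id, []).append((hoodie_key, payload))
        return file_id

    def latest(self, record_key, partition_path):
        hoodie_key = (record_key, partition_path)
        versions = [
            payload
            for key, payload in self.file_groups[self.mapping[hoodie_key]]
            if key == hoodie_key
        ]
        return versions[-1]


if __name__ == "__main__":
    index = ToyIndex()
    index.upsert("user-1", "2024/01", {"clicks": 1})
    index.upsert("user-1", "2024/01", {"clicks": 5})
    print(index.latest("user-1", "2024/01"))
```

    Because the mapping is fixed at first write, an update only needs to touch one file group instead of scanning the whole dataset, which is the property the description attributes to Hudi's upserts.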
  • 2

    BD integration

    Heterogeneous BD integration

    The growing need for a unified view of information resources held in different systems has led to data integration mechanisms that focus on organizing efficient access to external, heterogeneous data sources through a single interface. The project includes a mass integration platform that makes it possible to build a global infrastructure of tens or hundreds of heterogeneous databases using a service-oriented approach.
  • 3
    The BioDataServer is a database integration system. It implements a mediator-wrapper architecture and offers a SQL interface. Data integration is based on a user-defined integrated schema and on adapters that wrap any kind of data source.
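
    The mediator-wrapper pattern named in the BioDataServer entry above can be sketched in a few lines. This is not BioDataServer's code or API: the classes `ListAdapter`, `CsvTextAdapter`, and `Mediator`, and the integrated table `record`, are hypothetical. As a simplification, the mediator here copies adapter rows into an in-memory SQLite table up front, whereas a real mediator would typically translate queries and fetch from the sources on demand.

```python
import sqlite3


class ListAdapter:
    """Wrapper around a list of dicts (hypothetical source format)."""

    def __init__(self, rows):
        self._rows = rows

    def rows(self):
        for row in self._rows:
            yield (row["id"], row["name"])


class CsvTextAdapter:
    """Wrapper around 'id, name' lines of text (hypothetical source format)."""

    def __init__(self, text):
        self._text = text

    def rows(self):
        for line in self._text.strip().splitlines():
            ident, name = line.split(",")
            yield (int(ident), name.strip())


class Mediator:
    """Exposes every registered source through one integrated SQL schema."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        # The integrated schema: one table shared by all sources.
        self._db.execute("CREATE TABLE record (id INTEGER, name TEXT, source TEXT)")

    def register(self, source_name, adapter):
        # Each adapter translates its source into rows of the integrated schema.
        self._db.executemany(
            "INSERT INTO record VALUES (?, ?, ?)",
            ((ident, name, source_name) for ident, name in adapter.rows()),
        )

    def query(self, sql, params=()):
        return self._db.execute(sql, params).fetchall()


if __name__ == "__main__":
    mediator = Mediator()
    mediator.register("list", ListAdapter([{"id": 1, "name": "alpha"}]))
    mediator.register("csv", CsvTextAdapter("2, beta\n3, gamma"))
    print(mediator.query("SELECT id, name, source FROM record ORDER BY id"))
```

    The point of the split is that a new kind of data source only requires a new adapter; the SQL interface and the integrated schema stay unchanged.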
  • 4
    CI Tools Demo

    Docker Infrastructure via docker-compose

    This repository provides a Docker-powered CI tools demo environment that starts with a single docker-compose command. It assembles popular CI/CD components (Jenkins, SonarQube, Nexus, GitLab, and Selenium Grid), each running in its own isolated container, which makes it suitable for self-contained integration testing, workshops, and modular experimentation. It is maintained primarily for workshops and proofs of concept and is not intended for production use. It includes legacy documentation and scripting for Mac users and older setups.
  • 5
    CMIS Input plugin for Pentaho

    Allows querying Content Management Systems that use the CMIS.

    The plugin lets you extract the metadata of all documents in your Enterprise Content Management system using simple queries in a query language very close to traditional SQL. The extracted information can be used for statistics, for reports and, more generally, for analysing document archives in ways that existing tools did not allow. All of this happens within the Pentaho Suite, the open source business intelligence platform for extracting and analysing structured and semi-structured data. To that end, the CMIS Input plugin for Pentaho Data Integration (Kettle) queries Content Management Systems that use the CMIS interoperability standard. Once extracted, the data can be stored, analysed, and presented in customized reports published in various formats for the end user (PDF, Excel, etc.).
  • 6
    Cassandra Spark Connector

    Apache Spark to Apache Cassandra connector

    The Apache Cassandra Spark Connector allows Spark jobs (RDDs or DataFrames/Datasets) to read from and write to Cassandra tables. Compatible with Apache Cassandra (v2.1+), Spark 1.0–3.5, and Scala 2.11–2.13, it supports mapping Cassandra rows to Scala case classes, saving results back to Cassandra, and executing arbitrary CQL within Spark applications.
  • 7
    CellTypist

    A tool for semi-automatic cell type classification, harmonization

    CellTypist is an automated tool for cell type classification, harmonization, and integration. Classification: transfer cell type labels from the reference to the query dataset. Harmonization: match and harmonize cell types defined by independent datasets. Integration: integrate cells and cell types with supervision from harmonization. CellTypist recapitulates the cell type structure and biology of independent datasets. Regularised linear models with stochastic gradient descent provide fast and accurate prediction. It is scalable and flexible: the Python-based implementation is easy to integrate into existing pipelines. It also serves as a community-driven encyclopedia of cell types.
  • 8
    ChunJun

    A data integration framework

    ChunJun is a distributed integration framework currently based on Apache Flink. It was initially known as FlinkX and was renamed ChunJun on February 22, 2022. It performs data synchronization and calculation between heterogeneous data sources and, according to the project, has been deployed and running stably in thousands of companies. Built on the real-time computing engine Flink, it supports configuring tasks through JSON templates and SQL scripts; the SQL scripts are compatible with Flink SQL syntax. It supports synchronization and calculation for more than 20 data sources, such as MySQL, Oracle, SQLServer, Hive, and Kudu. It is easy to extend and highly flexible: newly added data source plugins can integrate with existing ones instantly, and plugin developers do not need to care about the code logic of other plugins.
  • 9
    Civi Data Integration

    This is a Pentaho Data Integration plugin for CiviCRM.

    The plugin lets you use the power of the Pentaho Data Integration tools with your CiviCRM instance.
  • 10
    DIMVisual stands for Data Integration Model for Visualization. The implementation integrates information for the analysis of parallel application behavior.
  • 11
    Daffodil Replicator is a powerful Open Source Java tool for data integration, data migration and data protection in real time. It allows bi-directional data replication and synchronization between homogeneous / heterogeneous databases including Oracle, M
  • 12
    DataSync Suite
    DataSync Suite is an open source platform for integrating tools like Zimbra, SugarCRM, and Drupal. The tool focuses on single sign-on, application data integration, and fast, flexible deployment.
  • 13

    Diirt

    Data integration in real time

    Java framework for real-time data collection, aggregation, and manipulation. This top-level project is composed of several sub-projects.
  • 14
    EasyDataQuality for Pentaho Kettle

    EasyDataQuality for Pentaho Data Integration in Kettle

    EasyDQ plugins for contact cleansing in Pentaho Data Integration (Kettle).
  • 15
    ExAws

    A flexible, easy-to-use set of AWS API clients for Elixir

    ExAws is a comprehensive Elixir client library for interfacing with AWS services. It provides low-level request builders for nearly all AWS APIs—like S3, EC2, Lambda, DynamoDB, SQS, SES, Route 53, and more—while supporting streaming, request configuration overrides, telemetry, flexible HTTP clients, and codecs. Its modular architecture enables importing only the services you need with separate packages (e.g., ex_aws_s3, ex_aws_ec2).
  • 16
    Fluxion
    The Fluxion framework is a prototype data integration system using Semantic Web technologies.
  • 17
    Open source Application and Data Integration Platform that allows developers and end-users to integrate and transform information using a web-based drag-and-drop interface that doesn't require coding or programming skills.
  • 18
    GeoACE is a geo-data integration package designed to include and present geographical data, measured geo-information, and simulated geo-information. The tool is built on the Tcl/Tk platform and uses the Visualization Toolkit as its 3-D engine.
  • 19
    Gradle Docker Compose Plugin

    Simplifies usage of Docker Compose for integration testing

    The Gradle Docker Compose Plugin by Avast integrates Docker Compose lifecycle management into Gradle builds. It allows developers to define and manage Docker containers required for integration testing or local development directly from their Gradle build scripts. This plugin automates the startup and shutdown of services, supports container health checks, and enables tight integration between application code and containerized services, enhancing reproducibility and automation in development pipelines.
  • 20
    Grinn

    graph database and R package for omic data integration

    http://kwanjeeraw.github.io/grinn/
  • 21
    The Hanalyzer is a tool designed to help biologists explain results observed in genome-scale experiments and to generate new hypotheses. It combines information extraction, semantic data integration, reasoning, and visualization.
  • 22
    Harmony Data Integration

    Fast, sensitive and accurate integration of single-cell data

    Harmony is a general-purpose R package with an efficient algorithm for integrating multiple data sets. It is especially useful for large single-cell datasets such as single-cell RNA-seq. Harmony has been tested on R versions >= 4. Please consult the DESCRIPTION file for more details on required R packages. Harmony has been tested on Linux, OS X, and Windows platforms.
  • 23
    INDUS is a project for knowledge acquisition and data integration from heterogeneous distributed data, particularly from bioinformatics databases.
  • 24
    This disease-centric project contributes data integration and analysis tools from the Institute for Systems Biology (ISB). We offer this project to the research community to further our efforts in disease prediction and prevention.
  • 25
    An extension package to Pentaho Data Integration, providing plug-ins. Steps/job entries can be downloaded independently and each comes with source code in the .zip file. All are licensed as LGPL or GPL.