Open Source Linux Data Integration Tools - Page 2

1

Cassandra Spark Connector

Apache Spark to Apache Cassandra connector

The Apache Cassandra Spark Connector allows Spark jobs (RDDs or DataFrames/Datasets) to read from and write to Cassandra tables. Compatible with Apache Cassandra (v2.1+), Spark 1.0–3.5, and Scala 2.11–2.13, it supports mapping Cassandra rows to Scala case classes, saving results back to Cassandra, and executing arbitrary CQL within Spark applications.

Downloads: 0 This Week

Last Update: 2025-08-04
2

CellTypist

A tool for semi-automatic cell type classification, harmonization and integration

CellTypist is an automated tool for cell type classification, harmonization, and integration. Classification: transfer cell type labels from a reference dataset to a query dataset. Harmonization: match and harmonize cell types defined by independent datasets. Integration: integrate cells and cell types with supervision from harmonization. CellTypist recapitulates the cell type structure and biology of independent datasets. Regularised linear models trained with stochastic gradient descent provide fast and accurate prediction. The tool is scalable and flexible, and its Python-based implementation is easy to integrate into existing pipelines. It also offers a community-driven encyclopedia of cell types.

Downloads: 0 This Week

Last Update: 2025-06-25
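
The prediction step described above can be illustrated with a minimal sketch, assuming a one-vs-rest setup: one L2-regularised logistic regression per cell type, trained by stochastic gradient descent on a tiny hypothetical two-gene reference. This is plain Python for illustration only, not CellTypist's code or API; the names `train_ovr_sgd`, `predict`, and all data are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_ovr_sgd(X: List[List[float]], y: List[str], labels: List[str],
                  lr: float = 0.1, l2: float = 1e-3, epochs: int = 200,
                  seed: int = 0) -> Dict[str, Model]:
    """Fit one L2-regularised logistic regression per label (one-vs-rest) with SGD."""
    rng = random.Random(seed)
    models: Dict[str, Model] = {}
    for label in labels:
        w, b = [0.0] * len(X[0]), 0.0
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)  # visit reference cells in random order each epoch
            for i in order:
                err = score((w, b), X[i]) - (1.0 if y[i] == label else 0.0)
                # Log-loss gradient plus the L2 penalty gradient on the weights
                w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, X[i])]
                b -= lr * err
        models[label] = (w, b)
    return models

def predict(models: Dict[str, Model], x: Sequence[float]) -> str:
    """Assign the label whose one-vs-rest model gives the highest probability."""
    return max(models, key=lambda label: score(models[label], x))

# Hypothetical two-gene reference: T cells express gene 0, B cells express gene 1
X_ref = [[5.0, 0.1], [4.5, 0.3], [0.2, 4.8], [0.4, 5.2]]
y_ref = ["T cell", "T cell", "B cell", "B cell"]
models = train_ovr_sgd(X_ref, y_ref, labels=["T cell", "B cell"])
query_labels = [predict(models, q) for q in ([4.8, 0.2], [0.3, 5.0])]
```

Transferring labels to query cells then reduces to scoring each query profile against every per-type model and keeping the best one.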
3

ChunJun

A data integration framework

ChunJun is a distributed data integration framework, currently based on Apache Flink. It was initially known as FlinkX and was renamed ChunJun on February 22, 2022. It performs data synchronization and computation between various heterogeneous data sources, and has been deployed and is running stably in thousands of companies. Built on the real-time computing engine Flink, it supports configuring tasks with JSON templates or SQL scripts; the SQL scripts are compatible with Flink SQL syntax. It supports synchronization and computation across more than 20 heterogeneous data sources, such as MySQL, Oracle, SQLServer, Hive, and Kudu. It is easy to extend and highly flexible: newly added data source plugins integrate with existing plugins instantly, and plugin developers do not need to care about the code logic of other plugins.

Downloads: 0 This Week

Last Update: 2022-11-18
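
The claim that new data source plugins integrate without knowing about other plugins rests on every reader and writer sharing a common record format. A minimal sketch of that idea, assuming a simple in-process registry: the `register_reader`/`register_writer` decorators, the `memory` plugins, and the job layout are hypothetical and do not reproduce ChunJun's Java plugin API or its JSON template schema.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]  # common row format shared by every plugin
Reader = Callable[[dict], Iterator[Record]]
Writer = Callable[[dict, Iterable[Record]], int]

READERS: Dict[str, Reader] = {}
WRITERS: Dict[str, Writer] = {}

def register_reader(name: str) -> Callable[[Reader], Reader]:
    def decorator(fn: Reader) -> Reader:
        READERS[name] = fn
        return fn
    return decorator

def register_writer(name: str) -> Callable[[Writer], Writer]:
    def decorator(fn: Writer) -> Writer:
        WRITERS[name] = fn
        return fn
    return decorator

@register_reader("memory")
def memory_reader(options: dict) -> Iterator[Record]:
    # Hypothetical source plugin: yields rows from an in-memory list
    yield from options["rows"]

@register_writer("memory")
def memory_writer(options: dict, records: Iterable[Record]) -> int:
    # Hypothetical sink plugin: appends rows to an in-memory list and counts them
    count = 0
    for record in records:
        options["target"].append(record)
        count += 1
    return count

def run_sync(job: dict) -> int:
    """Pair any registered reader with any registered writer.

    Plugins only exchange Records, so neither needs to know the other's code.
    """
    reader = READERS[job["reader"]["name"]]
    writer = WRITERS[job["writer"]["name"]]
    return writer(job["writer"]["options"], reader(job["reader"]["options"]))

# A job description in the spirit of a JSON template (layout is hypothetical)
target: List[Record] = []
job = {
    "reader": {"name": "memory", "options": {"rows": [{"id": 1}, {"id": 2}]}},
    "writer": {"name": "memory", "options": {"target": target}},
}
written = run_sync(job)
```

Adding a new source in this design means registering one more reader; no existing writer changes.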
4

Civi Data Integration

This is a Pentaho Data Integration plugin for CiviCRM.

This is a Pentaho Data Integration plugin for CiviCRM. It allows you to take advantage of the power of Pentaho Data Integration tools and use them with your CiviCRM instance.

Downloads: 0 This Week

Last Update: 2015-01-13
5

Common Core Ontologies

The Common Core Ontology Repository

The Common Core Ontologies (CCO) comprise twelve ontologies designed to represent and integrate taxonomies of generic classes and relations across all domains of interest. CCO is a mid-level extension of Basic Formal Ontology (BFO), an upper-level ontology framework widely used to structure and integrate ontologies in the biomedical domain (Arp, et al., 2015). BFO aims to represent the most generic categories of entity and the most generic types of relations that hold between them by defining a small number of classes and relations. CCO extends BFO in the sense that every class in CCO is asserted to be a subclass of some class in BFO, and CCO adopts the generic relations defined in BFO (e.g., has_part) (Smith and Grenon, 2004). Accordingly, CCO classes and relations are heavily constrained by the BFO framework, from which they inherit many of their basic semantic relationships.

Downloads: 0 This Week

Last Update: 2026-04-04
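
The constraint that every CCO class is a subclass of some BFO class can be checked mechanically by following subclass links upward. A minimal sketch over a hand-made, simplified taxonomy: the BFO names are real BFO terms, but the mid-level names (`Agent`, `Person`, `Act`, `Widget`) and their placement are illustrative assumptions, not copied from the CCO files, and `Widget` is deliberately left unanchored.

```python
from typing import Dict, Optional, Set

BFO_CLASSES: Set[str] = {
    "entity", "continuant", "occurrent",
    "independent continuant", "material entity", "process",
}

# Simplified, illustrative taxonomy: child -> direct superclass (None = no parent)
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "entity": None,
    "continuant": "entity",
    "occurrent": "entity",
    "independent continuant": "continuant",
    "material entity": "independent continuant",
    "process": "occurrent",
    # Mid-level classes (names and placement for illustration only)
    "Agent": "material entity",
    "Person": "Agent",
    "Act": "process",
    "Widget": None,  # deliberately unanchored to show a violation
}

def bfo_ancestor(cls: str) -> Optional[str]:
    """Walk subclass links upward and return the first BFO class reached, if any."""
    seen: Set[str] = set()
    current = SUBCLASS_OF.get(cls)
    while current is not None and current not in seen:  # guard against cycles
        if current in BFO_CLASSES:
            return current
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return None

def unanchored(mid_level: Set[str]) -> Set[str]:
    """Return mid-level classes that never reach a BFO superclass."""
    return {c for c in mid_level if bfo_ancestor(c) is None}

problems = unanchored({"Agent", "Person", "Act", "Widget"})
```

The check is transitive: `Person` passes because it reaches a BFO class through `Agent`.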
6

Daffodil Replicator

Daffodil Replicator is a powerful Open Source Java tool for data integration, data migration and data protection in real time. It allows bi-directional data replication and synchronization between homogeneous / heterogeneous databases including Oracle, M

1 Review

Downloads: 0 This Week

Last Update: 2019-06-12
7

DataSync Suite

DataSync Suite is an open-source platform for integrating tools such as Zimbra, SugarCRM, and Drupal. It focuses on single sign-on, application data integration, and fast, flexible deployment.

Downloads: 0 This Week

Last Update: 2015-12-21
8

EasyDataQuality for Pentaho Kettle

EasyDataQuality for Pentaho Data Integration in Kettle

EasyDQ plugins for contact data cleansing in Pentaho Data Integration (Kettle).

1 Review

Downloads: 0 This Week

Last Update: 2016-04-26
9

ExAws

A flexible, easy-to-use set of clients for AWS APIs in Elixir

ExAws is a comprehensive Elixir client library for interfacing with AWS services. It provides low-level request builders for nearly all AWS APIs—like S3, EC2, Lambda, DynamoDB, SQS, SES, Route 53, and more—while supporting streaming, request configuration overrides, telemetry, flexible HTTP clients, and codecs. Its modular architecture enables importing only the services you need with separate packages (e.g., ex_aws_s3, ex_aws_ec2).

Downloads: 0 This Week

Last Update: 2025-07-10
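
ExAws keeps building a request separate from sending it: a service module produces a description of the operation, and a separate request step executes it with a configurable HTTP client. The sketch below mirrors that pattern in Python under stated assumptions: `Operation`, `list_objects`, and `request` are hypothetical names, the request is not signed, and nothing here is ExAws's Elixir API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Operation:
    """Description of an API call; building one performs no I/O."""
    service: str
    method: str
    path: str
    params: Dict[str, str] = field(default_factory=dict)

def list_objects(bucket: str, prefix: str = "") -> Operation:
    # Hypothetical builder, loosely modelled on an S3 ListObjects call
    params = {"prefix": prefix} if prefix else {}
    return Operation(service="s3", method="GET", path=f"/{bucket}", params=params)

# An HTTP client is any callable (method, url, params) -> (status, body)
HttpClient = Callable[[str, str, Dict[str, str]], Tuple[int, str]]

def request(op: Operation, http_client: HttpClient,
            overrides: Optional[Dict[str, str]] = None) -> Tuple[int, str]:
    """Execute an operation with an injected HTTP client and per-request overrides."""
    config = {"host": f"{op.service}.amazonaws.com"}  # default endpoint (unsigned sketch)
    config.update(overrides or {})
    url = f"https://{config['host']}{op.path}"
    return http_client(op.method, url, op.params)

# Usage with a fake client, so no network access is needed
calls: List[Tuple[str, str, Dict[str, str]]] = []

def fake_client(method: str, url: str, params: Dict[str, str]) -> Tuple[int, str]:
    calls.append((method, url, params))
    return 200, "<ListBucketResult/>"

op = list_objects("my-bucket", prefix="logs/")
status, body = request(op, fake_client, overrides={"host": "s3.eu-west-1.amazonaws.com"})
```

Because the operation is plain data, it can be inspected or tested without touching AWS; swapping the HTTP client or the endpoint changes only the execution step.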
10

Fluxion

The Fluxion framework is a prototype data integration system using Semantic Web technologies.

Downloads: 0 This Week

Last Update: 2013-04-23
11

Fusion Data Integration Platform

An open-source application and data integration platform that lets developers and end users integrate and transform information through a web-based drag-and-drop interface that requires no coding or programming skills.

Downloads: 0 This Week

Last Update: 2014-06-07
12

Gradle Docker Compose Plugin

Simplifies usage of Docker Compose for integration testing

The Gradle Docker Compose Plugin by Avast integrates Docker Compose lifecycle management into Gradle builds. It allows developers to define and manage Docker containers required for integration testing or local development directly from their Gradle build scripts. This plugin automates the startup and shutdown of services, supports container health checks, and enables tight integration between application code and containerized services, enhancing reproducibility and automation in development pipelines.

Downloads: 0 This Week

Last Update: 2026-02-03
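
The startup, health-check, and shutdown cycle that the plugin automates can be sketched as a wait loop wrapped in a context manager. This is a hypothetical Python illustration, not the plugin's implementation: `up`, `down`, and the health-check callables stand in for `docker compose up`, `docker compose down`, and container health probes.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator

HealthCheck = Callable[[], bool]

def wait_until_healthy(checks: Dict[str, HealthCheck], timeout_s: float = 30.0,
                       interval_s: float = 0.5,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll every service's health check until all pass or the timeout expires."""
    deadline = clock() + timeout_s
    pending = set(checks)
    while True:
        # Re-check only the services that have not yet reported healthy
        pending = {name for name in pending if not checks[name]()}
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"services not healthy: {sorted(pending)}")
        sleep(interval_s)

@contextmanager
def compose_services(up: Callable[[], None], down: Callable[[], None],
                     checks: Dict[str, HealthCheck], **wait_kwargs) -> Iterator[None]:
    """Start services, wait until healthy, run the body, and always shut down."""
    up()
    try:
        wait_until_healthy(checks, **wait_kwargs)
        yield
    finally:
        down()
```

Wrapping a test run in `with compose_services(...):` gives the same guarantee the plugin aims for in a build: containers are torn down even when a health check times out or a test fails.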
13

Grinn

graph database and R package for omic data integration

http://kwanjeeraw.github.io/grinn/

Downloads: 0 This Week

Last Update: 2018-07-31
14

Hanalyzer

The Hanalyzer is a tool designed to help biologists explain results observed in genome-scale experiments and to generate new hypotheses. It combines information extraction, semantic data integration, reasoning, and visualization.

Downloads: 0 This Week

Last Update: 2013-04-15
15

Harmony Data Integration

Fast, sensitive and accurate integration of single-cell data

Harmony is a general-purpose R package with an efficient algorithm for integrating multiple datasets. It is especially useful for large single-cell datasets such as single-cell RNA-seq. Harmony has been tested on R versions >= 4. Please consult the DESCRIPTION file for more details on required R packages. Harmony has been tested on Linux, OS X, and Windows platforms.

Downloads: 0 This Week

Last Update: 2023-06-12
16

Hetionet

Hetionet: an integrative network of disease

Hetionet is a hetnet, a network with multiple node and edge (relationship) types, that encodes biology. The hetnet was designed for Project Rephetio, which aims to systematically identify why drugs work and predict new therapies for drugs. The JSON and Neo4j formats contain node and edge properties, including licensing information, which are absent from the TSV and matrix formats; therefore, JSON and Neo4j are the recommended formats. Our hetio Python package reads the JSON format; otherwise, it is a simple yet new format. The Neo4j graph database has an established and thriving ecosystem; however, if you would like to access Hetionet without Neo4j, we suggest the JSON format. The matrix format refers to HetMat archives, which store edge adjacency matrices on disk. Additional usage information is available at the corresponding download locations.

Downloads: 0 This Week

Last Update: 2023-06-12
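
To make the matrix format concrete: in a hetnet each edge has a type, so one adjacency matrix can be stored per edge type, and multiplying the matrices along a sequence of edge types counts the paths that follow that sequence. A minimal pure-Python sketch with hypothetical nodes; it uses raw path counts (Project Rephetio's features use degree-weighted variants) and is not the hetio package's API.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source node, edge type, target node)
Matrix = List[List[int]]

# Hypothetical toy hetnet with three node types and two edge types
node_type: Dict[str, str] = {
    "CompoundA": "Compound", "CompoundB": "Compound",
    "Gene1": "Gene", "Gene2": "Gene",
    "DiseaseX": "Disease",
}
edges: List[Edge] = [
    ("CompoundA", "binds", "Gene1"),
    ("CompoundA", "binds", "Gene2"),
    ("CompoundB", "binds", "Gene2"),
    ("Gene1", "associates", "DiseaseX"),
    ("Gene2", "associates", "DiseaseX"),
]

def nodes_of(kind: str) -> List[str]:
    return sorted(n for n, t in node_type.items() if t == kind)

def adjacency(edge_type: str, src_kind: str, dst_kind: str) -> Matrix:
    """Build the 0/1 adjacency matrix for one edge type (rows: sources, columns: targets)."""
    rows, cols = nodes_of(src_kind), nodes_of(dst_kind)
    r_idx = {n: i for i, n in enumerate(rows)}
    c_idx = {n: j for j, n in enumerate(cols)}
    m = [[0] * len(cols) for _ in rows]
    for src, kind, dst in edges:
        if kind == edge_type and node_type[src] == src_kind and node_type[dst] == dst_kind:
            m[r_idx[src]][c_idx[dst]] = 1
    return m

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Compound -binds-> Gene -associates-> Disease: entry [c][d] counts such paths
path_counts = matmul(adjacency("binds", "Compound", "Gene"),
                     adjacency("associates", "Gene", "Disease"))
```

Here `CompoundA` reaches `DiseaseX` through two genes and `CompoundB` through one, which is the kind of connectivity signal that motivates storing per-type adjacency matrices.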
17

INDUS

INDUS is a project for knowledge acquisition and data integration from heterogeneous, distributed data, particularly from bioinformatics databases.

1 Review

Downloads: 0 This Week

Last Update: 2013-03-07
18

ISBiology

This disease-centric project contributes data integration and analysis tools from the Institute for Systems Biology (ISB). We offer this project to the research community to further our efforts in disease prediction and prevention.

1 Review

Downloads: 0 This Week

Last Update: 2013-10-31
19

Illunus Data Integration

An extension package to Pentaho Data Integration, providing plug-ins. Steps/job entries can be downloaded independently and each comes with source code in the .zip file. All are licensed as LGPL or GPL.

Downloads: 0 This Week

Last Update: 2014-12-15
20

JasperSoft Business Intelligence Suite

The JasperSoft Business Intelligence Suite provides integrated reporting, analysis, and data integration to help make faster, better decisions.
* Integrated or stand-alone
* Analytic and operational data integration
* Embeddable with ERP or CRM

Downloads: 0 This Week

Last Update: 2016-04-21
21

Jitsu

Jitsu is an open-source Segment alternative

Jitsu is a fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days. Installing Jitsu is a matter of selecting your framework and adding a few lines of code to your app. Jitsu is built to be framework-agnostic, so regardless of your stack, we have a solution that will work for your team. Connect a data warehouse (Snowflake, ClickHouse, BigQuery, S3, Redshift, or Postgres) and query your data instantly. Jitsu can either stream data in real time or send it in micro-batches (up to once a minute). Apply any transformation with Jitsu: write JavaScript code right in the UI to do anything with incoming data. The code editor supports code completion, debugging, and much more, so it feels like a full-featured IDE.

Downloads: 0 This Week

Last Update: 2025-08-14
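
The difference between streaming and micro-batching can be shown with a small buffer that flushes when it is full or when a time interval has passed. A minimal sketch, assuming an injectable clock and an optional per-event transform (which loosely mirrors Jitsu's JavaScript transformations); `MicroBatcher` and its parameters are hypothetical, not Jitsu's API.

```python
import time
from typing import Callable, List, Optional

Event = dict

class MicroBatcher:
    """Buffer events and flush them as one batch when full or when the interval has elapsed."""

    def __init__(self, sink: Callable[[List[Event]], None], max_batch: int = 100,
                 interval_s: float = 60.0,
                 transform: Optional[Callable[[Event], Optional[Event]]] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.sink = sink
        self.max_batch = max_batch
        self.interval_s = interval_s
        self.transform = transform or (lambda e: e)
        self.clock = clock
        self.buffer: List[Event] = []
        self.last_flush = clock()

    def add(self, event: Event) -> None:
        event = self.transform(event)
        if event is None:  # a transform may drop an event entirely
            return
        self.buffer.append(event)
        if len(self.buffer) >= self.max_batch or self.clock() - self.last_flush >= self.interval_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.sink(self.buffer)  # hand the full batch to the destination
            self.buffer = []
        self.last_flush = self.clock()

# Usage: collect flushed batches in a list instead of sending them to a warehouse
batches: List[List[Event]] = []
batcher = MicroBatcher(batches.append, max_batch=2)
batcher.add({"event": "page_view"})
batcher.add({"event": "click"})  # second event fills the batch and triggers a flush
```

This sketch only checks the interval when an event arrives; a production batcher would also flush on a timer so that a quiet stream does not hold events indefinitely.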
22

KETL

KETL(tm) is a production-ready ETL platform. The engine is built on an open, multi-threaded, XML-based architecture. KETL is designed to assist in the development and deployment of data integration efforts that require ETL and scheduling.

Downloads: 0 This Week

Last Update: 2015-08-22
23

KubeRay

A toolkit to run Ray applications on Kubernetes

KubeRay is a powerful, open-source Kubernetes operator that simplifies the deployment and management of Ray applications on Kubernetes. It offers several key components. KubeRay core is the official, fully maintained component of KubeRay; it provides three custom resource definitions (RayCluster, RayJob, and RayService) designed to help you run a wide range of workloads with ease.

Downloads: 0 This Week

Last Update: 2026-03-19
24

LD-FusionTool

Data Fusion and Conflict Resolution tool for Linked Data

LD-FusionTool covers the Data Fusion step of the RDF integration process, in which data are merged to produce consistent and clean representations of objects, and conflicts that emerged during data integration are resolved.

Downloads: 0 This Week

Last Update: 2014-10-22
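
Conflict resolution during data fusion can be sketched as grouping statements by subject and predicate and applying a resolution policy to each group. A minimal sketch under stated assumptions: statements are (subject, predicate, object, source) tuples, per-source quality scores are given, and the two policies shown (`best_source` and `vote`) are illustrative choices, not LD-FusionTool's actual resolution functions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, source graph)

def fuse(quads: List[Quad], source_score: Dict[str, float],
         policy: str = "best_source") -> Dict[Tuple[str, str], str]:
    """Resolve conflicting objects for each (subject, predicate) pair."""
    groups: Dict[Tuple[str, str], List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o, g in quads:
        groups[(s, p)].append((o, g))
    fused: Dict[Tuple[str, str], str] = {}
    for key, values in groups.items():
        if policy == "best_source":
            # Keep the object asserted by the highest-scoring source
            fused[key] = max(values, key=lambda v: source_score.get(v[1], 0.0))[0]
        elif policy == "vote":
            # Keep the most frequently asserted object; ties go to the first one seen
            fused[key] = Counter(o for o, _ in values).most_common(1)[0][0]
        else:
            raise ValueError(f"unknown policy: {policy}")
    return fused

# Hypothetical toy statements from three sources that disagree on one value
quads = [
    ("ex:city1", "ex:population", "100", "g:a"),
    ("ex:city1", "ex:population", "90", "g:b"),
    ("ex:city1", "ex:population", "90", "g:c"),
]
scores = {"g:a": 0.9, "g:b": 0.5, "g:c": 0.4}
by_source = fuse(quads, scores, policy="best_source")
by_vote = fuse(quads, scores, policy="vote")
```

The two policies deliberately disagree on this input, which shows why the choice of resolution strategy matters as much as the merge itself.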
25

Legacy Data Integration

Developing a "bridge" to facilitate the transfer of data between various databases (with dissimilar schemas). JDBC and XML would be used.

Downloads: 0 This Week

Last Update: 2013-02-22
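
The proposed bridge can be sketched as exporting rows to a neutral XML document and importing them into a target table whose columns have different names. A minimal sketch, assuming Python's `sqlite3` in-memory databases stand in for JDBC connections and a hypothetical `COLUMN_MAP` defines the schema mapping; table names are interpolated directly, so they must come from trusted configuration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical mapping from source-schema columns to target-schema columns
COLUMN_MAP: Dict[str, str] = {"cust_no": "customer_id", "cust_nm": "full_name"}

def export_to_xml(conn: sqlite3.Connection, table: str) -> str:
    """Dump every row of the source table into a neutral XML document."""
    root = ET.Element("rows")
    cur = conn.execute(f"SELECT * FROM {table}")  # table name must be trusted
    cols = [d[0] for d in cur.description]
    for values in cur:
        row = ET.SubElement(root, "row")
        for col, val in zip(cols, values):
            ET.SubElement(row, col).text = None if val is None else str(val)
    return ET.tostring(root, encoding="unicode")

def import_from_xml(conn: sqlite3.Connection, table: str, xml_text: str,
                    column_map: Dict[str, str]) -> int:
    """Load XML rows into the target table, renaming columns via column_map."""
    count = 0
    for row in ET.fromstring(xml_text).findall("row"):
        record = {column_map[c.tag]: c.text for c in row if c.tag in column_map}
        if not record:
            continue  # nothing mappable in this row
        cols = ", ".join(record)
        marks = ", ".join("?" for _ in record)
        conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(record.values()))
        count += 1
    conn.commit()
    return count

# Usage: two databases whose schemas name the same data differently
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (cust_no INTEGER, cust_nm TEXT)")
src.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE clients (customer_id INTEGER, full_name TEXT)")
copied = import_from_xml(dst, "clients", export_to_xml(src, "customers"), COLUMN_MAP)
```

Keeping the XML format neutral means each database only needs an exporter or importer plus a column mapping, rather than a dedicated converter for every pair of schemas.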
© 2026 Slashdot Media. All Rights Reserved.
