Showing 254 open source projects for "data integration"

  • 1
    Downloads: 0 This Week
  • 2
    Pentaho Data Integration

    Pentaho Data Integration (ETL), a.k.a. Kettle

    Pentaho Data Integration is built with Maven. The project distribution archive is produced under the assemblies module. The other modules include the core implementation, database dialog, user interface, PDI engine, PDI engine extensions, PDI core plugins, and integration tests. Maven 3+ and Java JDK 1.8 are prerequisites. Following the Pentaho checkstyle format (run mvn checkstyle:check and review the report) and writing working unit tests help ensure that pull requests for bug fixes and improvements are processed quickly. ...
    Downloads: 87 This Week
  • 3
    Open Information Integration
    Open Information Integration Tool Suite (OpenII) is used by analysts and programmers to accelerate data integration and harmonization across organizations. OpenII has a neutral schema repository for browsing and comparing all sorts of data models. It is built as an Eclipse Rich Client Platform (RCP) application on top of Eclipse 3.x. Developers need to download Eclipse and install RCP support, the Fatjar plugin, and the Delta Pack for one of the 3.x releases. ...
    Downloads: 0 This Week
  • 4

    JegasCRM

    CRM/CMS/DBMS/WEBSERVER/PROJECTMGT

    The project has already left SourceForge; a GitHub link is available on the JegasCRM/JAS product page at WWW.JEGAS.COM.
    Downloads: 0 This Week
  • 5
    MSCViewer

    A tool for visualization and analysis of logs as sequence diagrams

    MSCViewer is a tool for debugging control flows in concurrent, distributed systems. It loads logs generated by various entities in the system and visualizes their events and interactions as a sequence diagram. The diagram is fully interactive: entities can be added, removed, and rearranged; events can be filtered, searched, highlighted, and annotated with comments. MSCViewer integrates a Python interpreter, which allows writing Python scripts...
    Downloads: 0 This Week
  • 6
    Civi Data Integration

    This is a Pentaho Data Integration plugin for CiviCRM.

    This is a Pentaho Data Integration plugin for CiviCRM. It allows you to take advantage of the power of Pentaho Data Integration tools and use them with your CiviCRM instance.
    Downloads: 0 This Week
  • 7
    eAnalytics

    eAnalytics is an open source web analytics tool

    eAnalytics is a new web analytics system designed especially for companies that require an integrated, in-house web analytics solution. The system meets strict privacy requirements as well as requirements for tight integration with other systems.
    Downloads: 1 This Week
  • 8
    GeoKettle
    GeoKettle is a powerful, metadata-driven spatial ETL (Extract, Transform and Load) tool dedicated to the integration of different data sources for building and updating geospatial databases, data warehouses and services.
    Downloads: 2 This Week
  • 9
    Apache Kafka

    Mirror of Apache Kafka

    ...Exactly-once processing semantics, idempotent producers, and transactions help prevent duplicates across complex dataflows. Kafka Streams and Kafka Connect extend the core: Streams provides a library for stateful stream processing within applications, while Connect standardizes integration with external systems. With horizontal scalability, strong ordering guarantees within partitions, and mature tooling, Kafka serves as the backbone for event-driven architectures across analytics, microservices, and data integration.
    Downloads: 3 This Week
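    The duplicate prevention mentioned above relies in part on idempotent producers: each producer has an ID and numbers its writes per partition, so the broker can drop a retried batch it has already appended. The toy Python model below sketches only that idea. `ToyPartitionLog` and `append` are hypothetical names; this is not Kafka's code or client API, and real brokers track additional state such as producer epochs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartitionLog:
    """Conceptual model of broker-side deduplication for one partition.

    Hypothetical illustration only: the log remembers the last sequence
    number appended for each producer ID and ignores retried duplicates.
    """
    records: list = field(default_factory=list)
    last_seq: dict = field(default_factory=dict)  # producer_id -> last appended sequence

    def append(self, producer_id: int, seq: int, value: str) -> bool:
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            # Already appended (e.g. the producer retried after a lost ack): skip it.
            return False
        if seq != last + 1:
            # A gap means an earlier write is missing; refuse rather than reorder.
            raise ValueError(f"out-of-order sequence {seq}, expected {last + 1}")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return True
```

    Replaying sequence 1 after a simulated retry returns False and leaves `records` as ["a", "b"], so the retry does not create a duplicate.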
  • 10

    PDI Data Vault framework

    Data Vault loading automation using Pentaho Data Integration.

    A metadata-driven 'tool' to automate loading a designed Data Vault. It consists of a set of Pentaho Data Integration and database objects. The virtual machine (VMware) is a 64-bit Ubuntu Server 14.04 with MySQL (Percona Server) and PostgreSQL 9.4 as the database flavours, and PDI version 5.2 CE. NB: The directory version_2.4 contains the most recent virtual machine; its readme.txt describes that VM.
    Downloads: 4 This Week
  • 11
    KETL(tm) is a production-ready ETL platform. The engine is built upon an open, multi-threaded, XML-based architecture. KETL is designed to assist in the development and deployment of data integration efforts that require ETL and scheduling.
    Downloads: 0 This Week
  • 12

    LD-FusionTool

    Data Fusion and Conflict Resolution tool for Linked Data

    LD-FusionTool covers the Data Fusion step in the integration process for RDF, where data are merged to produce consistent and clean representations of objects, and conflicts which emerged during data integration need to be resolved.
    Downloads: 0 This Week
  • 13
    TeleScope

    XML Data Stream Broker/Replicator

    TeleScope is an efficient, high-load XML data stream broker, replicator, and simple event processing (SEP) platform written in C for the Fedora 17-18, Slackware 13-14, and Red Hat Enterprise Linux 6 (RHEL-6) Linux distributions. The platform is intended to operate on single number/word values and is not meant to be deployed for full-text XML stream analysis. TeleScope has an internal query language with a set of standard logical operators that allows users to construct relatively complex...
    Downloads: 0 This Week
  • 14
    CMIS Input plugin for Pentaho

    Allows querying Content Management Systems that use CMIS.

    ...All this is possible within the Pentaho Suite, the open source business intelligence platform, which is useful for the extraction and analysis of structured and semi-structured data. With this goal (the extraction and analysis of data) in mind, the CMIS Input plugin for Pentaho Data Integration (Kettle) was designed and developed to query Content Management Systems that use the CMIS interoperability standard. Once extracted, the data can be stored, analyzed, and presented in customized reports published in various formats for the end user (PDF, Excel, etc.).
    Downloads: 0 This Week
  • 15
    bio2rdf
    The Bio2RDF project aims to transform silos of life science data into a globally distributed network of linked data for biological knowledge discovery. Bio2RDF creates and provides machine-understandable descriptions of biological entities using the RDF/RDFS/OWL Semantic Web languages. Using both syntactic and semantic data integration techniques, Bio2RDF seamlessly integrates diverse biological data and enables powerful new SPARQL-based services across its globally distributed knowledge bases.
    Downloads: 0 This Week
  • 16
    Cascalog

    Data processing on Hadoop without the hassle

    Cascalog is a powerful Clojure (and Java) data processing and querying library built atop Hadoop (via Cascading), providing a high-level, Datalog-inspired abstraction for both big data processing and local computation. Cascalog is hosted at Clojars, and some of its dependencies are hosted at Conjars. Both Clojars and Conjars are Maven repositories that are easy to use with Maven or Leiningen. The Cascalog website contains more information and links to various articles and tutorials. The best way to get...
    Downloads: 0 This Week
  • 17
    giServer

    giServer: the easy-to-use and extensible batch and integration server

    The giServer is an easy-to-use integration server for process automation and event-driven or scheduled execution of batch jobs. Instead of complex XML configuration files, it includes an elaborate GUI for batch job management. Some possible usage scenarios are:
    - Automatic processing of incoming data files
    - Big Data applications
    - Process automation
    - Data mining/aggregation applications
    - Automatic reporting
    - Processing and analysis of database records
    Downloads: 0 This Week
  • 18
    BIRT Report Designer

    Open Source Reporting & Data Visualization Platform

    ...With a flexible Open Data Access framework, developers can write custom data drivers to access data from any source, including Big Data sources like Apache Hadoop, Cassandra, and MongoDB, along with all traditional relational databases, flat files, XML data streams, and data stored in proprietary systems. Built for embedding, BIRT includes APIs for data access, chart generation, output formats, content execution, and integration within larger applications.
    Downloads: 4 This Week
  • 19

    RDF Content Provider for iQser GIN

    Plugin to connect RDF sources with the GIN Server

    GIN Server is a semantic middleware for easy data integration and automated analysis. Its extensible architecture allows data sources, analytics, and event handling to be plugged in. This RDF Content Provider enables access to Semantic Web content as an RDF file or a SPARQL endpoint.
    Downloads: 0 This Week
  • 20

    di-date-dimension-plugin

    Pentaho Data Integration plugin to resolve the date dimension

    Pentaho Data Integration plugin that supplies a function to resolve the date dimension, inserting the date if it doesn't exist yet. It calculates all calendar data; you must supply the information about the table used to store it.
    Downloads: 0 This Week
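    As a rough illustration of what resolving a date dimension involves, the Python sketch below derives calendar attributes for a date and inserts a row only when that date's key is missing. The in-memory dict stands in for the database table; the function names, the YYYYMMDD key, and the chosen columns are assumptions for illustration, not the plugin's actual schema or API.

```python
from datetime import date


def calendar_attributes(d: date) -> dict:
    """Derive common calendar columns for a date-dimension row (illustrative set)."""
    iso_year, iso_week, iso_weekday = d.isocalendar()
    return {
        "date_key": d.year * 10000 + d.month * 100 + d.day,  # YYYYMMDD "smart" key
        "full_date": d.isoformat(),
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "day_of_month": d.day,
        "day_of_week": iso_weekday,  # ISO numbering: 1 = Monday ... 7 = Sunday
        "iso_year": iso_year,
        "iso_week": iso_week,
        "is_weekend": iso_weekday >= 6,
    }


def resolve_date_key(dimension: dict, d: date) -> int:
    """Return the key for d, inserting a new row first if the date is not present yet."""
    key = d.year * 10000 + d.month * 100 + d.day
    if key not in dimension:
        dimension[key] = calendar_attributes(d)
    return key
```

    For example, with `dim = {}`, `resolve_date_key(dim, date(2024, 2, 29))` returns 20240229 and stores a row with quarter 1 and ISO week 9; a second call with the same date leaves the dict unchanged.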
  • 21

    di-history-join-plugin

    Plugin for Pentaho Data Integration that supplies a method to join two tables

    This plugin supplies a method to join two tables using their date-from and date-to history. It uses the two dates that define the life of each record and joins with a query (like the database join plugin) to resolve the combined history of the two entities.
    Downloads: 0 This Week
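    The history join described above amounts to a temporal join: versions with the same key are paired only where their validity periods overlap, and each output row carries the intersected period. The Python sketch below assumes half-open [date_from, date_to) intervals and in-memory lists instead of a database query; `Version` and `history_join` are hypothetical names, not part of the plugin.

```python
from datetime import date
from typing import NamedTuple


class Version(NamedTuple):
    key: str
    date_from: date  # inclusive start of validity
    date_to: date    # exclusive end of validity (half-open interval)
    value: str


def history_join(left: list[Version], right: list[Version]) -> list[tuple]:
    """Temporal inner join: pair same-key versions whose validity periods overlap,
    emitting (key, start, end, left_value, right_value) for the intersection."""
    result = []
    for lv in left:
        for rv in right:
            if lv.key != rv.key:
                continue
            start = max(lv.date_from, rv.date_from)
            end = min(lv.date_to, rv.date_to)
            if start < end:  # keep only non-empty overlaps
                result.append((lv.key, start, end, lv.value, rv.value))
    result.sort()
    return result
```

    The nested loop keeps the sketch short; a database-backed version would push the key match and the overlap condition into the join query itself.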
  • 22
    InfiniDB Community
    InfiniDB is now hosted on github.
    Downloads: 0 This Week
  • 23

    PythonStatsLab

    A collection of useful statistical functions written in Python

    There are functions for numerical integration of the standard normal probability distribution. They can be used to find probabilities, find z-alpha values, calculate confidence intervals, etc. New functionality I plan to add includes hypothesis testing and calculation of p-values, type I and II errors, etc. These functions would come in handy for anyone taking introductory or intermediate-level courses in probability, statistics, and data analysis.
    Downloads: 0 This Week
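    A minimal sketch of the approach described above, assuming composite Simpson's rule as the integration method (the project does not state which one it uses): Φ(z) is computed as 0.5 plus the integral of the standard normal density from 0 to z, z-alpha is found by bisection, and a confidence interval for a mean with known σ is x̄ ± z(α/2)·σ/√n. The function names are hypothetical and do not come from PythonStatsLab.

```python
import math


def std_normal_pdf(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float, n: int = 1000) -> float:
    """Phi(z) = 0.5 + integral of the pdf from 0 to z, via composite Simpson's rule."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    h = z / n  # negative for z < 0, which yields the (negative) integral from 0 to z
    s = std_normal_pdf(0.0) + std_normal_pdf(z)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * std_normal_pdf(i * h)
    return 0.5 + s * h / 3.0


def z_alpha(alpha: float, tol: float = 1e-10) -> float:
    """Critical value z with upper-tail probability P(Z > z) = alpha, by bisection."""
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if 1.0 - std_normal_cdf(mid) > alpha:
            lo = mid  # tail still too large: z must move right
        else:
            hi = mid
    return (lo + hi) / 2.0


def mean_confidence_interval(mean: float, sigma: float, n: int, confidence: float = 0.95):
    """Two-sided interval for a population mean when sigma is known."""
    z = z_alpha((1.0 - confidence) / 2.0)
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin
```

    For example, `z_alpha(0.025)` comes out at about 1.96, and `mean_confidence_interval(100, 15, 36)` gives roughly (95.1, 104.9).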
  • 24
    Framework for text mining, data integration and data analysis. Keywords: ontology and graph alignment, relation mining, warehouse, semantic database integration, bioinformatics, systems biology, microarray, Java.
    Downloads: 0 This Week
  • 25

    Ondex Web

    Web-based visualisation of networks using Java

    Ondex Web is a new web-based implementation of the network visualization and exploration tools from the Ondex data integration platform. New features such as context-sensitive menus and annotation tools give users intuitive ways to explore and manipulate the appearance of heterogeneous biological networks. Ondex Web is open source, written in Java, and can easily be embedded into websites as an applet. It supports loading data from a variety of network formats, such as XGMML, NWB, Pajek, and OXL.
    Downloads: 0 This Week