Showing 35 open source projects for "etl project"

  • 1
    HPCC Systems

    End-to-end big data in a massively scalable supercomputing platform.

    Important: As of April 20, 2026, this project can now be found at https://github.com/hpcc-systems/HPCC-Platform/releases. HPCC Systems® (www.hpccsystems.com) from LexisNexis® Risk Solutions is a proven, open source solution for Big Data insights that can be implemented by businesses of all sizes. With HPCC Systems, developers can design applications with Big Data at their core, enabling businesses to better analyze and understand data at scale, improving business time to results and...
    Downloads: 6 This Week
  • 2
    MentDB Projects

    Generalized Interoperability and Strong AI

    MentDB is an open-source platform driving research into next-generation AI and universal data exchange. Our architecture is built around the revolutionary Mentalese Query Language (MQL). MentDB Weak (Generalized Interoperability): A unified data layer enabling seamless data exchange and application integration (SOA, ETL, Data Quality). We eliminate data silos through a single, generalized data language. MentDB Strong (Strong AI / AGI): The framework for exploring and building Machine...
    Downloads: 0 This Week
  • 3
    The SQOPS project makes it possible to analyze and optimize ETL processes, in particular Talend ETL.
    Downloads: 0 This Week
  • 4
    SQLBucket

    Lightweight library to write, orchestrate and test your SQL ETL

    SQLBucket is a lightweight framework to help write, orchestrate and validate SQL data pipelines. It gives the possibility to set variables and introduces some control flow using the fantastic Jinja2 library. It also implements a very simplistic unit and integration test framework where you can validate the results of your ETL in the form of SQL checks. With SQLBucket, you can apply TDD principles when writing data pipelines. To start working, you need to instantiate your SQLBucket core...
    Downloads: 0 This Week
  • 5
    Automatic Report Generator

    Generate reports from Java applications directly.

    Automatic Report Generator is a mini-ETL API that retrieves data through an SQL query and writes it to a structured file, whether CSV, XLSX, or XML. The API also supports BIRT reports, in which case the corresponding report template is required. The project is available on Maven: https://mvnrepository.com/artifact/net.sf.automatic-report-generator Version 3: https://mvnrepository.com/artifact/net.sf.ennahdi.automatic-report-generator Check out the documentation for both version 2 and version 3: https://sourceforge.net/p/automatic-report-generator/wiki/Home/
    Downloads: 0 This Week
  • 6
    Talend Spatial Module (aka Spatial Data Integrator or SDI) is an ETL tool for geospatial data. Based on Talend Open Studio, it provides input, output and transform geocomponents. IO components read/write GIS formats (e.g. PostGIS, GeoRSS). Transformers all
    Downloads: 4 This Week
  • 7
    cocoNLP

    A Chinese information extraction tool

    ...Instead of requiring a heavy pipeline, it focuses on quick wins such as extracting names, places, organizations, emails, phone numbers, and dates directly from unstructured sentences. The project blends pattern-based methods with NLP heuristics, giving developers dependable results for real-world texts like chats, comments, and user-generated content. Its API is intentionally simple, so you can drop it into scripts, ETL jobs, or dashboards without deep ML expertise. Because it aims at utility over complexity, it’s useful for prototyping data products or building lightweight text analytics where large models would be overkill. ...
    Downloads: 0 This Week
  • 8
    CloverDX

    Design, automate, operate and publish data pipelines at scale

    Please visit www.cloverdx.com for the latest product versions. Data integration platform; can be used to transform/map/manipulate data in batch and near-realtime modes. Supports various input/output formats (CSV, FIXLEN, Excel, XML, JSON, Parquet, Avro, EDI/X12, HL7, COBOL, LOTUS, etc.). Connects to RDBMS/JMS/Kafka/SOAP/REST/LDAP/S3/HTTP/FTP/ZIP/TAR. CloverDX offers 100+ specialized components which can be further extended by creation of "macros" - subgraphs - and libraries, shareable with 3rd...
    Downloads: 36 This Week
  • 9
    apache spark data pipeline osDQ

    osDQ sub-project dedicated to creating Apache Spark based data pipelines using JSON

    This is an offshoot of the open source data quality (osDQ) project https://sourceforge.net/projects/dataquality/ This sub-project creates Apache Spark based data pipelines in which JSON-based metadata (a file) is used to run data processing, data pipeline, data quality, data preparation and data modeling features for big data. It uses the Java API of Apache Spark and can also run in local mode. Get a JSON example at https://github.com/arrahtech/osdq-spark How to...
    Downloads: 0 This Week
  • 10
    The goal of the project is to create specifications and provide a reference parser in Java and C# for the Extensible Term Language.
    Downloads: 0 This Week
  • 11
    Talend Custom Components

    Talend User Components made by cimt Objects AG

    This repository is deprecated and will no longer be maintained. The source code has moved to: https://github.com/jlolling?tab=repositories Sources of the libraries and JET code of the components. This project contains a lot of the custom components I have made for Talend Open Studio. These components can be easily installed in the Talend Studio via the Exchange view or downloaded from the Talend Exchange website: http://exchange.talend.com The best way to get help is using the...
    Downloads: 0 This Week
  • 12

    BI Report for OrangeHRM

    A BI report for OrangeHRM built with open source tools

    Uses open source products such as PostgreSQL, Mondrian, Pentaho BI, and ETL tools to provide OLAP reports from the open source version of OrangeHRM. Send us your feedback on the report.
    Downloads: 0 This Week
  • 13

    Informatica Create ctl

    Automate Informatica control file creation

    Createinfactl is a Java utility that enables Administrators to fully automate Informatica deployments from the command line by creating the deployment group control XML file to be used with the pmrep command “deploydeploymentgroup”. Default settings for the control file can be overridden at the command line, and the utility works with both static and dynamic deployment groups in the repository. Please review the “Using the Deployment Control File” section in the Informatica Command Reference guide for...
    Downloads: 0 This Week
  • 14
    Pentaho Data Integration

    Pentaho Data Integration (ETL), a.k.a. Kettle

    Pentaho Data Integration uses the Maven framework. Project distribution archive is produced under the assemblies module. Core implementation, database dialog, user interface, PDI engine, PDI engine extensions, PDI core plugins, and integration tests. Maven, version 3+, and Java JDK 1.8 are requisites. Use of the Pentaho checkstyle format (via mvn checkstyle:check and reviewing the report) and developing working Unit Tests help to ensure that pull requests for bugs and improvements are...
    Downloads: 98 This Week
  • 15

    TabZilla

    Ad-hoc data replication for Oracle database.

    #FreeUkraine #SaveUkraine #StopRussia #StopPutin #CrimeaIsUkraine #UnitedForUkraine #RussiaInvadedUkraine UI written using wxPython. Allows you to copy tables between Oracle databases using a drag-and-drop interface. Like FileZilla, but for tables, not files.
    Downloads: 0 This Week
  • 16

    BIAutomationTool

    Tool created to aggregate commands to disparate ETL tools

    This project was created to allow executing ETL jobs/tasks from a single command line tool with the same syntax, no matter what tool you were executing in. As long as you have a command line client for the ETL tool, you can configure the BIAutomationTool to use it.
    Downloads: 0 This Week
  • 17

    common-etl

    A common ETL framework utilizing Spring.

    This project is meant to do all the dirty threading work for you. The intention is to use this project as an archetype to provide a framework for writing ETLs. It contains an Extractor Thread, a Loader Thread and a Transformer Thread. All that is needed is to add the necessary business logic for your ETL while not having to worry about making sure your threading is correct.
    Downloads: 0 This Week
  • 18
    coopy
    Diffs, patches, and revision control for CSV files, spreadsheets, and databases.
    Downloads: 0 This Week
  • 19
    Misc scripts and utilities related to Oracle Warehouse Builder ETL (Tcl scripts, OWB Expert, project samples, etc.)
    Downloads: 0 This Week
  • 20
    Simple ETL project that contains an ETL engine with syntactic and semantic validations, a web application used to upload files that are then verified by the ETL engine, and a WPF application to define a new ETL project.
    Downloads: 0 This Week
  • 21
    Apatar Data Integration/ETL
    Apatar is an open source Extract, Transform, and Load (ETL) project. Modular architecture delivers 1. Visual job designer/mapping 2. Connectivity to all major data sources 3. Flexible Deployment Options (GUI, or server engine with JVM, or embedded).
    Downloads: 2 This Week
  • 22
    XIForge is a team of IT volunteers exploring new free and open source technology frameworks and platforms. We focus on Pentaho and OpenBravo ERP. Our currently hosted projects include the Pentaho Data Integration Parse JSON String plugin. The team founder is Reid Lai.
    Downloads: 0 This Week
  • 23
    File Based DBMS & ETL Tool: OpenSQL is a file-based database management system which uses SQL-like features to accept query requests and return query responses. In a later phase of this project, ETL-based features will be added.
    Downloads: 0 This Week
  • 24
    “Genoma Datawarehouse framework version 1.0” consists of a set of interrelated attributes and entities whose purpose is to store data in a corporate data warehouse.
    Downloads: 0 This Week
  • 25
    cobol2j reads or writes COBOL or RPG data files imported from mainframe, AS/400 or Baby/36 environments. Decoding of packed decimal, zoned, and packed date fields is included, as is EBCDIC conversion. ETL ISAM data to any other platform. PC COBOL (ASCII) is supported.
    Downloads: 0 This Week