etl. free download - SourceForge

Showing 28 open source projects for "etl."

Filters: Data Management, Python

1

Ethereum ETL

Python scripts for ETL (extract, transform and load) jobs for Ethereum

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery. Ethereum ETL lets you convert blockchain data into convenient formats like CSVs and relational databases.

Downloads: 6 This Week

Last Update: 2024-04-11
See Project
2

AWS Data Wrangler

Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

...Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON, and Excel). Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions for common ETL tasks such as loading and unloading data from data lakes, data warehouses, and databases. Convert column names to be compatible with Amazon Athena and the AWS Glue Catalog. Run a query against AWS CloudWatchLogs Insights and convert the results to a Pandas DataFrame. Get a QuickSight dashboard ID given a name, failing if more than one ID is associated with that name. ...

Downloads: 15 This Week

Last Update: 2026-05-07
See Project
3

NVIDIA Merlin

Library providing end-to-end GPU-accelerated recommender systems

...Each stage of the Merlin pipeline is optimized to support hundreds of terabytes of data, which is all accessible through easy-to-use APIs. For more information, see NVIDIA Merlin on the NVIDIA developer website. Transform data (ETL) for preprocessing and engineering features. Accelerate your existing training pipelines in TensorFlow, PyTorch, or FastAI by leveraging optimized, custom-built data loaders. Scale large deep learning recommender models by distributing large embedding tables that exceed available GPU and CPU memory. Deploy data transformations and trained models to production with only a few lines of code.

Downloads: 4 This Week

Last Update: 2024-06-14
See Project
4

Pathway

Python ETL framework for stream processing, real-time analytics, LLM

Pathway is an open-source framework designed for building real-time data applications using reactive and declarative paradigms. It enables seamless integration of live data streams and structured data into analytical pipelines with minimal latency. Pathway is especially well-suited for scenarios like financial analytics, IoT, fraud detection, and logistics, where high-velocity and continuously changing data is the norm. Unlike traditional batch processing frameworks, Pathway continuously...

Downloads: 11 This Week

Last Update: 2026-05-25
See Project
5

Pyper

Concurrent Python made simple

Pyper is a Python-native orchestration and scheduling framework designed for modern data workflows, machine learning pipelines, and any task that benefits from a lightweight DAG-based execution engine. Unlike heavier platforms like Airflow, Pyper aims to remain lean, modular, and developer-friendly, embracing Pythonic conventions and minimizing boilerplate. It focuses on local development ergonomics and seamless transition to production environments, making it ideal for small teams and...

Downloads: 2 This Week

Last Update: 2025-04-08
See Project
6

AWS SDK for pandas

Easy integration with Athena, Glue, Redshift, Timestream, Neptune

...The library abstracts efficient patterns like partitioning, compression, and vectorized I/O so you get performant data lake operations without hand-rolling boilerplate. It also supports Redshift, OpenSearch, and other services, enabling ETL tasks that blend SQL engines and Python transformations. Operational helpers handle IAM, sessions, and concurrency while exposing knobs for encryption, versioning, and catalog consistency. The result is a productive workflow that keeps your analytics in Python while leveraging AWS-native storage and query engines at scale.

Downloads: 11 This Week

Last Update: 2026-05-07
See Project
7

Datapipe

Real-time, incremental ETL library for ML with record-level depend

Datapipe is a real-time, incremental ETL library for Python with record-level dependency tracking. Datapipe is designed to streamline the creation of data processing pipelines. It excels in scenarios where data is continuously changing, requiring pipelines to adapt and process only the modified data efficiently. This library tracks dependencies for each record in the pipeline, ensuring minimal and efficient data processing.
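The idea of processing only modified records can be illustrated with a small standard-library sketch. This is not Datapipe's API: `record_hash`, `incremental_run`, and the dictionary-based state are hypothetical names chosen for illustration, and change detection by content hash is an assumption about one simple way to track records.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Return a stable content hash for one record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def incremental_run(source: dict, state: dict, output: dict, transform) -> list:
    """Reprocess only records that changed since the previous run.

    source: record id -> input record
    state:  record id -> hash seen on the previous run (mutated in place)
    output: record id -> transformed record (mutated in place)
    Returns the sorted ids that were processed or deleted in this run.
    """
    touched = []
    for rid, rec in source.items():
        h = record_hash(rec)
        if state.get(rid) != h:  # new or modified record
            output[rid] = transform(rec)
            state[rid] = h
            touched.append(rid)
    for rid in list(state):
        if rid not in source:  # record was removed upstream
            del state[rid]
            output.pop(rid, None)
            touched.append(rid)
    return sorted(touched)


def double(rec: dict) -> dict:
    return {"x": rec["x"] * 2}


state, output = {}, {}
source = {"a": {"x": 1}, "b": {"x": 2}}
first = incremental_run(source, state, output, double)   # both records are new
source["b"] = {"x": 5}
second = incremental_run(source, state, output, double)  # only "b" changed
```

On the second run the unchanged record is skipped, which is the behavior the description refers to as minimal processing.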

3 Reviews

Downloads: 163 This Week

Last Update: 3 days ago
See Project
8

CSVSplitter

CSV Splitter: a tool for splitting CSV files into multiple files based on a specified number of records, preserving data integrity and allowing configuration of charset, separator, and formatting. Ideal for handling large CSV files that need to be broken up for easier handling and processing. Features: CSV splitting: splits the original file into multiple CSV files, with the number of records per file defined by the...
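Splitting by record count can be sketched with Python's standard `csv` module. This is an illustration only, not the project's code: the function name `split_csv`, the output file naming, and the choice to repeat the header row in every part are all assumptions.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_file, delimiter=",", encoding="utf-8"):
    """Split src into parts of at most rows_per_file data rows each.

    The header row is repeated at the top of every part (an assumption).
    Returns the list of written paths.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with src.open(newline="", encoding=encoding) as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return written
        out = writer = None
        try:
            for i, row in enumerate(reader):
                if i % rows_per_file == 0:  # start a new part
                    if out is not None:
                        out.close()
                    part = i // rows_per_file + 1
                    path = out_dir / f"{src.stem}_{part:03d}.csv"
                    out = path.open("w", newline="", encoding=encoding)
                    writer = csv.writer(out, delimiter=delimiter)
                    writer.writerow(header)
                    written.append(path)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return written
```

Reading through `csv.reader` rather than splitting on raw lines keeps quoted fields that contain line breaks inside a single record, which is one way to preserve data integrity across parts.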

Downloads: 1 This Week

Last Update: 2024-10-27
See Project
9

Mara Pipelines

A lightweight opinionated ETL framework, halfway between plain scripts

This package contains a lightweight data transformation framework with a focus on transparency and complexity reduction. Data integration pipelines as code: pipelines, tasks and commands are created using declarative Python code. PostgreSQL as a data processing engine. Extensive web ui. The web browser as the main tool for inspecting, running and debugging pipelines. GNU make semantics. Nodes depend on the completion of upstream nodes. No data dependencies or data flows. No in-app data...
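The "GNU make semantics" mentioned above, where a node runs only after its upstream nodes complete, can be sketched with the standard library's `graphlib`. This is not Mara's API: `run_pipeline`, the `upstreams` mapping, and the node names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(upstreams, commands):
    """Run each node after all of its upstream nodes have completed.

    upstreams: node name -> set of node names it depends on
    commands:  node name -> zero-argument callable to execute
    Returns the order in which the nodes ran.
    """
    order = []
    for node in TopologicalSorter(upstreams).static_order():
        commands[node]()  # an exception here stops all downstream nodes
        order.append(node)
    return order


log = []
upstreams = {"load": {"extract"}, "report": {"load"}}
commands = {
    "extract": lambda: log.append("extract done"),
    "load": lambda: log.append("load done"),
    "report": lambda: log.append("report done"),
}
order = run_pipeline(upstreams, commands)
```

Only completion order is modeled here; no data is passed between nodes, which matches the description's "no data dependencies or data flows".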

Downloads: 0 This Week

Last Update: 2023-12-06
See Project
10

Tributary

Streaming reactive and dataflow graphs in Python

Tributary is a library for constructing dataflow graphs in Python. Unlike many other DAG libraries in Python (airflow, luigi, prefect, dagster, dask, kedro, etc), tributary is not designed with data/etl pipelines or scheduling in mind. Instead, tributary is more similar to libraries like mdf, loman, pyungo, streamz, or pyfunctional, in that it is designed to be used as the implementation for a data model. One such example is the greeks library, which leverages tributary to build data models for options pricing.

Downloads: 0 This Week

Last Update: 2023-06-12
See Project
11

SQLBucket

Lightweight library to write, orchestrate and test your SQL ETL

...It gives the possibility to set variables and introduces some control flow using the fantastic Jinja2 library. It also implements a very simplistic unit and integration test framework where you can validate the results of your ETL in the form of SQL checks. With SQLBucket, you can apply TDD principles when writing data pipelines. To start working, you need to instantiate your SQLBucket core object with the project_folder parameter. That folder will contain all your SQL ETL. The python file where you create your SQLBucket object is also a good place to instantiate your command line interface.

Downloads: 0 This Week

Last Update: 2023-06-12
See Project
12

CSV*Extractor for RDBMS (command line)

Extract table data in CSV format from 14 databases.

Spools data for a given query or table from 14 databases. Windows command-line application.

Downloads: 0 This Week

Last Update: 2014-12-15
See Project
13

Data Migrator for Oracle

Migrate/Copy your data between Oracle database and 13 major DBs.

Command line data Copy/Migration tool for Oracle. Supports Oracle 7.3, Oracle 8i, Oracle 9i, Oracle 10G, Oracle 11G and 13 major databases. 1. Exadata 2. Sybase ASE 3. Informix Innovator C 4. Sybase SQL Anywhere 5. DB2 UDB 6. CSV 7. SQLServer 8. MariaDB 9. Sybase IQ 10. PostgreSQL 11. MySQL 12. Informix IDS 13. TimesTen

Downloads: 0 This Week

Last Update: 2014-12-27
See Project
14

CSV*Loader PRO (Windows command line)

Loads CSV files to 14 databases

Windows command line tool for CSV data load to 14 relational stores.

Downloads: 0 This Week

Last Update: 2014-11-28
See Project
15

CSV*Extractor Pro (Windows command line)

Spool your scalar data in CSV format from 14 major databases.

Command line tool for data export from major relational data stores (RDBMS): DB2 Advanced Enterprise Server, DB2 Advanced Workgroup Server, DB2 Developer Edition, DB2 Enterprise Server, DB2 Express, DB2 Express C, DB2 Workgroup Server, Exadata, Infobright, Informix IDS, Informix Innovator C, MariaDB, MySQL, Oracle, Oracle XE, PostgreSQL, SAP Sybase ASE, SQL Lite, SQL Server Enterprise, SQL Server Express, Sybase IQ, Sybase SQL Anywhere, TimesTen.

Downloads: 0 This Week

Last Update: 2014-11-23
See Project
16

Data Migration Tools for RDBMS

DataMigrator for 14 major databases

Touch and go Windows command line data migration tool for 14 databases: 1. Sybase ASE 2. Informix Innovator C 3. Sybase SQL Anywhere 4. DB2 UDB 5. SQLServer 6. MariaDB 7. Sybase IQ 8. PostgreSQL 9. MySQL 10. Informix IDS 11. TimesTen 12. Oracle 13. SQL Lite 14. Exadata

Downloads: 0 This Week

Last Update: 2014-10-28
See Project
17

SQLServer ->SQLServer Data Migrator

Copy data between your SQLServer instances

Ad-hoc data replication for SQLServer 2005, 2008, 2010 and 2012. The touch-and-go design requires only login info, a query file with your SQL, and a target table name.

Downloads: 0 This Week

Last Update: 2014-09-07
See Project
18

DataMule

Extract-Copy-Load (ECL) tool for 14 databases.

Extract, Copy and Load operations for: 1. Sybase ASE 2. Informix Innovator C 3. Sybase SQL Anywhere 4. DB2 UDB 5. SQLServer 6. MariaDB 7. Sybase IQ 8. PostgreSQL 9. MySQL 10. Informix IDS 11. TimesTen 12. Oracle 13. SQL Lite 14. Exadata. A total of 224 data copy vectors: CSV -> DB, DB -> DB, DB -> CSV.
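One reading of the figure of 224 copy vectors is sketched below. It assumes that same-engine copies (for example Oracle -> Oracle) count as DB -> DB vectors; the listing does not say how the total was derived, so this is only a plausible reconstruction.

```python
from itertools import product

# The 14 databases named in the description.
DATABASES = [
    "Sybase ASE", "Informix Innovator C", "Sybase SQL Anywhere", "DB2 UDB",
    "SQLServer", "MariaDB", "Sybase IQ", "PostgreSQL", "MySQL",
    "Informix IDS", "TimesTen", "Oracle", "SQL Lite", "Exadata",
]

# Assumption: every ordered pair counts, including same-engine copies.
db_to_db = list(product(DATABASES, repeat=2))      # 14 * 14 = 196
csv_to_db = [("CSV", db) for db in DATABASES]      # 14
db_to_csv = [(db, "CSV") for db in DATABASES]      # 14

vectors = db_to_db + csv_to_db + db_to_csv
total = len(vectors)                               # 196 + 14 + 14 = 224
```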

Downloads: 0 This Week

Last Update: 2014-10-28
See Project
19

DataCopy For SQLServer

Data Copy tool for SQL Server and Oracle

Migrate your data from SQLServer to Oracle without creating a single dump file. Input is a SQLServer query file defining the dataset you want to copy to Oracle. The target table has to exist for the copy to go through. Turbo mode offers a 5x copy performance improvement.

Downloads: 0 This Week

Last Update: 2014-09-07
See Project
20

Data Spooler for SQLServer #SaveUkraine

Extracts table or query data from SQL Server 2005, 2008, 2012

#SaveUkraine #StopRussia #FreeUkraine #StopPutin #CrimeaIsUkraine #UnitedForUkraine #RussiaInvadedUkraine Spools/extracts/dumps table or query data from SQL Server 2005, 2008, 2012. Serial spool creates a single dump file. Turbo mode offers a 5x spool performance improvement. Sharded turbo mode creates multiple files.

Downloads: 0 This Week

Last Update: 2014-09-07
See Project
21

Data Spooler For Oracle #SaveUkraine

Simplified turbo spooler for Oracle.

#SaveUkraine #StopRussia #FreeUkraine #StopPutin #CrimeaIsUkraine #UnitedForUkraine #RussiaInvadedUkraine Exports/spools scalar data to disk for a given Oracle table. Turbo mode spools 5x faster.

Downloads: 0 This Week

Last Update: 2014-09-07
See Project
22

TabZilla

Ad-hoc data replication for Oracle database.

#FreeUkraine #SaveUkraine #StopRussia #StopPutin #CrimeaIsUkraine #UnitedForUkraine #RussiaInvadedUkraine UI written using wxPython. Allows you to copy tables between Oracle databases using a drag-n-drop interface. Like FileZilla, but for tables, not files.

Downloads: 0 This Week

Last Update: 2014-07-19
See Project
23

PDF*Merger for Windows

Merge/concatenate PDF files into one PDF file

Merge your PDF files for upload to a reporting engine or other needs. Command line, win32. Written in Python. Compiled with PyInstaller.

Downloads: 0 This Week

Last Update: 2014-07-03
See Project
24

CSV*Loader for Oracle

Simplified CSV turbo loader to Oracle

Tired of writing control files? No problem! CSV*Loader will generate a control file for SQL*Loader. Too slow? No problem! CSV*Loader turbo mode may load data into your Oracle database 10x faster than your good old Perl::DBI script.
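Generating a SQL*Loader control file from a CSV header could look roughly like the sketch below. It is not CSV*Loader's actual output: the function name `make_control_file`, the `APPEND` load mode, and the use of upper-cased header names as column names are assumptions made for illustration.

```python
import csv


def make_control_file(csv_path, table, delimiter=","):
    """Build SQL*Loader control file text from the header row of a CSV file.

    Assumes the first row holds column names that match the target table.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f, delimiter=delimiter))
    columns = ",\n  ".join(col.strip().upper() for col in header)
    return (
        "OPTIONS (SKIP=1)\n"  # skip the header row when loading
        "LOAD DATA\n"
        f"INFILE '{csv_path}'\n"
        "APPEND\n"  # assumption: add rows to an existing table
        f"INTO TABLE {table}\n"
        f"FIELDS TERMINATED BY '{delimiter}' OPTIONALLY ENCLOSED BY '\"'\n"
        "TRAILING NULLCOLS\n"
        f"(\n  {columns}\n)\n"
    )
```

The returned text would be saved to a `.ctl` file and passed to `sqlldr` through its `control=` parameter.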

Downloads: 0 This Week

Last Update: 2017-11-04
See Project
25

COBOL Data Definitions

Parse, analyze and -- most importantly -- use COBOL data definitions. This gives you access to COBOL data from Python programs. Write data analyzers, one-time data conversion utilities and Python programs that are part of COBOL systems. Really.

Downloads: 1 This Week

Last Update: 2013-04-26
See Project