65 projects for "etl." with 1 filter applied:

  • 1
    Embedded Template Library (ETL)

    Embedded Template Library

    C++ is a great language to use for embedded applications and templates are a powerful aspect. The standard library can offer a great deal of well-tested functionality, but there are some parts of the standard library that do not fit well with deterministic behavior and limited resource requirements. These limitations usually preclude the use of dynamically allocated memory and containers with open-ended sizes. What is needed is a template library where the user can declare the size, or...
    Downloads: 0 This Week
  • 2
    Steampipe

    Zero-ETL, infinite possibilities. Live query APIs, code & more

    ...Your cloud is a live database that changes fast. Don't wait on ETL to sync, or rely on old data. Crunch it where it's born, fueling new use cases and swift decisions.
    Downloads: 8 This Week
  • 3
    NVIDIA Merlin

    Library providing end-to-end GPU-accelerated recommender systems

    ...Each stage of the Merlin pipeline is optimized to support hundreds of terabytes of data, which is all accessible through easy-to-use APIs. For more information, see NVIDIA Merlin on the NVIDIA developer website. Transform data (ETL) for preprocessing and engineering features. Accelerate your existing training pipelines in TensorFlow, PyTorch, or FastAI by leveraging optimized, custom-built data loaders. Scale large deep learning recommender models by distributing large embedding tables that exceed available GPU and CPU memory. Deploy data transformations and trained models to production with only a few lines of code.
    Downloads: 5 This Week
  • 4
    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    ...It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources. Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.
    Downloads: 1 This Week
  • 5
    Superduper

    Superduper: Integrate AI models and machine learning workflows

    ...Developers may leverage Superduper by building compositional and declarative objects that outsource the details of deployment, orchestration, versioning, and more to the Superduper engine. This allows developers to completely avoid implementing MLOps, ETL pipelines, model deployment, data migration, and synchronization. Using Superduper is simply "CAPE": Connect to your data, apply arbitrary AI to that data, package and reuse the application on arbitrary data, and execute AI-database queries and predictions on the resulting AI outputs and data.
    Downloads: 5 This Week
  • 6
    AWS SDK for pandas

    Easy integration with Athena, Glue, Redshift, Timestream, Neptune

    ...The library abstracts efficient patterns like partitioning, compression, and vectorized I/O so you get performant data lake operations without hand-rolling boilerplate. It also supports Redshift, OpenSearch, and other services, enabling ETL tasks that blend SQL engines and Python transformations. Operational helpers handle IAM, sessions, and concurrency while exposing knobs for encryption, versioning, and catalog consistency. The result is a productive workflow that keeps your analytics in Python while leveraging AWS-native storage and query engines at scale.
    Downloads: 13 This Week
  • 7
    DocETL

    A system for agentic LLM-powered data processing and ETL

    DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL...
    Downloads: 7 This Week
  • 8
    Trellis AI

    All-in-one AI framework & toolkit for Claude Code & Cursor

    ...Trellis also includes tooling for monitoring, scheduling, and tracing the execution of complex multi-step jobs, helping teams maintain visibility into how work progresses and where bottlenecks emerge. The platform can integrate with external services, databases, and model endpoints, making it suitable for automation, ETL pipelines, AI-driven processes, and business logic orchestration.
    Downloads: 3 This Week
  • 9
    Apache Spark

    A unified analytics engine for large-scale data processing

    ...Spark supports multiple languages—Scala, Java, Python, R—and connects with many storage systems like HDFS, S3, Cassandra, and streaming platforms like Kafka, making it a versatile choice for big data workloads in analytics, ETL, and data science.
    Downloads: 3 This Week
  • 10
    Pentaho

    Pentaho offers a comprehensive data integration and analytics platform.

    Pentaho couples data integration with business analytics in a modern platform to easily access, visualize and explore data that impacts business results. Use it as a full suite or as individual components that are accessible on-premise, in the cloud, or on-the-go (mobile). Pentaho enables IT and developers to access and integrate data from any source and deliver it to your applications all from within an intuitive and easy to use graphical tool. The Pentaho Enterprise Edition Free Trial...
    Downloads: 1,265 This Week
  • 11
    SQL*Plus Commander

    Text-based user interface to query data on Oracle DB in a smart way

    SQL*Plus Commander is a text-based user interface (TUI) and framework for querying data on Oracle DB in a smart way. It consists of a fully customizable script shell for bash and ksh. It executes custom queries or procedures on the database with SQL*Plus for Oracle. Query results can be browsed in a colorful text interface, and data returned by a query can be selected and passed dynamically as parameters to other queries or procedures. It may be useful for people who frequently run a limited...
    Downloads: 25 This Week
  • 12

    GETL

    ETL engine based on Groovy

    ...Data structures tend to change over time or may not be known in advance, and working with them must still be supported; 3. All routine ETL work should be automated wherever possible; 4. Compiling the code on the fly provides speed and leaves headroom for optimization; 5. A sophisticated class hierarchy makes it easy to connect other open source solutions.
    Downloads: 0 This Week
  • 13
    NBi

    NBi is a testing framework (add-on to NUnit)

    NBi is a testing framework (add-on to NUnit) for Business Intelligence. It supports most relational databases (SQL Server, MySQL, PostgreSQL ...) and OLAP platforms (Analysis Services, Mondrian ...), as well as ETL and reporting components (Microsoft technologies). The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to write C# code to specify your tests, nor do you need Visual Studio to compile your test suite. Just create an XML file and let the framework interpret it and run your tests. ...
    Downloads: 0 This Week
  • 14
    Talend Spatial Module (aka Spatial Data Integrator or SDI) is an ETL tool for geospatial data. Based on Talend Open Studio, it provides input, output and transform geocomponents. IO components read/write GIS formats (e.g. PostGIS, GeoRSS). Transformers all
    Downloads: 3 This Week
  • 15
    csv-parser

    Streaming csv parser inspired by binary-csv that aims to be faster

    ...The parser handles standard CSV semantics including quoted fields, variable delimiters, escape sequences, and optional headers; this makes it robust for a variety of CSV dialects you might encounter. Because it works incrementally (row by row), it is well suited for ETL pipelines, data ingestion workflows, CSV-to-database imports, or any context where you need to process or transform large tabular data in Node.js efficiently. Using the .on('data') / .on('end') (or equivalent async patterns), you can accumulate, filter, transform, or stream data further downstream without waiting for the whole file.
    Downloads: 7 This Week
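
The row-by-row pattern described for csv-parser can be illustrated with a minimal sketch. It does not use csv-parser or Node.js; it assumes Python's standard `csv` module as a stand-in, and the helper names `stream_rows` and `large_orders` and the sample columns are hypothetical, chosen only for illustration.

```python
import csv
import io
from typing import Iterable, Iterator


def stream_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per CSV row, using the first row as headers."""
    # csv.DictReader consumes the input lazily, so rows are produced
    # one at a time instead of loading the whole file first.
    yield from csv.DictReader(lines)


def large_orders(rows: Iterable[dict], minimum: float) -> Iterator[dict]:
    """Filter and transform rows as they arrive (hypothetical ETL step)."""
    for row in rows:
        amount = float(row["amount"])
        if amount >= minimum:
            yield {"id": row["id"], "amount": amount}


# In-memory sample standing in for a large file; note the quoted field
# that contains a comma.
sample = io.StringIO('id,amount,note\n1,5.0,"small, quoted"\n2,250.0,big\n')
result = list(large_orders(stream_rows(sample), minimum=100.0))
```

Because both helpers are generators, each row passes through the filter before the next one is read, which is the same idea as handling `data` events without waiting for the whole file.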
  • 16
    cocoNLP

    A Chinese information extraction tool

    ...The project blends pattern-based methods with NLP heuristics, giving developers dependable results for real-world texts like chats, comments, and user-generated content. Its API is intentionally simple, so you can drop it into scripts, ETL jobs, or dashboards without deep ML expertise. Because it aims at utility over complexity, it’s useful for prototyping data products or building lightweight text analytics where large models would be overkill. The repository also includes examples and test snippets to help you understand expected inputs and typical outputs, which shortens the learning curve for newcomers.
    Downloads: 0 This Week
  • 17
    CloverDX

    Design, automate, operate and publish data pipelines at scale

    Please visit www.cloverdx.com for the latest product versions. Data integration platform; can be used to transform/map/manipulate data in batch and near-real-time modes. Supports various input/output formats (CSV, FIXLEN, Excel, XML, JSON, Parquet, Avro, EDI/X12, HL7, COBOL, LOTUS, etc.). Connects to RDBMS/JMS/Kafka/SOAP/REST/LDAP/S3/HTTP/FTP/ZIP/TAR. CloverDX offers 100+ specialized components which can be further extended by creation of "macros" - subgraphs - and libraries, shareable with 3rd...
    Downloads: 2 This Week
  • 18
    apache spark data pipeline osDQ

    osDQ subproject for creating Apache Spark-based data pipelines using JSON

    This is an offshoot of the open source data quality (osDQ) project https://sourceforge.net/projects/dataquality/ This subproject will create Apache Spark-based data pipelines in which JSON-based metadata (a file) is used to run data processing, data pipeline, data quality, data preparation and data modeling features for big data. It uses the Java API of Apache Spark and can also run in local mode. Get a JSON example at https://github.com/arrahtech/osdq-spark How to...
    Downloads: 1 This Week
  • 19
    The goal of the project is to create specifications and provide a reference parser in Java and C# for the Extensible Term Language.
    Downloads: 0 This Week
  • 20
    Better SQL in Java! Offers seamless Java class mapping and a SQL-like domain-specific language implemented for a number of commercial and open-source DBMSs.
    Downloads: 0 This Week
  • 21

    Informatica DBMetadata

    Java utility that reads the metadata from table(s)

    Dbmetadata is a Java utility that reads the metadata from table(s) in a specified database and creates the Informatica XML to import into the repository. I created this utility when we were migrating to a new platform and needed a quick way to create flatfile and relational sources and targets that matched the DDL of the table. I also needed to use shortcuts. If you use the import table list, it will create one XML file with all of the tables and shortcuts (if a shortcut folder is specified)...
    Downloads: 0 This Week
  • 22
    JDONREF

    Free open source geocoder

    JDONREF is a free web service for restructuring, normalization, postal validation and geocoding of French postal addresses. Data are not included, but you can build your own ETL job from OSM sources or affiliates.
    Downloads: 0 This Week
  • 23

    Informatica Create ctl

    Automate Informatica control file creation

    Createinfactl is a Java utility that enables administrators to fully automate Informatica deployments from the command line by creating the deployment group control XML file to be used with the pmrep command “deploydeploymentgroup”. Default settings for the control file can be overridden at the command line, and the utility works with both static and dynamic deployment groups in the repository. Please review the “Using the Deployment Control File” section in the Informatica Command Reference guide for...
    Downloads: 0 This Week
  • 24

    Informatica ExecuteWorkflow

    A utility that uses Informatica Operations API

    A Java utility that uses the Informatica Operations API, allowing parameter inputs, trapping of suspended workflows and the ability to send an email on failure. This utility extends the functionality of the pmcmd startworkflow and starttask commands. If you pass in a parameter file and individual parameters on the command line, a temporary parameter file is created that has the values from the parameter file and appends the individual parameters. The e-mail sent is in HTML format using...
    Downloads: 0 This Week
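
The temporary parameter file behaviour described above might look roughly like the following sketch. It is not the utility's code (the utility itself is written in Java) and does not call any Informatica API; the function name `build_parameter_file`, the `.par` suffix and the `name=value` line format are assumptions made for illustration.

```python
import os
import tempfile


def build_parameter_file(base_path: str, overrides: dict) -> str:
    """Copy base_path into a new temp file, then append name=value lines.

    Hypothetical helper: the line format is an assumption, not taken
    from the utility's documentation.
    """
    with open(base_path, encoding="utf-8") as src:
        base_text = src.read()

    fd, temp_path = tempfile.mkstemp(suffix=".par", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        out.write(base_text)
        # Make sure appended parameters start on their own line.
        if base_text and not base_text.endswith("\n"):
            out.write("\n")
        for name, value in overrides.items():
            out.write(f"{name}={value}\n")
    return temp_path
```

The original parameter file is left untouched; the caller passes the returned path to whatever starts the workflow and deletes the temporary file afterwards.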
  • 25

    pg_bulkload

    PostgreSQL Bulk Data Loader

    pg_bulkload is a high-speed data loading utility for PostgreSQL, designed to load huge amounts of data into a database. You can load data into a table bypassing PostgreSQL shared buffers. pg_bulkload also has some ETL features: input data validation and data transformation. pg_bulkload was located on pgFoundry but has been moved here temporarily. 2015.07.03: Now we are moving to GitHub: http://github.com/ossc-db/pg_bulkload
    Downloads: 0 This Week