Page 2 | gnu/linux free download

PipeRider

Code review for data in dbt

PipeRider automatically compares your data to highlight differences in impacted downstream dbt models so you can merge your pull requests with confidence. PipeRider can profile your dbt models and report basic data composition, quantiles, histograms, text length, top categories, and more. It can integrate with dbt metrics and present their time-series data in the report. PipeRider generates a static HTML report each time it runs, which can be viewed...

Downloads: 0 This Week

Last Update: 2023-11-22

DataGym.ai

Open source annotation and labeling tool for image and video assets

DATAGYM enables data scientists and machine learning experts to label images up to 10x faster. AI-assisted annotation tools reduce manual labeling effort, give you more time to fine-tune ML models, and speed up your time to market for new products. Accelerate your computer vision projects by cutting data preparation time by up to 50%. A machine learning model is only as good as its training data. DATAGYM is an end-to-end workbench to create, annotate, manage, and export the right training data...

Downloads: 0 This Week

Last Update: 2023-06-01

Tributary

Streaming reactive and dataflow graphs in Python

Tributary is a library for constructing dataflow graphs in Python. Unlike many other DAG libraries in Python (airflow, luigi, prefect, dagster, dask, kedro, etc), tributary is not designed with data/etl pipelines or scheduling in mind. Instead, tributary is more similar to libraries like mdf, loman, pyungo, streamz, or pyfunctional, in that it is designed to be used as the implementation for a data model. One such example is the greeks library, which leverages tributary to build data models...

Downloads: 0 This Week

Last Update: 2023-06-12

Orchest

Build data pipelines, the easy way

Code, run, and monitor your data pipelines all from your browser! From idea to scheduled pipeline in hours, not days. Interactively build your data science pipelines in our visual pipeline editor; pipelines are versioned as a JSON file. Run scripts or Jupyter notebooks as steps in a pipeline. Python, R, Julia, JavaScript, and Bash are supported. Parameterize your pipelines and run them periodically on a cron schedule. Easily install language or system packages. Built on top of regular Docker container...

Downloads: 0 This Week

Last Update: 2023-04-03

BitSail

BitSail is a distributed high-performance data integration engine

BitSail is ByteDance's open source data integration engine, built on a distributed architecture for high performance. It supports data synchronization between multiple heterogeneous data sources and provides global data integration solutions in batch, streaming, and incremental scenarios. At present, it serves almost all business lines in ByteDance, such as Douyin and Toutiao, and synchronizes hundreds of trillions of data items every day. BitSail has been widely used and...

Downloads: 0 This Week

Last Update: 2023-06-12

CueLake

Use SQL to build ELT pipelines on a data lakehouse

With CueLake, you can use SQL to build ELT (Extract, Load, Transform) pipelines on a data lakehouse. You write Spark SQL statements in Zeppelin notebooks. You then schedule these notebooks using workflows (DAGs). To extract and load incremental data, you write simple select statements. CueLake executes these statements against your databases and then merges incremental data into your data lakehouse (powered by Apache Iceberg). To transform data, you write SQL statements to create views and...

Downloads: 0 This Week

Last Update: 2023-06-12

nonechucks

Deal with bad samples in your dataset dynamically

nonechucks is a library that provides wrappers for PyTorch's datasets, samplers and transforms to allow for dropping unwanted or invalid samples dynamically. What if you have a dataset of 1000s of images, out of which a few dozen images are unreadable because the image files are corrupted? Or what if your dataset is a folder full of scanned PDFs that you have to OCRize, and then run a language detector on the resulting text, because you want only the ones that are in English? Or maybe you...

Downloads: 0 This Week

Last Update: 2023-06-12

CloverDX

Design, automate, operate and publish data pipelines at scale

Please visit www.cloverdx.com for the latest product versions. CloverDX is a data integration platform that can be used to transform, map, and manipulate data in batch and near-real-time modes. It supports various input/output formats (CSV, FIXLEN, Excel, XML, JSON, Parquet, Avro, EDI/X12, HL7, COBOL, LOTUS, etc.) and connects to RDBMS/JMS/Kafka/SOAP/REST/LDAP/S3/HTTP/FTP/ZIP/TAR. CloverDX offers 100+ specialized components which can be further extended by creation of "macros" - subgraphs - and libraries, shareable with 3rd...

4 Reviews

Downloads: 0 This Week

Last Update: 2023-05-04

apache spark data pipeline osDQ

An osDQ subproject dedicated to creating Apache Spark-based data pipelines using JSON

This is an offshoot of the open source data quality (osDQ) project (https://sourceforge.net/projects/dataquality/). This subproject creates Apache Spark-based data pipelines in which JSON-based metadata (a file) drives data processing, data pipeline, data quality, data preparation, and data modeling features for big data. It uses the Java API of Apache Spark and can also run in local mode. Get a JSON example at https://github.com/arrahtech/osdq-spark How to...

Downloads: 0 This Week

Last Update: 2019-01-20

MethPipeline

Next-generation data pipeline to statistically call methylated and differentially methylated loci. See the manual in doc/manual.pdf. Please note: calling of (differentially) methylated _positions_ will soon be uploaded.

Downloads: 0 This Week

Last Update: 2017-08-24

CCDLAB

A FITS image data viewer & reducer, and UVIT Data Reduction Pipeline.

CCDLAB is a FITS image data viewer, reducer, and UVIT Data Pipeline. The latest CCDLAB installer can be downloaded here: https://github.com/user29A/CCDLAB/releases. The Visual Studio 2017 project files can be found here: https://github.com/user29A/CCDLAB/. Those may not be the latest code files, as the code is generally updated a few times a week. If you want the latest project files, let me know.

Downloads: 0 This Week

Last Update: 2021-02-13

Apache Kafka

Mirror of Apache Kafka

Apache Kafka is a distributed streaming platform built around durable, partitioned logs called topics, enabling high-throughput, fault-tolerant event pipelines. Producers append records to partitions, brokers replicate them for durability, and consumer groups read them at their own pace while balancing work across instances. The commit/offset model and retention policies support patterns from real-time processing to event sourcing and audit trails. Exactly-once processing semantics,...

Downloads: 1 This Week

Last Update: 2025-09-05

Data Pipeline

A graphical data manipulation and processing system including data import, numerical analysis, and visualisation. The software is written in Java and built upon the NetBeans platform to provide a modular desktop data manipulation application.

Downloads: 0 This Week

Last Update: 2014-07-13

Search Results for "gnu/linux" - Page 2

Showing 38 open source projects for "gnu/linux"
