Showing 545 open source projects for "pipeline"

View related business solutions
  • Our Free Plans just got better! | Auth0 by Okta Icon
    Our Free Plans just got better! | Auth0 by Okta

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your secuirty. Auth0 now, thank yourself later.
    Try free now
  • Bright Data - All in One Platform for Proxies and Web Scraping Icon
    Bright Data - All in One Platform for Proxies and Web Scraping

    Say goodbye to blocks, restrictions, and CAPTCHAs

    Bright Data offers the highest quality proxies with automated session management, IP rotation, and advanced web unlocking technology. Enjoy reliable, fast performance with easy integration, a user-friendly dashboard, and enterprise-grade scaling. Powered by ethically-sourced residential IPs for seamless web scraping.
    Get Started
  • 1
    Middleware

    Middleware

    Open-source DORA metrics platform for engineering teams

    Bring more visibility to your engineering pipeline, get the right data & actionable insights to unclog bottlenecks, ensuring smooth software delivery.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    PyTextRank

    PyTextRank

    Python implementation of TextRank algorithms

    PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, for graph-based natural language work -- and related knowledge graph practices.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    TPOT

    TPOT

    A Python Automated Machine Learning tool that optimizes ML

    Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    whylogs

    whylogs

    The open standard for data logging

    whylogs is an open-source library for logging any kind of data. With whylogs, users are able to generate summaries of their datasets (called whylogs profiles) which they can use to track changes in their dataset Create data constraints to know whether their data looks the way it should. Quickly visualize key summary statistics about their datasets. whylogs profiles are the core of the whylogs library. They capture key statistical properties of data, such as the distribution (far beyond...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Cloudflare secures and ensures the reliability of your external-facing resources such as websites, APIs, and applications. Icon
    Cloudflare secures and ensures the reliability of your external-facing resources such as websites, APIs, and applications.

    Cloudflare is the foundation for your infrastructure, applications, and teams.

    It protects your internal resources such as behind-the-firewall applications, teams, and devices.
    Get Started
  • 5
    DataGym.ai

    DataGym.ai

    Open source annotation and labeling tool for image and video assets

    DATAGYM enables data scientists and machine learning experts to label images up to 10x faster. AI-assisted annotation tools reduce manual labeling effort, give you more time to finetune ML models and speed up your go to market of new products. Accelerate your computer vision projects by cutting down data preparation time up to 50%. A machine learning model is only as good as its training data. DATAGYM is an end-to-end workbench to create, annotate, manage, and export the right training data...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    rudderstack

    rudderstack

    Privacy and Security focused Segment-alternative, in Golang

    Quickly deploy flexible, powerful customer data pipelines, then send the data to your entire stack—without the engineering headache. Our complete toolset makes it easy to level-up your customer data stack. Spare your data engineers the headache. Our 180+ integrations, along with custom webhook sources and destinations, save data teams hundred of hours. Say goodbye to different versions of the truth. Our SDKs track anonymous and known users at the source and reconcile users in your warehouse...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    GitVersion

    GitVersion

    From git log to SemVer in no time

    ... pipeline with TeamCity, AppVeyor, Jenkins or any of the other supported build servers. GitVersion is a tool that generates a Semantic Version number based on your Git history. The version number generated from GitVersion can then be used for various different purposes. GitVersion can be used in a Continuous Server pipeline to generate a version number that both labels the build itself and makes the different version variables available to the rest of the build pipeline.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Drone

    Drone

    Drone is a Container-Native, Continuous Delivery Platform

    Drone is a self-service Continuous Integration platform for busy development teams. Pipelines are configured with a simple, easy‑to‑read file that you commit to your git repository. Each Pipeline step is executed inside an isolated Docker container that is automatically downloaded at runtime. Drone integrates seamlessly with multiple source code management systems, including GitHub, GitHubEnterprise, Bitbucket, and GitLab. Drone natively supports multiple operating systems and architectures...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Luigi

    Luigi

    Python module that helps you build complex pipelines of batch jobs

    Luigi is a Python (3.6, 3.7, 3.8, 3.9 tested) package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more. The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes. You want to chain many tasks, automate them, and failures will happen. These tasks can be anything, but are typically long running things like Hadoop...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Field Service Management Software | BlueFolder Icon
    Field Service Management Software | BlueFolder

    Maximize technician productivity with intuitive field service software

    Track all your service data in one easy-to-use system, enabling your team to move faster and generate more revenue for your bottom line.
    Learn More
  • 10
    F·W·K

    F·W·K

    3D game engine/framework in C, with Luajit and Python bindings now

    3D game framework in C, with Luajit and Python bindings now.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    TikZ

    TikZ

    TikZ figures for concepts in physics/chemistry/ML

    Collection of 111 standalone TikZ figures for illustrating concepts in physics, chemistry, and machine learning. Check out janosh.github.io to search, sort, open in Overleaf, and download figures (PDF/SVG/PNG) from this collection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    DecisionTree.jl

    DecisionTree.jl

    Julia implementation of Decision Tree (CART) Random Forest algorithm

    Julia implementation of Decision Tree (CART) and Random Forest algorithms.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    testkube

    testkube

    Kubernetes-native testing framework for test execution

    Welcome to Testkube - Your friendly cloud-native testing framework for Kubernetes. Testkube natively integrates test orchestration and execution into Kubernetes and your CI/CD/GitOps pipeline. It decouples test artifacts and execution from CI/CD tooling; tests are meant to be part of your cluster's state and can be executed as needed. Out-of-the-box integrations with all popular testing tools and CI/CD systems mean no custom scripts are required to orchestrate your tests from any CI/CD/GitOps...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    InteractiveViz.jl

    InteractiveViz.jl

    Interactive visualization tools for Julia

    Julia already has a rich set of plotting tools in the form of the Plots and Makie ecosystems, and various backends for these. So why another plotting package? InteractiveViz is not a replacement for Plots or Makie, but rather a graphics pipeline system developed on top of Makie. It has a few objectives. To provide a simple API to visualize large or possibly infinite datasets (tens of millions of data points) easily. To enable interactivity, and be responsive even with large amounts of data...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Population Shift Monitoring

    Population Shift Monitoring

    Monitor the stability of a Pandas or Spark dataframe

    ... features. popmon can automatically flag and alert on changes observed over time, such as trends, shifts, peaks, outliers, anomalies, changing correlations, etc, using monitoring business rules. Advanced users can leverage popmon's modular data pipeline to customize their workflow. Visualization of the pipeline can be useful when debugging or for didactic purposes. There is a script included with the package that you can use.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    gusty

    gusty

    Making DAG construction easier

    gusty allows you to control your Airflow DAGs, Task Groups, and Tasks with greater ease. gusty manages collections of tasks, represented as any number of YAML, Python, SQL, Jupyter Notebook, or R Markdown files. A directory of task files is instantly rendered into a DAG by passing a file path to gusty's create_dag function. gusty also manages dependencies (within one DAG) and external dependencies (dependencies on tasks in other DAGs) for each task file you define. All you have to do is...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Conduit

    Conduit

    Conduit streams data between data stores. Kafka Connect replacement

    Conduit is a data streaming tool written in Go. It aims to provide the best user experience for building and running real-time data pipelines. Conduit comes with batteries included, it provides a UI, common connectors, processors and observability data out of the box. Sync data between your production systems using an extensible, event-first experience with minimal dependencies that fit within your existing workflow. Eliminate the multi-step process you go through today. Just download the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    PipeRider

    PipeRider

    Code review for data in dbt

    PipeRider automatically compares your data to highlight the difference in impacted downstream dbt models so you can merge your Pull Requests with confidence. PipeRider can profile your dbt models and obtain information such as basic data composition, quantiles, histograms, text length, top categories, and more. PipeRider can integrate with dbt metrics and present the time-series data of metrics in the report. PipeRider generates a static HTML report each time it runs, which can be viewed...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Covalent workflow

    Covalent workflow

    Pythonic tool for running machine-learning/high performance workflows

    Covalent is a Pythonic workflow tool for computational scientists, AI/ML software engineers, and anyone who needs to run experiments on limited or expensive computing resources including quantum computers, HPC clusters, GPU arrays, and cloud services. Covalent enables a researcher to run computation tasks on an advanced hardware platform – such as a quantum computer or serverless HPC cluster – using a single line of code. Covalent overcomes computational and operational challenges inherent...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Elementary

    Elementary

    Open-source data observability for analytics engineers

    Elementary is an open-source data observability solution for data & analytics engineers. Monitor your dbt project and data in minutes, and be the first to know of data issues. Gain immediate visibility, detect data issues, send actionable alerts, and understand the impact and root cause. Generate a data observability report, host it or share with your team. Monitoring of data quality metrics, freshness, volume and schema changes, including anomaly detection. Elementary data monitors are...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    BitSail

    BitSail

    BitSail is a distributed high-performance data integration engine

    BitSail is ByteDance's open source data integration engine which is based on distributed architecture and provides high performance. It supports data synchronization between multiple heterogeneous data sources, and provides global data integration solutions in batch, streaming, and incremental scenarios. At present, it serves almost all business lines in ByteDance, such as Douyin, Toutiao, etc., and synchronizes hundreds of trillions of data every day. BitSail has been widely used and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    memphis

    memphis

    Next-Generation Event Processing Platform

    Memphis enables building modern queue-based applications that require large volumes of streamed and enriched data, modern protocols, zero ops, up to x9 faster development, up to x46 fewer costs, and significantly lower dev time for data-oriented developers and data engineers. Queues and brokers are a mission-critical component in the modern application architecture and should be highly available and stable as possible. Provide great performance while maintaining efficient resource...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Apache SeaTunnel

    Apache SeaTunnel

    SeaTunnel is a distributed, high-performance data integration platform

    SeaTunnel is a very easy-to-use ultra-high-performance distributed data integration platform that supports real-time synchronization of massive data. It can synchronize tens of billions of data stably and efficiently every day, and has been used in the production of nearly 100 companies. There are hundreds of commonly-used data sources of which versions are incompatible. With the emergence of new technologies, more data sources are appearing. It is difficult for users to find a tool that can...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    hindent

    hindent

    Haskell pretty printer

    Haskell pretty printer. hindent is used in a pipeline style. The default indentation size is 2 spaces. Create a hindent.yaml file in your project directory or in your ~/ home directory. By default, hindent preserves the newline or lack of newline in your input. With force-trailing-newline, it will make sure there is always a trailing newline. hindent can be forced to insert a newline before specific operators and tokens with line-breaks. This is especially useful when utilizing libraries like...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    lakeFS

    lakeFS

    lakeFS - Git-like capabilities for your object storage

    Increase data quality and reduce the painful cost of errors. Data engineering best practices using git-like operations on data. lakeFS is an open-source data version control for data lakes. It enables zero-copy Dev / Test isolated environments, continuous quality validation, atomic rollback on bad data, reproducibility, and more. Data is dynamic, it changes over time. Dealing with that without a data version control system is error-prone and labor-intensive. With lakeFS, your data lake is...
    Downloads: 0 This Week
    Last Update:
    See Project