Showing 65 open source projects for "apache"

  • 1
    airda

    airda (Air Data Agent)

    airda (Air Data Agent) is a multi-agent system for data analysis. It can understand data development and data analysis requirements, understand the data itself, and generate SQL and Python code for data queries, data visualization, machine learning, and other tasks.
    Downloads: 0 This Week
  • 2
    Mage.ai

    Build, run, and manage data pipelines for integrating data

    Open-source data pipeline tool for transforming and integrating data. The modern replacement for Airflow. Effortlessly integrate and synchronize data from 3rd party sources. Build real-time and batch pipelines to transform data using Python, SQL, and R. Run, monitor, and orchestrate thousands of pipelines without losing sleep. Have you met anyone who said they loved developing in Airflow? That’s why we designed an easy developer experience that you’ll enjoy. Each step in your pipeline is a...
    Downloads: 0 This Week
  • 3
    NannyML

    Detecting silent model failure. NannyML estimates performance

    NannyML is an open-source Python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface and interactive visualizations, is completely model-agnostic, and currently supports all tabular classification use cases. NannyML closes the loop with performance monitoring and post-deployment data...
    Downloads: 0 This Week
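    Drift detection of this kind typically compares a reference sample of a feature with a newer analysis sample. As a minimal sketch (not NannyML's API), the two-sample Kolmogorov-Smirnov statistic below measures the largest gap between the two empirical distributions; the function names and the 0.1 threshold are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(reference: list, analysis: list) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs."""
    ref = sorted(reference)
    ana = sorted(analysis)
    max_gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs at an observed value.
    for x in ref + ana:
        f_ref = bisect_right(ref, x) / len(ref)  # share of reference values <= x
        f_ana = bisect_right(ana, x) / len(ana)  # share of analysis values <= x
        max_gap = max(max_gap, abs(f_ref - f_ana))
    return max_gap


def drifted(reference: list, analysis: list, threshold: float = 0.1) -> bool:
    """Flag drift when the statistic exceeds a fixed, hypothetical threshold."""
    return ks_statistic(reference, analysis) > threshold
```

    A fixed threshold is the simplest possible alert rule; a statistical test or a threshold calibrated on reference data would be a more careful choice.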
  • 4
    AutoGluon

    AutoGluon: AutoML for Image, Text, and Tabular Data

    AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data. Intended for both ML beginners and experts, AutoGluon enables you to quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code. Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge. Leverage automatic hyperparameter tuning,...
    Downloads: 0 This Week
  • 5
    SageMaker Training Toolkit

    Train machine learning models within Docker containers

    Train machine learning models within a Docker container using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. To train a model, you can include your training script and dependencies in a Docker container that runs your training code. A container provides an effectively isolated environment, ensuring a consistent runtime and...
    Downloads: 0 This Week
  • 6
    GMAT

    General Mission Analysis Tool

    The General Mission Analysis Tool (GMAT) is an open-source tool for space mission design and navigation. GMAT is developed by a team of NASA, private industry, and public and private contributors. The GMAT development team is pleased to announce the release of GMAT version R2025a. For a complete list of new features, compatibility changes, and bug fixes, see the R2025a Release Notes in the Users Guide.
    Downloads: 869 This Week
  • 7
    PipeRider

    Code review for data in dbt

    PipeRider automatically compares your data to highlight the difference in impacted downstream dbt models so you can merge your Pull Requests with confidence. PipeRider can profile your dbt models and obtain information such as basic data composition, quantiles, histograms, text length, top categories, and more. PipeRider can integrate with dbt metrics and present the time-series data of metrics in the report. PipeRider generates a static HTML report each time it runs, which can be viewed...
    Downloads: 0 This Week
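    The profile statistics listed above (basic composition, quantiles, text length, top categories) can be illustrated with a small standard-library sketch. It profiles a single in-memory column and is not PipeRider's implementation; the function `profile_column` and its output keys are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def profile_column(values: list) -> dict:
    """Summarize one column: composition, quartiles, text length, and top categories."""
    non_null = [v for v in values if v is not None]
    profile = {
        "total": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        # Three most frequent values with their counts (ties keep first-seen order).
        "top_categories": Counter(non_null).most_common(3),
    }
    numbers = [v for v in non_null if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if len(numbers) >= 2:
        # Cut points for the 25th, 50th, and 75th percentiles.
        profile["quartiles"] = quantiles(numbers, n=4, method="inclusive")
    texts = [v for v in non_null if isinstance(v, str)]
    if texts:
        lengths = [len(t) for t in texts]
        profile["text_length"] = {"min": min(lengths), "max": max(lengths)}
    return profile
```

    Comparing two such profiles (for example, before and after a change to a model) is one simple way to surface differences in downstream data.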
  • 8
    gravitino

    Unified metadata lake for data & AI assets.

    Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets.
    Downloads: 0 This Week
  • 9
    odd-collector

    Open-source metadata collector based on ODD Specification

    ODD Collector is a lightweight service that gathers metadata from all your data sources. The push client is a provider that sends information directly to the central repository of the Platform. ODDRN (Open Data Discovery Resource Name) is a unique resource name that identifies entities such as data sources, data entities, and dataset fields. It is used to build lineage and update metadata.
    Downloads: 0 This Week
  • 10
    SageMaker Inference Toolkit

    Serve machine learning models within a Docker container

    Serve machine learning models within a Docker container using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. Once you have a trained model, you can include it in a Docker container that runs your inference code. A container provides an effectively isolated environment, ensuring a consistent runtime regardless of where the...
    Downloads: 0 This Week
  • 11
    odd-collector-gcp

    Open-source GCP metadata collector based on ODD Specification

    ODD Collector GCP is a lightweight service which gathers metadata from all your Google Cloud Platform data sources.
    Downloads: 0 This Week
  • 12
    text-dedup

    All-in-one text de-duplication

    text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible...
    Downloads: 0 This Week
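    The MinHash idea mentioned above can be shown in a few lines: the probability that two sets share the same minimum hash under a random hash function equals their Jaccard similarity, so the fraction of matching signature positions estimates it. This is a generic, unoptimized sketch, not text-dedup's API; the word-trigram shingling, 128 hash functions, BLAKE2b-based seeding, and 0.8 threshold are illustrative assumptions.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set:
    """Split text into the set of overlapping word k-grams (shingles)."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed: int, item: str) -> int:
    # One member of a seeded hash family: BLAKE2b over "seed:item", truncated to 64 bits.
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: set, num_perm: int = 128) -> list:
    """Keep, for each seeded hash function, the minimum hash over the set."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_perm)]


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of signature positions where the two signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.8) -> bool:
    sig_a = minhash_signature(shingles(text_a))
    sig_b = minhash_signature(shingles(text_b))
    return estimated_jaccard(sig_a, sig_b) >= threshold
```

    At corpus scale, signatures are usually bucketed with locality-sensitive hashing so that only candidate pairs are compared, instead of comparing every pair as this sketch would.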
  • 13
    Tributary

    Streaming reactive and dataflow graphs in Python

    Tributary is a library for constructing dataflow graphs in Python. Unlike many other DAG libraries in Python (airflow, luigi, prefect, dagster, dask, kedro, etc.), tributary is not designed with data/etl pipelines or scheduling in mind. Instead, tributary is more similar to libraries like mdf, loman, pyungo, streamz, or pyfunctional, in that it is designed to be used as the implementation for a data model. One such example is the greeks library, which leverages tributary to build data models...
    Downloads: 0 This Week
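    A dataflow graph used as the implementation of a data model can be sketched with nodes that push changes to their dependents. The classes below are a hypothetical, minimal illustration of that reactive style, not Tributary's API; they recompute eagerly on every upstream change.

```python
from typing import Callable


class Node:
    """A value in a dataflow graph that pushes changes to its dependents."""

    def __init__(self, value=None):
        self._value = value
        self._dependents = []  # Derived nodes that read this node

    @property
    def value(self):
        return self._value

    def set(self, value) -> None:
        self._value = value
        # Eager push: every dependent recomputes immediately.
        for dependent in self._dependents:
            dependent.recompute()


class Derived(Node):
    """A node whose value is a function of its input nodes."""

    def __init__(self, func: Callable, *inputs: Node):
        super().__init__()
        self._func = func
        self._inputs = inputs
        for node in inputs:
            node._dependents.append(self)
        self.recompute()

    def recompute(self) -> None:
        # Read the current input values, recompute, and propagate downstream.
        self.set(self._func(*(node.value for node in self._inputs)))
```

    For example, with `price = Node(100.0)`, `qty = Node(3)` and `notional = Derived(lambda p, q: p * q, price, qty)`, calling `price.set(101.0)` updates `notional.value` to 303.0. In a diamond-shaped graph this eager push can recompute a node more than once per change; a more careful design would recompute nodes in topological order.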
  • 14
    Visdom

    A tool for creating, organizing, and sharing data visualizations

    A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and NumPy. Visdom aims to facilitate visualization of (remote) data with an emphasis on supporting scientific experimentation. Broadcast visualizations of plots, images, and text for yourself and your collaborators. Organize your visualization space programmatically or through the UI to create dashboards for live data, inspect results of experiments, or debug experimental code. Visdom has...
    Downloads: 1 This Week
  • 15
    Bloxs

    Build dashboards in Jupyter Notebook with numeric and chart boxes

    Bloxs is a simple Python package that helps you display information in an attractive way (arranged in blocks). Perfect for building dashboards, reports, and apps in the notebook.
    Downloads: 0 This Week
  • 16
    AWS Step Functions Data Science SDK

    For building machine learning (ML) workflows and pipelines on AWS

    The AWS Step Functions Data Science SDK is an open-source library that allows data scientists to easily create workflows that process and publish machine learning models using Amazon SageMaker and AWS Step Functions. You can create machine learning workflows in Python that orchestrate AWS infrastructure at scale, without having to provision and integrate the AWS services separately. The best way to quickly review how the AWS Step Functions Data Science SDK works is to review the related...
    Downloads: 0 This Week
  • 17
    StreamAlert

    StreamAlert is a serverless, real-time data analysis framework

    StreamAlert is a serverless, real-time data analysis framework that empowers you to ingest, analyze, and alert on data from any environment, using data sources and alerting logic you define. Computer security teams use StreamAlert to scan terabytes of log data every day for incident detection and response. Incoming log data will be classified and processed by the rules engine. Alerts are then sent to one or more outputs. Rules are written in Python; they can utilize any Python libraries or...
    Downloads: 0 This Week
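    The flow described above (classify incoming records, evaluate rules written in Python, send each alert to one or more outputs) is sketched below in plain Python. This is a hypothetical illustration, not StreamAlert's rule API or configuration; the `RulesEngine` class, the `classify` callable, and the output callables are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict


@dataclass
class Alert:
    rule_name: str
    record: Record


@dataclass
class RulesEngine:
    # Maps a raw record to a log type (hypothetical stand-in for classification).
    classify: Callable[[Record], str]
    rules: dict = field(default_factory=dict)    # log type -> list of rule functions
    outputs: list = field(default_factory=list)  # callables that receive each Alert

    def rule(self, log_type: str):
        """Decorator that registers a Python function as a rule for one log type."""
        def register(func: Callable[[Record], bool]):
            self.rules.setdefault(log_type, []).append(func)
            return func
        return register

    def process(self, record: Record) -> list:
        """Classify a record, run the rules for its type, and send any alerts."""
        log_type = self.classify(record)
        alerts = [Alert(check.__name__, record)
                  for check in self.rules.get(log_type, []) if check(record)]
        for alert in alerts:
            for send in self.outputs:  # every alert goes to every configured output
                send(alert)
        return alerts
```

    A rule is then an ordinary function, e.g. `@engine.rule("ssh")` on `def root_login(record): return record.get("user") == "root"`, which keeps rule logic free to use any Python library.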
  • 18
    ML workspace

    All-in-one web-based IDE specialized for machine learning

    All-in-one web-based development environment for machine learning. The ML workspace is an all-in-one web-based IDE specialized for machine learning and data science. It is simple to deploy and gets you productively building ML solutions on your own machines within minutes. This workspace is the ultimate tool for developers, preloaded with a variety of popular data science libraries (e.g., TensorFlow, PyTorch, Keras, Sklearn) and dev tools (e.g., Jupyter, VS Code, TensorBoard)...
    Downloads: 1 This Week
  • 19
    Kale

    Kubeflow’s superfood for Data Scientists

    KALE (Kubeflow Automated pipeLines Engine) is a project that aims to simplify the Data Science experience of deploying Kubeflow Pipelines workflows. Kubeflow is a great platform for orchestrating complex workflows on top of Kubernetes, and Kubeflow Pipelines provides the means to create reusable components that can be executed as part of workflows. The self-service nature of Kubeflow makes it extremely appealing for Data Science use, as it provides easy access to advanced distributed jobs...
    Downloads: 1 This Week
  • 20
    OpenFrames

    Real-time interactive 3D graphics API for scientific simulations

    OpenFrames has moved its primary development repository to GitHub (https://github.com/ravidavi/OpenFrames/wiki), and everything else will follow. OpenFrames is an Application Programming Interface (API) that allows developers to add interactive 3D graphics to any scientific simulation. A simulation developer can use OpenFrames to specify what they want to visualize, without having to know any details of computer graphics programming. OpenFrames is currently...
    Downloads: 0 This Week
  • 21
    Optimus

    Agile Data Preparation Workflows made easy with Pandas

    Easily write code to clean, transform, explore, and visualize data using Python. Process data through a simple API that is easy for newcomers to use. More than 100 functions handle strings and process dates, URLs, and emails. Easily plot data of any size. Out-of-the-box functions help you explore and fix data quality. Use the same code to process your data on your laptop or on a remote cluster of GPUs.
    Downloads: 1 This Week
  • 22
    StellarGraph

    Machine Learning on Graphs

    StellarGraph is a Python library for machine learning on graphs and networks. The StellarGraph library offers state-of-the-art algorithms for graph machine learning, making it easy to discover patterns and answer questions about graph-structured data. It can solve many machine learning tasks. Graph-structured data represent entities as nodes (or vertices) and relationships between them as edges (or links), and can include data associated with either as attributes. For example, a graph can...
    Downloads: 0 This Week
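    The representation described above (entities as nodes, relationships as edges, with attributes on either) can be sketched as a small adjacency-set structure. The one-hop mean aggregation at the end illustrates how graph machine learning combines a node's neighborhood features; it is a generic illustration, not StellarGraph's API, and the class and method names are hypothetical.

```python
from collections import defaultdict


class AttributedGraph:
    """Undirected graph with feature vectors on nodes and attribute dicts on edges."""

    def __init__(self):
        self.node_features = {}            # node -> list of floats
        self.edge_attrs = {}               # frozenset({u, v}) -> attribute dict
        self.adjacency = defaultdict(set)  # node -> set of neighboring nodes

    def add_node(self, node, features):
        self.node_features[node] = list(features)

    def add_edge(self, u, v, **attrs):
        self.adjacency[u].add(v)
        self.adjacency[v].add(u)
        self.edge_attrs[frozenset((u, v))] = attrs

    def neighbors(self, node):
        # .get avoids inserting empty entries for isolated nodes.
        return set(self.adjacency.get(node, ()))

    def mean_neighbor_features(self, node):
        """One-hop mean aggregation: average the feature vectors of a node's neighbors."""
        vectors = [self.node_features[n] for n in self.neighbors(node)]
        if not vectors:
            return [0.0] * len(self.node_features[node])
        return [sum(column) / len(vectors) for column in zip(*vectors)]
```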
  • 23
    SageMaker Containers

    Create SageMaker-compatible Docker containers

    Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. To train a model, you can include your training script and dependencies in a Docker container that runs your training code. A container provides an effectively isolated environment, ensuring a consistent runtime and reliable training process. The SageMaker Training Toolkit can be easily added to...
    Downloads: 0 This Week
  • 24
    Crystalsim - XRD hkl simulation

    X-ray diffraction (XRD) analysis for hkl simulation of any crystal.

    Crystalsim is a simple freeware program with a neat graphical user interface for X-ray diffraction (XRD) data analysis. It can simulate data for all possible {hkl} planes of the selected crystal, and Crystallographic Information Files (.cif) can also be used. It analyzes both powder diffraction and single-crystal data. It is indexed at the International Union of Crystallography (IUCr). Crystalline lattice parameters such as ‘a’, ‘b’, ‘c’ as well as interfacial angles such as alpha, beta,...
    Downloads: 6 This Week
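    For the simplest case of orthogonal axes (alpha = beta = gamma = 90°, which covers cubic, tetragonal, and orthorhombic cells), the interplanar spacing satisfies 1/d² = h²/a² + k²/b² + l²/c², and first-order Bragg's law λ = 2d sin θ gives the diffraction angle. The sketch below enumerates {hkl} planes this way; it is not Crystalsim's code, it ignores systematic absences and intensities, and the Cu Kα1 default wavelength (1.5406 Å) and the index range are assumptions.

```python
import math
from itertools import product


def d_spacing(h: int, k: int, l: int, a: float, b: float, c: float) -> float:
    """Interplanar spacing for an orthogonal cell (alpha = beta = gamma = 90 degrees)."""
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d2)


def simulate_reflections(a: float, b: float, c: float,
                         wavelength: float = 1.5406, max_index: int = 2) -> list:
    """Return (hkl, d, two_theta in degrees) for each plane allowed by Bragg's law, n = 1."""
    reflections = []
    for h, k, l in product(range(-max_index, max_index + 1), repeat=3):
        if (h, k, l) == (0, 0, 0):
            continue
        d = d_spacing(h, k, l, a, b, c)
        s = wavelength / (2 * d)  # sin(theta) from lambda = 2 d sin(theta)
        if s <= 1.0:              # otherwise this plane cannot diffract at this wavelength
            two_theta = 2 * math.degrees(math.asin(s))
            reflections.append(((h, k, l), d, two_theta))
    return sorted(reflections, key=lambda r: r[2])
```

    For a cubic cell with a = 4.0 Å, the {100} planes have d = 4.0 Å and appear first (smallest 2θ), since the largest spacing gives the smallest diffraction angle.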
  • 25
    Wally

    Distributed Stream Processing

    Wally is a fast stream-processing framework. Wally makes it easy to react to data in real time. By eliminating infrastructure complexity, going from prototype to production has never been simpler. When we set out to build Wally, we had several high-level goals in mind. Create a dependable and resilient distributed computing framework. Take care of the complexities of distributed computing "plumbing," allowing developers to focus on their business logic. Provide high-performance & low-latency...
    Downloads: 0 This Week