Showing 31 open source projects for "python distributed list"

  • 1
    Best-of Python

    A ranked list of awesome Python open-source libraries

    This curated list contains 390 awesome open-source projects with a total of 1.4M stars grouped into 28 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from GitHub and different package managers. If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! Ranked list of awesome Python libraries for web development...
    Downloads: 2 This Week
  • 2
    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    ..., separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic). File sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata. Mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint). Comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others).
    Downloads: 6 This Week
  • 3
    XGBoost

    Scalable and Flexible Gradient Boosting

    ... can be used for Python, Java, Scala, R, C++ and more. It can run on a single machine, Hadoop, Spark, Dask, Flink and most other distributed environments, and is capable of solving problems beyond billions of examples.
    Downloads: 4 This Week
  • 4
    Dask

    Parallel computing with task scheduling

    Dask is a Python library for parallel and distributed computing, designed to scale analytics workloads from single machines to large clusters. It integrates with familiar tools like NumPy, Pandas, and scikit-learn while enabling execution across cores or nodes with minimal code changes. Dask excels at handling large datasets that don’t fit into memory and is widely used in data science, machine learning, and big data pipelines.
    Downloads: 3 This Week
  • 5
    Bytewax

    Python Stream Processing

    Bytewax is a Python framework that simplifies event and stream processing. Because Bytewax couples the stream and event processing capabilities of Flink, Spark, and Kafka Streams with the friendly and familiar interface of Python, you can re-use the Python libraries you already know and love. Connect data sources, run stateful transformations, and write to various downstream systems with built-in connectors or existing Python libraries. Bytewax is a Python framework and Rust distributed...
    Downloads: 3 This Week
  • 6
    gusty

    Making DAG construction easier

    gusty allows you to control your Airflow DAGs, Task Groups, and Tasks with greater ease. gusty manages collections of tasks, represented as any number of YAML, Python, SQL, Jupyter Notebook, or R Markdown files. A directory of task files is instantly rendered into a DAG by passing a file path to gusty's create_dag function. gusty also manages dependencies (within one DAG) and external dependencies (dependencies on tasks in other DAGs) for each task file you define. All you have to do is provide...
    Downloads: 3 This Week
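    The dependency handling described for gusty can be pictured as a topological ordering of tasks. The sketch below is only an illustration built on Python's standard `graphlib` module. It is not gusty's API, and the task names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return task names in an order where every task follows its dependencies.

    `dependencies` maps a task name to the set of tasks it depends on.
    Hypothetical helper for illustration; not part of gusty or Airflow.
    """
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical task files and the tasks each one depends on.
    tasks = {
        "extract.sql": set(),
        "transform.py": {"extract.sql"},
        "report.Rmd": {"transform.py"},
    }
    print(execution_order(tasks))
```

    A scheduler such as Airflow does considerably more than this (retries, scheduling intervals, cross-DAG sensors), so the sketch only shows the ordering idea.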
  • 7
    Apache RocketMQ

    Distributed messaging and streaming platform with low latency

    Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability. Messaging patterns include publish/subscribe, request/reply and streaming. Financial-grade transactional messages. Built-in fault tolerance and high availability configuration options based on DLedger. A variety of cross-language clients, such as Java, C/C++, Python, Go. Pluggable transport protocols, such as TCP, SSL, AIO. Built...
    Downloads: 2 This Week
  • 8
    Modin

    Scale your Pandas workflows by changing a single line of code

    Scale your pandas workflow by changing a single line of code. Modin uses Ray, Dask or Unidist to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. It is not necessary to know in advance the available hardware resources in order to use Modin. Additionally, it is not necessary to specify...
    Downloads: 2 This Week
  • 9
    Synapse Machine Learning

    Simple and distributed Machine Learning

    SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, The Cognitive Services, Vowpal Wabbit,...
    Downloads: 2 This Week
  • 10
    Dolphin Scheduler

    A distributed and extensible workflow scheduler platform

    ... definition operations are visualized; visualized process definitions show key information at a glance. One-click deployment. Supports multi-tenancy. Supports many task types, e.g., spark, flink, hive, mr, shell, python, sub_process. Supports custom task types and distributed scheduling; the overall scheduling capability increases linearly with the scale of the cluster.
    Downloads: 1 This Week
  • 11
    Conda.jl

    https://github.com/JuliaPy/Conda.jl

    This package allows one to use conda as a cross-platform binary provider for Julia, for use by other Julia packages, especially to install binaries that have complicated dependencies like Python. conda is a package manager that started as the binary package manager for the Anaconda Python distribution, but it also provides arbitrary packages. Instead of the full Anaconda distribution, Conda.jl uses the miniconda Python environment, which only includes conda and its dependencies.
    Downloads: 1 This Week
  • 12
    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    An AWS Professional Service open-source Python initiative that extends the power of the pandas library to AWS, connecting DataFrames and AWS data-related services. Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON, and EXCEL). Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute usual...
    Downloads: 0 This Week
  • 13
    Mara Pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts

    This package contains a lightweight data transformation framework with a focus on transparency and complexity reduction. Data integration pipelines as code: pipelines, tasks and commands are created using declarative Python code. PostgreSQL as a data processing engine. Extensive web ui. The web browser as the main tool for inspecting, running and debugging pipelines. GNU make semantics. Nodes depend on the completion of upstream nodes. No data dependencies or data flows. No in-app data...
    Downloads: 0 This Week
  • 14
    nb-clean

    Clean Jupyter notebooks of outputs, metadata, and empty cells

    nb-clean cleans Jupyter notebooks of cell execution counts, metadata, outputs, and (optionally) empty cells, preparing them for committing to version control. It provides both a Git filter and pre-commit hook to automatically clean notebooks before they're staged, and can also be used with other version control systems, as a command line tool, and as a Python library. It can determine if a notebook is clean or not, which can be used as a check in your continuous integration pipelines. nb-clean...
    Downloads: 0 This Week
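    To make the kind of cleaning described for nb-clean concrete, here is a minimal sketch that works directly on the notebook JSON structure with the standard library. It is not nb-clean's implementation or API. The `clean_notebook` and `is_clean` helpers are hypothetical, and the sketch assumes that "cleaning" means emptying cell metadata, clearing outputs, and resetting execution counts.

```python
import copy


def clean_notebook(notebook: dict, remove_empty_cells: bool = False) -> dict:
    """Return a cleaned copy of a notebook dict (nbformat 4 layout assumed).

    Hypothetical helper for illustration; not nb-clean's API.
    """
    cleaned = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    kept_cells = []
    for cell in cleaned.get("cells", []):
        # "source" may be a string or a list of strings; join handles both.
        if remove_empty_cells and not "".join(cell.get("source", [])).strip():
            continue
        cell["metadata"] = {}
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
        kept_cells.append(cell)
    cleaned["cells"] = kept_cells
    return cleaned


def is_clean(notebook: dict) -> bool:
    """A notebook counts as clean here if cleaning it would change nothing."""
    return clean_notebook(notebook) == notebook
```

    A check like `is_clean` is the sort of test that could gate a continuous integration job, in the spirit of the description above.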
  • 15
    Siddhi Core Libraries

    Stream Processing and Complex Event Processing Engine

    ... to various endpoints in real time. Agile development experience with SQL-like query language and graphical drag-and-drop editor supporting event simulation. Lightweight runtime that can natively run on Kubernetes, Docker, VM, or bare metal, and embedded in any Java or Python application. Scalable, and highly available distributed event processing on Kubernetes, with NATS Streaming and Siddhi Kubernetes Operator.
    Downloads: 0 This Week
  • 16
    ipyvolume

    3d plotting for Python in the Jupyter notebook

    3d plotting for Python in the Jupyter notebook based on IPython widgets using WebGL. Create quiver plots (like scatter, but with an arrow pointing in a particular direction). Render in the Jupyter notebook, or create a standalone HTML page (or snippet to embed in your page). Render in stereo, for virtual reality with Google Cardboard. Animate in d3 style, for instance, if the x coordinates or color of a scatter plot change. Animations / sequences, all scatter/quiver plot properties can...
    Downloads: 2 This Week
  • 17
    GMAT

    General Mission Analysis Tool

    The General Mission Analysis Tool (GMAT) is an open-source tool for space mission design and navigation. GMAT is developed by a team of NASA, private industry, and public and private contributors. The GMAT development team is pleased to announce the release of GMAT version R2025a. For a complete list of new features, compatibility changes, and bug fixes, see the R2025a Release Notes in the Users Guide.
    Downloads: 1,127 This Week
  • 18
    gravitino

    Unified metadata lake for data & AI assets.

    Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets.
    Downloads: 3 This Week
  • 19
    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 0 This Week
  • 20
    Kale

    Kubeflow’s superfood for Data Scientists

    KALE (Kubeflow Automated pipeLines Engine) is a project that aims at simplifying the Data Science experience of deploying Kubeflow Pipelines workflows. Kubeflow is a great platform for orchestrating complex workflows on top of Kubernetes, and Kubeflow Pipelines provides the means to create reusable components that can be executed as part of workflows. The self-service nature of Kubeflow makes it extremely appealing for Data Science use, as it provides easy access to advanced distributed jobs...
    Downloads: 2 This Week
  • 21
    Wally

    Distributed Stream Processing

    Wally is a fast stream-processing framework. Wally makes it easy to react to data in real time. By eliminating infrastructure complexity, going from prototype to production has never been simpler. When we set out to build Wally, we had several high-level goals in mind. Create a dependable and resilient distributed computing framework. Take care of the complexities of distributed computing "plumbing," allowing developers to focus on their business logic. Provide high-performance & low-latency...
    Downloads: 2 This Week
  • 22
    PyMOL Molecular Graphics System

    PyMOL is an OpenGL based molecular visualization system

    The open-source PyMOL repository has moved to GitHub (https://github.com/schrodinger/pymol-open-source). We still use the pymol-users mailing list here on SourceForge. Please subscribe for community support at https://pymol.org/maillist (note: the SourceForge email newsletter and special offers are optional and can be unchecked). The PyMOL community wiki has its own home at https://pymolwiki.org/
    Downloads: 64 This Week
  • 23
    PyTom

    http://www.sciencedirect.com/science/article/pii/S1047847711003492

    PyTom is a toolbox developed for interpreting cryo-electron tomography data. All steps, from reconstruction and localization to alignment and classification, are covered with standard and improved methods. Please sign up to our mailing list to keep up with the most recent updates and versions.
    Downloads: 1 This Week
  • 24
    paralline

    Big Data tool

    Paralline executes a Python function (or lambda function) or a script over each line of huge text files in parallel processes, and aggregates the results into a list.
    Downloads: 0 This Week
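    The map-over-lines-and-collect pattern described for paralline can be sketched with Python's standard `concurrent.futures` module. This is not paralline's code or API, and the `map_lines` helper is hypothetical. As a simplifying assumption the sketch defaults to a thread pool, which accepts lambdas; for CPU-bound work a `ProcessPoolExecutor` can be passed instead, provided the function is picklable and the call sits under an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor


def map_lines(path, func, workers=4, executor_cls=ThreadPoolExecutor):
    """Apply `func` to every line of the text file at `path`.

    Returns the results as a list, in the same order as the lines.
    Hypothetical helper for illustration; not paralline's API.
    """
    with open(path, encoding="utf-8") as handle:
        with executor_cls(max_workers=workers) as pool:
            # Executor.map yields results in input order, so the list
            # lines up with the file's lines.
            return list(pool.map(func, handle))


if __name__ == "__main__":
    import os
    import tempfile

    # Write a small sample file, then count the characters on each line.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as sample:
        sample.write("alpha\nbeta\ngamma\n")
    print(map_lines(sample.name, lambda line: len(line.rstrip("\n"))))
    os.remove(sample.name)
```

    Note that `Executor.map` submits every line up front, so this sketch holds all lines in memory at once; a tool aimed at truly huge files would need to feed lines in bounded batches.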
  • 25
    ipytracer

    Algorithm Visualizer for IPython/Jupyter Notebook

    Algorithm Visualizer for IPython/Jupyter Notebook. Call display(TracerObject) at the point where you want to see the visualization; no other modification to your code is needed.
    Downloads: 0 This Week