Showing 37 open source projects for "python distributed list"

View related business solutions
  • Simple, Secure Domain Registration Icon
    Simple, Secure Domain Registration

    Get your domain at wholesale price. Cloudflare offers simple, secure registration with no markups, plus free DNS, CDN, and SSL integration.

    Register or renew your domain and pay only what we pay. No markups, hidden fees, or surprise add-ons. Choose from over 400 TLDs (.com, .ai, .dev). Every domain is integrated with Cloudflare's industry-leading DNS, CDN, and free SSL to make your site faster and more secure. Simple, secure, at-cost domain registration.
    Sign up for free
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    Best-of Python

    Best-of Python

    A ranked list of awesome Python open-source libraries

    This curated list contains 390 awesome open-source projects with a total of 1.4M stars grouped into 28 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from GitHub and different package managers. If you like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! Ranked list of awesome python libraries for web development...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    Pandas Profiling

    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    ..., separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrilic). File sizes, creation dates, dimensions, indication of truncated images and existance of EXIF metadata. Mostly global details about the dataset (number of records, number of variables, overall missigness and duplicates, memory footprint). Comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, between others).
    Downloads: 6 This Week
    Last Update:
    See Project
  • 3
    XGBoost

    XGBoost

    Scalable and Flexible Gradient Boosting

    ... can be used for Python, Java, Scala, R, C++ and more. It can run on a single machine, Hadoop, Spark, Dask, Flink and most other distributed environments, and is capable of solving problems beyond billions of examples.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    Dask

    Dask

    Parallel computing with task scheduling

    Dask is a Python library for parallel and distributed computing, designed to scale analytics workloads from single machines to large clusters. It integrates with familiar tools like NumPy, Pandas, and scikit-learn while enabling execution across cores or nodes with minimal code changes. Dask excels at handling large datasets that don’t fit into memory and is widely used in data science, machine learning, and big data pipelines.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    Bytewax

    Bytewax

    Python Stream Processing

    Bytewax is a Python framework that simplifies event and stream processing. Because Bytewax couples the stream and event processing capabilities of Flink, Spark, and Kafka Streams with the friendly and familiar interface of Python, you can re-use the Python libraries you already know and love. Connect data sources, run stateful transformations, and write to various downstream systems with built-in connectors or existing Python libraries. Bytewax is a Python framework and Rust distributed...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    gusty

    gusty

    Making DAG construction easier

    gusty allows you to control your Airflow DAGs, Task Groups, and Tasks with greater ease. gusty manages collections of tasks, represented as any number of YAML, Python, SQL, Jupyter Notebook, or R Markdown files. A directory of task files is instantly rendered into a DAG by passing a file path to gusty's create_dag function. gusty also manages dependencies (within one DAG) and external dependencies (dependencies on tasks in other DAGs) for each task file you define. All you have to do is provide...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    Apache RocketMQ

    Apache RocketMQ

    Distributed messaging and streaming platform with low latency

    Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability. Messaging patterns including publish/subscribe, request/reply and streaming. Financial grade transactional message. Built-in fault tolerance and high availability configuration options base on DLedger. A variety of cross language clients, such as Java, C/C++, Python, Go. Pluggable transport protocols, such as TCP, SSL, AIO. Built...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    Modin

    Modin

    Scale your Pandas workflows by changing a single line of code

    Scale your pandas workflow by changing a single line of code. Modin uses Ray, Dask or Unidist to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. It is not necessary to know in advance the available hardware resources in order to use Modin. Additionally, it is not necessary to specify...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    Synapse Machine Learning

    Synapse Machine Learning

    Simple and distributed Machine Learning

    SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, The Cognitive Services, Vowpal Wabbit,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 10
    Dolphin Scheduler

    Dolphin Scheduler

    A distributed and extensible workflow scheduler platform

    ... definition operations are visualized, Visualization process defines key information at a glance, One-click deployment. Support multi-tenant. Support many task types e.g., spark,flink,hive, mr, shell, python, sub_process. Support custom task types, Distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Conda.jl

    Conda.jl

    https://github.com/JuliaPy/Conda.jl

    This package allows one to use conda as a cross-platform binary provider for Julia for other Julia packages, especially to install binaries that have complicated dependencies like Python. conda is a package manager that started as the binary package manager for the Anaconda Python distribution, but it also provides arbitrary packages. Instead of the full Anaconda distribution, Conda.jl uses the miniconda Python environment, which only includes conda and its dependencies.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    AWS Data Wrangler

    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    An AWS Professional Service open-source python initiative that extends the power of Pandas library to AWS connecting DataFrames and AWS data-related services. Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON, and EXCEL). Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute usual...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Mara Pipelines

    Mara Pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts

    This package contains a lightweight data transformation framework with a focus on transparency and complexity reduction. Data integration pipelines as code: pipelines, tasks and commands are created using declarative Python code. PostgreSQL as a data processing engine. Extensive web ui. The web browser as the main tool for inspecting, running and debugging pipelines. GNU make semantics. Nodes depend on the completion of upstream nodes. No data dependencies or data flows. No in-app data...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    PULSAR

    PULSAR

    Distributed pub-sub messaging system

    Apache Pulsar is a cloud-native, distributed messaging and streaming platform originally created at Yahoo! and now a top-level Apache Software Foundation project. Easy to deploy, lightweight compute process, developer-friendly APIs, no need to run your own stream processing engine. Run in production at Yahoo! scale for over 5 years, with millions of messages per second across millions of topics. Expand capacity seamlessly to hundreds of nodes. Low publish latency (< 5ms) at scale with strong...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    nb-clean

    nb-clean

    Clean Jupyter notebooks of outputs, metadata, and empty cells

    nb-clean cleans Jupyter notebooks of cell execution counts, metadata, outputs, and (optionally) empty cells, preparing them for committing to version control. It provides both a Git filter and pre-commit hook to automatically clean notebooks before they're staged, and can also be used with other version control systems, as a command line tool, and as a Python library. It can determine if a notebook is clean or not, which can be used as a check in your continuous integration pipelines. nb-clean...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Siddhi Core Libraries

    Siddhi Core Libraries

    Stream Processing and Complex Event Processing Engine

    ... to various endpoints in real time. Agile development experience with SQL-like query language and graphical drag-and-drop editor supporting event simulation. Lightweight runtime that can natively run on Kubernetes, Docker, VM, or bare metal, and embedded in any Java or Python application. Scalable, and highly available distributed event processing on Kubernetes, with NATS Streaming and Siddhi Kubernetes Operator.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    ipyvolume

    ipyvolume

    3d plotting for Python in the Jupyter notebook

    3d plotting for Python in the Jupyter notebook based on IPython widgets using WebGL. Create quiver plots (like scatter, but with an arrow pointing in a particular direction). Render in the Jupyter notebook, or create a standalone html page (or snippet to embed in your page). Render in stereo, for virtual reality with Google Cardboard. Animate in d3 style, for instance, if the x coordinates or color of a scatter plots changes. Animations / sequences, all scatter/quiver plot properties can...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18
    GMAT

    GMAT

    General Mission Analysis Tool

    The General Mission Analysis Tool (GMAT) is an open-source tool for space mission design and navigation. GMAT is developed by a team of NASA, private industry, and public and private contributors. The GMAT development team is pleased to announce the release of GMAT version R2025a. For a complete list of new features, compatibility changes, and bug fixes, see the R2025a Release Notes in the Users Guide.
    Leader badge
    Downloads: 1,127 This Week
    Last Update:
    See Project
  • 19
    gravitino

    gravitino

    Unified metadata lake for data & AI assets.

    Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 20
    MerciGest

    MerciGest

    Free Inventory Software

    MerciGest is a free inventory management software. The program allows you to manage thousands of items in the warehouse by showing them in a complete list of loads, unloads, returns and stock. The software also allows you to print transport documents and invoices. The program is available in 4 languages: english, italian, spanish and french In this project there is both the desktop version for Windows and for MS Access. The application written in VBA for MS Access is distributed...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    Raw Image Sorter

    Raw Image Sorter

    Organize RAW image files by matching them to JPEG files.

    This is an easy-to-use executable application for photographers and content creators to organize RAW image files by matching them to JPEG files. How It Works You must have an Excel file anyname.xlsx inside your JPEG folder. This Excel file should contain a list of JPEG file names (e.g., DSC_5805.JPG) in the first column. Sample .xlsx file is available in the git. The app reads the Excel file to know which JPEGs to consider. When you select the JPEG folder and the RAW folder, the app...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Awesome Fraud Detection Research Papers

    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Kale

    Kale

    Kubeflow’s superfood for Data Scientists

    KALE (Kubeflow Automated pipeLines Engine) is a project that aims at simplifying the Data Science experience of deploying Kubeflow Pipelines workflows. Kubeflow is a great platform for orchestrating complex workflows on top Kubernetes and Kubeflow Pipeline provides the mean to create reusable components that can be executed as part of workflows. The self-service nature of Kubeflow make it extremely appealing for Data Science use, at it provides an easy access to advanced distributed jobs...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    Wally

    Wally

    Distributed Stream Processing

    Wally is a fast-stream-processing framework. Wally makes it easy to react to data in real-time. By eliminating infrastructure complexity, going from prototype to production has never been simpler. When we set out to build Wally, we had several high-level goals in mind. Create a dependable and resilient distributed computing framework. Take care of the complexities of distributed computing "plumbing," allowing developers to focus on their business logic. Provide high-performance & low-latency...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    PyMOL Molecular Graphics System

    PyMOL Molecular Graphics System

    PyMOL is an OpenGL based molecular visualization system

    The Open-Source PyMOL repository has been moved to github: https://github.com/schrodinger/pymol-open-source We still use the pymol-users mailing list here on sourceforge. Please subscribe for community support: https://pymol.org/maillist (Note: SourceForge email newsletter and special offers are optional and can be unchecked) The PyMOL community wiki has its own home: https://pymolwiki.org/
    Downloads: 64 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next