Showing 120 open source projects for "parallel"

  • 1
    Medusa

    Framework for Accelerating LLM Generation with Multiple Decoding Heads

    Medusa is a framework aimed at accelerating the generation capabilities of Large Language Models (LLMs) by employing multiple decoding heads. This approach allows for parallel processing during text generation, significantly enhancing throughput and reducing response times. Medusa is designed to be simple to implement and integrates with existing LLM infrastructures, making it a practical solution for scaling LLM applications. A conceptual sketch of the multi-head idea follows this entry.
    Downloads: 0 This Week
    Last Update:
    See Project
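
    To make the mechanism concrete, here is a minimal conceptual sketch in PyTorch (illustrative only, not Medusa's actual code; the sizes and names are invented): each extra head proposes a token for a different future offset from the same hidden state, and the base model then verifies the candidates in one parallel forward pass.

      import torch
      import torch.nn as nn

      hidden_size, vocab_size, num_heads = 768, 50257, 4

      # One lightweight head per future position (Medusa's real heads are
      # small residual MLPs trained on top of a frozen base model).
      medusa_heads = nn.ModuleList(
          nn.Linear(hidden_size, vocab_size) for _ in range(num_heads)
      )

      def propose_tokens(last_hidden):
          # Greedy candidates for positions t+1 .. t+num_heads, one per head.
          return torch.stack(
              [head(last_hidden).argmax(-1) for head in medusa_heads], dim=-1
          )

      last_hidden = torch.randn(1, hidden_size)   # stand-in for the LLM's state
      print(propose_tokens(last_hidden).shape)    # torch.Size([1, 4])
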
  • 2
    ControlNet

    Let us control diffusion models

    ControlNet is a neural network architecture designed to add conditional control to text-to-image diffusion models. Rather than training from scratch, ControlNet “locks” the weights of a pre-trained diffusion model and introduces a parallel trainable branch that learns additional conditions—like edges, depth maps, segmentation, human pose, scribbles, or other guidance signals. This allows the system to control where and how the model should focus during generation, enabling users to steer layout, structure, and content more precisely than prompt text alone. The project includes many trained model variants that accept different types of conditioning (e.g., canny edge input, normal maps, skeletal pose) and produce improved fidelity in stable diffusion outputs. ...
    Downloads: 1 This Week
    Last Update:
    See Project
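
    A minimal sketch of the core trick described above, assuming a PyTorch setting (illustrative, not the project's code): the pre-trained block is frozen, a trainable copy runs in parallel, and a zero-initialized 1x1 convolution joins them, so training starts exactly at the frozen model's behavior.

      import copy
      import torch
      import torch.nn as nn

      frozen_block = nn.Conv2d(4, 4, 3, padding=1)
      for p in frozen_block.parameters():
          p.requires_grad_(False)                   # lock the pre-trained weights

      trainable_copy = copy.deepcopy(frozen_block)  # parallel trainable branch
      zero_conv = nn.Conv2d(4, 4, 1)
      nn.init.zeros_(zero_conv.weight)              # zero-initialized join
      nn.init.zeros_(zero_conv.bias)

      def forward(x, condition):
          # At step 0 the zero conv outputs 0, so this equals frozen_block(x);
          # the real ControlNet feeds the condition through its own encoder,
          # simplified here to an addition.
          return frozen_block(x) + zero_conv(trainable_copy(x + condition))

      x = torch.randn(1, 4, 8, 8)
      cond = torch.randn(1, 4, 8, 8)
      print(forward(x, cond).shape)  # torch.Size([1, 4, 8, 8])
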
  • 3
    Petals

    Run 100B+ language models at home, BitTorrent-style

    ...Run large language models like BLOOM-176B collaboratively — you load a small part of the model, then team up with people serving the other parts to run inference or fine-tuning. Single-batch inference runs at ≈ 1 sec per step (token) — up to 10x faster than offloading, enough for chatbots and other interactive apps. Parallel inference reaches hundreds of tokens/sec. Beyond classic language model APIs — you can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. You get the comforts of an API with the flexibility of PyTorch. You can also host BLOOMZ, a version of BLOOM fine-tuned to follow human instructions in the zero-shot regime — just replace bloom-petals with bloomz-petals. ...
    Downloads: 1 This Week
    Last Update:
    See Project
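
    The swap the Petals description mentions looks roughly like this, following the client snippet the project's README published for the BLOOM era (class and model names may differ in later Petals versions; running it requires the petals package and a network connection):

      from transformers import BloomTokenizerFast
      from petals import DistributedBloomForCausalLM

      MODEL_NAME = "bigscience/bloom-petals"  # or "bigscience/bloomz-petals"
      tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
      model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

      # Only a small part of the model is loaded locally; remote peers
      # serve the remaining transformer blocks.
      inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
      outputs = model.generate(inputs, max_new_tokens=5)
      print(tokenizer.decode(outputs[0]))
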
  • 4
    Evolutionary Computation Framework

    C++ framework for application of any type of evolutionary computation.

    ECF is a framework intended for application of any type of evolutionary computation (GA/GP, DE, Clonalg, ES, PSO, ABC, GAn, local search...). It offers simplicity for the end-user (parameterless usage, tutorial) and customization for experienced EC practitioners.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 5
    GPT-NeoX

    Implementation of model parallel autoregressive transformers on GPUs

    This repository records EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training. For those looking for a TPU-centric codebase, we...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    ElegantRL

    Massively Parallel Deep Reinforcement Learning

    ElegantRL is an efficient and flexible deep reinforcement learning framework designed for researchers and practitioners. It focuses on simplicity, high performance, and supporting advanced RL algorithms.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    TextBox

    A text generation library with pre-trained language models

    ...From a model perspective, we incorporate 47 pre-trained language models/modules covering the categories of general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight models (modules). From a training perspective, we support 4 pre-training objectives and 4 efficient and robust training strategies, such as distributed data parallel and efficient generation. Compared with the previous version of TextBox, this extension mainly focuses on building a unified, flexible, and standardized framework for better supporting PLM-based text generation models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Elephas

    Distributed Deep learning with Keras & Spark

    ...Elephas intends to keep the simplicity and high usability of Keras, thereby allowing for fast prototyping of distributed models, which can be run on massive data sets. Elephas implements a class of data-parallel algorithms on top of Keras, using Spark's RDDs and data frames. Keras models are initialized on the driver, then serialized and shipped to workers, along with data and broadcast model parameters. Spark workers deserialize the model, train their chunk of data, and send their gradients back to the driver. The "master" model on the driver is updated by an optimizer, which applies gradients either synchronously or asynchronously. ...
    Downloads: 2 This Week
    Last Update:
    See Project
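
    A hedged sketch of the Elephas driver-side flow described above, using the API names from the project's README (they may vary across Elephas versions; a running Spark installation and TensorFlow/Keras are assumed):

      import numpy as np
      from pyspark import SparkContext
      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import Dense
      from elephas.spark_model import SparkModel
      from elephas.utils.rdd_utils import to_simple_rdd

      sc = SparkContext(appName="elephas-sketch")

      # The model is built and compiled on the driver, then shipped to workers.
      model = Sequential([Dense(16, activation="relu", input_dim=8),
                          Dense(1, activation="sigmoid")])
      model.compile(optimizer="adam", loss="binary_crossentropy")

      x = np.random.rand(256, 8)
      y = np.random.randint(0, 2, 256)
      rdd = to_simple_rdd(sc, x, y)  # distribute the training data as an RDD

      # Workers train on their partitions; gradients flow back to the driver
      # asynchronously, as described above.
      spark_model = SparkModel(model, mode="asynchronous", frequency="epoch")
      spark_model.fit(rdd, epochs=2, batch_size=32, verbose=0)
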
  • 9
    Fairseq

    Facebook AI Research Sequence-to-Sequence Toolkit written in Python

    Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks. We provide reference implementations of various sequence modeling papers. Recent work by Microsoft and Google has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new FullyShardedDataParallel (FSDP) wrapper provided by fairscale. Fairseq can be extended through user-supplied plug-ins. Models define the neural network architecture and encapsulate all of the learnable parameters. ...
    Downloads: 0 This Week
    Last Update:
    See Project
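
    A minimal sketch of the fairscale FullyShardedDataParallel wrapper mentioned above (simplified; the single-process gloo group exists only to make the snippet self-contained, whereas real use shards across many workers):

      import torch
      import torch.distributed as dist
      import torch.nn as nn
      from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

      # Minimal process group so FSDP can initialize (world_size=1 here).
      dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                              rank=0, world_size=1)

      model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
      sharded = FSDP(model)  # parameters, gradients, and optimizer state
                             # are sharded across data-parallel workers
      optimizer = torch.optim.Adam(sharded.parameters(), lr=1e-4)

      x = torch.randn(8, 512)
      loss = sharded(x).pow(2).mean()
      loss.backward()
      optimizer.step()
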
  • 10
    igel

    Machine learning tool that allows you to train and test models

    A delightful machine learning tool that allows you to train/fit, test, and use models without writing code. The goal of the project is to provide machine learning for everyone, both technical and non-technical users. I sometimes needed a tool that I could use to quickly create a machine learning prototype, whether to build a proof of concept, create a fast draft model to prove a point, or use AutoML. I often found myself stuck writing boilerplate code and thinking too much about...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    GPT Neo

    An implementation of model parallel GPT-2 and GPT-3-style models

    An implementation of model & data parallel GPT3-like models using the mesh-tensorflow library. If you're just here to play with our pre-trained models, we strongly recommend you try out the HuggingFace Transformer integration (a usage sketch follows this entry). Training and inference are officially supported on TPU and should work on GPU as well. This repository will be (mostly) archived as we move focus to our GPU-specific repo, GPT-NeoX.
    Downloads: 10 This Week
    Last Update:
    See Project
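
    The HuggingFace route recommended above is essentially a one-liner; the weights download on first use:

      from transformers import pipeline

      generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
      print(generator("EleutherAI is", max_new_tokens=20)[0]["generated_text"])
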
  • 12
    Parakeet

    PAddle PARAllel text-to-speech toolKIT

    PAddle PARAllel text-to-speech toolKIT (supporting Tacotron2, Transformer TTS, FastSpeech2/FastPitch, SpeedySpeech, WaveFlow and Parallel WaveGAN) Parakeet aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It is built on PaddlePaddle dynamic graph and includes many influential TTS models.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 13
    VITS

    Conditional Variational Autoencoder with Adversarial Learning

    ...Unlike traditional two-stage systems that separately train an acoustic model and a vocoder, VITS trains an end-to-end model that maps text directly to waveform using a conditional variational autoencoder combined with normalizing flows and adversarial training. This architecture enables parallel generation (fast inference) while achieving speech quality that rivals or surpasses many two-stage systems. The repository provides training and inference pipelines for common datasets such as LJ Speech (single-speaker) and VCTK (multi-speaker), including filelists, configs, and preprocessing scripts. It also includes monotonic alignment search code and g2p preprocessing, which are crucial components for aligning text and speech in an end-to-end setup.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Transformer TTS

    Implementation of a Transformer based neural network

    ...This design addresses common autoregressive issues such as repetition, skipped words, and unstable attention, and results in robust, fast synthesis where all frames are predicted in parallel. The repository ships with tooling to build datasets (especially LJSpeech) and create training data, plus scripts to train both the aligner and the TTS model, monitor training with TensorBoard, and resume or reset training runs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    FARM

    Fast & easy transfer learning for NLP

    ...With FARM you can build fast proofs-of-concept for tasks like text classification, NER or question answering and transfer them easily into production. Easy fine-tuning of language models to your task and domain language. AMP optimizers (~35% faster) and parallel preprocessing (16 CPU cores => ~16x faster). Modular design of language models and prediction heads. Switch between heads or combine them for multitask learning. Full Compatibility with HuggingFace Transformers' models and model hub. Smooth upgrading to newer language models. Integration of custom datasets via Processor class. Powerful experiment tracking & execution.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    YOLO ROS

    YOLO ROS: Real-Time Object Detection for ROS

    ...Darknet on the CPU is fast (approximately 1.5 seconds on an Intel Core i7-6700HQ CPU @ 2.60GHz × 8), but it is roughly 500 times faster on a GPU. You will need an Nvidia GPU and a CUDA installation. The CMakeLists.txt file automatically detects whether CUDA is installed. CUDA is a parallel computing platform and application programming interface (API) model created by Nvidia.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    XLM (Cross-lingual Language Model)

    PyTorch original implementation of Cross-lingual Language Model

    XLM (Cross-lingual Language Model) is a family of multilingual pretraining methods that align representations across languages to enable strong zero-shot transfer. It popularized objectives like Masked Language Modeling (MLM) across many languages and Translation Language Modeling (TLM) that jointly trains on parallel sentence pairs to tighten cross-lingual alignment. Using a shared subword vocabulary, XLM learns language-agnostic features that work well for classification and sequence labeling tasks such as XNLI, NER, and POS without target-language supervision. The repository provides preprocessing pipelines, training code, and fine-tuning scripts so you can reproduce benchmark results or adapt models to your own multilingual corpora. ...
    Downloads: 0 This Week
    Last Update:
    See Project
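
    To illustrate the TLM objective described above, here is a simplified construction of one training example (an illustrative Python sketch, not XLM's pipeline; the real implementation also resets position embeddings per language and operates on BPE subwords):

      import random

      def make_tlm_example(src_tokens, tgt_tokens, mask_prob=0.15, mask="[MASK]"):
          # Concatenate a parallel sentence pair so that a masked word in one
          # language can be predicted from context in *both* languages.
          stream = src_tokens + ["</s>"] + tgt_tokens
          inputs, labels = [], []
          for tok in stream:
              if tok != "</s>" and random.random() < mask_prob:
                  inputs.append(mask)
                  labels.append(tok)    # position contributes to the loss
              else:
                  inputs.append(tok)
                  labels.append(None)   # position is ignored by the loss
          return inputs, labels

      src = "the cat sleeps".split()
      tgt = "le chat dort".split()
      print(make_tlm_example(src, tgt))
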
  • 18
    DETR

    End-to-end object detection with transformers

    ...Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. Due to this parallel nature, DETR is very fast and efficient. A minimal usage sketch follows this entry.
    Downloads: 0 This Week
    Last Update:
    See Project
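
    Via torch hub, following the loading pattern in the project's README (model code and weights download on first use, and DETR's dependencies such as torchvision are assumed to be installed):

      import torch

      model = torch.hub.load("facebookresearch/detr", "detr_resnet50",
                             pretrained=True)
      model.eval()

      img = torch.randn(1, 3, 800, 1200)      # dummy normalized image batch
      with torch.no_grad():
          out = model(img)

      # 100 learned object queries each emit one box + class in parallel.
      print(out["pred_logits"].shape)          # e.g. torch.Size([1, 100, 92])
      print(out["pred_boxes"].shape)           # e.g. torch.Size([1, 100, 4])
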
  • 19
    SINGA

    A distributed deep learning platform

    Apache SINGA is an Apache Top Level Project focusing on distributed training of deep learning and machine learning models. Various example deep learning models are provided in the SINGA repo on GitHub and on Google Colab. SINGA supports data-parallel training across multiple GPUs (on a single node or across different nodes). SINGA supports various popular optimizers, including stochastic gradient descent with momentum, Adam, RMSProp, and AdaGrad. SINGA records the computation graph and applies backward propagation automatically after forward propagation. Memory optimization is implemented in the Device class. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    RecNN

    Reinforced Recommendation toolkit built around pytorch 1.7

    This is my school project. It focuses on Reinforcement Learning for personalized news recommendation. The main distinction is that it tries to solve online off-policy learning with dynamically generated item embeddings. I want to create a library with SOTA algorithms for reinforcement learning recommendation, providing whatever level of abstraction you prefer.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    CUDA-JMI

    Tool for feature selection using the JMI metric and multiple GPUs

    CUDA-JMI is a parallel tool to accelerate the feature selection process using Joint Mutual Information as the metric. This tool receives as input a file with an ARFF, CSV, or LIBSVM extension that contains the values of m individuals and n features, and returns a file with the features that provide the most non-redundant information. A plain-Python illustration of the JMI criterion follows this entry.
    Downloads: 0 This Week
    Last Update:
    See Project
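
    The JMI criterion itself is easy to state: a candidate feature is scored by summing the joint mutual information I((candidate, selected); class) over the already-selected features, and the highest-scoring candidate is added greedily. A small numpy/scikit-learn illustration for discrete data (this shows only the metric, not the tool's CUDA implementation):

      import numpy as np
      from sklearn.metrics import mutual_info_score

      def joint_mi(f, s, y):
          # I((f, s); y): encode the feature pair as a single discrete variable.
          pair = f.astype(np.int64) * (int(s.max()) + 1) + s
          return mutual_info_score(pair, y)

      def jmi_select(X, y, k):
          n_feat = X.shape[1]
          # Start from the single feature with the highest MI with the class.
          selected = [int(np.argmax([mutual_info_score(X[:, j], y)
                                     for j in range(n_feat)]))]
          while len(selected) < k:
              scores = [sum(joint_mi(X[:, j], X[:, s], y) for s in selected)
                        if j not in selected else -np.inf
                        for j in range(n_feat)]
              selected.append(int(np.argmax(scores)))
          return selected

      rng = np.random.default_rng(1)
      X = rng.integers(0, 3, size=(500, 10))
      y = (X[:, 0] + X[:, 3]) % 2             # class depends on features 0 and 3
      print(jmi_select(X, y, 3))              # should include 0 and 3
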
  • 22
    MUSE

    A library for Multilingual Unsupervised or Supervised word Embeddings

    MUSE is a framework for learning multilingual word embeddings that live in a shared space, enabling bilingual lexicon induction, cross-lingual retrieval, and zero-shot transfer. It supports both supervised alignment with seed dictionaries and unsupervised alignment that starts without parallel data by using adversarial initialization followed by Procrustes refinement. The code can align pre-trained monolingual embeddings (such as fastText) across dozens of languages and provides standardized evaluation scripts and dictionaries. By mapping languages into a common vector space, MUSE makes it straightforward to build cross-lingual applications where resources are scarce for some languages. ...
    Downloads: 0 This Week
    Last Update:
    See Project
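
    The Procrustes refinement mentioned above has a closed form worth seeing: given seed dictionary pairs stacked as rows of X (source) and Y (target), the best orthogonal map W minimizing ||XW - Y|| is U Vᵀ from the SVD of XᵀY. A small numpy demonstration (illustrative, not MUSE's code):

      import numpy as np

      rng = np.random.default_rng(0)
      d, n = 300, 5000
      X = rng.normal(size=(n, d))                # source embeddings (seed pairs)
      true_W, _ = np.linalg.qr(rng.normal(size=(d, d)))
      Y = X @ true_W                             # target embeddings

      U, _, Vt = np.linalg.svd(X.T @ Y)          # orthogonal Procrustes solution
      W = U @ Vt
      print(np.allclose(X @ W, Y))               # True: the mapping is recovered
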
  • 23
    Scalable Distributed Deep-RL

    A TensorFlow implementation of Scalable Distributed Deep-RL

    ...IMPALA introduced a new paradigm for efficiently training agents across large-scale environments by decoupling acting and learning processes. In this architecture, multiple actor processes interact with their environments in parallel to collect trajectories, which are then asynchronously sent to a centralized learner for policy updates. The learner uses importance weighting to correct for policy lag between actors and the learner, enabling stable off-policy training at scale. This design allows the system to scale efficiently to hundreds of environments and billions of frames while maintaining sample efficiency and stability. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24
    OpenDino

    Open Source Java platform for Optimization, DoE, and Learning.

    ...Implemented modules:
    Evolutionary algorithms: CMA-ES, (1+1)-ES, Differential Evolution
    Deterministic optimization algorithm: SIMPLEX
    Learning: a simple Artificial Neural Net
    Optimization problems: test functions, an interface for executing other programs (solvers), parallel execution of problems, distributed execution of problems via socket connections between computers
    Others: data storage, data analyser and viewer
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Tensorpack

    A Neural Net Training Interface on TensorFlow, with focus on speed

    ...Uses TensorFlow efficiently, with no extra overhead. On common CNNs, it runs training 1.2~5x faster than the equivalent Keras code. Your training can probably get faster if written with Tensorpack. Scalable data-parallel multi-GPU / distributed training strategies are available off the shelf. Squeeze the best data loading performance out of Python with tensorpack.dataflow. Symbolic programming (e.g. tf.data) does not offer the data processing flexibility needed in research. Tensorpack squeezes the most performance out of pure Python with various auto-parallelization strategies. ...
    Downloads: 0 This Week
    Last Update:
    See Project