BytePS is a high-performance and generally distributed training framework. It supports TensorFlow, Keras, PyTorch, and MXNet, and can run on either TCP or RDMA networks. BytePS outperforms existing open-sourced distributed training frameworks by a large margin. For example, on BERT-large training, BytePS can achieve ~90% scaling efficiency with 256 GPUs (see below), which is much higher than Horovod+NCCL. In certain scenarios, BytePS can double the training speed compared with Horovod+NCCL. We show our experiment on BERT-large training, which is based on GluonNLP toolkit. The model uses mixed precision. We use Tesla V100 32GB GPUs and set batch size equal to 64 per GPU. Each machine has 8 V100 GPUs (32GB memory) with NVLink-enabled. Machines are inter-connected with 100 Gbps RDMA network. This is the same hardware setup you can get on AWS.

Features

  • Support for tensorflow.keras
  • We show our experiment on BERT-large training
  • We use Tesla V100 32GB GPUs and set batch size equal to 64 per GPU
  • BytePS achieves ~90% scaling efficiency for BERT-large with 256 GPUs
  • You can try out the latest features by directly installing from master branch
  • BytePS can double the training speed compared with Horovod+NCCL

Project Samples

Project Activity

See All Activity >

Categories

Machine Learning

License

Apache License V2.0

Follow BytePS

BytePS Web Site

Other Useful Business Software
Our Free Plans just got better! | Auth0 Icon
Our Free Plans just got better! | Auth0

With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
Try free now
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of BytePS!

Additional Project Details

Programming Language

Python

Related Categories

Python Machine Learning Software

Registered

2022-08-04