Hivemind is a PyTorch library for decentralized deep learning across the Internet. It is intended for training one large model on hundreds of computers from different universities, companies, and volunteers. Distributed training without a master node: a Distributed Hash Table connects computers into a decentralized network. Fault-tolerant backpropagation: forward and backward passes succeed even if some nodes are unresponsive or take too long to respond. Decentralized parameter averaging: updates from multiple workers are aggregated iteratively, without synchronizing across the entire network. Training neural networks of arbitrary size: parts of their layers are distributed across the participants with the Decentralized Mixture-of-Experts.

If you have successfully trained a model or created a downstream repository with the help of our library, feel free to submit a pull request that adds your project to the list.
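To make the idea of decentralized parameter averaging more concrete, here is a toy simulation in plain Python. It does not use hivemind and is not how the library is implemented: the function name `gossip_average`, the random grouping scheme, and the scalar "parameters" are all assumptions made for illustration. Each round, peers are split into small random groups and every group averages only among its own members, so no step requires all peers to synchronize at once.

```python
import random


def gossip_average(params, group_size=2, rounds=20, seed=0):
    """Toy model of group-wise averaging (hypothetical helper, not hivemind API).

    params: one scalar "parameter" per peer.
    Each round, peers are shuffled into groups of `group_size`, and every
    group replaces its members' values with the group mean.
    """
    rng = random.Random(seed)
    params = list(params)  # work on a copy, leave the input untouched
    for _ in range(rounds):
        order = list(range(len(params)))
        rng.shuffle(order)
        for start in range(0, len(order), group_size):
            group = order[start:start + group_size]
            mean = sum(params[i] for i in group) / len(group)
            for i in group:
                params[i] = mean
    return params


peers = [0.0, 4.0, 8.0, 12.0]
result = gossip_average(peers)

# Group-wise averaging keeps the sum (and so the global mean) unchanged,
# while the values held by different peers move closer together.
initial_spread = max(peers) - min(peers)
final_spread = max(result) - min(result)
print("global mean:", sum(peers) / len(peers))
print("after averaging:", result)
print("spread before/after:", initial_spread, final_spread)
```

The point of the sketch is the communication pattern: every averaging step involves only a few peers, yet repeated rounds pull all peers toward the same global mean.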
Features
- Decentralized parameter averaging
- Fault-tolerant backpropagation
- Distributed training without a master node
- Train neural networks of arbitrary size

Installation
- Before installing, make sure that your environment has Python 3.7+.
- By default, hivemind uses the precompiled binary of the go-libp2p-daemon library.
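
The Python 3.7+ requirement can be checked before installing with a few lines of standard-library code. This is only a convenience sketch: the helper `python_is_supported` and the constant `MIN_PYTHON` are names made up for this example and are not part of hivemind.

```python
import sys

# Minimum version stated in the requirements above.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version is new enough:", sys.version.split()[0])
else:
    print("Python 3.7+ is required, found:", sys.version.split()[0])
```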