Decentralized deep learning in PyTorch. Built to train models
...Its intended usage is training one large model on hundreds of computers from different universities, companies, and volunteers. Distributed training without a master node: Distributed HashTable allows connecting computers in a decentralized network. Fault-tolerant backpropagation: forward and backward passes succeed even if some nodes are unresponsive or take too long to respond. Decentralized parameter averaging: iteratively aggregate updates from multiple workers without the need to synchronize across the entire network. ...