UCCL
UCCL is an efficient communication library for GPUs
...The library focuses on enabling efficient data transfer and collective communication between GPUs during training and inference processes. It supports a variety of communication patterns including collective operations such as all-reduce as well as peer-to-peer transfers that are commonly used in modern machine learning architectures. UCCL is designed to work with heterogeneous hardware environments, allowing GPUs from different vendors and network interfaces to communicate efficiently without vendor lock-in. The system also supports specialized workloads such as reinforcement learning weight transfers, key-value cache sharing, and expert parallelism for mixture-of-experts models. ...