## v0.28.0: Keras 2.11+ optimizers, faster reducescatter, fixes for latest TensorFlow, CUDA, NCCL (2023-05-09)

| Name | Modified | Size |
|---|---|---|
| README.md | 2023-05-09 | 1.8 kB |
| v0.28.0_ Keras 2.11+ optimizers, faster reducescatter, fixes for latest TensorFlow, CUDA, NCCL.tar.gz | 2023-05-09 | 1.2 MB |
| v0.28.0_ Keras 2.11+ optimizers, faster reducescatter, fixes for latest TensorFlow, CUDA, NCCL.zip | 2023-05-09 | 1.6 MB |
### Added

- TensorFlow: Added new `get_local_and_global_gradients` to `PartialDistributedGradientTape` to retrieve local and non-local gradients separately. (#3859)
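As a rough illustration of the idea (not Horovod's actual implementation, and with illustrative names throughout), separating gradients for locally registered variables from globally reduced ones amounts to partitioning the tape's gradient list:

```python
# Hypothetical sketch: partition (gradient, variable_name) pairs into "local"
# variables (excluded from allreduce) and "global" variables. This mirrors the
# idea behind get_local_and_global_gradients; names are illustrative only.

def split_gradients(grads_and_vars, local_var_names):
    """Return (local, global) lists of (gradient, variable_name) pairs."""
    local, global_ = [], []
    for grad, name in grads_and_vars:
        (local if name in local_var_names else global_).append((grad, name))
    return local, global_

# Usage: e.g. large embedding tables are often kept local per worker.
pairs = [(0.1, "embedding/w"), (0.2, "dense/w"), (0.3, "dense/b")]
local, global_ = split_gradients(pairs, {"embedding/w"})
```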
### Changed

- Improved reducescatter performance by allocating output tensors before enqueuing the operation. (#3824)
- TensorFlow: Ensured that `tf.logical_and` within the allreduce `tf.cond` runs on CPU. (#3885)
- TensorFlow: Added support for Keras 2.11+ optimizers. (#3860)
- The `CUDA_VISIBLE_DEVICES` environment variable is no longer passed to remote nodes. (#3865)
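The reducescatter change (#3824) exploits the fact that the output shape is known before the collective runs, so the output buffer can be allocated up front rather than on the communication path. A minimal pure-Python sketch of that property (a simulation, not Horovod internals):

```python
# Illustrative sketch (not Horovod internals): in reducescatter, each rank's
# output is the element-wise sum of all inputs, split into num_ranks chunks.
# Because the chunk size is known from the input shape alone, output buffers
# can be pre-allocated before any reduction work is enqueued.

def reducescatter_sim(inputs):
    """Simulate reducescatter over len(inputs) ranks on equal-length lists."""
    num_ranks = len(inputs)
    chunk = len(inputs[0]) // num_ranks  # assumes length divisible by num_ranks
    # Pre-allocate every rank's output buffer before doing any reduction.
    outputs = [[0.0] * chunk for _ in range(num_ranks)]
    for rank in range(num_ranks):
        for i in range(chunk):
            idx = rank * chunk + i
            outputs[rank][i] = sum(inp[idx] for inp in inputs)
    return outputs

# Two ranks, each contributing a length-4 tensor; each receives a length-2 chunk.
outs = reducescatter_sim([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]])
```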
### Fixed

- Fixed build with ROCm. (#3839, #3848)
- Fixed build of the `horovod-nvtabular` Docker image. (#3851)
- Fixed linking against recent NCCL by defaulting CUDA runtime library linkage to static and ensuring that weak symbols are overridden. (#3867, #3846)
- Fixed compatibility with TensorFlow 2.12 and recent nightly versions. (#3864, #3894, #3906, #3907)
- Fixed missing arguments of the Keras `allreduce` function. (#3905)
- Updated the `with_device` functions in MXNet and PyTorch to skip unnecessary `cudaSetDevice` calls. (#3912)
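The `with_device` change (#3912) is a common pattern: cache the currently active device and skip the set-device call when it would be a no-op. A hypothetical sketch of that pattern (illustrative names, not the actual MXNet/PyTorch helper code):

```python
# Hypothetical sketch of the skip-redundant-set pattern: track the active
# device and only issue the (relatively expensive) set-device call when the
# requested device actually differs. Not Horovod's real with_device code.

class DeviceContext:
    def __init__(self):
        self.current = None
        self.set_calls = 0  # counts simulated "cudaSetDevice"-style calls

    def set_device(self, device):
        if device == self.current:
            return  # already active: skip the unnecessary call
        self.set_calls += 1
        self.current = device

ctx = DeviceContext()
ctx.set_device(0)
ctx.set_device(0)  # redundant, skipped
ctx.set_device(1)
```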