Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.md | 2022-10-13 | 5.2 kB | |
v0.26.0 source code.tar.gz | 2022-10-13 | 1.2 MB | |
v0.26.0 source code.zip | 2022-10-13 | 1.6 MB | |
Totals: 3 Items | | 2.8 MB | 0 |
### Added
- Spark Estimator: Added support for custom data loaders in `KerasEstimator`. (#3603)
- Spark Estimator: Added NVTabular data loader for `KerasEstimator`. (#3603)
- Spark Estimator: Added gradient accumulation support to Spark torch estimator. (#3681)
- TensorFlow: Added `register_local_var` functionality to distributed optimizers and local gradient aggregators. (#3695)
- TensorFlow: Added support for local variables for `BroadcastGlobalVariablesCallback`. (#3703)
- Enabled use of native `ncclAvg` op for NCCL allreduces. (#3646)
- Added support for additional reduction operations for `allreduce` (min, max, product); see the first sketch after this list. (#3660)
- Added 2D torus `allreduce` using NCCL. (#3608)
- Added support for Petastorm reader level parallel shuffling. (#3665)
- Added random seed support for Lightning datamodule to generate reproducible data loading outputs. (#3665)
- Added support for `int8` and `uint8` `allreduce` and `grouped_allreduce` in TensorFlow. (#3649)
- Added support for batched memory copies in `GPUAllgather`. (#3590)
- Added support for batched memory copies in `GPUReducescatter`. (#3621)
- Added `hvd.grouped_allgather()` and `hvd.grouped_reducescatter()` operations; see the second sketch after this list. (#3594)
- Added warning messages if output tensor memory allocations fail. (#3594)
- Added `register_local_source` and `use_generic_names` functionality to `DistributedGradientTape`. (#3628)
- Added `PartialDistributedGradientTape()` API for model parallel use cases. (#3643)
- Spark/Lightning: Added `reader_worker_count` and `reader_pool_type`. (#3612)
- Spark/Lightning: Added `transformation_edit_fields` and `transformation_removed_fields` params for `EstimatorParams`. (#3651)
- TensorFlow: Added doc string for `hvd.grouped_allreduce()`. (#3594)
- ROCm: Enabled `alltoall`. (#3654)
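Two of the additions above are straightforward to illustrate. First, the extra reduction operations for `allreduce` (#3660); this is a minimal sketch, assuming the new ops are exposed as `hvd.Min`, `hvd.Max`, and `hvd.Product` next to the existing `hvd.Sum` and `hvd.Average`:

```python
# Sketch: element-wise min/max/product reductions across workers.
# hvd.Min / hvd.Max / hvd.Product are assumed to be the op constants
# added in #3660, alongside the existing hvd.Sum and hvd.Average.
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

local = tf.constant([float(hvd.rank()), 1.0, 2.0])

smallest = hvd.allreduce(local, op=hvd.Min)      # element-wise minimum over all ranks
largest = hvd.allreduce(local, op=hvd.Max)       # element-wise maximum over all ranks
product = hvd.allreduce(local, op=hvd.Product)   # element-wise product over all ranks
```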
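Second, the grouped collectives (#3594); this sketch assumes they follow the same list-in/list-out pattern as the pre-existing `hvd.grouped_allreduce()`:

```python
# Sketch: grouped allgather / reducescatter over a list of tensors.
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

tensors = [tf.random.uniform([4, 2]), tf.random.uniform([8])]

# Allgather: each output stacks the corresponding input from every
# rank along the first dimension.
gathered = hvd.grouped_allgather(tensors)

# Reducescatter: each input is reduced across ranks, then split along
# the first dimension so every rank keeps one slice.
scattered = hvd.grouped_reducescatter(tensors)
```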
### Changed
- The default Petastorm reader pool was changed from `process` to `thread` for lower memory usage. (#3665)
- Keras: Support only legacy optimizers in Keras 2.11+; see the first sketch after this list. (#3725)
- Gloo: When negotiating, use `gather` rather than `allgather`. (#3633)
- Use `packaging.version` instead of `distutils` version classes; see the second sketch after this list. (#3700)
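For the Keras change (#3725): Keras 2.11 introduced a new optimizer base class, and Horovod 0.26 only supports the original implementations, which live under `tf.keras.optimizers.legacy`. A minimal sketch of what wrapping looks like there:

```python
# Sketch: wrapping a legacy Keras optimizer on Keras 2.11+.
import tensorflow as tf
import horovod.keras as hvd

hvd.init()

# The new tf.keras.optimizers.Optimizer base class is not supported
# by Horovod 0.26; use the legacy namespace instead.
opt = tf.keras.optimizers.legacy.SGD(learning_rate=0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)
```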
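For the version-class change (#3700), the replacement pattern is a one-line swap; a minimal illustration:

```python
from packaging import version

# Previously (distutils is deprecated and removed in newer Pythons):
#   from distutils.version import LooseVersion
#   LooseVersion("0.26.0") > LooseVersion("0.25.0")
assert version.parse("0.26.0") > version.parse("0.25.0")
```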
### Deprecated
- Deprecated field `shuffle_buffer_size` from `EstimatorParams`. Use `shuffle` to enable or disable shuffling; see the sketch after this list. (#3665)
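A sketch of the migration, assuming `shuffle` is a boolean estimator parameter; every other constructor argument below is an illustrative placeholder, not part of the deprecation:

```python
# Sketch: replacing shuffle_buffer_size with the shuffle flag (#3665).
import tensorflow as tf
from horovod.spark.keras import KerasEstimator

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

estimator = KerasEstimator(
    num_proc=2,
    model=model,
    optimizer=tf.keras.optimizers.legacy.SGD(learning_rate=0.01),
    loss="mse",
    feature_cols=["features"],
    label_cols=["label"],
    shuffle=True,  # replaces the deprecated shuffle_buffer_size
)
```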
### Removed
- Build: Removed `std::regex` use for better cxxabi11 compatibility. (#3584)
### Fixed
- TensorFlow: Fixed the optimizer iteration increments when `backward_passes_per_step > 1`. (#3631)
- Fixed `FuseResponses()` on `BATCHED_D2D_PADDING` edge cases for Reducescatter and/or ROCm. (#3621)
- PyTorch: Fixed Reducescatter functions to raise `HorovodInternalError` rather than `RuntimeError`; see the sketch after this list. (#3594)
- PyTorch on GPUs without GPU operations: Fixed grouped allreduce to set CPU device in tensor table. (#3594)
- Fixed race condition in PyTorch allocation handling. (#3639)
- Build: Fixed finding `nvcc` (if not in `$PATH`) with older versions of CMake. (#3682)
- Fixed `reducescatter()` and `grouped_reducescatter()` to raise clean exceptions for scalar inputs. (#3699)
- Updated Eigen submodule to fix build on macOS with aarch64. (#3619)
- Build: Correctly select files in `torch/` directory to be hipified. (#3588)
- Build: Modified regex match for CUDA|ROCm in `FindPytorch.cmake`. (#3593)
- Build: Fixed ROCm-specific build failure. (#3630)
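To show what the `HorovodInternalError` fix (#3594) means for callers, a minimal PyTorch sketch; `horovod.common.exceptions.HorovodInternalError` is assumed to be the exception class in question:

```python
import torch
import horovod.torch as hvd
from horovod.common.exceptions import HorovodInternalError

hvd.init()

# One equal slice per rank after the reduce+scatter.
tensor = torch.rand(4 * hvd.size(), 2)
try:
    out = hvd.reducescatter(tensor)
except HorovodInternalError:
    # Since #3594, internal Reducescatter failures surface as
    # HorovodInternalError instead of a bare RuntimeError.
    raise
```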