## Performance Optimizations

### Intel Architecture Processors

- Improved matmul and inner product primitives performance on processors with Intel AMX instruction set support.
- Improved performance of convolution and inner product primitives on processors with Intel AVX2 instruction set support.
- Improved performance of `int8` convolution with zero points.
- Improved `fp32` convolution performance with `fp16` and `bf16` compressed weights on processors with Intel AVX2 or Intel AVX-512 instruction set support.
- Improved `fp16`/`bf16` depthwise convolution performance with `fp32` bias, `sum` post-ops, or dilation.
- Improved `bf16` pooling backpropagation performance.
- Improved binary post-ops performance with `per_w` broadcast.
### Intel Graphics Products

- Improved performance on Intel Arc graphics for future Intel Core Ultra processors (code name Panther Lake).
- Improved convolution performance on:
  - Intel Arc Graphics for Intel Core Ultra processor series 2 (formerly Lunar Lake).
  - Intel Arc B-series discrete graphics (formerly Battlemage).
- Improved `int8` matmul performance with zero-points support for source and weight tensors.
- Improved `f4_e2m1` and `f4_e3m0` matmul and reorder performance.
- Improved performance of the following subgraphs with Graph API:
  - Scaled Dot-Product Attention (SDPA) with `int4` and `int8` compressed key and value.
  - `fp16`/`bf16` SDPA with `fp32` intermediate data types. Using `fp32` intermediate data types is recommended.
  - SDPA with head sizes 512 and 576.
  - Grouped Query Attention (GQA) with 5D input tensors.
### AArch64-based Processors

- Improved `fp16` reorder performance.
- Improved `int8` matmul performance.
- Improved `bf16` inner product forward propagation performance with Arm Compute Library (ACL).
- Improved `bf16` eltwise performance.
- Improved convolution performance with ACL on processors with SVE support.
## Functionality

### Common

- Extended the Graph API `Softmax` operation to support the `inf_as_zero` mode. This enables SDPA subgraphs compliant with PyTorch Safe Softmax semantics; a sketch follows below.
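For illustration, here is a minimal single-op graph that opts into the new mode. The attribute spelling (`op::attr::mode` with the string value `"inf_as_zero"`) and the tensor shapes are assumptions based on this note, not verified signatures; consult the Graph API documentation for the authoritative interface.

```cpp
#include <cstdint>
#include <string>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    graph g(dnnl::engine::kind::cpu);

    // Plain f32 tensors; ids and shapes are illustrative.
    logical_tensor src {0, logical_tensor::data_type::f32,
            {1, 16, 128, 128}, logical_tensor::layout_type::strided};
    logical_tensor dst {1, logical_tensor::data_type::f32,
            {1, 16, 128, 128}, logical_tensor::layout_type::strided};

    op softmax(2, op::kind::SoftMax, {src}, {dst}, "softmax");
    softmax.set_attr<int64_t>(op::attr::axis, -1);
    // New in v3.8 (assumed attribute name): rows whose inputs are all -inf
    // produce zeros instead of NaN, matching PyTorch Safe Softmax.
    softmax.set_attr<std::string>(op::attr::mode, "inf_as_zero");

    g.add_op(softmax);
    g.finalize();
    // ... get_partitions(), compile, and execute as usual.
    return 0;
}
```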
### Intel Architecture Processors

- Introduced support for `f32` convolution with `fp16` compressed weights.
- Enabled `int8`/`int4` compressed weights support in the matmul primitive (see the sketch after this list).
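A minimal sketch of the matmul weights-decompression path with `int8` weights, assuming the mechanism from earlier oneDNN releases: integer weights are up-converted when the fpmath mode is set with `apply_to_int = true`, and dequantization scales are attached to the weights. The shapes and scales mask here are illustrative only.

```cpp
#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    const memory::dim M = 4, K = 64, N = 32;
    // f32 activations and destination, int8 "compressed" weights.
    memory::desc src_md({M, K}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc wei_md({K, N}, memory::data_type::s8, memory::format_tag::ab);
    memory::desc dst_md({M, N}, memory::data_type::f32, memory::format_tag::ab);

    primitive_attr attr;
    // Ask oneDNN to treat the integer weights as compressed values that are
    // up-converted for the computation (apply_to_int = true).
    attr.set_fpmath_mode(fpmath_mode::bf16, /*apply_to_int=*/true);
    // Per-output-channel (N) dequantization scales on the weights.
    attr.set_scales_mask(DNNL_ARG_WEIGHTS, 1 << 1);

    matmul::primitive_desc pd(eng, src_md, wei_md, dst_md, attr);
    matmul prim(pd);
    // Execution: pass the scales as DNNL_ARG_ATTR_SCALES | DNNL_ARG_WEIGHTS
    // alongside the src/weights/dst memory objects in prim.execute(s, args).
    (void)prim;
    return 0;
}
```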
### Intel Graphics Products

- Introduced select algorithm support in the binary primitive.
- Introduced support for `f4_e2m1` and `f4_e3m0` data types in the convolution primitive.
- Introduced support for the GenIndex operation in Graph API.
### Generic GPU Vendor

- Introduced support for:
  - Vanilla RNN forward propagation.
  - Inner product backpropagation.
  - Group normalization.
- Improved accuracy of inner product primitive with sum post-ops for large shapes.
### NVIDIA GPUs
- Introduced Graph API support.
## Usability

- Added support for the group normalization primitive with the `ONEDNN_ENABLE_PRIMITIVE` build option.
- Enabled support for ROCm 6 on AMD GPUs.
- Improved CMake integration for oneDNN installation with the NVIDIA backend enabled.
- Reduced memory footprint for the matmul primitive when using ACL.
## Validation

- Added benchdnn option `--execution-mode` to test oneDNN functionality with SYCL Graph record/execute mode.
- Extended benchdnn option `--cold-cache` with support for cold TLB mode.
- Added benchdnn option `--bia-dt` to control bias data type for matmul, inner product, convolution, and deconvolution primitives.
- Extended the syntax of the benchdnn `--dt` option in the Graph API driver to manage data types of individual tensors in a pattern.
## Deprecated Functionality

- The BLAS-like API, including the `dnnl::sgemm`, `dnnl::gemm_u8s8s32`, and `dnnl::gemm_s8s8s32` functions, is deprecated and will be removed in a future release. If you use this API, consider switching to the matmul primitive; a migration sketch follows.
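For example, a `dnnl::sgemm('N', 'N', M, N, K, 1.f, A, K, B, N, 0.f, C, N)` call (row-major C = A * B) maps to the matmul primitive roughly as below. Error handling is omitted, and non-trivial `alpha`/`beta` would additionally need primitive attributes (scales and a `sum` post-op).

```cpp
#include <cstdint>
#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;

// One-shot f32 matmul on CPU replacing a plain sgemm call.
// Row-major 2D layouts map to format_tag::ab.
void gemm_via_matmul(int64_t M, int64_t N, int64_t K,
        const float *A, const float *B, float *C) {
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    memory::desc a_md({M, K}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc b_md({K, N}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc c_md({M, N}, memory::data_type::f32, memory::format_tag::ab);

    // Wrap the user buffers without copying.
    memory a_mem(a_md, eng, const_cast<float *>(A));
    memory b_mem(b_md, eng, const_cast<float *>(B));
    memory c_mem(c_md, eng, C);

    matmul::primitive_desc pd(eng, a_md, b_md, c_md);
    matmul(pd).execute(s, {{DNNL_ARG_SRC, a_mem},
            {DNNL_ARG_WEIGHTS, b_mem}, {DNNL_ARG_DST, c_mem}});
    s.wait();
}
```

In a real application you would create the engine, stream, and primitive once and reuse them across calls rather than rebuilding them per GEMM.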
## Breaking Changes
- Removed the experimental Graph Compiler backend for Graph API.
## Thanks to our Contributors
This release contains contributions from the project core team as well as Aditya Tewari @aditew01, Alexander Simonov @asimonov1, Denis @redradist, Dmitriy Ovchinnikov @inteldimitrius, Eliezer Weissmann @eliezerweissmann, Hubert Maciak @hmaciak, Ilya Lavrenov @ilya-lavrenov, James McGregor @Jmc18134, @jstachowintel, Marek Michalowski @michalowski-arm, Maria Zhukova @mzhukova, Orel Yehuda @yehudaorel, Ravi Pushkar @rpushkarr, Renato Barros Arantes @renato-arantes, @Shreyas-fuj, Shu Chen @shu1chen, Viktoriia Gvozdeva @vgvozdeva, Yair Obodovsky @yair-obodovsky, and @zhangfeiv0.