| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| README.md | 2026-01-15 | 13.5 kB | |
| v0.9.0 source code.tar.gz | 2026-01-15 | 1.4 MB | |
| v0.9.0 source code.zip | 2026-01-15 | 1.8 MB | |
| Totals: 3 Items | | 3.2 MB | 0 |
What's Changed
- remove is_padded check by @louisfd in https://github.com/tracel-ai/cubecl/pull/988
- Fix plane matmul selection & reduce workgroup invocations by @laggui in https://github.com/tracel-ai/cubecl/pull/989
- bump 0.7.1 by @louisfd in https://github.com/tracel-ai/cubecl/pull/992
- Fix/misc/release 07 by @louisfd in https://github.com/tracel-ai/cubecl/pull/994
- Bump cubecl to version 0.9.0 by @laggui in https://github.com/tracel-ai/cubecl/pull/996
- perf: Separate batch layout from main global layout to allow prefetching batch offset by @wingertge in https://github.com/tracel-ai/cubecl/pull/991
- opt: Add automatic unrolling of unit loops by @wingertge in https://github.com/tracel-ai/cubecl/pull/986
- opt: Make GVN side-effect free and assume loops are executed at least once by @wingertge in https://github.com/tracel-ai/cubecl/pull/985
- Attention: some test refactoring by @louisfd in https://github.com/tracel-ai/cubecl/pull/999
- fix: Fix `ConcreteOutputFactory` implementation in convolution by @wingertge in https://github.com/tracel-ai/cubecl/pull/998
- Flash Attention: Unit Attention by @louisfd in https://github.com/tracel-ai/cubecl/pull/1002
- ci: check version and use tracel action and xtask to publish by @syl20bnr in https://github.com/tracel-ai/cubecl/pull/995
- Fix/memory usage by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1001
- feat: Allow `mma` matmul to be selected by @wingertge in https://github.com/tracel-ai/cubecl/pull/1003
- refactor: TMA checks by @wingertge in https://github.com/tracel-ai/cubecl/pull/1006
- feat: Update tune key to enable safely tuning TMA algorithms by @wingertge in https://github.com/tracel-ai/cubecl/pull/1007
- feat: Auto-detect CUDA version and fix some 12.8 features by @wingertge in https://github.com/tracel-ai/cubecl/pull/1008
- feat: Granular math mode by @wingertge in https://github.com/tracel-ai/cubecl/pull/1000
- Define numeric types by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1009
- refactor: Read Strategy by @wingertge in https://github.com/tracel-ai/cubecl/pull/1010
- Fix cudarc feature flags for no default-features by @laggui in https://github.com/tracel-ai/cubecl/pull/1011
- Flash attention: unit & accelerated working attentions + fix partitions by @louisfd in https://github.com/tracel-ai/cubecl/pull/1012
- Flash attention: batch and num heads by @louisfd in https://github.com/tracel-ai/cubecl/pull/1014
- Flash Attention: bench by @louisfd in https://github.com/tracel-ai/cubecl/pull/1016
- feat: Specialized matmul using barriers by @wingertge in https://github.com/tracel-ai/cubecl/pull/1015
- Refactor/dtype by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1017
- Disable/mma/amd by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1020
- Add TMA checks before launch by @laggui in https://github.com/tracel-ai/cubecl/pull/1021
- Set pre-release version by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1023
- Ci/disable version check by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1024
- Flash Attention: Transpose key later by @louisfd in https://github.com/tracel-ai/cubecl/pull/1022
- Define many by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1025
- Flash Attention: refactor dtypes by @louisfd in https://github.com/tracel-ai/cubecl/pull/1026
- Flash Attention: fix logical mask bug when kv partition > 1 by @louisfd in https://github.com/tracel-ai/cubecl/pull/1027
- Matmul: readers & jobs generic only on global and stage types by @louisfd in https://github.com/tracel-ai/cubecl/pull/1028
- Fix conv tests by @louisfd in https://github.com/tracel-ai/cubecl/pull/1029
- Fix remainder int by @laggui in https://github.com/tracel-ai/cubecl/pull/1033
- feat: Implement `ldmatrix` and refactor manual mma args by @wingertge in https://github.com/tracel-ai/cubecl/pull/1018
- Feat/pinned mem by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1030
- Fix no std file by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1037
- fix: Fix issue with ldmatrix address conversion by @wingertge in https://github.com/tracel-ai/cubecl/pull/1038
- feat: Swizzled shared memory by @wingertge in https://github.com/tracel-ai/cubecl/pull/1035
- feat: add trigonometric functions by @relativityhd in https://github.com/tracel-ai/cubecl/pull/861
- Fix SPIR-V signed int remainder semantics by @laggui in https://github.com/tracel-ai/cubecl/pull/1036
- fix: Fix MMA on HIP by @wingertge in https://github.com/tracel-ai/cubecl/pull/1039
- Fix deadlock when copy by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1041
- Fix: invalid tile size by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1043
- fix: Feature gate fast tanh by @wingertge in https://github.com/tracel-ai/cubecl/pull/1045
- Bump version by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1054
- perf: MMA line size by @wingertge in https://github.com/tracel-ai/cubecl/pull/1044
- Fix fma so its callable from #cube functions by @amfaber in https://github.com/tracel-ai/cubecl/pull/1049
- Treat Operation::Copy as an implicit cast as well on the CPU backend by @amfaber in https://github.com/tracel-ai/cubecl/pull/1050
- Ensure we propagate the return index when a return block is folded into an existing block by @amfaber in https://github.com/tracel-ai/cubecl/pull/1051
- Improve compilation time for Burn by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1055
- Matmul: Major config refactor by @louisfd in https://github.com/tracel-ai/cubecl/pull/1042
- Kernels: some cleanup by @louisfd in https://github.com/tracel-ai/cubecl/pull/1058
- Fix: assertion for line_size in naive.rs (#1046) by @PulsarUnderscore in https://github.com/tracel-ai/cubecl/pull/1047
- fix: Fix writer stage size that was broken during the config migration by @wingertge in https://github.com/tracel-ai/cubecl/pull/1061
- Fix/autotuner by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1062
- Chore: Prepare pre-version 0.9.0-pre.3 by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1066
- Fix workgroup_id typo in book by @BenFradet in https://github.com/tracel-ai/cubecl/pull/1060
- fix: Fix composite merge pass with mutable values by @wingertge in https://github.com/tracel-ai/cubecl/pull/1063
- feat: Implement stmatrix and stage casting to support it by @wingertge in https://github.com/tracel-ai/cubecl/pull/1056
- refactor: Scalars by @wingertge in https://github.com/tracel-ai/cubecl/pull/1064
- Feat/event bus by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/950
- Flash Attention: use loader from matmul + fix sync bug by @louisfd in https://github.com/tracel-ai/cubecl/pull/1067
- refactor: Move `Runtime` to `cubecl-runtime` by @wingertge in https://github.com/tracel-ai/cubecl/pull/1068
- Flash Attention: vectorized query + fix metal wmma load from global memory + fix main compilation by @louisfd in https://github.com/tracel-ai/cubecl/pull/1069
- Flash Attention: fix all-masked rows by @louisfd in https://github.com/tracel-ai/cubecl/pull/1070
- Enable tuner name by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1071
- Flash attention: lines for mask and value by @louisfd in https://github.com/tracel-ai/cubecl/pull/1072
- Flash Attention: all lines by @louisfd in https://github.com/tracel-ai/cubecl/pull/1073
- Flash Attention: test and fix f16 by @louisfd in https://github.com/tracel-ai/cubecl/pull/1074
- Feat/execution error by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1075
- Flash Attention: strengthen test suite by @louisfd in https://github.com/tracel-ai/cubecl/pull/1077
- Feat/runtime error by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1078
- Flash Attention: a bit of selector and enable unit attention for burn by @louisfd in https://github.com/tracel-ai/cubecl/pull/1079
- Fix missing parenthesis on .into call by @BjornTheProgrammer in https://github.com/tracel-ai/cubecl/pull/1080
- feat: Rewrite async loaders to make them actually useful by @wingertge in https://github.com/tracel-ai/cubecl/pull/1076
- Set pre-release by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1082
- Feat/improve errors by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1084
- refactor: Convolution by @wingertge in https://github.com/tracel-ai/cubecl/pull/1083
- Fix/stuff by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1085
- fix: Fix line size selection for convolution by @wingertge in https://github.com/tracel-ai/cubecl/pull/1086
- Feat/validation error by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1088
- .gitignore editors/ides (cloned from burn) by @crutcher in https://github.com/tracel-ai/cubecl/pull/1089
- feat: Shared values by @wingertge in https://github.com/tracel-ai/cubecl/pull/1090
- Flash Attention: Blueprint by @louisfd in https://github.com/tracel-ai/cubecl/pull/1087
- Migrate kernels part 1 by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1095
- Bump pre-release by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1096
- Remove code by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1097
- Feat/cpu scheduler by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1098
- Fix/no std runtime by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1100
- Fix unused pattern by @crutcher in https://github.com/tracel-ai/cubecl/pull/1091
- refactor: Tensor map by @wingertge in https://github.com/tracel-ai/cubecl/pull/1099
- feat: Add zero-copy Bytes support via bytes::Bytes allocator by @antimora in https://github.com/tracel-ai/cubecl/pull/1093
- add epsilon for type by @louisfd in https://github.com/tracel-ai/cubecl/pull/1101
- Fix try_into_vec for SharedBytesAllocationController by @antimora in https://github.com/tracel-ai/cubecl/pull/1102
- Various fix by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1104
- fix: Fix CUDA < 12.8 branch for im2colWide by @wingertge in https://github.com/tracel-ai/cubecl/pull/1103
- Support const in match patterns for #[cube] macro by @sepcnt in https://github.com/tracel-ai/cubecl/pull/1105
- Add unaligned_line_read and unaligned_line_write as cpu-only cubecl extensions by @amfaber in https://github.com/tracel-ai/cubecl/pull/1052
- CPU Runtime: Fix wrong limits by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1107
- Add `tracing` instrumentation and dependencies across crates by @crutcher in https://github.com/tracel-ai/cubecl/pull/1106
- Add support for hypot and reciprocal hypot. by @vaijira in https://github.com/tracel-ai/cubecl/pull/1048
- Plane non uniform control flow feature by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1108
- Refactor/cube dim by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1111
- fix: Fixes various issues with `--all-features` builds by @wingertge in https://github.com/tracel-ai/cubecl/pull/1110
- Chore/pre release 6 by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1114
- fix: Fix invalid phi nodes from partially destructured array values by @wingertge in https://github.com/tracel-ai/cubecl/pull/1118
- Deep-plumb `tracing` feature. by @crutcher in https://github.com/tracel-ai/cubecl/pull/1116
- 2D into contiguous by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1113
- fix: Fix free handling for CPU backend by @wingertge in https://github.com/tracel-ai/cubecl/pull/1123
- Update readme link to matmul crate by @milesfrain in https://github.com/tracel-ai/cubecl/pull/1122
- Add inverse trigonometric and hyperbolic trait impls for Line by @ravituringworks in https://github.com/tracel-ai/cubecl/pull/1125
- refactor: Constants by @wingertge in https://github.com/tracel-ai/cubecl/pull/1121
- Gfx12 support by @marion-santiago in https://github.com/tracel-ai/cubecl/pull/1126
- fix: async mem not available in vgpu on some cuda versions by @Na1w in https://github.com/tracel-ai/cubecl/pull/1128
- Fix atan2 line by @laggui in https://github.com/tracel-ai/cubecl/pull/1130
- Feat/comptime device props by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1129
- feat: `usize` indexing by @wingertge in https://github.com/tracel-ai/cubecl/pull/1127
- Enable `#[test_log::test]` support; add PERFORMANCE.md doc. by @crutcher in https://github.com/tracel-ai/cubecl/pull/1132
- refactor: Close gaps between CubeCL and standard Rust by @wingertge in https://github.com/tracel-ai/cubecl/pull/1131
- Lift/Test `valid_strides` into a layout validation lib. by @crutcher in https://github.com/tracel-ai/cubecl/pull/1133
- chore: update xtask to 4.9.0 by @syl20bnr in https://github.com/tracel-ai/cubecl/pull/1135
- Docs/Debug-Checks/Cleaner flow for write_to_{gpu,cpu}. by @crutcher in https://github.com/tracel-ai/cubecl/pull/1137
- fix: Store packed dimension on packed quant stores so it can be correctly handled for swapped tensors by @wingertge in https://github.com/tracel-ai/cubecl/pull/1134
- fix: Fix broken PR [#1133] by @wingertge in https://github.com/tracel-ai/cubecl/pull/1140
- Fix plane count check with `max_units_per_cube` by @laggui in https://github.com/tracel-ai/cubecl/pull/1142
- Pre version by @nathanielsimard in https://github.com/tracel-ai/cubecl/pull/1144