CubeCL v0.10.0

What's Changed

  • Port over parts of burn-std to cubecl-zspace. (#1139) @crutcher
  • chore: bump tracel-llvm version to 20.1.4-6 (#1141) @syl20bnr
  • Rename try_cast_unchecked to downcast (#1146) @adolago
  • feat: Slice destructure/Fixed dim layout (#1149) @wingertge
  • Propagate lint rules to crates; fix outstanding violations. (#1138) @crutcher
  • fix(spirv): reuse decorated struct id for wkgrp layout (tracel-ai/burn#4355) (#1154)
  • fix(spirv): Fix dim type for metadata (#1155) @wingertge
  • fix: Ensure copy_into works correctly for different ranks (#1150) @wingertge
  • feat(spirv): Compilation cache (#1158) @wingertge
  • fix(spirv): Fix stack overflow by migrating GVN to iterative algorithm (#1156) @wingertge
  • chore(cuda): Bump cudarc for fallback-dynamic-loading (#1159) @wingertge
  • fix(spirv): Don't do copy transform if either operand is written to in between read and write (#1161) @wingertge
  • Upgrade to wgpu v28 (#1119) @laggui
  • fix(metal): fix float to int narrowing (#1163) @dcvz
  • Fix MLIR pass ordering and add CPU barrier support (#1151) @jguhlin
  • fix: error driver not found! in CI and update to Tracel GH action v8 (#1169) @syl20bnr
  • chore: update publish.yml (#1171) @syl20bnr
  • Feat: Improve registry (#1172) @nathanielsimard
  • Add aliases for backend features consistent with burn (#1174) @wingertge
  • fix: Fix SPIR-V fix for CopyTransform (#1173) @wingertge
  • Add some missing infra for radix sorting (#1170) @ArthurBrussee
  • feat: Add explicit resource errors (#1164) @wingertge
  • Add safety docs to generated launch_unchecked functions (#1168) @adolago
  • perf: Improve optimized tensor algorithm (#1177) @wingertge
  • feat: Vulkan 64bit indexing (#1178) @wingertge
  • feat: Chunked binary compilation cache (#1166) @wingertge
  • fix: Make TMA errors return a result instead of panicking (#1176) @wingertge
  • chore: Update dependencies to deduplicate and get fixes (#1179) @wingertge
  • Chore: Pre-Release 0.10.0-pre.1 (#1180) @nathanielsimard
  • Upgrade to rand 0.10 (#1182) @laggui
  • fix: remove unconditional std on cubecl-runtime and cubecl-common (#1184) @antimora
  • feat: Enable 64-bit indexing (#1185) @wingertge
  • perf: Remove unconditional format from virtual layout (#1186) @wingertge
  • CudaServer::change_server_serialized simultaneous Commands. (#1143) @crutcher
  • perf: Improve performance of MetadataBuilder (#1187) @wingertge
  • refactor: Metadata (#1190) @wingertge
  • fix: Make cubecl-zspace no_std, avoid future prelude issues (#1193) @wingertge
  • Dependency tweaks for building on Android (#1160) @metasim
  • fix: Fix metadata on no_std (#1195) @wingertge
  • fix(hip): use __hip_bfloat16 types and __hmax/__hmin for ROCm 7.1 (#1152) @GeisYaO
  • chore: update to cubecl-hip-sys version 7.1.5280200 (#1192) @syl20bnr
  • refactor: Remove CubeOption in favor of expanding Option (#1194) @wingertge
  • Update version (#1205) @nathanielsimard
  • Refactor device communication channel (#1199) @nathanielsimard
  • Fix/no std device + improve channel device handle performance (#1209) @nathanielsimard
  • Mma inplace version & 16x8x8 support (#1213) @louisfd
  • feat: Runtime enum (#1208) @wingertge
  • Fix wasm compilation error (#1206) @ArthurBrussee
  • Fix/memory management (#1214) @nathanielsimard
  • Fix: Benchmarking and Profiling (#1220) @nathanielsimard
  • fix: Fix for loops with breaks (#1222) @paulzhng
  • Remove critical section (#1223) @nathanielsimard
  • Fix multiple bugs (#1225) @nathanielsimard
  • refactor: Line size generic (#1221) @wingertge
  • refactor: Rename and refactor dynamic types (#1229) @wingertge
  • rm f32 float from metal (#1233) @louisfd
  • feat: Allow view layouts to infer launch info from buffer metadata (#1231) @wingertge
  • Remove atomic ptr (#1228) @nathanielsimard
  • Nccl all reduce (#1226) @Charles23R
  • revert removing f32 atomic from metal (#1235) @louisfd
  • chore: Update to wgpu v29, enable 64-bit buffers for Vulkan (#1236) @wingertge
  • refactor: Merge compilation_arg and register (#1237) @wingertge
  • Fix UB in memory handle location, fix cloning CubeCount::Dynamic (#1239) @ArthurBrussee
  • feat: gitignore .DS_Store (#1240) @syl20bnr
  • Fix 7 more cases of UB, fix flaky test (#1238) @ArthurBrussee
  • fix(wgpu): flush staging buffers periodically during bulk writes (#1204) @holg
  • remove nonexistent field (#1242) @louisfd
  • fix(cubecl-runtime): PersistentPool HashMap key mismatch and reuse safety (#1241) @Veercodeprog
  • Switch to effective_size (#1245) @nathanielsimard
  • refactor: Scalars/Metadata (#1244) @wingertge
  • chore: Clean up MetadataBindingInfo (#1248) @wingertge
  • Fix: defer CPU staging buffer drops with PendingDropQueue (#1255) @nathanielsimard
  • Replace bincode with ciborium for compilation cache (#1254) @Veercodeprog
  • Fix GPU hangs on integrated AMD GPUs by increasing drop queue flush frequency (#1257) @nathanielsimard
  • Document unsafe code in cubecl-hip/cubecl-cuda (#1258) @nathanielsimard
  • Adds arena + refactor stream id (#1259) @nathanielsimard
  • feat: Atomic vector (#1253) @wingertge
  • fix: Fix metal compile error (#1261) @wingertge
  • fix: Fix metal again, make features not mutually exclusive (#1262) @wingertge
  • Try all options as fallback when autotuning (#1247) @ArthurBrussee
  • fix: Use out item for atomic index so it works properly on Metal (#1265) @wingertge
  • fix: Improve portability of Vulkan compiler (#1263) @wingertge
  • Fix/cuda err all reduce (#1266) @Charles23R
  • feat(wgpu): support zero-sized resources (#1256) @ArthurBrussee
  • feat: Add Validate execution mode (#1268) @wingertge
  • fix: Remove const __restrict__ from atomic pointers in CUDA (#1273) @wingertge
  • fix: Remove optimized casts because it's not supported (#1269) @wingertge
  • Feat/vector sum (#1286) @nathanielsimard
  • Fix UB in arena dropping (#1287) @ArthurBrussee
  • Fix performance regression rocm (#1284) @nathanielsimard
  • Fix one case of unsoundness, and two other potential bugs. (#1289) @ArthurBrussee
  • Fix/tuner group (#1291) @nathanielsimard
  • Simplify async tuning (#1292) @ArthurBrussee
  • ci: add job to execute miri tests in ci.yml workflow (#1251) @syl20bnr
  • Fix and refactor all_reduce (#1290) @Charles23R
  • Track tuning lifetime (#1293) @ArthurBrussee
  • Feat/device service stage (#1302) @nathanielsimard
  • Fix wasm compilation (#1305) @ArthurBrussee
  • Refactor cubecl.toml config (#1303) @nathanielsimard
  • Fix: actually use the priority (#1307) @nathanielsimard
  • Fix cubecl-common Arc (#1308) @laggui
  • Fix persistent memory pool reset storage utilization when reserve (#1309) @nathanielsimard
  • Add more strategies than a spin loop (#1310) @nathanielsimard
  • Improve error reporting on WASM (#1306) @ArthurBrussee
  • Server send recv (#1304) @Charles23R
  • Fix vector size check for strided tensors with unit strides on non-axis dims (#1312) @antimora
Source: README.md, updated 2026-05-07