## Highlights
- Initial PyPI release of the CUDA back-end.
- The CUDA back-end works well with mlx-lm:
  - Reasonably fast for LLM inference
  - Supports single-machine training and LoRA fine-tuning
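With the back-end now on PyPI, installation on Linux is a one-line pip command (the `mlx[cuda]` and `mlx[cpu]` extras are referenced in the install PRs below). A minimal sketch of getting started with mlx-lm inference; the model name is a hypothetical example:

```shell
# Install MLX with the CUDA back-end on Linux (from PyPI)
pip install "mlx[cuda]"

# Or the CPU-only Linux variant
pip install "mlx[cpu]"

# LLM inference via mlx-lm (installed separately; model name is illustrative)
pip install mlx-lm
mlx_lm.generate --model mlx-community/Mistral-7B-Instruct-v0.3-4bit \
  --prompt "Hello, world"
```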
## What's Changed
- Avoid invoking allocator::malloc when creating CUDA event by @zcbenz in https://github.com/ml-explore/mlx/pull/2232
- Share more common code in Compiled by @zcbenz in https://github.com/ml-explore/mlx/pull/2240
- Avoid atomic updates across CPU/GPU in CUDA event by @zcbenz in https://github.com/ml-explore/mlx/pull/2231
- Perf regression fix by @angeloskath in https://github.com/ml-explore/mlx/pull/2243
- Add profiler annotations in common primitives for CUDA backend by @zcbenz in https://github.com/ml-explore/mlx/pull/2244
- Default strict mode for module `update` and `update_modules` by @awni in https://github.com/ml-explore/mlx/pull/2239
- Fix linux linking error by @awni in https://github.com/ml-explore/mlx/pull/2248
- Improve metal elementwise kernels by @awni in https://github.com/ml-explore/mlx/pull/2247
- CUDA backend: matmul by @zcbenz in https://github.com/ml-explore/mlx/pull/2241
- Change layernorms to two pass algorithm by @angeloskath in https://github.com/ml-explore/mlx/pull/2246
- Fix unintuitive metal kernel caching by @awni in https://github.com/ml-explore/mlx/pull/2242
- Refactor the lu test by @emmanuel-ferdman in https://github.com/ml-explore/mlx/pull/2250
- CUDA backend: unary ops by @zcbenz in https://github.com/ml-explore/mlx/pull/2158
- Fix export to work with gather/scatter axis by @awni in https://github.com/ml-explore/mlx/pull/2263
- CUDA backend: binary ops by @zcbenz in https://github.com/ml-explore/mlx/pull/2259
- Report number of missing parameters by @FL33TW00D in https://github.com/ml-explore/mlx/pull/2264
- CUDA backend: sort by @zcbenz in https://github.com/ml-explore/mlx/pull/2262
- CUDA backend: random by @zcbenz in https://github.com/ml-explore/mlx/pull/2261
- Fix conv export by @awni in https://github.com/ml-explore/mlx/pull/2265
- CUDA backend: copy ops by @zcbenz in https://github.com/ml-explore/mlx/pull/2260
- Fix building cpp benchmarks on Linux by @zcbenz in https://github.com/ml-explore/mlx/pull/2268
- Add load_safe to the general conv loaders by @angeloskath in https://github.com/ml-explore/mlx/pull/2258
- start cuda circle config by @awni in https://github.com/ml-explore/mlx/pull/2256
- CUDA backend: reduce by @zcbenz in https://github.com/ml-explore/mlx/pull/2269
- CUDA backend: argreduce by @zcbenz in https://github.com/ml-explore/mlx/pull/2270
- CUDA backend: softmax by @zcbenz in https://github.com/ml-explore/mlx/pull/2272
- CUDA backend: layernorm by @zcbenz in https://github.com/ml-explore/mlx/pull/2271
- Fix warnings from latest CUDA toolkit by @zcbenz in https://github.com/ml-explore/mlx/pull/2275
- Make sliceUpdate general by @awni in https://github.com/ml-explore/mlx/pull/2282
- CUDA backend: compile by @zcbenz in https://github.com/ml-explore/mlx/pull/2276
- [CUDA] RMSNorm and VJP by @awni in https://github.com/ml-explore/mlx/pull/2280
- [CUDA] Fix build by @awni in https://github.com/ml-explore/mlx/pull/2284
- [CUDA] ternary with select op by @awni in https://github.com/ml-explore/mlx/pull/2283
- CUDA backend: indexing ops by @zcbenz in https://github.com/ml-explore/mlx/pull/2277
- Collection of refactors by @jagrit06 in https://github.com/ml-explore/mlx/pull/2274
- Fix complex power and print by @awni in https://github.com/ml-explore/mlx/pull/2286
- fix cuda jit by @awni in https://github.com/ml-explore/mlx/pull/2287
- Fix cuda gemm for bf16 by @awni in https://github.com/ml-explore/mlx/pull/2288
- Fix cuda arg reduce by @awni in https://github.com/ml-explore/mlx/pull/2291
- RoPE for CUDA by @angeloskath in https://github.com/ml-explore/mlx/pull/2293
- Add python testing for cuda with ability to skip list of tests by @awni in https://github.com/ml-explore/mlx/pull/2295
- [CUDA] Fix back-end bugs and enable corresponding tests by @awni in https://github.com/ml-explore/mlx/pull/2296
- Cuda bug fixes 2 by @awni in https://github.com/ml-explore/mlx/pull/2298
- [CUDA] Divmod, Partition, and sort fixes by @awni in https://github.com/ml-explore/mlx/pull/2302
- [CUDA] synch properly waits for all tasks to finish and clear by @awni in https://github.com/ml-explore/mlx/pull/2303
- Make ptx cache settable by environment variable by @angeloskath in https://github.com/ml-explore/mlx/pull/2304
- Build CUDA release in Circle by @awni in https://github.com/ml-explore/mlx/pull/2306
- Cuda perf tuning by @awni in https://github.com/ml-explore/mlx/pull/2307
- Fix `update_modules()` when providing a subset by @angeloskath in https://github.com/ml-explore/mlx/pull/2308
- Compile float64 functions on CPU by @awni in https://github.com/ml-explore/mlx/pull/2311
- Fix get 2d grid dims by @angeloskath in https://github.com/ml-explore/mlx/pull/2316
- Split broadcast so it is always fused in compile by @angeloskath in https://github.com/ml-explore/mlx/pull/2318
- [CUDA] Fix reductions by @angeloskath in https://github.com/ml-explore/mlx/pull/2314
- Fix module update in strict mode by @awni in https://github.com/ml-explore/mlx/pull/2321
- MLX_SWITCH macros to templates by @angeloskath in https://github.com/ml-explore/mlx/pull/2320
- Use fp32 for testing, add more complex ops by @awni in https://github.com/ml-explore/mlx/pull/2322
- Patch bump by @awni in https://github.com/ml-explore/mlx/pull/2324
- Allow parameters to be deleted from a module by @awni in https://github.com/ml-explore/mlx/pull/2325
- Fix compilation error from integral_constant by @zcbenz in https://github.com/ml-explore/mlx/pull/2326
- [CUDA] Switch to CUDA graphs by @awni in https://github.com/ml-explore/mlx/pull/2317
- [CUDA] Fix graphs for older cuda by @awni in https://github.com/ml-explore/mlx/pull/2328
- [CUDA] Add MLX_CUDA_GRAPH_CACHE_SIZE env for setting graph cache size by @zcbenz in https://github.com/ml-explore/mlx/pull/2329
- Fix layernorm race condition by @angeloskath in https://github.com/ml-explore/mlx/pull/2340
- Build with all cpu cores by default by @zcbenz in https://github.com/ml-explore/mlx/pull/2336
- [CUDA] Do vectorized store/load in binary ops by @zcbenz in https://github.com/ml-explore/mlx/pull/2330
- Auto build linux release by @awni in https://github.com/ml-explore/mlx/pull/2341
- MoE backward improvements by @angeloskath in https://github.com/ml-explore/mlx/pull/2335
- Fix compilation with CUDA 11 by @zcbenz in https://github.com/ml-explore/mlx/pull/2331
- patch bump by @awni in https://github.com/ml-explore/mlx/pull/2343
- Align mlx::core::max op nan propagation with NumPy by @jhavukainen in https://github.com/ml-explore/mlx/pull/2339
- Add zero for argsort vjp by @awni in https://github.com/ml-explore/mlx/pull/2345
- [CUDA] Do vectorized store/load in contiguous elementwise ops by @zcbenz in https://github.com/ml-explore/mlx/pull/2342
- Align mlx::core::min op nan propagation with NumPy by @jhavukainen in https://github.com/ml-explore/mlx/pull/2346
- [CUDA] Set current device before cudaGraphLaunch by @zcbenz in https://github.com/ml-explore/mlx/pull/2351
- [CUDA] Put version in ptx cache dir path by @zcbenz in https://github.com/ml-explore/mlx/pull/2352
- Fix type promotion in Adam with bias correction by @angeloskath in https://github.com/ml-explore/mlx/pull/2350
- Fix edge check in QuantizedBlockLoader for qmm_n by @angeloskath in https://github.com/ml-explore/mlx/pull/2355
- [CUDA] Implement Scan kernel by @zcbenz in https://github.com/ml-explore/mlx/pull/2347
- [Metal] fix copy dispatch by @awni in https://github.com/ml-explore/mlx/pull/2360
- [CUDA] Bundle CCCL for JIT compilation by @zcbenz in https://github.com/ml-explore/mlx/pull/2357
- [CUDA] Do not put kernels in anonymous namespace by @zcbenz in https://github.com/ml-explore/mlx/pull/2362
- Fix imag() vjp by @angeloskath in https://github.com/ml-explore/mlx/pull/2367
- Add Primitive::name and remove Primitive::print by @zcbenz in https://github.com/ml-explore/mlx/pull/2365
- update linux build by @awni in https://github.com/ml-explore/mlx/pull/2370
- [CUDA] Affine quantize by @awni in https://github.com/ml-explore/mlx/pull/2354
- Fix flaky linux test by @awni in https://github.com/ml-explore/mlx/pull/2371
- Install linux with mlx[cuda] and mlx[cpu] by @awni in https://github.com/ml-explore/mlx/pull/2356
- [CUDA] Use cuda::std::complex in place of cuComplex by @zcbenz in https://github.com/ml-explore/mlx/pull/2372
- lower memory uniform sampling by @awni in https://github.com/ml-explore/mlx/pull/2361
- [CUDA] Fix complex reduce + nan propagation in min and max by @awni in https://github.com/ml-explore/mlx/pull/2377
- Rename the copy util in cpu/copy.h to copy_cpu by @zcbenz in https://github.com/ml-explore/mlx/pull/2378
- fix ring distributed test by @awni in https://github.com/ml-explore/mlx/pull/2380
- Test with CUDA 12.2 by @awni in https://github.com/ml-explore/mlx/pull/2375
- [CUDA] Add work per thread to compile by @angeloskath in https://github.com/ml-explore/mlx/pull/2368
- [CUDA] Fix resource leaks in matmul and graph by @awni in https://github.com/ml-explore/mlx/pull/2383
- [CUDA] Add more ways finding CCCL headers in JIT by @zcbenz in https://github.com/ml-explore/mlx/pull/2382
- Add contiguous_copy_gpu util for copying array by @zcbenz in https://github.com/ml-explore/mlx/pull/2379
- Adding support for the Muon Optimizer by @Goekdeniz-Guelmez in https://github.com/ml-explore/mlx/pull/1914
- Patch bump by @awni in https://github.com/ml-explore/mlx/pull/2386
- Fix release build + patch bump by @awni in https://github.com/ml-explore/mlx/pull/2387
- Fix cuda manylinux version to match others by @awni in https://github.com/ml-explore/mlx/pull/2388
- [CUDA] speedup handling scalars by @awni in https://github.com/ml-explore/mlx/pull/2389
- Remove thrust iterators by @zcbenz in https://github.com/ml-explore/mlx/pull/2396
- Add contiguous_copy_cpu util for copying array by @zcbenz in https://github.com/ml-explore/mlx/pull/2397
- Fix including stubs in wheel by @awni in https://github.com/ml-explore/mlx/pull/2398
- use size option in binary by @awni in https://github.com/ml-explore/mlx/pull/2399
- [CUDA] Simplify allocator by @awni in https://github.com/ml-explore/mlx/pull/2392
- Add cuda gemv by @awni in https://github.com/ml-explore/mlx/pull/2400
- Fix an error in the comment for mx.dequantize by @csukuangfj in https://github.com/ml-explore/mlx/pull/2409
- Remove unused code in Convolution::vjp by @zcbenz in https://github.com/ml-explore/mlx/pull/2408
- [CUDA] --compress-mode requires CUDA 12.8 by @zcbenz in https://github.com/ml-explore/mlx/pull/2407
- full row mask in sdpa consistently gives nan by @awni in https://github.com/ml-explore/mlx/pull/2406
- Fix uv install and add dev release by @awni in https://github.com/ml-explore/mlx/pull/2411
- [Metal] Release metal events by @awni in https://github.com/ml-explore/mlx/pull/2412
- Test on cuda 12.2 and 12.9 by @awni in https://github.com/ml-explore/mlx/pull/2413
- [CUDA] Initial implementation of Convolution with cuDNN by @zcbenz in https://github.com/ml-explore/mlx/pull/2385
- [DOCS]: Fix eps placement in Adam and AdamW by @Skonor in https://github.com/ml-explore/mlx/pull/2416
- [CUDA] Always use batched matmul by @awni in https://github.com/ml-explore/mlx/pull/2404
- Fix qvm splitk by @awni in https://github.com/ml-explore/mlx/pull/2415
- Update install docs and requirements by @awni in https://github.com/ml-explore/mlx/pull/2419
- version by @awni in https://github.com/ml-explore/mlx/pull/2420
## New Contributors
- @emmanuel-ferdman made their first contribution in https://github.com/ml-explore/mlx/pull/2250
- @FL33TW00D made their first contribution in https://github.com/ml-explore/mlx/pull/2264
- @jhavukainen made their first contribution in https://github.com/ml-explore/mlx/pull/2339
- @Goekdeniz-Guelmez made their first contribution in https://github.com/ml-explore/mlx/pull/1914
- @Skonor made their first contribution in https://github.com/ml-explore/mlx/pull/2416
**Full Changelog**: https://github.com/ml-explore/mlx/compare/v0.26.0...v0.27.0