
Introduction

The TVM community has worked since the v0.15.0 release to deliver the following exciting improvements! Highlights of this release:

  • First support of Relax, with dynamic shape and pipeline
  • Dlight module for optimizing LLM TIR workloads on GPU
  • Disco module for initial SPMD multi-GPU support
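
As a brief illustration of the first highlight, the sketch below builds a Relax function whose batch dimension is symbolic. It is a minimal example assuming a standard TVM v0.16 build with Relax enabled; the module name, shapes, and pipeline choices are illustrative, not taken from the release notes.

```python
# Minimal sketch: a Relax function with a dynamic batch dimension "n".
import tvm
from tvm import relax
from tvm.script import ir_module
from tvm.script import relax as R


@ir_module
class DynMatmul:
    @R.function
    def main(
        x: R.Tensor(("n", 784), "float32"),  # "n" is a symbolic dimension
        w: R.Tensor((784, 128), "float32"),
    ) -> R.Tensor(("n", 128), "float32"):
        with R.dataflow():
            y = R.matmul(x, w)
            R.output(y)
        return y


mod = relax.transform.LegalizeOps()(DynMatmul)  # lower Relax ops to TIR
ex = relax.build(mod, target="llvm")
vm = relax.VirtualMachine(ex, tvm.cpu())  # the built module accepts any n
```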

The main tags are below (bold text marks areas with lots of progress):

  • Community, RFCs
  • Adreno, ArmComputeLibrary, Metal, cuda & cutlass & tensorrt, microNPU, Runtime
  • Relax, Dlight, Disco
  • Arith, TIR, TVMScript
  • Docs, CI, Misc, BugFix

Please visit the full listing of commits for a complete view: v0.16.dev0...v0.16.0.rc0.

Community

  • #16695 - Add new key for release signing
  • #16419 - Add new key for release signing

RFCs

This new RFC explores how TVM can be used to generate code for the Scalable Matrix Extension (SME) ISA, improving inference performance on supported Arm®-based hardware that implements the extension.

  • #107 - [RFC] Scalable Matrix Extension enablement
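
For illustration only, an SME-capable target might be declared as in the sketch below; the exact `-mattr` flags and the `has_sme` feature key are assumptions made for the example, not details fixed by the RFC.

```python
# Hypothetical sketch: declaring an AArch64 target with SME enabled and
# querying the features recovered by TVM's target parser.
import tvm

target = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu -mattr=+v9.2a,+sme")
print(target.features.has_sme)  # assumed feature key; True if SME was parsed
```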

Arith

  • #16735 - [Fixup] Require feature flag for tighter inequality bounds
  • #16588 - Provide tighter ConstIntBounds for special cases
  • #16704 - [Fix] Fix canonical simplification of LE

BYOC

  • #16567 - Skip processed functions in FuseOpsByPattern and RunCodegen

BugFix

  • #16766 - [Target] Added null check to fix segfault at ->defined() in cpu.cc DetectSystemTriple()
  • #16739 - [Ansor] Fixing Ansor Gradient Bug
  • #16820 - [Fix] PAPI docs
  • #16793 - [Fix] fix for numpy 2.0 compatibility
  • #16790 - [Fix] Fix build errors with VS2022
  • #16780 - [Fix] Fix numpy dtype map
  • #16773 - [Fix] Fix the purity flag of "vm.call_tir_dyn" and "kill" ops
  • #16770 - [Hotfix] Revert driver API pass ordering that breaks MLC, mark failing test
  • #16771 - [Fix] Remove redundant "remove_all_unused" in IPC memory lowering
  • #16746 - [Fix][Builtin] Fix "GetQueryPosition" of PagedKVCache
  • #16728 - [Fix] Introduce TVM_DEBUG_WITH_ABI_CHANGE to warn about ABI changes in debug mode
  • #16714 - [Fix] PagedKVCache fetching compute stream when copy stream is needed
  • #16684 - [SLM] Produce well-formed Relax for nn.modules.KVCache
  • #16659 - add the default value for DFT in ONNX frontend
  • #16637 - [Transform] Preserve symbolic variables in FuseOps
  • #16649 - [FFI] Add a missing default for datatype lanes
  • #16492 - [Executor] fix debug_executor function debug_get_output
  • #16598 - [Transform] Handle non-composite lambda functions in FuseOps
  • #16565 - [Transform] Keep private non-primitive functions in FuseTIR
  • #16518 - Use x*x*x instead of pow(x,3)
  • #16436 - Ensure that bf16 arrays are created as expected
  • #16361 - Disable SingleEnvThreadVerifier
  • #16289 - [AUTOTVM][FIX] Typo fixes and add a warning in the Droplet Search

CI

  • #16837 - Disable flaky unit test
  • #16765 - [AOT][Testing] Improve output mismatch information on test failure
  • #16661 - add merge_with_main in unity
  • #16611 - [AOT][Testing] Print output values on test failure
  • #16546 - Disable testing that downloads from mxnet
  • #16521 - Fix CI Script and Broken Tests
  • #16502 - Support tvm-bot rerun for tvm-unity task
  • #16435 - Update image tag to 20240126-070121-8ade9c30e
  • #16420 - [WASM] Update emsdk and nodejs version
  • #16384 - Remove NVIDIA_DISABLE_REQUIRE
  • #16382 - In jenkins.cmd_utils.Sh.tee, check for failing subprocess
  • #16366 - Upgrade sccache version to 0.7.*
  • #16369 - Upgrade Unity ci images
  • #16344 - Update docker images tag to 20240105-165030-51bdaec6
  • #16340 - [Unity][UnitTest] Increase atol to resolve flaky CI failure
  • #16337 - [Hexagon][UnitTest] Disable flaky quantization test
  • #16336 - Upgrade cmake version to 3.24.0

Docker

  • #16755 - [SME] Add Fixed Virtual Platform (FVP) and toolchain install
  • #16348 - Upgrade pip in i386 container

Disco

  • #16618 - [Disco] Propagate structlog configuration to disco workers
  • #16639 - [Disco] Expose functions to query the per-worker device/rank
  • #16617 - [Disco] Implement Session.import_python_module method
  • #16715 - [Disco] Propagate structlog/logging config to workers
  • #16845 - [Debug][Disco] Check if a PackedFunc exists before calling it
  • #16817 - [Disco] Reduce Process/ThreadSession message queue reads and writes
  • #16807 - [Disco] Support setting workers' CPU affinity
  • #16375 - [Unity] Fix creation of disco ProcessSession
  • #16821 - [Fix] Add TVM_DLL to Disco session
  • #16752 - [Fix] Lazy import of "psutil" in disco process pool

Dlight

  • #16775 - [Fix][Dlight] (Low-batched-)GeMV on small spatial loops
  • #16429 - [Unity][Dlight][Fix] Reduction rule support dyn-shape epilogue
  • #16351 - [Unity] Add dlight.gpu.Fallback in DispatchSortScan, add argsort, topk, and cumprod
  • #16338 - [Unity][DLight] Introduce Specific Rule for RMSNorm
  • #16251 - [Unity][Dlight] Support dlight gemv rule on nested inner block
  • #16878 - [Dlight] Enhance vectorization loading weight for gemv
  • #16848 - [DLight] Fix a corner case for reduction rule
  • #16701 - [Dlight] Add fallback for low batch gemv with outer reduction
  • #16678 - [Dlight] LowBatchGemv rule only apply to function with spatial symbolic var
  • #16665 - [Dlight] Skip GeMV when normalization fails
  • #16579 - [Dlight] Scheduling Low batch GEMM using GEMV-like rule
  • #16321 - [DLight] Skip rule if target is not suitable
  • #16731 - [Dlight] Fix GeMV shared memory estimation

Docs

  • #16792 - [Doc] Fix set_axis_separator example
  • #16610 - [Doc] Fixed Docstring usage example in tvm.ir.make_node
  • #16572 - [Doc] Remove MxNet related tutorials
  • #16514 - [Unity][Doc] Document passes that depend on DataflowBlocks and encourage using ConvertToDataflow
  • #16482 - [Doc] Fix Docstring in extern.py for Sphinx
  • #16346 - [Doc] Fix minor error in "Expressions in Relay"

Frontend

  • #16001 - [ONNX] Fix interpreting auto_pad parameters in ConvTranspose operator
  • #16651 - [PaddlePaddle] PaddlePaddle model with NCHW data format that supports quantization
  • #16616 - [PaddlePaddle] Support conv2d when data_format is NHWC
  • #16526 - [Keras] Enable Dense operator for any input dims
  • #16478 - [PaddlePaddle] Fixed the bug that prevented the model from being successfully converted to microTVM on MacOS

Hexagon

  • #16762 - [VM] Cache operations when bypass mode is enabled
  • #16706 - [VM] Add buffers to dma_wait builtin
  • #16448 - [VM] Implement dma_copy and dma_wait builtin for hexagon

LLVM

  • #16782 - [SVE] Support scalable vectors in LoopVectorizer
  • #16812 - Fix compilation failure due to minor change
  • #16808 - [Runtime] Fix errors during loading of target tags
  • #16748 - Lack of DWARF type is not an error
  • #16696 - [SVE] Add codegen support for scalable buffer accesses
  • #15964 - [RUNTIME] Add optional LLVM ORCJIT runtime executor
  • #16612 - [SVE] Add support for scalable data type strings
  • #16523 - [SVE] Change the dtype of Ramp and Broadcast lanes to PrimExpr
  • #16484 - [SVE] Add vscale builtin
  • #16373 - Update Host.h path

MetaSchedule

  • #16725 - Make the opt_level of tune_relay() adjustable

Metal

  • #16713 - [RUNTIME] Provide richer runtime error information when errors happen
  • #16605 - [RUNTIME] Fix multithreading access of metal runtime
  • #16438 - Dispatch numerically stable tanh for metal

OpenCL & CLML

  • #16854 - [OpenCL] Add OpenCL device for automatic target detection
  • #16846 - [Meta-Schedule][OpenCL] Enable MS tuning for Android OpenCL
  • #16768 - [RUNTIME][OPENCL] Bugfix for clImage create with host ptr
  • #16672 - [CLML] Fix build TVM with CLML on MacOS
  • #16328 - [RUNTIME][CLML] Fix for Softmax op for 4D tensors
  • #16394 - [OpenCL][CMake] Fix OpenCL tests compilation

ROCm

  • #16441 - [WebGPU] Intrin Dispatch: tanh, erf, log
  • #16404 - Some fixes of ROCm codegen

Relax

  • #16872 - Enhance symbolic expr estimation in memory planning
  • #16867 - Dispatch sort/scan for non-cuda gpu backends
  • #16852 - Fix EliminateCommonSubexpr removing alloc tensor
  • #16851 - [Relax,Topi] Allow passing workspace to thrust to avoid allocations
  • #16841 - Provide well-formed output in transform.LazyGetInput
  • #16798 - [Transform] Provide callback versions of LazyTransformParams
  • #16801 - Allow DeadCodeElimination within ApplyPassToFunction
  • #16834 - Capture symbolic vars in struct info of weights
  • #16830 - Share storage allocs among functions after cuda graph rewriting
  • #16823 - [VM] Refactor CUDA graph builtins as VM extension
  • #16828 - [Bugfix] Provide the full Expr to pattern-match rewriter
  • #16805 - [Bugfix] BlockBuilder may not assume unique input functions
  • #16815 - Enable capturing symbolic shapes in cuda graph
  • #16642 - Allow R.Prim('bool') in relax::If and assert_op
  • #16796 - Unit-test for structural equal of recursive function
  • #16732 - Allow composition of DFPattern replacements
  • #16783 - Improve CanonicalizeBindings in DataflowVar edge case
  • #16721 - Implement operators to inspect DLTensor::strides and offset
  • #16730 - Refactor PatternRewriter into separate Block/Expr mutators
  • #16756 - [IR] Improve highlighting in assert_structural_equal
  • #16779 - Improve error message for malformed IR
  • #16569 - [Unity][Parser] Check well-formedness in the parser
  • #16759 - [Pass] Lowering passes for GPU IPC memory and allreduce
  • #16697 - Implement relax.transform.TopologicalSort
  • #16658 - Normalize use of void-type variable to inline R.tuple()
  • #16711 - [Frontend] Add op tanh, exp, negative, and permute
  • #16703 - [Fix] Fix top-p/top-k sampling kernel
  • #16669 - [Frontend][Onnx] add sum and globalavgpool 1d/3d op
  • #16691 - CUDA graph rewrite treating StringImm as static
  • #16685 - Implement StructInfoPattern for dataflow pattern matching
  • #16681 - [Frontend][Onnx] support MaxPool1/2/3D and AveragePool1/2/3D
  • #16584 - [Unity][TIR] Clear struct info when specializing PrimFunc
  • #16676 - Remove the legalization of cumsum/cumprod
  • #16654 - [Frontend][NN] Add support for Conv3D
  • #16674 - Eager free original weights in transform_params
  • #16675 - add sample_indices in sampling
  • #16648 - [Runtime] Support Unpack API for NDArrayCache
  • #16591 - [Unity][Transform] Handle dynamic shapes in CombineParallelMatmul
  • #16594 - [Transform] Preserve param names in LiftTransformParams
  • #16575 - [Unity] GPU sampling
  • #16574 - Additional unit tests for RemoveUnusedParameters
  • #16585 - [Unity][Analysis] Include impure call in VerifyWellFormed errors
  • #16421 - [Unity][Transform] Raise error in FuseOpsByPattern for SSA violation
  • #16629 - Fix error message in BlockBuilder
  • #16592 - Handle dynamic arguments in legalization of nn.attention
  • #16590 - [Unity][Transform] Check for permute_dims in ExpandMatmulOfSum
  • #16604 - [Frontend][Onnx] Fix Clip/Unsqueeze opset implementation
  • #16568 - [Runtime] RNNState for State Space Models
  • #16563 - Implement operators to read runtime DLTensor* information
  • #16581 - [Unity][MSC][M4.2][Step2] Enable plugin with manager, test plugins in compile pipeline
  • #16600 - Expose name_hint field for BlockBuilder.match_cast
  • #16601 - [Transform] Canonicalize let var = R.const bindings
  • #16583 - [Unity][VM] Recursively visit match bindings in VMShapeLowerMutator
  • #16586 - Ignore non-relax functions in relax.transform.RunCodegen
  • #16573 - [VM] Re-implementation of callback functions
  • #16561 - [Bugfix] Remove call to tvm.build for empty TIR module
  • #16564 - [Unity] Check for symbolic vars in PrimValue when lowering to TIR
  • #16558 - Minor updates for NN frontend
  • #16542 - Support callback as argument
  • #16487 - [Unity][Transform] Handle call_tir_inplace in FuseTIR and FuseOps
  • #16355 - [Unity] Infer struct info for relax.op.split on dynamic-sized index
  • #16465 - [Redo][Unity] Split DecomposeOpsForTraining into two steps
  • #16495 - [Unity][MSC][M4.2][Step1] Enable plugin with manager, test plugins in compile pipeline
  • #16498 - [Frontend] "tensor_ir_inplace" op
  • #16500 - [Unity] Support storage reuse for dynamic shapes
  • #16493 - [Pass] Skip data type node for CSE pass
  • #16467 - [Unity][MSC][Refactor] Reconstruct BYOC and runner
  • #16422 - [Unity][CodeGen] RunCodegen based on externally-exposed functions
  • #16483 - [Unity][Frontend] Add Sigmoid and Square Op
  • #16472 - [Unity] Improved error message in tvm::relax::UpdateStructInfo
  • #16473 - [Unity] Improve error message in tensor_to_shape struct inference
  • #16466 - Memory planning for "partially dynamic" shapes
  • #16464 - NDArray Cache Update with DLTensor Support
  • #16315 - [Unity][Transform] Implement relax.transform.ReorderTakeAfterMatmul
  • #16313 - [Unity][Transform] Implement relax.transform.ExpandMatmulOfSum
  • #16411 - [Unity][Transform] Handle symbolic variables in LambdaLift
  • #16443 - [Unity][FIX] fix thread dtype mismatch
  • #16442 - Revert "[Unity] Split DecomposeOpsForTraining into two steps"
  • #16437 - [Unity] Improve buffer allocation for handling duplicated buffer names.
  • #16439 - [Unity] Support cumsum with pure int32
  • #16432 - [Unity] downgrade cmake version requirement
  • #16427 - [Unity][Frontend][NN] Better support for dynamic convolutions
  • #16418 - [Unity][Fix] Fix mismatched intrinsic name
  • #16129 - [Unity][Transform] Replace eligible operators with in-place versions in dataflow blocks
  • #16414 - [Bugfix][Unity] Recover MSVC/NVCC/ROCm/Vulkan
  • #15954 - [Unity] Split DecomposeOpsForTraining into two steps
  • #16111 - [Unity][Transform] Memory planning for dynamic-shape func return
  • #16396 - [Unity] PagedKVCache supporting on-the-fly RoPE calculation
  • #16395 - [Frontend][ONNX] Fix ONNX frontend parsing
  • #16385 - [Unity][Op] Add Conv3D Operator
  • #16284 - [Unity][nnModule] Dynamic shape support in nn Module
  • #16378 - [Unity][BlockBuilder] Restore bb.get()
  • #16374 - [Unity] Support TIR kernel for PagedKVCache
  • #16314 - [Unity][Transform] Implement relax.transform.AdjustMatmulOrder
  • #16349 - [Unity][MSC] Avoid depending on trivial bindings in Relax intermediate
  • #16376 - [Unity][Contrib] Fix a bug due to typo in vllm reconstruct_from_cache kernel and add test
  • #16388 - [Unity] Update dispatch test cases following the merge from main
  • #16335 - [Unity] Set CMAKE_CUDA_ARCHITECTURES default to native
  • #16306 - [Unity][Transform] Update LambdaLift to use name of lifted lambda
  • #16310 - [Unity][Analysis] Show objects instead of names in WellFormedChecker
  • #16362 - [Unity][Fix] Memory planning check value type of 'tir_var_upper_bound'
  • #16367 - [Unity][Transform] Handle replacement at both var binding and usage
  • #16309 - [Unity][Transform] Use parameter name in BundleModelParams
  • #16307 - [Unity] Improved error message in ExprMutator::ReEmitBinding
  • #16308 - [Unity] Improved error message for matmul shape mismatch
  • #16360 - [Unity] Enhance Torch-consistency in reshape
  • #16350 - [Unity][Contrib] Add vLLM paged attention kernel
  • #16303 - [Unity][NN] Use Linear name for nn.op.permute_dims
  • #16325 - [Unity][MSC][Legalize] legalize codes and mute logging
  • #16312 - [Unity][Analysis] Add utility for collecting compile-time bindings
  • #16330 - [Unity][WEBGPU] Enable wasm exception propagation
  • #16304 - [Unity][Analysis] Handle PrimStructInfo in EraseToWellDefined
  • #16305 - [Unity][Transform] Implement UpdateParamStructInfo
  • #16331 - [Unity] Alter op impl handling empty transform for output
  • #16254 - [Unity] Dispatch cumsum and sort
  • #16120 - [Unity][Transform] Extract partial-tuple-usage from FuseTIR
  • #16311 - [Unity] Validate struct info in relax::Call constructor
  • #16333 - [Unity] Fix nn.op.tensor_ir_op signature
  • #16302 - [Unity] Cutlass kernel compatibility with cmake 3.18+

Relay

  • #16622 - [ONNX] Fix the attribute mode parse of operator Upsample
  • #16626 - [ONNX] Fix the Resize operator in ONNX frontend
  • #16624 - [ONNX] fix the wrong default value about dtype in Multinomial converter
  • #16417 - [Frontend][Torch] Fix PyTorch frontend linspace op
  • #16400 - [Frontend][Torch] Fix PyTorch frontend not supporting logical or
  • #16390 - [Frontend][Torch] Fix a typo in nonzero_numpy
  • #16324 - make "ToScalar" support directly obtaining "int64_t"

Runtime

  • #16804 - Introduce MSCCLPP with NCCL equivalent interface
  • #16809 - Add "TVM_DLL" to NVTX header
  • #16750 - CUDA IPC Memory support and custom allreduce kernels
  • #16738 - [Refactor] Always specify device in allocator interface
  • #16716 - Ensure NDArray.CopyTo(Device) always sync
  • #16705 - Add TVM_DLL to memory manager functions
  • #16692 - PagedKVCache execute data copy on a separate stream
  • #16647 - [RPC] Fix FreeObject in minrpc server
  • #16667 - [Builtin] Using float32 accumulation in attention kernel
  • #16635 - [RPC] Enable RPCObjectRef over multi-hop RPC
  • #16630 - Add TVM_DLL to threading backend funcs
  • #16541 - Add "TVM_DLL" to NDArray cache load func
  • #16550 - [ROCM] Properly align rocm parameter buffer
  • #16545 - Fix dtype conversion for bf16 and fp8
  • #16508 - ParallelFor skipping thread backend for unit extent
  • #16486 - KV cache providing workspace for attn kernel
  • #16456 - [KVCache] AttentionWithFusedQKV and RoPE mode
  • #16415 - [Memory] Implement support for non-zero offset within a storage object in AllocNDArr…
  • #16387 - [RPC] Enable RPCObjectRef return in RPC
  • #16377 - Use cudaGetDeviceCount to check if device exists

TIR

  • #16832 - Use constructor for new PrimFunc in TransformLayout
  • #16543 - Fix segfaults from ordering of Let/Assert in MakePackedAPI
  • #16795 - Ramp and Broadcast lanes fixed to int32 dtype
  • #16767 - [Driver] Use BindTarget to specify target for FP8 legalization
  • #16742 - [Bugfix] Fix cache_read update buffer region
  • #16726 - [Bugfix] Avoid overwrite of unmanaged buffer allocations
  • #16548 - [CUDA] Add native FP8 support to codegen
  • #16723 - Implement max/min_value for fp8 data types
  • #16655 - Improve well-formed check's handling of match buffer
  • #16673 - Support Vector Reinterpret Calls
  • #16682 - [Bugfix] Handle AttrStmt of upcoming tir.Var in ConvertSSA
  • #16560 - Enhance and fix tensorize schedule for some cases
  • #16660 - [Bugfix] Fix duplicate AllocateConst in CacheReadWrite schedule primitive
  • #16544 - Expand debug symbol output for CodeGenLLVM
  • #16553 - Fix get_block_access_region for let bindings
  • #16515 - Require exactly same-dtype matching for Vulkan smem reuse
  • #16406 - Fix of inter thread reduction with shared memory prefetch
  • #16293 - Extend DP4A tensor intrin
  • #16345 - Allow sync threads inside condition
  • #16250 - In SplitHostDevice, check for variables in thread extents
  • #16184 - [Transform] Implement InlinePrivateFunctions

TOPI

  • #16652 - improve inclusive_scan for thrust
  • #16383 - [Target] Add fp16 SIMD support for conv2d on arm_cpu targets

TVMC

  • #16261 - Add tvmc flag to print ir before and print ir after named pass

TVMScript

  • #16864 - Add parser and printer support for e4m3/e5m2 fp8
  • #16844 - Produce empty DictAttrs when R.func_attrs is absent
  • #16811 - Do not throw error for duplicate definitions
  • #16641 - Allow use of relax.Expr with void type as a statement
  • #16663 - Infer T.reads() for DeclBuffer nodes
  • #16640 - Represent tir::builtin::ret() using python "return"
  • #16562 - [Bugfix] Handle R.match_cast as last binding in if/else
  • #16593 - [Unity] Parse R.Object return type from call_pure_packed
  • #16356 - [Unity] Optionally hide StructInfo that can be inferred
  • #16379 - [Unity] Update call_packed semantics to support empty sinfo_args

Vulkan

  • #16858 - Fix CLZ support for Vulkan

cuda & cutlass & tensorrt

  • #16865 - [Codegen, CUDA] Add handling of fp8 broadcast / const
  • #16818 - [Cutlass] Fix usage of cuda stream for group gemm
  • #16788 - [Cutlass] Add check for group gemm param shapes
  • #16789 - [Bugfix][Cutlass] Remove a typo in cutlass build
  • #16787 - [Codegen, Cuda] Add overload for fp8x4 e5m2 <-> half4 conversion
  • #16751 - [Cutlass] Add group gemm kernels
  • #16736 - [Target][CUDA] Allow non-numeric arch as needed for latest gpu
  • #16619 - [Bugfix][Cutlass] Check if function attributes is None
  • #16342 - [CUDA] Simple extend to optimize reuse for static shared memory.

microNPU

  • #16266 - [microNPU][ETHOSU] Add fixed point for tanh
  • #16680 - [microNPU][ETHOSU] Fix LUT size for int16 activations
  • #16401 - [microNPU][ETHOSU] Add fixed point for matmul

web

  • #16733 - Support web IndexedDB cache for larger model storage
  • #16810 - Support building tvm/web on Windows
  • #16825 - Allow custom bc files when building with emcc
  • #16791 - Add kv_state and rnn_state to wasm_runtime
  • #16722 - Implement linear congruential generator, make runtime seedable
  • #16650 - Separate parallel shard download and iterative shard loading
  • #16694 - Initial support for asyncify
  • #16631 - Fix NDArrayCache loading report callback
  • #16525 - Move ArtifactCache to Interface, Support Cache delete and Batch Delete, Remove typo
  • #16554 - Compatibility with PagedKVCache in WebGPU
  • #16527 - Revert "[Unity]Temp disable wasm exception (#16444)"
  • #16504 - [Relax] Add ApplyPresenceAndFrequencyPenalty
  • #16485 - [wasm] Enlarge initial memory for emcc
  • #16444 - [Unity]Temp disable wasm exception

Misc

  • #16873 - [Thrust] Fix thrust workspace allocation
  • #16868 - [3rdparty] Bump flashinfer
  • #16871 - [PageKV] allow PopN to pop all the tokens in last block
  • #16866 - [3rdparty] Bump FlashInfer
  • #16863 - [Picojson] Let the keys of objects in JSON be ordered by default
  • #16856 - [Thrust] Use pointer to tls pool to prevent creating new pool
  • #16850 - Fixing probability comment
  • #16849 - [KVCache] Initialize one extra page than specified
  • #16843 - [IR] Provide well-formed intermediate in ApplyPassToFunction
  • #16772 - [MSC][M5.3] Support torch.dynamo for dynamic models
  • #16839 - Bump pillow from 10.2.0 to 10.3.0 in /apps/microtvm/cmsisnn
  • #16838 - Bump pillow from 10.2.0 to 10.3.0 in /apps/microtvm/ethosu
  • #16831 - [KVCache] Reducing CacheAuxDataManager copy size
  • #16794 - [SME] Target parser support for SME
  • #16824 - [KVCache] Introducing auxiliary data manager
  • #16800 - [BugTIR] Fix error merging shared memory for ptx_cp_async
  • #16822 - [VM] Recycle VMFrame
  • #16813 - [KVCache] Support forking sequence at specific position
  • #16786 - [Codegen] Add check to disable invalid reinterpret
  • #16816 - [Cmake] Allow using custom CCCL path for thrust
  • #16784 - [SLM] Add unit tests for SLM to Relax exporter
  • #16814 - Fix includes of custom allreduce kernel
  • #16806 - [Debug] Improve error message in VMShapeLower
  • #16802 - [Debug] Improve error messages in LiftTransformParams
  • #16425 - [Target] Use LLVM target parser for determining Arm(R) A-Profile Architecture features
  • #16797 - [3rdparty] AUTO mode for custom all-reduce strategy
  • #16761 - [SME] Add support for inserting processor state annotations
  • #16778 - [Analysis] Allow calls to GlobalVar in @R.function
  • #16745 - [IR] Default to empty attributes, instead of NULL
  • #16777 - Revert "[SLM] Allow modules to define pre-processing of weights"
  • #16776 - [Contrib] Remove thrust "built but not used" warning
  • #16757 - [SLM] Allow modules to define pre-processing of weights
  • #16763 - [CONTRIB] Add nm symbol dump
  • #16717 - Enable Shared Function in LiftTransformParam Pass
  • #16729 - [Builtin] Sliding window and sink support for PagedKVCache
  • #16724 - Fix cpp_rtvm cmake build on Windows
  • #16513 - [Target] Automatically detect system triple when not specified by the user
  • #16710 - [CMake] Add "USE_FLASHINFER" to libinfo
  • #16702 - [MSC][M5.2] Enable quantize && prune with gym by wrapper
  • #16699 - [Transform] Remove R.Object parameters after LazyTransformParams
  • #16668 - [MSC][M5.1] Build wrapper to support compression
  • #16693 - [Contrib] Support NDArray cache taking generator
  • #16412 - [Lint] Add check to prevent usage of #include <regex>
  • #16689 - [DeviceAPI] Support "GetCurrentStream"
  • #16690 - Use target name instead of node name as function name
  • #16683 - [skip ci] Fix wasm exception flag
  • #16609 - Minor update to docs instructions
  • #16656 - Simplify Windows CMake Command
  • #16666 - [KVCache] Fix the reference counter in sequence fork
  • #16662 - Fixing workload comment
  • #16595 - [Transform] Check for zero-param operators in LiftTransformParams
  • #16599 - [Transform] De-duplicate MatchCast nodes in EliminateCommonSubexpr
  • #16596 - [Transform] Implement relax.transform.ReorderPermuteDimsAfterConcat
  • #16597 - [Transform] Allow explicit name of bundled model parameters
  • #16602 - [Transform] Improvements to LazyTransformParams
  • #16606 - [KVCache] Support passing in attn_score_scaling_factor into KV cache
  • #16608 - Extend gpu memory bandwidth test to work through RPC
  • #16587 - [Debug] Improve error message for codegen pattern mismatches
  • #16570 - [Marvell BYOC]: Marvell AI Accelerator Integration - Phase 1
  • #16576 - Update the 3rdparty/libflash_attn submodule
  • #16580 - [KVCache] Support mode "None" for Rotary Embedding
  • #16578 - [KVCache] Support returning query positions
  • #16571 - Fix compile warnings
  • #16540 - [Upd] Enable lld search to include /opt/rocm/llvm/bin for rocm
  • #16539 - Improve error message in NDArray::CopyFromTo
  • #16524 - [Build] Improving debug and build-dir options
  • #16551 - [KVCache] Fix attention kernel for ROCm
  • #16512 - Cut pytest-lazy-fixture
  • #16506 - Bump 3rdparty/cutlass_fpA_intB_gemm version
  • #16511 - [Minor] Fix Clang compilation warning in fuse_tir.cc and codegen_c_host.cc
  • #16516 - Add Relax, Unity Tags in make_notes.py
  • #16497 - [Instrument] Add default instrument to print all passes
  • #16494 - [DPL] Support tir_vars field in is_call_tir pattern
  • #16453 - Bump pillow from 10.0.1 to 10.2.0 in /apps/microtvm
  • #16454 - [BugTIR] Fix thread_sync occurring in LetStmt
  • #16468 - [LINT] Fix pylint issues in test_dma_builtin.py
  • #16413 - [Contrib] Workspace for cuBLAS backend
  • #16460 - [Cherry-pick][MSC][M4.1] Add plugin && plugin_builder, enable build and test in different frameworks (#16397)
  • #16461 - [Minor] Fix Docstring for sphinx-build
  • #16431 - [Schedule] Loop-Partition Scheduling Primitive
  • #16451 - Bump pillow from 10.0.1 to 10.2.0 in /apps/microtvm/ethosu
  • #16452 - Bump pillow from 10.0.1 to 10.2.0 in /apps/microtvm/cmsisnn
  • #16445 - [skip ci] update branch rule to prepare for unity transition
  • #16426 - [CMake] Enable cuda lang if USE_CUDA is on
  • #16407 - Add NVIDIA Hopper H100 target tag
  • #16398 - [DeviceAPI] Support querying total global memory
  • #16357 - [RPC] Fix tuning on macOS and Windows (#15771)
  • #16386 - [Thrust] Use no sync exec policy and caching allocator
  • #16343 - [CMake][MSVC] Disable permissive mode for MSVC builds
  • #16242 - [Codegen] Fix if_then_else codegen
  • #16341 - [CMake] Use ccache as CMAKE_CUDA_COMPILER_LAUNCHER
  • #16332 - Change metal dtype of ceil_log2 to fp32