Apache TVM v0.14.0 Release Notes

Introduction

The TVM community has worked since the v0.13.0 release to deliver the following exciting improvements! The main tags are listed below (bold text indicates areas with significant progress):

  • Community, RFC
  • Arith, MetaSchedule
  • Adreno, ArmComputeLibrary, Hexagon, Metal, OpenCL & CLML, ROCm, Vulkan, cuda & cutlass & tensorrt, microNPU, web
  • Runtime, TVMC, AOT, LLVM, microTVM, CMSIS-NN
  • Frontend, Relay, BYOC
  • TOPI, TIR, TVMScript
  • Docs, CI, Docker
  • Misc, BugFix

Please visit the full listing of commits for a complete view: v0.13.0...v0.14.0.

Community

  • #15307 - Qingchao Shen -> Reviewer
  • #15619 - community strategy decision process

RFC


AOT

  • #15301 - Avoid call_extern() with incorrect argument count
  • #15181 - Remove workaround to help resolve test flakiness

Adreno

  • #15830 - Minor changes for Adreno docs and help scripts
  • #15671 - [VM]Fix using buffers for weights in VM
  • #15391 - Small fixes in Adreno schedules

Arith

  • #15881 - Simplify the result of non-divisible floordiv
  • #15665 - Fix detect non-divisible iteration form like (x % 255) // 16 (see the sketch after this list)
  • #15638 - MLIR PresburgerSet compile fix mlir >= 160
  • #15628 - Added simplification rule for multiple equality compares
  • #15558 - Fix detect linear equation with uint var
  • #14690 - Add tvm::arith::PresburgerSetNode to work with Presburger Set in MLIR
  • #15555 - Fix handling of overlapping predicates
  • #15471 - Enhance Canonical Simplify for LE
  • #15228 - Enhance buffer shape bound deduction to include offset
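For a concrete feel of what these rewrites cover, here is a minimal sketch using the public `tvm.arith.Analyzer` API on the `(x % 255) // 16` form mentioned in #15665; the variable name and bound are illustrative only.

```python
# Minimal sketch: exercising the non-divisible floordiv/floormod simplification
# on the (x % 255) // 16 form from #15665. Names and bounds are illustrative.
import tvm
from tvm import tir

x = tir.Var("x", "int32")
ana = tvm.arith.Analyzer()
# Tell the analyzer the range of x so it can reason about the iteration form.
ana.bind(x, tvm.ir.Range.from_min_extent(0, 4080))
print(ana.simplify(tir.floordiv(tir.floormod(x, 255), 16)))
```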

ArmComputeLibrary

  • #15600 - [ACL] Update Compute Library to v23.05.1
  • #15344 - [ACL] Update Compute Library to v23.05

BugFix

  • #15891 - [Relay]fix axis parsing of repeat converter in the MXNet frontend
  • #15873 - [Fix] Remove duplicated words from comments, NFC
  • #15868 - [Relay]Fix conv transpose with default strides in ONNX frontend
  • #15773 - [CPP] Fix cpp deploy bug
  • #15778 - [Hotfix] Fix Windows Pipe
  • #15748 - Move symbols that are relevant to the runtime from libtvm to…
  • #15752 - [Relay]fix the wrong calculate logic of operator flip in PyTorch frontend
  • #15715 - [Relay]Fix the wrong implementation about operator Threshold in oneflow
  • #15711 - [Strategy] Fix arm_cpu int8 conv2d strategy for dotprod and i8mm targets
  • #15717 - [Relay]fix the wrong implementation of Softplus in OneFlow
  • #15677 - [Arith] IterMapRewriter abort rewriting once failure
  • #15629 - [VTA] tvm.tir.Call has no name attribute
  • #15584 - [Relay][Strategy] Enable compile time transformation of weights matrix for arm_cpu NHWC quantized conv2d
  • #15542 - [Fix] Fix the typo in compile flag
  • #15484 - [TOPI] Fix a bug in arm_cpu int8 conv2d i8mm schedule
  • #15473 - [Relay] Fix some bugs of dominator pattern
  • #15478 - [TIR] ThreadSync with shared.dyn awareness
  • #15406 - [TIR]Ensure the Var's scope is correct
  • #15399 - [TIR] Fix multi-grouped multi-warp allreduce
  • #15350 - [Relay] fix a bug of printing dataflow pattern
  • #15385 - Work around "Internal Compiler Error" in MSVC
  • #15294 - [Bug][Relay] fix relay frontend pytorch op addmm bug
  • #15323 - [Fix][TIR] LowerThreadAllreduce with correct thread mask
  • #15291 - [Relay][GraphExecutor] Fix set_input_zero_copy() precision bug
  • #15225 - Fix function to read all file

CI

  • #15903 - [Target]Add LLVM functions for current system info
  • #15897 - [ADRENO] Few updates to Adreno docker setup
  • #15836 - Update ci-gpu image
  • #15668 - Allow Limit CPUs in Docker
  • #15568 - [Testing] Allow Capitalized name in CompareBeforeAfter
  • #15519 - [TEST] Run tests/python/relay/aot tests in ci-cortexm
  • #15485 - Remove cython version pin
  • #15421 - Bump Flax and Jaxlib versions to fix Jaxlib install error
  • #15226 - Add ml_dtypes dependency for all docker images
  • #15353 - Pin cython version to fix cython compilation
  • #15352 - Make Graviton3 default AArch64 job runner node
  • #15339 - Update test to include unique attribute
  • #15277 - [Testing] Return BenchmarkResult in local_run and rpc_run
  • #15268 - [Testing] Add tvm.testing.local_run
  • #15136 - [UnitTest][NVPTX] Avoid cascading failures from CUDA postproc

CMSIS-NN

  • #15747 - Move CMSIS_5 from SHA to release based upgrade
  • #15407 - Support for Softmax Int16 operator

Docker

  • #15799 - Add LLVM 17 to the LLVM install script
  • #15862 - Upgrade oneflow to v0.8.0
  • #15819 - Install oneflow from PyPi
  • #15310 - Update ci-cortexm docker image
  • #15293 - tensorflow_aarch64 package upgrade

Docs

  • #15619 - community strategy decision process
  • #15508 - Add v0.13.0 docs to site
  • #15213 - [#15157][Rust][Doc] Re-enable the Rust documentation build

Frontend

  • #15821 - [TFLite]Support quantized ELU
  • #15844 - [TFLite]Fix test failures caused by div-by-zero
  • #15798 - [TFLite]Support quantized Pow
  • #15829 - [Relay][Keras][Bugfix] fix the converters of GRU and SimpleRNN about the go_backwards attribute
  • #15838 - Fix unnecessary pylint errors
  • #15802 - [SkipCI][Hotfix][TFLite] Disable test of quantized floor mod
  • #15790 - [TFLite]Support quantized LESS_EQUAL
  • #15775 - [TFLite]Support quantized GREATER_EQUAL
  • #15769 - [TFLite]Support quantized NOT_EQUAL
  • #15768 - [TFLite]Support quantized div
  • #15746 - [TFLite]Support quantized LESS
  • #15733 - [TFLite]Support quantized floor_mod
  • #15724 - [TFLite]Support quantized floor_div
  • #15602 - [ONNX][BugFix] Support If body with free variable from graph input
  • #15472 - [Relay][TFLite] Fix in qnn.conv2d when parameter groups not equal to 1
  • #15117 - [TFLITE] Add support for TFLite's regular NMS operator
  • #15415 - [ONNX] add onnx Mish operator
  • #15422 - [Keras] Add support for swish activation
  • #15370 - [Relay][Pytorch] Add aten::view_as
  • #15335 - [Bugfix][Keras] Add a check to reject the invalid input shape
  • #15334 - [Bugfix][Relay][Keras] Add an assertion to reject an invalid value for attribute units in RNN layers
  • #15337 - [Bugfix][Keras]Fix a corner case bug in softmax converter of keras frontend
  • #15259 - [TFLITE][BugFix] Fix variable typo in batchmatmul converting func
  • #15261 - [bugfix][keras] Fix go_backwards attribute of LSTM in keras frontend

Hexagon

  • #15788 - Properly handle RPC server shutdown
  • #15599 - F2qi avgpool bug fix
  • #15414 - Add default vtcm capacity for targets
  • #15367 - Simplify Mul->Sub->Conv to Conv->Add when possible
  • #15258 - Propagate QNN Concat Quantization Params to Inputs

LLVM

  • #15921 - Fix for llvm CodeGenOpt API change

MetaSchedule

  • #15792 - Allow generating uint random data
  • #15574 - Fix metaschedule flop estimation for non-integer loop dimensions
  • #15532 - Enable subprocess to stdout for DEBUG level
  • #15437 - Fix mma default rule and disable tuning abort
  • #15133 - [XGBoost,MetaSchedule] Support xgb set tree method

Metal

  • #15756 - [Unittest]Add minimal metal functionality test to CI
  • #15749 - [UnitTest]Parametrize allreduce GPU tests
  • #15401 - [Codegen]Support metal warp-level primitive

OpenCL & CLML

  • #15745 - [OpenCL] Don't initialize OpenCL runtime on host
  • #15400 - [VM][OpenCL] Introduce textures allocation to VM memory manager

ROCm

  • #15777 - [Codegen]Mismatched Dtype of Workgroup/Workitem
  • #15464 - fma intrin
  • #15454 - Fix some ROCm codegen bugs

Relay

  • #15889 - fix the conflicted documentation description
  • #15648 - [TOPI] Remove input padding for arm_cpu conv2d int8 native schedule in Legalize pass
  • #15386 - Fix an adaptive_max_pool1d operator conversion bug
  • #15533 - Disable exception for ADT in mixed precision pass
  • #15506 - [Strategy] Use x86 pool schedules for arm_cpu
  • #15470 - [Strategy] Use x86 dense schedules for arm_cpu
  • #15392 - add redirecting operation to dataflow pattern graph
  • #15468 - [Strategy] Fix arm_cpu int8 conv2d schedule selection for 32-bit targets
  • #15461 - Stop ToMixedPrecision when constant is out of dtype range
  • #15362 - improve SimplifyClipAndConsecutiveCast pass
  • #15137 - Introduce arguments limit to FuseOps pass
  • #15211 - Fix bug in MergeCompilerRegions pass
  • #15237 - ExprMutator Return Origin Expr When All Fields Isn't Changed
  • #15235 - [QNN] Support Dequantize to "float16" and Quantize to "uint16"

Runtime

  • #15693 - Make CSourceModule and StaticLibraryModule Binary Serializable
  • #15658 - Make export_library parameters after file_name keyword-only (see the sketch after this list)
  • #15637 - [Backport]Fix ICE from Clang
  • #15244 - Serialization/Deserialization of runtime module
  • #15630 - Utils to Stringify Device
  • #15623 - Expose ModuleGetFunction as PackedFunc
  • #15595 - Enhance PackedFunc Metaprogramming with PackArgs
  • #15543 - [Minor] Suppress verbose logging in Metal device API
  • #15305 - Flush L2 cache in time eval
  • #15332 - Device API to query L2 cache size
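One user-visible change above is #15658: arguments that follow `file_name` in `export_library` are now keyword-only. Below is a small sketch of the keyword form, assuming the standard `te`/`tvm.build` workflow; the tensor and file names are illustrative.

```python
# Sketch of the keyword-only export_library call (#15658).
import tvm
from tvm import te

n = 16
A = te.placeholder((n,), name="A", dtype="float32")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
mod = tvm.build(te.create_prim_func([A, B]), target="llvm")

mod.export_library("add_one.so", fcompile=None)   # keyword arguments still work
# mod.export_library("add_one.so", None)          # positional after file_name is now rejected
```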

TIR

  • #15913 - Fix offset_factor in cuda tensor core intrins
  • #15906 - Fix the error example in the documentation for pad_einsum
  • #15816 - Revert "[TensorIR][Visitor] Visit buffer members in match_buffer's in block visitor functions (#15153)"
  • #15763 - Do not drop 4th argument to tir.max
  • #15646 - Output DeclBuffer in LowerThreadAllreduce
  • #15493 - Output DeclBuffer in SplitHostDevice
  • #15517 - Shuffle in PointerValueTypeRewrite for scalar reads
  • #15263 - Output DeclBuffer in MakePackedAPI
  • #15465 - [TIR, Schedule] Fix decompose reduction with thread binding loops
  • #15432 - Generalize implementation of T.macro to work with other dialects
  • #15413 - Fix Primitive Rfactor DType
  • #15404 - Allow starred expressions in TIR script
  • #15374 - Finer predicate handling in cross-thread reduction
  • #15373 - Allreduce broadcast result to each thread in multi-warp case
  • #15214 - [UX] Implement privacy annotations in TIR
  • #15241 - Return error code from kernels in SplitHostDevice
  • #15327 - ThreadAllreduce warp-level primitive support with multi-warp
  • #15260 - Implement TIR macros (see the sketch after this list)
  • #15253 - Call TVMBackendFreeWorkspace inside LetStmt
  • #15264 - Allow symbolic bounds in IndexMap analysis
  • #15243 - Output DeclBuffer in LowerTVMBuiltin
  • #15236 - [Schedule] Scoped CacheRead/Write producing compact region
  • #15242 - Preserve AllocateNode::annotations
  • #15247 - Allow VerifyWellFormed to accept IRModule
  • #15192 - Support cross-thread reduction lowering with thread-broadcasting rewrite
  • #15210 - [Schedule] Derive Nonnegative Bounds from Shape Var
  • #15207 - [Transform] Add LiftThreadBinding Pass
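The TIR macro support listed above (#15260, generalized in #15432) lets a TVMScript fragment be expanded at its call site. A rough sketch follows, assuming macros are defined with `@T.macro` and invoked by a direct call inside a `T.prim_func`; the function and buffer names are illustrative.

```python
# Rough sketch of a TIR macro (#15260/#15432): the body of add_one is spliced
# into the PrimFunc wherever it is called during parsing.
from tvm.script import tir as T

@T.macro
def add_one(dst, src, idx):
    dst[idx] = src[idx] + T.float32(1)

@T.prim_func
def elementwise(A: T.Buffer((8,), "float32"), B: T.Buffer((8,), "float32")):
    for i in range(8):
        add_one(B, A, i)
```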

TOPI

  • #15685 - [Target]Use LLVM for x86 CPU feature lookup
  • #15710 - Ensure vectorization of input padding in arm_cpu int8 conv2d interleaved schedule
  • #15513 - check empty array of x86 injective's iters
  • #15371 - Revert "Add arm_cpu specific pooling schedules"
  • #15311 - Add arm_cpu specific pooling schedules
  • #15286 - Revert "Add arm_cpu specific pooling schedules"
  • #14855 - Add arm_cpu specific pooling schedules

TVMC

  • #15779 - enable dumping imported modules too
  • #15349 - Add tvmc flag to print compilation time per pass

TVMScript

  • #15824 - Preserve traceback across TVMScript parsing
  • #15762 - Use environment variable TVM_BLACK_FORMAT for .show() (see the sketch after this list)
  • #15706 - Disable black_format by default
  • #15705 - [FIX] Disable show_object_address in printing by default
  • #15579 - Optionally output the address as part of variable names
  • #15564 - Use triple-quoted python strings for metadata
  • #15547 - Create loop var with min_val dtype in for frame
  • #15492 - Allow use of Python builtins in script
  • #15442 - Support starred indices in for-loop
  • #15249 - Ensure completed root block has no read/write
  • #15239 - Handle parsing of PrimFunc calls with non-void return
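Two of the printer changes above interact: black formatting of `.show()` output is now disabled by default (#15706) and can be opted into through the `TVM_BLACK_FORMAT` environment variable (#15762). A small sketch, with an illustrative PrimFunc:

```python
# Sketch: opting back in to black-formatted TVMScript output via the
# TVM_BLACK_FORMAT environment variable (#15762); off by default (#15706).
import os
from tvm.script import tir as T

@T.prim_func
def add_one(A: T.Buffer((16,), "float32"), B: T.Buffer((16,), "float32")):
    for i in range(16):
        with T.block("compute"):
            vi = T.axis.spatial(16, i)
            B[vi] = A[vi] + T.float32(1)

os.environ["TVM_BLACK_FORMAT"] = "1"  # read by .show() when printing
add_one.show()
```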

cuda & cutlass & tensorrt

  • #15573 - [CUTLASS][Cherry-pick] Introduce several features of cutlass profiler
  • #15480 - [Bugfix][CUTLASS] CUTLASS path finding

microNPU

  • #15780 - [microNPU][ETHOSU] MatMul legalization support
  • #15428 - [microNPU][ETHOSU] Fix concatenation with reused buffers
  • #14909 - [ETHOSU][MicroNPU][Pass] Add a pass to replicate pads
  • #15186 - [microNPU][ETHOSU] Add Vela's logic to select configuration block

microTVM

  • #15667 - Check the output of microNPU demos in CI

web

  • #15218 - Increase default EMCC compilation total memory size

Misc

  • #15934 - [Release] [Dont Squash] Modify version number to 0.14.0 and 0.15.0.dev on main branch
  • #15847 - [release] Update version to 0.14.0 and 0.15.0.dev on main branch
  • #15867 - Bump pillow from 9.3.0 to 10.0.1 in /apps/microtvm/ethosu
  • #15866 - Bump pillow from 9.3.0 to 10.0.1 in /apps/microtvm/cmsisnn
  • #15865 - Bump pillow from 9.2.0 to 10.0.1 in /apps/microtvm
  • #15833 - [VM] Memory Manager moved up to runtime
  • #15859 - [Script] Fix miscs of make_notes.py
  • #15818 - [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance
  • #15761 - [Target] LLVM helper functions for any target info
  • #15672 - [IR] Implemented Variant<...> container
  • #15714 - [Target][Device] Auto detect target and create device from str in torch style
  • #15723 - fix _convert_simple_rnn
  • #15725 - Revert "[CodeGenC] Handle GlobalVar callee as internal function call"
  • #15684 - [Hopper TMA] Add intrinsic to create barriers for synchronization
  • #15683 - Fix a bug caused by PyTorch instance_norm when the input shape is [1,1,1,2]
  • #15596 - [FFI] Propagate Python errors across FFI boundaries
  • #15666 - [Module] Implement custom imported modules serialization
  • #15656 - [Hopper TMA] Add CUDA codegen support for bulk asynchronous copy
  • #15664 - [IR] Use structural equal for Range equality
  • #15649 - Add output_data_sec section in corstone300.ld
  • #15639 - Do not link LLVM libraries into cpptest binary
  • #15631 - [RPC] Enhance RPC Protocol to support TVM Object
  • #15624 - [CMake] Add RCCL to TVM and TVM Runtime
  • #15616 - [Hopper TMA] CUDA codegen for async copy with barrier synchronization
  • #15537 - [CPP_RPC] export listdir for RPC
  • #15605 - [CMake] Add NCCL to TVM and TVM Runtime
  • #15580 - Fix "to" duplicate word in python and C header file
  • #15581 - Remove duplicate load word inside .cc file
  • #15582 - Remove duplicate 'from' word inside python script
  • #15554 - Bump tornado from 6.1 to 6.3.3 in /apps/microtvm
  • #15552 - Bump tornado from 6.1 to 6.3.3 in /apps/microtvm/ethosu
  • #15553 - Bump tornado from 6.1 to 6.3.3 in /apps/microtvm/cmsisnn
  • #15536 - fixed typo [TypoFix]
  • #15529 - [quantize] fix bug of annotate for output of add op
  • #15535 - Fixed search task comment
  • #15530 - Remove duplicate msg word and condition inside the function doc
  • #15511 - Remove IRModule Dependency from Target
  • #15525 - Fix typo mistake and change whethe to whether
  • #15524 - Remove duplicate the word
  • #15103 - [CodeGenC] Handle GlobalVar callee as internal function call
  • #15419 - [VM][Textures] Enable OpenCL textures for VM
  • #15483 - [Script] Be more careful when generating ast.ExtSlice for Subscript
  • #15469 - [CYTHON] Make cython compatible with 3.0
  • #15423 - [Submodule] Add Flash attention v2
  • #15380 - [Target] Add Jetson Orin Nano tag
  • #15359 - [CMAKE] Conditionally link "clog" in NNPack install
  • #15326 - [OP] Add rms_norm into TOPI
  • #15312 - [skipci] Fix typo in docs/arch/index.rst
  • #15298 - [Release] Extend PR tags and Format PR hyper-links in release report
  • #15328 - [Package] Remove cutlass media/docs inside cutlass_fpA_intB_gemm
  • #15321 - [JVM] Fix the Maven pom.xml for OS X arm64 tvm4j build
  • #15265 - Fix keras version problem
  • #15292 - [RPC] Fix socket bind errno on corner case
  • #15287 - [Exec] Add a script to test GPU memory bandwidth
  • #15234 - [Miscs] Enhance script about make release notes
  • #15229 - [CMAKE] Add Vulkan header for Android
  • #15215 - [Android] ndk static build
  • #15208 - Update version to 0.14.dev0 on main branch