NNVM
Open deep learning compiler stack for CPU, GPU
... and minimal runtimes commonly unlock ML workloads on existing hardware. Automatically generate and optimize tensor operators on more backends. Need support for block sparsity, quantization (1-, 2-, 4-, and 8-bit integers, posit), random forests and classical ML, memory planning, MISRA-C compatibility, Python prototyping, or all of the above? NNVM's flexible design enables all of these and more.