## Added
- Support more graph optimizations: Convolution+Convolution, LayerNorm
- Support more operators: ROIAlign, GenerateProposals, Reciprocal, Not, Log, ReductionL2, InstanceNorm, Expand, Gather, Scatter
- Support NCHW input data processing for more operators (PReLU)
- Support ONNX weight sharing between Linear, MatMul, Gemm, and Gather
- Support more networks on CPU: vision transformers (ViT, TNT), recommendation networks
- Support more networks on GPU: ASR, Faster_RCNN
- Support Armv7 int8 to accelerate NLP networks (50%+ speed-up)
- Support X86 AVX512 int8 to accelerate NLP networks (3x+ speed-up)
- Support using images on Qualcomm GPU, and add GPU image management methods
- Improve inference performance on Qualcomm GPU
- Add more Android/iOS kit demos: Chinese ASR, Face Detection, Sentiment Analysis
- Try to bind CPU cores when using the GPU
## Changed
- Replace the mali option with gpu in the install shell script, and remove the default target option setting
- Change the GPU data format from NCWHC4 to NCHWC4
- Simplify the GPU tensor padding method with OclMemory
- The preprocess_ocl tool previously produced a separate algofile and xxxlib.so; the algofile is now packaged into the xxxlib.so
- Add the BNN_FP16 option to the X2bolt tool for converting ONNX 1-bit models
- Replace the original INT8 option with INT8_FP16 in the post_training_quantization tool for converting int8+float16 hybrid inference models, and add the INT8_FP32 option for converting int8+float32 hybrid inference models
- Add the shell environment variable BOLT_INT8_STORAGE_ERROR_THRESHOLD (default: 0.002) to control when post_training_quantization stores a layer as int8: int8 storage is used when the quantization error is lower than the threshold (see the sketch after this list)
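The threshold is read from the environment before the tool runs. Below is a minimal sketch of setting it from a driver script; the Python wrapper and the `-p model.bolt` argument are illustrative assumptions, not the tool's documented interface, so check post_training_quantization's help output for its actual flags.

```python
import os
import subprocess

# Layers whose quantization error is below this threshold are stored
# as int8; the default is 0.002. Here we tighten it to 0.001, which
# keeps more layers in float storage in exchange for higher accuracy.
os.environ["BOLT_INT8_STORAGE_ERROR_THRESHOLD"] = "0.001"

# Hypothetical invocation: "-p model.bolt" is a placeholder argument,
# not necessarily the tool's real command-line interface.
subprocess.run(["./post_training_quantization", "-p", "model.bolt"],
               check=True)
```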
## Fixed
- Fix PReLU 2D and 3D support
- Fix a Resize bug in some modes
- Fix an ONNX converter bug when reading Squeeze, UnSqueeze, and Deconv parameters
- Fix Arm Sigmoid precision
- Fix the ONNX RNN optimizer, and add support for NCHWC8 input data
- Fix Concat with a weight tensor in the ONNX converter
- Simplify the C API example