TurboTransformers is a high-performance inference framework for running Transformer models efficiently on CPUs and GPUs, improving both latency and throughput for NLP applications.
Features
- Optimized for low-latency Transformer model inference
- Supports both CPU and GPU acceleration
- Works with popular Transformer models like BERT and GPT
- Implements kernel fusion for performance optimization
- Compatible with PyTorch and TensorFlow
- Provides quantization support for lower memory usage
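To make the kernel-fusion feature above concrete: fusion merges adjacent operations (e.g. a bias-add followed by an activation) into a single pass so the intermediate tensor is never materialized in memory. The following is a minimal pure-Python sketch of that idea only; the function names are illustrative and are not TurboTransformers' API.

```python
def bias_add(x, b):
    # Unfused step 1: writes a full intermediate buffer.
    return [xi + bi for xi, bi in zip(x, b)]

def relu(x):
    # Unfused step 2: reads that intermediate back in.
    return [max(0.0, xi) for xi in x]

def fused_bias_add_relu(x, b):
    # "Fused kernel": one pass over the data, no intermediate buffer.
    return [max(0.0, xi + bi) for xi, bi in zip(x, b)]

x = [1.0, -2.0, 3.0]
b = [0.5, 0.5, -4.0]
unfused = relu(bias_add(x, b))
fused = fused_bias_add_relu(x, b)
assert unfused == fused  # identical result, half the memory traffic
```

On real hardware the win comes from avoiding the round trip to memory (and, on GPUs, launching one kernel instead of two), not from the arithmetic itself.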
License
BSD License