gpu max performance free download

Text Generation Inference

Large Language Model Text Generation Inference

Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.

Downloads: 1 This Week

Last Update: 2025-12-18

See Project

Colossal-AI

Making large AI models cheaper, faster and more accessible

The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing. Together with better performance come larger model sizes. This imposes challenges to the memory wall of the current accelerator hardware such as GPU. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine.

Downloads: 1 This Week

Last Update: 2025-05-28

See Project

OpenVINO

OpenVINO™ Toolkit repository

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. Boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common tasks. Use models trained with popular frameworks like TensorFlow, PyTorch and more. Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud. This open-source version includes several components: namely Model Optimizer, OpenVINO™ Runtime, Post-Training Optimization Tool, as well as CPU, GPU, MYRIAD, multi device and heterogeneous plugins to accelerate deep learning inferencing on Intel® CPUs and Intel® Processor Graphics. ...

Downloads: 30 This Week

Last Update: 2026-03-25

See Project

Chinese-LLaMA-Alpaca 2

Chinese LLaMA-2 & Alpaca-2 Large Model Phase II Project

This project is developed based on the commercially available large model Llama-2 released by Meta. It is the second phase of the Chinese LLaMA&Alpaca large model project. The Chinese LLaMA-2 base model and the Alpaca-2 instruction fine-tuning large model are open-sourced. These models expand and optimize the Chinese vocabulary on the basis of the original Llama-2, use large-scale Chinese data for incremental pre-training, and further improve the basic semantics and command understanding of...

Downloads: 0 This Week

Last Update: 2024-01-23

See Project

TurboTransformers

Fast and user-friendly runtime for transformer inference

TurboTransformers is a high-performance inference framework optimized for running Transformer models efficiently on CPUs and GPUs. It improves latency and throughput for NLP applications.

Downloads: 0 This Week

Last Update: 2025-01-24

See Project

Search Results for "gpu max performance"

Showing 5 open source projects for "gpu max performance"

Text Generation Inference

Colossal-AI

OpenVINO

Chinese-LLaMA-Alpaca 2

TurboTransformers

Search Results for "gpu max performance"

Showing 5 open source projects for "gpu max performance"

Text Generation Inference

Colossal-AI

OpenVINO

Chinese-LLaMA-Alpaca 2

TurboTransformers

Related Searches

Related Categories