inference free download

Text Generation Inference

Large Language Model Text Generation Inference

Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.

Downloads: 5 This Week

Last Update: 2025-12-18

ModelScope

Bring the notion of Model-as-a-Service to life

...Once integrated, model inference, fine-tuning, and evaluations can be done with only a few lines of code.

Downloads: 7 This Week

Last Update: 2026-04-11

DeepSparse

Sparsity-aware deep learning inference runtime for CPUs

A sparsity-aware enterprise inferencing system for AI models on CPUs. Maximize your CPU infrastructure with DeepSparse to run performant computer vision (CV), natural language processing (NLP), and large language models (LLMs).

Downloads: 0 This Week

Last Update: 2025-06-02

SetFit

Efficient few-shot learning with Sentence Transformers

SetFit is an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers. It achieves high accuracy with little labeled data. For instance, with only 8 labeled examples per class on the Customer Reviews sentiment dataset, SetFit is competitive with fine-tuning RoBERTa Large on the full training set of 3k examples.

Downloads: 1 This Week

Last Update: 2025-08-05

SparseML

Libraries for applying sparsification recipes to neural networks

SparseML is an optimization toolkit for training and deploying deep learning models using sparsification techniques like pruning and quantization to improve efficiency.

Downloads: 0 This Week

Last Update: 2025-06-02
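
To make the pruning idea in the description above concrete, here is a minimal sketch of magnitude pruning in plain Python. It is not SparseML's API or recipe format; the function name `magnitude_prune` and the flat list of weights are assumptions made purely for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Hypothetical helper for illustration only; real toolkits prune tensors
    gradually during training rather than a flat list in one shot.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


example_weights = [0.5, -0.1, 0.05, -0.9]
half_sparse = magnitude_prune(example_weights, 0.5)
```

With a sparsity of 0.5, the two smallest-magnitude entries are zeroed and the two largest are kept unchanged.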

NNCF

Neural Network Compression Framework for enhanced OpenVINO inference

NNCF (Neural Network Compression Framework) is an optimization toolkit for deep learning models, designed to apply quantization, pruning, and other techniques to improve inference efficiency.

Downloads: 0 This Week

Last Update: 2026-04-08

API-for-Open-LLM

OpenAI-style API for open large language models

API-for-Open-LLM is a lightweight API server designed for deploying and serving open large language models (LLMs), offering a simple way to integrate LLMs into applications.

Downloads: 0 This Week

Last Update: 2025-01-22

Open Interpreter

A natural language interface for computers

...It lets large language models (LLMs) run code locally (Python, JavaScript, shell, etc.), enabling you to ask your computer to do tasks like data analysis, file manipulation, browsing, etc. in human terms (“chat with your computer”), with safeguards. Runs locally or via configured remote LLM servers/inference backends, giving flexibility to use models you trust or have locally. It prompts you to approve code before executing, and supports both online LLM models and local inference servers. It seeks to combine convenience (like ChatGPT’s code interpreter) with control and flexibility by running on your own machine.

Downloads: 15 This Week

Last Update: 2025-09-12

Adapters

A Unified Library for Parameter-Efficient Learning

Adapters is an add-on library to Hugging Face's Transformers, integrating 10+ adapter methods into 20+ state-of-the-art Transformer models with minimal coding overhead for training and inference. Adapters provides a unified interface for efficient fine-tuning and modular transfer learning, supporting features such as full-precision or quantized training (e.g. Q-LoRA, Q-Bottleneck Adapters, or Q-PrefixTuning), adapter merging via task arithmetic, and the composition of multiple adapters via composition blocks, allowing advanced research in parameter-efficient transfer learning for NLP tasks.

Downloads: 0 This Week

Last Update: 2025-05-20

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs

AutoGPTQ is an implementation of GPTQ, a post-training quantization method for generative pre-trained transformers, that optimizes large language models (LLMs) for faster inference by reducing their memory and computational footprint while largely maintaining accuracy.

Downloads: 5 This Week

Last Update: 2025-01-21

KoGPT

KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

KoGPT is a Korean language model based on OpenAI’s GPT architecture, designed for various natural language processing (NLP) tasks such as text generation, summarization, and dialogue systems.

Downloads: 1 This Week

Last Update: 2025-01-24

TextBrewer

A PyTorch-based knowledge distillation toolkit

...It includes various distillation techniques from both the NLP and CV fields and provides an easy-to-use distillation framework that lets users quickly experiment with state-of-the-art distillation methods, compressing a model with a relatively small sacrifice in performance while increasing inference speed and reducing memory usage.

Downloads: 0 This Week

Last Update: 2025-01-22

NLP Architect

A model library for exploring state-of-the-art deep learning

NLP Architect is an open-source Python library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing and Natural Language Understanding neural networks. The library includes our past and ongoing NLP research and development efforts as part of Intel AI Lab. NLP Architect is designed to be flexible for adding new models, neural network components, data handling methods, and for easy training and running models. NLP Architect is a...

Downloads: 0 This Week

Last Update: 2022-08-05

PyText

A natural language modeling framework based on PyTorch

...It achieves this by providing simple and extensible interfaces and abstractions for model components, and by using PyTorch’s capabilities of exporting models for inference via the optimized Caffe2 execution engine. We use PyText at Facebook to iterate quickly on new modeling ideas and then seamlessly ship them at scale. Distributed-training support built on the new C10d backend in PyTorch 1.0. Mixed precision training support through APEX (trains faster with less GPU memory on NVIDIA Tensor Cores). ...

Downloads: 0 This Week

Last Update: 2021-08-31

PyTorch Natural Language Processing

Basic Utilities for PyTorch Natural Language Processing (NLP)

...With your batch in hand, you can use PyTorch to develop and train your model using gradient descent. For example, check out this example code for training on the Stanford Natural Language Inference (SNLI) Corpus. Now that you've set up your pipeline, you may want to ensure that some functions run deterministically. Wrap any code that's random with fork_rng and you'll be good to go. Now that you've computed your vocabulary, you may want to make use of pre-trained word vectors to set your embeddings.

Downloads: 2 This Week

Last Update: 2022-08-09
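
The idea of wrapping random code so it runs deterministically can be sketched with the Python standard library alone. This is not the library's fork_rng implementation or signature; `fork_random_state` and `sample_with_seed` are hypothetical names, and the sketch only covers Python's built-in `random` module, not PyTorch or NumPy generators.

```python
import random
from contextlib import contextmanager


@contextmanager
def fork_random_state(seed=None):
    """Run a block with its own random state, then restore the caller's state.

    Hypothetical stdlib analogue of the fork_rng idea, for illustration only.
    """
    saved = random.getstate()
    if seed is not None:
        random.seed(seed)
    try:
        yield
    finally:
        # Restore the outer state so the block leaves no trace on the caller.
        random.setstate(saved)


def sample_with_seed(seed):
    """Draw three integers inside a forked, seeded random state."""
    with fork_random_state(seed=seed):
        return [random.randint(0, 99) for _ in range(3)]
```

Calling `sample_with_seed` twice with the same seed returns the same list, and the global random state outside the block is unchanged afterwards.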

InferSent

InferSent sentence embeddings

InferSent is a supervised sentence embedding method that learns universal representations from Natural Language Inference data and transfers well to many downstream tasks. It uses a BiLSTM encoder with max-pooling to produce fixed-length sentence vectors that capture semantics beyond bag-of-words statistics. Trained on large NLI datasets, the embeddings generalize across tasks like sentiment analysis, entailment, paraphrase detection, and semantic similarity with simple linear classifiers. ...

Downloads: 0 This Week

Last Update: 2025-10-07
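
The max-pooling step mentioned in the description above is what turns a variable number of per-token encoder states into one fixed-length sentence vector. A minimal sketch in plain Python follows; the BiLSTM itself is omitted, and the function name `max_pool_over_time` and the toy hidden states are assumptions for illustration, not InferSent's code.

```python
def max_pool_over_time(hidden_states):
    """Take the elementwise maximum across per-token vectors.

    `hidden_states` is a list of equal-length lists, one per token.
    The result has the same length as a single token vector, no matter
    how many tokens the sentence has.
    """
    if not hidden_states:
        raise ValueError("need at least one token vector")
    return [max(column) for column in zip(*hidden_states)]


# Toy encoder outputs: a 2-token and a 4-token sentence, hidden size 3.
short_sentence = [[0.1, -0.5, 0.3], [0.4, 0.2, -0.1]]
long_sentence = [[0.0, 0.9, -0.2], [0.7, -0.3, 0.1], [-0.6, 0.5, 0.8], [0.2, 0.1, 0.0]]

short_vector = max_pool_over_time(short_sentence)
long_vector = max_pool_over_time(long_sentence)
```

Both sentences map to a vector of length 3, which is why a simple linear classifier can be trained on top of the pooled output.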

Search Results for "inference"

Showing 16 open source projects for "inference"
