Run any Llama 2 model locally with a Gradio UI, on GPU or CPU, from anywhere
Run 100B+ language models at home, BitTorrent-style
Inference code for Llama models
Inference code and configs for the ReplitLM model family
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
Explore large language models in 512MB of RAM
Implementation of "Tree of Thoughts"
Implementation of model parallel autoregressive transformers on GPUs
Code for the paper "Fine-Tuning Language Models from Human Preferences"
An implementation of model parallel GPT-2 and GPT-3-style models