Page 2 | llama-cpp-python.whl free download

SGLang

SGLang is a fast serving framework for large language models

SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.

Downloads: 1 This Week

Last Update: 3 days ago

OpenLLM

Operating LLMs in production

...With OpenLLM, you can run inference with any open-source large language model, deploy to the cloud or on-premises, and build powerful AI apps. It has built-in support for a wide range of open-source LLMs and model runtimes, including Llama 2, StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more. Serve LLMs over a RESTful API or gRPC with one command, and query via the WebUI, CLI, our Python/JavaScript client, or any HTTP client.

Downloads: 0 This Week

Last Update: 2025-04-21

Pruna AI

Pruna is a model optimization framework built for developers

Pruna is an open-source, self-hostable AI inference engine designed to help teams deploy and manage large language models (LLMs) efficiently across private or hybrid infrastructures. Built with performance and developer ergonomics in mind, Pruna simplifies inference workflows by enabling multi-model orchestration, autoscaling, GPU resource allocation, and compatibility with popular open-source models. It is ideal for companies or teams looking to reduce reliance on external APIs while...

Downloads: 0 This Week

Last Update: 2025-11-10

Curated Transformers

PyTorch library of curated Transformer models and their components

...It provides state-of-the-art models that are composed of a set of reusable components. Supports state-of-the-art transformer models, including LLMs such as Falcon, Llama, and Dolly v2. Implementing a feature or bugfix benefits all models. For example, all models support 4/8-bit inference through the bitsandbytes library and each model can use the PyTorch meta device to avoid unnecessary allocations and initialization.

Downloads: 0 This Week

Last Update: 2024-04-17

h2oGPT

Private chat with local GPT with document, images, video, etc.

h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and...

Downloads: 2 This Week

Last Update: 2025-02-22

Speech-AI-Forge

Speech-AI-Forge is a project developed around TTS generation models

...It is model-agnostic and advertises support for a variety of TTS and speech models such as ChatTTS, CosyVoice, Fish-Speech, FireredTTS and others, as well as Whisper-based ASR, giving you a flexible playground for experimenting with different speech stacks. The project also integrates with general-purpose LLMs (for example GPT- or LLaMA-style models), which can be used to pre-process text or manage conversations.

Downloads: 3 This Week

Last Update: 2025-11-28

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

CogVLM2 is the second generation of the CogVLM vision-language model series, developed by ZhipuAI and released in 2024. Built on Meta-Llama-3-8B-Instruct, CogVLM2 significantly improves over its predecessor by providing stronger performance across multimodal benchmarks such as TextVQA, DocVQA, and ChartQA, while introducing extended context length support of up to 8K tokens and high-resolution image input up to 1344×1344. The series includes models for both image understanding and video understanding, with CogVLM2-Video supporting up to 1-minute videos by analyzing keyframes. ...

Downloads: 0 This Week

Last Update: 4 days ago

CSM (Conversational Speech Model)

A Conversational Speech Generation Model

The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.

Downloads: 10 This Week

Last Update: 2025-03-19

Chinese-LLaMA-Alpaca-2 v2.0

Chinese LLaMA & Alpaca large language model + local CPU/GPU training

This project open-sources the Chinese LLaMA model and the instruction-fine-tuned Alpaca model to further promote open research on large models in the Chinese NLP community. Based on the original LLaMA, these models expand the Chinese vocabulary and use Chinese data for secondary pre-training, which further improves basic semantic understanding of Chinese.

Downloads: 2 This Week

Last Update: 2023-08-21

OpenFlamingo

An open-source framework for training large multimodal models

...If you have any questions, please feel free to open an issue. We also welcome contributions! We provide an initial OpenFlamingo 9B model using a CLIP ViT-Large vision encoder and a LLaMA-7B language model. In general, we support any CLIP vision encoder. For the language model, we support LLaMA, OPT, GPT-Neo, GPT-J, and Pythia models. OpenFlamingo is a multimodal language model that can be used for a variety of tasks. It is trained on a large multimodal dataset.

Downloads: 0 This Week

Last Update: 2023-08-15

xTuring

Easily build, customize and control your own LLMs

xTuring is an open-source AI personalization software. xTuring makes it easy to build and control LLMs by providing a simple interface to personalize LLMs to your own data and application. xTuring provides fast, efficient and simple fine-tuning of LLMs, such as LLaMA, GPT-J, Galactica, and more. By providing an easy-to-use interface for fine-tuning LLMs to your own data and application, xTuring makes it simple to build, customize and control LLMs. The entire process can be done inside your computer or in your private cloud, ensuring data privacy and security.

Downloads: 0 This Week

Last Update: 2023-09-06

Zylthra

Zylthra: A PyQt6 app to generate synthetic datasets with DataLLM.

Welcome to Zylthra, a powerful Python-based desktop application built with PyQt6, designed to generate synthetic datasets using the DataLLM API from data.mostly.ai. This tool allows users to create custom datasets by defining columns, configuring generation parameters, and saving setups for reuse, all within a sleek, dark-themed interface.

Downloads: 0 This Week

Last Update: 2025-04-10

Email to Calendar Event ETE

A Python app/script that automatically adds important events to a calendar.

It uses a locally running AI with a model of your choice. Two APIs are supported: the default is Llama, and the second is the full LM Studio API. The script includes a tool for automatic addition to a scheduler or cron (not tested enough). The script currently does not work with Microsoft Outlook or Google Gmail, for certification and API policy reasons. It is fully tested on the Seznam.cz* service provider; if you have a different provider with the same type of security or authentication, it should work.

Downloads: 0 This Week

Last Update: 2025-11-15

ChatGenTitle

A paper title generation model fine-tuned on the LLaMA model

ChatGenTitle: A paper title generation model fine-tuned on the LLaMA model using information from millions of arXiv papers.

Downloads: 0 This Week

Last Update: 2023-08-25

unit-minions

AI R&D Efficiency Improvement Research: Do-It-Yourself Training LoRA

"AI R&D Efficiency Improvement Research: Do-It-Yourself Training LoRA", covering LoRA training for the Llama (Alpaca LoRA) and ChatGLM (ChatGLM Tuning) models. Training content: user story generation, test code generation, code-assisted generation, text to SQL, and text-to-code generation.

Downloads: 0 This Week

Last Update: 2023-08-25

pyllama

LLaMA: Open and Efficient Foundation Language Models

📢 pyllama is a hacked version of LLaMA based on Facebook's original implementation, but more convenient to run on a single consumer-grade GPU.

Downloads: 0 This Week

Last Update: 2023-08-24

owl_cpp

C++ library for working with OWL ontologies

1 Review

Downloads: 0 This Week

Last Update: 2017-02-14

Mellum-4b-base

JetBrains’ 4B parameter code model for completions

Mellum-4b-base is JetBrains’ first open-source large language model designed and optimized for code-related tasks. Built with 4 billion parameters and a LLaMA-style architecture, it was trained on over 4.2 trillion tokens across multiple programming languages, including datasets such as The Stack, StarCoder, and CommitPack. With a context window of 8,192 tokens, it excels at code completion, fill-in-the-middle tasks, and intelligent code suggestions for professional developer tools and IDEs. ...

Downloads: 0 This Week

Last Update: 2025-09-11

Search Results for "llama-cpp-python.whl" - Page 2

Showing 43 open source projects for "llama-cpp-python.whl"
