Page 2 | llama-cpp-static free download

SGLang

SGLang is a fast serving framework for large language models

SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.

Downloads: 4 This Week

Last Update: 2026-02-23


Chinese-LLaMA-Alpaca 2

Chinese LLaMA-2 & Alpaca-2 Large Model Phase II Project

This project is based on Llama-2, the commercially usable large model released by Meta. It is the second phase of the Chinese LLaMA & Alpaca large model project, and it open-sources the Chinese LLaMA-2 base model and the Alpaca-2 instruction-tuned model. These models extend and optimize the Chinese vocabulary of the original Llama-2, use large-scale Chinese data for incremental pre-training, and further improve basic Chinese semantic understanding and instruction following. ...

Downloads: 0 This Week

Last Update: 2024-01-23


model2Vec

Fast State-of-the-Art Static Embeddings

model2vec is an innovative embedding framework that converts large sentence transformer models into compact, high-speed static embedding models while preserving much of their semantic performance. The project focuses on dramatically reducing the computational cost of generating embeddings, achieving significant improvements in speed and model size without requiring large datasets for retraining. By using a distillation-based approach, it can produce lightweight models that run efficiently on CPUs, making it suitable for edge applications and large-scale processing pipelines. ...

Downloads: 0 This Week

Last Update: 2 days ago


LazyLLM

Easiest and laziest way to build multi-agent LLM applications

LazyLLM is an optimized, lightweight LLM server designed for easy and fast deployment of large language models. It is fully compatible with the OpenAI API specification, enabling developers to integrate their own models into applications that normally rely on OpenAI’s endpoints. LazyLLM emphasizes low resource usage and fast inference while supporting multiple models.

Downloads: 0 This Week

Last Update: 2026-03-04


Lepton AI

A Pythonic framework to simplify AI service building

A Pythonic framework to simplify AI service building. Cutting-edge AI inference and training, unmatched cloud-native experience, and top-tier GPU infrastructure. Ensure 99.9% uptime with comprehensive health checks and automatic repairs.

Downloads: 1 This Week

Last Update: 2026-01-17


OpenLLM

Operating LLMs in production

...With OpenLLM, you can run inference with any open-source large language model, deploy to the cloud or on-premises, and build powerful AI apps. It has built-in support for a wide range of open-source LLMs and model runtimes, including Llama 2, StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more. Serve LLMs over a RESTful API or gRPC with one command, and query via the web UI, the CLI, our Python/JavaScript client, or any HTTP client.

Downloads: 0 This Week

Last Update: 2025-04-21


AutoCoder

A long-running autonomous coding agent powered by the Claude Agent

...The core idea is to accelerate software production while preserving correctness and readability, minimizing the cognitive overhead that comes from switching between concept and implementation. Its architecture typically integrates language models with static analysis and template logic so that generated code is not only syntactically valid but also idiomatic and testable.

Downloads: 1 This Week

Last Update: 2026-02-05


Meta Agents Research Environments (ARE)

Meta Agents Research Environments is a comprehensive platform

Meta Agents Research Environments (ARE) is a simulation and benchmarking platform designed to evaluate AI agents on dynamic, evolving, multi-step tasks. Unlike static benchmarks, ARE supports environments where agents must adapt to changes over time, reason over sequences of actions, interact with applications, and cope with uncertainty. The included Gaia2 benchmark offers 800 scenarios across multiple “universes” and can test reasoning, memory, tool use, and adaptability. ARE also integrates with simulated applications and agent APIs (email, file system, etc.). ...

Downloads: 0 This Week

Last Update: 2026-01-23


Pruna AI

Pruna is a model optimization framework built for developers

Pruna is an open-source, self-hostable AI inference engine designed to help teams deploy and manage large language models (LLMs) efficiently across private or hybrid infrastructures. Built with performance and developer ergonomics in mind, Pruna simplifies inference workflows by enabling multi-model orchestration, autoscaling, GPU resource allocation, and compatibility with popular open-source models. It is ideal for companies or teams looking to reduce reliance on external APIs while...

Downloads: 0 This Week

Last Update: 2026-03-09


Curated Transformers

PyTorch library of curated Transformer models and their components

...It provides state-of-the-art models that are composed of a set of reusable components. Supports state-of-the-art transformer models, including LLMs such as Falcon, Llama, and Dolly v2. Implementing a feature or bugfix benefits all models. For example, all models support 4/8-bit inference through the bitsandbytes library and each model can use the PyTorch meta device to avoid unnecessary allocations and initialization.

Downloads: 0 This Week

Last Update: 2024-04-17


Animated Drawings

Code to accompany "A Method for Animating Children's Drawings"

AnimatedDrawings is a framework that converts user sketches or line drawings into fully animated 2D motion sequences using learned motion priors. The idea is that you draw a simple static figure (stick figure, silhouette, or contour lines), and the system produces plausible skeletal motion (walking, jumping, dancing) that adheres to the drawn shape constraints. The architecture separates shape embedding (to understand user-drawn geometry) from motion embedding / generation (to produce temporally coherent movement). ...

Downloads: 1 This Week

Last Update: 2025-10-07


LongWriter

Unleashing 10,000+ Word Generation from Long Context LLMs

LongWriter is an open-source framework and set of large language models designed to enable ultra-long text generation that can exceed 10,000 words while maintaining coherence and structure. Traditional large language models can process large inputs but often struggle to generate long outputs due to limitations in training data and alignment strategies. LongWriter addresses this challenge by introducing a specialized dataset and training approach that encourages models to produce longer...

Downloads: 1 This Week

Last Update: 2026-03-06


Chat with LLMs Everywhere

Run PyTorch LLMs locally on servers, desktop and mobile

...TorchChat supports running models through Python interfaces as well as integrating them directly into native applications written in languages such as C or C++. The project also demonstrates how modern LLMs like LLaMA-style models can be deployed locally while maintaining good performance across different hardware platforms.

Downloads: 1 This Week

Last Update: 2026-03-05


Elia

Terminal-based LLM chat tool with multi-model and local support

...It runs entirely in the command line, offering a keyboard-driven experience that reduces the need for switching between apps. Users can chat with both proprietary models like ChatGPT and Claude, as well as local models such as Llama 3, Mistral, and Gemma. Elia stores conversations in a local SQLite database, making it easy to revisit past interactions. It supports flexible usage with inline and full-screen chat modes, along with simple configuration through a single file. Installation is straightforward via pipx, and users can customize themes, system prompts, and model settings. ...

Downloads: 0 This Week

Last Update: 2026-03-19


Cosmos-RL

Cosmos-RL is a flexible and scalable Reinforcement Learning framework

...The framework supports multiple parallelism strategies, including tensor, pipeline, and data parallelism, allowing it to use large GPU clusters effectively. It is built with compatibility in mind, supporting popular model families such as LLaMA, Qwen, and diffusion-based world models, as well as integration with the Hugging Face ecosystem. Cosmos-RL also includes support for advanced RL algorithms, low-precision training, and fault-tolerant execution, making it suitable for large-scale production workloads.

Downloads: 0 This Week

Last Update: 2026-03-18


EmoLLM

Pre & Post-training & Dataset & Evaluation & Deploy & RAG

...Its repository includes multiple model variants and training configurations spanning several underlying model families, including InternLM, Qwen, DeepSeek, Mixtral, LLaMA, and others, which shows that the initiative is structured as a broad ecosystem rather than a single release. The project also covers more than just model weights, with material for datasets, fine-tuning, evaluation, deployment, demos, RAG, and related subprojects such as its psychological digital assistant work.

Downloads: 0 This Week

Last Update: 2026-03-06


Vulnhuntr

AI tool for detecting complex vulnerabilities in Python codebases

Vulnhuntr is an open source security tool that uses large language models to analyze codebases and identify remotely exploitable vulnerabilities. It focuses on Python projects and applies static code analysis combined with LLM reasoning to trace how user input flows through an application. Instead of scanning entire repositories at once, it builds call chains step by step, allowing deeper inspection of complex, multi-stage issues that traditional tools may miss. Vulnhuntr can generate detailed findings, including vulnerability explanations and potential exploit paths, helping developers and security teams understand risks faster. ...

Downloads: 0 This Week

Last Update: 3 days ago


AICGSecEval

A.S.E (AICGSecEval) is a repository-level AI-generated code security

...By simulating realistic development scenarios, the benchmark assesses how well AI code generation systems handle security-sensitive programming tasks. AICGSecEval combines static and dynamic evaluation techniques to analyze generated code for vulnerabilities and functional correctness. The framework includes datasets, test cases, and evaluation metrics that measure how AI programming tools perform across multiple programming languages and vulnerability categories.

Downloads: 0 This Week

Last Update: 2026-03-09


LLM-Pruner

On the Structural Pruning of Large Language Models

LLM-Pruner is an open-source framework designed to compress large language models through structured pruning techniques while maintaining their general capabilities. Large language models often require enormous computational resources, making them expensive to deploy and inefficient for many practical applications. LLM-Pruner addresses this issue by identifying and removing non-essential components within transformer architectures, such as redundant attention heads or feed-forward...

Downloads: 0 This Week

Last Update: 2026-03-09


TAME LLM

Traditional Mandarin LLMs for Taiwan

TAME LLM is an open-source initiative focused on building and releasing large language models optimized for Traditional Mandarin and the linguistic context of Taiwan. The project includes models such as Llama-3-Taiwan-70B, which are fine-tuned versions of large transformer architectures trained on extensive corpora containing both Traditional Mandarin and English text. These models are designed to support applications such as conversational AI, knowledge retrieval, and domain-specific reasoning in fields like manufacturing, law, healthcare, and electronics. ...

Downloads: 0 This Week

Last Update: 2026-03-09


Intel LLM Library for PyTorch

Accelerate local LLM inference and finetuning

...The framework provides hardware-aware optimizations and low-precision computation techniques that significantly improve the performance of large language models while reducing memory consumption. IPEX-LLM supports a wide range of popular models, including architectures such as LLaMA, Mistral, Qwen, and other transformer-based systems. The library can integrate with common AI frameworks and serving tools such as Hugging Face Transformers, LangChain, and vLLM, allowing developers to incorporate optimized inference into existing pipelines.

Downloads: 0 This Week

Last Update: 2026-03-04


h2oGPT

Private chat with local GPT with document, images, video, etc.

h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including Ollama and...

Downloads: 0 This Week

Last Update: 2025-02-22


Speech-AI-Forge

Speech-AI-Forge is a project developed around TTS generation model

...It is model-agnostic and advertises support for a variety of TTS and speech models such as ChatTTS, CosyVoice, Fish-Speech, FireredTTS and others, as well as Whisper-based ASR, giving you a flexible playground for experimenting with different speech stacks. The project also integrates with general-purpose LLMs (for example GPT- or LLaMA-style models), which can be used to pre-process text and manage conversations.

Downloads: 1 This Week

Last Update: 2026-02-02


CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

CogVLM2 is the second generation of the CogVLM vision-language model series, developed by ZhipuAI and released in 2024. Built on Meta-Llama-3-8B-Instruct, CogVLM2 significantly improves over its predecessor by providing stronger performance across multimodal benchmarks such as TextVQA, DocVQA, and ChartQA, while introducing extended context length support of up to 8K tokens and high-resolution image input up to 1344×1344. The series includes models for both image understanding and video understanding, with CogVLM2-Video supporting up to 1-minute videos by analyzing keyframes. ...

Downloads: 0 This Week

Last Update: 2 days ago


Strix

Open-source AI hackers to find and fix your app’s vulnerabilities

...The system is designed to mimic the behavior of real attackers by executing dynamic testing and verifying findings through proof-of-concept exploitation. Unlike traditional vulnerability scanners that rely heavily on static analysis, Strix agents actively run code, probe systems, and attempt exploitation to confirm whether vulnerabilities are genuinely exploitable. The platform is intended for developers and security teams that need rapid security assessments without the overhead of manual penetration testing engagements. Strix can orchestrate multiple cooperating agents that divide investigation tasks and collaboratively analyze complex applications or infrastructure.

Downloads: 6 This Week

Last Update: 7 days ago


Search Results for "llama-cpp-static" - Page 2

Showing 99 open source projects for "llama-cpp-static"
