http-client free download

OpenLLM

Operating LLMs in production

...Built-in supports a wide range of open-source LLMs and model runtime, including Llama 2， StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more. Serve LLMs over RESTful API or gRPC with one command, query via WebUI, CLI, our Python/Javascript client, or any HTTP client.

Downloads: 5 This Week

Last Update: 2025-04-21

See Project

RWKV Runner

A RWKV management and startup tool, full automation, only 8MB

RWKV (pronounced as RwaKuv) is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, fast training, saves VRAM, "infinite" ctxlen, and free text embedding. Moreover it's 100% attention-free. Default configs has enabled custom CUDA kernel acceleration, which is much faster and consumes much less VRAM. If you encounter possible compatibility...

Downloads: 11 This Week

Last Update: 2026-05-08

See Project

GPT4All

Run Local LLMs on Any Device. Open-source

GPT4All is an open-source project that allows users to run large language models (LLMs) locally on their desktops or laptops, eliminating the need for API calls or GPUs. The software provides a simple, user-friendly application that can be downloaded and run on various platforms, including Windows, macOS, and Ubuntu, without requiring specialized hardware. It integrates with the llama.cpp implementation and supports multiple LLMs, allowing users to interact with AI models privately. This...

1 Review

Downloads: 118 This Week

Last Update: 2025-03-17

See Project

TensorFlow Serving

Serving system for machine learning models

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It deals with the inference aspect of machine learning, taking models after training and managing their lifetimes, providing clients with versioned access via a high-performance, reference-counted lookup table. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data. The...

Downloads: 4 This Week

Last Update: 2026-04-23

See Project

OpenVINO Model Server

A scalable inference server for models optimized with OpenVINO

...It’s implemented in C++ for scalability and efficiency, making it suitable for both edge and cloud deployments where inference workloads must be reliable and high throughput. The server exposes model inference via standard network protocols like REST and gRPC, allowing any client that speaks those protocols to request predictions remotely, abstracting away the complexity of where and how the model runs. It supports model deployment in diverse environments including Docker, bare-metal machines, and Kubernetes clusters, and is especially useful in microservices architectures where AI services need to scale independently. ...

Downloads: 14 This Week

Last Update: 2026-05-28

See Project

huggingface_hub

The official Python client for the Huggingface Hub

The huggingface_hub library allows you to interact with the Hugging Face Hub, a platform democratizing open-source Machine Learning for creators and collaborators. Discover pre-trained models and datasets for your projects or play with the thousands of machine-learning apps hosted on the Hub. You can also create and share your own models, datasets, and demos with the community. The huggingface_hub library provides a simple way to do all these things with Python.

Downloads: 3 This Week

Last Update: 2 days ago

See Project

Lepton AI

A Pythonic framework to simplify AI service building

A Pythonic framework to simplify AI service building. Cutting-edge AI inference and training, unmatched cloud-native experience, and top-tier GPU infrastructure. Ensure 99.9% uptime with comprehensive health checks and automatic repairs.

Downloads: 4 This Week

Last Update: 2026-06-05

See Project

KubeAI

Private Open AI on Kubernetes

Get inferencing running on Kubernetes: LLMs, Embeddings, Speech-to-Text. KubeAI serves an OpenAI compatible HTTP API. Admins can configure ML models by using the Model Kubernetes Custom Resources. KubeAI can be thought of as a Model Operator (See Operator Pattern) that manages vLLM and Ollama servers.

Downloads: 7 This Week

Last Update: 2026-03-31

See Project

Triton Inference Server

The Triton Inference Server provides an optimized cloud

...Triton delivers optimized performance for many query types, including real-time, batched, ensembles, and audio/video streaming. Provides Backend API that allows adding custom backends and pre/post-processing operations. Model pipelines using Ensembling or Business Logic Scripting (BLS). HTTP/REST and GRPC inference protocols based on the community-developed KServe protocol. A C API and Java API allow Triton to link directly into your application for edge and other in-process use cases.

Downloads: 13 This Week

Last Update: 2026-05-27

See Project

Search Results for "http-client"

Showing 9 open source projects for "http-client"

OpenLLM

RWKV Runner

GPT4All

TensorFlow Serving

OpenVINO Model Server

huggingface_hub

Lepton AI

KubeAI

Triton Inference Server

Search Results for "http-client"

Showing 9 open source projects for "http-client"

OpenLLM

RWKV Runner

GPT4All

TensorFlow Serving

OpenVINO Model Server

huggingface_hub

Lepton AI

KubeAI

Triton Inference Server

Related Searches

Related Categories