http server python free download

LazyLLM

Easiest and laziest way for building multi-agent LLMs applications

LazyLLM is an optimized, lightweight LLM server designed for easy and fast deployment of large language models. It is fully compatible with the OpenAI API specification, enabling developers to integrate their own models into applications that normally rely on OpenAI’s endpoints. LazyLLM emphasizes low resource usage and fast inference while supporting multiple models.

Downloads: 0 This Week

Last Update: 2025-11-01

See Project

PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model

PaddleSpeech is an open-source toolkit on PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-art and influential models. Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment process. Low barriers to install, CLI, Server, and Streaming Server is available to quick-start your journey. We provide...

Downloads: 1 This Week

Last Update: 2025-03-04

See Project

API-for-Open-LLM

Openai style api for open large language models

API-for-Open-LLM is a lightweight API server designed for deploying and serving open large language models (LLMs), offering a simple way to integrate LLMs into applications.

Downloads: 1 This Week

Last Update: 2025-01-22

See Project

OpenLLM

Operating LLMs in production

...Built-in supports a wide range of open-source LLMs and model runtime, including Llama 2， StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more. Serve LLMs over RESTful API or gRPC with one command, query via WebUI, CLI, our Python/Javascript client, or any HTTP client.

Downloads: 0 This Week

Last Update: 2025-04-21

See Project

Text Generation Inference

Large Language Model Text Generation Inference

Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.

Downloads: 0 This Week

Last Update: 2025-09-16

See Project

LLaVA

Visual Instruction Tuning: Large Language-and-Vision Assistant

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.

Downloads: 0 This Week

Last Update: 2024-02-04

See Project

dstack

Open-source tool designed to enhance the efficiency of workloads

dstack is an open-source tool designed to enhance the efficiency of running ML workloads in any cloud (AWS, GCP, Azure, Lambda, etc). It streamlines development and deployment, reduces cloud costs, and frees users from vendor lock-in.

Downloads: 0 This Week

Last Update: 5 days ago

See Project

Infinity

Low-latency REST API for serving text-embeddings

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. Infinity is developed under MIT License. Infinity powers inference behind Gradient.ai and other Embedding API providers.

Downloads: 1 This Week

Last Update: 2025-08-22

See Project

KServe

Standardized Serverless ML Inference Platform on Kubernetes

KServe provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like Tensorflow, XGBoost, ScikitLearn, PyTorch, and ONNX. It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU Autoscaling, Scale to Zero, and...

Downloads: 0 This Week

Last Update: 2025-11-03

See Project

SageMaker Hugging Face Inference Toolkit

Library for serving Transformers models on Amazon SageMaker

SageMaker Hugging Face Inference Toolkit is an open-source library for serving Transformers models on Amazon SageMaker. This library provides default pre-processing, predict and postprocessing for certain Transformers models and tasks. It utilizes the SageMaker Inference Toolkit for starting up the model server, which is responsible for handling inference requests. For the Dockerfiles used for building SageMaker Hugging Face Containers, see AWS Deep Learning Containers. The SageMaker Hugging...

Downloads: 0 This Week

Last Update: 2025-04-23

See Project

SageMaker Inference Toolkit

Serve machine learning models within a Docker container

Serve machine learning models within a Docker container using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. Once you have a trained model, you can include it in a Docker container that runs your inference code. A container provides an effectively isolated environment, ensuring a consistent runtime regardless of where the...

Downloads: 0 This Week

Last Update: 2023-10-25

See Project

SageMaker MXNet Inference Toolkit

Toolkit for allowing inference and serving with MXNet in SageMaker

SageMaker MXNet Inference Toolkit is an open-source library for serving MXNet models on Amazon SageMaker. This library provides default pre-processing, predict and postprocessing for certain MXNet model types and utilizes the SageMaker Inference Toolkit for starting up the model server, which is responsible for handling inference requests. AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet. Deep...

Downloads: 2 This Week

Last Update: 2022-07-05

See Project

Hugging Face Transformer

CPU/GPU inference server for Hugging Face transformer models

Optimize and deploy in production Hugging Face Transformer models in a single command line. At Lefebvre Dalloz we run in-production semantic search engines in the legal domain, in the non-marketing language it's a re-ranker, and we based ours on Transformer. In that setup, latency is key to providing a good user experience, and relevancy inference is done online for hundreds of snippets per user query. Most tutorials on Transformer deployment in production are built over Pytorch and FastAPI....

Downloads: 0 This Week

Last Update: 2022-08-22

See Project

BudgetML

Deploy a ML inference service on a budget in 10 lines of code

Deploy a ML inference service on a budget in less than 10 lines of code. BudgetML is perfect for practitioners who would like to quickly deploy their models to an endpoint, but not waste a lot of time, money, and effort trying to figure out how to do this end-to-end. We built BudgetML because it's hard to find a simple way to get a model in production fast and cheaply. Deploying from scratch involves learning too many different concepts like SSL certificate generation, Docker, REST,...

Downloads: 0 This Week

Last Update: 2022-08-26

See Project

Search Results for "http server python"

Showing 14 open source projects for "http server python"

LazyLLM

PaddleSpeech

API-for-Open-LLM

OpenLLM

Text Generation Inference

LLaVA

dstack

Infinity

KServe

SageMaker Hugging Face Inference Toolkit

SageMaker Inference Toolkit

SageMaker MXNet Inference Toolkit

Hugging Face Transformer

BudgetML

Search Results for "http server python"

Showing 14 open source projects for "http server python"

LazyLLM

PaddleSpeech

API-for-Open-LLM

OpenLLM

Text Generation Inference

LLaVA

dstack

Infinity

KServe

SageMaker Hugging Face Inference Toolkit

SageMaker Inference Toolkit

SageMaker MXNet Inference Toolkit

Hugging Face Transformer

BudgetML

Related Searches

Related Categories