api free download - SourceForge

Showing 37 open source projects for "api"

View related business solutions

LLM Inference Clear Filters & Widen Search

Earn up to 16% annual interest with Nexo.
Access competitive interest rates on your digital assets.

Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.

Get started with Nexo.
Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.

Start Free
1

API-for-Open-LLM

Openai style api for open large language models

API-for-Open-LLM is a lightweight API server designed for deploying and serving open large language models (LLMs), offering a simple way to integrate LLMs into applications.

Downloads: 0 This Week

Last Update: 2025-01-22
See Project
2

GPT4All

Run Local LLMs on Any Device. Open-source

GPT4All is an open-source project that allows users to run large language models (LLMs) locally on their desktops or laptops, eliminating the need for API calls or GPUs. The software provides a simple, user-friendly application that can be downloaded and run on various platforms, including Windows, macOS, and Ubuntu, without requiring specialized hardware. It integrates with the llama.cpp implementation and supports multiple LLMs, allowing users to interact with AI models privately. ...

1 Review

Downloads: 132 This Week

Last Update: 2025-03-17
See Project
3

whisper.cpp

Port of OpenAI's Whisper model in C/C++

whisper.cpp is a lightweight, C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model—designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples....

Downloads: 399 This Week

Last Update: 2026-03-19
See Project
4

LocalAI

The free, Open Source alternative to OpenAI, Claude and others

...It integrates with multiple backends like llama.cpp, transformers, and diffusers to support different AI workloads. With its self-hosted architecture and OpenAI-compatible API, LocalAI enables developers to build secure, local-first AI applications.

Downloads: 25 This Week

Last Update: 2026-04-07
See Project
Try Google Cloud Risk-Free With $300 in Credit
No hidden charges. No surprise bills. Cancel anytime.

Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.

Start Free
5

Triton Inference Server

The Triton Inference Server provides an optimized cloud

...Triton supports inference across cloud, data center, edge, and embedded devices on NVIDIA GPUs, x86 and ARM CPU, or AWS Inferentia. Triton delivers optimized performance for many query types, including real-time, batched, ensembles, and audio/video streaming. Provides Backend API that allows adding custom backends and pre/post-processing operations. Model pipelines using Ensembling or Business Logic Scripting (BLS). HTTP/REST and GRPC inference protocols based on the community-developed KServe protocol. A C API and Java API allow Triton to link directly into your application for edge and other in-process use cases.

Downloads: 3 This Week

Last Update: 2026-04-28
See Project
6

Bard API

The unofficial python package that returns response of Google Bard

...Please note that the bardapi is not a free service, but rather a tool provided to assist developers with testing certain functionalities due to the delayed development and release of Google Bard's API. It has been designed with a lightweight structure that can easily adapt to the emergence of an official API. Therefore, I strongly discourage using it for any other purposes. If you have access to official PaLM-2 API, replace the provided response with the corresponding official code.

Downloads: 0 This Week

Last Update: 2024-02-24
See Project
7

Open WebUI

User-friendly AI Interface

Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners like Ollama and OpenAI-compatible APIs, with a built-in inference engine for Retrieval Augmented Generation (RAG), making it a powerful AI deployment solution. Key features include effortless setup via Docker or Kubernetes, seamless integration with OpenAI-compatible APIs, granular permissions and user groups for enhanced security,...

Downloads: 107 This Week

Last Update: 2026-04-24
See Project
8

LazyLLM

Easiest and laziest way for building multi-agent LLMs applications

LazyLLM is an optimized, lightweight LLM server designed for easy and fast deployment of large language models. It is fully compatible with the OpenAI API specification, enabling developers to integrate their own models into applications that normally rely on OpenAI’s endpoints. LazyLLM emphasizes low resource usage and fast inference while supporting multiple models.

Downloads: 1 This Week

Last Update: 2026-03-04
See Project
9

Infinity

Low-latency REST API for serving text-embeddings

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. Infinity is developed under MIT License. Infinity powers inference behind Gradient.ai and other Embedding API providers.

Downloads: 0 This Week

Last Update: 2025-08-22
See Project
Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
10

hfapigo

Unofficial (Golang) Go bindings for the Hugging Face Inference API

(Golang) Go bindings for the Hugging Face Inference API. Directly call any model available in the Model Hub. An API key is required for authorized access. To get one, create a Hugging Face profile.

Downloads: 0 This Week

Last Update: 2025-11-06
See Project
11

KubeAI

Private Open AI on Kubernetes

Get inferencing running on Kubernetes: LLMs, Embeddings, Speech-to-Text. KubeAI serves an OpenAI compatible HTTP API. Admins can configure ML models by using the Model Kubernetes Custom Resources. KubeAI can be thought of as a Model Operator (See Operator Pattern) that manages vLLM and Ollama servers.

Downloads: 2 This Week

Last Update: 2026-03-31
See Project
12

optillm

Optimizing inference proxy for LLMs

OptiLLM is an optimizing inference proxy for Large Language Models (LLMs) that implements state-of-the-art techniques to enhance performance and efficiency. It serves as an OpenAI API-compatible proxy, allowing for seamless integration into existing workflows while optimizing inference processes. OptiLLM aims to reduce latency and resource consumption during LLM inference.

Downloads: 0 This Week

Last Update: 2026-03-19
See Project
13

DeepDetect

Deep Learning API and Server in C++14 support for Caffe, PyTorch

...Neural network templates for the most effective architectures for GPU, CPU, and Embedded devices. Training in a few hours and with small data thanks to 25+ pre-trained models. Full Open Source, with an ecosystem of tools (API clients, video, annotation, ...) Fast Server written in pure C++, a single codebase for Cloud, Desktop & Embedded.

Downloads: 0 This Week

Last Update: 2025-07-19
See Project
14

RWKV Runner

A RWKV management and startup tool, full automation, only 8MB

RWKV (pronounced as RwaKuv) is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, fast training, saves VRAM, "infinite" ctxlen, and free text embedding. Moreover it's 100% attention-free. Default configs has enabled custom CUDA kernel acceleration, which is much faster and consumes much less VRAM. If you encounter possible compatibility...

Downloads: 7 This Week

Last Update: 2026-02-01
See Project
15

OpenLLM

Operating LLMs in production

...Built-in supports a wide range of open-source LLMs and model runtime, including Llama 2， StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more. Serve LLMs over RESTful API or gRPC with one command, query via WebUI, CLI, our Python/Javascript client, or any HTTP client.

Downloads: 3 This Week

Last Update: 2025-04-21
See Project
16

DocTR

Library for OCR-related tasks powered by Deep Learning

...User-friendly, 3 lines of code to load a document and extract text with a predictor. State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract. Easy integration (available templates for browser demo & API deployment). End-to-End OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identify all characters in the word). As such, you can select the architecture used for text detection, and the one for text recognition from the list of available implementations.

Downloads: 5 This Week

Last Update: 1 day ago
See Project
17

Text Generation Inference

Large Language Model Text Generation Inference

Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.

Downloads: 1 This Week

Last Update: 2025-12-18
See Project
18

Transformer Engine

A library for accelerating Transformer models on NVIDIA GPUs

Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference. TE provides a collection of highly optimized building blocks for popular Transformer architectures and an automatic mixed precision-like API that can be used seamlessly with your framework-specific code. TE also includes a framework-agnostic C++ API that can be integrated with other deep-learning libraries to enable FP8 support for Transformers. As the number of parameters in Transformer models continues to grow, training and inference for architectures such as BERT, GPT, and T5 become very memory and compute-intensive. ...

Downloads: 0 This Week

Last Update: 2026-04-24
See Project
19

Xorbits Inference

Replace OpenAI GPT with another LLM in your app

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop. Xorbits Inference(Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. With Xorbits...

Downloads: 0 This Week

Last Update: 2026-04-25
See Project
20

TorchAudio

Data manipulation and transformation for audio signal processing

The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, having a focus on trainable features through the autograd system, and having consistent style (tensor names and dimension names). Therefore, it is primarily a machine learning library and not a general signal processing library. The benefits of PyTorch can be seen in torchaudio through having all the computations be through PyTorch...

Downloads: 1 This Week

Last Update: 2026-02-17
See Project
21

OpenAI DALL·E AsyncImage SwiftUI

OpenAI swift async text to image for SwiftUI app using OpenAI

SwiftUI views that asynchronously loads and displays an OpenAI image from open API. You just type in your idea and AI will give you an art solution. DALL-E and DALL-E 2 are deep learning models developed by OpenAI to generate digital images from natural language descriptions, called "prompts". You need to have Xcode 13 installed in order to have access to Documentation Compiler (DocC) OpenAI's text-to-image model DALL-E 2 is a recent example of diffusion models.

Downloads: 0 This Week

Last Update: 2025-08-14
See Project
22

BrowserAI

Run local LLMs like llama, deepseek, kokoro etc. inside your browser

BrowserAI is a cutting-edge platform that allows users to run large language models (LLMs) directly in their web browser without the need for a server. It leverages WebGPU for accelerated performance and supports offline functionality, making it a highly efficient and privacy-conscious solution. The platform provides a developer-friendly SDK with pre-configured popular models, and it allows for seamless switching between MLC and Transformer engines. Additionally, it supports features such as...

Downloads: 1 This Week

Last Update: 2026-04-18
See Project
23

ModelScope

Bring the notion of Model-as-a-Service to life

...The core ModelScope library open-sourced in this repository provides the interfaces and implementations that allow developers to perform model inference, training and evaluation. In particular, with rich layers of API abstraction, the ModelScope library offers unified experience to explore state-of-the-art models spanning across domains such as CV, NLP, Speech, Multi-Modality, and Scientific-computation. Model contributors of different areas can integrate models into the ModelScope ecosystem through the layered APIs, allowing easy and unified access to their models. ...

Downloads: 2 This Week

Last Update: 2026-04-28
See Project
24

RamaLama

Simplifies the local serving of AI models from any source

...Developers can use familiar container commands to pull, run, and interact with AI models from any source, treating models similarly to how container images are handled in OCI workflows. RamaLama supports multiple model registries and offers a REST API or chatbot interface for interacting with running models, making it flexible for local development, testing, or integration into larger systems.

Downloads: 1 This Week

Last Update: 2026-04-27
See Project
25

DALI

A GPU-accelerated library containing highly optimized building blocks

The NVIDIA Data Loading Library (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It provides a collection of highly optimized building blocks for loading and processing image, video and audio data. It can be used as a portable drop-in replacement for built-in data loaders and data iterators in popular deep learning frameworks. Deep learning applications require complex, multi-stage data processing pipelines that include loading, decoding,...

Downloads: 1 This Week

Last Update: 2026-04-16
See Project