Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Artificial Intelligence
LLM Inference Tools
Search Results

Search Results for "hardware"

x

Sort By:

Relevance

Clear All Filters

OS

Linux 25
Windows 24
Mac 23
More...
BSD 3
ChromeOS 2
Mobile Operating Systems 1

Category

Artificial Intelligence 25
Software Development 5
System 1

License

OSI-Approved Open Source 25

Translations

English 1

Programming Language

Python 13
C++ 11
C 1
Go 1
More...
Swift 1

Showing 25 open source projects for "hardware"

View related business solutions

LLM Inference Linux Clear Filters & Widen Search

Ship Agents Faster
Transform your applications and workflows into powerful agentic systems at global scale.

Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.

Get Started Free
Host LLMs in Production With On-Demand GPUs
NVIDIA L4 GPUs. 5-second cold starts. Scale to zero when idle.

Deploy your model, get an endpoint, pay only for compute time. No GPU provisioning or infrastructure management required.

Try Free
1

GPT4All

Run Local LLMs on Any Device. Open-source

...The software provides a simple, user-friendly application that can be downloaded and run on various platforms, including Windows, macOS, and Ubuntu, without requiring specialized hardware. It integrates with the llama.cpp implementation and supports multiple LLMs, allowing users to interact with AI models privately. This project also supports Python integrations for easy automation and customization. GPT4All is ideal for individuals and businesses seeking private, offline access to powerful LLMs.

1 Review

Downloads: 105 This Week

Last Update: 2025-03-17
See Project
2

LocalAI

The free, Open Source alternative to OpenAI, Claude and others

LocalAI is an open-source platform that allows users to run large language models and other AI systems locally on their own hardware. It acts as a drop-in replacement for APIs such as OpenAI, enabling developers to build AI-powered applications without relying on external cloud services. The platform supports a wide range of model types, including text generation, image creation, speech processing, and embeddings. LocalAI can run on consumer-grade hardware and does not necessarily require a GPU, making it accessible for local development and private deployments. ...

Downloads: 35 This Week

Last Update: 12 hours ago
See Project
3

ONNX Runtime

ONNX Runtime: cross-platform, high performance ML inferencing

...Support for a variety of frameworks, operating systems and hardware platforms. Built-in optimizations that deliver up to 17X faster inferencing and up to 1.4X faster training.

Downloads: 29 This Week

Last Update: 2026-06-22
See Project
4

whisper.cpp

Port of OpenAI's Whisper model in C/C++

...The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples. whisper.cpp supports integer quantization of the Whisper ggml models. Quantized models require less memory and disk space and depending on the hardware can be processed more efficiently.

Downloads: 537 This Week

Last Update: 2026-06-19
See Project
$300 Free Credits to Build on Google Cloud
New to Google Cloud? Get $300 in credits to explore Compute Engine, BigQuery, Cloud Run, Gemini Enterprise Agent Platform, and more.

Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query petabytes in BigQuery, or build agents with Gemini Enterprise Agent Platform. Once your credits are used, keep building with 20+ always-free tier products including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. No commitment required—just sign up and start building.

Claim $300 Free
5

FlashInfer

FlashInfer: Kernel Library for LLM Serving

...It provides a high-performance framework that integrates seamlessly with existing systems, aiming to reduce latency and improve efficiency in LLM deployments. FlashInfer supports various hardware architectures and is built to scale with the demands of production environments.

Downloads: 1 This Week

Last Update: 3 days ago
See Project
6

LLM.swift

LLM.swift is a simple and readable library

LLM.swift is a Swift package that enables developers to run Large Language Models (LLMs) directly on Apple devices, including iOS, macOS, and watchOS. By leveraging Apple's hardware and software optimizations, LLM.swift facilitates on-device natural language processing tasks, ensuring user privacy and reducing latency associated with cloud-based solutions.

Downloads: 0 This Week

Last Update: 2025-12-06
See Project
7

Phi-3-MLX

Phi-3.5 for Mac: Locally-run Vision and Language Models

Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.

Downloads: 1 This Week

Last Update: 2025-03-13
See Project
8

ChatGLM.cpp

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

ChatGLM.cpp is a C++ implementation of the ChatGLM-6B model, enabling efficient local inference without requiring a Python environment. It is optimized for running on consumer hardware.

Downloads: 0 This Week

Last Update: 2025-01-21
See Project
9

MIVisionX

Set of comprehensive computer vision & machine intelligence libraries

...AMD MIVisionX delivers highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions along with Convolution Neural Net Model Compiler & Optimizer supporting ONNX, and Khronos NNEF™ exchange formats. The toolkit allows for rapid prototyping and deployment of optimized computer vision and machine learning inference workloads on a wide range of computer hardware, including small embedded x86 CPUs, APUs, discrete GPUs, and heterogeneous servers. AMD OpenVX is a highly optimized open-source implementation of the Khronos OpenVX™ 1.3 computer vision specification. It allows for rapid prototyping as well as fast execution on a wide range of computer hardware, including small embedded x86 CPUs and large workstation discrete GPUs.

Downloads: 1 This Week

Last Update: 2026-06-27
See Project
Train ML Models With SQL You Already Know
BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.

Try Free
10

Bolt NLP

Bolt is a deep learning library with high performance

Bolt is a high-performance deep learning inference framework developed by Huawei Noah's Ark Lab. It is designed to optimize and accelerate the deployment of deep learning models across various hardware platforms. Bolt is a light-weight library for deep learning. Bolt, as a universal deployment tool for all kinds of neural networks, aims to automate the deployment pipeline and achieve extreme acceleration. Bolt has been widely deployed and used in many departments of HUAWEI company, such as 2012 Laboratory, CBG and HUAWEI Product Lines. ...

Downloads: 0 This Week

Last Update: 2025-01-30
See Project
11

RamaLama

Simplifies the local serving of AI models from any source

RamaLama is an open-source developer tool that simplifies working with and serving AI models locally or in production by leveraging container technologies like Docker, Podman, and OCI registries, allowing AI inference workflows to be treated like standard container deployments. It abstracts away much of the complexity of configuring AI runtimes, dependencies, and hardware optimizations by detecting available GPUs (or falling back to CPU) and automatically pulling a container image pre-configured for the detected hardware environment. Developers can use familiar container commands to pull, run, and interact with AI models from any source, treating models similarly to how container images are handled in OCI workflows. ...

Downloads: 0 This Week

Last Update: 2026-06-24
See Project
12

ONNX

Open standard for machine learning interoperability

...It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types. Currently we focus on the capabilities needed for inferencing (scoring). ONNX is widely supported and can be found in many frameworks, tools, and hardware. Enabling interoperability between different frameworks and streamlining the path from research to production helps increase the speed of innovation in the AI community.

Downloads: 9 This Week

Last Update: 2026-06-15
See Project
13

OpenVINO

OpenVINO™ Toolkit repository

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. Boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common tasks. Use models trained with popular frameworks like TensorFlow, PyTorch and more. Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud. This open-source version includes several components: namely Model Optimizer, OpenVINO™ Runtime,...

Downloads: 15 This Week

Last Update: 2026-06-09
See Project
14

ExecuTorch

On-device AI across mobile, embedded and edge for PyTorch

ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices.

Downloads: 1 This Week

Last Update: 2026-05-28
See Project
15

NNCF

Neural Network Compression Framework for enhanced OpenVINO

NNCF (Neural Network Compression Framework) is an optimization toolkit for deep learning models, designed to apply quantization, pruning, and other techniques to improve inference efficiency.

Downloads: 0 This Week

Last Update: 2026-06-01
See Project
16

gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models

Gemma.cpp is a C++ implementation for running inference with Gemma models efficiently on CPUs and GPUs. Developed by Google, it allows running large language models (LLMs) like Gemma with minimal hardware, focusing on optimized performance and low latency. Gemma.cpp is intended for developers seeking to deploy LLMs in production environments without needing massive computational resources.

Downloads: 0 This Week

Last Update: 2025-03-25
See Project
17

TensorFlow Model Optimization Toolkit

A toolkit to optimize ML models for deployment for Keras & TensorFlow

...Deploy models to edge devices with restrictions on processing, memory, power consumption, network usage, and model storage space. Enable execution on and optimize for existing hardware or new special purpose accelerators. Choose the model and optimization tool depending on your task. In many cases, pre-optimized models can improve the efficiency of your application. Try the post-training tools to optimize an already-trained TensorFlow model. Use training-time optimization tools and learn about the techniques.

Downloads: 1 This Week

Last Update: 2026-05-12
See Project
18

PEFT

State-of-the-art Parameter-Efficient Fine-Tuning

Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters. Fine-tuning large-scale PLMs is often prohibitively costly. In this regard, PEFT methods only fine-tune a small number of (extra) model parameters, thereby greatly decreasing the computational and storage costs. Recent State-of-the-Art PEFT techniques achieve performance comparable to that of full...

Downloads: 0 This Week

Last Update: 2026-04-16
See Project
19

AIMET

AIMET is a library that provides advanced quantization and compression

...With this goal, QuIC presents the AI Model Efficiency Toolkit (AIMET) - a library that provides advanced quantization and compression techniques for trained neural network models. AIMET enables neural networks to run more efficiently on fixed-point AI hardware accelerators. Quantized inference is significantly faster than floating point inference. For example, models that we’ve run on the Qualcomm® Hexagon™ DSP rather than on the Qualcomm® Kryo™ CPU have resulted in a 5x to 15x speedup. Plus, an 8-bit model also has a 4x smaller memory footprint relative to a 32-bit model. However, often when quantizing a machine learning model (e.g., from 32-bit floating point to an 8-bit fixed point value), the model accuracy is sacrificed.

Downloads: 2 This Week

Last Update: 5 days ago
See Project
20

TensorFlow Probability

Probabilistic reasoning and statistical analysis in TensorFlow

TensorFlow Probability is a library for probabilistic reasoning and statistical analysis. TensorFlow Probability (TFP) is a Python library built on TensorFlow that makes it easy to combine probabilistic models and deep learning on modern hardware (TPU, GPU). It's for data scientists, statisticians, ML researchers, and practitioners who want to encode domain knowledge to understand data and make predictions. Since TFP inherits the benefits of TensorFlow, you can build, fit, and deploy a model using a single language throughout the lifecycle of model exploration and production. ...

Downloads: 0 This Week

Last Update: 2024-11-08
See Project
21

Xorbits Inference

Replace OpenAI GPT with another LLM in your app

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop. Xorbits Inference(Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. With Xorbits...

Downloads: 0 This Week

Last Update: 1 day ago
See Project
22

MegEngine

Easy-to-use deep learning framework with 3 key features

MegEngine is a fast, scalable and easy-to-use deep learning framework with 3 key features. You can represent quantization/dynamic shape/image pre-processing and even derivation in one model. After training, just put everything into your model and inference it on any platform at ease. Speed and precision problems won't bother you anymore due to the same core inside. In training, GPU memory usage could go down to one-third at the cost of only one additional line, which enables the DTR...

Downloads: 9 This Week

Last Update: 2024-04-30
See Project
23

AutoGPTQ

An easy-to-use LLMs quantization package with user-friendly apis

AutoGPTQ is an implementation of GPTQ (Quantized GPT) that optimizes large language models (LLMs) for faster inference by reducing their computational footprint while maintaining accuracy.

Downloads: 0 This Week

Last Update: 2025-01-21
See Project
24

Autodistill

Images to inference with no labeling

...Using autodistill, you can go from unlabeled images to inference on a custom model running at the edge with no human intervention in between. You can use Autodistill on your own hardware, or use the Roboflow hosted version of Autodistill to label images in the cloud.

Downloads: 0 This Week

Last Update: 2024-08-14
See Project
25

EvaDB

Database system for building simpler and faster AI-powered application

...For example, the state-of-the-art object detection model takes multiple GPU years to process just a week’s videos from a single traffic monitoring camera. Besides the money spent on hardware, these models also increase the time that you spend waiting for the model inference to finish.

Downloads: 0 This Week

Last Update: 2023-11-19
See Project

Previous
You're on page 1
Next

Related Searches

whisper-windows-x64.exe

offline artificial intelligence\

whisper-bin-x64.zip

gpt4all

whisper.cpp

openvino

xinference

local llm

jcop english software

gpt4all-installer-win64.exe

Related Categories

Artificial Intelligence

Software Development

System

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise