inference free download

Showing 26 open source projects for "inference"

1

AudioCraft

AudioCraft is a library for audio processing and generation

AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides inference scripts, checkpoints, and simple Python APIs so you can generate clips from prompts or incorporate the models into applications. ...

Downloads: 9 This Week

Last Update: 2025-10-13
See Project
2

TRELLIS 2

Native and Compact Structured Latents for 3D Generation

TRELLIS.2 is a cutting-edge open-source model and codebase for high-fidelity 3D asset generation from 2D images, developed to push forward the state of the art in image-to-3D generation. At its core is a novel sparse voxel structure called O-Voxel that jointly encodes both geometry and surface appearance, enabling reconstruction and generation of complex 3D shapes with arbitrary topology, open surfaces, and physically based rendering (PBR) textures. The system leverages a large...

Downloads: 71 This Week

Last Update: 2026-01-29
See Project
3

RestorePhotos.io

Restoring old and blurry face photos with AI

...The project is production-oriented, not just a toy: it uses Bytescale for storage and image processing, Vercel for hosting and serverless functions, Auth.js + Neon for authentication and database, and Upstash Redis for rate limiting. This combination makes it a good blueprint for building real-world AI apps that must deal with authentication, quotas, and storage as well as inference.

Downloads: 2 This Week

Last Update: 2025-11-19
See Project
4

Transcoder

Hardware-accelerated video transcoding using Android MediaCodec APIs

Transcoder by DeepMedia is an AI-powered video-to-video speech translation engine that enables fully automated multilingual dubbing. Unlike traditional speech translation systems that rely on multi-stage pipelines, Transcoder directly translates one speaker’s video into another language while preserving facial expressions, lip-sync, and vocal identity. Designed for real-time use and production-grade pipelines, Transcoder combines advanced deep learning models with GPU acceleration to deliver...

Downloads: 8 This Week

Last Update: 2025-03-25
See Project
5

Moshi

A speech-text foundation model for real time dialogue

...Mimi compresses 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, in a fully streaming manner (latency of 80 ms, the frame size), yet performs better than existing non-streaming codecs like SpeechTokenizer (50 Hz, 4 kbps) or SemantiCodec (50 Hz, 1.3 kbps). Moshi models two streams of audio: one corresponds to Moshi, and the other to the user. At inference, the user's stream is taken from the audio input, and Moshi's stream is sampled from the model's output. Alongside these two audio streams, Moshi predicts text tokens corresponding to its own speech, its inner monologue, which greatly improves the quality of its generation. A small Depth Transformer models inter-codebook dependencies for a given time step, while a large, 7B-parameter Temporal Transformer models the temporal dependencies.
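The figures quoted for Mimi can be cross-checked with a little arithmetic. The sketch below uses only the numbers stated in the description; the helper name `frame_stats` is made up for illustration and is not part of Moshi's code.

```python
# Worked arithmetic for the codec figures quoted above.
# `frame_stats` is a hypothetical helper, not part of the Moshi/Mimi codebase.

def frame_stats(frame_rate_hz, bandwidth_bps):
    """Return (frame duration in ms, bits spent per frame)."""
    frame_ms = 1000.0 / frame_rate_hz
    bits_per_frame = bandwidth_bps / frame_rate_hz
    return frame_ms, bits_per_frame

# (frame rate in Hz, bandwidth in bits per second), as listed in the text
codecs = {
    "Mimi": (12.5, 1100),
    "SpeechTokenizer": (50, 4000),
    "SemantiCodec": (50, 1300),
}

for name, (rate, bps) in codecs.items():
    frame_ms, bits = frame_stats(rate, bps)
    print(f"{name}: {frame_ms:.0f} ms per frame, {bits:.0f} bits per frame")

# Number of 24 kHz input samples folded into one Mimi frame
samples_per_frame = 24_000 / 12.5
print(samples_per_frame)  # 1920.0
```

At 12.5 Hz each frame spans 80 ms, which matches the stated latency, and 1.1 kbps works out to 88 bits per frame.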

Downloads: 4 This Week

Last Update: 2024-11-05
See Project
6

LiveAvatar

Streaming Real-time Audio-Driven Avatar Generation

LiveAvatar is an open-source research and implementation project that provides a unified framework for real-time, streaming, interactive avatar video generation driven by audio and other control signals. It implements techniques from state-of-the-art diffusion-based avatar modeling to support infinite-length continuous video generation with low latency, enabling interactive AI avatars that maintain continuity and realism over extended sessions. The project co-designs algorithms and system...

Downloads: 2 This Week

Last Update: 6 days ago
See Project
7

NovaSR

A lightning fast audio upsampler

NovaSR is an extremely lightweight and high-performance audio upsampling model that transforms low-quality 16 kHz audio into clearer, high-fidelity 48 kHz audio with remarkable speed and efficiency. At only about 50 KB in size, the model is orders of magnitude smaller than typical audio super-resolution networks, yet it achieves high quality and realtime performance thanks to its compact architecture and efficient convolutional design. NovaSR is especially valuable for post-processing tasks...

Downloads: 2 This Week

Last Update: 2026-02-26
See Project
8

Bayesian Optimization

Python implementation of global optimization with Gaussian processes

This is a constrained global optimization package built upon Bayesian inference and Gaussian processes that attempts to find the maximum value of an unknown function in as few iterations as possible. This technique is particularly suited to optimizing high-cost functions and to situations where the balance between exploration and exploitation is important. More detailed information, other advanced features, and tips on usage/implementation can be found in the examples folder.

Downloads: 6 This Week

Last Update: 2026-03-16
See Project
9

VCClient

Software that uses AI to perform real-time voice conversion

...It provides both a graphical user interface and API access, making it suitable for casual users as well as developers who want to integrate voice transformation into their own applications. The project also supports GPU acceleration, enabling faster inference and smoother real-time performance on compatible hardware. Additionally, it includes tools for training and managing voice models, giving users the ability to create personalized voice profiles.

Downloads: 20 This Week

Last Update: 2026-03-23
See Project
10

MMDeploy

OpenMMLab Model Deployment Framework

...Models can be exported and run in several backends, and more will be compatible. All kinds of modules in the SDK can be extended, such as Transform for image processing, Net for Neural Network inference, Module for postprocessing and so on. Install and build your target backend. ONNX Runtime is a cross-platform inference and training accelerator compatible with many popular ML/DNN frameworks. Please read getting_started for the basic usage of MMDeploy.

Downloads: 0 This Week

Last Update: 2023-12-25
See Project
11

enhancr

Video Frame Interpolation & Super Resolution using NVIDIA's TensorRT

...The GUI was designed to provide a stunning experience powered by state-of-the-art technologies without feeling clunky and outdated like other alternatives. It features blazing-fast TensorRT inference by NVIDIA, which can speed up AI processes significantly, and comes pre-packaged, with no need to install Docker or WSL (Windows Subsystem for Linux). It also offers NCNN inference by Tencent, which is lightweight and runs on NVIDIA, AMD and even Apple Silicon, in contrast to the much heavier PyTorch-based inference, which here runs only on NVIDIA GPUs.

1 Review

Downloads: 18 This Week

Last Update: 2023-06-07
See Project
12

FrankMocap

A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

...Outputs include textured meshes, joint locations, and model parameters that can be exported to common DCC tools and game engines. The codebase offers pretrained models, clear inference scripts, and utilities to visualize results, making single-camera motion capture approachable on commodity hardware. Researchers and creators use it for motion studies, AR/VR prototyping, character animation, and human-in-the-loop editing.

Downloads: 0 This Week

Last Update: 2025-10-07
See Project
13

EnCodec

State-of-the-art deep learning based audio codec

...EnCodec has applications in speech and music compression, generative modeling, and efficient data transmission for communication systems. The repository includes pretrained checkpoints, PyTorch inference code, and examples for integrating EnCodec as a module in downstream generative or streaming systems.

Downloads: 0 This Week

Last Update: 2025-10-12
See Project
14

Coqui STT

The deep learning toolkit for speech-to-text

Coqui STT is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models. Coqui STT is battle-tested in both production and research. It can return multiple possible transcripts, each with an associated confidence score. Experience the immediacy of script-to-performance. With Coqui text-to-speech, production times go from months to minutes. With Coqui, the post is a pleasure. Effortlessly clone the voices of your talent and have the clone handle the problems...

Downloads: 1 This Week

Last Update: 2022-09-03
See Project
15

FOML

FOML is an expressive logic rule language that supports object modeling, analysis, and inference. It naturally supports model-level activities, such as constraints (extending UML diagrams), dynamic compositional modeling, analysis and reasoning about models, model testing, design pattern modeling, specification of Domain Specific Modeling Languages, and meta-modeling. FOML can reason about: 1. The model meta-data (meta-model level reasoning, or syntax reasoning) 2.

Downloads: 0 This Week

Last Update: 2022-11-17
See Project
16

Video Pre-Training

Learning to Act by Watching Unlabeled Online Videos

...The idea is to learn general priors of control from large-scale, unlabeled video data, and then optionally fine-tune those priors for more goal-directed behavior via environment interaction. The repository contains demonstration models of different widths, fine-tuned variants (e.g. for building houses or early-game tasks), and inference scripts that instantiate agents from pretrained weights. Key modules include the behavioral cloning logic, the agent wrapper, and data loading pipelines (with an accessible skeleton for loading Minecraft demonstration data). The repo also includes a run_agent.py script for testing an agent interactively, and an agent.py module encapsulating the control logic.

Downloads: 0 This Week

Last Update: 2025-10-03
See Project
17

TRACER

Extreme Attention Guided Salient Object Tracing Network

PyTorch implementation of the Extreme Attention Guided Salient Object Tracing Network (AAAI 2022). A fast inference mode now returns the salient object result together with its mask. You can get a clearer salient object by tuning the threshold. Initialization of TRACER from a pre-trained TE-x version is planned for release.

Downloads: 0 This Week

Last Update: 2023-04-05
See Project
18

DeepSpeech

Open source embedded speech-to-text engine

...A pre-trained English model is available for use and can be downloaded following the instructions in the usage docs. If you want to use the pre-trained English model for performing speech-to-text, you can download it (along with other important inference material) from the DeepSpeech releases page.

Downloads: 11 This Week

Last Update: 2021-04-08
See Project
19

TTS

Deep learning for text to speech

TTS is a library for advanced Text-to-Speech generation. Built on the latest research, it was designed to achieve the best trade-off among ease of training, speed, and quality. TTS comes with pre-trained models and tools for measuring dataset quality, and is already used in 20+ languages for products and research projects. Released models in PyTorch, TensorFlow and TFLite. Tools to curate Text2Speech datasets under dataset_analysis. Demo server for model testing. Notebooks for extensive model...

Downloads: 4 This Week

Last Update: 2021-10-18
See Project
20

Consistent Depth

We estimate dense, flicker-free, geometrically consistent depth

...The system builds upon traditional structure-from-motion (SfM) techniques to provide geometric constraints while integrating a convolutional neural network trained for single-image depth estimation. During inference, the model fine-tunes itself to align with the geometric constraints of a specific input video, ensuring stable and realistic depth maps even in less-constrained regions. This approach achieves improved geometric consistency and visual stability compared to prior monocular reconstruction methods. The project can process challenging hand-held video footage, including footage with moderate dynamic motion, making it practical for real-world usage.

Downloads: 0 This Week

Last Update: 2 days ago
See Project
21

YouTube-8M

Starter code for working with the YouTube-8M dataset

youtube-8m is Google’s open source starter code and reference implementation for training and evaluating machine learning models on the YouTube-8M dataset, one of the largest video understanding datasets publicly released. The repository provides a complete pipeline for video-level and frame-level modeling using TensorFlow, including data reading, model training, evaluation, and inference. It was developed to support the YouTube-8M Video Understanding Challenge (hosted on Kaggle and featured at ICCV 2019), enabling researchers and practitioners to benchmark video classification models on a large-scale dataset with millions of labeled videos. The code demonstrates how to process frame-level features, train logistic and deep learning models, evaluate them using metrics like global Average Precision (gAP) and mean Average Precision (mAP), and export trained models for MediaPipe inference.
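To make the two evaluation metrics named above concrete, here is a minimal pure-Python sketch of average precision over a ranked list, with mAP as the mean of per-class values and a pooled "global" variant. It is an illustration under simplifying assumptions, not the repository's evaluation code: the function names are made up, and the official gAP may add details (such as keeping only the top predictions per video) that this sketch omits.

```python
# Illustrative only: these helpers are hypothetical and are not taken from
# the youtube-8m repository.

def average_precision(scores, labels):
    """AP of one ranked list: mean of precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(per_class):
    """mAP: unweighted mean of the per-class AP values."""
    aps = [average_precision(scores, labels) for scores, labels in per_class]
    return sum(aps) / len(aps)

def global_average_precision(per_class):
    """Pooled AP: rank every (score, label) pair in one list, ignoring class."""
    scores = [s for class_scores, _ in per_class for s in class_scores]
    labels = [l for _, class_labels in per_class for l in class_labels]
    return average_precision(scores, labels)

# Toy data: one (scores, labels) pair per class.
toy = [
    ([0.9, 0.8, 0.3], [1, 0, 1]),
    ([0.7, 0.2], [0, 1]),
]
print(mean_average_precision(toy), global_average_precision(toy))
```

The two numbers differ because mAP weights every class equally, while the pooled variant lets confident predictions from any class dominate the ranking.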

Downloads: 0 This Week

Last Update: 2 days ago
See Project
22

PyTorch Natural Language Processing

Basic Utilities for PyTorch Natural Language Processing (NLP)

...With your batch in hand, you can use PyTorch to develop and train your model using gradient descent. For example, check out this example code for training on the Stanford Natural Language Inference (SNLI) Corpus. Now that you've set up your pipeline, you may want to ensure that some functions run deterministically. Wrap any code that's random with fork_rng and you'll be good to go. Now that you've computed your vocabulary, you may want to make use of pre-trained word vectors to set your embeddings.
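The idea behind wrapping random code in fork_rng can be illustrated with the standard library alone. The sketch below is an analogue, not the library's implementation: `fork_rng_sketch` is a made-up name, and it only saves and restores the state of Python's `random` module, whereas the real helper presumably also covers the NumPy and PyTorch generators.

```python
import random
from contextlib import contextmanager

@contextmanager
def fork_rng_sketch(seed=None):
    """Hypothetical stdlib analogue of a fork_rng-style context manager.

    Saves the global `random` state on entry, optionally reseeds inside the
    block, and restores the saved state on exit, so code inside the block
    cannot disturb the outer random stream.
    """
    saved_state = random.getstate()
    if seed is not None:
        random.seed(seed)
    try:
        yield
    finally:
        random.setstate(saved_state)

# The same seed gives the same draw inside the block every time...
with fork_rng_sketch(seed=123):
    first = random.random()
with fork_rng_sketch(seed=123):
    second = random.random()
print(first == second)  # True
```

Because the outer state is restored on exit, draws made after the block continue exactly as if the block had never run.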

Downloads: 2 This Week

Last Update: 2022-08-09
See Project
23

TensorSpace.js

Neural network 3D visualization framework

...TensorSpace is a neural network 3D visualization framework designed not only to show the basic model structure but also to present the processes of internal feature abstraction, intermediate data manipulation and final inference generation. With the TensorSpace API, it is more intuitive to visualize and understand pre-trained models built with TensorFlow, Keras, TensorFlow.js, etc.

Downloads: 1 This Week

Last Update: 2022-02-18
See Project
24

Impro-Visor

Leadsheet notation with auto-generated playback, improvisation advice

Impro-Visor® is a music notation tool for producing monophonic lead sheets, specifically intended to help the improviser. Chord symbols are used to generate backing tracks automatically. Improvisation advice exists in the form of note coloration, database of licks, and automatic lick generation from grammars. Grammars can be learned automatically from transcriptions. Styles can be edited and created by the user. Other features include generation of roadmaps for understanding keys and...

28 Reviews

Downloads: 184 This Week

Last Update: 2019-06-12
See Project
25

RNNLIB

RNNLIB is a recurrent neural network library for sequence learning problems. Applicable to most types of spatiotemporal data, it has proven particularly effective for speech and handwriting recognition. Full installation and usage instructions are given at http://sourceforge.net/p/rnnl/wiki/Home/

2 Reviews

Downloads: 0 This Week

Last Update: 2016-11-28
See Project