recognition free download

WhisperJAV

Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

...The framework supports several speech recognition models, including Qwen-based ASR systems and fine-tuned Whisper models trained on domain-specific dialogue.

Downloads: 7 This Week

Last Update: 2026-05-11

See Project

LLM-Aided OCR Project

Enhances Tesseract OCR output using LLMs (local or API)

LLM Aided OCR is an open-source system designed to improve optical character recognition accuracy by combining traditional OCR tools with large language models. The project addresses common OCR challenges such as distorted text, unusual fonts, historical documents, and complex layouts that often produce inaccurate results with standard OCR pipelines. The system first extracts raw text using OCR engines and then applies language models to analyze and correct recognition errors based on context. ...

Downloads: 3 This Week

Last Update: 2026-03-22

See Project

Xorbits Inference

Replace OpenAI GPT with another LLM in your app

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop. Xorbits Inference(Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy and serve your or state-of-the-art built-in models using just a single command. ...

Downloads: 2 This Week

Last Update: 2026-06-05

See Project

NVIDIA NeMo

Toolkit for conversational AI

NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI architectures are typically large and require a lot of data and compute for training. ...

Downloads: 1 This Week

Last Update: 2026-04-22

See Project

Qwen2-Audio

Repo of Qwen2-Audio chat & pretrained large audio language model

...It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound classification, emotion, etc.), and offers pretrained models (e.g. 7B) released via ModelScope and Hugging Face. Code & examples provided with Hugging Face transformers, and usage via AutoProcessor, model classes etc. High performance on many standard benchmarks: ASR, speech-emotion recognition, vocal sound classification, speech translation etc.

Downloads: 0 This Week

Last Update: 2025-09-23

See Project

Flock

Flock is a workflow-based low-code platform for building chatbots

...The platform supports multi-agent collaboration, allowing developers to design workflows where different agents handle specialized tasks within the same system. Flock also includes features such as intent recognition, code execution nodes, and human-in-the-loop approval processes that make it suitable for production AI applications.

Downloads: 2 This Week

Last Update: 1 day ago

See Project

Streamer-Sales

LLM Large Model of Selling Anchor

...The system integrates multiple AI technologies including retrieval-augmented generation to incorporate product knowledge, speech synthesis to convert generated scripts into voice output, and digital human generation to create virtual hosts. It also supports automatic speech recognition and agent-based tools that can retrieve additional information such as logistics or product details during live sessions.

Downloads: 2 This Week

Last Update: 2026-03-05

See Project

NLP-Knowledge-Graph

Research and application of technologies such as nl processing

...The project aims to help researchers and developers understand how structured knowledge representations can enhance language processing systems. It includes curated materials covering key topics such as knowledge graph construction, entity recognition, relation extraction, graph embeddings, and semantic reasoning. By combining NLP techniques with graph-based data models, knowledge graphs allow systems to represent complex relationships between entities and improve tasks such as question answering, information retrieval, and recommendation systems. The repository aggregates research papers, technical articles, tutorials, and open-source tools related to these areas.

Downloads: 0 This Week

Last Update: 2026-03-06

See Project

LLPlayer

The media player for language learning, with dual subtitles

LLPlayer is an open-source media player designed specifically for language learning through video content. Unlike traditional media players, the application focuses on advanced subtitle-related features that help learners understand and interact with foreign language media more effectively. The player supports dual subtitles so users can simultaneously view text in both the original language and their native language while watching videos. It can also automatically generate subtitles in real...

Downloads: 62 This Week

Last Update: 2026-04-19

See Project

Fun Audio Chat

Large Audio Language Model built for natural interactions

Fun Audio Chat is an interactive voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can speak simply and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. The system supports dynamic audio input and output, meaning it can handle different voices, tones, and conversational contexts without forcing users into typed interactions. ...

Downloads: 0 This Week

Last Update: 2026-02-27

See Project

LocalAI

The free, Open Source alternative to OpenAI, Claude and others

LocalAI is an open-source platform that allows users to run large language models and other AI systems locally on their own hardware. It acts as a drop-in replacement for APIs such as OpenAI, enabling developers to build AI-powered applications without relying on external cloud services. The platform supports a wide range of model types, including text generation, image creation, speech processing, and embeddings. LocalAI can run on consumer-grade hardware and does not necessarily require a...

Downloads: 25 This Week

Last Update: 4 days ago

See Project

Qwen2.5-Omni

Capable of understanding text, audio, vision, video

...It holds state-of-the-art performance in many multimodal benchmarks, particularly spoken language understanding, audio reasoning, image/video understanding, etc. Very strong benchmark performance across modalities (audio understanding, speech recognition, image/video reasoning) and often outperforming or matching single-modality models at a similar scale. Real-time streaming responses, including natural speech synthesis (text-to-speech) and chunked inputs for low latency interaction.

Downloads: 0 This Week

Last Update: 2025-09-23

See Project

Qwen3-Coder

Qwen3-Coder is the code version of Qwen3

An open-source ChatGPT app with a voice

...Users can review past chat sessions, modify system prompts, and adjust model parameters such as temperature to control response creativity. The platform also integrates speech capabilities by connecting to text-to-speech systems and speech recognition engines, enabling voice-based conversations with the AI assistant. Additional features include message editing, response regeneration, and the ability to share conversations through public links.

Downloads: 2 This Week

Last Update: 2026-03-06

See Project

Search Results for "recognition"

Showing 23 open source projects for "recognition"

WhisperJAV

LLM-Aided OCR Project

Xorbits Inference

NVIDIA NeMo

Qwen2-Audio

Flock

Streamer-Sales

NLP-Knowledge-Graph

LLPlayer

Fun Audio Chat

LocalAI

Qwen2.5-Omni

Qwen3-Coder

Qwen3-VL

spacy-llm

GLM-4-Voice

Aix-DB

RunAnywhere

ML Ferret

Qwen3-Omni

Qwen-VL

Autolabel

Chat with GPT

Search Results for "recognition"

Showing 23 open source projects for "recognition"

WhisperJAV

LLM-Aided OCR Project

Xorbits Inference

NVIDIA NeMo

Qwen2-Audio

Flock

Streamer-Sales

NLP-Knowledge-Graph

LLPlayer

Fun Audio Chat

LocalAI

Qwen2.5-Omni

Qwen3-Coder

Qwen3-VL

spacy-llm

GLM-4-Voice

Aix-DB

RunAnywhere

ML Ferret

Qwen3-Omni

Qwen-VL

Autolabel

Chat with GPT

Related Searches

Related Categories