Showing 21 open source projects for "speaker recognition"

  • 1
    sherpa-onnx

    Speech-to-text, text-to-speech, and speaker recognition

    Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, WebSocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, and Flutter.
    Downloads: 235 This Week
  • 2
    Vosk Speech Recognition Toolkit

    Offline speech recognition API for Android, iOS, Raspberry Pi

    Vosk is an offline open source speech recognition toolkit. It enables speech recognition for 20+ languages and dialects - English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi, Czech, Polish. More to come. Vosk models are small (50 MB) but provide continuous large-vocabulary transcription, zero-latency response with a streaming API, a reconfigurable vocabulary, and speaker identification. ...
    Downloads: 111 This Week
  • 3
    WhisperX

    Automatic Speech Recognition with Word-level Timestamps

    WhisperX is an advanced speech recognition system built on top of OpenAI’s Whisper model, designed to improve transcription accuracy and timing precision for long-form audio. It addresses key limitations of standard Whisper implementations by introducing voice activity detection and forced alignment techniques to produce word-level timestamps. The system enables batched inference, significantly increasing transcription speed while maintaining high accuracy.
    Downloads: 23 This Week
  • 4
    The SpeechBrain Toolkit

    A PyTorch-based Speech Toolkit

    ...It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains. SpeechBrain supports state-of-the-art methods for end-to-end speech recognition, including models based on CTC, CTC+attention, transducers, transformers, and neural language models relying on recurrent neural networks and transformers. Speaker recognition is already deployed in a wide variety of realistic applications. SpeechBrain provides different models for speaker recognition, including X-vector, ECAPA-TDNN, PLDA, and contrastive learning. ...
    Downloads: 0 This Week
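    The verification back ends listed above (PLDA, cosine scoring) all compare fixed-length speaker embeddings such as X-vectors or ECAPA-TDNN outputs. As a rough illustration of the simplest of these, cosine scoring, here is a pure-Python sketch; the four-dimensional "embeddings" and the 0.7 threshold are invented for the example (real embeddings have hundreds of dimensions and come from a trained network):

    ```python
    from math import sqrt

    def cosine_score(emb_a, emb_b):
        """Cosine similarity between two fixed-length speaker embeddings."""
        dot = sum(a * b for a, b in zip(emb_a, emb_b))
        norm_a = sqrt(sum(a * a for a in emb_a))
        norm_b = sqrt(sum(b * b for b in emb_b))
        return dot / (norm_a * norm_b)

    # Toy embeddings, invented for illustration only.
    enroll = [0.9, 0.1, 0.3, 0.2]
    same_speaker = [0.8, 0.2, 0.25, 0.3]
    other_speaker = [-0.1, 0.9, -0.4, 0.1]

    # Accept the trial if similarity exceeds a tuned threshold.
    THRESHOLD = 0.7
    print(cosine_score(enroll, same_speaker) > THRESHOLD)   # True
    print(cosine_score(enroll, other_speaker) > THRESHOLD)  # False
    ```

    In a real system the threshold is tuned on held-out trials to balance false accepts against false rejects; PLDA replaces the raw cosine with a probabilistic score over the same embeddings.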
  • 5
    Scriberr

    Self-hosted AI audio transcription

    ...Unlike cloud-based transcription services, Scriberr runs entirely on the user’s machine, ensuring that sensitive recordings are never sent to third-party servers and remain fully under user control. It leverages modern speech recognition models such as Whisper and other advanced architectures to deliver precise transcripts with word-level timing and speaker identification. The application includes a polished user interface that simplifies the management of recordings, transcripts, and annotations, making it suitable for both casual users and professionals handling large volumes of audio. ...
    Downloads: 3 This Week
  • 6
    Whisper-WebUI

    A web UI for easy subtitle generation using the Whisper model

    ...It supports multiple input sources including local files, YouTube content, and microphone input, making it versatile for different workflows. Whisper-WebUI also includes advanced preprocessing and postprocessing features such as voice activity detection, background music separation, and speaker diarization, enabling more accurate and structured outputs.
    Downloads: 3 This Week
  • 7
    pyVideoTrans

    Translate a video from one language to another and embed dubbing

    pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the...
    Downloads: 17 This Week
  • 8
    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    Step-Audio 2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation. Unlike many pipelines that separate speech recognition, processing, and synthesis, it processes raw audio, reasons about semantic and paralinguistic content (such as emotion, speaker characteristics, and non-verbal cues), and can generate contextually appropriate responses, including potentially generating or transforming audio output. It integrates a latent-space audio encoder, discrete acoustic tokens, and reinforcement-learning-based training (CoT + RL) to enhance its ability to capture and reproduce voice styles, intonations, and subtle vocal cues. ...
    Downloads: 0 This Week
  • 9
    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes. This combination allows researchers to leverage modern neural architectures while still benefiting from the robust data preparation practices developed in the speech community. ...
    Downloads: 3 This Week
  • 10
    JSpeech

    Java library designed to integrate Speech-to-Text

    jSpeech is a Java library designed to integrate Speech-to-Text (STT) capabilities, command control, and diarization (speaker identification) into applications in a simple, modular, and decoupled way.
    Downloads: 0 This Week
  • 11
    wukong-robot

    Chinese voice dialogue robot/smart speaker project

    wukong-robot is a Chinese voice assistant / smart speaker project built to let makers and hackers design highly customizable voice-controlled devices. It combines wake-word detection, automatic speech recognition, natural language understanding, and text-to-speech into a single framework aimed at the Chinese-speaking ecosystem. The project is positioned as a simple, flexible, and elegant platform that can run on devices like Raspberry Pi and other Linux-based boards, making it suitable for DIY smart speakers and home-automation hubs. ...
    Downloads: 1 This Week
  • 12
    CMU Sphinx

    Speech Recognition Toolkit

    Maintenance and improvement work has MOVED to https://cmusphinx.github.io/ - please go there for the most recent software and documentation. CMUSphinx is a speaker-independent large vocabulary continuous speech recognizer released under a BSD-style license. It is also a collection of open source tools and resources that allows researchers and developers to build speech recognition systems.
    Downloads: 584 This Week
  • 13
    Lip Reading

    Cross Audio-Visual Recognition using 3D Architectures

    ...This code is aimed to provide the implementation for Coupled 3D Convolutional Neural Networks for audio-visual matching. Lip-reading can be a specific application for this work. Audio-visual recognition (AVR) has been considered as a solution for speech recognition tasks when the audio is corrupted, as well as a visual recognition method used for speaker verification in multi-speaker scenarios. The approach of AVR systems is to leverage the extracted information from one modality to improve the recognition ability of the other modality by complementing the missing information. ...
    Downloads: 1 This Week
  • 14
    Distant Speech Recognition

    Beamforming and Speech Recognition Toolkit

    BTK contains C++ and Python libraries that implement speech processing and microphone array techniques such as speech feature extraction, speech enhancement, speaker tracking, beamforming, dereverberation, and echo cancellation algorithms. The Millennium ASR provides C++ and Python libraries for automatic speech recognition; it implements a weighted finite state transducer (WFST) decoder, plus training and adaptation methods. These toolkits are meant to facilitate research and development of automatic distant speech recognition.
    Downloads: 0 This Week
  • 15
    These Matlab codes are the implementation of the TASLP paper, "Speech enhancement based on student t modeling of Teager energy operated perceptual wavelet packet coefficients and a custom thresholding function". In this paper, we showed how the Student's t distribution can be used to model the perceptual wavelet packet coefficients, and how a proper thresholding based on that performs speech enhancement in an efficient way. Teager energy has been used to amplify the difference between the noisy...
    Downloads: 0 This Week
  • 16
    Speaker Recognition System

    Speaker Recognition System - Matlab source code

    Speaker identity is correlated with the physiological and behavioral characteristics of the speaker. These characteristics exist both in the spectral envelope (vocal tract characteristics) and in the supra-segmental features (voice source characteristics and dynamic features spanning several segments). Index Terms: speaker, recognition, verification, sound, words.
    Downloads: 0 This Week
  • 17
    Speech Sentiment Analysis

    Voice to Text Sentiment Analysis

    Voice-to-text sentiment analysis converts the audio signal to text and calculates the sentiment polarity of the resulting sentence. The code currently works on one sentence at a time, and sentiment scoring is done on the spot as the speaker talks. The speech-to-text system currently used is the MS Windows speech-to-text converter; however, significant modifications can be made for audio recognition with a refined signal processing system. The sentiment operator in TextBlob is used for sentiment orientation scoring. The code has been developed in Python 2.7. The following packages are required to be installed before running the program. ...
    Downloads: 0 This Week
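    Lexicon-based polarity scoring of the kind TextBlob's sentiment operator performs can be sketched in a few lines of plain Python. The word scores below are invented for illustration; TextBlob itself uses a much larger curated lexicon and also reports subjectivity:

    ```python
    # Invented mini-lexicon mapping words to polarity in [-1.0, 1.0].
    LEXICON = {"great": 0.8, "good": 0.7, "fine": 0.2,
               "bad": -0.7, "terrible": -1.0, "poor": -0.4}

    def polarity(sentence):
        """Average lexicon score over matched words, in [-1.0, 1.0]."""
        words = [w.strip(".,!?").lower() for w in sentence.split()]
        scores = [LEXICON[w] for w in words if w in LEXICON]
        return sum(scores) / len(scores) if scores else 0.0

    print(polarity("The audio quality was great!"))    # 0.8
    print(polarity("Terrible microphone, poor talk"))  # -0.7
    ```

    Feeding such a scorer one transcribed sentence at a time, as this project does, gives a live polarity estimate as the speech-to-text stream arrives.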
  • 18
    avimmir

    (audio, video, image) Multimedia Multimodal Information Retrieval

    Audio classification; speaker segmentation; speaker clustering; speaker recognition; spoken document retrieval; image retrieval; video retrieval; etc.
    Downloads: 0 This Week
  • 19
    SRM2 Sound Recognizer Mobile 2

    A sound recognition framework developed for the J2ME platform

    The SRM Framework 2 was developed for the J2ME platform. It supports speaker-dependent and speaker-independent sound recognition through the DTW (Dynamic Time Warping) and HMM (Hidden Markov Models) recognition techniques, and it allows importing and exporting data at any stage of recognition, enabling the use of external resources as well as running steps of the tool on remote terminals. It has the following characteristics: DTW; HMM (discrete models); support for PCM sound; vector quantization using the k-means algorithm; the Forward, Backward, and Viterbi algorithms; the Baum-Welch algorithm for training; and data import and export features. Further information is in Gaita 2012.
    Downloads: 0 This Week
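    DTW, the first of the two matching techniques this framework uses, aligns two sequences of different lengths by allowing insertions and deletions along the time axis. A minimal sketch over 1-D scalar sequences (SRM2's real inputs are vector-quantized PCM features, not scalars):

    ```python
    def dtw_distance(s, t):
        """Dynamic time warping distance between two 1-D sequences."""
        inf = float("inf")
        n, m = len(s), len(t)
        d = [[inf] * (m + 1) for _ in range(n + 1)]
        d[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(s[i - 1] - t[j - 1])
                d[i][j] = cost + min(d[i - 1][j],      # insertion
                                     d[i][j - 1],      # deletion
                                     d[i - 1][j - 1])  # match
        return d[n][m]

    # A time-stretched copy of a template stays close under DTW,
    # while a different pattern does not.
    ref = [1, 2, 3, 2, 1]
    stretched = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
    other = [3, 1, 3, 1, 3]
    print(dtw_distance(ref, stretched))  # 0.0
    print(dtw_distance(ref, other) > 0)  # True
    ```

    This insensitivity to speaking rate is why DTW-based template matching works for isolated-word recognition on constrained devices like J2ME phones.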
  • 20
    Speaker recognition with Matlab
    Downloads: 0 This Week
  • 21
    Arabisc

    Arabisc is a speaker-independent large vocabulary continuous speech recognizer for the Arabic language, released under the GNU license. It is also a collection of open source tools that allows researchers and developers to build speech recognition systems for Arabic.
    Downloads: 1 This Week