Showing 176 open source projects for "voice command java"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    OpenAI-Compatible Edge-TTS API

    OpenAI-Compatible Edge-TTS API

    Free, high-quality text-to-speech API endpoint to replace OpenAI

    ...A Docker image is provided for one-command deployment, and environment variables can be used to configure default voice, language, response format, authentication, and logging options.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    sherpa-onnx

    sherpa-onnx

    Speech-to-text, text-to-speech, and speaker recognition

    Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.
    Downloads: 150 This Week
    Last Update:
    See Project
  • 3
    Harbor LLM

    Harbor LLM

    Run a full local LLM stack with one command using Docker

    Harbor is an open source, containerized toolkit designed to simplify running local large language model (LLM) environments. It combines a CLI and companion app to launch backends, frontends, and supporting services with minimal setup. With a single command, users can start preconfigured tools like Ollama and Open WebUI, enabling chat, workflows, and integrations immediately. Harbor supports multiple inference engines, including llama.cpp and vLLM, and connects them seamlessly to user interfaces. It also includes tools for web retrieval, image generation, voice interaction, and workflow automation. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    Audiblez

    Audiblez

    Generate audiobooks from e-books

    Audiblez is a tool for generating high-quality .m4b audiobooks directly from .epub e-books using the Kokoro-82M neural text-to-speech model. It focuses on making audiobook creation easy and fast: from a single command, the tool splits an e-book into chapters, synthesizes audio for each section, and then merges the results into a structured audiobook with chapter-based WAV files and a final .m4b container. The Kokoro-82M model it uses is compact (82M parameters) yet natural sounding, trained...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 5
    SafeClaw

    SafeClaw

    Chat with it via text and voice

    SafeClaw is an open-source, entirely local alternative to cloud-based AI assistants like OpenClaw, enabling users to build a personal assistant that runs on their own machine without incurring API usage charges or exposing data to third-party services. It emphasizes privacy and predictability by using traditional programming, rule-based intent parsing, and established machine learning tools rather than large language models, meaning there are no per-token API costs and deterministic...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    edge-tts

    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 7
    MLX-Audio

    MLX-Audio

    A text-to-speech, speech-to-text and speech-to-speech library

    MLX-Audio is a speech library built on Apple’s MLX framework and optimized for Apple Silicon machines (M-series Macs). It focuses on text-to-speech and speech-to-speech workflows, with APIs and a command-line interface that make it easy to generate high-quality audio from text. Because it uses MLX and targets Apple Silicon, inference is fast and can take advantage of hardware acceleration and quantization for efficient on-device performance. The project provides a straightforward CLI (mlx_audio.tts.generate) as well as a Python API for programmatic generation of audio, including parameters for voice choice, speed, language hints, output format, and sample rate. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    MiniMax-MCP

    MiniMax-MCP

    Official MiniMax Model Context Protocol (MCP) server

    MiniMax-MCP is the official Model Context Protocol (MCP) server for accessing MiniMax’s multimodal generative APIs from MCP-compatible clients. It acts as a bridge between tools like Claude Desktop, Cursor, Windsurf, OpenAI Agents, and the MiniMax platform, exposing capabilities such as text-to-speech, voice cloning, image generation, text-to-image, video generation, image-to-video, text-to-video, and music generation. The server is written in Python and distributed under the MIT license,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    annyang!

    annyang!

    Speech recognition for your site

    annyang is a tiny javascript library that lets your visitors control your site with voice commands. annyang supports multiple languages, has no dependencies, weighs just 2kb and is free to use. annyang understands commands with named variables, splats, and optional words. Use named variables for one word arguments in your command. Use splats to capture multi-word text at the end of your command (greedy). Use optional words or phrases to define a part of the command as optional. annyang plays nicely with all browsers, progressively enhancing browsers that support SpeechRecognition, while leaving users with older browsers unaffected. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • 10
    Whisper-WebUI

    Whisper-WebUI

    A Web UI for easy subtitle using whisper model

    Whisper WebUI is an open-source browser-based interface that simplifies the use of Whisper speech recognition models by providing an intuitive graphical environment for transcription, translation, and subtitle generation. Built with Gradio, it allows users to upload audio or video files, process them locally, and generate accurate text outputs without relying on command-line tools. The platform integrates optimized implementations such as faster-whisper, significantly improving transcription speed and reducing memory usage compared to standard models. It supports multiple input sources including local files, YouTube content, and microphone input, making it versatile for different workflows. Whisper WebUI also includes advanced preprocessing and postprocessing features such as voice activity detection, background music separation, and speaker diarization, enabling more accurate and structured outputs.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 11
    Stanford CoreNLP

    Stanford CoreNLP

    Stanford CoreNLP, a Java suite of core NLP tools

    CoreNLP is your one stop shop for natural language processing in Java! CoreNLP enables users to derive linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations. CoreNLP currently supports 6 languages, Arabic, Chinese, English, French, German, and Spanish. The centerpiece of CoreNLP is the pipeline. Pipelines take in raw text,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    Eris

    Eris

    A NodeJS Discord library

    A Node.js wrapper for interfacing with Discord. You will need NodeJS 10.4+. If you need voice support you will also need Python 2.7 and a C++ compiler. Create a directory for your bot, and change to that directory in your command line. If you want to be more updated (at the expense of stability), you can install the beta builds instead. Eris supports a few optional libraries that could potentially improve bot performance but may require additional dependencies.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Moltis

    Moltis

    A Rust-native claw you can trust

    Moltis is an open-source personal AI assistant platform written in Rust that is designed to run as a fully self-hosted, local-first agent environment. It compiles the entire assistant stack, including the web interface, model routing, memory, and tools, into a single self-contained binary with no external runtime dependencies. The system supports multiple large language model providers alongside local models, enabling users to maintain privacy while still accessing cloud capabilities when...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    Open Interpreter

    Open Interpreter

    A natural language interface for computers

    Open Interpreter is an open-source tool that provides a natural-language interface for interacting with your computer. It lets large language models (LLMs) run code locally (Python, JavaScript, shell, etc.), enabling you to ask your computer to do tasks like data analysis, file manipulation, browsing, etc. in human terms (“chat with your computer”), with safeguards. Runs locally or via configured remote LLM servers/inference backends, giving flexibility to use models you trust or have...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 15
    JavaCV

    JavaCV

    Java interface to OpenCV, FFmpeg, and more

    JavaCV uses wrappers from the JavaCPP Presets of commonly used libraries by researchers in the field of computer vision (OpenCV, FFmpeg, libdc1394, FlyCapture, Spinnaker, OpenKinect, librealsense, CL PS3 Eye Driver, videoInput, ARToolKitPlus, flandmark, Leptonica, and Tesseract) and provides utility classes to make their functionality easier to use on the Java platform, including Android. JavaCV also comes with hardware accelerated full-screen image display (CanvasFrame and GLCanvasFrame),...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 16
    Telegram SMS

    Telegram SMS

    An SMS-forwarding Robot Running on Your Android Device

    With the power of Telegram SMS, your multi-phone life is much easier than before. Receiving and sending SMS, relaying APP notifications, monitoring battery status. All stuff can be done with a single Telegram bot. You can use the bot in both private chat and group chat, in case you have more than 2 Android phones, or sharing the bot with other people. Telegram SMS connects with Telegram's bot API server directly, no 3rd-party services involved.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 17
    Qodo Cover

    Qodo Cover

    AI tool that generates tests to improve code coverage quickly

    Qodo Cover is an open source developer tool designed to automate the creation of unit tests using generative AI, helping teams improve code coverage with minimal manual effort. It operates as a command-line interface and can also be integrated into continuous integration workflows, making it adaptable to different development environments. It analyzes an existing codebase, identifies gaps in test coverage, and generates new tests that target uncovered or weakly tested areas. It follows an...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Weka

    Weka

    Machine learning software to solve data mining problems

    Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code.
    Leader badge
    Downloads: 10,142 This Week
    Last Update:
    See Project
  • 19
    elevenlabs-api

    elevenlabs-api

    elevenlabs-api is an open source Java wrapper around the ElevenLabs

    Elevenlabs-api is an open-source Java wrapper around the ElevenLabs Voice Synthesis and Cloning Web API. Compiled JARs are available via the Releases tab. To access your ElevenLabs API key, head to the official website, you can view your xi-API-key using the 'Profile' tab on the website. To set up your ElevenLabs API key, you must register it with the ElevenLabsAPI Java API.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 20
    KoboldCpp

    KoboldCpp

    Run GGUF models easily with a UI or API. One File. Zero Install.

    KoboldCpp is an easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI. It's a single self-contained distributable that builds off llama.cpp and adds many additional powerful features.
    Leader badge
    Downloads: 409 This Week
    Last Update:
    See Project
  • 21
    MOA - Massive Online Analysis

    MOA - Massive Online Analysis

    Big Data Stream Analytics Framework.

    A framework for learning from a continuous supply of examples, a data stream. Includes classification, regression, clustering, outlier detection and recommender systems. Related to the WEKA project, also written in Java, while scaling to adaptive large scale machine learning.
    Downloads: 35 This Week
    Last Update:
    See Project
  • 22
    chessPDFBrowser

    chessPDFBrowser

    Chess application whichs allows working with chess PDF books and PGNs.

    Chess application which allows working with PDFs and PGNs. You can work with the chess games of the PDF and edit their tree of variants. Graphical environment. Standard PGN TAGs. PGN comments. Ocr like (Fen string detection from chess board position images). Connection to Uci chess engines (like stockfish). Position analysis, full game analysis. You can now play games against uci engines. pdf2pgn command line command included. Detailed documentation. Multilanguage...
    Downloads: 42 This Week
    Last Update:
    See Project
  • 23
    DocWire SDK

    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20AI driven data processing tool, has received award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24
    Audio Webui

    Audio Webui

    A webui for different audio related Neural Networks

    ...For more advanced users, it exposes a rich set of command-line flags to control behavior such as skipping installation, disabling venv, changing model cache directories, sharing Gradio links, setting passwords, and specifying themes or ports.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Jason is a fully-fledged interpreter for an extended version of AgentSpeak, a BDI agent-oriented logic programming language, and is implemented in Java. Using JADE a multi-agent system can be distributed over a network effortlessly. This project was moved to https://jason-lang.github.io
    Downloads: 28 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB