Showing 47 open source projects for "recognition"

  • 1
    whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    whisper.cpp is a lightweight C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model, designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples. whisper.cpp supports integer quantization of the Whisper ggml models. ...
    Downloads: 387 This Week
  • 2
    Kaldi

    kaldi-asr/kaldi is the official location of the Kaldi project

    Kaldi is an open source toolkit for speech recognition research. It provides a powerful framework for building state-of-the-art automatic speech recognition (ASR) systems, with support for deep neural networks, Gaussian mixture models, hidden Markov models, and other advanced techniques. The toolkit is widely used in both academia and industry due to its flexibility, extensibility, and strong community support.
    Downloads: 2 This Week
  • 3
    Seamless Communication

    Foundational Models for State-of-the-Art Speech and Text Translation

    ...The system architecture includes a real-time multimodal signal pipeline for audio, video, and sensor data, a dialog manager that can decide when to act (speak, gesture, point) or query, and a cross-modal reasoning layer that fuses perception with semantic context. The research prototype includes components for visual grounding (understanding when a user references something in view), gesture recognition and synthesis, and turn-taking mechanisms that mirror human conversational timing. Because latency and synchronization are critical, the codebase invests in asynchronous scheduling, overlap of perception and reasoning, and fast fallback responses.
    Downloads: 0 This Week
  • 4
    TEN Framework

    TEN, a voice agent framework to create conversational AI.

    TEN (Transformative Extensions Network) is a voice agent framework for creating conversational AI applications, focusing on high performance and modularity.
    Downloads: 0 This Week
  • 5
    audioFlux

    A library for audio and music analysis, feature extraction

    audioFlux is a deep learning tool library for audio and music analysis and feature extraction. It can be used for deep learning, pattern recognition, signal processing, bioinformatics, statistics, finance, etc. It supports dozens of time-frequency analysis transformation methods and hundreds of corresponding time-domain and frequency-domain feature combinations. The extracted features can be fed to deep learning networks for training and used to study various audio tasks such as classification, separation, music information retrieval (MIR), and ASR.
    Downloads: 5 This Week
  • 6
    AI File Sorter

    Local AI file organization with categorization and rename suggestions

    AI File Sorter is a cross-platform desktop application that uses AI (local LLMs run on your computer) to organize files and suggest meaningful file names based on real content, not just filenames or extensions. The app can analyze images locally and propose descriptive rename suggestions (for example, IMG_2048.jpg → clouds_over_lake.jpg). It can also analyze document text to improve categorization and renaming. Supported formats include PDF, DOCX, XLSX, PPTX, ODT, ODS, ODP, and common...
    Downloads: 246 This Week
  • 7
    dibnn

    Drop In the Bucket Neural Networks

    One more lightweight neural network in C.
    Downloads: 0 This Week
  • 8
    Mozilla JPEG Encoder Project

    Improved JPEG encoder

    MozJPEG improves JPEG compression efficiency achieving higher visual quality and smaller file sizes at the same time. It is compatible with the JPEG standard, and the vast majority of the world's deployed JPEG decoders. MozJPEG is compatible with the libjpeg API and ABI. It is intended to be a drop-in replacement for libjpeg. MozJPEG is a strict superset of libjpeg-turbo's functionality. All MozJPEG's improvements can be disabled at run time, and in that case it behaves exactly like...
    Downloads: 6 This Week
  • 9
    Physics simulation software based on user sketches. Using a pattern recognition agent, the app can animate a physics sketch from a blackboard.
    Downloads: 0 This Week
  • 10

    cuneiformplus

    Fork of OCR software cuneiform

    Fork of the OCR software Cuneiform. For the original software, by Cognitive Technologies and Jussi Pakkanen, see https://launchpad.net/cuneiform-linux. Other open source OCR projects include Tesseract by Ray Smith (using the Leptonica image library), GOCR, and OCRAD.
    Downloads: 2 This Week
  • 11
    Speech Recognition in English & Polish

    Speech recognition software for English & Polish languages

    Software for speech recognition in English & Polish languages. Basic versions of SkryBot:
    1. SkryBot Home Speech (English Language) - https://sourceforge.net/projects/skrybotdomowy/files/ReleasesEnglish/InstalatorSkryBotHomeSpeechDemo-2.6.9.18117.exe/download
    2. SkryBot DoMowy (Polish Language) - https://sourceforge.net/projects/skrybotdomowy/files/ReleasesPolish/InstalatorSkryBotDoMowyDemo-2.4.9.18117.exe/download
    More help: https://sourceforge.net/p/skrybotdomowy/wiki/
    Domain advanced versions (Polish Language) 1. ...
    Downloads: 0 This Week
  • 12
    MultiPathNet

    A Torch implementation of the object detection network

    MultiPathNet is a Torch-7 implementation of the “A MultiPath Network for Object Detection” paper (BMVC 2016), developed by Facebook AI Research. It extends the Fast R-CNN framework by introducing multiple network “paths” to enhance feature extraction and object recognition robustness. The MultiPath architecture incorporates skip connections and multi-scale processing to capture both fine-grained details and high-level context within a single detection pipeline. This results in improved detection accuracy across various object sizes and categories compared to standard single-path architectures. The repository supports training, evaluation, and visualization for object detection tasks on popular datasets such as PASCAL VOC and MS COCO. ...
    Downloads: 3 This Week
  • 13
    OpenPR
    OpenPR stands for Open Pattern Recognition project and is intended to be an open source library for algorithms of image processing, computer vision, natural language processing, pattern recognition, machine learning and the related fields.
    Downloads: 3 This Week
  • 14
    Animal is AN IMAging Library written in C. Its simple API supports over 80 image formats, and is intended to make massive use of other image processing libraries. Animal aims at image analysis and recognition. It is mainly the C basis of the SIP toolbox.
    Downloads: 0 This Week
  • 15

    piffle

    Speech recognition for Ubuntu

    Speech recognition system for Ubuntu that takes Palaver as its codebase and integrates Pocketsphinx instead of the Google speech API. The codebase is the minimal version of Palaver.
    Downloads: 0 This Week
  • 16
    The AK toolkit is another kit for building and using Hidden Markov Models (HMMs). Originally developed for handwritten text recognition (HTR) using Bernoulli HMMs, it also implements diagonal Gaussians and can be used for any other purpose.
    Downloads: 0 This Week
  • 17
    OpenCV Facerecog
    Face Detection and Recognition using Intel's OpenCV library
    Downloads: 0 This Week
  • 18
    This software records and replays user interaction with the computer. It can be interfaced through voice commands.
    Downloads: 0 This Week
  • 19
    World Voice Recognition is an open source voice recognition program whose goal is to link together multiple modules created by any developer (a microphone module, a voice recognition module, a module for making the computer speak, or plugins such as a weather plugin). The SDK is compatible with any programming language (ASM, C++, Ada, Java...) on all platforms (Windows, Mac, and Linux).
    Downloads: 0 This Week
  • 20
    An omnifont OCR engine. The long-term goal is recognition of formulas.
    Downloads: 0 This Week
  • 21
    Vedvarsha is an application with two purposes: 1. Handwriting script recognition that extracts recognized letters into documents. 2. OCR (Optical Character Recognition) that works only for non-cursive and isolated characters. It depends upon libsyntactic,
    Downloads: 0 This Week
  • 22
    Real time face tracking and recognition refers to the task of locating human faces in a video stream and identifying the faces by matching them against the database of known faces.
    Downloads: 0 This Week
  • 23
    Arabisc is a speaker-independent, large-vocabulary continuous speech recognizer for the Arabic language, released under GNU license. It is also a collection of open source tools that allows researchers and developers to build speech recognition systems for Arab
    Downloads: 0 This Week
  • 24
    1. Investigation of the cosine transform and inverse transform algorithms, with some voice recognition code. 2. Translator: Croatian, English. 3. 2D-to-3D picture algorithm (principle) and new 2D-to-3D video conversion code with AviSynth video scripting
    Downloads: 1 This Week
  • 25
    OcrGui
    A GUI for OCR programs.
    Downloads: 2 This Week