StreamSpeech is a seamless model for offline speech recognition
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Intelligent companion for seamless AI engineering and research
Open source NLP guide with models, methods, and real use cases
This repository contains the official implementation of FastVLM
A Binary Ninja plugin, MCP server
Machine Learning automation and tracking
A robust, efficient, low-latency speech-to-text library
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Request recommended movies, TV shows and anime to Jellyseerr/Overseerr
Converts text to speech in real time
Toloka-Kit is a Python library for working with Toloka API
Time-lapse Video Generation Models as Metamorphic Simulators
OpenSpace: Make Your Agents Smarter, Low-Cost, Self-Evolving
Anomaly detection related books, papers, videos, and toolboxes
Tokenizer-Free TTS for Multilingual Speech Generation
From-scratch PyTorch implementation of Google's TurboQuant
Agent Zero AI framework
Large Multimodal Models for Video Understanding and Editing
Open source platform for the machine learning lifecycle
OpenAI-style API for open large language models
Large Language Model Text Generation Inference
🐈 nanobot: The Ultra-Lightweight Clawdbot / OpenClaw
Hunyuan Translation Model Version 1.5