Robust Speech Recognition via Large-Scale Weak Supervision
Provides code for running inference with the SegmentAnything Model
A standalone, portable generic Ada package for decoding images
A Family of Open Foundation Models for Code Intelligence
Multimodal model achieving SOTA performance
Accurate × Fast × Comprehensive
Industrial-level controllable zero-shot text-to-speech system
AV1 Image File Format Specification - ISO-BMFF/HEIF derivative
An incredibly fast, pure Elixir JSON library
Boilerplate-free Kotlin config library for loading configuration files
TorchMultimodal is a PyTorch library
End-to-end speech processing toolkit
DeepSeek LLM: Let there be answers
Audio codecs extracted from Android Open Source Project
110+ developer tools as native macOS, Linux & Windows desktop apps.
A Conversational Speech Generation Model
An Ada 2012 library for reading and writing PNG image files
A High Performance Library for Sequence Processing and Generation
Singing Voice Synthesis via Shallow Diffusion Mechanism
Code release for "Masked-attention Mask Transformer
PyTorch implementation of MAE
Facebook AI Research Sequence-to-Sequence Toolkit