Robust Speech Recognition via Large-Scale Weak Supervision
Contexts Optical Compression
Video understanding codebase from FAIR for reproducing video models
The no-nonsense RAG chunking library
StreamSpeech is a seamless model for offline speech recognition
Real-time voice interactive digital human
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Chinese XLNet pre-trained model
End-to-end speech processing toolkit
A fast, powerful, and simple hierarchical vision transformer
Code release for Cut and Learn for Unsupervised Object Detection
PyTorch code and models for VJEPA2 self-supervised learning from video
Framework for building neural networks
Refer and Ground Anything Anywhere at Any Granularity
Language modeling in a sentence representation space
Bailing is a voice dialogue robot similar to GPT-4o
Fast C++ library for linear algebra & scientific computing
Resources, corpora, and tools for Chinese natural language processing
Code release for ConvNeXt V2 model
Code release for ConvNeXt model
The official PyTorch implementation of our paper
Efficient 3D human pose estimation in video using 2D keypoint
VGGFace2 Dataset for Face Recognition
Resources about activity recognition