Automatically translates the text of a video based on a subtitle file
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Offline text-to-speech synthesis for Python
PyTorch3D is FAIR's library of reusable components for deep learning
Aider is AI pair programming in your terminal
Simplifies the local serving of AI models from any source
CodeGeeX4-ALL-9B, a versatile model for all AI software development
Label Studio is a multi-type data labeling and annotation tool
Generate audiobooks from e-books
A nearly-live implementation of OpenAI's Whisper
Code for running inference with the SAM 3D Body Model 3DB
Code release for Cut and Learn for Unsupervised Object Detection
ContextGem: Effortless LLM extraction from documents
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Agent S: an open agentic framework that uses computers like a human
Release for Improved Denoising Diffusion Probabilistic Models
"Big Model" trains a visual multimodal VLM with 26M parameters
Collection of Gemma 3 variants that are trained for performance
Tool for exploring and debugging transformer model behaviors
Automate browser-based workflows with LLMs and Computer Vision
Evaluation suite designed to assess the performance of LLMs
TextWorld is a sandbox learning environment for the training
Document Image Parsing via Heterogeneous Anchor Prompting
Enable AI to control your desktop, mobile and HMI devices
MapAnything: Universal Feed-Forward Metric 3D Reconstruction