GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Image processing in Python
Python SDK for the Computer Use model Lux, developed by OpenAGI
Framework for building neural networks
Refer and Ground Anything Anywhere at Any Granularity
Official DeiT repository
Open deep learning compiler stack for CPU, GPU, etc.
An open-source end-to-end VLM-based GUI Agent
Implementation of the Surya Foundation Model for Heliophysics
Scalable generative AI framework built for researchers and developers
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
Large Multimodal Models for Video Understanding and Editing
Capable of understanding text, audio, vision, video
MMEditing is a low-level vision toolbox based on PyTorch
A computer vision framework to create and deploy apps in minutes
Human detection using YOLOv8
OpenFieldAI is an AI-based Open Field Test Rodent Tracker
CoTracker is a model for tracking any point (pixel) on a video
AI-powered tool to quickly remove watermarks from videos flawlessly
Database system for building simpler and faster AI-powered applications
FAIR's research platform for object detection research
An open-source framework for training large multimodal models
This repository contains the official implementation of research
Implementation of Nougat Neural Optical Understanding