Agent toolkit providing semantic retrieval and editing capabilities
Controllable & emotion-expressive zero-shot TTS
DeepMind model for tracking arbitrary points across videos & robotics
An open-source end-to-end VLM-based GUI Agent
A Unified Framework for Image Customization
Python SDK for the Computer Use model Lux, developed by OpenAGI
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
LLM-based reinforcement-learning audio editing model
Pushing the Limits of Mathematical Reasoning in Open Language Models
Tool-integrated Reasoning LLM Agents
Real-time behaviour synthesis with MuJoCo, using Predictive Control
AI-powered tool to quickly remove watermarks from videos flawlessly
Guiding Instruction-based Image Editing via Multimodal Large Language
Official Code for DragGAN (SIGGRAPH 2023)
Let us control diffusion models
Codes for "Chameleon: Plug-and-Play Compositional Reasoning
BCI: Breast Cancer Immunohistochemical Image Generation
Object detection architectures and models pretrained on the COCO data
Reading Wikipedia to Answer Open-Domain Questions
Jieba Chinese word segmentation