Vision AI browser agent for automation, testing, and extraction
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
High-Fidelity and Controllable Generation of Textured 3D Assets
A Protocol for Agent-Driven Interfaces
Genome modeling and design across all domains of life
A collection of open-source skills for AI coding agents
Run Coding Agents in Sandboxes
Python tool for browser-based interactive data apps in one file
Decomposable Multiscale Mixing for Time Series Forecasting
Faster and easier training and deployments
Running large language models on a single GPU
MemoryOS is designed to provide a memory operating system
Neural network architecture based on ideas from the original LSTM
Leaderboard Comparing LLM Performance at Producing Hallucinations
4M: Massively Multimodal Masked Modeling
JAX-based neural network library
Flexible Photo Recrafting While Preserving Your Identity
A personal context-agent that learns how you work
Controllable & emotion-expressive zero-shot TTS
Code for a 1D tokenizer and generator
A SOTA open-source image editing model
Build cross-modal and multimodal applications on the cloud
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Powering Amazon custom machine learning chips
Framework for building AI agents that automate complex web tasks