End-to-end pipeline converting generative videos
Implementation of "MobileCLIP" CVPR 2024
CLIP: predict the most relevant text snippet given an image
From Addition, Subtraction, Multiplication, and Division to ML
Bridging LLM and Recommender System
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
CoreNet: A library for training deep neural networks
The ChatGPT Retrieval Plugin lets you easily find personal documents
Simplest working implementation of StyleGAN2
Performance meets Productivity
Cosmos-RL is a flexible and scalable Reinforcement Learning framework
Pretrained time-series foundation model developed by Google Research
PyTorch code and models for the DINOv2 self-supervised learning
Tools for publishing transcripts for Claude Code sessions
Big Model Application Development Practice 1
An Open-source Framework for Data-centric Language Agents
Build multimodal language agents for fast prototype and production
OCR expert VLM powered by Hunyuan's native multimodal architecture
Deep learning optimization library making distributed training easy
Autonomous GPT-4 agent platform
The open source post-building layer for agents
Parallax is a distributed model serving framework
MetricFlow allows you to define, build, and maintain metrics in code
Build cross-modal and multimodal applications on the cloud
A Personalized LLM-powered Agent Framework