Fast and Universal 3D reconstruction model for versatile tasks
4M: Massively Multimodal Masked Modeling
A Production-ready Reinforcement Learning AI Agent Library
A PyTorch library for implementing flow matching algorithms
Repo of Qwen2-Audio chat & pretrained large audio language model
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
OCR expert VLM powered by Hunyuan's native multimodal architecture
Controllable & emotion-expressive zero-shot TTS
Global weather forecasting model using graph neural networks and JAX
Language modeling in a sentence representation space
An AI-powered security review GitHub Action using Claude
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
Capable of understanding text, audio, vision, and video
Open-source large language model family from Tencent Hunyuan
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Audio foundation model excelling in audio understanding
A state-of-the-art open visual language model
ChatGPT interface with better UI
Stable Diffusion with Core ML on Apple Silicon
Towards Real-World Vision-Language Understanding
Pushing the Limits of Mathematical Reasoning in Open Language Models
The ChatGPT Retrieval Plugin lets you easily find personal documents
High-Resolution Image Synthesis with Latent Diffusion Models
Chat & pretrained large vision language model
Open-source, high-performance Mixture-of-Experts large language model