A library for accelerating Transformer models on NVIDIA GPUs
Library for OCR-related tasks powered by Deep Learning
High-level training, data augmentation, and utilities for PyTorch
On-device Speech-to-Intent engine powered by deep learning
Benchmarking synthetic data generation methods
Generate blog articles from video or audio
Controllable and fast Text-to-Speech for over 7000 languages
AI discovers 520,000 stable inorganic crystal structures for research
Tooling for the Common Objects In 3D dataset
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories
Making RAG Simpler with Small and Open-Sourced Language Models
Marrying Grounding DINO with Segment Anything & Stable Diffusion
Ultimate meta-skill for generating best-in-class Claude Code skills
Multi-agent autonomous startup system for Claude Code
A New Axis of Sparsity for Large Language Models
"Big Model" trains a multimodal VLM with 26M parameters
AI Agent Networks for Open Collaboration
Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster
LLM-based autonomous agent that does comprehensive online research
Superfast AI decision making and processing of multi-modal data
GLM-4 series: Open Multilingual Multimodal Chat LMs
Open-weight, large-scale hybrid-attention reasoning model
Real-time, voice-interactive digital human
Open Source Differentiable Computer Vision Library