Parse files for optimal RAG
Enhances Tesseract OCR output using LLMs (local or API)
Code and models for the ICML 2024 paper NExT-GPT
Open-source multi-speaker long-form text-to-speech model
GLM-4-Voice | End-to-End Chinese-English Conversational Model
An Open Source implementation of Notebook LM with more flexibility
Qwen2.5-VL is a multimodal large language model series
Fast stable diffusion on CPU and AI PC
StarVector is a foundation model for SVG generation
Accurate × Fast × Comprehensive
Evaluate and monitor ML models from validation to production
Open-Sora: Democratizing Efficient Video Production for All
Python framework for adversarial attacks and data augmentation
Real-time voice interactive digital human
Open source machine learning framework to automate text conversations
Capable of understanding text, audio, vision, and video
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
21 Lessons, Get Started Building with Generative AI
LLM abstractions that aren't obstructions
Simple, Pythonic building blocks to evaluate LLM applications
Open Source Document Management System for Digital Archives
Scalable generative AI framework built for researchers and developers
Long-form streaming TTS system for multi-speaker dialogue generation
Marrying Grounding DINO with Segment Anything & Stable Diffusion
A Systematic Framework for Interactive World Modeling