Master the fundamentals of machine learning, deep learning
Open-source evaluation toolkit of large multi-modality models (LMMs)
VMZ: Model Zoo for Video Modeling
Gemma open-weight LLM library, from Google DeepMind
A new kind of Progress Bar, with real-time throughput, ETA
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
Generate audiobooks from e-books
Benchmarking Multimodal Agents for Open-Ended Tasks
Official Repo For Sa2VA: Marrying SAM2 with LLaVA
Browse the web, directly from Cursor etc.
PDF to Markdown with vision models
Pixel-Aligned 3D Generation from Images
Phi-3.5 for Mac: Locally-run Vision and Language Models
A Pioneering Open-Source Alternative to GPT-4o
Label Studio is a multi-type data labeling and annotation tool
Extension of Google Research’s PaperBanana
Detects phishing and lookalike domains using DNS fuzzing techniques
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
Open-source and free to self-host
The book "Performance Analysis and Tuning on Modern CPUs"
3D plotting and mesh analysis through a streamlined interface
A frontier, first-principles handbook
Taming Stable Diffusion for Lip Sync
Chinese and English multimodal conversational language model
Modular quant framework