"Big Model": train a visual multimodal VLM with 26M parameters
PaddlePaddle End-to-End Development Toolkit
Open source feature flagging and remote config service
Cross-platform API testing client for humans
Unifying 3D Mesh Generation with Language Models
Capstone disassembly/disassembler framework
Open multimodal web agent built by Ai2
Learning agent trained in a diffusion world model
No-code LLM Platform to launch APIs and ETL Pipelines
Fast, powerful, git-native ticket tracking in a single bash script
Inference script for Oasis 500M
TorchMultimodal is a PyTorch library
ICLR 2024 Spotlight: curation/training code, metadata, distribution
[CVPR 2025 Best Paper Award] VGGT
A command-line utility for taking automated screenshots of websites
OCR expert VLM powered by Hunyuan's native multimodal architecture
Gracefully face hCaptcha challenges with multimodal LLMs
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Flexible Photo Recrafting While Preserving Your Identity
Open-source framework for conversational voice AI agents
Lightning fast C++/CUDA neural network framework
Dockerized Nerd Fonts patcher
Unicorn CPU emulator framework (ARM, AArch64, M68K, MIPS, SPARC
Towards Real-World Vision-Language Understanding
Large-language-model & vision-language-model based on Linear Attention