Advancing Open-source World Models
Effortless data labeling with AI support from Segment Anything
A lightweight vision library for performing large object detection
Multimodal embedding and reranking models built on Qwen3-VL
Spring AI Alibaba examples for building and testing AI apps
Document Image Parsing via Heterogeneous Anchor Prompting
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Project Lyra: Open Generative 3D World Models
An extensive node suite that enables ComfyUI to process 3D inputs
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
PyTorch code and models for V-JEPA self-supervised learning from video
Data Lake for Deep Learning. Build, manage, and query datasets
A Customizable Image-to-Video Model based on HunyuanVideo
Build cross-modal and multimodal applications on the cloud
Overcoming Data Limitations for High-Quality Video Diffusion Models
AI Suite for upscaling, interpolating & restoring images/videos
A fast, powerful, and simple hierarchical vision transformer
It's possible for machines to become self-aware.
Fun AI projects related to computer vision
CLIP + FFT/DWT/RGB = text to image/video
YOLOv3 Implemented in TensorFlow 2.0
Multimodal AI Story Teller, built with Stable Diffusion, GPT, etc.
One-click face swap
A walk along memory lane
A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator