CLI tool to extract (meta)data from PDF and manipulate PDF files
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
A Systematic Framework for Interactive World Modeling
Unified Multimodal Understanding and Generation Models
Multilingual sentence & image embeddings with BERT
Free, high-quality text-to-speech API endpoint to replace OpenAI
Official implementation of DreamCraft3D
Phi-3.5 for Mac: Locally-run Vision and Language Models
The data structure for multimodal data
Large-language-model & vision-language-model based on Linear Attention
Open-Sora: Democratizing Efficient Video Production for All
Generate Any 3D Scene in Seconds
Open source personal AI Assistant for Linux, Windows and Mac
Open source libraries and APIs to build custom preprocessing pipelines
Extract one time password (OTP) secrets from QR codes
Windrecorder is a memory search app that records everything
Sample code and notebooks for Generative AI on Google Cloud
Pretrained model hub for Keras 3
Powerful open source team chat application
Framework for building neural networks
GenAI Processors is a lightweight Python library
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
TensorBoard for PyTorch (and Chainer, MXNet, NumPy, etc.)
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning