Build cross-modal and multimodal applications on the cloud
Data Lake for Deep Learning. Build, manage, and query datasets
Open-source MCP server that gives your coding agent
OCR expert VLM powered by Hunyuan's native multimodal architecture
Chatbot daemon that connects to your favorite chat services
Code for the paper "Hybrid Spectrogram and Waveform Source Separation"
Multimodal AI Story Teller, built with Stable Diffusion, GPT, etc.
Basaran, an open-source alternative to the OpenAI text completion API
A version of the AI art creation software based on Disco Diffusion
IPTV/NVR/CCTV/Video cloud https://fastocloud.com
Easy-OCR solution and Tesseract trainer for GNU/Linux
Cross Audio-Visual Recognition using 3D Architectures