Automate mouse clicks and keyboard input
Simple cross-platform IDE for NASM, MASM, GAS and FASM assembly languages
GLM-4-Voice | End-to-End Chinese-English Conversational Model
GPT4V-level open-source multi-modal model based on Llama3-8B
State-of-the-art (SoTA) text-to-video pre-trained model
Chat & pretrained large vision language model
An open-source end-to-end VLM-based GUI Agent
Spark-TTS Inference Code
Qwen2.5-VL is the multimodal large language model series
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
The first Chinese LLaMA2 model in the open-source community
Text-to-Image generation. The repo for NeurIPS 2021 paper
CPT: A Pre-Trained Unbalanced Transformer
A version of the AI art creation software based on Disco Diffusion