tiktoken is a fast BPE tokeniser for use with OpenAI's models
SpikingJelly is an open-source deep learning framework
Less Code, Lower Barrier, Faster Deployment
This repository contains the official implementation of FastVLM
Official repository for LTX-Video
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Reverse engineering Gemini's SynthID detection
ComfyUI wrapper nodes for HunyuanVideo
Qwen2.5-VL is the multimodal large language model series
Unified Multimodal Understanding and Generation Models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
A Python tool that uses GPT-4, FFmpeg, and OpenCV
State-of-the-art (SoTA) text-to-video pre-trained model
Implementation of Vision Transformer, a simple way to achieve SOTA
Hackable and optimized Transformers building blocks
Unifying 3D Mesh Generation with Language Models
SOTA discrete acoustic codec models with 40/75 tokens per second
Multi-modal large language model designed for audio understanding
Large language model & vision-language model based on Linear Attention
Chinese Llama-3 LLMs developed from Meta Llama 3
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm
AI agent that streamlines the entire process of data analysis
Code for the paper Language Models are Unsupervised Multitask Learners
VITS2 backbone with multilingual BERT
Framework that is dedicated to making neural data processing