Less Code, Lower Barrier, Faster Deployment
NLTK Source
Curated list of datasets and tools for post-training
Information hub for our project training the largest possible LLMs
A full spaCy pipeline and models for scientific/biomedical documents
Language modeling in a sentence representation space
Video-based AI memory library. Store millions of text chunks in MP4
The Classical Language Toolkit
Topic Modelling for Humans
The simplest, fastest repository for training/finetuning models
Your Fully-Automated Personal AI Assistant
A fast TTS architecture with conditional flow matching
A New Axis of Sparsity for Large Language Models
Code release for Cut and Learn for Unsupervised Object Detection
Traditional Mandarin LLMs for Taiwan
Chinese XLNet pre-trained model
Omnilingual ASR: Open-Source Multilingual Speech Recognition
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Chinese Llama-3 LLMs developed from Meta Llama 3
SOTA discrete acoustic codec models with 40/75 tokens per second
A subtitle generator for Japanese Adult Videos.
Quick guide (especially) for trending instruction finetuning datasets
Unofficial Parallel WaveGAN
Resources, corpora, and tools for Chinese natural language processing
Code release for Detecting Twenty-thousand Classes