Less Code, Lower Barrier, Faster Deployment
NLTK Source
Curated list of datasets and tools for post-training
Information hub for our project training the largest possible LLMs
Language modeling in a sentence representation space
A full spaCy pipeline and models for scientific/biomedical documents
Video-based AI memory library. Store millions of text chunks in MP4
The Classical Language Toolkit
Topic Modelling for Humans
Your Fully-Automated Personal AI Assistant
A fast TTS architecture with conditional flow matching
Web application for Markdown note taking
A New Axis of Sparsity for Large Language Models
Code release for Cut and Learn for Unsupervised Object Detection
Indexing and query tools for very large text corpora
Traditional Mandarin LLMs for Taiwan
Chinese XLNet pre-trained model
Omnilingual ASR: Open-Source Multilingual Speech Recognition
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Chinese Llama-3 LLMs developed from Meta Llama 3
SOTA discrete acoustic codec models with 40/75 tokens per second
Quick guide (especially) for trending instruction finetuning datasets
Unofficial Parallel WaveGAN
Resources, corpora, and tools for Chinese natural language processing
All-in-one text de-duplication