Less Code, Lower Barrier, Faster Deployment
Curated list of datasets and tools for post-training
NLTK Source
Information hub for our project training the largest possible LLMs
A full spaCy pipeline and models for scientific/biomedical documents
Language modeling in a sentence representation space
Video-based AI memory library. Store millions of text chunks in MP4
The Classical Language Toolkit
Topic Modelling for Humans
A fast TTS architecture with conditional flow matching
Your Fully-Automated Personal AI Assistant
Web application for Markdown note taking
A New Axis of Sparsity for Large Language Models
Code release for Cut and Learn for Unsupervised Object Detection
Traditional Mandarin LLMs for Taiwan
Chinese XLNet pre-trained model
Indexing and query tools for very large text corpora
Chinese Llama-3 LLMs developed from Meta Llama 3
SOTA discrete acoustic codec models with 40/75 tokens per second
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Aligns tokens in two versions of a text with differing tokenization
Quick guide (especially) for trending instruction finetuning datasets
Unofficial Parallel WaveGAN
Resources, corpora, and tools for Chinese natural language processing
All-in-one text de-duplication