Less Code, Lower Barrier, Faster Deployment
NLTK Source
Curated list of datasets and tools for post-training
Information hub for our project training the largest possible LLMs
A full spaCy pipeline and models for scientific/biomedical documents
Language modeling in a sentence representation space
Video-based AI memory library. Store millions of text chunks in MP4
The Classical Language Toolkit
Topic Modelling for Humans
Your Fully-Automated Personal AI Assistant
A fast TTS architecture with conditional flow matching
A New Axis of Sparsity for Large Language Models
Code release for Cut and Learn for Unsupervised Object Detection
Traditional Mandarin LLMs for Taiwan
Chinese XLNet pre-trained model
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Chinese Llama-3 LLMs developed from Meta Llama 3
SOTA discrete acoustic codec models with 40/75 tokens per second
Quick guide (especially) for trending instruction finetuning datasets
Unofficial Parallel WaveGAN
Resources, corpora, and tools for Chinese natural language processing
Code release for Detecting Twenty-thousand Classes
Editing large language models within 10 seconds
Unicode XML TEI text analysis platform
Repo for external large-scale work