Less Code, Lower Barrier, Faster Deployment
Curated list of datasets and tools for post-training
NLTK Source
Information hub for our project training the largest possible LLMs
A full spaCy pipeline and models for scientific/biomedical documents
Language modeling in a sentence representation space
Video-based AI memory library. Store millions of text chunks in MP4
Reading book source
The Classical Language Toolkit
Topic Modelling for Humans
A fast TTS architecture with conditional flow matching
The simplest, fastest repository for training/finetuning models
Your Fully-Automated Personal AI Assistant
Web application for Markdown note taking
A New Axis of Sparsity for Large Language Models
Code release for Cut and Learn for Unsupervised Object Detection
Traditional Mandarin LLMs for Taiwan
Chinese XLNet pre-trained model
Omnilingual ASR: Open-Source Multilingual Speech Recognition
Chinese Llama-3 LLMs developed from Meta Llama 3
Indexing and query tools for very large text corpora
SOTA discrete acoustic codec models with 40/75 tokens per second
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
A subtitle generator for Japanese Adult Videos
Aligns tokens in two versions of a text with differing tokenization