Less Code, Lower Barrier, Faster Deployment
Curated list of datasets and tools for post-training
NLTK Source
Information hub for our project training the largest possible LLMs
A full spaCy pipeline and models for scientific/biomedical documents
Language modeling in a sentence representation space
Video-based AI memory library. Store millions of text chunks in MP4
Reading book source
The Classical Language Toolkit
Topic Modelling for Humans
A fast TTS architecture with conditional flow matching
The simplest, fastest repository for training/finetuning models
Your Fully-Automated Personal AI Assistant
Web application for Markdown note taking
A New Axis of Sparsity for Large Language Models
Code release for Cut and Learn for Unsupervised Object Detection
Indexing and query tools for very large text corpora
Traditional Mandarin LLMs for Taiwan
Chinese Llama-3 LLMs developed from Meta Llama 3
Chinese XLNet pre-trained model
Omnilingual ASR: Open-Source Multilingual Speech Recognition
State-of-the-art discrete acoustic codec models with 40 or 75 tokens per second
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
A subtitle generator for Japanese Adult Videos.
Aligns tokens in two versions of a text with differing tokenization.