A small library for converting tokenized PHP source code into XML
Unsupervised text tokenizer for Neural Network-based text generation
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Tokenizer-Free TTS for Multilingual Speech Generation
This repo contains the code for a 1D tokenizer and generator
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Python library and CLI tool to interface with Google Translate
PostgreSQL extension for full-text search of the Chinese language
Long-form streaming TTS system for multi-speaker dialogue generation
The best ChatGPT that $100 can buy
Pre-trained Neural Network models in Axon
LLM-based Reinforcement Learning audio edit model
A Foundation Model for the Language of Financial Markets
The official PyTorch implementation of Google's Gemma models
MOSS-TTS Family: open-source speech and sound generation models
Qwen3-Coder is the code version of Qwen3
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Large Language Model Principles and Practice Tutorial from Scratch
Unified Multimodal Understanding and Generation Models
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm
Audiocraft is a library for audio processing and generation
Data loaders and abstractions for text and NLP
A plugin that integrates the Lucene IK analyzer into Elasticsearch
The regex-centric, fast lexical analyzer generator for C++