YouTokenToMe is a fast, unsupervised text tokenization library for learning subword vocabularies and splitting text into subword units, which makes it useful for preparing input to NLP models.
Features
- Implements Byte Pair Encoding (BPE) subword tokenization
- Optimized for processing large text corpora
- Provides a lightweight and fast tokenization pipeline
- Supports vocabulary pruning and model compression
- Works with Unicode and multilingual text inputs
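The BPE algorithm named in the first feature can be illustrated with a minimal pure-Python sketch. It is not YouTokenToMe's own implementation and does not use its API. The function names (`train_bpe`, `encode`, and the helpers) and the `▁` word-start marker are illustrative assumptions. Training repeatedly merges the most frequent adjacent symbol pair. Encoding replays the learned merges in order, which is the classic BPE approach. Because Python strings iterate by Unicode code point, the sketch also handles non-Latin text at the code-point level.

```python
from collections import Counter

WORD_START = "\u2581"  # "▁": illustrative word-start marker (an assumption, not YTTM's API)


def merge_symbols(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` in `symbols` with one merged symbol."""
    merged = pair[0] + pair[1]
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from whitespace-separated text."""
    word_freqs = Counter(corpus.split())
    # Start from single code points, prefixing each word with the start marker.
    vocab = {tuple(WORD_START + w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken by comparing the pairs themselves.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_symbols(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize one word by replaying the learned merges in training order."""
    symbols = tuple(WORD_START + word)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    merges = train_bpe("low low low lower lowest newer newer wider", num_merges=5)
    print(merges)
    print(encode("lowest", merges))  # ['▁low', 'e', 's', 't']
```

Frequent fragments such as `▁low` become single tokens, while rarer words still decompose into smaller known pieces. This is what lets a BPE vocabulary cover unseen words without an out-of-vocabulary token.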
Categories
Natural Language Processing (NLP)
License
MIT License