YouTokenToMe is a fast unsupervised text tokenization library focused on computational efficiency. It learns a subword vocabulary directly from raw text, which makes it useful for preparing input to NLP models.
Features
- Implements Byte Pair Encoding (BPE) for subword tokenization
- Optimized for processing large text corpora
- Provides a lightweight and fast tokenization pipeline
- Supports vocabulary pruning and model compression
- Works with Unicode and multilingual text inputs
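
To make the Byte Pair Encoding idea from the feature list concrete, here is a minimal pure-Python sketch of how BPE merges can be learned and applied. It is a toy illustration only, not YouTokenToMe's implementation or API: the function names `merge_pair`, `learn_bpe`, and `encode`, the `</w>` end-of-word marker, and the lexicographic tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each word starts as single characters plus a hypothetical end-of-word marker.
    words = Counter(tuple(word) + ("</w>",) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken lexicographically
        # (an assumption made here so the result is deterministic).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[merge_pair(symbols, best)] += freq
        words = merged_words
    return merges


def encode(word, merges):
    """Split `word` into subword units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    rules = learn_bpe(["low", "low", "lowest"], num_merges=3)
    print(rules)
    print(encode("slow", rules))
```

Replaying merges one rule at a time keeps the sketch short but scales poorly. A production tokenizer would use far more efficient data structures for the same procedure.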
Categories
Natural Language Processing (NLP)
License
MIT License