Name | Modified | Size | Downloads / Week
---|---|---|---
README.md | 2025-06-24 | 3.8 kB |
v0.21.2 source code.tar.gz | 2025-06-24 | 1.6 MB |
v0.21.2 source code.zip | 2025-06-24 | 1.7 MB |
Totals: 3 Items | | 3.3 MB | 4
What's Changed
This release focuses on performance optimizations, broader Python no-GIL (free-threading) support, and fixes for some onig issues!
- Update the release builds following 0.21.1. by @Narsil in https://github.com/huggingface/tokenizers/pull/1746
- replace lazy_static with stabilized std::sync::LazyLock in 1.80 by @sftse in https://github.com/huggingface/tokenizers/pull/1739
- Fix no-onig no-wasm builds by @414owen in https://github.com/huggingface/tokenizers/pull/1772
- Fix typos in strings and comments by @co63oc in https://github.com/huggingface/tokenizers/pull/1770
- Fix type notation of merges in BPE Python binding by @Coqueue in https://github.com/huggingface/tokenizers/pull/1766
- Bump http-proxy-middleware from 2.0.6 to 2.0.9 in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1762
- Fix data path in test_continuing_prefix_trainer_mismatch by @GaetanLepage in https://github.com/huggingface/tokenizers/pull/1747
- clippy by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1781
- Update pyo3 and rust-numpy depends for no-gil/free-threading compat by @Qubitium in https://github.com/huggingface/tokenizers/pull/1774
- Use ApiBuilder::from_env() in from_pretrained function by @BenLocal in https://github.com/huggingface/tokenizers/pull/1737
- Upgrade onig, to get it compiling with GCC 15 by @414owen in https://github.com/huggingface/tokenizers/pull/1771
- Itertools upgrade by @sftse in https://github.com/huggingface/tokenizers/pull/1756
- Bump webpack-dev-server from 4.10.0 to 5.2.1 in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1792
- Bump brace-expansion from 1.1.11 to 1.1.12 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1796
- Fix features blending into a paragraph by @bionicles in https://github.com/huggingface/tokenizers/pull/1798
- Adding throughput to benches to have a more consistent measure across by @Narsil in https://github.com/huggingface/tokenizers/pull/1800
- Upgrading dependencies. by @Narsil in https://github.com/huggingface/tokenizers/pull/1801
- [docs] Whitespace by @stevhliu in https://github.com/huggingface/tokenizers/pull/1785
- Hotfixing the stub. by @Narsil in https://github.com/huggingface/tokenizers/pull/1802
- Bpe clones by @sftse in https://github.com/huggingface/tokenizers/pull/1707
- Fixed Length Pre-Tokenizer by @jonvet in https://github.com/huggingface/tokenizers/pull/1713
- Consolidated optimization ahash dary compact str by @Narsil in https://github.com/huggingface/tokenizers/pull/1799
- 🚨 breaking: Fix training with special tokens by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1617
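For readers unfamiliar with the `lazy_static` to `std::sync::LazyLock` migration listed above, here is a minimal sketch of what such a change typically looks like. It assumes Rust 1.80 or newer, where `LazyLock` is stable. The `BYTE_LABELS` table and the `label_for` helper are hypothetical names made up for illustration and are not taken from the tokenizers codebase.

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// Before (with the lazy_static crate), a lazily built global usually looked like:
//
//     lazy_static! {
//         static ref BYTE_LABELS: HashMap<u8, &'static str> = { /* build map */ };
//     }
//
// After: the same pattern using only the standard library.
// BYTE_LABELS is a hypothetical table used purely for illustration.
static BYTE_LABELS: LazyLock<HashMap<u8, &'static str>> = LazyLock::new(|| {
    let mut labels = HashMap::new();
    labels.insert(b' ', "space");
    labels.insert(b'\n', "newline");
    labels.insert(b'\t', "tab");
    labels
});

/// Returns the label for a byte, if one is registered.
/// The map is built once, on first access, and shared across threads.
fn label_for(byte: u8) -> Option<&'static str> {
    BYTE_LABELS.get(&byte).copied()
}

fn main() {
    println!("{:?}", label_for(b' '));
    println!("{:?}", label_for(b'x'));
}
```

The practical benefit of this kind of migration is one fewer external dependency and no macro involved, while call sites that simply dereference the static can generally stay unchanged.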
New Contributors
- @414owen made their first contribution in https://github.com/huggingface/tokenizers/pull/1772
- @co63oc made their first contribution in https://github.com/huggingface/tokenizers/pull/1770
- @Coqueue made their first contribution in https://github.com/huggingface/tokenizers/pull/1766
- @GaetanLepage made their first contribution in https://github.com/huggingface/tokenizers/pull/1747
- @Qubitium made their first contribution in https://github.com/huggingface/tokenizers/pull/1774
- @BenLocal made their first contribution in https://github.com/huggingface/tokenizers/pull/1737
- @bionicles made their first contribution in https://github.com/huggingface/tokenizers/pull/1798
- @stevhliu made their first contribution in https://github.com/huggingface/tokenizers/pull/1785
- @jonvet made their first contribution in https://github.com/huggingface/tokenizers/pull/1713
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.21.1...v0.21.2rc0