Name | Modified | Size | Downloads / Week |
---|---|---|---|
lingua_language_detector-1.4.0.tar.gz | 2024-10-29 | 93.2 MB | |
lingua_language_detector-1.4.0-py3-none-any.whl | 2024-10-29 | 93.4 MB | |
Lingua 1.4.0 source code.tar.gz | 2024-10-29 | 101.6 MB | |
Lingua 1.4.0 source code.zip | 2024-10-29 | 104.3 MB | |
README.md | 2024-10-29 | 1.0 kB | |
Totals: 5 Items | | 392.5 MB | 0 |
Features
- This release introduces an absolute confidence metric based on the unique and most common n-grams of each supported language. It allows you to build a language detector from a single language only. Such a detector serves as a binary classifier, telling you whether a text is written in your selected language or not. (#235)
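
As an illustration of the single-language detector described above, the following is a minimal sketch rather than official usage. It assumes that `LanguageDetectorBuilder.from_languages` accepts a single `Language` as of this release and that `detect_language_of` returns `None` when the text is not in that language. The helper names `build_english_only_detector` and `is_in_selected_language` are hypothetical and not part of Lingua.

```python
from typing import Any, Optional, Protocol


class SupportsDetection(Protocol):
    """Any object exposing a `detect_language_of` method, as Lingua's detector does."""

    def detect_language_of(self, text: str) -> Optional[Any]: ...


def build_english_only_detector() -> SupportsDetection:
    """Build a detector from one language only.

    Requires the `lingua-language-detector` package to be installed.
    """
    # Imported lazily so the helper below stays usable without the package.
    from lingua import Language, LanguageDetectorBuilder

    # Assumption: a single language is accepted here as of 1.4.0.
    return LanguageDetectorBuilder.from_languages(Language.ENGLISH).build()


def is_in_selected_language(detector: SupportsDetection, text: str) -> bool:
    """Binary decision: True if the detector reports a language for the text.

    Assumption: a single-language detector returns None for any text
    that it does not consider to be written in its one language.
    """
    return detector.detect_language_of(text) is not None
```

With the package installed, `is_in_selected_language(build_english_only_detector(), "languages are awesome")` would be one way to ask whether a text is English. Keeping the decision in a separate function makes it easy to exercise with a stub detector.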
Improvements
- The new absolute confidence metric helps improve accuracy in low accuracy mode. The mean of the average detection accuracy (single words, word pairs and sentences combined) increases from 77% to 80%.
Bug Fixes
- The tokenization of texts written in the Devanagari alphabet was flawed. This has been fixed, leading to better detection accuracy for Hindi and Marathi.
Compatibility
- The newest Python 3.13 is now officially supported.
- Support for Python 3.8 and 3.9 has been dropped. The lowest supported Python version is now 3.10.
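
If you want to guard against unsupported interpreters before upgrading, a small check along these lines may help. This is only a sketch: the constant `MIN_SUPPORTED` and the function `is_supported_python` are hypothetical names, and the lower bound is taken from the compatibility notes above.

```python
import sys
from typing import Sequence

# Lowest Python version supported by this release, per the notes above.
MIN_SUPPORTED = (3, 10)


def is_supported_python(version: Sequence[int] = sys.version_info[:2]) -> bool:
    """Return True if the given (major, minor) version meets the minimum."""
    # Compare only major and minor; patch level does not matter here.
    return tuple(version[:2]) >= MIN_SUPPORTED
```

Calling `is_supported_python()` without arguments checks the running interpreter.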
Please note: All new features and bug fixes will also be part of the next Rust-based Python extension release 2.1.0.