Lingua-Py 1.4.0

Features

  • This release introduces an absolute confidence metric based on the unique and most common ngrams of each supported language. It makes it possible to build a language detector from a single language. Such a detector acts as a binary classifier that tells you whether a text is written in your selected language or not. (#235)
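To make the idea of an absolute, single-language score concrete, here is a toy sketch. It is not Lingua's implementation or API: the class `SingleLanguageDetector`, the trigram profile, the `top_k` cutoff and the 0.5 threshold are all hypothetical choices. It also models only the "most common ngrams" half of the metric and leaves out unique ngrams. The confidence is the share of a text's trigrams found in the language's own profile, so it never needs to be compared against any other language.

```python
from collections import Counter


def ngrams(text: str, n: int = 3) -> list[str]:
    """Return character ngrams, padding each word with spaces to mark word boundaries."""
    grams: list[str] = []
    for word in text.lower().split():
        padded = f" {word} "
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


class SingleLanguageDetector:
    """Hypothetical binary classifier for one language (toy illustration only)."""

    def __init__(self, training_text: str, top_k: int = 300, threshold: float = 0.5):
        counts = Counter(ngrams(training_text))
        # Keep only the most common ngrams as the language's profile.
        self.profile = {gram for gram, _ in counts.most_common(top_k)}
        self.threshold = threshold

    def confidence(self, text: str) -> float:
        """Absolute score in [0, 1]: fraction of the text's ngrams found in the profile."""
        grams = ngrams(text)
        if not grams:
            return 0.0
        hits = sum(1 for gram in grams if gram in self.profile)
        return hits / len(grams)

    def is_language(self, text: str) -> bool:
        """Binary decision: is the text written in the profiled language?"""
        return self.confidence(text) >= self.threshold


# Usage with a tiny, made-up English "training" text.
detector = SingleLanguageDetector(
    "the quick brown fox jumps over the lazy dog and the cat sat on the mat"
)
print(detector.is_language("the cat sat on the mat"))   # → True
print(detector.is_language("ein hund und eine katze"))  # → False
```

A real detector would derive its profile from a large corpus; the tiny training string here only keeps the example self-contained.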

Improvements

  • The new absolute confidence metric also improves accuracy in low accuracy mode. The mean of the average detection accuracies for single words, word pairs and sentences rises from 77% to 80%.

Bug Fixes

  • Texts written in the Devanagari script were tokenized incorrectly. This has been fixed, which improves detection accuracy for Hindi and Marathi.
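The notes don't say exactly what was wrong with the tokenizer. One common pitfall with Devanagari is that vowel signs, the virama and the candrabindu are combining marks (Unicode categories Mc and Mn), not letters. A tokenizer that keeps only letters therefore tears words apart. In CPython's `re` module, for instance, `\w` does not match these marks. The following hypothetical, stdlib-only `tokenize` function (an illustrative name, not Lingua's API) keeps combining marks attached to their base letters.

```python
import unicodedata


def tokenize(text: str) -> list[str]:
    """Split text into words, keeping combining marks attached to their base letters.

    Letters (categories L*) and combining marks (Mn, Mc) form words; everything
    else (spaces, punctuation such as the danda "।", digits) separates them.
    """
    words: list[str] = []
    current: list[str] = []
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("L") or category in ("Mn", "Mc"):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(tokenize("हिन्दी और मराठी भाषाएँ।"))  # → ['हिन्दी', 'और', 'मराठी', 'भाषाएँ']
```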

Compatibility

  • Python 3.13, the newest release, is now officially supported.
  • Support for Python 3.8 and 3.9 has been dropped. The minimum supported version is now Python 3.10.

Please note: all new features and bug fixes will also be included in the upcoming 2.1.0 release of the Rust-based Python extension.

Source: README.md, updated 2024-10-29