Lingua-RS - Browse /v1.7.0 at SourceForge.net


Name	Modified	Size	Downloads / Week
Lingua 1.7.0 source code.tar.gz	2025-03-20	103.0 MB	0
Lingua 1.7.0 source code.zip	2025-03-20	105.5 MB	0
README.md	2025-03-20	1.5 kB	0
Totals: 3 Items		208.5 MB	0

This release introduces an absolute confidence metric based on the unique and most common ngrams of each supported language. It makes it possible to build a language detector from a single language only. Such a detector serves as a binary classifier that tells you whether a text is written in the selected language or not. (#413)
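To make the idea of a single-language binary classifier concrete, here is a minimal Rust sketch. It assumes a toy scoring rule (the fraction of a text's character trigrams that appear in a small set of common trigrams for one language) and a hypothetical `SingleLanguageDetector` type with an arbitrarily chosen threshold. It is not Lingua's actual confidence formula or API.

```rust
use std::collections::HashSet;

/// Hypothetical single-language detector: it only knows a few common
/// trigrams of one language and returns an absolute score in [0, 1].
/// Illustrative toy rule, not Lingua's actual confidence metric.
struct SingleLanguageDetector {
    known_trigrams: HashSet<String>,
    threshold: f64,
}

impl SingleLanguageDetector {
    fn new(trigrams: &[&str], threshold: f64) -> Self {
        Self {
            known_trigrams: trigrams.iter().map(|t| t.to_string()).collect(),
            threshold,
        }
    }

    /// Fraction of the text's character trigrams found in the known set.
    /// Texts shorter than three characters get a score of 0.0.
    fn confidence(&self, text: &str) -> f64 {
        let chars: Vec<char> = text.to_lowercase().chars().collect();
        if chars.len() < 3 {
            return 0.0;
        }
        let trigrams: Vec<String> = chars.windows(3).map(|w| w.iter().collect()).collect();
        let hits = trigrams
            .iter()
            .filter(|t| self.known_trigrams.contains(*t))
            .count();
        hits as f64 / trigrams.len() as f64
    }

    /// Binary decision: is the text written in this detector's language?
    fn is_language(&self, text: &str) -> bool {
        self.confidence(text) >= self.threshold
    }
}

fn main() {
    // Hypothetical "English" detector built from five trigrams only.
    let english = SingleLanguageDetector::new(&["the", "he ", "and", "ing", " th"], 0.2);
    println!("{}", english.is_language("the thing"));
    println!("{}", english.is_language("zzzz qqqq"));
}
```

Because the score is computed from one language's ngrams alone, without comparing against any other language, such a detector can answer "is this text in language X?" on its own, which is the property a single-language detector needs.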

The new absolute confidence metric helps improve accuracy in low accuracy mode. The mean of the average detection accuracies (single words, word pairs and sentences combined) increases from 77% to 80%.
The rule-based algorithm for recognizing Japanese texts has been improved. Texts containing both Japanese and Chinese characters are now more often correctly classified as Japanese instead of Chinese. (#406)
The characters Щщ are now correctly identified as possible indicators for the Ukrainian language, leading to slightly higher accuracy when identifying Ukrainian texts.
The accuracy_reports binary now supports the arguments --detectors and --languages, which allow you to select a specific subset of detector/language combinations.
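The Japanese and Ukrainian improvements both concern character-based rules. The following Rust sketch shows one way such a pre-filter can narrow the candidate languages before any statistical scoring: Hiragana or Katakana marks a text as Japanese even when it also contains Han characters, and indicator characters such as щ and ї restrict the Cyrillic candidates. The function names, the language list and the indicator table are hypothetical and deliberately tiny; they do not reproduce Lingua's actual rules or data.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical indicator table mapping a character to the languages that use it.
/// Deliberately tiny and illustrative; not Lingua's actual data.
fn indicator_table() -> HashMap<char, Vec<&'static str>> {
    let mut table = HashMap::new();
    table.insert('щ', vec!["Bulgarian", "Russian", "Ukrainian"]);
    table.insert('ї', vec!["Ukrainian"]);
    table
}

/// True if the text contains any Hiragana (U+3040..U+309F) or Katakana (U+30A0..U+30FF).
fn contains_kana(text: &str) -> bool {
    text.chars()
        .any(|c| matches!(c, '\u{3040}'..='\u{309F}' | '\u{30A0}'..='\u{30FF}'))
}

/// True if the text contains any CJK Unified Ideographs (U+4E00..U+9FFF).
fn contains_han(text: &str) -> bool {
    text.chars().any(|c| matches!(c, '\u{4E00}'..='\u{9FFF}'))
}

/// Narrows `all` to the languages compatible with the characters in `text`.
fn filter_candidates(text: &str, all: &[&'static str]) -> BTreeSet<&'static str> {
    let mut result: BTreeSet<&'static str> = all.iter().copied().collect();

    // Among the languages in this sketch, kana only occurs in Japanese,
    // so it decides the case even if Han characters are present as well.
    if contains_kana(text) {
        result.retain(|lang| *lang == "Japanese");
        return result;
    }
    // Han characters without any kana: keep both Chinese and Japanese.
    if contains_han(text) {
        result.retain(|lang| *lang == "Chinese" || *lang == "Japanese");
        return result;
    }

    // Each indicator character removes the languages that do not use it.
    let table = indicator_table();
    for c in text.chars().flat_map(char::to_lowercase) {
        if let Some(langs) = table.get(&c) {
            result.retain(|lang| langs.contains(lang));
        }
    }
    result
}

fn main() {
    let all = ["Bulgarian", "Chinese", "Japanese", "Russian", "Ukrainian"];
    println!("{:?}", filter_candidates("東京は日本の首都です", &all));
    println!("{:?}", filter_candidates("Щоб їсти", &all));
}
```

If the table entry for щ omitted Ukrainian, the Ukrainian text "Щоб їсти" would end up with no candidates at all in this sketch, which illustrates why a missing indicator entry costs accuracy.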

Text spans created by LanguageDetector.detect_multiple_languages_of() sometimes skipped characters in the last span. This has been fixed.
The tokenization of texts written in the Devanagari script was flawed. This has been fixed, leading to better detection accuracy for Hindi and Marathi.
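One way to guard against the span bug is an invariant check: each span should start where the previous one ended, and the last one should end exactly at the end of the text. The Rust sketch below assumes spans are half-open byte ranges over the input; the `Span` type and the `covers_whole_text` function are hypothetical and are not Lingua's own result type.

```rust
/// Hypothetical span: a half-open byte range `start..end` into the text.
#[derive(Debug, Clone, Copy)]
struct Span {
    start: usize,
    end: usize,
}

/// Returns true if `spans` tile `text` exactly: no gaps, no overlaps,
/// and the last span ends at the final byte of the text.
fn covers_whole_text(text: &str, spans: &[Span]) -> bool {
    let mut expected_start = 0;
    for span in spans {
        // Each span must begin where the previous one ended and must not be reversed.
        if span.start != expected_start || span.end < span.start {
            return false;
        }
        expected_start = span.end;
    }
    // A last span that stops short of the end leaves characters uncovered.
    expected_start == text.len()
}

fn main() {
    let text = "Hello Welt";
    let spans = [Span { start: 0, end: 6 }, Span { start: 6, end: 10 }];
    println!("{}", covers_whole_text(text, &spans));
}
```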
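Devanagari words often contain combining signs. For example, the virama (U+094D) in हिन्दी does not carry the Unicode Alphabetic property, so a tokenizer that splits on every character for which Rust's `char::is_alphabetic` returns false breaks such words apart. The following sketch, assuming a simple split-on-non-word-characters tokenizer, treats Devanagari letters and combining signs as word characters while still splitting on the danda (।). It illustrates one plausible class of tokenization bug and is not necessarily how Lingua's tokenizer was fixed; `is_word_char` and `tokenize` are hypothetical names.

```rust
/// Returns true for characters that belong inside a word.
/// Assumption for this sketch: Devanagari letters, vowel signs, nukta and
/// virama (U+0900..=U+0963, U+0971..=U+097F) count as word characters, while
/// the danda, double danda, digits and abbreviation sign (U+0964..=U+0970) do not.
fn is_word_char(c: char) -> bool {
    c.is_alphabetic() || matches!(c, '\u{0900}'..='\u{0963}' | '\u{0971}'..='\u{097F}')
}

/// Splits text into words, keeping combining signs attached to their base letters.
fn tokenize(text: &str) -> Vec<&str> {
    text.split(|c: char| !is_word_char(c))
        .filter(|token| !token.is_empty())
        .collect()
}

fn main() {
    println!("{:?}", tokenize("हिन्दी भाषा।"));
}
```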

Source: README.md, updated 2025-03-20
