Trafilatura v1.11.0

Files:
- README.md (2024-06-27, 707 Bytes)
- trafilatura-1.11.0 source code.tar.gz (2024-06-27, 32.6 MB)
- trafilatura-1.11.0 source code.zip (2024-06-27, 33.0 MB)

Total: 3 items, 65.7 MB

Breaking change:
- metadata now skipped by default (#613), to trigger inclusion in all output formats:
  - `with_metadata=True` (Python)
  - `--with-metadata` (CLI)
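A minimal Python sketch of the changed default, assuming `trafilatura` is installed. The sample document and the helper name `extract_both` are made up for illustration; only `extract` and its `with_metadata` argument come from the library. Which metadata fields appear, and how they are rendered, depends on the output format and on what the page provides.

```python
# Hypothetical sample page, used only for illustration.
SAMPLE_HTML = """<html><head><title>Sample page</title>
<meta name="author" content="Jane Doe"></head>
<body><article><h1>Sample page</h1>
<p>This is the first paragraph of a short article used to try out extraction.</p>
<p>This is a second paragraph so that there is some main text to work with.</p>
</article></body></html>"""


def extract_both(html):
    """Run extraction with the default settings and with metadata enabled.

    Returns None if trafilatura is not installed, otherwise a tuple
    (default_output, output_with_metadata). Either item may itself be
    None when the library finds no usable content.
    """
    try:
        from trafilatura import extract
    except ImportError:
        return None
    default_output = extract(html)                # metadata skipped by default
    with_metadata = extract(html, with_metadata=True)  # metadata requested explicitly
    return default_output, with_metadata


result = extract_both(SAMPLE_HTML)
if result is not None:
    for label, output in zip(("default", "with_metadata=True"), result):
        print(f"--- {label} ---")
        print(output)
```

Code that relied on metadata being present in the output before this release would need to pass `with_metadata=True` explicitly.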

Extraction:
- add HTML as output format (#614)
- better and faster baseline extraction (#619)
- better handling of HTML/XML elements (#628)
- XPath rules added with @felipehertzer (#540)
- fix: avoid faulty readability_lxml content (#635)
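The new HTML output format can be requested through the `output_format` argument of `extract`. The sketch below assumes `trafilatura` is installed; the input snippet and the helper name `extract_as_html` are hypothetical and not part of the library.

```python
# Hypothetical input document, used only for illustration.
PAGE = """<html><body><article>
<h1>Release overview</h1>
<p>This paragraph stands in for the main text of a real web page.</p>
<p>A second paragraph gives the extractor a little more content.</p>
</article></body></html>"""


def extract_as_html(html):
    """Return the extracted main content serialized as HTML.

    Returns None if trafilatura is not installed or if no content
    could be extracted from the input.
    """
    try:
        from trafilatura import extract
    except ImportError:
        return None
    return extract(html, output_format="html")


print(extract_as_html(PAGE))
```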

Evaluation:
- new scripts and data with @LydiaKoerber (#606, #615)
- additional data with @swetepete (#197)

Maintenance:
- docs extended and updated, added page on deduplication (#618)
- review code, add tests and types in part of the submodules (#620, #623, #624, #625)

Source: README.md, updated 2024-06-27