Tokenized Text Aligner

This tool aligns two versions of a text token by token when their tokenization differs. It does so by interpreting the results of a diff between the two token sequences (see https://docs.python.org/3/library/difflib.html). It is intended for use in preparing annotated linguistic corpora, where differences in tokenization may arise (i) after corrections or modifications to the source text, or (ii) when different layers of annotation (part-of-speech, treebank) require different tokenization. In its default implementation, it produces a human-readable CSV table that associates tokens in text A with tokens in text B, and it can also inject token-level annotation from text B into text A. The Aligner class on which the default implementation is based can be incorporated into more complex workflows.
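The general idea of diff-based token alignment can be illustrated with a minimal sketch built directly on Python's difflib. This is not the project's Aligner class, whose interface is not described here. The function names `align_tokens` and `alignment_to_csv` and the sample tokens are hypothetical, chosen only for illustration. The sketch pairs identical tokens one to one and groups each differing stretch (for example, a token split in one version but not in the other) into a single row.

```python
import csv
import io
from difflib import SequenceMatcher


def align_tokens(tokens_a, tokens_b):
    """Group tokens of A with tokens of B using difflib opcodes.

    Hypothetical helper, not part of Tokenized Text Aligner.
    Returns a list of (a_group, b_group) pairs of token lists.
    """
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical stretches are aligned one token to one token.
            for offset in range(i2 - i1):
                rows.append(([tokens_a[i1 + offset]], [tokens_b[j1 + offset]]))
        else:
            # "replace", "delete" and "insert" stretches are kept as one
            # group; either side may be empty.
            rows.append((tokens_a[i1:i2], tokens_b[j1:j2]))
    return rows


def alignment_to_csv(rows):
    """Render aligned groups as a two-column CSV string (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text_a", "text_b"])
    for a_group, b_group in rows:
        writer.writerow([" ".join(a_group), " ".join(b_group)])
    return buffer.getvalue()


# Sample data (invented): the same sentence under two tokenizations.
tokens_a = ["I", "do", "n't", "know", "."]
tokens_b = ["I", "don't", "know", "."]
aligned = align_tokens(tokens_a, tokens_b)
print(alignment_to_csv(aligned))
```

Grouping each non-equal stretch into one row keeps the sketch short. A fuller aligner would need to examine those stretches further, for instance by comparing the concatenated characters of each side, before deciding how annotation from text B maps onto text A.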

License

GNU General Public License version 3.0 (GPLv3)

Additional Project Details

Operating Systems

BSD, Linux

Intended Audience

Advanced End Users, Science/Research

User Interface

Command-line

Programming Language

Python

Related Categories

Python Linguistics Software

Registered

2015-09-23
