Tokenized Text Aligner

This tool aligns two versions of a text that have been tokenized differently, token by token, by interpreting the output of a diff computed with Python's difflib module (https://docs.python.org/3/library/difflib.html). It is intended for use in preparing annotated linguistic corpora, where differences in tokenization may arise (i) after corrections or modifications to the source text or (ii) when different layers of annotation (part-of-speech, treebank) each require their own tokenization. In its default implementation, it produces a human-readable CSV table associating tokens in text A with tokens in text B, and it can also inject token-level annotation from text B into text A. The Aligner class on which the default implementation is based can be incorporated into more complex workflows.
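The general idea of diff-based token alignment can be sketched with the standard library alone. The example below is an illustrative sketch, not the project's Aligner class: the function names `align_tokens` and `to_csv`, the CSV column names, and the sample tokens are all assumptions made for this example. It runs `difflib.SequenceMatcher` over two token lists, pairs identical tokens one-to-one, and keeps each differing span as a single many-to-many group.

```python
import csv
import io
from difflib import SequenceMatcher


def align_tokens(tokens_a, tokens_b):
    """Return a list of (a_group, b_group) pairs, each a list of tokens.

    Hypothetical helper for illustration; not the project's API.
    """
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical runs align one-to-one.
            for a_tok, b_tok in zip(tokens_a[i1:i2], tokens_b[j1:j2]):
                rows.append(([a_tok], [b_tok]))
        else:
            # 'replace', 'delete' and 'insert' spans are kept whole, so one
            # token in A may correspond to several tokens in B (or to none).
            rows.append((tokens_a[i1:i2], tokens_b[j1:j2]))
    return rows


def to_csv(rows):
    """Render the alignment as a two-column CSV string (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["text_a", "text_b"])
    for a_group, b_group in rows:
        writer.writerow([" ".join(a_group), " ".join(b_group)])
    return buf.getvalue()


# Sample input: the same sentence with and without clitic splitting.
tokens_a = ["I", "don't", "know", "."]
tokens_b = ["I", "do", "n't", "know", "."]
rows = align_tokens(tokens_a, tokens_b)
print(to_csv(rows))
```

Here the token "don't" in text A is grouped with the two tokens "do" and "n't" in text B, while the remaining tokens align one-to-one. A real workflow would likely need further logic to subdivide long differing spans, which this sketch does not attempt.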

License

GNU General Public License version 3.0 (GPLv3)

Additional Project Details

Operating Systems

BSD, Linux

Intended Audience

Advanced End Users, Science/Research

User Interface

Command-line

Programming Language

Python

Related Categories

Python Linguistics Software

Registered

2015-09-23
