Jan Keymeulen

TextComparer Usage

When you run the program, you can load the original text
and the copied text by using the file menu. These texts should be
plain text (.txt) files in UTF-8 encoding. The program will strip
all capitalization and punctuation.
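
To make the stripping step concrete, here is a minimal Python sketch of what
such a normalization could look like. It is not the program's actual code: the
function names `normalize` and `load_words` are made up for illustration, and
the sketch assumes that "punctuation" means the ASCII punctuation characters
only.

```python
import string


def normalize(text: str) -> list[str]:
    """Lowercase the text, drop ASCII punctuation, and split it into words."""
    lowered = text.lower()
    # Assumption: only ASCII punctuation is removed; typographic quotes or
    # dashes outside ASCII would need extra handling.
    cleaned = lowered.translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def load_words(path: str) -> list[str]:
    """Read a UTF-8 plain text file and return its normalized words."""
    with open(path, encoding="utf-8") as handle:
        return normalize(handle.read())
```

With this sketch, `normalize("Hello, World!")` and `normalize("hello world")`
give the same word list, so the two fragments would compare as equal.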

In the compare menu, you can set the parameters you want to use.

LD distance is the so-called Damerau-Levenshtein distance, a metric for how
much a given word differs from another. More details on this can be found in
this http://en.wikipedia.org/wiki/Levenshtein_distance wiki article.

Word length is the minimum length of words to be considered; you can use this
to skip short words such as articles.

Search window is how far the program looks forward or backward once it has
found a word. For instance, when the source text contains
"aaaa bbbb cccc dddd" and the destination text contains "aaaa dddd", this is
considered a valid match when the window is 3.

Word skip allows this many words to be skipped in the source text. For
instance, "aaaa bbbb cccc dddd" will match "aaaa cccc dddd" when the word skip
is 1, which allows "bbbb" to be skipped in the source.

Finally, result length filters on the minimum length (in words) of a
resulting match.
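
As an illustration of the LD distance and search window parameters, the
following Python sketch computes the restricted form of the
Damerau-Levenshtein distance (also called optimal string alignment) and uses
it to look for a word within a window. Whether the program uses this
restricted form or the unrestricted one is not stated here, so treat that as
an assumption. The names `osa_distance` and `find_in_window` are hypothetical,
and the window logic is only one possible reading of the description above.

```python
from typing import Optional


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def find_in_window(source: list[str], start: int, word: str,
                   window: int, max_distance: int = 0) -> Optional[int]:
    """Look for `word` at most `window` positions before or after `start`.

    Assumed interpretation: nearer positions are tried first, and a word
    counts as found when its distance is at most `max_distance`.
    """
    for offset in range(1, window + 1):
        for idx in (start + offset, start - offset):
            if 0 <= idx < len(source) and osa_distance(source[idx], word) <= max_distance:
                return idx
    return None
```

For the source words `["aaaa", "bbbb", "cccc", "dddd"]`, searching for "dddd"
from position 0 succeeds with a window of 3 and fails with a window of 2,
which mirrors the search window example above.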

You can click compare to start comparing the texts, which can take quite long.
However, once you have found a result, you can search within this result, which
will be much faster. So I would advise searching with relatively broad
parameters, such as a low minimum result length, in the beginning and refining
to stricter parameters afterwards.
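
The reason refining is fast can be sketched in a few lines of Python: once a
broad comparison has produced a list of matches, a stricter minimum result
length only needs to filter that list instead of rescanning both texts. The
`Match` record and the `refine` function below are hypothetical and only
illustrate the idea; the program's internal data structures may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    source_start: int  # index of the first matching word in the source text
    dest_start: int    # index of the first matching word in the destination text
    length: int        # length of the match, in words


def refine(matches: list[Match], min_length: int) -> list[Match]:
    """Keep only the earlier matches that satisfy a stricter minimum length."""
    return [m for m in matches if m.length >= min_length]


# Results of a broad first pass (made-up values for illustration).
broad = [Match(0, 5, 3), Match(10, 40, 8), Match(20, 2, 5)]
strict = refine(broad, min_length=5)
```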

Finally, when a result is found, it will be indicated in both source and
destination texts and you can click on either one to see the corresponding
fragment.

