Help added

Regexp Testing Tool
-------------------
https://sourceforge.net/projects/regexper

Regexp Testing Tool applies a group of regexps to large data sets (millions of entries or so) so that you can investigate
the search or search-and-replace behaviour of that group. It is useful when you need to debug a parsing algorithm that
applies regexps one by one, because you can compare the results of one or several steps via checkpoints.

Features:
---------
1. Applies a group of regexps and remembers the results (checkpoint)
2. Compares the results of one regexp group with those of another regexp group (checkpoint comparison)
3. Works fast on large data sets (millions of entries or so)
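
A rough sketch of features 1 and 2 in Python, only to illustrate the idea. This is not the tool's own code: the function names, the checkpoint dictionary and the two sample patterns are assumptions made up for this example.

```python
import re

def apply_group(lines, group):
    """Apply every (pattern, replacement) pair of the group, in order, to each line."""
    compiled = [(re.compile(pattern), repl) for pattern, repl in group]
    results = []
    for line in lines:
        for rx, repl in compiled:
            line = rx.sub(repl, line)
        results.append(line)
    return results

# Hypothetical checkpoint storage: name -> saved copy of the results.
checkpoints = {}

def add_checkpoint(name, results):
    """Remember a copy of the results so they can be compared later."""
    checkpoints[name] = list(results)

# Hypothetical sample data and regexp group (not taken from the tool).
data = ["a  b  ", "x   y", "plain"]
group = [
    (r"\s+$", ""),    # strip trailing whitespace
    (r" {2,}", " "),  # collapse runs of spaces into one space
]

results = apply_group(data, group)
add_checkpoint("Checkpoint#1", results)
print(results)
```

Saving a copy rather than a reference keeps the checkpoint unchanged when the current results are recomputed later.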

Short demo scenario:
--------------------
0. All binaries are located in the regexper\bin\ directory. The Windows launch script is called regexp.cmd; Linux users should use regexp.sh.

1. Start the tool with "regexp.cmd -f data.txt".
The demo file data.txt can be replaced by any file of your own.

2. Select the regexp "remove 2 or 3 spaces" and click "Process".
The table below then lists the strings from data.txt with this regexp applied: 2 or 3 spaces are replaced with 1 space.

3. To save these results for future comparison, click "Add". This saves the results of the regexp execution as a checkpoint.
Checkpoint#3 appears in the checkpoints list (left).

4. Now let's see how to apply a group of regexps to data.txt.
Select the regexps "remove float values, leave dot" and "remove dot from russian initials" with Ctrl+click and click "Process".
The table below lists the strings with both regexps applied: only the dot is left from float values, and the dot between initials is removed.
It doesn't make much sense, but this is just a sample, you know ;)

5. Let's compare the current results ("remove float values, leave dot" + "remove dot from russian initials") with the results of step 2 ("remove 2 or 3 spaces").
Select Checkpoint#3 in the left table. This compares the current results (after step 4) with the saved results of step 2.
The outcome of the comparison is shown in the "Difference" column:
"-" means the result is present in Checkpoint#3 (after applying the "remove 2 or 3 spaces" regexp, step 2), but absent now (after step 4)
"+" means the result is absent in Checkpoint#3, but present now
"!=" means the result in the checkpoint differs from the result now

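The three "Difference" markers can be sketched in Python as follows. This is only an illustration under assumptions: how the tool matches a checkpoint result to a current result is not described here, so the sketch assumes results are stored per key (for example per input entry), and the function name and sample data are hypothetical.

```python
def compare(checkpoint, current):
    """Return {key: marker} for every entry that differs between the two result sets.

    Assumption: both arguments map a key (e.g. an input entry) to its result.
    """
    diff = {}
    for key, old in checkpoint.items():
        if key not in current:
            diff[key] = "-"    # present in the checkpoint, absent now
        elif current[key] != old:
            diff[key] = "!="   # present in both, but the results differ
    for key in current:
        if key not in checkpoint:
            diff[key] = "+"    # absent in the checkpoint, present now
    return diff

# Hypothetical result sets, made up for this example.
checkpoint = {"line1": "a b", "line2": "c d", "line3": "e"}
current = {"line2": "c d", "line3": "E", "line4": "f"}

difference = compare(checkpoint, current)
print(difference)
```

Entries that are identical in both result sets get no marker at all.
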
P.S. Don't forget to check "regexp.cmd --help" for more command-line options.

Posted by EZhuravleva 2010-11-08
