Concordia: the Roman goddess of agreement. Concordance searcher: a tool for translators who need their translations to "agree" with a single standard.
Concordia is a C++ library for fast text lookup in large corpora. It uses an in-memory (RAM) index, which takes up approximately 600 MB for a corpus of 2 million sentences. The index is based on a suffix array, supplemented by auxiliary data structures.
The results are impressive: Concordia performs simple substring lookup at a rate of 5000 queries per second on a personal computer, a speed that cannot be achieved by any other search library.
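To illustrate the suffix-array idea behind simple substring lookup, here is a minimal sketch. It is not Concordia's API or implementation: the function names `buildSuffixArray` and `findOccurrences` are hypothetical, the construction is a naive sort rather than an optimized algorithm, and none of the auxiliary data structures are modeled. It only shows why a sorted array of suffixes lets a query be answered with two binary searches.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of Concordia: builds a naive suffix array,
// i.e. all suffix start offsets sorted lexicographically by suffix.
std::vector<std::size_t> buildSuffixArray(std::string_view text) {
    std::vector<std::size_t> sa(text.size());
    for (std::size_t i = 0; i < sa.size(); ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [text](std::size_t a, std::size_t b) {
        return text.substr(a) < text.substr(b);
    });
    return sa;
}

// Returns the start offsets (in ascending order) of every occurrence of
// `pattern` in `text`. All suffixes that begin with `pattern` form one
// contiguous range in the suffix array, found with two binary searches.
std::vector<std::size_t> findOccurrences(std::string_view text,
                                         const std::vector<std::size_t>& sa,
                                         std::string_view pattern) {
    // Compare only the first pattern.size() characters of each suffix.
    auto lo = std::lower_bound(sa.begin(), sa.end(), pattern,
        [text](std::size_t suffix, std::string_view p) {
            return text.substr(suffix, p.size()) < p;
        });
    auto hi = std::upper_bound(lo, sa.end(), pattern,
        [text](std::string_view p, std::size_t suffix) {
            return p < text.substr(suffix, p.size());
        });
    std::vector<std::size_t> hits(lo, hi);
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

The index is built once and kept in memory; each query then costs a logarithmic number of string comparisons instead of a scan of the whole corpus, which is what makes high query rates plausible.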
Moreover, Concordia can perform its own "concordia search": for a given input sentence, it retrieves all substring matches that cover this sentence.
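One possible reading of "covering" a sentence with substring matches can be sketched as follows. This is an assumption about the idea, not Concordia's actual algorithm or API: the names `Fragment`, `tokenize` and `coverSentence` are hypothetical, matching is done on whitespace-separated words, and a naive linear scan of the corpus stands in for the index. For each start position in the input, the sketch finds the longest run of words that occurs in some corpus sentence and keeps the runs that extend the covered part of the input.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Hypothetical result type: input[begin, begin + length) occurs
// contiguously in corpus[sentence].
struct Fragment {
    std::size_t begin;
    std::size_t length;
    std::size_t sentence;
};

// Splits a sentence into whitespace-separated words.
Tokens tokenize(const std::string& sentence) {
    std::istringstream in(sentence);
    Tokens tokens;
    for (std::string word; in >> word;) tokens.push_back(word);
    return tokens;
}

// For each start position, finds the longest run of input words that occurs
// contiguously in some corpus sentence, and keeps runs that extend coverage.
std::vector<Fragment> coverSentence(const Tokens& input,
                                    const std::vector<Tokens>& corpus) {
    std::vector<Fragment> cover;
    std::size_t coveredUpTo = 0;  // one past the last input word covered so far
    for (std::size_t begin = 0; begin < input.size(); ++begin) {
        Fragment best{begin, 0, 0};
        for (std::size_t s = 0; s < corpus.size(); ++s) {
            // Only lengths that would beat the current best are tried.
            for (std::size_t len = best.length + 1;
                 begin + len <= input.size(); ++len) {
                auto first = input.begin() + begin;
                auto found = std::search(corpus[s].begin(), corpus[s].end(),
                                         first, first + len);
                if (found == corpus[s].end()) break;  // longer runs cannot match
                best = Fragment{begin, len, s};
            }
        }
        if (best.length > 0 && begin + best.length > coveredUpTo) {
            cover.push_back(best);
            coveredUpTo = begin + best.length;
        }
    }
    return cover;
}
```

For a translator, each returned fragment points at a corpus sentence whose existing translation can be consulted for that part of the input; words that appear nowhere in the corpus simply stay uncovered.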
This project now contains a fully functional Concordia search library. In the near future, it will be extended with concordia-server: a lightweight, robust web server providing corpus search functionality.
concordia
Powerful search library, best suited for computer-aided translation
Status: Alpha
Brought to you by: rjawor
Linux