Sanchay is a platform for working on languages (especially South Asian languages) using computers. It is still in the development stage, but components such as a text editor with customizable support for languages and encodings and several annotation interfaces are ready.
New: Some more features have been added to the Syntactic Annotation Interface. Most repetitive actions can now be performed with keys or through the context (right-click) menu. This should make annotation much easier and faster.

---

Many new applications have been included in this release. There is also now a single GUI from which all the applications can be started, and it is possible to open multiple windows for one or more applications. The following applications are included in this release:

- Sanchay Text Editor, which is connected to some other NLP/CL components of Sanchay.
- Table Editor with all the usual facilities.
- A more intelligent Find-Replace-Extract Tool (it can search over annotated data and lets you see the matching files in the annotation interface).
- Word List Builder.
- Word List FST (Finite State Transducer) Visualizer, which can be useful for anyone working with morphological analysis and similar tasks.
- A Language and Encoding Identifier that is among the most accurate available (currently trained for 54 language-encoding pairs, including most of the major Indian languages).
- A user-friendly Syntactic Annotation Interface, which is perhaps the most heavily used part of Sanchay so far. Hopefully there will be an even more user-friendly version soon.
- A Parallel Corpus Annotation Interface, which is another heavily used component. (Don't take that 'heavily' too seriously.)
- An N-gram Language Modeling Tool that allows you to compile models in terms of bytes, letters and words.
- A Discourse Annotation Interface that is yet to be actually used.
- A more intelligent File Splitter.
- An Automatic Annotation tool for POS (Part Of Speech) tagging, chunking and Named Entity Recognition. The first two should work reasonably well, but the last one may not be that useful for practical purposes. This is a CRF (Conditional Random Fields) based tool, and it has been trained for Hindi for these three purposes. If you have annotated data, you can use it to train your own taggers and chunkers.
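To illustrate what compiling n-gram models "in terms of bytes, letters and words" can mean, here is a minimal Python sketch that counts n-grams over the same text at the three granularities. It is not Sanchay's implementation: the function names `ngram_counts` and `compile_counts` are hypothetical, UTF-8 is assumed for the byte level, whitespace splitting is assumed for the word level, and "letters" are approximated by Unicode code points, which for Indic scripts do not always correspond to what a reader would call one letter.

```python
from collections import Counter


def ngram_counts(units, n):
    """Count every run of n consecutive units in a sequence."""
    return Counter(tuple(units[i:i + n]) for i in range(len(units) - n + 1))


def compile_counts(text, n, unit):
    """Split text into bytes, letters, or words, then count n-grams."""
    if unit == "bytes":
        units = list(text.encode("utf-8"))  # assumption: UTF-8 encoding
    elif unit == "letters":
        units = list(text)  # assumption: one code point per "letter"
    elif unit == "words":
        units = text.split()  # assumption: whitespace-separated words
    else:
        raise ValueError(f"unknown unit: {unit}")
    return ngram_counts(units, n)


if __name__ == "__main__":
    sample = "नमस्ते दुनिया नमस्ते"
    for unit in ("bytes", "letters", "words"):
        counts = compile_counts(sample, 2, unit)
        print(unit, len(counts), "distinct bigrams")
```

Raw counts like these are only the first step of a language model; turning them into probabilities (with smoothing) is a separate stage that this sketch leaves out.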
This version was finished more than a year ago, but for various reasons it was not released on SourceForge at the time. Since the current version (0.3.0) is being released soon, this version is not recommended for new users.
Copyright © 2009 Geeknet, Inc. All rights reserved.