This project is an approach to extracting synonyms and extending WordNet with the synonyms found.
The Python application is implemented as a pipeline: a web corpus reader feeds several workers (tokenizers, lemmatizers, ...), and a result writer completes the chain.
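The reader/worker/writer chain described above can be sketched with plain Python generators. This is only an illustration under stated assumptions, not the project's actual code: the names `read_corpus`, `tokenize`, `lemmatize` and `run_pipeline` are hypothetical, the "corpus" is an in-memory list instead of the web, and the lemmatizer is a crude placeholder.

```python
from typing import Callable, Iterable, Iterator, List

# A stage takes a stream of strings and yields a new stream of strings.
Stage = Callable[[Iterable[str]], Iterator[str]]


def read_corpus(documents: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for the web corpus reader: yields raw texts."""
    for document in documents:
        yield document


def tokenize(texts: Iterable[str]) -> Iterator[str]:
    """Split each text on whitespace, strip punctuation, lowercase."""
    for text in texts:
        for token in text.split():
            cleaned = token.strip(".,;:!?").lower()
            if cleaned:
                yield cleaned


def lemmatize(tokens: Iterable[str]) -> Iterator[str]:
    """Placeholder lemmatizer (assumption): drops a trailing 's' only."""
    for token in tokens:
        if len(token) > 3 and token.endswith("s"):
            yield token[:-1]
        else:
            yield token


def run_pipeline(source: Iterable[str], stages: List[Stage]) -> List[str]:
    """Chain the stages lazily, then collect what a result writer would get."""
    stream: Iterable[str] = source
    for stage in stages:
        stream = stage(stream)
    return list(stream)


result = run_pipeline(
    read_corpus(["Cars and automobiles.", "A car drives"]),
    [tokenize, lemmatize],
)
print(result)
```

Because every stage only consumes and produces an iterator, workers can be added, removed or reordered without touching the reader or the writer.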
In contrast to state-of-the-art approaches, this implementation works on single words taken from the web, which serves as the corpus, and translates them into other languages. If the translations of different source words intersect, the source words are assumed to be synonymous.
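The intersection rule can be illustrated with a few lines of Python. This is a minimal sketch under assumptions: the function name `find_synonym_candidates` is hypothetical, and the translations are hard-coded toy data rather than the output of a real translation step.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def find_synonym_candidates(
    translations: Dict[str, Set[str]],
) -> List[Tuple[str, str, Set[str]]]:
    """Return pairs of source words whose translation sets overlap."""
    matches = []
    # Compare every unordered pair of source words exactly once.
    for first, second in combinations(sorted(translations), 2):
        shared = translations[first] & translations[second]
        if shared:
            matches.append((first, second, shared))
    return matches


# Toy data (assumption): English source words mapped to German translations.
toy_translations = {
    "car": {"Auto", "Wagen"},
    "automobile": {"Auto", "Automobil"},
    "tree": {"Baum"},
}

candidates = find_synonym_candidates(toy_translations)
for first, second, shared in candidates:
    print(first, second, sorted(shared))
```

In this toy input only "automobile" and "car" share a translation, so they form the single candidate pair. Comparing all pairs is quadratic in the number of source words; a larger vocabulary would probably need an inverted index from translation to source words instead.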
Finally, the matches are written to a proprietary file format together with WordNet synsets. Note: the result writer currently uses a very simple method for placing the matches into WordNet; this will be modified in the near future.
Features
- Extracts synonym pairs from the web
- Inserts found pairs into WordNet synsets