Name                       Modified    Size
README.txt                 2011-01-04  1.8 kB
description_icelandic.pdf  2011-01-04  788.5 kB
so.pl                      2011-01-04  110.8 kB
kk_hk.pl                   2011-01-04  121.5 kB
kvk.pl                     2011-01-04  41.5 kB
lo.pl                      2011-01-04  25.8 kB
Totals: 6 items, 1.1 MB, 0 downloads / week

Morphological analysis for dummies v 2.0
Beygingarlýsing for dummies v 2.0

The project started as an entry in the Orðið competition (www.ordid.is), where it won a special prize.
The purpose of the project is to assign a unique code to each Icelandic word from the open classes. The project uses the CSV database of BÍN (Beygingarlýsing Íslensks Nútímamáls), http://bin.arnastofnun.is/gogn/, which contains around 270,000 lemmata. The idea is that words that inflect in the same way receive the same code. A list of such words can easily be extracted from the output files.
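
As a rough illustration of extracting such a list, the sketch below groups lemmata by their code. The format of the output files is not described in this README, so the sketch ASSUMES a hypothetical layout with one "lemma;code" pair per line; the function name group_by_code is also made up for this example. Adjust the field separator and field numbers to the real output.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: each input line looks like "lemma;code".
# The real output format of the scripts may differ.
# Prints one line per code, followed by the lemmata that share it.
group_by_code() {
    awk -F';' '
        NF >= 2 {
            # Append the lemma to the list collected for its code
            groups[$2] = (groups[$2] == "" ? $1 : groups[$2] " " $1)
        }
        END {
            for (code in groups) print code ": " groups[code]
        }
    ' "$@" | sort
}

# Usage (hypothetical file name): group_by_code output_file
```

The final sort is there only because awk does not guarantee any order when iterating over an array.
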
Version 2.0 includes four Perl scripts:

kk_hk.pl - for masculine and neuter nouns
lo.pl - for adjectives
so.pl - for verbs
kvk.pl - for feminine nouns

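They can be run with the following command, where pl_file is one of the four scripts listed above and output_file is the file that receives the results:

  perl "pl_file" SHsnid.csv > "output_file"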
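
To process all four word classes in one go, the command above can be wrapped in a small loop. This is only a sketch: it assumes that the scripts and SHsnid.csv are in the current directory, and the output file names (the script name with ".out" instead of ".pl") and the helper names output_name and run_all are invented for this example.

```shell
#!/bin/sh
# Sketch only. ASSUMPTIONS: scripts and CSV file are in the current directory;
# the ".out" output names are hypothetical, not prescribed by the project.

# Derive an output file name from a script name, e.g. so.pl -> so.out
output_name() {
    printf '%s\n' "${1%.pl}.out"
}

# Run every script that is present on the given CSV file
run_all() {
    csv=$1
    for script in kk_hk.pl kvk.pl lo.pl so.pl; do
        # Skip a missing script instead of aborting the whole run
        if [ ! -f "$script" ]; then
            echo "skipping $script (not found)" >&2
            continue
        fi
        perl "$script" "$csv" > "$(output_name "$script")"
    done
}

# Usage: run_all SHsnid.csv
```

Each script writes to its own file, so the results for the different word classes stay separate.
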

Version 1.0 analysed only masculine and neuter nouns. Version 2.0 covers all open word classes except adverbs.
The methodology for masculine and neuter nouns differs from the one used for feminine nouns, adjectives and verbs. Further information can be found (in Icelandic) in the PDF file with the description of the project.

Copyright information: All the code released here is published under the GNU General Public License (http://www.gnu.org/licenses/gpl.html). The database that it uses is subject to the conditions described on its download page.

AUTHORS: Tihomir Rangelov (tihomir.rangelov@gmail.com)

ACKNOWLEDGEMENTS: Special thanks to the Árni Magnússon Institute for Icelandic Studies, and especially to Kristín Bjarnadóttir. Thanks also to Já.is for supporting the competition and to the other members of the jury (Hjálmar Gíslason and Hrafn Loftsson). Eiríkur Rögnvaldsson also supported this project.
/T.
Source: README.txt, updated 2011-01-04