Re: [Openadaptxt-linguists] New Language
From: Jens C. <jch...@ke...> - 2012-10-15 13:28:18
The dictionary creator can take an input corpus in txt (Unicode) format. You do need a list of valid words for it to work, though.

Additionally, since the input files will be published on SourceForge, there might be copyright issues. If there aren't any, it's fine to use a normal text corpus; otherwise we'll have to go with the same approach as we do for our other dictionaries (a corpus with each word repeated as many times as needed and some generic context).

I'm not sure how Michael did it for the Gaelic languages, but he might be able to help you.

Cheers,
Jens

-----Original Message-----
From: Chris Bickers [mailto:cbi...@gm...]
Sent: 15 October 2012 12:59
To: Jens Christensen
Cc: fi...@ak...; ope...@li...
Subject: Re: [Openadaptxt-linguists] New Language

Sounds ok, just macrons with this first language. Is there an easy way to separate words from the corpus so I can build a wordlist from it and avoid repetition? Long weekend here so I haven't had a chance to ask my programmers.

Regards
ChrisB

On 10/16/12, Jens Christensen <jch...@ke...> wrote:
> Hi Chris,
> Once you have the dictionary files ready we can have a test dictionary ready
> within a day or two, depending on any issues that we come across of course.
> Given that the Polynesian languages are relatively simple (at least
> technically speaking) I don't think there should be too many problems.
>
> The only thing that can hold us back is if a language has some completely
> unsupported feature which will have to be developed first (as was the case
> for the Gaelic languages), but I can't immediately see that for the
> Polynesian languages (at least not from looking at Wikipedia :-).
>
> Cheers,
> Jens
>
> -----Original Message-----
> From: Michael Bauer [mailto:fi...@ak...]
> Sent: 15 October 2012 11:01
> To: Chris Bickers
> Cc: Jens Christensen; ope...@li...
> Subject: Re: [Openadaptxt-linguists] New Language
>
> The Gaelics started in July 2011 but before you pass out, we had some
> language specific problems (Gaelic has "weird" stuff around prefixes
> like h- t- n-) which required some development work. I think Polynesian
> languages will be much faster as they tend only to have a macron at most
> but I'd say it depends on how many "special requirements" the language
> has and how long you take testing.
>
> Michael
>
> On 15/10/2012 10:51, Chris Bickers wrote:
>> Excellent Jens, nice to meet you and great to be here, Michael has
>> already given me some pointers on how to get started, I'm cleaning up
>> a corpus and should have the first language to submit in a couple of
>> days. What sort of timeframes would I be looking at before I have
>> something to download and use in the community?
>> Regards
>> ChrisB

--
Christopher Bickers
Managing Director
Bickers Services Samoa
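
On the question in the thread about separating words from a corpus to build a word list without repetition, here is a minimal sketch in Python. It is not part of the OpenAdaptxt tooling and says nothing about the input format the dictionary creator expects. The function name `build_wordlist`, the choice to lower-case every word, and the sample sentence are all assumptions made for illustration. The text is normalized to NFC first so that a vowel typed as a letter plus a combining macron is treated as one letter rather than being split.

```python
import re
import unicodedata
from collections import Counter

# Runs of Unicode letters; digits, underscores, spaces and punctuation
# act as word separators. (Assumption: this is a good enough notion of
# "word" for a language whose only special character is the macron.)
WORD_RE = re.compile(r"[^\W\d_]+")


def build_wordlist(text):
    """Return a Counter mapping each lower-cased word to its frequency.

    Hypothetical helper for illustration only.
    """
    # NFC composes "a" + U+0304 (combining macron) into the single
    # letter U+0101, so macron vowels stay inside their word.
    normalized = unicodedata.normalize("NFC", text)
    return Counter(word.lower() for word in WORD_RE.findall(normalized))


if __name__ == "__main__":
    # Made-up sample text; one word uses a decomposed macron on purpose.
    sample = "O le tama\u0304loa ma le tama. O le TAMA!"
    counts = build_wordlist(sample)
    # One line per unique word, most frequent first, ties alphabetical.
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print(word, count)
```

The keys of the returned counter are the de-duplicated word list, and the counts could be used to decide how many times each word should be repeated in a generated corpus. Lower-casing merges "TAMA" and "tama"; drop the `.lower()` call if capitalization (for example in proper nouns) should be preserved.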