From: Egon W. <ego...@gm...> - 2012-08-22 10:57:28
On Tue, Aug 21, 2012 at 9:46 PM, Peter Murray-Rust <pm...@ca...> wrote:
> On Tue, Aug 21, 2012 at 5:04 PM, Vance - <van...@gm...> wrote:
>> Can someone confirm that OSCAR can only deal with English-language text?
> The version we released is English only. We have tried to design both OSCAR
> and OPSIN so that they are modular. I haven't heard of anyone doing other
> languages. However we think the architecture would create a good start

Agreed. When Oscar4 was (re)designed, we added Locale support:

http://chem-bla-ics.blogspot.nl/2010/12/text-mining-chemistry-from-dutch-or.html

But as that post explains, the key things involved are:

- proper training data in the new language
- a tokenizer that may need tuning for this language

And perhaps some further issues. It will require hacking, but it is
certainly not impossible, and, at least for me, most welcome!

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers