From: Dominic W. <wi...@ma...> - 2006-07-22 01:11:56
Hi Felipe,

There is a VALID_CHARS_FILE that you can set in the default-params. The
default is admin/valid_chars.en, which contains the string

ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~

You need to add the characters you want to this file, or create a new
one and tell default-params to look there.

Hope this helps.

Best wishes,
Dominic

On Jul 20, 2006, at 6:37 AM, Felipe Sánchez Martínez wrote:

> Hi all,
>
> I have just started working with infomap-nlp, and I found out that
> there is some kind of problem with non-ASCII characters. When building a
> single model from a single file, the wordlist file containing the whole
> corpus with only one word per line shows things like the following:
>
> ...
> uni
> n
> ...
>
> while it should be just 'unión'. The system is splitting single words
> at non-ASCII characters like 'ó'.
>
> How can I solve this problem?
>
> Thank you very much in advance.
>
> Cheers
> --
> Felipe Sánchez Martínez
> -------------------------------------------------------------------
> Departamento de Lenguajes        E-mail: fsa...@dl...
> y Sistemas Informáticos          Homepage: www.dlsi.ua.es/~fsanchez
> Universidad de Alicante          Fax: +34 965 90 93 26
> E-03071 Alicante (Spain)         Phone: +34 965 90 34 00, ext: 2038
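
For anyone who prefers to generate the new character file rather than edit it by hand, here is a minimal Python sketch of the second option in the reply (create a new file and point default-params at it). Several things here are assumptions, not taken from the thread: the helper names, the particular set of Spanish characters, the single-line layout with a trailing newline, and the Latin-1 encoding. The file should probably be written in the same encoding as the corpus, and it is worth comparing the result with the shipped admin/valid_chars.en before relying on it.

```python
from pathlib import Path

# Default character set quoted in the reply above.
DEFAULT_VALID_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"

# Extra characters for Spanish text (an illustrative choice, not from the thread).
SPANISH_EXTRAS = "ÁÉÍÓÚÜÑáéíóúüñ"


def build_valid_chars(default: str, extras: str) -> str:
    """Return `default` followed by any characters of `extras` not already in it."""
    seen = set(default)
    out = list(default)
    for ch in extras:
        if ch not in seen:
            seen.add(ch)
            out.append(ch)
    return "".join(out)


def write_valid_chars(path: Path, chars: str, encoding: str = "latin-1") -> None:
    """Write the character list as one line, encoded to match the corpus.

    Assumption: the file is a single line of characters ending in a newline.
    Encoding raises UnicodeEncodeError if a character has no representation
    in the chosen encoding, which is a useful early warning.
    """
    path.write_bytes((chars + "\n").encode(encoding))
```

A typical use would be to call write_valid_chars(Path("admin/valid_chars.es"), build_valid_chars(DEFAULT_VALID_CHARS, SPANISH_EXTRAS)), where admin/valid_chars.es is a hypothetical file name, and then set VALID_CHARS_FILE in default-params to that path before rebuilding the model.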