From: Tonio W. <ton...@un...> - 2006-07-14 14:57:06
|
Hello,

first of all: thank you very much for making your software available. It saves me a lot of work!

I have two questions concerning model construction.

1. Is there a switch to allow upper-case characters without modifying the code? Even though I have "A-Z" included in the valid.chars list, all words are converted to lower case.

2. Is there a reason why the maximum number of columns is set to 2999? I would like to try out square term-by-term matrices, as used by Reinhard Rapp (2003), who got excellent results on the TOEFL synonym test. Does anyone have experience with the optimal number of columns?

Thank you very much in advance!

Tonio |
From: David H. <dl...@st...> - 2006-07-15 01:33:04
|
On 7/14/06, Tonio Wandmacher <ton...@un...> wrote:
> first of all: Thank you very much for making your software available. It
> spares me a lot of work!
> I have two questions concerning the model construction.
>
> 1. Is there a switch to allow upper case characters without modifying the
> code? Even though I have "A-Z" included in valid.chars list, all words are
> transformed to lower case.

Before I say anything: despite my email address, I'm not affiliated with this project, so take everything I say with a grain of salt. That said, if you look at preprocessing/tokenizer.c:249 (in some version...), you'll see a call to "tolower". Simply remove the call, and that should help you out.

Devs, if you read this, would there be interest in making this a switch? I can try to write up a patch...

> 2. Is there a reason why the maximum number of columns is set to 2999 ? I
> would like to try out squared termxterm matrices, as used by Reinhard Rapp
> (2003), who got excellent results in the TOEFL synonym test.
> Does anyone have experiences about the optimal number of columns?

Have you looked at the default.params file? Otherwise, I don't know exactly what's going on here.

HTH,
David |
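The switch David proposes could look roughly like the sketch below. It is not taken from the Infomap sources: `normalize_token` and its `fold_case` parameter are hypothetical names, and the real tokenizer.c may organize its lower-casing quite differently. Only `tolower` from `<ctype.h>` is standard C.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of Infomap: normalizes one token in place.
 * fold_case != 0 mimics the reported behaviour (everything lower-cased);
 * fold_case == 0 leaves upper-case characters untouched. */
void normalize_token(char *token, int fold_case)
{
    if (token == NULL || !fold_case) {
        return;
    }
    for (char *p = token; *p != '\0'; ++p) {
        /* Cast to unsigned char first: passing a negative char value
         * to tolower() is undefined behaviour. */
        *p = (char)tolower((unsigned char)*p);
    }
}
```

With `fold_case` set to 0, a token such as "TOEFL" stays unchanged; with a nonzero value it is lower-cased in place. How such a flag would be wired to a command-line option or a parameter file is left open here.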
From: Dominic W. <wi...@ma...> - 2006-07-15 16:37:46
|
Thanks, David, for jumping in. I heartily encourage and appreciate people taking the Infomap software into their own hands and answering questions, and I'm sure the other original developers from the project at CSLI agree.

My latest and greatest excuse for being too busy to devote much attention to maintaining the software is my one-week-old daughter Elinor, who is sitting on my knee as I scribble a frantic message. So my availability isn't going to be increasing any time soon, that's for sure!

I agree with the solutions you posted; I couldn't have done better myself. It's possible to set COLUMNS in the default-params file, and I don't know of any flag that prevents the number from being greater than 2999. Nor do I know offhand whether the in-memory matrix format is scalable enough to handle a full term-by-term matrix. But this is certainly the place to begin looking.

Good luck, and thanks for using the software and contributing any insight.

Best wishes,
Dominic |
From: Beate D. <do...@im...> - 2006-07-17 09:19:17
|
Hi Tonio, David and Dominic,

On Sat, 15 Jul 2006, Dominic Widdows wrote:
> I agree with the solutions you posted below, couldn't have done better
> myself. It's possible to set COLUMNS in the default-params file, I
> don't know of any flag that prevents the number from being greater than
> 2999. Nor do I know off hand whether the matrix format in memory is
> scalable enough to handle a full term-by-term matrix. But this is
> certainly the place to begin looking.

The SVD interface sets a limit on the number of columns:

    #define NMAX 3000 /* bound on ncol, order of A */

(it's in infomap-nlp/svd/svdinterface/las2.c)

I don't know whether this line is part of the original SVDPACKC package (which was developed at the University of Tennessee and not by the Infomap project). It probably is. You can try increasing the limit and see whether the singular value decomposition still succeeds.

Good luck!
Beate |
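One hedged way to experiment with the bound without editing the source for every trial is to make the constant overridable at compile time. The sketch below is illustrative and not copied from las2.c: `check_columns` is a hypothetical function, and the strict comparison is an assumption chosen to match the reported maximum of 2999 columns. Whether other arrays in the SVD code are sized from NMAX, and so grow with it, would need to be checked in the actual source.

```c
#include <stdio.h>

/* Assumption: the bound is purely a compile-time constant. Wrapping it
 * in #ifndef lets a build override it, e.g. cc -DNMAX=20000 ... */
#ifndef NMAX
#define NMAX 3000 /* bound on ncol, order of A */
#endif

/* Hypothetical check, not taken from las2.c: returns 0 if ncol lies
 * strictly between 0 and NMAX, and -1 otherwise. */
int check_columns(long ncol)
{
    if (ncol <= 0 || ncol >= NMAX) {
        fprintf(stderr, "ncol = %ld is outside (0, %d)\n", ncol, NMAX);
        return -1;
    }
    return 0;
}
```

In this sketch, compiling with `-DNMAX=20000` would raise the largest accepted column count to 19999.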
From: Tonio W. <ton...@un...> - 2006-07-17 09:50:20
|
Hello,

well, congratulations on your daughter, Dominic! I can well imagine that you are now occupied with more important things than software maintenance...

Thanks for your advice concerning my questions. Indeed, there is a built-in limit of 3000, and I did not dare to modify it. However, I don't see why it is there: using Michael Berry's GTP, I have applied SVD to much larger matrices (~120,000 columns). I will try that out and let you know about my results.

Best wishes,
Tonio

--
Research Associate
Institut für Kognitionswissenschaft
Universität Osnabrück
Room 31/450c
Albrechtstraße 28
49076 Osnabrück
Tel: 0541/969-3391 |
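Dominic's question about whether the in-memory format scales can be bounded with simple arithmetic. The sketch below assumes dense storage, which may not be what Infomap or SVDPACKC actually use; `dense_matrix_bytes` is a hypothetical helper introduced only for this estimate.

```c
#include <stdint.h>

/* Bytes needed to hold an n-by-n matrix densely, given the size of one
 * element in bytes. Assumption: every entry is stored, zeros included. */
uint64_t dense_matrix_bytes(uint64_t n, uint64_t elem_size)
{
    return n * n * elem_size;
}
```

Under that assumption, a 3000 x 3000 matrix of 8-byte doubles takes 72,000,000 bytes (about 72 MB), while a 120,000 x 120,000 one takes 115,200,000,000 bytes (about 115 GB). A sparse representation, which stores only the nonzero entries, would need far less.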