From: Beate D. <do...@IM...> - 2004-03-10 17:47:56
Dear Scott,

I am busy writing lately, but I don't mind adding this feature. Do you
think it would be early enough if I did it during the weekend?

It's the initialize_column_indices routine (in dict.c) which picks the
column labels. I remember that we did earlier experiments with picking the
top words according to tf-idf as column labels rather than the most
frequent ones.

I think it wouldn't be a big deal to hand over a Boolean variable
$FROM_FILE to initialize_column_indices which indicates whether column
indices should be computed or read from a file. We could let a user
"turn on" this variable by adding an option -cols_from_file to
infomap-build, which passes the value to initialize_column_indices via
count_wordvec.c.

Does that make sense?

Best wishes,
Beate

On Tue, 9 Mar 2004, Scott James Cederberg wrote:

>Dominic and Shuji,
>
>  I'm CC'ing this reply to infomap-nlp-devel, because in theory the
> sort of discussion touched off by Dominic's message below (about
> how to add this feature) should take place there.
>
>  I'm not familiar with where and how count_wordvec chooses the
> content-bearing words, but I think the easiest thing would be to
> modularize the part where it does that (e.g. into a separate function),
> and then create another function that instead read content bearing
> words from a file.  Which function was called could be controlled
> by a command-line option.
>
>  I've already got a bit of a backlog of reported but unfixed bugs;
> I'm hoping to dig my way out from under that by the end of the
> week.  Hopefully next week I would then have time to add this
> feature.
>
>  If anyone else wants to take it on, please let me know.
>
> Scott
>
>
>On Fri, Mar 05, 2004 at 07:00:19PM -0800, Dominic Widdows wrote:
>>
>> Hi Scott,
>>
>> I know we talked about this in the past - is it doable or shall we tell
>> people it's on the back burner?
>>
>> As far as I can tell, it's just a question of putting a different list of
>> words into memory and telling the count_wordvec program to look there.
>> Which could be a total can of worms in C.
>>
>> Best wishes,
>> Dominic
>>
>> ---------- Forwarded message ----------
>> Date: Fri, 5 Mar 2004 18:53:17 -0800
>> From: Shuji Yamaguchi <yam...@ya...>
>> To: inf...@li...
>> Subject: [infomap-nlp-users] Infomap. Can I choose and feed
>>     "content-bearing words" to "count_wordvec"?
>>
>> Hi InfoMap admin and users,
>>
>> I wonder whether I could choose the "content-bearing words" myself and feed
>> them into the pre-processing of InfoMap.
>> The count_wordvec appears to be the program that does it. According to its
>> man page, the content words are chosen from the ones in "ranking 50-1049".
>> Is there any way to customize this by use of options and/or parameters?
>>
>> Thank you for your support.
>> Regards, Shuji
>>
>> Shuji Yamaguchi,
>> Fellow, Reuters Digital Vision Program, CSLI, Stanford.
>>
>> _______________________________________________
>> infomap-nlp-users mailing list
>> inf...@li...
>> https://lists.sourceforge.net/lists/listinfo/infomap-nlp-users
>
>--
>Scott Cederberg
>Researcher
>
>Infomap Project
>Computational Semantics Lab
>Center for the Study of Language and Information (CSLI)
>Stanford University
>
>http://infomap.stanford.edu/
>
>_______________________________________________
>infomap-nlp-devel mailing list
>inf...@li...
>https://lists.sourceforge.net/lists/listinfo/infomap-nlp-devel
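
For illustration, the switch discussed in this thread (compute the column labels from frequency ranks, or read them from a user-supplied file, depending on a Boolean flag) might look roughly like the sketch below. This is not the actual Infomap code: the names dict_entry, cols_from_counts, cols_from_file, pick_column_labels and MAX_WORD_LEN are all hypothetical, as is the one-word-per-line file format, and the real initialize_column_indices in dict.c may be organized quite differently.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 64   /* hypothetical limit, not taken from Infomap */

/* Hypothetical dictionary entry; the real structures in dict.c may differ. */
typedef struct {
    char word[MAX_WORD_LEN];
    long freq;
} dict_entry;

/* qsort comparator: descending frequency, ties broken alphabetically. */
static int by_freq_desc(const void *a, const void *b)
{
    const dict_entry *x = a, *y = b;
    if (x->freq < y->freq) return 1;
    if (x->freq > y->freq) return -1;
    return strcmp(x->word, y->word);
}

/* Computed labels: sort the dictionary in place by frequency, skip the
 * first start_rank entries, then take up to max_cols words. */
static int cols_from_counts(dict_entry *dict, int n, int start_rank,
                            int max_cols, char cols[][MAX_WORD_LEN])
{
    int k = 0;
    qsort(dict, (size_t)n, sizeof *dict, by_freq_desc);
    for (int i = start_rank; i < n && k < max_cols; i++, k++)
        strcpy(cols[k], dict[i].word);
    return k;
}

/* Labels from a file: one word per line (assumed format), blank lines
 * skipped. Lines longer than MAX_WORD_LEN - 1 characters would be split
 * by fgets; a real implementation should reject or handle them. */
static int cols_from_file(FILE *fp, int max_cols, char cols[][MAX_WORD_LEN])
{
    char line[MAX_WORD_LEN];
    int k = 0;
    while (k < max_cols && fgets(line, sizeof line, fp)) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] != '\0')
            strcpy(cols[k++], line);
    }
    return k;
}

/* Dispatcher: from_file plays the role of the proposed Boolean flag.
 * Returns the number of column labels written to cols. */
int pick_column_labels(int from_file, FILE *fp, dict_entry *dict, int n,
                       int start_rank, int max_cols,
                       char cols[][MAX_WORD_LEN])
{
    assert(from_file ? fp != NULL : dict != NULL);
    if (from_file)
        return cols_from_file(fp, max_cols, cols);
    return cols_from_counts(dict, n, start_rank, max_cols, cols);
}
```

Under these assumptions, a call with from_file = 0, start_rank = 50 and max_cols = 1000 would correspond to the "ranking 50-1049" range quoted from the man page (taking ranks as 0-based), while a -cols_from_file option would simply set from_file = 1 and supply an open file handle.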