From: Scott J. C. <ced...@cs...> - 2004-03-10 23:18:17
Beate,

Thanks for your help! What you describe sounds like a reasonable
approach.

Unfortunately, I need to do some housekeeping with our CVS repository
before it can be changed by multiple people without making a mess. I am
planning to do that by the end of the week, and I'll get back to you.

Scott

On Wed, Mar 10, 2004 at 06:37:49PM +0100, Beate Dorow wrote:
>
> Dear Scott,
>
> I am busy writing lately, but I don't mind adding this feature. Do you
> think it'd be early enough if I did it during the weekend?
>
> It's the initialize_column_indices routine (in dict.c) which picks the
> column labels. I remember that we did earlier experiments with picking the
> top words according to tf-idf as column labels rather than the top
> frequent ones.
>
> I think it wouldn't be a big deal to hand over a Boolean variable
> $FROM_FILE to initialize_column_indices which indicates whether column
> indices should be computed or read from a file.
> We could let a user "turn on" this variable by adding an option
> -cols_from_file to infomap-build which passes the value to
> intialize_col_indices via count_wordvec.c. Does that make sense?
>
> Best wishes,
> Beate
>
> On Tue, 9 Mar 2004, Scott James Cederberg wrote:
>
> >Dominic and Shugi,
> >
> > I'm CC'ing this reply to infomap-nlp-devel, because in theory the
> > sort of discussion touched off by Dominic's message below (about
> > how to add this feature) should take place there.
> >
> > I'm not familiar with where and how count_wordvec chooses the
> > content-bearing words, but I think the easiest thing would be to
> > modularize the part where it does that (e.g. into a separate function),
> > and then create another function that instead read content bearing
> > words from a file. Which function was called could be controlled
> > by a command-line option.
> >
> > I've already got a bit of a backlog of reported but unfixed bugs;
> > I'm hoping to dig my way out from under that by the end of the
> > week.
> > Hopefully next week I would then have time to add this
> > feature.
> >
> > If anyone else wants to take it on, please let me know.
> >
> > Scott
> >
> >On Fri, Mar 05, 2004 at 07:00:19PM -0800, Dominic Widdows wrote:
> >>
> >> Hi Scott,
> >>
> >> I know we talked about this in the past - is it doable or shall we tell
> >> people it's on the back burner?
> >>
> >> As far as I can tell, it's just a question of putting a different list of
> >> words into memory and telling the count_wordvec program to look there.
> >> Which could be a total can of worms in C.
> >>
> >> Best wishes,
> >> Dominic
> >>
> >> ---------- Forwarded message ----------
> >> Date: Fri, 5 Mar 2004 18:53:17 -0800
> >> From: Shuji Yamaguchi <yam...@ya...>
> >> To: inf...@li...
> >> Subject: [infomap-nlp-users] Infomap. Can I choose and feed
> >> "content-bearing words" to "count_wordvec"?
> >>
> >> Hi InfoMap admin and users,
> >>
> >> I wonder whether I could choose the "content-bearing words" myself and feed
> >> them into the pre-processing of InfoMap.
> >> The count_wordvec appears to be the program that does it. According to its
> >> man page, the content words are chosen from the ones in "ranking 50-1049".
> >> Are there any way to customize this by use of options and/or parameters?
> >>
> >> Thank you for your support.
> >> Regards, Shuji
> >>
> >> Shuji Yamaguchi,
> >> Fellow, Reuters Digital Vision Program, CSLI, Stanford.
> >>
> >> -------------------------------------------------------
> >> This SF.Net email is sponsored by: IBM Linux Tutorials
> >> Free Linux tutorial presented by Daniel Robbins, President and CEO of
> >> GenToo technologies. Learn everything from fundamentals to system
> >> administration. http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
> >> _______________________________________________
> >> infomap-nlp-users mailing list
> >> inf...@li...
> >> https://lists.sourceforge.net/lists/listinfo/infomap-nlp-users
> >
> >--
> >Scott Cederberg
> >Researcher
> >
> >Infomap Project
> >Computational Semantics Lab
> >Center for the Study of Language and Information (CSLI)
> >Stanford University
> >
> >http://infomap.stanford.edu/
> >
> >_______________________________________________
> >infomap-nlp-devel mailing list
> >inf...@li...
> >https://lists.sourceforge.net/lists/listinfo/infomap-nlp-devel

--
Scott Cederberg
Researcher
Infomap Project
Computational Semantics Lab
Center for the Study of Language and Information (CSLI)
Stanford University
http://infomap.stanford.edu/
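
Editorial sketch of the approach discussed in this thread: one function
picks column labels by frequency rank, another reads them from a file,
and a Boolean flag chooses between them. This is not the Infomap source.
The thread does not show the real signature of initialize_column_indices,
so every name below (choose_column_labels, columns_from_ranking,
columns_from_stream, MAX_LABEL_LEN) is hypothetical, as is the file
format of one label per line, and the word list is assumed to be already
sorted by descending frequency.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LABEL_LEN 64  /* hypothetical cap on label length */

/* Take up to n_cols labels from a frequency-sorted word list,
 * starting at zero-based rank first_rank. Returns labels written. */
size_t columns_from_ranking(const char *const *sorted_words, size_t n_words,
                            size_t first_rank, size_t n_cols,
                            char labels[][MAX_LABEL_LEN])
{
    size_t count = 0;
    for (size_t i = first_rank; i < n_words && count < n_cols; i++) {
        strncpy(labels[count], sorted_words[i], MAX_LABEL_LEN - 1);
        labels[count][MAX_LABEL_LEN - 1] = '\0';
        count++;
    }
    return count;
}

/* Read up to n_cols labels from a stream, one per line, skipping
 * empty lines. Assumes each line is shorter than MAX_LABEL_LEN;
 * longer lines would be split across several labels. */
size_t columns_from_stream(FILE *in, size_t n_cols,
                           char labels[][MAX_LABEL_LEN])
{
    char line[MAX_LABEL_LEN];
    size_t count = 0;

    assert(in != NULL);
    while (count < n_cols && fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';  /* strip line ending */
        if (line[0] == '\0')
            continue;
        strcpy(labels[count++], line);       /* fits: same buffer size */
    }
    return count;
}

/* Dispatch on the flag: nonzero reads labels from the file at path,
 * zero computes them from the ranking. Returns 0 if the file cannot
 * be opened. */
size_t choose_column_labels(int from_file, const char *path,
                            const char *const *sorted_words, size_t n_words,
                            size_t first_rank, size_t n_cols,
                            char labels[][MAX_LABEL_LEN])
{
    if (!from_file)
        return columns_from_ranking(sorted_words, n_words,
                                    first_rank, n_cols, labels);

    FILE *in = fopen(path, "r");
    if (in == NULL)
        return 0;
    size_t count = columns_from_stream(in, n_cols, labels);
    fclose(in);
    return count;
}
```

With first_rank = 50 and n_cols = 1000, the ranking branch would cover
the "ranking 50-1049" range quoted from the man page, provided ranks
are counted from zero. A command-line option such as the proposed
-cols_from_file would only need to set the flag and supply the path.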