From: Scott J. C. <ced...@cs...> - 2004-03-10 23:18:17
Beate,
Thanks for your help! What you describe sounds like a reasonable
approach.
Unfortunately, I need to do some housekeeping with our CVS
repository before it can be changed by multiple people without
making a mess. I am planning to do that by the end of the week,
and I'll get back to you.
Scott
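Beate's proposal quoted below is a Boolean flag telling initialize_column_indices whether to compute the column words or read them from a file. Combined with Scott's suggestion of keeping each strategy in its own function, it might look roughly like the following in C. This is a self-contained sketch, not code from dict.c or count_wordvec.c. The names choose_column_words, pick_by_rank, read_from_file and MAX_WORD_LEN are hypothetical, as are the one-word-per-line file format and the reading of the man page's "ranking 50-1049" as a 1-based, inclusive rank window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD_LEN 64  /* hypothetical fixed slot size per column word */

/* Copy src into a fixed-size slot, truncating overly long words. */
static void copy_word(char *dst, const char *src)
{
    snprintf(dst, MAX_WORD_LEN, "%s", src);
}

/* Strategy 1 (hypothetical): take the words whose frequency rank lies in
 * [first_rank, last_rank], 1-based and inclusive, e.g. 50..1049.
 * ranked[] is assumed to be sorted by descending corpus frequency. */
static size_t pick_by_rank(const char **ranked, size_t n_ranked,
                           size_t first_rank, size_t last_rank,
                           char (*out)[MAX_WORD_LEN], size_t max_out)
{
    assert(first_rank >= 1 && first_rank <= last_rank);
    size_t count = 0;
    for (size_t r = first_rank;
         r <= last_rank && r <= n_ranked && count < max_out; r++)
        copy_word(out[count++], ranked[r - 1]);
    return count;
}

/* Strategy 2 (hypothetical): read one word per line from fp, skipping
 * blank lines. Assumes no line is longer than the 256-byte buffer. */
static size_t read_from_file(FILE *fp, char (*out)[MAX_WORD_LEN],
                             size_t max_out)
{
    char line[256];
    size_t count = 0;
    while (count < max_out && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';  /* drop the line ending */
        if (line[0] != '\0')
            copy_word(out[count++], line);
    }
    return count;
}

/* from_file plays the role of the proposed $FROM_FILE flag: it selects
 * which strategy fills out[]. Returns the number of words stored. */
static size_t choose_column_words(bool from_file, FILE *fp,
                                  const char **ranked, size_t n_ranked,
                                  size_t first_rank, size_t last_rank,
                                  char (*out)[MAX_WORD_LEN], size_t max_out)
{
    if (from_file)
        return read_from_file(fp, out, max_out);
    return pick_by_rank(ranked, n_ranked, first_rank, last_rank,
                        out, max_out);
}
```

Passing an already opened FILE * keeps the reader easy to exercise in isolation. In infomap-build the flag would presumably be set from a command-line option such as the -cols_from_file that Beate proposes, which is only a suggestion at this point in the thread.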
On Wed, Mar 10, 2004 at 06:37:49PM +0100, Beate Dorow wrote:
>
>
> Dear Scott,
>
> I have been busy writing lately, but I don't mind adding this feature. Would
> it be soon enough if I did it over the weekend?
>
> It's the initialize_column_indices routine (in dict.c) that picks the
> column labels. I remember that we did earlier experiments with picking the
> top words according to tf-idf as column labels rather than the most
> frequent ones.
>
> I think it wouldn't be a big deal to pass a Boolean variable
> $FROM_FILE to initialize_column_indices that indicates whether column
> indices should be computed or read from a file.
> We could let a user "turn on" this variable by adding an option
> -cols_from_file to infomap-build, which passes the value to
> initialize_column_indices via count_wordvec.c. Does that make sense?
>
> Best wishes,
> Beate
>
>
>
> On Tue, 9 Mar 2004, Scott James Cederberg wrote:
>
> >Dominic and Shuji,
> >
> > I'm CC'ing this reply to infomap-nlp-devel, because in theory the
> > sort of discussion touched off by Dominic's message below (about
> > how to add this feature) should take place there.
> >
> > I'm not familiar with where and how count_wordvec chooses the
> > content-bearing words, but I think the easiest thing would be to
> > modularize the part where it does that (e.g. into a separate function),
> > and then create another function that instead reads content-bearing
> > words from a file. Which function gets called could be controlled
> > by a command-line option.
> >
> > I've already got a bit of a backlog of reported but unfixed bugs;
> > I'm hoping to dig my way out from under that by the end of the
> > week. Hopefully I will then have time to add this feature next
> > week.
> >
> > If anyone else wants to take it on, please let me know.
> >
> > Scott
> >
> >
> >On Fri, Mar 05, 2004 at 07:00:19PM -0800, Dominic Widdows wrote:
> >>
> >> Hi Scott,
> >>
> >> I know we talked about this in the past - is it doable or shall we tell
> >> people it's on the back burner?
> >>
> >> As far as I can tell, it's just a question of putting a different list of
> >> words into memory and telling the count_wordvec program to look there.
> >> Which could be a total can of worms in C.
> >>
> >> Best wishes,
> >> Dominic
> >>
> >> ---------- Forwarded message ----------
> >> Date: Fri, 5 Mar 2004 18:53:17 -0800
> >> From: Shuji Yamaguchi <yam...@ya...>
> >> To: inf...@li...
> >> Subject: [infomap-nlp-users] Infomap. Can I choose and feed
> >> "content-bearing words" to "count_wordvec"?
> >>
> >> Hi InfoMap admin and users,
> >>
> >> I wonder whether I could choose the "content-bearing words" myself and feed
> >> them into the pre-processing of InfoMap.
> >> The count_wordvec appears to be the program that does it. According to its
> >> man page, the content words are chosen from the ones in "ranking 50-1049".
> >> Is there any way to customize this through options and/or parameters?
> >>
> >> Thank you for your support.
> >> Regards, Shuji
> >>
> >> Shuji Yamaguchi,
> >> Fellow, Reuters Digital Vision Program, CSLI, Stanford.
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> infomap-nlp-users mailing list
> >> inf...@li...
> >> https://lists.sourceforge.net/lists/listinfo/infomap-nlp-users
> >
> >--
> >Scott Cederberg
> >Researcher
> >
> >Infomap Project
> >Computational Semantics Lab
> >Center for the Study of Language and Information (CSLI)
> >Stanford University
> >
> >http://infomap.stanford.edu/
> >
> >
> >_______________________________________________
> >infomap-nlp-devel mailing list
> >inf...@li...
> >https://lists.sourceforge.net/lists/listinfo/infomap-nlp-devel
> >
--
Scott Cederberg
Researcher
Infomap Project
Computational Semantics Lab
Center for the Study of Language and Information (CSLI)
Stanford University
http://infomap.stanford.edu/
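Dominic's concern above is that putting a different word list into memory and having count_wordvec look there could be a can of worms in C. Much of that work is mapping each token to its column once the list is loaded. One minimal way to do this with the standard C library is to sort the list once with qsort and look tokens up with bsearch. This is only an illustrative sketch: ColumnTable, column_table_init and column_index are hypothetical names, and the thread does not show how count_wordvec actually stores its column words.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory table of column (content-bearing) words. */
typedef struct {
    const char **words;  /* sorted in strcmp order after init */
    size_t n;
} ColumnTable;

/* qsort/bsearch comparator for an array of C strings. */
static int cmp_str_ptr(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort the word list in place so each lookup is O(log n). A word's
 * column index is its position in the sorted array. */
static void column_table_init(ColumnTable *t, const char **words, size_t n)
{
    qsort(words, n, sizeof *words, cmp_str_ptr);
    t->words = words;
    t->n = n;
}

/* Return the column index of token, or -1 if it is not a column word. */
static long column_index(const ColumnTable *t, const char *token)
{
    assert(t != NULL && token != NULL);
    const char **hit = bsearch(&token, t->words, t->n, sizeof *t->words,
                               cmp_str_ptr);
    return hit != NULL ? (long)(hit - t->words) : -1;
}
```

Because the index comes from the sorted position, columns here end up in alphabetical order rather than file order. Preserving the file's order would mean storing each word's original index alongside it.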