Re: [Senseclusters-users] Unsupervided Relation labeling
From: Stefano S. <ste...@gm...> - 2014-12-01 13:48:41
Yes Ted, thanks. In Italy we don't have Thanksgiving, so I forgot that you were on holiday in the USA. Sorry for the disturbance!
I kept working here and, in the meantime, I've started to use the CLUTO output for a first cluster label estimation.
Anyway, I'm waiting for your response, so that we can try to complete our system with SenseClusters.

Thank you so much.
Stefano Silvestri, Università di Napoli "Federico II"

2014-12-01 14:29 GMT+01:00 Ted Pedersen <dul...@gm...>:
> My apologies for the very delayed response - you caught us right
> before the start of the Thanksgiving Holidays here in the USA - we are
> back to work now, and so I'll take a look at this today.
>
> On Wed, Nov 26, 2014 at 3:19 PM, Stefano Silvestri
> <ste...@gm...> wrote:
> > Hi Ted,
> > as described in the previous email, I've launched my experiment. As I said,
> > the final step of my pipeline is the cluster labeling, using SenseClusters.
> > I'd like to remind you that the system performs unsupervised relation
> > extraction from the entities found in 988 clinical records (the entities
> > were extracted using UMLS databases, and we cluster the pairs of
> > entities).
> >
> > To integrate SenseClusters cluster labeling into our system, I've produced a
> > cluto-style output for the clustering results (around 160000 elements) and
> > an rlabel file (same number), with the list of all the clustered elements.
> > At this point, I have problems running format_clusters.pl.
> >
> > To perform the labeling, I need the output of format_clusters.pl, generated
> > with the --context option. So, I've created a senseval-2 file with
> > text2sval.pl. The input file of text2sval.pl is a plain text file with one
> > whole clinical record on each line.
> > Naturally, each context contains more than one cluster member.
> > I haven't used any optional arguments with text2sval.pl.
> >
> > This output has 988 instance ids.
> > Now, when I try to launch format_clusters.pl,
> > I get the following error while it parses the senseval-2 file:
> > Use of uninitialized value $sentence in pattern match (m//) at
> > ../.cpan/build/Text-SenseClusters-1.03-FMoSjn/Toolkit/evaluate/format_clusters.pl
> > line 309, <SCON> line 5938.
> > (It occurs when it reaches the last line of the senseval-2 file.)
> >
> > I suspect that the contexts I used are wrong, so my questions are:
> > 1) Do I have to put only the extracted entities in the contexts, or the
> > relations?
> > 2) Must the number of contexts be the same as the number of clustered
> > elements?
> > 3) If nothing is (theoretically) wrong, what could the error in the
> > senseval-2 file be?
> >
> > I'm waiting for your response...
> > Thank you for your attention. I hope that you can help us to complete our
> > research.
> >
> >
> > 2014-10-23 16:02 GMT+02:00 Stefano Silvestri <ste...@gm...>:
> >>
> >> Hi Ted and thanks.
> >>
> >> The PoS tagging, entity recognition, feature extraction and clustering
> >> tasks are performed by our own system (not SenseClusters), which is still
> >> in development.
> >> Now I'm trying to use the cluster labeling module of SenseClusters to show
> >> that we have found, with an unsupervised approach, the relations between
> >> medical entities in the clinical records (e.g. diabetes mellitus <> glycemia)
> >> and to obtain, in this way, some labels for the clusters.
> >>
> >> I'm now writing the code to create the context files, and then I'll run the
> >> experiments on cluster labeling. I'll let you know in a few days if
> >> everything worked well and, in case of a new publication, I'll cite your
> >> great work.
> >>
> >> I'm sure that I will have more questions in the next few days, so I thank
> >> you in advance.
> >> Stefano Silvestri
> >>
> >>
> >> 2014-10-23 15:07 GMT+02:00 Ted Pedersen <dul...@gm...>:
> >>>
> >>> Hi Stefano,
> >>>
> >>> This sounds like an interesting project, and it's good to know
> >>> SenseClusters is proving to be useful. See my responses inline...
> >>>
> >>> On Wed, Oct 22, 2014 at 5:58 AM, Stefano Silvestri
> >>> <ste...@gm...> wrote:
> >>> > I've used a clustering technique to discover, in an unsupervised way,
> >>> > relations between medical entities contained in a large collection of
> >>> > anonymized medical records, in a research project of the University of
> >>> > Naples.
> >>> > The data set is composed of a large set of features - all the results
> >>> > will shortly be published in a journal.
> >>> >
> >>> > The next step in the development of our system is performing
> >>> > unsupervised cluster (relation) labeling. To do that, I'm planning to try
> >>> > the clusterlabeling module from SenseClusters. To create the input for
> >>> > clusterlabeling I have to use the format_clusters module with the
> >>> > --context option, and now I have some problems.
> >>> >
> >>> > I have already produced a cluto-style cluster solution file (no problem
> >>> > with that) from my system.
> >>> >
> >>> > The rlabel file, if I'm right, is a file containing the explicit
> >>> > corresponding name of each entity in the cluster (in my case the
> >>> > relation).
> >>> > Is that right?
> >>>
> >>> Yes, rlabel shows the cluster to which each instance has been assigned.
> >>>
> >>> >
> >>> > And now the problems with the context file...
> >>> > It should be in senseval2 format. My experimental setup consists of
> >>> > plain text files, so I should use the plain-text-to-headless-senseval2
> >>> > utility.
> >>> >
> >>> > I have some questions.
> >>> >
> >>> > 1) Does the context file have to put together all my input files (the
> >>> > medical records) in one large file (with each context corresponding to
> >>> > a medical record)?
> >>>
> >>> Yes, the input for each run of SenseClusters should be a single file
> >>> with all your contexts included.
> >>>
> >>> >
> >>> > 2) Can the contexts be headless, or do I have to tag (<head></head>) all
> >>> > the entities (medical names) in the input?
> >>>
> >>> Your contexts can be headless, and so there is no need to include
> >>> <head> tags in your contexts.
> >>>
> >>> >
> >>> > 3) Are there other constraints on the context files (formatting, tags, or
> >>> > other)?
> >>> >
> >>>
> >>> There shouldn't be. The output from text2sval.pl should be acceptable
> >>> for input "as is".
> >>>
> >>> > If the experiments succeed, of course, I'll credit and cite
> >>> > the SenseClusters project.
> >>> >
> >>> > PS - my system works on the Italian language.
> >>>
> >>> That's great! We'd be happy to answer further questions as they arise,
> >>> and will be curious to know how things work out!
> >>>
> >>> Good luck,
> >>> Ted
> >>>
> >>> >
> >>> > Thanks for your response,
> >>> > Stefano Silvestri,
> >>> > NLP researcher at University of Naples "Federico II"
> >>> >
> >>> > _______________________________________________
> >>> > senseclusters-users mailing list
> >>> > sen...@li...
> >>> > https://lists.sourceforge.net/lists/listinfo/senseclusters-users
> >>> >
> >>>
> >>> --
> >>> Ted Pedersen
> >>> http://www.d.umn.edu/~tpederse
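
Editor's sketch, not part of the original thread: the messages above report 988 instance ids in the senseval-2 file against roughly 160000 rows in the cluster solution and rlabel files, plus an uninitialized $sentence warning at the end of the file. A quick way to inspect both symptoms before running format_clusters.pl might look like the Python below. It assumes (this is not confirmed anywhere in the thread) that the senseval-2 file uses <instance id="..."> elements wrapping a <context> element, and that the cluto-style solution and rlabel files hold one row per clustered element. The function names and the sample data are hypothetical.

```python
import re

# Assumed senseval-2 layout: <instance id="..."> ... <context>...</context> ... </instance>
INSTANCE_RE = re.compile(r'<instance\s+id="([^"]*)"[^>]*>(.*?)</instance>', re.DOTALL)
CONTEXT_RE = re.compile(r'<context>(.*?)</context>', re.DOTALL)


def summarize_sval(sval_text):
    """Return (all instance ids, ids whose context is missing or blank)."""
    ids, empty = [], []
    for inst_id, body in INSTANCE_RE.findall(sval_text):
        ids.append(inst_id)
        match = CONTEXT_RE.search(body)
        if match is None or not match.group(1).strip():
            empty.append(inst_id)
    return ids, empty


def count_nonblank_lines(text):
    """Count rows in a one-row-per-element file, ignoring blank lines."""
    return sum(1 for line in text.splitlines() if line.strip())


def check_counts(sval_text, rlabel_text, solution_text):
    """Collect the numbers worth comparing before running format_clusters.pl."""
    ids, empty = summarize_sval(sval_text)
    return {
        "instances": len(ids),
        "empty_contexts": empty,
        "rlabel_rows": count_nonblank_lines(rlabel_text),
        "solution_rows": count_nonblank_lines(solution_text),
    }


if __name__ == "__main__":
    # Hypothetical miniature inputs; real files would be read from disk.
    sample_sval = (
        '<instance id="0"><context>diabetes mellitus glicemia</context></instance>\n'
        '<instance id="1"><context>\n</context></instance>\n'
    )
    print(check_counts(sample_sval, "0\n1\n2\n", "0\n1\n0\n"))
```

If the three counts differ, or if the list of empty contexts is not empty, that would be a natural first thing to mention when reporting the problem; whether format_clusters.pl actually requires the counts to match is the open question 2) in the thread, not something this sketch settles.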