From: Ted P. <tpederse@d.umn.edu> - 2008-07-24 15:49:59
Hi Bridget,

One small question below...

On Wed, Jul 23, 2008 at 4:00 PM, Bridget Thomson McInnes <bth...@cs...> wrote:
> Hi Ted,
>
> You are using the wrong version of the NLM-WSD dataset. There are two
> versions, one in which the identifiers are PMIDs and the other in which
> they are UIs. Unfortunately, the UI one is the first one.
>
> If you go to the NLM-WSD website: http://wsd.nlm.nih.gov/
>
> and log in, scroll down to the first white box. You will see:
>
> Switch to PMID identified version of the WSD Test Collection
>
> It originally didn't matter which version you used, but now when using
> the --mesh option or running the make-MMTxNLM-data.sh Demo, the PMID
> version is required.

I think I'm running the make-MMTxNLM-data.sh script on the same data I'm
using for everything else (which probably isn't the PMID data...). Is
there something I should see in the output that would let me know that it
wasn't the PMID data?

I can't remember if I asked this before, but I'm wondering if we should
simply require the PMID data, to avoid having the user get two different
forms of the same data (and potentially use them in the wrong places...).
It sounds like some things will work with both forms of the data, but a
few things will not (and require PMID), so that sort of suggests that PMID
is the more "generic" form of the data... Is there any downside to using
PMID rather than the other form?

Related to this, are there any other possible enhancements in the future
(that we've already discussed) that would require PMID?

What do you think?

Thanks,
Ted

> I will think about how to go about adding a check for that. I have done it
> myself too many times as well. And I will state it more clearly in the
> INSTALL documentation.
>
> Sorry about that!
>
> Thanks,
>
> Bridget
>
> On Wed, 23 Jul 2008, Ted Pedersen wrote:
>
>> Hi Bridget,
>>
>> I'm using whichever one is used by the McInnes07 demo, since I'm just
>> copying McInnes07.mm and using that as TDP.mm.
>>
>> Here's the first few lines of adjustment.mm, hopefully that will make
>> it clear what I'm using - I *think* this is the PMID version since it
>> has that number in it (which I think is the PMID)?
>>
>> <corpus lang='en'>
>> <lexelt item="adjustment" senses="M1,M2,M3,None">
>> <instance id="98076825" alias="adjustment">
>> <answer instance="98076825.ab.7" senseid="M2"/>
>> <context line="Influence of physiological factors on the
>> age-related increase in blood pressure in healthy men. The independent
>> and collective influences of several physiological factors on the
>> age-related increase in blood pressure in healthy men were examined.
>> Twenty-seven younger and 25 older, mostly normotensive, healthy men
>> were studied. Blood pressure, body fat, body fat distribution, maximal
>> oxygen consumption (VO2max), plasma norepinephrine, dietary Na, and
>> erythrocyte Na-K pump activity were measured. Older men showed 57%
>> higher percent body fat, 40% higher plasma norepinephrine
>> concentration, 14% greater mean arterial blood pressure (MAP), and 5%
>> higher plasma K concentration than younger men (all p < 0.01). Older
>> men showed a 38% (p < 0.01) lower VO2max, 19% (p < 0.05) lower energy
>> intake, 18% (p < 0.05) lower Na-K pump rate constant, and a 17% (p <
>> 0.05) lower Na-K pump rate. Group means for MAP were adjusted for
>> combinations of plasma norepinephrine, waist:thigh ratio, VO2max, and
>> the Na-K pump rate constant, to determine if any one variable or
>> combination could account for the age related increase in MAP.
>> Statistical adjustment for plasma norepinephrine, waist:thigh ratio,
>> and Na-K pump rate constant eliminated the significant difference
>> between MAPs for the two groups. Thus, alterations in sympathetic
>> nervous system activity, body fat distribution, and the membrane Na-K
>> pump activity independently contribute to the age-related increase in
>> MAP in healthy men.
>> "/>
>> <sentence tw="" id="98076825.ti.1" line="Influence of
>> physiological factors on the age-related increase in blood pressure in
>> healthy men.">
>>
>> Does that look like the right format?
>>
>> Thanks!
>> Ted
>>
>> On Wed, Jul 23, 2008 at 3:30 PM, Bridget Thomson McInnes
>> <bth...@cs...> wrote:
>>>
>>> Hi Ted,
>>>
>>> What version of the NLM-WSD dataset are you using? The --mesh option
>>> requires that the PMID version be used.
>>>
>>> Thanks!
>>>
>>> Bridget
>>>
>>> On Wed, 23 Jul 2008, Bridget Thomson McInnes wrote:
>>>
>>>> Hi Ted,
>>>>
>>>> I just downloaded the CuiTools from the webpage and got the same error.
>>>> I will see why it is doing this!
>>>>
>>>> Thanks!
>>>>
>>>> Bridget
>>>>
>>>> On Wed, 23 Jul 2008, Ted Pedersen wrote:
>>>>
>>>>> Hi Bridget,
>>>>>
>>>>> I'm in the process of running the --mesh option again, this time just
>>>>> using adjustment and --mesh, as in...
>>>>>
>>>>> supervised-disambiguate.pl Demos/TDP.mm/adjustment.mm --mesh
>>>>> --directory ted-adjustment-mesh
>>>>>
>>>>> One thing I've noticed in the previous cases with --mesh is that the
>>>>> log directory is empty - which I guess means that no features were
>>>>> found, or something... Then of course the ARFF files don't have any
>>>>> features in them either, leading to the majority classifier...
>>>>>
>>>>> This isn't just specific to --mesh, but I do think it would be a good
>>>>> idea to issue a warning or possibly even an error when no features are
>>>>> found, just so the user doesn't end up getting a majority classifier
>>>>> without realizing it - unless I had been curious about the Mesh
>>>>> features I might not have noticed any of this, just because you do get
>>>>> results back even after finding no features. I think it might be ok to
>>>>> default to a majority classifier in this case, but we'd want the user
>>>>> to know that this has happened...
>>>>>
>>>>> My adjustment run just finished, so I've attached a zip file with the
>>>>> log, arff, weka and results directories...
>>>>>
>>>>> Thanks,
>>>>> Ted
>>>>>
>>>>> On Wed, Jul 23, 2008 at 2:33 PM, Bridget Thomson McInnes
>>>>> <bth...@cs...> wrote:
>>>>>>
>>>>>> Hi Ted,
>>>>>>
>>>>>> I am not certain. I am going to redownload the package, do a clean
>>>>>> install and try it again. Hopefully I will be able to recreate it.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Bridget
>>>>>>
>>>>>> On Wed, 23 Jul 2008, Ted Pedersen wrote:
>>>>>>
>>>>>>> Hi Bridget,
>>>>>>>
>>>>>>> Thanks for this script - I ran it and it seems to give me back a fair
>>>>>>> number of results...so I seem to be able to access PubMed ok...
>>>>>>>
>>>>>>> marimba(4): perl get-mesh.pl 9337195
>>>>>>> 9337195 : *Birth Weight,Blood Glucose/metabolism,Blood
>>>>>>> Pressure,Brachial Artery/anatomy &
>>>>>>> histology/*physiology/ultrasonography,Cardiovascular
>>>>>>> Diseases/*epidemiology,Child,Cholesterol,
>>>>>>> LDL/blood,England,Female,Humans,Lipids/blood,Muscle, Smooth,
>>>>>>> Vascular/anatomy &
>>>>>>> histology/physiology/ultrasonography,Parity,Pregnancy,Regional Blood
>>>>>>> Flow,Risk Factors,Socioeconomic Factors,Tobacco Smoke
>>>>>>> Pollution,*Vasodilation
>>>>>>>
>>>>>>> What could I check next?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ted
>>>>>>>
>>>>>>> On Wed, Jul 23, 2008 at 2:17 PM, Bridget Thomson McInnes
>>>>>>> <bth...@cs...> wrote:
>>>>>>>>
>>>>>>>> Hi Ted
>>>>>>>>
>>>>>>>> I attached a test script to check.
>>>>>>>> It is called: get-msh.pl
>>>>>>>>
>>>>>>>> Here is an example run:
>>>>>>>>
>>>>>>>> bthomson@caesar (~) % perl get-msh.pl 9337195
>>>>>>>> 9337195 : *Birth Weight,Blood Glucose/metabolism,Blood
>>>>>>>> Pressure,Brachial Artery/anatomy &
>>>>>>>> histology/*physiology/ultrasonography,Cardiovascular
>>>>>>>> Diseases/*epidemiology,Child,Cholesterol,
>>>>>>>> LDL/blood,England,Female,Humans,Lipids/blood,Muscle, Smooth,
>>>>>>>> Vascular/anatomy &
>>>>>>>> histology/physiology/ultrasonography,Parity,Pregnancy,Regional Blood
>>>>>>>> Flow,Risk Factors,Socioeconomic Factors,Tobacco Smoke
>>>>>>>> Pollution,*Vasodilation
>>>>>>>>
>>>>>>>> This is the same result that I get on my computer at school and here
>>>>>>>> at work.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Bridget
>>>>>>>>
>>>>>>>> On Wed, 23 Jul 2008, Ted Pedersen wrote:
>>>>>>>>
>>>>>>>>> Hi Bridget,
>>>>>>>>>
>>>>>>>>> TDP.mm is just a copy of McInnesPC07.dir.mm from the Demos
>>>>>>>>> directory. Did you mean that, or the actual output?
>>>>>>>>>
>>>>>>>>> I do have an internet connection so I don't think that's the
>>>>>>>>> problem. How would I know if PubMed cut me off?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Ted
>>>>>>>>>
>>>>>>>>> On Wed, Jul 23, 2008 at 12:40 PM, Bridget Thomson McInnes
>>>>>>>>> <bth...@cs...> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Ted,
>>>>>>>>>>
>>>>>>>>>> I am not certain why this is happening. I don't have this problem.
>>>>>>>>>> The mesh terms are obtained using the PubMed API. I can see two
>>>>>>>>>> potential problems:
>>>>>>>>>> 1. No internet connection
>>>>>>>>>>    - which I will put in the documentation!
>>>>>>>>>>
>>>>>>>>>> 2. Do you think PubMed cut you off? They have done that to me
>>>>>>>>>>    before. They just start rejecting my queries if they
>>>>>>>>>>    think I have been using it too much. I have not quite
>>>>>>>>>>    determined what too much is yet.
>>>>>>>>>>    I will write a check in the program to make certain that
>>>>>>>>>>    something is coming back and, if not, error out.
>>>>>>>>>>
>>>>>>>>>> Otherwise I can't think of what it is. It isn't like using the
>>>>>>>>>> UMLSKS API where the IP address needs to be registered. I have only
>>>>>>>>>> tested this inside NLM - my connection at the apartment goes in and
>>>>>>>>>> out so I haven't been able to test this on my laptop.
>>>>>>>>>>
>>>>>>>>>> Could you send me your TDP.mm file?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> Bridget
>>>>>>>>>>
>>>>>>>>>> On Wed, 23 Jul 2008, Ted Pedersen wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Bridget,
>>>>>>>>>>>
>>>>>>>>>>> When using the Mesh option (--mesh) by itself, I seem to never get
>>>>>>>>>>> any features...
>>>>>>>>>>>
>>>>>>>>>>> My arff files all look something like this...
>>>>>>>>>>>
>>>>>>>>>>> @RELATION pressure
>>>>>>>>>>> @ATTRIBUTE Sense {M1,M2,M3,None}
>>>>>>>>>>> @DATA
>>>>>>>>>>> M1 % 97403834
>>>>>>>>>>> M1 % 98281278
>>>>>>>>>>> M1 % 98124304
>>>>>>>>>>>
>>>>>>>>>>> And so I end up getting a majority classifier...
>>>>>>>>>>>
>>>>>>>>>>> Is there something I am supposed to be doing to get the mesh
>>>>>>>>>>> features? I am just running like this...
>>>>>>>>>>>
>>>>>>>>>>> supervised-disambiguate.pl TDP.mm --mesh
>>>>>>>>>>>
>>>>>>>>>>> where TDP.mm is a directory with all the NLM-WSD data in .mm
>>>>>>>>>>> format (one file per word). All the files seem to be getting
>>>>>>>>>>> processed, and no errors are shown, but the results are pretty
>>>>>>>>>>> much just a majority classifier (due to lack of features...)
>>>>>>>>>>>
>>>>>>>>>>> Any idea on this?
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Ted
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Ted Pedersen
>>>>>>>>>>> http://www.d.umn.edu/~tpederse
>>>>
>>>> _______________________________________________
>>>> Cuitools-users mailing list
>>>> Cui...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/cuitools-users

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
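
For reference, the no-features warning suggested earlier in the thread could look something like the following. This is only a minimal sketch in Python, not CuiTools code: the function names are hypothetical, and it assumes the class attribute is always named Sense, as in the ARFF excerpt quoted in the thread. It counts the @ATTRIBUTE declarations other than the class and warns when there are none.

```python
import sys


def count_feature_attributes(arff_text, class_attribute="Sense"):
    """Count @ATTRIBUTE lines in the ARFF header other than the class attribute.

    Assumption: the class attribute is named "Sense", as in the excerpt
    quoted in the thread. Scanning stops at the @DATA marker.
    """
    count = 0
    for line in arff_text.splitlines():
        stripped = line.strip()
        upper = stripped.upper()
        if upper.startswith("@DATA"):
            break
        if upper.startswith("@ATTRIBUTE"):
            parts = stripped.split(None, 2)
            if len(parts) >= 2 and parts[1] != class_attribute:
                count += 1
    return count


def warn_if_no_features(arff_text, name="<arff>"):
    """Print a warning to stderr when the ARFF data declares no features."""
    n = count_feature_attributes(arff_text)
    if n == 0:
        print(
            f"WARNING: {name} declares no feature attributes; "
            "a classifier trained on it can only predict the majority sense.",
            file=sys.stderr,
        )
    return n


# The header and rows quoted in the thread: only the class attribute is present.
SAMPLE = """@RELATION pressure
@ATTRIBUTE Sense {M1,M2,M3,None}
@DATA
M1 % 97403834
M1 % 98281278
M1 % 98124304
"""

if __name__ == "__main__":
    warn_if_no_features(SAMPLE, "pressure.arff")
```

Whether this should be a warning or a hard error is the open question raised in the thread; the sketch only warns and returns the count so that a caller could decide.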