From: Bridget T. M. <bth...@cs...> - 2008-07-25 13:49:07
Hi Ted,

This sounds good. I will add it!

Thanks,
Bridget

On Thu, 24 Jul 2008, Ted Pedersen wrote:

> Hi Bridget,
>
> In your INSTALL document, I think we probably want to clarify what
> people should be downloading by giving not only what they see on the
> link (which you have below) but also the names of the files they get.
> Also, I think what they download from the full test collection are the
> first and second files there...?
>
> I see that you say in the INSTALL document we should use the PMID
> version, and that seems fine to me...I did not use the PMID version
> when I did my install so I'm going back to use PMID....
>
> ========================
> The first in the ``Basic Test Collection'': 1. Basic Reviewed Set
>
> The second and third in the ``Full Test Collection'': 2. Common Files
> 3. Full Reviewed Result Set
>
> Unpack the files in a directory called NLM-WSD (for example - you can
> call it anything you would like). You should end up with three
> directories in the NLM-WSD directory: 1. Basic_Reviewed_Results 2.
> common 3. Reviewed_Results
> =========================
>
> We may also want to caution the user that the file names for the PMID and
> non-PMID versions are the same, and give them some idea of how to tell
> them apart...and we should probably still have a check somewhere that
> automatically verifies that we have the PMID format, not only in the
> event that people download the wrong version, but also in case they try
> to use completely wrong sorts of data....
>
> Thanks!
> Ted
>
> On Thu, Jul 24, 2008 at 10:49 AM, Ted Pedersen <tpederse@d.umn.edu> wrote:
> > Hi Bridget,
> >
> > One small question below...
> >
> > On Wed, Jul 23, 2008 at 4:00 PM, Bridget Thomson McInnes
> > <bth...@cs...> wrote:
> >> Hi Ted,
> >>
> >> You are using the wrong version of the NLM-WSD dataset. There are two
> >> versions, one in which the identifiers are PMIDs and the other in which
> >> they are UIs. Unfortunately, the UIs one is the first one.
> >>
> >> If you go to the NLM-WSD website: http://wsd.nlm.nih.gov/
> >>
> >> and log in, scroll down to the first white box. You will see:
> >>
> >> Switch to PMID identified version of the WSD Test Collection
> >>
> >>
> >> It originally didn't matter which version you used, but now when using
> >> the --mesh option or running the make-MMTxNLM-data.sh Demo, the PMID
> >> version is required.
> >
> > I think I'm running the make-MMTxNLM-data.sh script on the same data I'm
> > using for everything else (which probably isn't the PMID data....) Is there
> > something I should see in the output that would let me know that it wasn't
> > the PMID data?
> >
> > I can't remember if I asked this before, but I'm wondering if we should
> > simply require the PMID data, to avoid having the user get two different
> > forms of the same data (and potentially use them in the wrong places...)
> > It sounds like some things will work with both forms of the data, but a
> > few things will not (and require PMID), so that sort of suggests that PMID
> > is the more "generic" form of the data....is there any downside to using
> > PMID rather than the other form?
> >
> > Related to this, are there any other possible enhancements in the future
> > (that we've already discussed) that would require PMID?
> >
> > What do you think?
> >
> > Thanks,
> > Ted
> >
> >>
> >> I will think about how to go about adding a check for that. I have done it
> >> myself too many times as well. And I will state it more clearly in the
> >> INSTALL documentation.
> >>
> >> Sorry about that!
> >>
> >> Thanks,
> >>
> >> Bridget
> >>
> >> On Wed, 23 Jul 2008, Ted Pedersen wrote:
> >>
> >>> Hi Bridget,
> >>>
> >>> I'm using whichever one is used by the McInnes07 demo, since I'm just
> >>> copying McInnes07.mm and using that as TDP.mm.
> >>>
> >>> Here's the first few lines of adjustment.mm, hopefully that will make
> >>> it clear what I'm using - I *think* this is the PMID version since it
> >>> has that number in it (which I think is the PMID)?
> >>>
> >>> <corpus lang='en'>
> >>> <lexelt item="adjustment" senses="M1,M2,M3,None">
> >>> <instance id="98076825" alias="adjustment">
> >>> <answer instance="98076825.ab.7" senseid="M2"/>
> >>> <context line="Influence of physiological factors on the age-related
> >>> increase in blood pressure in healthy men. The independent and
> >>> collective influences of several physiological factors on the
> >>> age-related increase in blood pressure in healthy men were examined.
> >>> Twenty-seven younger and 25 older, mostly normotensive, healthy men
> >>> were studied. Blood pressure, body fat, body fat distribution, maximal
> >>> oxygen consumption (VO2max), plasma norepinephrine, dietary Na, and
> >>> erythrocyte Na-K pump activity were measured. Older men showed 57%
> >>> higher percent body fat, 40% higher plasma norepinephrine
> >>> concentration, 14% greater mean arterial blood pressure (MAP), and 5%
> >>> higher plasma K concentration than younger men (all p < 0.01). Older
> >>> men showed a 38% (p < 0.01) lower VO2max, 19% (p < 0.05) lower energy
> >>> intake, 18% (p < 0.05) lower Na-K pump rate constant, and a 17% (p <
> >>> 0.05) lower Na-K pump rate. Group means for MAP were adjusted for
> >>> combinations of plasma norepinephrine, waist:thigh ratio, VO2max, and
> >>> the Na-K pump rate constant, to determine if any one variable or
> >>> combination could account for the age related increase in MAP.
> >>> Statistical adjustment for plasma norepinephrine, waist:thigh ratio,
> >>> and Na-K pump rate constant eliminated the significant difference
> >>> between MAPs for the two groups. Thus, alterations in sympathetic
> >>> nervous system activity, body fat distribution, and the membrane Na-K
> >>> pump activity independently contribute to the age-related increase in
> >>> MAP in healthy men. "/>
> >>> <sentence tw="" id="98076825.ti.1" line="Influence of
> >>> physiological factors on the age-related increase in blood pressure in
> >>> healthy men.">
> >>>
> >>> Does that look like the right format?
> >>>
> >>> Thanks!
> >>> Ted
> >>>
> >>> On Wed, Jul 23, 2008 at 3:30 PM, Bridget Thomson McInnes
> >>> <bth...@cs...> wrote:
> >>>>
> >>>> Hi Ted,
> >>>>
> >>>> What version of the NLM-WSD dataset are you using? The --mesh option
> >>>> requires that the PMID version be used.
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Bridget
> >>>>
> >>>> On Wed, 23 Jul 2008, Bridget Thomson McInnes wrote:
> >>>>
> >>>>> Hi Ted,
> >>>>>
> >>>>> I just downloaded the CuiTools from the webpage and got the same error.
> >>>>> I will see why it is doing this!
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Bridget
> >>>>>
> >>>>> On Wed, 23 Jul 2008, Ted Pedersen wrote:
> >>>>>
> >>>>>> Hi Bridget,
> >>>>>>
> >>>>>> I'm in the process of running the --mesh option again, this time just
> >>>>>> using adjustment and --mesh, as in...
> >>>>>>
> >>>>>> supervised-disambiguate.pl Demos/TDP.mm/adjustment.mm --mesh
> >>>>>> --directory ted-adjustment-mesh
> >>>>>>
> >>>>>> One thing I've noticed in the previous cases with --mesh is that the
> >>>>>> log directory is empty - which I guess means that no features were
> >>>>>> found, or something....then of course the ARFF files don't have any
> >>>>>> features in them either, leading to the majority classifier...
> >>>>>>
> >>>>>> This isn't just specific to --mesh, but I do think it would be a good
> >>>>>> idea to issue a warning or possibly even an error when no features are
> >>>>>> found, just so the user doesn't end up getting a majority classifier
> >>>>>> without realizing it - unless I had been curious about the Mesh
> >>>>>> features I might not have noticed any of this, just because you do get
> >>>>>> results back even after finding no features. I think it might be ok to
> >>>>>> default to a majority classifier in this case, but we'd want the user
> >>>>>> to know that this has happened...
> >>>>>>
> >>>>>> My adjustment run just finished, so I've attached a zip file with the
> >>>>>> log, arff, weka and results directories...
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Ted
> >>>>>>
> >>>>>> On Wed, Jul 23, 2008 at 2:33 PM, Bridget Thomson McInnes
> >>>>>> <bth...@cs...> wrote:
> >>>>>>>
> >>>>>>> Hi Ted,
> >>>>>>>
> >>>>>>> I am not certain. I am going to redownload the package, do a clean
> >>>>>>> install and try it again. Hopefully I will be able to recreate it.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>> Bridget
> >>>>>>>
> >>>>>>> On Wed, 23 Jul 2008, Ted Pedersen wrote:
> >>>>>>>
> >>>>>>>> Hi Bridget,
> >>>>>>>>
> >>>>>>>> Thanks for this script - I ran it and it seems to give me back a fair
> >>>>>>>> number of results...so I seem to be able to access PubMed ok...
> >>>>>>>>
> >>>>>>>> marimba(4): perl get-mesh.pl 9337195
> >>>>>>>> 9337195 : *Birth Weight,Blood Glucose/metabolism,Blood
> >>>>>>>> Pressure,Brachial Artery/anatomy &
> >>>>>>>> histology/*physiology/ultrasonography,Cardiovascular
> >>>>>>>> Diseases/*epidemiology,Child,Cholesterol,
> >>>>>>>> LDL/blood,England,Female,Humans,Lipids/blood,Muscle, Smooth,
> >>>>>>>> Vascular/anatomy &
> >>>>>>>> histology/physiology/ultrasonography,Parity,Pregnancy,Regional Blood
> >>>>>>>> Flow,Risk Factors,Socioeconomic Factors,Tobacco Smoke
> >>>>>>>> Pollution,*Vasodilation
> >>>>>>>>
> >>>>>>>> What could I check next?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Ted
> >>>>>>>>
> >>>>>>>> On Wed, Jul 23, 2008 at 2:17 PM, Bridget Thomson McInnes
> >>>>>>>> <bth...@cs...> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Ted
> >>>>>>>>>
> >>>>>>>>> I attached a test script to check. It is called : get-msh.pl
> >>>>>>>>>
> >>>>>>>>> Here is an example run:
> >>>>>>>>>
> >>>>>>>>> bthomson@caesar (~) % perl get-msh.pl 9337195
> >>>>>>>>> 9337195 : *Birth Weight,Blood Glucose/metabolism,Blood
> >>>>>>>>> Pressure,Brachial
> >>>>>>>>> Artery/anatomy &
> >>>>>>>>> histology/*physiology/ultrasonography,Cardiovascular
> >>>>>>>>> Diseases/*epidemiology,Child,Cholesterol,
> >>>>>>>>> LDL/blood,England,Female,Humans,Lipids/blood,Muscle, Smooth,
> >>>>>>>>> Vascular/anatomy &
> >>>>>>>>> histology/physiology/ultrasonography,Parity,Pregnancy,Regional Blood
> >>>>>>>>> Flow,Risk Factors,Socioeconomic Factors,Tobacco Smoke
> >>>>>>>>> Pollution,*Vasodilation
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This is the same result that I get on my computer at school and here
> >>>>>>>>> at work.
> >>>>>>>>>
> >>>>>>>>> Thanks!
> >>>>>>>>>
> >>>>>>>>> Bridget
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, 23 Jul 2008, Ted Pedersen wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Bridget,
> >>>>>>>>>>
> >>>>>>>>>> TDP.mm is just a copy of McInnesPC07.dir.mm from the Demos
> >>>>>>>>>> directory.
> >>>>>>>>>> Did you mean that, or the actual output?
> >>>>>>>>>>
> >>>>>>>>>> I do have an internet connection so I don't think that's the
> >>>>>>>>>> problem. How would I know if PubMed cut me off?
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>> Ted
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jul 23, 2008 at 12:40 PM, Bridget Thomson McInnes
> >>>>>>>>>> <bth...@cs...> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Ted,
> >>>>>>>>>>>
> >>>>>>>>>>> I am not certain why this is happening. I don't have this problem.
> >>>>>>>>>>> The mesh terms are obtained using the PubMed API. I can see two
> >>>>>>>>>>> potential problems:
> >>>>>>>>>>> 1. No internet connection
> >>>>>>>>>>>    - which I will put in the documentation!
> >>>>>>>>>>>
> >>>>>>>>>>> 2. Do you think PubMed cut you off? They have done that to me
> >>>>>>>>>>>    before. They just start rejecting my queries if they
> >>>>>>>>>>>    think I have been using it too much. I have not quite
> >>>>>>>>>>>    determined what too much is yet. I will write a
> >>>>>>>>>>>    check in the program to make certain that something
> >>>>>>>>>>>    is coming back and if not error out.
> >>>>>>>>>>>
> >>>>>>>>>>> Otherwise I can't think of what it is. It isn't like using the
> >>>>>>>>>>> UMLSKS API where the ip address needs to be registered. I have only
> >>>>>>>>>>> tested this inside NLM - my connection at the apartment goes in and
> >>>>>>>>>>> out so I haven't been able to test this on my laptop.
> >>>>>>>>>>>
> >>>>>>>>>>> Could you send me your TDP.mm file?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks!
> >>>>>>>>>>>
> >>>>>>>>>>> Bridget
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, 23 Jul 2008, Ted Pedersen wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi Bridget,
> >>>>>>>>>>>>
> >>>>>>>>>>>> When using the Mesh option (--mesh) by itself, I seem to never
> >>>>>>>>>>>> get any features...
> >>>>>>>>>>>>
> >>>>>>>>>>>> My arff files all look something like this...
> >>>>>>>>>>>>
> >>>>>>>>>>>> @RELATION pressure
> >>>>>>>>>>>> @ATTRIBUTE Sense {M1,M2,M3,None}
> >>>>>>>>>>>> @DATA
> >>>>>>>>>>>> M1 % 97403834
> >>>>>>>>>>>> M1 % 98281278
> >>>>>>>>>>>> M1 % 98124304
> >>>>>>>>>>>>
> >>>>>>>>>>>> And so I end up getting a majority classifier...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is there something I am supposed to be doing to get the mesh
> >>>>>>>>>>>> features? I am just running like this...
> >>>>>>>>>>>>
> >>>>>>>>>>>> supervised-disambiguate.pl TDP.mm --mesh
> >>>>>>>>>>>>
> >>>>>>>>>>>> where TDP.mm is a directory with all the NLM-WSD data in .mm
> >>>>>>>>>>>> format (one file per word). All the files seem to be getting
> >>>>>>>>>>>> processed, and no errors are shown, but the results are pretty
> >>>>>>>>>>>> much just a majority classifier (due to lack of features...)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Any idea on this?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks!
> >>>>>>>>>>>> Ted
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Ted Pedersen
> >>>>>>>>>>>> http://www.d.umn.edu/~tpederse
> >>>>>
> >>>>> _______________________________________________
> >>>>> Cuitools-users mailing list
> >>>>> Cui...@li...
> >>>>> https://lists.sourceforge.net/lists/listinfo/cuitools-users
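
The two checks suggested in this thread (verifying that the data is in PMID format, and warning when no features are found) could be sketched roughly as follows. This is a Python sketch, not CuiTools code: all function names are hypothetical, and the 8-digit rule for spotting the UI version is only a guess based on the ids quoted in this thread (98076825 from the data that turned out to be the wrong version, 9337195 as a PMID), so it would need to be checked against the real collection.

```python
import re
import warnings

# Matches the numeric id in lines such as:
#   <instance id="98076825" alias="adjustment">
INSTANCE_ID = re.compile(r'<instance\s+id="(\d+)"')


def instance_ids(mm_text):
    """Return the numeric instance ids found in the text of a .mm file."""
    return INSTANCE_ID.findall(mm_text)


def looks_like_ui_version(mm_text):
    """Guess whether a .mm file uses UIs instead of PMIDs.

    ASSUMPTION (not confirmed in the thread): UI-style ids have 8 digits,
    while the PMIDs in this collection are shorter. Treat a True result
    as a hint to re-check the download, not as proof.
    """
    ids = instance_ids(mm_text)
    return bool(ids) and all(len(i) == 8 for i in ids)


def feature_attributes(arff_text, class_name="Sense"):
    """List the ARFF attribute names other than the class attribute."""
    names = []
    for line in arff_text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) >= 2 and parts[0].upper() == "@ATTRIBUTE":
            if parts[1] != class_name:
                names.append(parts[1])
    return names


def warn_if_no_features(arff_text):
    """Warn, instead of silently continuing, when no features were found."""
    if not feature_attributes(arff_text):
        warnings.warn(
            "no features found; results will be a majority classifier"
        )
        return True
    return False
```

Run against the ARFF sample quoted above, which declares only the Sense attribute, `warn_if_no_features` issues the warning and returns True, so an empty feature set would no longer pass unnoticed.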