Diana,
Which method would you suggest is better for cleaning up the temporary variables:
1) the clean.jape file
2) Annotation Set Transfer
Ultimately, which method do you prefer?
Dave
-----Original Message-----
From: Diana Maynard [mailto:d.maynard@...]
Sent: Tuesday, March 30, 2010 5:04 PM
To: Harrill, David C
Cc: gate
Subject: Re: [gate-users] JAPE_URL_Rule
hi Dave
There's only one rule in clean.jape at the moment, so I guess you mean
the order of the annotations listed within the rule, rather than the
order of the rules.
Since they're all joined by "or" statements, the order makes no difference.
Just add any other annotations to the Input line and anywhere you like
within the rule, separated by more "or" statements.
Diana
Harrill, David C wrote:
> That helps tremendously. Is there an order they need to be listed in within the clean.jape file, or can they be in any order?
>
> Thanks,
> Dave
>
> -----Original Message-----
> From: Diana Maynard [mailto:d.maynard@...]
> Sent: Tuesday, March 30, 2010 4:55 PM
> To: Harrill, David C; gate
> Subject: Re: [gate-users] JAPE_URL_Rule
>
> Hi David
> If you get temporary annotations that are not removed by the clean.jape
> file, it may be because they are of a new type that the clean.jape file
> doesn't know about.
> For example if you have made up your own temporary annotations, then you
> need to add these to the rule (and input list) in clean.jape.
>
> It's not necessary to run the clean.jape, but it just tidies things up
> so that you don't see all those temporary annotations that are not
> useful any more.
>
> You could also just add a remove statement in any "final" kind of rules
> you generate when you don't need the temporary annotations any more. I
> think there are a few examples of this in the ANNIE grammars too.
> Personally, I find it useful to have them all in one place (where you
> can also suppress the clean.jape from firing) for testing purposes -
> sometimes the temporary annotations cause some side effect, so
> suppressing the deletion of them (temporarily, for testing) makes it
> clearer what's happening.
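To make the points above concrete (a temporary type that the clean rule does not list is never removed, and keeping all removals in one place makes them easy to switch off while testing), here is a minimal GATE-free sketch in Java. It models annotations as plain objects; the class and method names and the Temp* types are hypothetical, and this is not how clean.jape is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// GATE-free sketch: an annotation is reduced to a type name plus offsets.
// The Temp* type names are hypothetical examples, not ANNIE's actual list.
class TempCleanSketch {
    static final class Ann {
        final String type;
        final int start, end;
        Ann(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Mimics one clean rule whose pattern is ({TypeA} | {TypeB} | ...): an annotation is
    // removed if its type equals ANY listed type, so the order of the list is irrelevant.
    // Types that are not listed are kept. enabled = false skips the cleanup entirely,
    // which leaves the temporary annotations visible for debugging.
    static List<Ann> clean(List<Ann> anns, List<String> tempTypes, boolean enabled) {
        if (!enabled) return new ArrayList<>(anns);
        List<Ann> kept = new ArrayList<>();
        for (Ann a : anns) {
            if (!tempTypes.contains(a.type)) kept.add(a);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Ann> doc = List.of(
                new Ann("Url", 0, 18),         // final annotation we want to keep
                new Ann("TempPerson", 20, 30), // temp type already listed in the clean rule
                new Ann("TempUrl", 40, 58));   // new, home-made temp type
        System.out.println(clean(doc, List.of("TempPerson"), true).size());             // 2: TempUrl survives
        System.out.println(clean(doc, List.of("TempUrl", "TempPerson"), true).size());  // 1: only Url is left
        System.out.println(clean(doc, List.of("TempUrl", "TempPerson"), false).size()); // 3: cleanup suppressed
    }
}
```

Listing "TempPerson" before or after "TempUrl" gives the same result, which matches the point that the order of "or" alternatives does not matter.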
>
> Does that help?
> Diana
>
>
>
> Harrill, David C wrote:
>> I have referred to the clean.jape file and I do understand the
>> concept of what it is doing. That being said, should clean.jape
>> always be run after everything else? If so, I have still found
>> situations where some items are still annotated with the Temp
>> variables.
>>
>> Thanks, Dave
>>
>>
>>
>> -----Original Message-----
>> From: Diana Maynard [mailto:d.maynard@...]
>> Sent: Tuesday, March 30, 2010 11:54 AM
>> To: Harrill, David C
>> Cc: gate-users@...
>> Subject: Re: [gate-users] JAPE_URL_Rule
>>
>> Right, glad you solved the mystery - I thought it must be something
>> you'd added! To prevent these kinds of issues in future, it might be
>> worth comparing the default ANNIE with your current application to
>> check whether it's your modified version or the original ANNIE which
>> is causing the problem in question.
>>
>> As for "preventing all temp variables appearing", do you mean you
>> want to delete them? Or just that you don't want to see them in the
>> annotation list?
>>
>> If the former, then you can just add some grammar rule to remove them
>> once you've finished using them. See the clean.jape for an example.
>> If you just don't want to see them in your final annotation set, then
>> try using an Annotation Set Transfer to transfer the relevant
>> annotations (that you do want to see) to a new set.
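The difference between the two options can be sketched in plain Java, assuming annotations are reduced to their type names. This is an illustration only: the class, the method names and the type names are hypothetical, and whether GATE's Annotation Set Transfer copies or moves annotations depends on how it is configured, so check that before relying on the source set staying intact.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// GATE-free sketch contrasting "delete the temps" with "transfer what you want".
class DeleteVsTransferSketch {
    static final class Ann {
        final String type;
        Ann(String type) { this.type = type; }
    }

    // Option 1: delete the temporary annotations from the working set itself.
    static void deleteTemps(List<Ann> set, Set<String> tempTypes) {
        set.removeIf(a -> tempTypes.contains(a.type));
    }

    // Option 2 (the idea behind an annotation set transfer, modelled here as a copy):
    // put only the wanted types into a new set; the source set is not modified.
    static List<Ann> transfer(List<Ann> source, Set<String> wantedTypes) {
        List<Ann> target = new ArrayList<>();
        for (Ann a : source) {
            if (wantedTypes.contains(a.type)) target.add(a);
        }
        return target;
    }

    public static void main(String[] args) {
        List<Ann> defaultSet = new ArrayList<>(List.of(new Ann("Person"), new Ann("TempUrl"), new Ann("Url")));
        List<Ann> output = transfer(defaultSet, Set.of("Person", "Url"));
        System.out.println(defaultSet.size() + " in source, " + output.size() + " transferred"); // 3 in source, 2 transferred
        deleteTemps(defaultSet, Set.of("TempUrl"));
        System.out.println(defaultSet.size() + " left after deletion"); // 2 left after deletion
    }
}
```

Deleting needs a list of every temporary type, while transferring needs a list of every wanted type; which list is shorter and more stable is a reasonable way to choose.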
>>
>> Diana
>>
>>
>>
>> Harrill, David C wrote:
>>> Diana,
>>>
>>> I actually caused the issue myself: I discovered that I had added
>>> something to the "PersonFull" rule which caused annotations to be
>>> ignored. For example: John Frank Smith smith_john@....
>>>
>>> It was annotating "John Frank Smith smith" as a Person. It
>>> annotated the "smith_john@..." under the emailaddress rule as
>>> opposed to the emailfinal rule. It is certainly strange behavior
>>> that I have not encountered before.
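One plausible reading of this behaviour, assuming the over-long Person match and the Email annotation compete in the same appelt-controlled phase: appelt fires the longest match starting at a given position and then carries on after the end of that match, so an Email annotation that starts inside the Person match is never offered to EmailFinal and keeps its Emailaddress1 features. The following is a simplified Java model of that scanning, not JAPE's implementation; the offsets and the PersonFull priority are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified, GATE-free model of appelt-style matching within one phase: scan left to right,
// at each start position fire the longest candidate (ties: higher priority), then resume
// after the end of that match. An illustration only, not JAPE's actual algorithm.
class AppeltOverlapSketch {
    static final class Match {
        final String rule;
        final int start, end, priority;
        Match(String rule, int start, int end, int priority) {
            this.rule = rule;
            this.start = start;
            this.end = end;
            this.priority = priority;
        }
    }

    static List<String> fire(List<Match> candidates) {
        List<Match> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((Match m) -> m.start)
                .thenComparing(Comparator.comparingInt((Match m) -> m.end).reversed())
                .thenComparing(Comparator.comparingInt((Match m) -> m.priority).reversed()));
        List<String> fired = new ArrayList<>();
        int cursor = 0;
        for (Match m : sorted) {
            if (m.start < cursor) continue; // starts inside text already consumed: never tried
            fired.add(m.rule);
            cursor = m.end;
        }
        return fired;
    }

    public static void main(String[] args) {
        // Hypothetical offsets for "John Frank Smith smith_john@...":
        // the over-long Person match ends at 22, inside the Email span that starts at 17.
        List<Match> overlapping = List.of(
                new Match("PersonFull", 0, 22, 50),
                new Match("EmailFinal", 17, 40, 100));
        System.out.println(fire(overlapping)); // [PersonFull]
        // If the Person match stops after "Smith" (end 16), both rules fire.
        List<Match> separate = List.of(
                new Match("PersonFull", 0, 16, 50),
                new Match("EmailFinal", 17, 40, 100));
        System.out.println(fire(separate)); // [PersonFull, EmailFinal]
    }
}
```

In this model EmailFinal's higher priority does not help, because priority only breaks ties between matches that start at the same position.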
>>>
>>> Are you aware of an efficient way to prevent all of the temp
>>> variables (such as the boundary and range annotations we have
>>> discussed in previous posts) from appearing? These annotations
>>> ultimately have no real meaning except within the final rule
>>> that uses them.
>>>
>>> Thanks, Dave
>>>
>>> -----Original Message-----
>>> From: Diana Maynard [mailto:d.maynard@...]
>>> Sent: Tuesday, March 30, 2010 9:29 AM
>>> To: Harrill, David C
>>> Cc: gate-users@...
>>> Subject: Re: [gate-users] JAPE_URL_Rule
>>>
>>> Hmm that is odd because I can't see any other rules that generate
>>> "Email" annotations except the one you mention. My only explanation
>>> is that some other rule in final.jape is overriding the Email one
>>> in those exceptional cases, and so the original Email annotation is left
>>> intact. Can you send some examples of the case where this happens?
>>> (include the context and any annotations attached to the context)
>>> Diana
>>>
>>>
>>>
>>> Harrill, David C wrote:
>>>
>>>> Diana,
>>>>
>>>> I was mistaken: the rule in question is actually the email
>>>> rule. As you know, it is as follows:
>>>>
>>>> Phase: Email
>>>> Input: Token Lookup SpaceToken
>>>> Options: control = appelt
>>>>
>>>> Rule: Emailaddress1
>>>> Priority: 50
>>>> ( All of the email logic ):emailAddress
>>>> -->
>>>> :emailAddress.Email = {kind = "emailAddress", rule = "Emailaddress1"}
>>>>
>>>> The rule is subsequently called in the final.jape file and is as
>>>> follows:
>>>>
>>>> Rule: EmailFinal
>>>> Priority: 100
>>>> (
>>>>  {Email}
>>>> ):email
>>>> -->
>>>> {
>>>>  // removes the Email annotation, gets its rule feature and adds a new EmailAddress annotation
>>>>  gate.AnnotationSet email = (gate.AnnotationSet) bindings.get("email");
>>>>  gate.Annotation emailAnn = (gate.Annotation) email.iterator().next();
>>>>  gate.FeatureMap features = Factory.newFeatureMap();
>>>>  features.put("rule1", emailAnn.getFeatures().get("rule"));
>>>>  features.put("rule2", "EmailFinal");
>>>>  features.put("kind", "email");
>>>>  outputAS.add(email.firstNode(), email.lastNode(), "EmailAddress", features);
>>>>  outputAS.removeAll(email);
>>>> }
>>>>
>>>> What ends up happening is that most of the annotations are identified
>>>> as Rule1=Email1, Rule2=EmailFinal, while the one or two remaining ones
>>>> are flagged under the rule EmailAddress1, which basically follows the
>>>> same logic as the other rules with temp variables. I don't understand
>>>> why they were properly annotated but listed with the attributes of the
>>>> EmailAddress1 rule and not the EmailFinal rule. I hope I haven't
>>>> confused things.
>>>>
>>>> Thanks, Dave
>>>>
>>>> -----Original Message-----
>>>> From: Diana Maynard [mailto:d.maynard@...]
>>>> Sent: Tuesday, March 30, 2010 5:48 AM
>>>> To: Harrill, David C
>>>> Cc: gate-users@...
>>>> Subject: Re: [gate-users] JAPE_URL_Rule
>>>>
>>>> Hi David I cannot find any "tempurl" annotation in any of the
>>>> default JAPE files in ANNIE. Are you sure it's not something you
>>>> added? If so, then I would need to see the grammar rules you
>>>> added in order to understand why it's happening.
>>>>
>>>> Regards Diana
>>>>
>>>>
>>>>
>>>> Harrill, David C wrote:
>>>>
>>>>
>>>>> Thanks Diana,
>>>>>
>>>>> I actually found that there was an existing rule within the
>>>>> final.jape file that was interfering. I have another question
>>>>> for you and it is as follows:
>>>>>
>>>>> 1) You have a .jape file with a rule. In this case we can use
>>>>> our url.jape file.
>>>>> 2) You have a single rule (URL1) in which you specify a temp
>>>>> variable of tempurl.
>>>>> 3) You call that temp variable within your final.jape file under a
>>>>> rule called URLFinal.
>>>>> 4) Run the processing resources and examine what was annotated.
>>>>>
>>>>> Why is it that 98% of what was found is correctly annotated as a
>>>>> URL with Rule1=URL1, Rule2=URLFinal, whereas the other 2% is
>>>>> annotated as tempurl with Rule:URL1?
>>>>>
>>>>> The same rule is identifying all of these items; just a small
>>>>> percentage end up captured in that temp variable. Can you explain
>>>>> why that would be the case?
>>>>>
>>>>> As always thank you very much, Dave
>>>>>
>>>>> -----Original Message-----
>>>>> From: Diana Maynard [mailto:d.maynard@...]
>>>>> Sent: Monday, March 29, 2010 2:01 PM
>>>>> To: Harrill, David C
>>>>> Cc: gate-users@...
>>>>> Subject: Re: [gate-users] JAPE_URL_Rule
>>>>>
>>>>> Hi Dave The problem with this is that you can't successfully
>>>>> combine two negatives like this together with an OR statement.
>>>>>
>>>>> You're basically annotating EITHER "anything except a single
>>>>> quote" OR "anything except a double quote". So a double quote
>>>>> will match the first of these, and the single quote will match
>>>>> the second.
>>>>>
>>>>> I think you need an AND rather than an OR. Try
>>>>> ({Token.string != "'", Token.string != "\""}) instead. Or, first
>>>>> match all kinds of quote mark (I think there is actually a predefined
>>>>> Token feature for this, something like "Token.puncttype", but I
>>>>> could be wrong; if not, you could define one, or just use a previous
>>>>> phase to match them).
>>>>> Diana
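The logic can be checked outside JAPE with plain Java predicates on a token's string. This sketch only mirrors the two constraints discussed here (the class and method names are illustrative); it is not JAPE's matcher.

```java
// Plain-Java sketch of two constraints on a single token's string:
//   ({Token.string != "'"} | {Token.string != "\""})   versus
//   ({Token.string != "'", Token.string != "\""})
class QuoteConstraintSketch {
    // OR of two negations: by De Morgan this is !(s is ' AND s is "), and no single
    // string can be both, so every token passes.
    static boolean orOfNegations(String s) {
        return !s.equals("'") || !s.equals("\"");
    }

    // AND of two negations (comma-separated constraints in one {...}): rejects both quotes.
    static boolean andOfNegations(String s) {
        return !s.equals("'") && !s.equals("\"");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"'", "\"", "test"}) {
            System.out.println(s + " or=" + orOfNegations(s) + " and=" + andOfNegations(s));
        }
    }
}
```

Running it prints `or=true` for all three strings and `and=false` only for the two quote marks, which is why the OR version appeared to ignore the constraint.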
>>>>>
>>>>>
>>>>> On 29/03/2010 18:48, Harrill, David C wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Along the same lines, I created an additional rule called
>>>>>> URL2, which is as follows:
>>>>>>
>>>>>> Rule: Url2
>>>>>> Priority: 50
>>>>>> (
>>>>>>  ({Token})*
>>>>>>  ({DNExtensionPost})
>>>>>> ):urlAddress
>>>>>> -->
>>>>>> :urlAddress.Url = {kind = "urlAddress", rule = "Url2"}
>>>>>>
>>>>>> I had some cases where the text contained
>>>>>> 'test.COM' and "test@..."
>>>>>>
>>>>>> What I did was add this for the single quotes:
>>>>>>
>>>>>> Rule: Url2
>>>>>> Priority: 50
>>>>>> ({Token.string != "'"})
>>>>>> (
>>>>>>  ({Token})*
>>>>>>  ({DNExtensionPost})
>>>>>> ):urlAddress
>>>>>> -->
>>>>>> :urlAddress.Url = {kind = "urlAddress", rule = "Url2"}
>>>>>>
>>>>>> I ran the rule, and it worked perfectly and annotated
>>>>>> test.COM. I subsequently wanted to get rid of the double
>>>>>> quotes as well, so I did the following:
>>>>>>
>>>>>> Rule: Url2
>>>>>> Priority: 50
>>>>>> ({Token.string != "'"}|{Token.string != "\""})
>>>>>> (
>>>>>>  ({Token})*
>>>>>>  ({DNExtensionPost})
>>>>>> ):urlAddress
>>>>>> -->
>>>>>> :urlAddress.Url = {kind = "urlAddress", rule = "Url2"}
>>>>>>
>>>>>> After I added this, it was almost as if the line with my
>>>>>> Token.string != constraints was ignored, and it went back to
>>>>>> annotating the items as both "test.com and 'test.com.
>>>>>>
>>>>>> I checked to see if there was another rule interfering and I
>>>>>> could not locate one. Is this similar behavior to our
>>>>>> SpaceToken issue previously discussed?
>>>>>>
>>>>>> Thanks, Dave
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Diana Maynard [mailto:d.maynard@...]
>>>>>> Sent: Monday, March 29, 2010 1:25 PM
>>>>>> To: Harrill, David C
>>>>>> Cc: gate-users@...
>>>>>> Subject: Re: [gate-users] JAPE_URL_Rule
>>>>>>
>>>>>> Sounds like there's something else going on...do double check
>>>>>> you've not done anything stupid like load the wrong grammar,
>>>>>> or that it's a different rule that's actually firing. Is the
>>>>>> Space following the URL annotated too in that rule? Diana
>>>>>>
>>>>>>
>>>>>> On 29/03/2010 18:19, Harrill, David C wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> What's odd is that the comma is followed by a space. Not
>>>>>>> exactly sure why the Rule isn't capturing it with the
>>>>>>> SpaceToken in there.
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Diana Maynard [mailto:d.maynard@...]
>>>>>>> Sent: Monday, March 29, 2010 12:44 PM
>>>>>>> To: Harrill, David C
>>>>>>> Cc: gate-users@...
>>>>>>> Subject: Re: [gate-users] JAPE_URL_Rule
>>>>>>>
>>>>>>> What comes immediately after your pattern that's matched? I
>>>>>>> was assuming that the comma was followed by a space (but
>>>>>>> was going to suggest that if it wasn't, then you might need
>>>>>>> to remove the space) Diana
>>>>>>>
>>>>>>> On 29/03/2010 17:06, Harrill, David C wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> If I remove the ({SpaceToken}), it does work correctly.
>>>>>>>> Can you explain what the SpaceToken does in this
>>>>>>>> instance?
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Diana Maynard [mailto:d.maynard@...]
>>>>>>>> Sent: Monday, March 29, 2010 11:59 AM
>>>>>>>> To: Harrill, David C
>>>>>>>> Cc: gate-users@...
>>>>>>>> Subject: Re: [gate-users] JAPE_URL_Rule
>>>>>>>>
>>>>>>>> Ah yes of course...because the comma will be followed by
>>>>>>>> a space. How about the following?
>>>>>>>>
>>>>>>>> Rule: Url1 Priority: 50
>>>>>>>>
>>>>>>>> ( {UrlPre} ({Token})* ({Token.string !=","}) ):urlAddress
>>>>>>>> ({SpaceToken})
>>>>>>>>
>>>>>>>> -->
>>>>>>>>
>>>>>>>> That way you have in your pattern one or more Tokens of
>>>>>>>> which the last one cannot be a comma, following the
>>>>>>>> UrlPre. Does that satisfy your constraints? Diana
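A simplified Java model of this pattern shows what the trailing ({SpaceToken}) changes. It assumes a hypothetical token split of "http://www.AOL.COM, next" and that the longest match wins; it is not JAPE's matcher. When the comma sits between "COM" and the space, there is no non-comma token directly followed by a SpaceToken, so the whole rule fails; dropping the SpaceToken requirement lets the match stop at "COM".

```java
import java.util.List;

// Simplified, GATE-free model of
//   ( {UrlPre} ({Token})* ({Token.string != ","}) ):urlAddress ({SpaceToken})
// Tokens are reduced to a kind ("UrlPre", "Token" or "SpaceToken") and a string.
class NoTrailingCommaSketch {
    static final class Tok {
        final String kind;
        final String string;
        Tok(String kind, String string) {
            this.kind = kind;
            this.string = string;
        }
    }

    // Returns the text covered by urlAddress for the longest match starting at index 0,
    // or null when the pattern cannot match at all.
    static String match(List<Tok> toks, boolean requireSpaceAfter) {
        if (toks.isEmpty() || !toks.get(0).kind.equals("UrlPre")) return null;
        String best = null;
        // The label covers toks[0..last]; ({Token})* plus the final token means last >= 1,
        // and every token after UrlPre inside the label must be a Token.
        for (int last = 1; last < toks.size() && toks.get(last).kind.equals("Token"); last++) {
            if (toks.get(last).string.equals(",")) continue; // final token may not be a comma
            boolean spaceNext = last + 1 < toks.size()
                    && toks.get(last + 1).kind.equals("SpaceToken");
            if (requireSpaceAfter && !spaceNext) continue;   // right context must be a SpaceToken
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i <= last; i++) sb.append(toks.get(i).string);
            best = sb.toString(); // a later hit is always a longer match
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical token split of "http://www.AOL.COM, next"
        List<Tok> toks = List.of(
                new Tok("UrlPre", "http://"), new Tok("Token", "www"), new Tok("Token", "."),
                new Tok("Token", "AOL"), new Tok("Token", "."), new Tok("Token", "COM"),
                new Tok("Token", ","), new Tok("SpaceToken", " "), new Tok("Token", "next"));
        System.out.println(match(toks, true));  // null: "COM" is followed by ",", not a space
        System.out.println(match(toks, false)); // http://www.AOL.COM
    }
}
```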
>>>>>>>>
>>>>>>>>
>>>>>>>> On 29/03/2010 16:46, Harrill, David C wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hey Diana,
>>>>>>>>>
>>>>>>>>> I did think of that but I would still get the following
>>>>>>>>> URL annotated:
>>>>>>>>>
>>>>>>>>> http://www.AOL.COM,
>>>>>>>>>
>>>>>>>>> Does that make sense?
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Diana Maynard [mailto:d.maynard@...]
>>>>>>>>> Sent: Monday, March 29, 2010 11:42 AM
>>>>>>>>> To: Harrill, David C
>>>>>>>>> Cc: gate-users@...
>>>>>>>>> Subject: Re: [gate-users] JAPE_URL_Rule
>>>>>>>>>
>>>>>>>>> Hi David Presumably you still want to match the URL
>>>>>>>>> when followed by a comma, but you just don't want to
>>>>>>>>> include the comma. So why not do this:
>>>>>>>>>
>>>>>>>>> Rule: Url1 Priority: 50
>>>>>>>>>
>>>>>>>>> ( {UrlPre} ({Token})* ):urlAddress
>>>>>>>>>
>>>>>>>>> ({SpaceToken}|{Token.string == ","})
>>>>>>>>>
>>>>>>>>> -->
>>>>>>>>>
>>>>>>>>> :urlAddress.Url = {kind = "urlAddress", rule = "Url1"}
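Why the comma can still end up inside the URL with this version can be shown with a small Java model: ({Token})* may also swallow the comma, and then the following space satisfies the right context. Assuming appelt's preference for the longest overall match, the variant that includes the comma wins. The token split below is a hypothetical simplification, and the code is a model, not JAPE's matcher.

```java
import java.util.List;

// Simplified, GATE-free model of
//   ( {UrlPre} ({Token})* ):urlAddress ({SpaceToken}|{Token.string == ","})
// where, as under appelt control, the longest overall match (label plus right context) wins.
class CommaContextSketch {
    static final class Tok {
        final String kind;   // "UrlPre", "Token" or "SpaceToken"
        final String string;
        Tok(String kind, String string) {
            this.kind = kind;
            this.string = string;
        }
    }

    // Returns the text covered by urlAddress for the longest match starting at index 0, or null.
    static String match(List<Tok> toks) {
        if (toks.isEmpty() || !toks.get(0).kind.equals("UrlPre")) return null;
        String best = null;
        // The label covers toks[0..end); ({Token})* also allows end == 1 (UrlPre alone).
        for (int end = 1; end < toks.size(); end++) {
            if (end > 1 && !toks.get(end - 1).kind.equals("Token")) break; // label may only hold Tokens
            Tok next = toks.get(end);
            boolean rightContext = next.kind.equals("SpaceToken")
                    || (next.kind.equals("Token") && next.string.equals(","));
            if (!rightContext) continue;
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < end; i++) sb.append(toks.get(i).string);
            best = sb.toString(); // a larger end is always a longer overall match
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical token split of "http://www.AOL.COM, next"
        List<Tok> toks = List.of(
                new Tok("UrlPre", "http://"), new Tok("Token", "www"), new Tok("Token", "."),
                new Tok("Token", "AOL"), new Tok("Token", "."), new Tok("Token", "COM"),
                new Tok("Token", ","), new Tok("SpaceToken", " "), new Tok("Token", "next"));
        // Ending the label at "COM" (comma as right context) is one token shorter than
        // absorbing the comma and using the space as right context, so the latter wins.
        System.out.println(match(toks)); // http://www.AOL.COM,
    }
}
```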
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 29/03/2010 16:24, Harrill, David C wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I have a quick and easy question as it pertains to
>>>>>>>>>> the existing URL rule. As you know the rule is as
>>>>>>>>>> follows:
>>>>>>>>>>
>>>>>>>>>> Phase: Url
>>>>>>>>>>
>>>>>>>>>> Input: Lookup SpaceToken Token UrlPre
>>>>>>>>>>
>>>>>>>>>> Options: control = appelt
>>>>>>>>>>
>>>>>>>>>> Rule: Url1
>>>>>>>>>>
>>>>>>>>>> Priority: 50
>>>>>>>>>>
>>>>>>>>>> (
>>>>>>>>>>
>>>>>>>>>> {UrlPre}
>>>>>>>>>>
>>>>>>>>>> ({Token})*
>>>>>>>>>>
>>>>>>>>>> ):urlAddress
>>>>>>>>>>
>>>>>>>>>> ({SpaceToken})
>>>>>>>>>>
>>>>>>>>>> -->
>>>>>>>>>>
>>>>>>>>>> :urlAddress.Url = {kind = "urlAddress", rule =
>>>>>>>>>> "Url1"}
>>>>>>>>>>
>>>>>>>>>> I would like to make sure that this rule does not
>>>>>>>>>> capture such things as commas at the end of the Token
>>>>>>>>>> string. What is the easiest way to accomplish
>>>>>>>>>> this? I understand that I can do the following:
>>>>>>>>>>
>>>>>>>>>> ({Token.string != ","})
>>>>>>>>>>
>>>>>>>>>> I'm just not certain where to place it. I figured it
>>>>>>>>>> would go where SpaceToken is, using the or
>>>>>>>>>> syntax.
>>>>>>>>>>
>>>>>>>>>> Dave
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> GATE-users mailing list
>>>>>>>>>> GATE-users@...
>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/gate-users
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>