Revision: 15318
http://gate.svn.sourceforge.net/gate/?rev=15318&view=rev
Author: ian_roberts
Date: 2012-02-03 16:35:31 +0000 (Fri, 03 Feb 2012)
Log Message:
-----------
Substantial rewrite of the JAPE chapter, and removed the japeimpl appendix
altogether.
Modified Paths:
--------------
userguide/trunk/jape.tex
userguide/trunk/tao_main.tex
Removed Paths:
-------------
userguide/trunk/fsm-dfa-example.png
userguide/trunk/fsm-nfa-example.png
userguide/trunk/japeimpl.tex
Deleted: userguide/trunk/fsm-dfa-example.png
===================================================================
(Binary files differ)
Deleted: userguide/trunk/fsm-nfa-example.png
===================================================================
(Binary files differ)
Modified: userguide/trunk/jape.tex
===================================================================
--- userguide/trunk/jape.tex 2012-02-03 16:19:33 UTC (rev 15317)
+++ userguide/trunk/jape.tex 2012-02-03 16:35:31 UTC (rev 15318)
@@ -154,55 +154,13 @@
on the LHS of your JAPE grammar.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsect{Matching a Simple Text String}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-To match a simple text string, you need to refer to a feature on an
-annotation that contains the string; for example,
-
-\begin{small}
-\begin{verbatim}
-{Token.string == "of"}
-\end{verbatim}
-\end{small}
-
-The following grammar shows a sequence of strings being matched. Bracketing,
-along with the `or' operator, is used to define how the strings should come
-together:
-
-\begin{small}
-\begin{verbatim}
-Phase: UrlPre
-Input: Token SpaceToken
-Options: control = appelt
-
-Rule: Urlpre
-
-( (({Token.string == "http"} |
- {Token.string == "ftp"})
- {Token.string == ":"}
- {Token.string == "/"}
- {Token.string == "/"}
- ) |
- ({Token.string == "www"}
- {Token.string == "."}
- )
-):urlpre
--->
-:urlpre.UrlPre = {rule = "UrlPre"}
-\end{verbatim}
-\end{small}
-
-Alternatively you could use the `string' metaproperty. See
-Section~\ref{sec:jape:metaproperties} for an example of using metaproperties.
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsect{Matching Entire Annotation Types}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-You can specify the presence of an annotation previously assigned
-from a gazetteer, tokeniser, or other module. For example, the following will
-match a Lookup annotation:
+The simplest pattern in JAPE is to match any single annotation of a particular
+annotation type. You can match only annotation types you specified in the
+``Input'' line at the top of the file. For example, the following will match
+any Lookup annotation:
\begin{small}
\begin{verbatim}
@@ -210,45 +168,13 @@
\end{verbatim}
\end{small}
-The following will match if there is \emph{not} a Lookup
-annotation at this location:
-
-\begin{small}
-\begin{verbatim}
-{!Lookup}
-\end{verbatim}
-\end{small}
-
-The following rule shows several different annotation types being matched. We
-also see a string being matched, and again, the use of the `or' operator:
-
-\begin{small}
-\begin{verbatim}
-Rule: Known
-Priority: 100
-(
- {Location}|
- {Person}|
- {Date}|
- {Organization}|
- {Address}|
- {Money} |
- {Percent}|
- {Token.string == "Dear"}|
- {JobTitle}|
- {Lookup}
-):known
--->
-{}
-\end{verbatim}
-\end{small}
-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsect{Using Attributes and Values}
+\subsect{Using Features and Values}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-You can specify the attributes (and values) of an annotation to be matched.
-Several operators are supported; see Section~\ref{sec:jape:operators} for full details:
+You can specify the features (and values) of an annotation to be matched.
+Several operators are supported; see Section~\ref{sec:jape:operators} for full
+details:
\begin{itemize}
\item \verb|{Token.kind == "number"}|, \verb|{Token.length != 4}| - equality
and inequality.
@@ -257,8 +183,9 @@
\item \verb|{Token.string =~ "[Dd]ogs"}|,
\verb|{Token.string !~ "(?i)hello"}| - regular expression. \verb|==~| and
\verb|!=~| are also provided, for whole-string matching.
- \item \verb|{X contains Y}| and \verb|{X within Y}| for checking annotations
- within the context of other annotations.
+ \item \verb|{X contains Y}|, \verb|{X notContains Y}|, \verb|{X within Y}|
+ and \verb|{X notWithin Y}| for checking annotations within the context of
+ other annotations.
\end{itemize}
In the following rule, the `category' feature of the `Token' annotation is
@@ -299,36 +226,153 @@
\end{verbatim}
\end{small}
-\verb=@string= is also available in assignments on the right-hand side:
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\subsect[sec:jape:compositionaloperators]{Building complex patterns from simple patterns}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+So far we have seen how to build a simple pattern that matches a single
+annotation, optionally with a constraint on one of its features or
+meta-properties, but to do anything useful with JAPE you will need to combine
+these simple patterns into more complex ones.
+
+\subsubsect{Sequences, alternatives and grouping}
+
+Patterns can be matched in sequence, for example:
\begin{small}
\begin{verbatim}
-{X@... > 5}:label-->:label.New = {somefeat = :label.X@... }
+Rule: InLocation
+(
+ {Token.category == "IN"}
+ {Location}
+):inLoc
\end{verbatim}
-\end{small}
+\end{small}
+matches a Token annotation of category ``IN'' followed by a Location
+annotation. Note that ``followed by'' in JAPE depends on the annotation types
+specified in the Input line -- the above pattern matches a Token annotation and
+a Location annotation provided there are no intervening annotations of a type
+listed in the Input line. The Token and Location will {\em not} necessarily be
+immediately adjacent (they would probably be separated by an intervening space).
+In particular the pattern would {\em not} match if ``SpaceToken'' were
+specified in the Input line.
-As well as getting the string covered by a single annotation it is also
-possible to omit the annotation type and get the string spanned by all the
-annotations bound to a label, for example:
+The vertical bar ``\verb!|!'' is used to denote alternatives. For example
+\begin{small}
+\begin{verbatim}
+Rule: InOrAdjective
+(
+ {Token.category == "IN"} | {Token.category == "JJ"}
+):inLoc
+\end{verbatim}
+\end{small}
+would match {\em either} a Token whose category is ``IN'' {\em or} one whose
+category is ``JJ''.
+Parentheses are used to group patterns:
\begin{small}
\begin{verbatim}
+Rule: InLocation
(
- {X@... > 5}
- ({Y}+):ys
-):label
--->
-:label.New = { somefeat = :ys@... }
+ ({Token.category == "IN"} | {Token.category == "JJ"})
+ {Location}
+):inLoc
\end{verbatim}
\end{small}
+matches a Token with one or other of the two category values, followed by a
+Location, whereas:
+\begin{small}
+\begin{verbatim}
+Rule: InLocation
+(
+ {Token.category == "IN"} |
+ ( {Token.category == "JJ"}
+ {Location} )
+):inLoc
+\end{verbatim}
+\end{small}
+would match either an ``IN'' Token or a sequence of ``JJ'' Token and Location.
-If several Y annotations were included in the match, the New annotation's
-feature would be set to the string starting at the beginning of the leftmost Y
-that was matched and ending at the end of the rightmost one.
+\subsubsect{Repetition}
-Similarly, the `meta-properties' \verb=@length= and \verb=@cleanString= can also be accessed on the right-hand side.
+JAPE also provides repetition operators to allow a pattern in parentheses to be
+optional (?), or to match zero or more (*), one or more (+) or some specified
+number of times. In the following example, you can see the `$\mid $' and `?'
+operators being used:
+\begin{small}
+\begin{verbatim}
+Rule: LocOrganization
+Priority: 50
+(
+ ({Lookup.majorType == location} |
+ {Lookup.majorType == country_adj})
+{Lookup.majorType == organization}
+({Lookup.majorType == organization})?
+)
+:orgName -->
+ :orgName.TempOrganization = {kind = "orgName", rule=LocOrganization}
+\end{verbatim}
+\end{small}
+
+\subsubsect[sec:jape:ranges]{Range Notation}
+
+Repetition ranges are specified using square brackets.
+\begin{small}
+\begin{verbatim}({Token})[1,3]\end{verbatim}
+\end{small} matches one to three Tokens
+in a row. \begin{small}
+\begin{verbatim}({Token.kind == number})[3]\end{verbatim}
+\end{small} matches
+exactly 3 number Tokens in a row.
+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\subsect{Matching a Simple Text String}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+JAPE operates over annotations so it cannot match strings of text in the
+document directly. To match a string you need to match an annotation that
+covers that string, typically a ``Token''. The GATE Tokeniser adds a
+``string'' feature to all the Token annotations containing the string that the
+Token covers, so you can use this (or the \verb|@string| meta-property) to
+match text in your document.
+
+\begin{small}
+\begin{verbatim}
+{Token.string == "of"}
+\end{verbatim}
+\end{small}
+
+The following grammar shows a sequence of strings being matched.
+
+\begin{small}
+\begin{verbatim}
+Phase: UrlPre
+Input: Token SpaceToken
+Options: control = appelt
+
+Rule: Urlpre
+
+( (({Token.string == "http"} |
+ {Token.string == "ftp"})
+ {Token.string == ":"}
+ {Token.string == "/"}
+ {Token.string == "/"}
+ ) |
+ ({Token.string == "www"}
+ {Token.string == "."}
+ )
+):urlpre
+-->
+:urlpre.UrlPre = {rule = "UrlPre"}
+\end{verbatim}
+\end{small}
+
+Since we are matching annotations and not text, you must be careful that the
+strings you ask for are in fact single tokens. In the example above,
+\verb|{Token.string == "://"}| would never match (assuming the default ANNIE
+Tokeniser) as the three characters are treated as separate tokens.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsect[sec:jape:templates]{Using Templates}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -557,58 +601,51 @@
\end{small}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsect[sec:jape:context]{Using Context}
+\subsect[sec:jape:multiconstraint]{Multi-Constraint Statements}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-Context can be dealt with in the grammar rules in the following way.
-The pattern to be annotated is always enclosed by a set of round
-brackets. If preceding context is to be included in the rule, this is
-placed before this set of brackets. This context is described in
-exactly the same way as the pattern to be matched. If context
-following the pattern needs to be included, it is placed after the
-label given to the annotation. Context is used where a pattern should
-only be recognised if it occurs in a certain situation, but the
-context itself does not form part of the pattern to be annotated.
+In the examples we have seen so far, most statements have contained only one
+constraint. For example, in this statement, the `category' of `Token' must
+equal `NNP':
-For example, the following rule for Time (assuming an appropriate
-macro for `year') would mean that a year would only
-be recognised if it occurs preceded by the words `in' or `by':
-
\begin{small}
\begin{verbatim}
-Rule: YearContext1
-
-({Token.string == "in"}|
- {Token.string == "by"}
-)
-(YEAR)
-:date -->
- :date.Timex = {kind = "date", rule = "YearContext1"}
+Rule: Unknown
+Priority: 50
+(
+ {Token.category == NNP}
+)
+:unknown
+-->
+ :unknown.Unknown = {kind = "PN", rule = Unknown}
\end{verbatim}
\end{small}
-Similarly, the following rule (assuming an appropriate macro for
-`email') would mean that an email address would
-only be recognised if it occurred inside angled brackets (which would
-not themselves form part of the entity):
+However, it is equally acceptable to have multiple constraints in a statement.
+In this example, the `majorType' of `Lookup' must be `name' {\bf and} the
+`minorType' must be `surname':
\begin{small}
\begin{verbatim}
-Rule: Emailaddress1
-({Token.string == `<'})
+Rule: Surname
(
- (EMAIL)
-)
-:email
-({Token.string == `>'})
+ {Lookup.majorType == "name",
+ Lookup.minorType == "surname"}
+):surname
-->
- :email.Address= {kind = "email", rule = "Emailaddress1"}
+ :surname.Surname = {}
\end{verbatim}
\end{small}
-Also, it is possible to specify the constraint that one annotation must start
-at the same place as another. For example:
+Multiple constraints on the same annotation type must all be satisfied by the
+{\em same} annotation in order for the pattern to match.
+The constraints may also refer to different annotation types; for the pattern
+as a whole to match, these constraints must be satisfied by annotations that
+{\em start} at the same location in the document. In this example, in
+addition to the constraints on the `majorType' and `minorType' of `Lookup', we
+also have a constraint on the `string' of `Token':
+
\begin{small}
\begin{verbatim}
Rule: SurnameStartingWithDe
@@ -626,69 +663,66 @@
with majorType `name' and minorType `surname' start at the same offset in
the text. Both the Lookup and Token annotations would be included in the
\verb|:de| binding, so the Surname annotation generated would span the longer
-of the two. Constraints on the same annotation type must be satisfied by a
-single annotation, so in this example there must be a single Lookup matching
-both the major and minor types -- the rule would not match if there were two
-different lookups at the same location, one of them satisfying each constraint.
+of the two. As before, constraints on the same annotation type must be
+satisfied by a single annotation, so in this example there must be a single
+Lookup matching both the major and minor types -- the rule would not match if
+there were two different lookups at the same location, one of them satisfying
+each constraint.
-It is important to remember that context is consumed by the rule, so it cannot be
-reused in another rule within the same phase. So, for example, right context
-cannot be used as left context for another rule.
-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsect[sec:jape:multiconstraint]{Multi-Constraint Statements}
+\subsect[sec:jape:context]{Using Context}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-In the examples we have seen so far, most statements have contained only one
-constraint. For example, in this statement, the `category' of `Token' must
-equal `NNP':
+Context can be dealt with in the grammar rules in the following way.
+The pattern to be annotated is always enclosed by a set of round
+brackets. If preceding context is to be included in the rule, this is
+placed before this set of brackets. This context is described in
+exactly the same way as the pattern to be matched. If context
+following the pattern needs to be included, it is placed after the
+label given to the annotation. Context is used where a pattern should
+only be recognised if it occurs in a certain situation, but the
+context itself does not form part of the pattern to be annotated.
+For example, the following rule for Time (assuming an appropriate
+macro for `year') would mean that a year would only
+be recognised if it occurs preceded by the words `in' or `by':
+
\begin{small}
\begin{verbatim}
-Rule: Unknown
-Priority: 50
-(
- {Token.category == NNP}
-)
-:unknown
--->
- :unknown.Unknown = {kind = "PN", rule = Unknown}
+Rule: YearContext1
+
+({Token.string == "in"}|
+ {Token.string == "by"}
+)
+(YEAR)
+:date -->
+ :date.Timex = {kind = "date", rule = "YearContext1"}
\end{verbatim}
\end{small}
-However, it is equally acceptable to have multiple constraints in a statement.
-In this example, the `majorType' of `Lookup' must be `name' {\bf and} the
-`minorType' must be `surname':
+Similarly, the following rule (assuming an appropriate macro for
+`email') would mean that an email address would
+only be recognised if it occurred inside angled brackets (which would
+not themselves form part of the entity):
\begin{small}
\begin{verbatim}
-Rule: Surname
+Rule: Emailaddress1
+({Token.string == `<'})
(
- {Lookup.majorType == "name",
- Lookup.minorType == "surname"}
-):surname
+ (EMAIL)
+)
+:email
+({Token.string == `>'})
-->
- :surname.Surname = {}
+ :email.Address= {kind = "email", rule = "Emailaddress1"}
\end{verbatim}
\end{small}
-As we saw in Section~\ref{sec:jape:context}, the constraints may refer to
-different annotations. In this example, in addition to the constraints on the
-`majorType' and `minorType' of `Lookup', we also have a constraint on the
-`string' of `Token':
-\begin{small}
-\begin{verbatim}
-Rule: SurnameStartingWithDe
-(
- {Token.string == "de",
- Lookup.majorType == "name",
- Lookup.minorType == "surname"}
-):de
--->
- :de.Surname = {prefix = "de"}
-\end{verbatim}
-\end{small}
+It is important to remember that context is consumed by the rule, so it cannot
+be reused in another rule within the same phase. So, for example, right context
+for one rule cannot be used as left context for another rule.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsect[sec:jape:negation]{Negation}
@@ -699,9 +733,8 @@
constraints which specify the \emph{absence} of annotations. A negative
constraint is signalled in the grammar by a `!' character.
-Negative constraints are generally used in combination with positive ones to
-constrain the locations at which the positive constraint can match. For
-example:
+Negative constraints are used in combination with positive ones to constrain
+the locations at which the positive constraint can match. For example:
\begin{small}
\begin{verbatim}
@@ -719,12 +752,12 @@
negative constraint matches at any location where the corresponding positive
constraint would \emph{not} match. Negative constraints do not contribute any
annotations to the bindings - in the example above, the \verb|:name| binding
-would contain only the Token annotation. The exception to this is when a
+would contain only the Token annotation\footnote{The exception to this is when a
negative constraint is used alone, without any positive constraints in the
combination. In this case it binds \emph{all} the annotations at the match
-position that do not match the constraint. Thus, \verb|{!Lookup}| would bind
-all the annotations starting at this location except Lookups. In most cases,
-negative constraints should only be used in combination with positive ones.
+position that do not match the constraint. Thus, \{!Lookup\} would bind
+all the annotations starting at this location except Lookups. In general
+negative constraints should only be used in combination with positive ones.}.
Any constraint can be negated, for example:
@@ -746,6 +779,34 @@
if there is no Token annotation at all at this location.\footnote{In the
Montreal transducer, the two forms were equivalent}
+As with positive constraints, multiple negative constraints on the same
+annotation type must all match the same annotation in order for the overall
+pattern match to be blocked. For example:
+\begin{small}
+\begin{verbatim}
+{Name, !Lookup.majorType == "person", !Lookup.minorType == "female"}
+\end{verbatim}
+\end{small}
+would match a ``Name'' annotation, but only if it does not start at the same
+location as a Lookup with majorType ``person'' and minorType ``female''. A
+Lookup with majorType ``person'' and minorType ``male'' would {\em not} block
+the pattern from matching. However, negated constraints on different annotation
+types are independent:
+\begin{small}
+\begin{verbatim}
+{Person, !Organization, !Location}
+\end{verbatim}
+\end{small}
+would match a Person annotation, but only if there is no Organization
+annotation {\em and} no Location annotation starting at the same place.
+
+{\bf Note} Prior to GATE 7.0, negated constraints on the same annotation type
+were considered independent, i.e. in the Name example above {\em any} Lookup of
+majorType ``person'' would block the match, irrespective of its minorType. If
+you have existing grammars that depend on this behaviour you should add
+\verb|negationGrouping = false| to the Options line at the top of the JAPE
+phase in question.
+
Although JAPE provides an operator to look for the absence of a single annotation
type, there is no support for a general negative operator to prevent a rule from
firing if a particular \emph{sequence} of annotations is found. One solution to
@@ -812,73 +873,19 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\sect[sec:jape:operators]{LHS Operators in Detail}
+\label{sec:jape:matchingoperators}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
This section gives more detail on the behaviour of the matching operators used
on the left-hand side of JAPE rules.
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsect[sec:jape:compositionaloperators]{Compositional Operators}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-Compositional operators are used to combine matching constructions in the
-manner intended. Union and Kleene operators are available, as is range notation.
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsect{Union and Kleene Operators}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-The following union and Kleene operators are available:
-
-\begin{itemize}
-\item $\mid $ - or
-\item * - zero or more occurrences
-\item ? - zero or one occurrences
-\item + - one or more occurrences
-\end{itemize}
-
-In the following example, you can see the `$\mid $' and `?' operators being used:
-
-\begin{small}
-\begin{verbatim}
-Rule: LocOrganization
-Priority: 50
-
-(
- ({Lookup.majorType == location} |
- {Lookup.majorType == country_adj})
-{Lookup.majorType == organization}
-({Lookup.majorType == organization})?
-)
-:orgName -->
- :orgName.TempOrganization = {kind = "orgName", rule=LocOrganization}
-\end{verbatim}
-\end{small}
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsect[sec:jape:ranges]{Range Notation}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-A range notation can also be added. e.g.
-\begin{small}
-\begin{verbatim}({Token})[1,3]\end{verbatim}
-\end{small} matches one to three Tokens
-in a row. \begin{small}
-\begin{verbatim}({Token.kind == number})[3]\end{verbatim}
-\end{small} matches
-exactly 3 number Tokens in a row.
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsect[sec:jape:matchingoperators]{Matching Operators}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
Matching operators are used to specify how matching must take place between a
-specification and an annotation in the document. Equality (`==' and `!=') and
+JAPE pattern and an annotation in the document. Equality (`==' and `!=') and
comparison (`$<$', `$<=$', `$>=$' and `$>$') operators can be used, as can
regular expression matching and contextual operators (`contains' and `within').
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsect{Equality Operators}
+\subsect{Equality Operators}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The equality operators are `==' and `!='. The basic operator in JAPE is
@@ -910,7 +917,7 @@
The \verb|!=| operator matches exactly when \verb|==| doesn't.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsect{Comparison Operators}
+\subsect{Comparison Operators}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The comparison operators are `$<$', `$<=$', `$>=$' and `$>$'. Comparison
@@ -933,7 +940,7 @@
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsect[sec:jape:operators:regex]{Regular Expression Operators}
+\subsect[sec:jape:operators:regex]{Regular Expression Operators}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The regular expression operators are `=$\sim$', `==$\sim$', `!$\sim$' and
@@ -973,31 +980,36 @@
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsect[sec:jape:operators:contextual]{Contextual Operators}
+\subsect[sec:jape:operators:contextual]{Contextual Operators}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-The contextual Operators are `contains' and `within'. These operators match
-annotations within the context of other annotations.
+The contextual operators are `contains' and `within', and their complements
+`notContains' and `notWithin'. These operators match annotations within the
+context of other annotations.
\begin{itemize}
\item contains - Written as \verb|{X contains Y}|, returns true if an
-annotation of type X completely contains an annotation of type Y.
+annotation of type X completely contains an annotation of type Y. Conversely
+\verb|{X notContains Y}| matches if an annotation of type X does not contain
+one of type Y.
\item within - Written as \verb|{X within Y}|, returns true if an annotation
-of type X is completely covered by an annotation of type Y.
+of type X is completely covered by an annotation of type Y. Conversely
+\verb|{X notWithin Y}| matches if an annotation of type X is not covered by an
+annotation of type Y.
\end{itemize}
-For either operator, the right-hand value (Y in the above examples) can be a
-full constraint itself. For example \verb|{X contains {Y.foo==bar}}| is also
-accepted. The operators can be used in a multi-constraint statement (see
-Section~\ref{sec:jape:multiconstraint}) just like any of the traditional ones,
-so \verb|{X.f1 != "something", X contains {Y.foo==bar}}| is valid.
+For any of these operators, the right-hand value (Y in the above examples) can
+be a full constraint itself. For example \verb|{X contains {Y.foo==bar}}| is
+also accepted. The operators can be used in a multi-constraint statement (see
+Section~\ref{sec:jape:multiconstraint}) just like any of the traditional
+ones, so \verb|{X.f1 != "something", X contains {Y.foo==bar}}| is valid.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsect[sec:jape:customoperators]{Custom Operators}
+\subsect[sec:jape:customoperators]{Custom Operators}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
It is possible to add additional custom operators without modifying the JAPE
-language. There are new init-time parameters to Transducer so that additional
+language. There are init-time parameters to Transducer so that additional
annotation `meta-property' accessors and custom operators can be referenced at
runtime. To add a custom operator, write a class that implements
gate.jape.constraint.ConstraintPredicate, make the class available to GATE
@@ -1080,6 +1092,46 @@
you want to copy several feature values from the same left hand side
annotation, you should consider using Java code on the right hand side of your
rule (see Section \ref{sec:jape:javarhs}).
+
+In addition to copying feature values you can also copy meta-properties (see
+Section~\ref{sec:jape:metaproperties}):
+%
+\begin{small}
+\begin{verbatim}
+Rule: LocationType
+(
+ {Lookup.majorType == location}
+):loc
+-->
+ :loc.Location = {rule = "LocationType", text = :loc.Lookup@...}
+\end{verbatim}
+\end{small}
+
+The syntax ``\verb|feature = :label.AnnotationType@...|'' assigns to the
+specified feature the text covered by the annotation of this type in the
+binding with this label. The \verb|@cleanString| and \verb|@length| properties
+are similar. As before, if more than one annotation of the given type is
+bound to the same label then one of them will be chosen arbitrarily.
+
+The ``\verb|.AnnotationType|'' may be omitted, for example
+%
+\begin{small}
+\begin{verbatim}
+Rule: LocationType
+(
+ {Token.category == IN}
+ {Lookup.majorType == location}
+):loc
+-->
+ :loc.InLocation = {rule = "InLoc", text = :loc@...,
+ size = :loc@...}
+\end{verbatim}
+\end{small}
+
+In this case the string, cleanString or length is that covered by the whole
+label, i.e. the same span as would be covered by an annotation created with
+``\verb|:label.NewAnnotation = {}|''.
+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsect[sec:jape:optional]{Optional or Empty Labels}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Deleted: userguide/trunk/japeimpl.tex
===================================================================
--- userguide/trunk/japeimpl.tex 2012-02-03 16:19:33 UTC (rev 15317)
+++ userguide/trunk/japeimpl.tex 2012-02-03 16:35:31 UTC (rev 15318)
@@ -1,510 +0,0 @@
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%
-% japeimpl.tex
-%
-% hamish, 5/3/2
-%
-% $Id: japeimpl.tex,v 1.8 2006/08/19 14:17:54 ian Exp $
-%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\chapt[chap:japeimpl]{JAPE: Implementation}
-\markboth{JAPE: Implementation}{JAPE: Implementation}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-%%%% qqqqqqqqqqqqqqqqqqqqqqqqq %%%%
-\ifprintedbook
-\else
-\begin{quote}
-The annual Diagram prize for the oddest book title of the year has
-been awarded to Gerard Forlin's Butterworths Corporate
-Manslaughter Service, a hefty law tome providing guidance and analysis
-on corporate liability for deaths in the workplace.
-
-The book, not published until January, was up against five other
-shortlisted titles: Fancy Coffins to Make Yourself; The Flat-Footed
-Flies of Europe; Lightweight Sandwich Construction; Tea Bag Folding; and
-The Art and Craft of Pounding Flowers: No Paint, No
-Ink, Just a Hammer! The shortlist was thrown open to readers of the
-literary trade magazine The Bookseller, who chose the winner
-by voting on the magazine's website. Butterworths Corporate Manslaughter
-Service, a snip at £375, emerged as the overall victor
-with 35\% of the vote.
-
-The Diagram prize has been a regular on the award circuit since 1978,
-when Proceedings of the Second International Workshop on
-Nude Mice carried off the inaugural award. Since then, titles such
-as American Bottom Archaeology and last year's winner,
-High-Performance Stiffened Structures (an engineering publication), have
-received unwonted publicity through the prize. This year's
-winner is perhaps most notable for its lack of entendre.
-
-{\it Manslaughter Service kills off competition in battle of strange titles},
-Emma Yates, The Guardian, November 30, 2001.
-\end{quote}
-\fi
-%%%% qqqqqqqqqqqqqqqqqqqqqqqqq %%%%
-
-
-This appendix gives implementation details and formal definitions of the
-JAPE annotation patterns language.
-Section \ref{sec:japeimpl:grammar} gives a more formal definition of the JAPE
-grammar, and some examples of its use. Section \ref{sec:japeimpl:cpsl}
-describes JAPE's relation to CPSL. Section~\ref{sec:japeimpl:init} describes
-the initialisation of a JAPE grammar, Section~\ref{sec:japeimpl:exec} talks
-about the execution of JAPE grammars, and the final section explains how to
-switch the Java compiler used for JAPE.
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\sect[sec:japeimpl:grammar]{Formal Description of the JAPE Grammar}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-JAPE is similar to CPSL (a Common Pattern Specification Language, developed
-in the TIPSTER programme by Doug Appelt and others), with a few exceptions.
-Figure \ref{fig:japebnf} gives a BNF (Backus-Naur Form)
-description of the grammar.
-
-An example rule LHS:
-
-\begin{verbatim}
-Rule: KiloAmount
-( ({Token.kind == "containsDigitAndComma"}):number
- {Token.string == "kilograms"} ):whole
-\end{verbatim}
-
-A basic constraint specification appears between curly braces, and gives
-a conjunction of annotation/attribute/value specifiers which have to match
-at a particular point in the annotation graph. A complex constraint
-specification
-appears within round brackets, and may be bound to a label with the `:'
-operator; the label then becomes available in the RHS for access to the
-annotations matched by the complex constraint. Complex constraints can
-also have Kleene operators (*, +, ?) applied to them. A sequence of constraints
-represents a sequential conjunction; disjunction is represented by separating
-constraints with `\verb^|^'.
-
-Converted to the format accepted by the JavaCC LL parser generator,
-the most significant fragment of the CPSL grammar (as described by Appelt,
-based on an original specification from a TIPSTER working
-group chaired by Boyan Onyshkevych) goes like this:
-\begin{verbatim}
-constraintGroup -->
- (patternElement)+ ("|" (patternElement)+ )*
-
-patternElement -->
- "{" constraint ("," constraint)* "}"
-| "(" constraintGroup ")" (kleeneOp)? (binding)?
-\end{verbatim}
-Here the first line of {\tt patternElement} is a basic constraint, the
-second a complex one.
-
-%% JP (20101209): the following is the old BNF of JAPE which was included here:
-% MultiPhaseTransducer ::=
- % ( <multiphase> <ident> )?
- % ( ( SinglePhaseTransducer )+ | ( <phases> ( <ident> )+ ) )
- % <EOF>
-% SinglePhaseTransducer ::=
- % <phase> <ident> ( <input> ( <ident> )* )?
- % ( <option> ( <ident> <assign> <ident> )* )?
- % ( ( Rule ) | MacroDef )*
-% Rule ::=
- % <rule> <ident> ( <priority> <integer> )?
- % LeftHandSide "-->" RightHandSide
-% MacroDef ::=
- % <macro> <ident> ( PatternElement | Action )
-% LeftHandSide ::=
- % ConstraintGroup
-% ConstraintGroup ::=
- % ( PatternElement )+ ( <bar> ( PatternElement )+ )*
-% PatternElement ::=
- % ( <ident> | BasicPatternElement | ComplexPatternElement )
-% BasicPatternElement ::=
- % ( ( <leftBrace> Constraint ( <comma> Constraint )* <rightBrace> )
- % | ( <string> ) )
-% ComplexPatternElement ::=
- % <leftBracket> ConstraintGroup <rightBracket>
- % ( <kleeneOp> )? ( <colon> ( <ident> | <integer> ) )?
-% Constraint ::=
- % ( <pling> )? <ident> ( <period> <ident> <equals> AttrVal )?
-% AttrVal ::=
- % ( <string> | <ident> | <integer> | <floatingPoint> | <bool> )
-% RightHandSide ::=
- % Action ( <comma> Action )*
-% Action ::=
- % ( NamedJavaBlock | AnonymousJavaBlock | AssignmentExpression | <ident> )
-% NamedJavaBlock ::=
- % <colon> <ident> <leftBrace> ConsumeBlock
-% AnonymousJavaBlock ::=
- % <leftBrace> ConsumeBlock
-% AssignmentExpression ::=
- % ( <colon> | <colonplus> ) <ident> <period> <ident>
- % <assign> <leftBrace> (
- % <ident> <assign>
- % ( AttrVal | ( <colon> <ident> <period> <ident> <period> <ident> ) )
- % ( <comma> )?
- % )* <rightBrace>
-% ConsumeBlock ::=
- % Java code
-\begin{figure}
-{\scriptsize
-\begin{verbatim}
-MultiPhaseTransducer ::=
- ( <multiphase> <ident> )?
- ( ( ( JavaImportBlock )
- ( ( ControllerStartedBlock )
- | ( ControllerFinishedBlock )
- | ( ControllerAbortedBlock )
- )*
- ( SinglePhaseTransducer )+ ) |
- ( <phases> ( <path> )+ ) )
- <EOF>
-SinglePhaseTransducer ::=
- <phase> <ident>
- ( ( <input> ( <ident> )* ) |
- ( <option> ( <ident> <assign> ( <ident> | <bool> ) )* ) )*
- ( ( Rule ) | MacroDef | TemplateDef )*
-JavaImportBlock ::= ( <javaimport> <leftBrace> ConsumeBlock )?
-ControllerStartedBlock ::= ( <controllerstarted> <leftBrace> ConsumeBlock )
-ControllerFinishedBlock ::= ( <controllerfinished> <leftBrace> ConsumeBlock )
-ControllerAbortedBlock ::= ( <controlleraborted> <leftBrace> ConsumeBlock )
-Rule ::=
- <rule> <ident>
- ( <priority> <integer> )?
- LeftHandSide "-->" RightHandSide
-MacroDef ::= <macro> <ident> ( PatternElement | Action )
-TemplateDef ::= <template> <ident> <assign> AttrVal
-LeftHandSide ::= ConstraintGroup
-ConstraintGroup ::= ( PatternElement )+ ( <bar> ( PatternElement )+ )*
-PatternElement ::= ( <ident> | BasicPatternElement | ComplexPatternElement )
-BasicPatternElement ::=
- ( ( <leftBrace> Constraint ( <comma> Constraint )* <rightBrace> ) |
- ( <string> ) )
-ComplexPatternElement ::=
- <leftBracket> ConstraintGroup <rightBracket>
- ( KleeneOperator )?
- ( <colon> ( <ident> | <integer> ) )?
-KleeneOperator ::=
- ( <kleeneOp> ) |
- ( <leftSquare> ( <integer> ( <comma> <integer> )? ) <rightSquare> )
-Constraint ::=
- ( <pling> )? <ident>
- ( ( FeatureAccessor <attrOp> AttrVal )
- | ( <metaPropOp> <ident> <attrOp> AttrVal )
- | ( <ident> ( ( <leftBrace> Constraint <rightBrace> ) | ( Constraint ) ) )
- )?
-FeatureAccessor ::= ( <period> <ident> )
-AttrVal ::= ( ( <string> | <ident> | <integer> | <floatingPoint> | <bool> ) )
- | ( TemplateCall )
-TemplateCall ::= <leftSquare> <ident>
- ( <ident> <assign> AttrVal ( <comma> )? )*
- <rightSquare>
-RightHandSide ::= Action ( <comma> Action )*
-Action ::= ( NamedJavaBlock | AnonymousJavaBlock | AssignmentExpression | <ident> )
-NamedJavaBlock ::= <colon> <ident> <leftBrace> ConsumeBlock
-AnonymousJavaBlock ::= <leftBrace> ConsumeBlock
-AssignmentExpression ::=
- ( <colon> | <colonplus> ) <ident> <period> <ident> <assign>
- <leftBrace>
- ( <ident> <assign>
- ( AttrVal |
- ( <colon>
- <ident>
- ( ( <period> <ident> ( <period> | <metaPropOp> ) <ident> ) |
- ( <metaPropOp> <ident> )
- )
- )
- )
- ( <comma> )?
- )*
- <rightBrace>
-appendSpecials ::= java code
-ConsumeBlock ::= java code
-\end{verbatim}
-}
-\nnormalsize
-\caption{\nsmall BNF of JAPE's grammar}
-\label{fig:japebnf}
-\end{figure} %%\nnormalsize
-
-\newpage
-An example of a complete rule:
-\begin{verbatim}
-Rule: NumbersAndUnit
-( ( {Token.kind == "number"} )+:numbers {Token.kind == "unit"} )
--->
-:numbers.Name = { rule = "NumbersAndUnit" }
-\end{verbatim}
-This says `match sequences of numbers followed by a unit; create a Name
-annotation across the span of the numbers,
-and attribute rule with value NumbersAndUnit'.
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\sect[sec:japeimpl:cpsl]{Relation to CPSL}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-We {\em differ from the CPSL spec} in various ways:
-
-\begin{enumerate}
-\item
-No pre- or post-fix context is allowed on the LHS.
-\item
-No function calls on the LHS.
-\item
-No string shorthand on the LHS.
-\item
-We have multiple rule application algorithms (see Section \ref{sec:jape:priority}).
-\item
-Expressions relating to labels unbound on the LHS are not evaluated on
-the RHS. (In TextPro they evaluate to `false'.)
-\item
-JAPE allows arbitrary Java code on the RHS.
-\item
-JAPE has a different macro syntax, and allows macros for both the RHS and
-LHS.
-\item
-JAPE grammars are compiled and can be stored as serialised Java objects.
-\end{enumerate}
-
-Apart from this, it is a full implementation of CPSL, and the formal power
-of the languages is the same (except that a JAPE RHS can delete annotations,
-which straight CPSL cannot). The rule LHS is a regular language over
-annotations; the rule RHS can perform arbitrary transformations on
-annotations, but the RHS is only fired {\it after} the LHS has been evaluated,
-and the effects of a rule application can only be referenced after the phase in
-which it occurs, so the recognition power is no more than regular.
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\sect[sec:japeimpl:init]{Initialisation of a JAPE Grammar}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-When a JAPE grammar is loaded in GATE, each phase is converted into a finite
-state machine, a process that has several stages. Each rule is treated as a
-regular expression using annotation-based patterns as input symbols. A JAPE
-phase is a disjunction of rules, so it is also a regular expression. The first
-stage of building the associated FSM for a JAPE phase is the construction of a
-non-deterministic finite-state automaton, following the algorithm described in
-\cite{Aho86}.
-
-In addition to standard regular expressions, JAPE rules also contain bindings
-(labels associated with pattern segments). These are intended to be associated with
-the matched input symbols (i.e. annotations) during the matching process, and are
-used while executing the actions caused by the rule firing. Upon creating the
-equivalent FSM for a given JAPE rule, bindings are associated with the FSM
-transitions. This changes the semantics of a transition -- besides moving the
-state machine into a new current state, a transition may also {\em bind} the
-consumed annotation(s) with one or more labels.
-
-In order to optimise the execution time during matching (at the expense of
-storage space), NFAs are usually converted to {\em Deterministic Finite State
-Automata} (DFAs) using e.g. the {\em subset algorithm}~\cite{Aho86}. In the case
-of JAPE this transformation is not possible due to the binding labels: two or
-more transitions from the NFA that match the same annotation pattern cannot be
-compacted into a single transition in the DFA if they have different bindings.
-Because of this, JAPE grammars are represented as non-deterministic finite state
-machines. A partial optimisation that eliminates the $\epsilon$-transitions from
-the NFA is however performed.
-
-The actions represented on the right hand side of JAPE rules are converted to
-compiled Java classes and are associated with final states in the FSM. The final
-in-memory representation of a JAPE grammar thus consists of a non-deterministic
-finite state machine, with transitions that use annotation-based patterns as
-input symbols, additionally marked with bindings information and for which the
-final states are associated with actions.
-
-Starting from the following two JAPE rules:
-\begin{verbatim}
-Rule: PersonPrefix
-(
- ({Token})+ {Person}
-):pers
---> {...}
-
-Rule: OrganisationPrefix (
- ({Token})+ {Organisation}
-):org --> {...}
-\end{verbatim}
-the associated NFA is constructed, as illustrated in
-Figure~\ref{fig:fsm-nfa-example}. Note that due to the fact that the final
-states are associated with different actions, they cannot be joined into a
-single one and are kept separate. This automaton is then optimised by
-eliminating the $\epsilon$-transitions, resulting in the NFA presented in
-Figure~\ref{fig:fsm-dfa-example}. For the sake of simplicity, the annotation
-patterns used are the most basic ones, depending solely on annotation type. In
-the graphical representation, the transitions are marked with the type of
-annotation that they match and the associated binding in square brackets.
-
-\begin{figure}[htb]
-\begin{center}
-\includegraphics[scale=1]{fsm-nfa-example.png}
-\caption{Example of a non-deterministic finite state machine compiled from JAPE
-rules.}
-\label{fig:fsm-nfa-example}
-\end{center}
-\end{figure}
-
-\begin{figure}[htb]
-\begin{center}
-\includegraphics[scale=1]{fsm-dfa-example.png}
-\caption{Example of a finite state machine compiled from JAPE rules, with
-$\epsilon$-transitions removed.}
-\label{fig:fsm-dfa-example}
-\end{center}
-\end{figure}
-
-It can be observed in Figure~\ref{fig:fsm-dfa-example} that there are two
-transitions starting from state $1$ (leading to states $2$ and $4$ respectively)
-that both consume annotations of type {\tt Token}, thus even the optimised
-finite state machine is still non-deterministic.
-
-Once a JAPE grammar is converted to the equivalent finite state automaton, the
-initialisation phase is complete.
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\sect[sec:japeimpl:exec]{Execution of JAPE Grammars}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-The execution of a JAPE grammar can be described in simple terms as finding a
-path through an annotation graph where all the annotations traversed form a
-sequence that is accepted by the finite state machine built during the
-initialisation phase. The actual process is somewhat more complex than that, as
-it also needs to take into account the various matching modes, the filtering of
-input annotation types, to deal with the assignment of matched annotations to
-bindings, and to manage the execution of actions whenever successful matches
-occur.
-
-Executing a JAPE grammar involves simulating the execution of a
-non-deterministic finite state automaton (NFA) while using an annotation graph
-as input. At each step we start from a document position (initially zero) and a
-finite state machine in a given state (initially the start state). Annotations
-found at the given document position are compared with the restrictions encoded
-in the NFA transitions; if they match, the annotations are {\em consumed}
-and the state machine moves to a new state. Ambiguities are possible at each
-step both in terms of input (several matching annotations can start at the
-same offset) and in terms of available NFA transitions (the state machine is
-non-deterministic, so multiple transitions with the same restrictions can be
-present). When such ambiguities are encountered, the current state machine is
-cloned to create as many copies as necessary, and each such copy continues the
-matching process independently. The JAPE executor thus needs to keep track of
-a family of state machines that are running in parallel -- henceforth we shall
-call these FSM instances.
-
-Whenever one of the active FSM instances is moved to a new state, a test is
-performed to check if the new state is a final one. If that is the case, the FSM
-instance is said to be in an {\em accepting} state, and a copy of its state is
-saved for later usage.
-
-When none of the active FSM instances can advance any further, the
-stored accepting FSM instances are used to execute JAPE actions, according to
-the declared matching style of the current grammar.
-
-A high-level view\footnote{The view of the algorithm presented here is greatly
-simplified, for the sake of clarity. The actual implementation consists of a few
-thousand lines of Java code.} of the algorithm used during the execution of a
-JAPE grammar is presented in Listing~\ref{lst:jape-exec}, in a Java-inspired
-pseudo-code.
-
-\begin{lstlisting}[float=htb,
-caption={JAPE matching algorithm},
-label={lst:jape-exec}]
-processInputfilters();
-currentDocPosition = 0;
-activeFSMInstances = new List<FSMInstance>();
-acceptingFSMInstances = new List<FSMInstance>();
-while(currentDocPosition < document.length()){
- //create an initial FSM instance, starting from
- //the current document position
- activeFSMInstances.add(
- new FSMInstance(currentDocPosition));
- //advance all FSM instances,
- //until no further advance is possible
- while(!activeFSMInstances.isEmpty()){
- FSMInstance aFSM = activeFSMInstances.remove(0);
- //advance aFSM, consuming annotations, linking used
- //annotations to binding labels, as required;
-
- //create cloned copies as necessary and add them to
- //activeFSMInstances;
-
- //save any accepting state of aFSM
- //into acceptingFSMInstances;
- }
- if(!acceptingFSMInstances.isEmpty()){
- //execute the action(s)
- }
-
- //move to the new document position, in accordance
- //with the matching style.
-}
-\end{lstlisting}
-
-The next paragraphs contain some more detailed comments, indexed using the line
-numbers in the listing:
-\begin{description}
- \item[line 1] The annotations present in the document are filtered according
- to the {\tt Input} declaration in the JAPE code, if one was present. This
- causes the JAPE executor to completely ignore annotations that are not
- listed as valid input.
- \item[lines 2--4] The matching process is initialised by setting the document
- position to $0$, and creating empty lists of active and accepting FSM
- instances.
- \item[lines 5--29] The matching continues until all the document text is
- exhausted.
- \item[line 8] Each step starts from the current document position with a
- single FSM instance.
- \item[lines 12--22] While there are still active FSM instances, they are
- advanced as far as possible. Whenever ambiguities are encountered, cloned
- copies are created and added to the list of active FSM instances. Whenever
- an FSM instance reaches a final state during its advancing, a copy of its
- state is saved to the list of accepting FSM instances.
- \item[lines 23--25] This segment of code is reached when there are no more
- active FSM instances -- all active instances were advanced as far as
- possible and either saved to the accepting list (if they reached a final
- state during that process) or simply discarded (if they could advance no
- further but they still have not reached a final state).\\At this point,
- any successful matches that occurred need to be acted upon, so the list of
- accepting FSM instances is inspected. If there are any, their associated
- actions are now executed, according to the desired matching style. For
- instance if the matching style used is {\tt Appelt}, then only the accepting
- FSM instance that has covered the most input will be executed; conversely,
- if the matching style is {\tt Brill}, then all accepting FSM instances will
- have their actions executed, etc.
- \item[line 27] When this point is reached, all possible matches from the
- current document position were found and the required action executed. The
- next step is to move to the next starting position in the document, and
- re-start the matching process from there. Depending on the matching style
- selected, the new document position is either $oldPosition+1$, in the
- case of {\tt All}, or $matchingEndPosition+1$ in all other cases.
-\end{description}
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\sect[sec:japeimpl:javacompiler]{Using a Different Java Compiler}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-GATE allows you to choose which Java compiler is used to compile the action
-classes generated from JAPE rules. The preferred compiler is specified by the
-{\tt Compiler\_type} option in {\tt gate.xml}. At present the supported values
-are:
-\begin{description}
-\item[Sun]
-The Java compiler supplied with the JDK. Although the option is called {\tt
-Sun}, it supports any JDK that supplies {\tt com.sun.tools.javac.Main} in a
-standard location, including the IBM JDK (all platforms) and the Apple JDK for
-Mac OS X.
-\item[Eclipse]
-The Eclipse compiler, from the Java Development Tools of the Eclipse
-project\footnote{\htlinkplain{http://www.eclipse.org/jdt}}. Currently we use
-the compiler from Eclipse 3.2, which supports Java 5.0.
-\end{description}
-
-By default, the Eclipse compiler is used. It compiles faster than the Sun
-compiler, and loads dependencies via the GATE ClassLoader, which means that
-Java code on the right hand side of JAPE rules can refer to classes that were
-loaded from a plugin JAR file. The Sun compiler can only load classes from the
-system classpath, so it will not work if GATE is loaded from a subsidiary
-classloader, e.g. a Tomcat web application. You should generally use the
-Eclipse compiler unless you have a compelling reason not to.
-
-Support for other compilers can be added, but this is not documented here - if
-you're in a position to do this, you won't mind reading the source code...
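
Editorial sketch (not part of the commit): the matching loop described in the removed "Execution of JAPE Grammars" section can be illustrated with a small, self-contained Python simulation. This is only a rough sketch under stated assumptions, not the GATE implementation. The state numbers, the TRANSITIONS table, and the names Instance, match_from and run_appelt are all hypothetical. The input is simplified to one annotation type per position, rule priorities are ignored, and advancing by one position when nothing matches is an assumption. The table encodes the two example rules (PersonPrefix and OrganisationPrefix) as an epsilon-free NFA whose transitions carry binding labels.

```python
from collections import namedtuple

# Hypothetical epsilon-free NFA for the two example rules.
# (state, annotation type) -> list of (next state, binding label)
TRANSITIONS = {
    (0, "Token"): [(1, "pers"), (3, "org")],   # non-deterministic choice
    (1, "Token"): [(1, "pers")],
    (1, "Person"): [(2, "pers")],
    (3, "Token"): [(3, "org")],
    (3, "Organisation"): [(4, "org")],
}
# Final states and the rule whose action they would trigger.
FINAL = {2: "PersonPrefix", 4: "OrganisationPrefix"}

# One FSM instance: current state, next input position, bindings so far.
Instance = namedtuple("Instance", "state pos bindings")


def match_from(annotations, start):
    """Advance all FSM instances from `start`; return the accepting ones."""
    active = [Instance(0, start, ())]
    accepting = []
    while active:
        inst = active.pop(0)
        if inst.pos >= len(annotations):
            continue  # input exhausted for this instance
        ann = annotations[inst.pos]
        # Clone the instance once per matching transition.
        for nxt, label in TRANSITIONS.get((inst.state, ann), []):
            clone = Instance(nxt, inst.pos + 1,
                             inst.bindings + ((label, inst.pos),))
            if nxt in FINAL:
                accepting.append(clone)  # save the accepting state
            active.append(clone)
    return accepting


def run_appelt(annotations):
    """Appelt-style run: at each start position fire only the longest match."""
    fired = []
    pos = 0
    while pos < len(annotations):
        accepting = match_from(annotations, pos)
        if accepting:
            best = max(accepting, key=lambda i: i.pos)
            fired.append((FINAL[best.state], pos, best.pos))
            pos = best.pos  # resume after the end of the match
        else:
            pos += 1        # assumption: skip one position if nothing matched
    return fired
```

For example, run_appelt(["Token", "Token", "Person", "Token", "Organisation"]) fires PersonPrefix over positions 0 to 3 (end exclusive) and then OrganisationPrefix over positions 3 to 5. Because the first Token is consumed by two transitions with different bindings, two instances run in parallel until one of them fails, which is the behaviour that, according to the removed text, prevents conversion to a DFA.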
Modified: userguide/trunk/tao_main.tex
===================================================================
--- userguide/trunk/tao_main.tex 2012-02-03 16:19:33 UTC (rev 15317)
+++ userguide/trunk/tao_main.tex 2012-02-03 16:35:31 UTC (rev 15318)
@@ -702,7 +702,6 @@
\input{design} %final for book
\fi
-\input{japeimpl} %final for book
\input{ant-tasks}
\ifprintedbook