From: <mar...@us...> - 2011-10-28 17:23:11
Revision: 14458
          http://gate.svn.sourceforge.net/gate/?rev=14458&view=rev
Author:   markagreenwood
Date:     2011-10-28 17:23:04 +0000 (Fri, 28 Oct 2011)

Log Message:
-----------
details of the changes needed to get GENIA to handle double quotes properly, as well as trying to make sure all my new stuff has made it into the list of recent changes

Modified Paths:
--------------
    userguide/trunk/misc-creole.tex
    userguide/trunk/recent-changes.tex

Modified: userguide/trunk/misc-creole.tex
===================================================================
--- userguide/trunk/misc-creole.tex	2011-10-28 17:14:13 UTC (rev 14457)
+++ userguide/trunk/misc-creole.tex	2011-10-28 17:23:04 UTC (rev 14458)
@@ -265,6 +265,15 @@
 cause the PR to run \verb|cmd.exe /c runTagger.bat| which is the way to run
 batch files from Java.
 
+In general, most of the complexities of configuring a number of external taggers have
+already been worked out, and example pipelines are provided in the plugin's resources
+directory. To use one of the supported taggers, simply load one of the example
+applications and then check the runtime parameters of the Tagger\_Framework PR
+so that the paths point to your own copy of the tagger you wish to use.
+
+Some taggers require more complex configuration, details of which are covered in
+the remainder of this section.
+
 \subsect{TreeTagger - Multilingual POS Tagger}
 
 The TreeTagger is a language-independent part-of-speech tagger, which
@@ -329,8 +338,30 @@
 site. Figure~\ref{fig:treetagger} shows a screenshot of a French document
 processed with the TreeTagger.
 
+\subsect[sec:genia-quotes]{GENIA and Double Quotes}
+Documents that contain double quote characters can cause problems for
+the GENIA tagger. The issue arises because the in-built GENIA tokenizer
+converts double quotes to single quotes in its output; these no longer
+match the document content, which causes the tagger to fail. There are two
+possible solutions to this problem.
+
+Firstly, you can perform tokenization in GATE and disable the in-built
+GENIA tokenizer. An example pipeline that does this,
+geniatagger-en-no\_tokenization.gapp, is provided in the GENIA resources
+directory. However, this may cause other problems for your subsequent
+code. If so, you may want to try the second solution.
+
+The second solution is to use the GENIA tokenization via the other provided
+example pipeline: geniatagger-en-tokenization.gapp. If your documents do not
+contain double quotes, this example should work as is. Otherwise,
+you must modify the GENIA tagger so that it does \textit{not} convert double
+quotes to single quotes. Fortunately, this is fairly straightforward. In the
+resources directory you will find a modified copy of tokenize.cpp from
+v3.0.1 of the GENIA tagger. Simply use this file to replace the copy in the
+normal GENIA distribution and recompile. For Windows users, a pre-compiled
+binary is also provided: simply replace your existing binary with this
+modified copy.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \sect[sec:parsers:chemistrytagger]{Chemistry Tagger}

Modified: userguide/trunk/recent-changes.tex
===================================================================
--- userguide/trunk/recent-changes.tex	2011-10-28 17:14:13 UTC (rev 14457)
+++ userguide/trunk/recent-changes.tex	2011-10-28 17:23:04 UTC (rev 14458)
@@ -21,6 +21,10 @@
 
 \rcSect[next-release]{Next Release}
 
+\rcSubsect{October 2011}
+
+Added details on running GENIA over documents containing double quotes; see Section~\ref{sec:genia-quotes}.
+
 \rcSubsect{August 2011}
 
 Added support for using MutationFinder within GATE to find mentions of point mutations.
@@ -34,6 +38,9 @@
 
 \rcSubsect{June 2011}
 
+Added support for the GENIA sentence splitter, allowing for a full GENIA application. See
+Section~\ref{sec:domain-creole:biomed:genia} for details.
+
 The rule and phase names are now accessible in a JAPE Java RHS by the
 \verb=ruleName()= and \verb=phaseName()= methods and the name of the JAPE
 processing resource executing the JAPE transducer is accessible
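Why the quote conversion described in the GENIA section makes the tagger fail can be illustrated with a small Java sketch. It assumes, for illustration only, that the wrapper aligns the tagger's tokens with the document by searching for each token string in turn, starting where the previous match ended; this is not taken from the Tagger Framework's source, and the QuoteAlignment class and align method are hypothetical names. Once a double-quote token comes back as '' (two single quotes), the search cannot find it in the original text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not the Tagger Framework's code): align tagger
// tokens with the original document text by searching for each token in turn.
class QuoteAlignment {

    // Returns the start offset of each token in text. Each search begins where
    // the previous match ended; throws if a token string does not occur.
    static List<Integer> align(String text, List<String> tokens) {
        List<Integer> starts = new ArrayList<>();
        int pos = 0;
        for (String tok : tokens) {
            int idx = text.indexOf(tok, pos);
            if (idx < 0) {
                throw new IllegalStateException("token not found in document: " + tok);
            }
            starts.add(idx);
            pos = idx + tok.length();
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "He said \"hello\" .";

        // Tokens as returned by a tokenizer that leaves double quotes untouched.
        List<String> preserved = List.of("He", "said", "\"", "hello", "\"", ".");
        System.out.println("preserved quotes: " + align(text, preserved));

        // The same tokens, but with each " rewritten as '' (two single quotes).
        List<String> rewritten = new ArrayList<>();
        for (String tok : preserved) {
            rewritten.add(tok.equals("\"") ? "''" : tok);
        }
        try {
            align(text, rewritten);
        } catch (IllegalStateException e) {
            System.out.println("rewritten quotes: " + e.getMessage());
        }
    }
}
```

Running main prints the offsets [0, 3, 8, 9, 14, 16] for the unmodified tokens and then reports that the token '' cannot be found. This is why both suggested fixes work: either GATE does the tokenization, or the modified tokenize.cpp leaves double quotes alone so that the tokens still match the text.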