From: <tho...@us...> - 2011-06-28 13:18:44
Revision: 14095
          http://gate.svn.sourceforge.net/gate/?rev=14095&view=rev
Author:   thomas_heitz
Date:     2011-06-28 13:18:38 +0000 (Tue, 28 Jun 2011)

Log Message:
-----------
Add a section on UIMA CAS document format.

Modified Paths:
--------------
    userguide/trunk/corpora.tex

Modified: userguide/trunk/corpora.tex
===================================================================
--- userguide/trunk/corpora.tex	2011-06-28 11:00:26 UTC (rev 14094)
+++ userguide/trunk/corpora.tex	2011-06-28 13:18:38 UTC (rev 14095)
@@ -474,6 +474,8 @@
 Microsoft Office (some formats)
 \item
 OpenOffice (some formats)
+\item
+UIMA CAS
 \end{itemize}
 
 By default GATE will try and identify the type of the document, then strip
@@ -1302,6 +1304,35 @@
 resources and JAPE grammars designed for use with HTML files should also work
 well with PDF and Office documents.
 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\subsect[sec:corpora:uima]{UIMA CAS Documents}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+GATE can read UIMA CAS documents. The CAS stands for Common Analysis
+Structure. It provides a common representation to the artifact being
+analyzed, here a text.
+
+The subject of analysis (SOFA), here a string, is used as the document
+content. Multiple sofa are concatenated. The analysis results or metadata
+are added as annotations when having begin and end offsets and otherwise are
+added as document features. The views are added as GATE annotation sets.
+The type system (a hierarchical annotation schema) is not currently
+supported.
+
+The web server content type associate with UIMA documents
+is: {\em text/xmi+xml.}
+
+The extensions are: xcas, xmicas, xmi.
+
+The magic numbers are:
+\begin{verbatim}
+<CAS version="2">
+\end{verbatim}
+and
+\begin{verbatim}
+xmlns:cas=
+\end{verbatim}
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \sect[sec:corpora:xmlinout]{XML Input/Output}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
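
For readers who want to see the rules from the new section in concrete form, here is a minimal sketch in Python. It is not GATE's implementation, which this commit does not show. The function names, the plain-dict stand-in for a CAS analysis result, and the decision to accept a file when any single hint matches are all assumptions made for illustration. Only the extensions, the content type, and the two magic strings are taken from the text above.

```python
from pathlib import PurePath

# Values quoted from the committed section.
UIMA_EXTENSIONS = {"xcas", "xmicas", "xmi"}
UIMA_MAGIC = ('<CAS version="2">', "xmlns:cas=")
UIMA_MIME = "text/xmi+xml"


def looks_like_uima_cas(filename="", head="", mime=""):
    """Hypothetical check: True if any one of the three hints matches.

    filename: name of the file, used only for its extension
    head:     the first characters of the file content
    mime:     content type reported by a web server, if any
    """
    ext = PurePath(filename).suffix.lstrip(".").lower()
    return (
        mime == UIMA_MIME
        or ext in UIMA_EXTENSIONS
        or any(magic in head for magic in UIMA_MAGIC)
    )


def split_results(results):
    """Sort analysis results into annotations and document features.

    Each result is a plain dict (an assumed stand-in for a CAS analysis
    result). Results carrying both 'begin' and 'end' offsets become
    annotations; all others become document features.
    """
    annotations, features = [], []
    for result in results:
        if "begin" in result and "end" in result:
            annotations.append(result)
        else:
            features.append(result)
    return annotations, features
```

For example, split_results applied to a token with begin and end offsets and a metadata record without them puts the first in the annotation list and the second in the feature list.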