From: Parker, D. <dav...@ds...> - 2010-11-22 06:03:32
UNCLASSIFIED

Hi,

GATE can successfully load XML documents with namespaces, but it appears that one cannot write JAPE rules to match the resulting Original markups annotations.

For example, the original document below loads successfully and produces the expected Original markups (see the GATE document that follows it), but it appears that one cannot write rules whose Input types are of the form dc:title or dc:description.

Is it possible to create rules that match type names with embedded colons, such as Dublin Core elements?

The JAPE parser appears to object to a Phase's Input statement when it lists type names with embedded colons. It raises:

gate.jape.JapeException: Batch: error parsing transducer: Cannot parse a phase in file ....

Cheers
David

===============================================================================
Original document (extracted from http://dublincore.org/documents/dc-xml-guidelines/)
===============================================================================
<?xml version="1.0"?>
<metadata
  xmlns="http://example.org/myapp/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://example.org/myapp/ http://example.org/myapp/schema.xsd"
  xmlns:dc="http://purl.org/dc/elements/1.1/">

  <dc:title>
    UKOLN
  </dc:title>
  <dc:description>
    UKOLN is a national focus of expertise in digital information
    management. It provides policy, research and awareness services
    to the UK library, information and cultural heritage communities.
    UKOLN is based at the University of Bath.
  </dc:description>
  <dc:publisher>
    UKOLN, University of Bath
  </dc:publisher>
  <dc:identifier>
    http://www.ukoln.ac.uk/
  </dc:identifier>

</metadata>

===============================================================================
GATE GateDocument of original document
===============================================================================
<?xml version='1.0' encoding='windows-1252'?>
<GateDocument>
<!-- The document's features-->
<GateDocumentFeatures>
<Feature>
  <Name className="java.lang.String">gate.SourceURL</Name>
  <Value className="java.lang.String">file:/H:/MyDocuments/DigitalLibrary/Information%20Extraction/GATE/plugins/DSTO/Resources/dublincoreExample.xml</Value>
</Feature>
<Feature>
  <Name className="java.lang.String">MimeType</Name>
  <Value className="java.lang.String">text/xml</Value>
</Feature>
</GateDocumentFeatures>
<!-- The document content area with serialized nodes -->
<TextWithNodes><Node id="0" />

  <Node id="4" />
    UKOLN
  <Node id="17" />
  <Node id="20" />
    UKOLN is a national focus of expertise in digital information
    management. It provides policy, research and awareness services
    to the UK library, information and cultural heritage communities.
    UKOLN is based at the University of Bath.
  <Node id="273" />
  <Node id="276" />
    UKOLN, University of Bath
  <Node id="309" />
  <Node id="312" />
    http://www.ukoln.ac.uk/
  <Node id="343" />

<Node id="345" /></TextWithNodes>
<!-- The default annotation set -->
<AnnotationSet>
</AnnotationSet>
<!-- Named annotation set -->
<AnnotationSet Name="Original markups">
<Annotation Id="0" Type="metadata" StartNode="0" EndNode="345">
<Feature>
  <Name className="java.lang.String">xmlns:xsi</Name>
  <Value className="java.lang.String">http://www.w3.org/2001/XMLSchema-instance</Value>
</Feature>
<Feature>
  <Name className="java.lang.String">xmlns</Name>
  <Value className="java.lang.String">http://example.org/myapp/</Value>
</Feature>
<Feature>
  <Name className="java.lang.String">xsi:schemaLocation</Name>
  <Value className="java.lang.String">http://example.org/myapp/ http://example.org/myapp/schema.xsd</Value>
</Feature>
<Feature>
  <Name className="java.lang.String">xmlns:dc</Name>
  <Value className="java.lang.String">http://purl.org/dc/elements/1.1/</Value>
</Feature>
</Annotation>
<Annotation Id="1" Type="dc:title" StartNode="4" EndNode="17">
</Annotation>
<Annotation Id="2" Type="dc:description" StartNode="20" EndNode="273">
</Annotation>
<Annotation Id="3" Type="dc:publisher" StartNode="276" EndNode="309">
</Annotation>
<Annotation Id="4" Type="dc:identifier" StartNode="312" EndNode="343">
</Annotation>
</AnnotationSet>
</GateDocument>

IMPORTANT: This email remains the property of the Department of Defence and is subject to the jurisdiction of section 70 of the Crimes Act 1914. If you have received this email in error, you are requested to contact the sender and delete the email.