From: <my...@us...> - 2009-06-20 14:53:09
Revision: 1988
          http://aperture.svn.sourceforge.net/aperture/?rev=1988&view=rev
Author:   mylka
Date:     2009-06-20 13:48:43 +0000 (Sat, 20 Jun 2009)

Log Message:
-----------
moved the main and test folder from src/src to just 'src'

Added Paths:
-----------
    aperture-addons/trunk/src/main/
    aperture-addons/trunk/src/test/

Removed Paths:
-------------
    aperture-addons/trunk/src/src/main/
    aperture-addons/trunk/src/src/test/


This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.

From: <my...@us...> - 2009-06-20 14:53:56
Revision: 1987
          http://aperture.svn.sourceforge.net/aperture/?rev=1987&view=rev
Author:   mylka
Date:     2009-06-20 13:30:58 +0000 (Sat, 20 Jun 2009)

Log Message:
-----------
2093266 - fourth step in aperture-addons mavenization, deleted the old non-maven-friendly folders

Removed Paths:
-------------
    aperture-addons/trunk/src/activators/
    aperture-addons/trunk/src/examples/
    aperture-addons/trunk/src/java/
    aperture-addons/trunk/src/test/

From: <my...@us...> - 2009-12-01 15:27:03
Revision: 2135
          http://aperture.svn.sourceforge.net/aperture/?rev=2135&view=rev
Author:   mylka
Date:     2009-12-01 15:26:55 +0000 (Tue, 01 Dec 2009)

Log Message:
-----------
second step in moving the apple crawlers to aperture-addons

Added Paths:
-----------
    aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/addressbook/
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/addressbook/

From: <my...@us...> - 2010-11-30 21:15:36
Revision: 2443
          http://aperture.svn.sourceforge.net/aperture/?rev=2443&view=rev
Author:   mylka
Date:     2010-11-30 21:15:27 +0000 (Tue, 30 Nov 2010)

Log Message:
-----------
made the aperture-addons work with current Tika Snapshot

Modified Paths:
--------------
    aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/tika/diff-mimetypes.xml

Added Paths:
-----------
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/EnhancedMTUnresolvedProblemsTest.java
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/EnhancedMTWorkingTest.java
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/IdentificationTestCase.java
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaCAUnresolvedProblemsTest.java
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaCAWorkingTest.java
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTUnresolvedProblemsTest.java
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTWorkingTest.java
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaMimeTypeIdentifierTest.java
    aperture-addons/trunk/src/test/resources/test-documents/
    aperture-addons/trunk/src/test/resources/test-documents/html-teampb.html
    aperture-addons/trunk/src/test/resources/test-documents/html-utf16-leading-whitespace-wrong-extension.doc
    aperture-addons/trunk/src/test/resources/test-documents/microsoft-excel-2007beta2.xlsb
    aperture-addons/trunk/src/test/resources/test-documents/microsoft-powerpoint-2007beta2.potm
    aperture-addons/trunk/src/test/resources/test-documents/microsoft-powerpoint-2007beta2.ppsm
    aperture-addons/trunk/src/test/resources/test-documents/microsoft-powerpoint-2007beta2.pptm
    aperture-addons/trunk/src/test/resources/test-documents/testEMLX.emlx
    aperture-addons/trunk/src/test/resources/test-documents/testEXCEL.xlsb
    aperture-addons/trunk/src/test/resources/test-documents/testFOXMAIL.BOX
    aperture-addons/trunk/src/test/resources/test-documents/testMHTMLFirefox.mhtml
    aperture-addons/trunk/src/test/resources/test-documents/testPRESENTATIONS3.0.shw
    aperture-addons/trunk/src/test/resources/test-documents/testQUATTRO.wb2
    aperture-addons/trunk/src/test/resources/test-documents/testWORKSSPREADSHEET3.0.wks
    aperture-addons/trunk/src/test/resources/test-documents/testWORKSSPREADSHEET7.0.xlr

Removed Paths:
-------------
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaDocumentsIdentificationTest.java

Modified: aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/tika/diff-mimetypes.xml
===================================================================
--- aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/tika/diff-mimetypes.xml	2010-11-26 13:09:11 UTC (rev 2442)
+++ aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/tika/diff-mimetypes.xml	2010-11-30 21:15:27 UTC (rev 2443)
@@ -508,6 +508,21 @@
     <!-- the identifier only choses name-based over magic-based type if the former is a specialization of the latter -->
     <sub-class-of type="text/plain" />
   </mime-type>
+  <mime-type type="message/x-emlx">
+    <magic priority="50">
+      <match value="Relay-Version:" type="string" offset="3:10"/>
+      <match value="#!\ rnews" type="string" offset="3:10"/>
+      <match value="N#!\ rnews" type="string" offset="3:10"/>
+      <match value="Forward\ to" type="string" offset="3:10"/>
+      <match value="Pipe\ to" type="string" offset="3:10"/>
+      <match value="Return-Path:" type="string" offset="3:10"/>
+      <match value="From:" type="string" offset="3:10"/>
+      <match value="Received:" type="string" offset="3:10"/>
+      <match type="string" value="Message-ID:" offset="3:10"/>
+      <match type="string" value="Date:" offset="3:10"/>
+    </magic>
+    <glob pattern="*.emlx"/>
+  </mime-type>
   <mime-type type="model/vnd.dwf">
     <alias type="image/x-dwf"/> <!-- Aperture -->
     <magic priority="50">

Added: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/EnhancedMTUnresolvedProblemsTest.java
===================================================================
--- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/EnhancedMTUnresolvedProblemsTest.java	(rev 0)
+++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/EnhancedMTUnresolvedProblemsTest.java	2010-11-30 21:15:27 UTC (rev 2443)
@@ -0,0 +1,67 @@
+package org.semanticdesktop.aperture.tika;
+
+import java.io.IOException;
+
+import org.apache.tika.detect.Detector;
+import org.apache.tika.mime.MimeTypes;
+import org.apache.tika.mime.MimeTypesEnhancer;
+import org.junit.Ignore;
+
+/**
+ * Tests for mime type identification of documents from the Aperture Framework
+ * test suite.
+ */
+@Ignore
+public class EnhancedMTUnresolvedProblemsTest extends IdentificationTestCase {
+
+    public void testMultipleParents() throws IOException {
+        t("testQUATTRO.wb2",
+            "application/x-quattro-pro", // wb2 should be recognized
+            "application/x-123", // it's a lotus 123 magic number
+            "application/x-quattro-pro"); // but if we know the name - it's quattro pro
+
+        t("testPRESENTATIONS3.0.shw",
+            "application/x-corelpresentations", // shw should be recognized
+            "application/vnd.wordperfect", // it's a wordperfect magic number
+            "application/x-corelpresentations"); // with the name - it's corel presentations
+
+        t("testWORKSSPREADSHEET3.0.wks",
+            "application/vnd.ms-works", // wks should be recognized
+            "application/x-123", // it's a lotus 123 magic number
+            "application/vnd.ms-works"); // with the name - it's MS works
+
+        t("testWORKSSPREADSHEET7.0.xlr",
+            "application/vnd.ms-works", // xlr should be recognized
+            "application/vnd.ms-excel", // it's an office magic number
+            // (recognized as excel by the ContainerAwareDetector)
+            "application/vnd.ms-works"); // with the name - it's ms WORKS
+    }
+
+    public void testUtf16HTML() throws IOException {
+        t("html-utf16-leading-whitespace-wrong-extension.doc",
+            "application/msword", // the extension is wrong, no problems here
+            "text/html", // this should be identified as <html>
+            "text/html"); // this should be identified as <html>
+    }
+
+    @Override
+    public Detector getNameOnlyDetector() {
+        return getEnhancedMT();
+    }
+
+    @Override
+    public Detector getDataDetector() {
+        return getEnhancedMT();
+    }
+
+    private Detector getEnhancedMT() {
+        MimeTypes mt = MimeTypes.getDefaultMimeTypes();
+        try {
+            MimeTypesEnhancer.enhance(mt, getClass().
+                getResourceAsStream("diff-mimetypes.xml"));
+        } catch (Exception e) {
+            throw new RuntimeException(e);
+        }
+        return mt;
+    }
+}

Property changes on: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/EnhancedMTUnresolvedProblemsTest.java
___________________________________________________________________
Added: svn:mime-type
   + text/plain

Added: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/EnhancedMTWorkingTest.java
===================================================================
--- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/EnhancedMTWorkingTest.java	(rev 0)
+++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/EnhancedMTWorkingTest.java	2010-11-30 21:15:27 UTC (rev 2443)
@@ -0,0 +1,61 @@
+package org.semanticdesktop.aperture.tika;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.LinkedList;
+import java.util.List;
+
+import junit.framework.TestCase;
+
+import org.apache.tika.detect.ContainerAwareDetector;
+import org.apache.tika.detect.Detector;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.mime.MimeTypeException;
+import org.apache.tika.mime.MimeTypes;
+import org.apache.tika.mime.MimeTypesEnhancer;
+import org.apache.tika.mime.MimeTypesFactory;
+
+/**
+ * Tests for mime type identification of documents from the Aperture Framework
+ * test suite.
+ */
+public class EnhancedMTWorkingTest extends IdentificationTestCase {
+
+    public void testEMLX() throws IOException {
+        t("testEMLX.emlx",
+            "message/x-emlx", // the .emlx extension
+            "message/x-emlx", // our magic numbers should work
+            "message/x-emlx"); // this shouldn't be a problem
+    }
+
+    public void testTeamPb() throws IOException {
+        t("html-teampb.html",
+            "text/html", // the .html
+            "application/xhtml+xml", // our magic numbers should work
+            "application/xhtml+xml"); // this shouldn't be a problem
+    }
+
+    @Override
+    public Detector getNameOnlyDetector() {
+        MimeTypes mt = MimeTypes.getDefaultMimeTypes();
+        try {
+            MimeTypesEnhancer.enhance(mt, getClass().
+                getResourceAsStream("diff-mimetypes.xml"));
+        } catch (Exception e) {
+            throw new RuntimeException(e);
+        }
+        return mt;
+    }
+
+    @Override
+    public Detector getDataDetector() {
+        MimeTypes mt = MimeTypes.getDefaultMimeTypes();
+        try {
+            MimeTypesEnhancer.enhance(mt, getClass().
+                getResourceAsStream("diff-mimetypes.xml"));
+        } catch (Exception e) {
+            throw new RuntimeException(e);
+        }
+        return mt;
+    }
+}

Property changes on: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/EnhancedMTWorkingTest.java
___________________________________________________________________
Added: svn:mime-type
   + text/plain

Added: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/IdentificationTestCase.java
===================================================================
--- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/IdentificationTestCase.java	(rev 0)
+++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/IdentificationTestCase.java	2010-11-30 21:15:27 UTC (rev 2443)
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.semanticdesktop.aperture.tika;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.LinkedList;
+import java.util.List;
+
+import junit.framework.TestCase;
+
+import org.apache.tika.detect.ContainerAwareDetector;
+import org.apache.tika.detect.Detector;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.mime.MimeTypes;
+
+/**
+ * A common superclass for tests of a {@link ContainerAwareDetector} coupled
+ * with the default {@link MimeTypes} detector.
+ *
+ * @author Antoni
+ *
+ */
+public abstract class IdentificationTestCase extends TestCase {
+
+    private static final class IdentificationError {
+        public String documentName;
+        public String testType;
+        public String actual;
+        public String expected;
+    }
+
+    public Detector getNameOnlyDetector() {
+        return MimeTypes.getDefaultMimeTypes();
+    }
+
+    public Detector getDataDetector() {
+        return new ContainerAwareDetector(fallback);
+    }
+
+    public void setUp() {
+        this.fallback = getNameOnlyDetector();
+        this.detector = getDataDetector();
+        this.errors = new LinkedList<IdentificationError>();
+    }
+
+    public void tearDown() {
+        if (errors.size() > 0) {
+            StringBuffer msg = new StringBuffer();
+            for (IdentificationError e : errors) {
+                msg.append("doc: " + e.documentName + " test: " + e.testType + " " +
+                    "is: " + e.actual + " should be: " + e.expected + "\n");
+            }
+            fail(msg.toString());
+        }
+    }
+
+    protected void t(String documentName, String typeNameOnly,
+            String typeDataOnly, String typeNameAndData) throws IOException {
+        assertTypeByName(typeNameOnly, documentName);
+        assertTypeByData(typeDataOnly, documentName);
+        assertTypeByNameAndData(typeNameAndData, documentName);
+    }
+
+    private void assertTypeByName(String expected, String filename)
+            throws IOException {
+        Metadata metadata = new Metadata();
+        metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
+        String actual = fallback.detect(null, metadata).toString();
+        if (!expected.equals(actual)) {
+            addError(filename,"name only",expected,actual);
+        }
+    }
+
+    private void addError(String filename, String string, String expected,
+            String actual) {
+        IdentificationError error = new IdentificationError();
+        error.documentName = filename;
+        error.testType = string;
+        error.actual = actual;
+        error.expected = expected;
+        errors.add(error);
+    }
+
+    private void assertTypeByData(String expected, String filename)
+            throws IOException {
+        InputStream stream = getClass()
+            .getResourceAsStream("/test-documents/" + filename);
+        try {
+            Metadata metadata = new Metadata();
+            String actual = detector.detect(stream, metadata).toString();
+            if (!expected.equals(actual)) {
+                addError(filename,"data only",expected,actual);
+            }
+
+        } finally {
+            stream.close();
+        }
+    }
+
+    private void assertTypeByNameAndData(String expected, String filename)
+            throws IOException {
+        InputStream stream = getClass()
+            .getResourceAsStream("/test-documents/" + filename);
+        try {
+            Metadata metadata = new Metadata();
+            metadata.add(Metadata.RESOURCE_NAME_KEY, filename);
+            String actual = detector.detect(stream, metadata).toString();
+            if (!expected.equals(actual)) {
+                addError(filename,"name and data",expected,actual);
+            }
+        } finally {
+            stream.close();
+        }
+    }
+
+    private Detector detector;
+    private Detector fallback;
+    private List<IdentificationError> errors;
+
+}

Property changes on: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/IdentificationTestCase.java
___________________________________________________________________
Added: svn:mime-type
   + text/plain

Added: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaCAUnresolvedProblemsTest.java
===================================================================
--- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaCAUnresolvedProblemsTest.java	(rev 0)
+++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaCAUnresolvedProblemsTest.java	2010-11-30 21:15:27 UTC (rev 2443)
@@ -0,0 +1,83 @@
+package org.semanticdesktop.aperture.tika;
+
+import java.io.IOException;
+
+import org.apache.tika.detect.Detector;
+import org.apache.tika.mime.MimeTypes;
+import org.junit.Ignore;
+
+/**
+ * A set of tests for the plain unenhanced Tika trunk. A repository for further
+ * issues.
+ */
+@Ignore
+public class PlainTikaCAUnresolvedProblemsTest extends IdentificationTestCase {
+
+    /**
+     * Already covered by the TIKA-561, awaiting the application of my patch
+     * @throws IOException
+     */
+    public void testEmlx() throws IOException {
+        t("testEMLX.emlx",
+            "message/x-emlx", // the .emlx extension
+            "message/x-emlx", // our magic numbers should work
+            "message/x-emlx"); // this shouldn't be a problem
+    }
+
+    /**
+     * Covered by my patch to TIKA-560, but Nick Burch forgot to apply this particular
+     * fix in Tika rev 103
+     * @throws IOException
+     */
+    public void testXlsb() throws IOException {
+        t("testEXCEL.xlsb",
+            // .xlsb extension should be recognized
+            "application/vnd.ms-excel.sheet.binary.macroenabled.12",
+            // should be found by the ZipContainerDetector
+            "application/vnd.ms-excel.sheet.binary.macroenabled.12",
+            // this shouldn't be a problem
+            "application/vnd.ms-excel.sheet.binary.macroenabled.12");
+    }
+
+    /**
+     * Not covered anywhere, would require some rewriting in Tika
+     * @throws IOException
+     */
+    public void testUtf16LeadingWhitespaceWrongExtension() throws IOException {
+        t("html-utf16-leading-whitespace-wrong-extension.doc",
+            "application/msword", // the extension is wrong, no problems here
+            "text/html", // this should be identified as <html>
+            "text/html"); // this should be identified as <html>
+    }
+
+    /**
+     * Four files which would require the ability for a single mime type definition
+     * to have mulitple parents.
+     *
+     * @throws IOException
+     */
+    public void testMultipleParents() throws IOException {
+        t("testQUATTRO.wb2",
+            "application/x-quattro-pro", // wb2 should be recognized
+            "application/x-123", // it's a lotus 123 magic number
+            "application/x-quattro-pro"); // but if we know the name - it's quattro pro
+
+        t("testPRESENTATIONS3.0.shw",
+            "application/x-corelpresentations", // shw should be recognized
+            "application/vnd.wordperfect", // it's a wordperfect magic number
+            "application/x-corelpresentations"); // with the name - it's corel presentations
+
+        t("testWORKSSPREADSHEET3.0.wks",
+            "application/vnd.ms-works", // wks should be recognized
+            "application/x-123", // it's a lotus 123 magic number
+            "application/vnd.ms-works"); // with the name - it's MS works
+
+        t("testWORKSSPREADSHEET7.0.xlr",
+            "application/vnd.ms-works", // xlr should be recognized
+            "application/vnd.ms-excel", // it's an office magic number
+            // (recognized as excel by the ContainerAwareDetector)
+            "application/vnd.ms-works"); // with the name - it's ms WORKS
+    }
+
+
+}

Property changes on: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaCAUnresolvedProblemsTest.java
___________________________________________________________________
Added: svn:mime-type
   + text/plain

Added: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaCAWorkingTest.java
===================================================================
--- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaCAWorkingTest.java	(rev 0)
+++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaCAWorkingTest.java	2010-11-30 21:15:27 UTC (rev 2443)
@@ -0,0 +1,22 @@
+package org.semanticdesktop.aperture.tika;
+
+import org.apache.tika.detect.Detector;
+import org.apache.tika.mime.MimeTypes;
+
+/**
+ * Tests for mime type identification of documents from the Aperture Framework
+ * test suite.
+ */
+public class PlainTikaCAWorkingTest extends IdentificationTestCase {
+
+
+    public void testDummy() {
+
+    }
+
+    @Override
+    public Detector getNameOnlyDetector() {
+        return MimeTypes.getDefaultMimeTypes();
+    }
+
+}

Property changes on: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaCAWorkingTest.java
___________________________________________________________________
Added: svn:mime-type
   + text/plain

Added: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTUnresolvedProblemsTest.java
===================================================================
--- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTUnresolvedProblemsTest.java	(rev 0)
+++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTUnresolvedProblemsTest.java	2010-11-30 21:15:27 UTC (rev 2443)
@@ -0,0 +1,89 @@
+package org.semanticdesktop.aperture.tika;
+
+import java.io.IOException;
+
+import org.apache.tika.detect.Detector;
+import org.apache.tika.mime.MimeTypes;
+import org.junit.Ignore;
+
+/**
+ * A set of tests for the plain unenhanced Tika trunk. A repository for further
+ * issues.
+ */
+@Ignore
+public class PlainTikaMTUnresolvedProblemsTest extends IdentificationTestCase {
+
+    /**
+     * Already covered by the TIKA-561, awaiting the application of my patch
+     * @throws IOException
+     */
+    public void testEmlx() throws IOException {
+        t("testEMLX.emlx",
+            "message/x-emlx", // the .emlx extension
+            "message/x-emlx", // our magic numbers should work
+            "message/x-emlx"); // this shouldn't be a problem
+    }
+
+    /**
+     * Covered by my patch to TIKA-560, but Nick Burch forgot to apply this particular
+     * fix in Tika rev 103
+     * @throws IOException
+     */
+    public void testXlsb() throws IOException {
+        t("testEXCEL.xlsb",
+            // .xlsb extension should be recognized
+            "application/vnd.ms-excel.sheet.binary.macroenabled.12",
+            // should be found by the ZipContainerDetector
+            "application/vnd.ms-excel.sheet.binary.macroenabled.12",
+            // this shouldn't be a problem
+            "application/vnd.ms-excel.sheet.binary.macroenabled.12");
+    }
+
+    /**
+     * Not covered anywhere, would require some rewriting in Tika
+     * @throws IOException
+     */
+    public void testUtf16LeadingWhitespaceWrongExtension() throws IOException {
+        t("html-utf16-leading-whitespace-wrong-extension.doc",
+            "application/msword", // the extension is wrong, no problems here
+            "text/html", // this should be identified as <html>
+            "text/html"); // this should be identified as <html>
+    }
+
+    /**
+     * Four files which would require the ability for a single mime type definition
+     * to have mulitple parents.
+     *
+     * @throws IOException
+     */
+    public void testMultipleParents() throws IOException {
+        t("testQUATTRO.wb2",
+            "application/x-quattro-pro", // wb2 should be recognized
+            "application/x-123", // it's a lotus 123 magic number
+            "application/x-quattro-pro"); // but if we know the name - it's quattro pro
+
+        t("testPRESENTATIONS3.0.shw",
+            "application/x-corelpresentations", // shw should be recognized
+            "application/vnd.wordperfect", // it's a wordperfect magic number
+            "application/x-corelpresentations"); // with the name - it's corel presentations
+
+        t("testWORKSSPREADSHEET3.0.wks",
+            "application/vnd.ms-works", // wks should be recognized
+            "application/x-123", // it's a lotus 123 magic number
+            "application/vnd.ms-works"); // with the name - it's MS works
+
+        t("testWORKSSPREADSHEET7.0.xlr",
+            "application/vnd.ms-works", // xlr should be recognized
+            "application/vnd.ms-excel", // it's an office magic number
+            // (recognized as excel by the ContainerAwareDetector)
+            "application/vnd.ms-works"); // with the name - it's ms WORKS
+    }
+
+    /* (non-Javadoc)
+     * @see org.semanticdesktop.aperture.tika.ContainerAwareIdentificationTestCase#getDataDetector()
+     */
+    @Override
+    public Detector getDataDetector() {
+        return MimeTypes.getDefaultMimeTypes();
+    }
+}

Property changes on: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTUnresolvedProblemsTest.java
___________________________________________________________________
Added: svn:mime-type
   + text/plain

Added: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTWorkingTest.java
===================================================================
--- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTWorkingTest.java	(rev 0)
+++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTWorkingTest.java	2010-11-30 21:15:27 UTC (rev 2443)
@@ -0,0 +1,32 @@
+package org.semanticdesktop.aperture.tika;
+
+import java.io.IOException;
+
+import org.apache.tika.detect.Detector;
+import org.apache.tika.mime.MimeTypes;
+
+/**
+ * A set of tests for the plain unenhanced Tika trunk. A repository for further
+ * issues.
+ */
+public class PlainTikaMTWorkingTest extends IdentificationTestCase {
+
+    /**
+     * Part of our test suite. The test submitted for 3025427
+     * @throws IOException
+     */
+    public void testTeamPB() throws IOException {
+        t("html-teampb.html",
+            "text/html", // the .html
+            "application/xhtml+xml", // our magic numbers should work
+            "application/xhtml+xml"); // this shouldn't be a problem
+    }
+
+    /* (non-Javadoc)
+     * @see org.semanticdesktop.aperture.tika.ContainerAwareIdentificationTestCase#getDataDetector()
+     */
+    @Override
+    public Detector getDataDetector() {
+        return MimeTypes.getDefaultMimeTypes();
+    }
+}

Property changes on: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTWorkingTest.java
___________________________________________________________________
Added: svn:mime-type
   + text/plain

Deleted: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaDocumentsIdentificationTest.java
===================================================================
--- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaDocumentsIdentificationTest.java	2010-11-26 13:09:11 UTC (rev 2442)
+++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaDocumentsIdentificationTest.java	2010-11-30 21:15:27 UTC (rev 2443)
@@ -1,272 +0,0 @@
-/*
- * Copyright (c) 2010 Aduna.
- * All rights reserved.
- *
- * Licensed under the Aperture BSD-style license.
- */
-package org.semanticdesktop.aperture.tika;
-
-import org.junit.Before;
-import org.junit.Test;
-import org.semanticdesktop.aperture.mime.identifier.AbstractIdentificationTest;
-import org.semanticdesktop.aperture.mime.identifier.MimeTypeIdentifier;
-
-public class TikaDocumentsIdentificationTest extends AbstractIdentificationTest {
-
-    private MimeTypeIdentifier identifier;
-
-    @Before
-    public void setUp() {
-        this.identifier = new TikaMimeTypeIdentifier();
-    }
-
-    @Test
-    public void testIdentification() throws Exception {
-
-        t("bzip2-txt-bziptest.txt.bz2", "application/x-bzip", "application/x-bzip2");
-        t("compress-txt-compresstest.txt.Z", "application/x-compress", "application/x-compress");
-        t("corel-presentations-3.0.shw", "application/vnd.wordperfect","application/vnd.wordperfect"); // wrong, tika limitation
-        t("corel-presentations-x3.shw", "application/x-tika-msoffice","application/x-corelpresentations");
-        t("corel-quattro-pro-6.wb2", "application/x-123", "application/x-123"); // wrong, tika limitation
-        t("corel-quattro-pro-7.wb3", "application/x-quattro-pro", "application/x-quattro-pro");
-        t("corel-quattro-pro-x3.qpw", "application/x-quattro-pro", "application/x-quattro-pro");
-        t("corel-wordperfect-4.2.wp", "application/octet-stream", "application/vnd.wordperfect");
-        t("corel-wordperfect-5.0.wp", "application/vnd.wordperfect","application/vnd.wordperfect");
-        t("corel-wordperfect-5.1-far-east.wp", "application/vnd.wordperfect","application/vnd.wordperfect");
-        t("corel-wordperfect-5.1.wp", "application/vnd.wordperfect","application/vnd.wordperfect");
-        t("corel-wordperfect-x3.wpd", "application/vnd.wordperfect","application/vnd.wordperfect");
-        t("cpio-testfile.txt.cpio", "application/x-cpio", "application/x-cpio");
-        t("counting-input-stream-test-file.dat", "application/x-tika-ooxml", "application/x-tika-ooxml");
-        t("emlx-74719.emlx", "text/html", "text/html"); // wrong, tika limitation
-        t("faulty-fileaccessdata-is-ignored.xml","application/x-gzip", "application/x-gzip");
-        t("foxmail-in.BOX", "application/octet-stream", "application/vnd.previewsystems.box"); // very wrong, WTF is vnd.previewsystems.box ???
-        t("html-condenast.html", "text/html", "text/html");
-        t("html-handwritten-with-wrong-file-extension.txt","text/html", "text/html");
-        t("html-handwritten.html", "text/html", "text/html");
-        t("html-mixed-case-header-and-wrong-extension.txt","text/html", "text/html");
-        t("html-quelle.de.html", "text/html", "text/html");
-        t("html-teampb.html", "application/xhtml+xml", "application/xhtml+xml");
-        t("html-utf16-leading-whitespace-wrong-extension.doc","text/plain", "text/plain"); // wrong, tika limitation
-        t("html-youtube-contenttypeinhttpheaders.html","text/html", "text/html");
-        t("jingle1.mp3", "audio/mpeg", "audio/mpeg");
-        t("jingle2.mp3", "audio/mpeg", "audio/mpeg");
-        t("jingle3.mp3", "audio/mpeg", "audio/mpeg");
-        t("jpg-exif-img_9367.JPG", "image/jpeg", "image/jpeg");
-        t("jpg-exif-zerolength.jpg", "application/octet-stream", "image/jpeg"); // empty file
-        t("jpg-geotagged-ipanema.jpg", "image/jpeg", "image/jpeg");
-        t("jpg-geotagged.jpg", "image/jpeg", "image/jpeg");
-        t("mail-attachment.eml", "message/rfc822", "message/rfc822");
-        t("mail-conflict-desktop1.eml", "text/plain", "message/rfc822"); // wrong
-        t("mail-conflict-desktop2.eml", "text/plain", "message/rfc822"); // wrong
-        t("mail-forwarded-references.eml", "text/plain", "message/rfc822"); // wrong
-        t("mail-mapi125messageid.eml", "message/rfc822", "message/rfc822");
-        t("mail-mbox-aperture-inc1-mail1.eml", "text/plain", "message/rfc822"); // wrong
-        t("mail-mbox-aperture-inc1-mail2.eml", "text/plain", "message/rfc822"); // wrong
-        t("mail-mbox-aperture-inc1-mail3.eml", "text/plain", "message/rfc822"); // wrong
-        t("mail-mbox-aperture-inc1-mail4.eml", "text/plain", "message/rfc822"); // wrong
-        t("mail-multipart-plain-html.eml", "text/plain", "message/rfc822"); // wrong
-        t("mail-multipart-related-bug.eml", "message/rfc822", "message/rfc822");
-        t("mail-multipart-test.eml", "text/plain", "message/rfc822"); // wrong
-        t("mail-multipart-test.eml.tar.gz", "application/x-gzip", "application/x-gzip");
-        t("mail-plaintext-attachment.eml", "message/rfc822", "message/rfc822");
-        t("mail-threaded.eml", "text/html", "text/html"); // very wrong
-        t("mail-thunderbird-1.5-unspecifiedcharset.eml","message/rfc822", "message/rfc822");
-        t("mail-thunderbird-1.5.eml", "message/rfc822", "message/rfc822");
-        t("mail-UnsupportedOperationException.eml","message/rfc822", "message/rfc822");
-        t("mail-xml-attachment.eml", "message/rfc822", "message/rfc822");
-        t("mail.msg", "application/x-tika-msoffice","application/vnd.ms-outlook"); // wrong
-        t("mbox-aperture-dev", "application/mbox", "application/mbox");
-        t("mbox-aperture-inc1", "application/mbox", "application/mbox");
-        t("mbox-aperture-inc2", "application/mbox", "application/mbox");
-        t("mbox-aperture-inc3", "application/mbox", "application/mbox");
-        t("mbox-aperture-inc4", "application/mbox", "application/mbox");
-        t("mbox-noblanklinebetweenmails.mbox", "application/mbox", "application/mbox");
-        t("mbox-testfolder", "application/mbox", "application/mbox");
-        t("mhtml-firefox.mht", "text/html", "text/html");
-        t("mhtml-internet-explorer.mht", "message/rfc822", "message/rfc822");
-
-        t("microsoft-excel-2000.xls", "application/x-tika-msoffice", // wrong
-            "application/vnd.ms-excel");
-        t("microsoft-excel-2007beta2.xlam", "application/x-tika-ooxml", // wrong
-            "application/vnd.ms-excel.addin.macroenabled.12"); // to clarify
-        t("microsoft-excel-2007beta2.xlsb", "application/x-tika-ooxml", // wrong
-            "application/x-tika-ooxml"); // wrong
-        t("microsoft-excel-2007beta2.xlsm", "application/x-tika-ooxml", // wrong
-            "application/vnd.ms-excel.sheet.macroenabled.12"); // to clarify
-        t("microsoft-excel-2007beta2.xlsx", "application/x-tika-ooxml", // wrong
-            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
-        t("microsoft-excel-2007beta2.xltm", "application/x-tika-ooxml", // wrong
-            "application/vnd.ms-excel.template.macroenabled.12");
-        t("microsoft-excel-2007beta2.xltx", "application/x-tika-ooxml", // wrong
-            "application/vnd.openxmlformats-officedocument.spreadsheetml.template");
-        t("microsoft-excel-2010beta.xlsx", "application/x-tika-ooxml", // wrong
-            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
-
-        t("microsoft-powerpoint-2000.ppt", "application/x-tika-msoffice", // wrong
-            "application/vnd.ms-powerpoint");
-        t("microsoft-powerpoint-2007beta2.potm", "application/x-tika-ooxml", // wrong
-            "application/x-tika-ooxml"); // wrong
-        t("microsoft-powerpoint-2007beta2.potx", "application/x-tika-ooxml", // wrong
-            "application/vnd.openxmlformats-officedocument.presentationml.template");
-        t("microsoft-powerpoint-2007beta2.ppsm", "application/x-tika-ooxml", // wrong
-            "application/x-tika-ooxml"); // to clarify
-        t("microsoft-powerpoint-2007beta2.ppsx", "application/x-tika-ooxml", // wrong
-            "application/vnd.openxmlformats-officedocument.presentationml.slideshow");
-        t("microsoft-powerpoint-2007beta2.pptm", "application/x-tika-ooxml", // wrong
-            "application/x-tika-ooxml"); // wrong
-        t("microsoft-powerpoint-2007beta2.pptx", "application/x-tika-ooxml", // wrong
-            "application/vnd.openxmlformats-officedocument.presentationml.presentation");
-        t("microsoft-powerpoint-2010beta.pptx", "application/x-tika-ooxml", // wrong
-            "application/vnd.openxmlformats-officedocument.presentationml.presentation");
-        t("microsoft-powerpoint-invalidunicode.ppt","application/x-tika-msoffice", // wrong
-            "application/vnd.ms-powerpoint");
-
-        t("microsoft-publisher-2003.pub","application/x-tika-msoffice","application/x-mspublisher"); // wrong
-        t("microsoft-visio.vsd","application/x-tika-msoffice","application/vnd.visio"); // wrong
-
-        t("microsoft-word-2000-with-wrong-file-extension.pdf","application/x-tika-msoffice", // wrong
-            "application/x-tika-msoffice"); // wrong
-        t("microsoft-word-2000.doc", "application/x-tika-msoffice", // wrong
-            "application/msword");
-        t("microsoft-word-2007beta2.docm", "application/x-tika-ooxml", // wrong
-            "application/vnd.ms-word.document.macroenabled.12");
-        t("microsoft-word-2007beta2.docx", "application/x-tika-ooxml", // wrong
-            "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
-        t("microsoft-word-2007beta2.dotm", "application/x-tika-ooxml", // wrong
-            "application/vnd.ms-word.template.macroenabled.12");
-        t("microsoft-word-2007beta2.dotx", "application/x-tika-ooxml", // wrong
-            "application/vnd.openxmlformats-officedocument.wordprocessingml.template"); // to clarify
-        t("microsoft-word-2010beta.docx", "application/x-tika-ooxml", // wrong
-            "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
-        t("microsoft-word-history-blair.doc", "application/x-tika-msoffice", // wrong
-            "application/msword");
-        t("microsoft-word-illegal-unicode-characters.doc", "application/x-tika-msoffice", // wrong
-            "application/msword");
-        t("microsoft-word-testdoc-comments.doc", "application/x-tika-msoffice", // wrong
-            "application/msword");
-        t("microsoft-word-testdoc-nocomments.doc","application/x-tika-msoffice", // wrong
-            "application/msword");
-
-        t("microsoft-works-spreadsheet-3.0.wks", "application/x-123", "application/x-123"); // tika limitation, can't recognize works 3.0 spreadsheets
-        t("microsoft-works-spreadsheet-4.0-2000.wks", "application/vnd.ms-works","application/vnd.ms-works");
-        t("microsoft-works-spreadsheet-7.0.xlr", "application/vnd.ms-excel","application/vnd.ms-excel"); // tika limitation
-
-        t("microsoft-works-word-processor-2000.wps", "application/vnd.ms-works", "application/vnd.ms-works");
-        t("microsoft-works-word-processor-3.0.wps", "application/x-tika-msoffice", "application/vnd.ms-works");
t("microsoft-works-word-processor-4.0.wps", "application/x-tika-msoffice", "application/vnd.ms-works"); - t("microsoft-works-word-processor-7.0.wps", "application/x-tika-msoffice", "application/vnd.ms-works"); - - t("openoffice-1.1.5-calc-template.stc", "application/vnd.sun.xml.calc", "application/vnd.sun.xml.calc"); - t("openoffice-1.1.5-calc.sxc", "application/vnd.sun.xml.calc", "application/vnd.sun.xml.calc"); - t("openoffice-1.1.5-draw-template.std", "application/vnd.sun.xml.draw", "application/vnd.sun.xml.draw"); - t("openoffice-1.1.5-draw.sxd", "application/vnd.sun.xml.draw", "application/vnd.sun.xml.draw"); - t("openoffice-1.1.5-impress-template.sti", "application/vnd.sun.xml.impress", "application/vnd.sun.xml.impress"); - t("openoffice-1.1.5-impress.sxi", "application/vnd.sun.xml.impress", "application/vnd.sun.xml.impress"); - t("openoffice-1.1.5-writer-template.stw", "application/vnd.sun.xml.writer", "application/vnd.sun.xml.writer"); - t("openoffice-1.1.5-writer.sxw", "application/vnd.sun.xml.writer", "application/vnd.sun.xml.writer"); - - t("openoffice-2.0-calc-template.ots", "application/vnd.oasis.opendocument.spreadsheet-template", - "application/vnd.oasis.opendocument.spreadsheet-template"); - t("openoffice-2.0-calc.ods", "application/vnd.oasis.opendocument.spreadsheet", - "application/vnd.oasis.opendocument.spreadsheet"); - t("openoffice-2.0-draw-template.otg", "application/vnd.oasis.opendocument.graphics-template", - "application/vnd.oasis.opendocument.graphics-template"); - t("openoffice-2.0-draw.odg", "application/vnd.oasis.opendocument.graphics", - "application/vnd.oasis.opendocument.graphics"); - t("openoffice-2.0-formula.odf", "application/vnd.oasis.opendocument.formula", - "application/vnd.oasis.opendocument.formula"); - t("openoffice-2.0-impress-template.otp","application/vnd.oasis.opendocument.presentation-template", - "application/vnd.oasis.opendocument.presentation-template"); - t("openoffice-2.0-impress.odp", 
"application/vnd.oasis.opendocument.presentation", - "application/vnd.oasis.opendocument.presentation"); - t("openoffice-2.0-writer-template.ott", "application/vnd.oasis.opendocument.text-template", - "application/vnd.oasis.opendocument.text-template"); - t("openoffice-2.0-writer.odt", "application/vnd.oasis.opendocument.text", - "application/vnd.oasis.opendocument.text"); - - t("pdf-distiller-6-weirdchars.pdf", "application/pdf", "application/pdf"); - t("pdf-manyauthors.pdf", "application/pdf", "application/pdf"); - t("pdf-no-author.pdf", "application/pdf", "application/pdf"); - t("pdf-openoffice-1.1.5-writer.pdf", "application/pdf", "application/pdf"); - t("pdf-openoffice-2.0-writer.pdf", "application/pdf", "application/pdf"); - t("pdf-openoffice-2.0-writer.pdf.tar", "application/x-tar", "application/x-tar"); - t("pdf-word-2000-pdfcreator-0.8.0.pdf", "application/pdf", "application/pdf"); - t("pdf-word-2000-pdfmaker-7.0.pdf", "application/pdf", "application/pdf"); - t("pdf-word-2000-pdfwriter-7.0.pdf", "application/pdf", "application/pdf"); - - t("plain-text-ansi.txt", "text/plain", "text/plain"); - t("plain-text-china-wikipedia-utf16be.txt", "application/octet-stream", "text/plain"); - t("plain-text-china-wikipedia-utf8.txt", "text/plain", "text/plain"); - t("plain-text-chinese-garbled-name-gb18030.txt", "text/plain", "text/plain"); - t("plain-text-chinese-gb18030.txt", "text/plain", "text/plain"); - t("plain-text-chinese-utf16.txt", "text/plain", "text/plain"); - t("plain-text-empty.txt", "application/octet-stream", "text/plain"); - t("plain-text-japan-wikipedia-eucjp.txt", "text/plain", "text/plain"); - t("plain-text-japanese-juniversalchardettest-bomremoved-utf16le.txt", "application/octet-stream", "text/plain"); - t("plain-text-japanese-juniversalchardettest-eucjp.txt", "text/plain", "text/plain"); - t("plain-text-japanese-juniversalchardettest-iso2022jp.txt", "text/plain", "text/plain"); - t("plain-text-japanese-juniversalchardettest-shiftjis.txt", 
"text/plain", "text/plain"); - t("plain-text-japanese-juniversalchardettest-utf8nobom.txt", "text/plain", "text/plain"); - t("plain-text-pt-ksiega1-latin2.txt", "text/plain", "text/plain"); - t("plain-text-pt-ksiega1-utf16be.txt", "application/octet-stream", "text/plain"); - t("plain-text-pt-ksiega1-utf16le.txt", "application/octet-stream", "text/plain"); - t("plain-text-pt-ksiega1-utf8.txt", "text/plain", "text/plain"); - t("plain-text-utf16be.txt", "text/plain", "text/plain"); - t("plain-text-utf16le.txt", "text/plain", "text/plain"); - t("plain-text-utf8.txt", "text/plain", "text/plain"); - t("plain-text-with-null-character.txt", "application/octet-stream", "text/plain"); - t("plain-text-without-extension", "text/plain", "text/plain"); - t("plain-text.txt", "text/plain", "text/plain"); - - t("rtf-openoffice-1.1.5.rtf", "application/rtf", "application/rtf"); - t("rtf-openoffice-2.0.rtf", "application/rtf", "application/rtf"); - t("rtf-staroffice-5.2.rtf", "application/rtf", "application/rtf"); - t("rtf-word-2000.rtf", "application/rtf", "application/rtf"); - - t("staroffice-5.2-calc-template.vor", "application/x-tika-msoffice", "application/x-tika-msoffice"); - t("staroffice-5.2-calc.sdc", "application/x-tika-msoffice", "application/x-tika-msoffice"); - t("staroffice-5.2-draw-template.vor", "application/x-tika-msoffice", "application/x-tika-msoffice"); - t("staroffice-5.2-draw.sda", "application/x-tika-msoffice", "application/x-tika-msoffice"); - t("staroffice-5.2-impress-template.vor", "application/x-tika-msoffice", "application/x-tika-msoffice"); - t("staroffice-5.2-impress.sdd", "application/x-tika-msoffice", "application/x-tika-msoffice"); - t("staroffice-5.2-writer-template.vor", "application/x-tika-msoffice", "application/x-tika-msoffice"); - t("staroffice-5.2-writer.sdw", "application/x-tika-msoffice", "application/x-tika-msoffice"); - - t("tar-test.tar","application/x-tar","application/x-tar"); - - 
t("thunderbird-addressbook.mab","text/plain","application/x-mozilla-addressbook"); - - t("vcard-antoni-cardpicture.vcf","text/x-vcard","text/x-vcard"); - t("vcard-antoni-kontact.vcf","text/x-vcard","text/x-vcard"); - t("vcard-antoni-outlook2003-urlphoto.vcf","text/x-vcard","text/x-vcard"); - t("vcard-antoni-outlook2003.vcf","text/x-vcard","text/x-vcard"); - t("vcard-dirk-corrupted.vcf","text/plain","text/x-vcard"); // wrong, but this one is corrupted - t("vcard-dirk.vcf","text/x-vcard","text/x-vcard"); - t("vcard-illegalurl.vcf","text/x-vcard","text/x-vcard"); - t("vcard-incompletenproperty.vcf","text/x-vcard","text/x-vcard"); - t("vcard-rfc2426.vcf","text/x-vcard","text/x-vcard"); - t("vcard-vCards-SAP-onemodified.vcf","text/x-vcard","text/x-vcard"); - t("vcard-vCards-SAP.vcf","text/x-vcard","text/x-vcard"); - - t("xml-handwritten.xml","application/xml","application/xml"); - t("xml-nonexistent-dtd.xml","application/xml","application/xml"); - t("xml-nonexistent-remote-dtd.xml","application/xml","application/xml"); - t("xml-nonexistent-remote-xsd.xml","application/xml","application/xml"); - t("xml-nonexistent-xsd.xml","application/xml","application/xml"); - t("xml-utf8-bom","text/plain","text/plain"); - - t("zip_7zr_on_linux_password_hello.zip","application/x-7z-compressed","application/x-7z-compressed"); - t("zip-infiniteloop.zip","application/zip","application/zip"); - t("zip-mail-attachment.zip","application/zip","application/zip"); - t("zip-mail-forwarded-message.zip","application/zip","application/zip"); - t("zip-multivolume-firstvolume.zip","application/zip","application/zip"); - t("zip-problem.zip","application/zip","application/zip"); - t("zip-somedocs.zip","application/zip","application/zip"); - t("zip-test.zip","application/zip","application/zip"); - } - - private void t(String name, String mimeTypeWithoutName, String mimeTypeWithName) throws Exception { - test(identifier, mimeTypeWithoutName, "/org/semanticdesktop/aperture/docs/" + name, false); - 
test(identifier, mimeTypeWithName, "/org/semanticdesktop/aperture/docs/" + name, true); - } -} Copied: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaMimeTypeIdentifierTest.java (from rev 2441, aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaDocumentsIdentificationTest.java) =================================================================== --- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaMimeTypeIdentifierTest.java (rev 0) +++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaMimeTypeIdentifierTest.java 2010-11-30 21:15:27 UTC (rev 2443) @@ -0,0 +1,271 @@ +/* + * Copyright (c) 2010 Aduna. + * All rights reserved. + * + * Licensed under the Aperture BSD-style license. + */ +package org.semanticdesktop.aperture.tika; + +import org.junit.Before; +import org.junit.Test; +import org.semanticdesktop.aperture.mime.identifier.AbstractIdentificationTest; +import org.semanticdesktop.aperture.mime.identifier.MimeTypeIdentifier; + +public class TikaMimeTypeIdentifierTest extends AbstractIdentificationTest { + + private MimeTypeIdentifier identifier; + + @Before + public void setUp() { + this.identifier = new TikaMimeTypeIdentifier(); + } + + @Test + public void testIdentification() throws Exception { + t("bzip2-txt-bziptest.txt.bz2", "application/x-bzip", "application/x-bzip2"); + t("compress-txt-compresstest.txt.Z", "application/x-compress", "application/x-compress"); + t("corel-presentations-3.0.shw", "application/vnd.wordperfect","application/vnd.wordperfect"); // wrong, tika limitation + t("corel-presentations-x3.shw", "application/x-tika-msoffice","application/x-corelpresentations"); + t("corel-quattro-pro-6.wb2", "application/x-123", "application/x-123"); // wrong, tika limitation + t("corel-quattro-pro-7.wb3", "application/x-quattro-pro", "application/x-quattro-pro"); + t("corel-quattro-pro-x3.qpw", "application/x-quattro-pro", "application/x-quattro-pro"); + 
t("corel-wordperfect-4.2.wp", "application/octet-stream", "application/vnd.wordperfect"); + t("corel-wordperfect-5.0.wp", "application/vnd.wordperfect","application/vnd.wordperfect"); + t("corel-wordperfect-5.1-far-east.wp", "application/vnd.wordperfect","application/vnd.wordperfect"); + t("corel-wordperfect-5.1.wp", "application/vnd.wordperfect","application/vnd.wordperfect"); + t("corel-wordperfect-x3.wpd", "application/vnd.wordperfect","application/vnd.wordperfect"); + t("cpio-testfile.txt.cpio", "application/x-cpio", "application/x-cpio"); + t("counting-input-stream-test-file.dat", "application/x-tika-ooxml", "application/x-tika-ooxml"); + t("emlx-74719.emlx", "message/x-emlx", "message/x-emlx"); + t("faulty-fileaccessdata-is-ignored.xml","application/x-gzip", "application/x-gzip"); + t("foxmail-in.BOX", "application/x-foxmail", "application/x-foxmail"); + t("html-condenast.html", "text/html", "text/html"); + t("html-handwritten-with-wrong-file-extension.txt","text/html", "text/html"); + t("html-handwritten.html", "text/html", "text/html"); + t("html-mixed-case-header-and-wrong-extension.txt","text/html", "text/html"); + t("html-quelle.de.html", "text/html", "text/html"); + t("html-teampb.html", "application/xhtml+xml", "application/xhtml+xml"); + t("html-utf16-leading-whitespace-wrong-extension.doc","text/plain", "text/plain"); // wrong, tika limitation + t("html-youtube-contenttypeinhttpheaders.html","text/html", "text/html"); + t("jingle1.mp3", "audio/mpeg", "audio/mpeg"); + t("jingle2.mp3", "audio/mpeg", "audio/mpeg"); + t("jingle3.mp3", "audio/mpeg", "audio/mpeg"); + t("jpg-exif-img_9367.JPG", "image/jpeg", "image/jpeg"); + t("jpg-exif-zerolength.jpg", "application/octet-stream", "image/jpeg"); // empty file + t("jpg-geotagged-ipanema.jpg", "image/jpeg", "image/jpeg"); + t("jpg-geotagged.jpg", "image/jpeg", "image/jpeg"); + t("mail-attachment.eml", "message/rfc822", "message/rfc822"); + t("mail-conflict-desktop1.eml", "text/plain", "message/rfc822"); // 
wrong + t("mail-conflict-desktop2.eml", "text/plain", "message/rfc822"); // wrong + t("mail-forwarded-references.eml", "text/plain", "message/rfc822"); // wrong + t("mail-mapi125messageid.eml", "message/rfc822", "message/rfc822"); + t("mail-mbox-aperture-inc1-mail1.eml", "text/plain", "message/rfc822"); // wrong + t("mail-mbox-aperture-inc1-mail2.eml", "text/plain", "message/rfc822"); // wrong + t("mail-mbox-aperture-inc1-mail3.eml", "text/plain", "message/rfc822"); // wrong + t("mail-mbox-aperture-inc1-mail4.eml", "text/plain", "message/rfc822"); // wrong + t("mail-multipart-plain-html.eml", "text/plain", "message/rfc822"); // wrong + t("mail-multipart-related-bug.eml", "message/rfc822", "message/rfc822"); + t("mail-multipart-test.eml", "text/plain", "message/rfc822"); // wrong + t("mail-multipart-test.eml.tar.gz", "application/x-gzip", "application/x-gzip"); + t("mail-plaintext-attachment.eml", "message/rfc822", "message/rfc822"); + t("mail-threaded.eml", "application/mbox", "application/mbox"); + t("mail-thunderbird-1.5-unspecifiedcharset.eml","message/rfc822", "message/rfc822"); + t("mail-thunderbird-1.5.eml", "message/rfc822", "message/rfc822"); + t("mail-UnsupportedOperationException.eml","message/rfc822", "message/rfc822"); + t("mail-xml-attachment.eml", "message/rfc822", "message/rfc822"); + t("mail.msg", "application/x-tika-msoffice","application/vnd.ms-outlook"); // wrong + t("mbox-aperture-dev", "application/mbox", "application/mbox"); + t("mbox-aperture-inc1", "application/mbox", "application/mbox"); + t("mbox-aperture-inc2", "application/mbox", "application/mbox"); + t("mbox-aperture-inc3", "application/mbox", "application/mbox"); + t("mbox-aperture-inc4", "application/mbox", "application/mbox"); + t("mbox-noblanklinebetweenmails.mbox", "application/mbox", "application/mbox"); + t("mbox-testfolder", "application/mbox", "application/mbox"); + t("mhtml-firefox.mht", "message/rfc822", "message/rfc822"); + t("mhtml-internet-explorer.mht", "message/rfc822", 
"message/rfc822"); + + t("microsoft-excel-2000.xls", "application/x-tika-msoffice", // wrong + "application/vnd.ms-excel"); + t("microsoft-excel-2007beta2.xlam", "application/x-tika-ooxml", // wrong + "application/vnd.ms-excel.addin.macroenabled.12"); + t("microsoft-excel-2007beta2.xlsb", "application/x-tika-ooxml", // UP + "application/x-tika-ooxml"); // UP + t("microsoft-excel-2007beta2.xlsm", "application/x-tika-ooxml", // wrong + "application/vnd.ms-excel.sheet.macroenabled.12"); + t("microsoft-excel-2007beta2.xlsx", "application/x-tika-ooxml", // wrong + "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); + t("microsoft-excel-2007beta2.xltm", "application/x-tika-ooxml", // wrong + "application/vnd.ms-excel.template.macroenabled.12"); + t("microsoft-excel-2007beta2.xltx", "application/x-tika-ooxml", // wrong + "application/vnd.openxmlformats-officedocument.spreadsheetml.template"); + t("microsoft-excel-2010beta.xlsx", "application/x-tika-ooxml", // wrong + "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); + + t("microsoft-powerpoint-2000.ppt", "application/x-tika-msoffice", // wrong + "application/vnd.ms-powerpoint"); + t("microsoft-powerpoint-2007beta2.potm", "application/x-tika-ooxml", // ZipContainerDetector requires full file + "application/vnd.ms-powerpoint.template.macroenabled.12"); + t("microsoft-powerpoint-2007beta2.potx", "application/x-tika-ooxml", // wrong + "application/vnd.openxmlformats-officedocument.presentationml.template"); + t("microsoft-powerpoint-2007beta2.ppsm", "application/x-tika-ooxml", // wrong + "application/x-tika-ooxml"); // to clarify + t("microsoft-powerpoint-2007beta2.ppsx", "application/x-tika-ooxml", // wrong + "application/vnd.openxmlformats-officedocument.presentationml.slideshow"); + t("microsoft-powerpoint-2007beta2.pptm", "application/x-tika-ooxml", // wrong + "application/x-tika-ooxml"); // wrong + t("microsoft-powerpoint-2007beta2.pptx", "application/x-tika-ooxml", // wrong + 
"application/vnd.openxmlformats-officedocument.presentationml.presentation"); + t("microsoft-powerpoint-2010beta.pptx", "application/x-tika-ooxml", // wrong + "application/vnd.openxmlformats-officedocument.presentationml.presentation"); + t("microsoft-powerpoint-invalidunicode.ppt","application/x-tika-msoffice", // wrong + "application/vnd.ms-powerpoint"); + + t("microsoft-publisher-2003.pub","application/x-tika-msoffice","application/x-mspublisher"); // wrong + t("microsoft-visio.vsd","application/x-tika-msoffice","application/vnd.visio"); // wrong + + t("microsoft-word-2000-with-wrong-file-extension.pdf","application/x-tika-msoffice", // wrong + "application/x-tika-msoffice"); // wrong + t("microsoft-word-2000.doc", "application/x-tika-msoffice", // wrong + "application/msword"); + t("microsoft-word-2007beta2.docm", "application/x-tika-ooxml", // wrong + "application/vnd.ms-word.document.macroenabled.12"); + t("microsoft-word-2007beta2.docx", "application/x-tika-ooxml", // wrong + "application/vnd.openxmlformats-officedocument.wordprocessingml.document"); + t("microsoft-word-2007beta2.dotm", "application/x-tika-ooxml", // wrong + "application/vnd.ms-word.template.macroenabled.12"); + t("microsoft-word-2007beta2.dotx", "application/x-tika-ooxml", // wrong + "application/vnd.openxmlformats-officedocument.wordprocessingml.template"); // to clarify + t("microsoft-word-2010beta.docx", "application/x-tika-ooxml", // wrong + "application/vnd.openxmlformats-officedocument.wordprocessingml.document"); + t("microsoft-word-history-blair.doc", "application/x-tika-msoffice", // wrong + "application/msword"); + t("microsoft-word-illegal-unicode-characters.doc", "application/x-tika-msoffice", // wrong + "application/msword"); + t("microsoft-word-t... [truncated message content] |
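The t(name, mimeTypeWithoutName, mimeTypeWithName) helper in the test above checks each document twice: once from content alone and once with the file name available. A minimal sketch of why the two answers can differ, assuming a detector that first matches the OLE2 magic bytes and then narrows the result by file extension. The class TwoPassMimeSketch, its identify method and its three-entry extension table are hypothetical and only illustrate the pattern; this is not Tika's or Aperture's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical two-pass identifier: magic bytes first, file name second. */
public class TwoPassMimeSketch {

    // OLE2 compound document signature, as in the 0xd0cf11e0a1b11ae1 magic rule.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private static final String OLE2_GENERIC = "application/x-tika-msoffice";
    private static final String UNKNOWN = "application/octet-stream";

    // Assumed extension table; only used to narrow an OLE2 container.
    private static final Map<String, String> OLE2_BY_EXTENSION = new LinkedHashMap<>();
    static {
        OLE2_BY_EXTENSION.put(".xls", "application/vnd.ms-excel");
        OLE2_BY_EXTENSION.put(".doc", "application/msword");
        OLE2_BY_EXTENSION.put(".msg", "application/vnd.ms-outlook");
    }

    /**
     * @param prefix first bytes of the file
     * @param name   file name, or null to identify by content only
     */
    public static String identify(byte[] prefix, String name) {
        // Pass 1: content only. All OLE2 files look alike at this point.
        String type = startsWith(prefix, OLE2_MAGIC) ? OLE2_GENERIC : UNKNOWN;
        if (name == null || !type.equals(OLE2_GENERIC)) {
            return type;
        }
        // Pass 2: the name may select a more specific subtype of the container.
        String lower = name.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : OLE2_BY_EXTENSION.entrySet()) {
            if (lower.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return type; // unknown or misleading extension: keep the generic type
    }

    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, an OLE2 prefix alone yields application/x-tika-msoffice, the same prefix named microsoft-excel-2000.xls yields application/vnd.ms-excel, and a misleading .pdf extension leaves the generic type in place, which mirrors the corresponding expectations in the test.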
From: <my...@us...> - 2010-11-30 22:46:09
|
Revision: 2444
          http://aperture.svn.sourceforge.net/aperture/?rev=2444&view=rev
Author:   mylka
Date:     2010-11-30 22:46:00 +0000 (Tue, 30 Nov 2010)

Log Message:
-----------
committed all unresolved problems with the TikaMimeTypeIdentifier

Modified Paths:
--------------
    aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/tika/diff-mimetypes.xml
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaCAUnresolvedProblemsTest.java
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaCAWorkingTest.java
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTUnresolvedProblemsTest.java
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTWorkingTest.java
    aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaMimeTypeIdentifierTest.java

Added Paths:
-----------
    aperture-addons/trunk/src/test/resources/test-documents/testPPT.potm
    aperture-addons/trunk/src/test/resources/test-documents/testPPT.ppsm
    aperture-addons/trunk/src/test/resources/test-documents/testPPT.pptm
    aperture-addons/trunk/src/test/resources/test-documents/testVORCalcTemplate.vor
    aperture-addons/trunk/src/test/resources/test-documents/testVORDrawTemplate.vor
    aperture-addons/trunk/src/test/resources/test-documents/testVORImpressTemplate.vor
    aperture-addons/trunk/src/test/resources/test-documents/testVORWriterTemplate.vor
    aperture-addons/trunk/src/test/resources/test-documents/testXMLUtf8BOM.noextension

Removed Paths:
-------------
    aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/tika/full-mimetypes.xml
    aperture-addons/trunk/src/test/resources/test-documents/microsoft-excel-2007beta2.xlsb
    aperture-addons/trunk/src/test/resources/test-documents/microsoft-powerpoint-2007beta2.potm
    aperture-addons/trunk/src/test/resources/test-documents/microsoft-powerpoint-2007beta2.ppsm
    aperture-addons/trunk/src/test/resources/test-documents/microsoft-powerpoint-2007beta2.pptm

Modified: aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/tika/diff-mimetypes.xml
===================================================================
--- aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/tika/diff-mimetypes.xml 2010-11-30 21:15:27 UTC (rev 2443)
+++ aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/tika/diff-mimetypes.xml 2010-11-30 22:46:00 UTC (rev 2444)
@@ -161,6 +161,22 @@
 <match value="rtsp://" type="string" offset="0" />
 </magic>
 </mime-type>
+ <mime-type type="application/vnd.stardivision.calc">
+ <sub-class-of type="application/x-tika-msoffice" />
+ <glob pattern="*.sdc" />
+ </mime-type>
+ <mime-type type="application/vnd.stardivision.draw">
+ <sub-class-of type="application/x-tika-msoffice" />
+ <glob pattern="*.sda" />
+ </mime-type>
+ <mime-type type="application/vnd.stardivision.impress">
+ <sub-class-of type="application/x-tika-msoffice" />
+ <glob pattern="*.sdd" />
+ </mime-type>
+ <mime-type type="application/vnd.stardivision.writer">
+ <sub-class-of type="application/x-tika-msoffice" />
+ <glob pattern="*.sdw" />
+ </mime-type>
 <mime-type type="application/vnd.sun.xml.calc">
 <sub-class-of type="application/zip" />
 </mime-type>

Deleted: aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/tika/full-mimetypes.xml
===================================================================
--- aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/tika/full-mimetypes.xml 2010-11-30 21:15:27 UTC (rev 2443)
+++ aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/tika/full-mimetypes.xml 2010-11-30 22:46:00 UTC (rev 2444)
@@ -1,4470 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
-/*
- * Copyright (c) 2010 Aduna.
- * All rights reserved.
- *
- * Licensed under the Aperture BSD-style license.
- */ ---> -<mime-info> - - <mime-type type="application/activemessage"/> - <mime-type type="application/andrew-inset"> - <glob pattern="*.ez"/> - </mime-type> - <mime-type type="application/applefile"/> - <mime-type type="application/applixware"> - <glob pattern="*.aw"/> - </mime-type> - - <mime-type type="application/atom+xml"> - <root-XML localName="feed" namespaceURI="http://purl.org/atom/ns#"/> - <glob pattern="*.atom"/> - </mime-type> - - <mime-type type="application/atomcat+xml"> - <glob pattern="*.atomcat"/> - </mime-type> - <mime-type type="application/atomicmail"/> - <mime-type type="application/atomsvc+xml"> - <glob pattern="*.atomsvc"/> - </mime-type> - <mime-type type="application/auth-policy+xml"/> - - <mime-type type="application/bat"> - <glob pattern="*.bat" /> - </mime-type> - - <mime-type type="application/batch-smtp"/> - <mime-type type="application/beep+xml"/> - <mime-type type="application/cals-1840"/> - <mime-type type="application/ccxml+xml"> - <glob pattern="*.ccxml"/> - </mime-type> - <mime-type type="application/cea-2018+xml"/> - <mime-type type="application/cellml+xml"/> - <mime-type type="application/cnrp+xml"/> - <mime-type type="application/commonground"/> - <mime-type type="application/conference-info+xml"/> - <mime-type type="application/cpl+xml"/> - <mime-type type="application/csta+xml"/> - <mime-type type="application/cstadata+xml"/> - <mime-type type="application/cu-seeme"> - <glob pattern="*.cu"/> - </mime-type> - <mime-type type="application/cybercash"/> - <mime-type type="application/davmount+xml"> - <glob pattern="*.davmount"/> - </mime-type> - <mime-type type="application/dca-rft"/> - <mime-type type="application/dec-dx"/> - <mime-type type="application/dialog-info+xml"/> - <mime-type type="application/dicom"/> - <mime-type type="application/dns"/> - <mime-type type="application/dvcs"/> - <mime-type type="application/ecmascript"> - <glob pattern="*.ecma"/> - </mime-type> - <mime-type type="application/edi-consent"/> - 
<mime-type type="application/edi-x12"/> - <mime-type type="application/edifact"/> - <mime-type type="application/emma+xml"> - <glob pattern="*.emma"/> - </mime-type> - <mime-type type="application/epp+xml"/> - - <mime-type type="application/epub+zip"> - <acronym>EPUB</acronym> - <comment>Electronic Publication</comment> - <magic priority="50"> - <match value="PK\003\004" type="string" offset="0"> - <match value="mimetypeapplication/epub+zip" type="string" offset="30"/> - </match> - </magic> - <glob pattern="*.epub"/> - </mime-type> - - <mime-type type="application/eshop"/> - <mime-type type="application/example"/> - <mime-type type="application/fastinfoset"/> - <mime-type type="application/fastsoap"/> - <mime-type type="application/fits"/> - <mime-type type="application/font-tdpfr"> - <glob pattern="*.pfr"/> - </mime-type> - <mime-type type="application/h224"/> - <mime-type type="application/http"/> - <mime-type type="application/hyperstudio"> - <glob pattern="*.stk"/> - </mime-type> - <mime-type type="application/ibe-key-request+xml"/> - <mime-type type="application/ibe-pkg-reply+xml"/> - <mime-type type="application/ibe-pp-data"/> - <mime-type type="application/iges"/> - <mime-type type="application/im-iscomposing+xml"/> - <mime-type type="application/index"/> - <mime-type type="application/index.cmd"/> - <mime-type type="application/index.obj"/> - <mime-type type="application/index.response"/> - <mime-type type="application/index.vnd"/> - <mime-type type="application/iotp"/> - <mime-type type="application/ipp"/> - <mime-type type="application/isup"/> - - <mime-type type="application/java-archive"> - <sub-class-of type="application/zip"/> - <glob pattern="*.jar"/> - </mime-type> - - <mime-type type="application/java-serialized-object"> - <glob pattern="*.ser"/> - </mime-type> - - <mime-type type="application/javascript"> - <sub-class-of type="text/plain"/> - <glob pattern="*.js"/> - </mime-type> - - <mime-type type="application/json"> - <sub-class-of 
type="application/javascript"/> - <glob pattern="*.json"/> - </mime-type> - - <mime-type type="application/java-vm"> - <magic priority="40"> - <match value="0xcafebabe" type="string" offset="0" /> - </magic> - <glob pattern="*.class"/> - </mime-type> - - <mime-type type="application/kpml-request+xml"/> - <mime-type type="application/kpml-response+xml"/> - <mime-type type="application/lost+xml"> - <glob pattern="*.lostxml"/> - </mime-type> - - <mime-type type="application/mac-binhex40"> - <alias type="application/mac-binhex"/> - <alias type="application/binhex"/> - <magic priority="50"> - <match value="must\ be\ converted\ with\ BinHex" type="string" offset="11"/> - </magic> - <glob pattern="*.hqx"/> - </mime-type> - - <mime-type type="application/mac-compactpro"> - <glob pattern="*.cpt"/> - </mime-type> - - <mime-type type="application/macwriteii"/> - <mime-type type="application/marc"> - <glob pattern="*.mrc"/> - </mime-type> - <mime-type type="application/mathematica"> - <glob pattern="*.ma"/> - <glob pattern="*.nb"/> - <glob pattern="*.mb"/> - </mime-type> - <mime-type type="application/mathml+xml"> - <glob pattern="*.mathml"/> - </mime-type> - <mime-type type="application/mbms-associated-procedure-description+xml"/> - <mime-type type="application/mbms-deregister+xml"/> - <mime-type type="application/mbms-envelope+xml"/> - <mime-type type="application/mbms-msk+xml"/> - <mime-type type="application/mbms-msk-response+xml"/> - <mime-type type="application/mbms-protection-description+xml"/> - <mime-type type="application/mbms-reception-report+xml"/> - <mime-type type="application/mbms-register+xml"/> - <mime-type type="application/mbms-register-response+xml"/> - <mime-type type="application/mbms-user-service-description+xml"/> - <mime-type type="application/mbox"> - <sub-class-of type="text/plain"/> - <magic priority="50"> - <match value="From " type="string" offset="0"/> - </magic> - <glob pattern="*.mbox"/> - </mime-type> - <mime-type 
type="application/media_control+xml"/> - <mime-type type="application/mediaservercontrol+xml"> - <glob pattern="*.mscml"/> - </mime-type> - <mime-type type="application/mikey"/> - <mime-type type="application/moss-keys"/> - <mime-type type="application/moss-signature"/> - <mime-type type="application/mosskey-data"/> - <mime-type type="application/mosskey-request"/> - <mime-type type="application/mp4"> - <glob pattern="*.mp4s"/> - </mime-type> - <mime-type type="application/mpeg4-generic"/> - <mime-type type="application/mpeg4-iod"/> - <mime-type type="application/mpeg4-iod-xmt"/> - - <!-- http://www.iana.org/assignments/media-types/application/msword --> - <mime-type type="application/msword"> - <!-- Use org.apache.tika.detect.ContainerAwareDetector for more reliable detection of OLE2 documents --> - <alias type="application/vnd.ms-word"/> - <comment>Microsoft Word Document</comment> - <magic priority="50"> - <match value="Microsoft\ Word\ 6.0\ Document" type="string" offset="2080"/> - <match value="Documento\ Microsoft\ Word\ 6" type="string" offset="2080"/> - <match value="MSWordDoc" type="string" offset="2112"/> - <match value="0x31be0000" type="big32" offset="0"/> - <match value="PO^Q`" type="string" offset="0"/> - <match value="\376\067\0\043" type="string" offset="0"/> - <match value="\333\245-\0\0\0" type="string" offset="0"/> - <match value="\354\245\301" type="string" offset="512"/> - <match value="\320\317\021\340\241\261\032\341" type="string" offset="0"/> - <match value="\224\246\056" type="string" offset="0"/> - <match value="0xd0cf11e0a1b11ae1" type="string" offset="0:8"> - <match value="W\x00o\x00r\x00d\x00D\x00o\x00c\x00u\x00m\x00e\x00n\x00t" type="string" offset="1152:4096" /> - </match> - </magic> - <glob pattern="*.doc"/> - <glob pattern="*.dot"/> - <sub-class-of type="application/x-tika-msoffice"/> - </mime-type> - - <mime-type type="application/mxf"> - <glob pattern="*.mxf"/> - </mime-type> - <mime-type type="application/nasdata"/> - <mime-type 
type="application/news-checkgroups"/> - <mime-type type="application/news-groupinfo"/> - <mime-type type="application/news-transmission"/> - <mime-type type="application/nss"/> - <mime-type type="application/ocsp-request"/> - <mime-type type="application/ocsp-response"/> - - <mime-type type="application/octet-stream"> - <magic priority="50"> - <match value="#\ This\ is\ a\ shell\ archive" type="string" offset="10"/> - <match value="\037\036" type="string" offset="0"/> - <match value="017437" type="host16" offset="0"/> - <match value="0x1fff" type="host16" offset="0"/> - <match value="\377\037" type="string" offset="0"/> - <match value="0145405" type="host16" offset="0"/> - </magic> - <glob pattern="*.bin"/> - <glob pattern="*.dms"/> - <glob pattern="*.lha"/> - <glob pattern="*.lrf"/> - <glob pattern="*.lzh"/> - <glob pattern="*.so"/> - <glob pattern="*.iso"/> - <glob pattern="*.dmg"/> - <glob pattern="*.dist"/> - <glob pattern="*.distz"/> - <glob pattern="*.pkg"/> - <glob pattern="*.bpk"/> - <glob pattern="*.dump"/> - <glob pattern="*.elc"/> - <glob pattern="*.deploy"/> - </mime-type> - - <mime-type type="application/oda"> - <glob pattern="*.oda"/> - </mime-type> - <mime-type type="application/oebps-package+xml"> - <glob pattern="*.opf"/> - </mime-type> - - <mime-type type="application/ogg"> - <alias type="application/x-ogg"/> - <magic priority="50"> - <match value="OggS" type="string" offset="0"/> - </magic> - <glob pattern="*.ogx"/> - </mime-type> - - <mime-type type="application/onenote"> - <glob pattern="*.onetoc"/> - <glob pattern="*.onetoc2"/> - <glob pattern="*.onetmp"/> - <glob pattern="*.onepkg"/> - </mime-type> - <mime-type type="application/parityfec"/> - <mime-type type="application/patch-ops-error+xml"> - <glob pattern="*.xer"/> - </mime-type> - - <mime-type type="application/pdf"> - <alias type="application/x-pdf"/> - <acronym>PDF</acronym> - <comment>Portable Document Format</comment> - <magic priority="50"> - <match value="%PDF-" type="string" 
offset="0"/> - </magic> - <glob pattern="*.pdf"/> - </mime-type> - - <mime-type type="application/pgp-encrypted"> - <glob pattern="*.pgp"/> - </mime-type> - <mime-type type="application/pgp-keys"/> - <mime-type type="application/pgp-signature"> - <magic priority="50"> - <match value="-----BEGIN PGP SIGNATURE-----" type="string" offset="0"/> - </magic> - <glob pattern="*.asc"/> - <glob pattern="*.sig"/> - </mime-type> - <mime-type type="application/pics-rules"> - <glob pattern="*.prf"/> - </mime-type> - <mime-type type="application/pidf+xml"/> - <mime-type type="application/pidf-diff+xml"/> - <mime-type type="application/pkcs10"> - <glob pattern="*.p10"/> - </mime-type> - <mime-type type="application/pkcs7-mime"> - <glob pattern="*.p7m"/> - <glob pattern="*.p7c"/> - </mime-type> - <mime-type type="application/pkcs7-signature"> - <alias type="application/x-pkcs7-signature"/> - <glob pattern="*.p7s"/> - </mime-type> - <mime-type type="application/pkix-cert"> - <glob pattern="*.cer"/> - </mime-type> - <mime-type type="application/pkix-crl"> - <glob pattern="*.crl"/> - </mime-type> - <mime-type type="application/pkix-pkipath"> - <glob pattern="*.pkipath"/> - </mime-type> - <mime-type type="application/pkixcmp"> - <glob pattern="*.pki"/> - </mime-type> - <mime-type type="application/pls+xml"> - <glob pattern="*.pls"/> - </mime-type> - <mime-type type="application/poc-settings+xml"/> - - <mime-type type="application/postscript"> - <comment>PostScript</comment> - <magic priority="50"> - <match value="%!" type="string" offset="0" /> - <match value="\004%!" 
type="string" offset="0" /> - <!-- Windows format EPS --> - <match value="0xc5d0d3c6" type="string" offset="0"/> - </magic> - <glob pattern="*.ai"/> - <glob pattern="*.ps"/> - <glob pattern="*.eps"/> - <glob pattern="*.epsf"/> - <glob pattern="*.epsi"/> - </mime-type> - - <mime-type type="application/prs.alvestrand.titrax-sheet"/> - <mime-type type="application/prs.cww"> - <glob pattern="*.cww"/> - </mime-type> - <mime-type type="application/prs.nprend"/> - <mime-type type="application/prs.plucker"/> - <mime-type type="application/qsig"/> - - <mime-type type="application/rdf+xml"> - <root-XML localName="RDF"/> - <root-XML localName="RDF" namespaceURI="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/> - <sub-class-of type="application/xml"/> - <acronym>RDF/XML</acronym> - <comment>XML syntax for RDF graphs</comment> - <glob pattern="*.rdf"/> - <glob pattern="*.owl"/> - <glob pattern="^rdf$" isregex="true"/> - <glob pattern="^owl$" isregex="true"/> - </mime-type> - - <mime-type type="application/reginfo+xml"> - <glob pattern="*.rif"/> - </mime-type> - <mime-type type="application/relax-ng-compact-syntax"> - <sub-class-of type="text/plain"/> - <glob pattern="*.rnc"/> - </mime-type> - <mime-type type="application/remote-printing"/> - <mime-type type="application/resource-lists+xml"> - <glob pattern="*.rl"/> - </mime-type> - <mime-type type="application/resource-lists-diff+xml"> - <glob pattern="*.rld"/> - </mime-type> - <mime-type type="application/riscos"/> - <mime-type type="application/rlmi+xml"/> - <mime-type type="application/rls-services+xml"> - <glob pattern="*.rs"/> - </mime-type> - <mime-type type="application/rsd+xml"> - <glob pattern="*.rsd"/> - </mime-type> - - <mime-type type="application/rss+xml"> - <alias type="text/rss"/> - <root-XML localName="rss"/> - <root-XML namespaceURI="http://purl.org/rss/1.0/"/> - <glob pattern="*.rss"/> - </mime-type> - - <mime-type type="application/rtf"> - <alias type="text/rtf"/> - <magic priority="50"> - <match value="{\\rtf" 
type="string" offset="0"/> - </magic> - <glob pattern="*.rtf"/> - <sub-class-of type="text/plain"/> - </mime-type> - - <mime-type type="application/rtx"/> - <mime-type type="application/samlassertion+xml"/> - <mime-type type="application/samlmetadata+xml"/> - <mime-type type="application/sbml+xml"> - <glob pattern="*.sbml"/> - </mime-type> - <mime-type type="application/scvp-cv-request"> - <glob pattern="*.scq"/> - </mime-type> - <mime-type type="application/scvp-cv-response"> - <glob pattern="*.scs"/> - </mime-type> - <mime-type type="application/scvp-vp-request"> - <glob pattern="*.spq"/> - </mime-type> - <mime-type type="application/scvp-vp-response"> - <glob pattern="*.spp"/> - </mime-type> - <mime-type type="application/sdp"> - <glob pattern="*.sdp"/> - </mime-type> - <mime-type type="application/set-payment"/> - <mime-type type="application/set-payment-initiation"> - <glob pattern="*.setpay"/> - </mime-type> - <mime-type type="application/set-registration"/> - <mime-type type="application/set-registration-initiation"> - <glob pattern="*.setreg"/> - </mime-type> - <mime-type type="application/sgml"/> - <mime-type type="application/sgml-open-catalog"/> - <mime-type type="application/shf+xml"> - <glob pattern="*.shf"/> - </mime-type> - <mime-type type="application/sieve"/> - <mime-type type="application/simple-filter+xml"/> - <mime-type type="application/simple-message-summary"/> - <mime-type type="application/simplesymbolcontainer"/> - <mime-type type="application/slate"/> - <mime-type type="application/smil"> - <glob pattern="*.smi"/> - <glob pattern="*.smil" /> - </mime-type> - <mime-type type="application/smil+xml"> - <glob pattern="*.smi"/> - <glob pattern="*.smil"/> - </mime-type> - <mime-type type="application/soap+fastinfoset"/> - <mime-type type="application/soap+xml"/> - <mime-type type="application/sparql-query"> - <glob pattern="*.rq"/> - </mime-type> - <mime-type type="application/sparql-results+xml"> - <glob pattern="*.srx"/> - </mime-type> - 
<mime-type type="application/spirits-event+xml"/> - <mime-type type="application/srgs"> - <glob pattern="*.gram"/> - </mime-type> - <mime-type type="application/srgs+xml"> - <glob pattern="*.grxml"/> - </mime-type> - <mime-type type="application/ssml+xml"> - <glob pattern="*.ssml"/> - </mime-type> - <mime-type type="application/timestamp-query"/> - <mime-type type="application/timestamp-reply"/> - - <mime-type type="application/trix"> - <root-XML localName="feed" namespaceURI="http://www.w3.org/2004/03/trix/trix-1"/> - <glob pattern="*.trix"/> - <sub-class-of type="application/xml"/> - </mime-type> - - <mime-type type="application/tve-trigger"/> - <mime-type type="application/ulpfec"/> - <mime-type type="application/vemmi"/> - <mime-type type="application/vividence.scriptfile"/> - <mime-type type="application/vnd.3gpp.bsf+xml"/> - <mime-type type="application/vnd.3gpp.pic-bw-large"> - <glob pattern="*.plb"/> - </mime-type> - <mime-type type="application/vnd.3gpp.pic-bw-small"> - <glob pattern="*.psb"/> - </mime-type> - <mime-type type="application/vnd.3gpp.pic-bw-var"> - <glob pattern="*.pvb"/> - </mime-type> - <mime-type type="application/vnd.3gpp.sms"/> - <mime-type type="application/vnd.3gpp2.bcmcsinfo+xml"/> - <mime-type type="application/vnd.3gpp2.sms"/> - <mime-type type="application/vnd.3gpp2.tcap"> - <glob pattern="*.tcap"/> - </mime-type> - <mime-type type="application/vnd.3m.post-it-notes"> - <glob pattern="*.pwn"/> - </mime-type> - <mime-type type="application/vnd.accpac.simply.aso"> - <glob pattern="*.aso"/> - </mime-type> - <mime-type type="application/vnd.accpac.simply.imp"> - <glob pattern="*.imp"/> - </mime-type> - <mime-type type="application/vnd.acucobol"> - <glob pattern="*.acu"/> - </mime-type> - <mime-type type="application/vnd.acucorp"> - <glob pattern="*.atc"/> - <glob pattern="*.acutc"/> - </mime-type> - <mime-type type="application/vnd.adobe.air-application-installer-package+zip"> - <glob pattern="*.air"/> - <sub-class-of 
type="application/zip"/> - </mime-type> - <mime-type type="application/vnd.adobe.xdp+xml"> - <glob pattern="*.xdp"/> - </mime-type> - <mime-type type="application/vnd.adobe.xfdf"> - <glob pattern="*.xfdf"/> - </mime-type> - <mime-type type="application/vnd.aether.imp"/> - <mime-type type="application/vnd.airzip.filesecure.azf"> - <glob pattern="*.azf"/> - </mime-type> - <mime-type type="application/vnd.airzip.filesecure.azs"> - <glob pattern="*.azs"/> - </mime-type> - <mime-type type="application/vnd.amazon.ebook"> - <glob pattern="*.azw"/> - </mime-type> - <mime-type type="application/vnd.americandynamics.acc"> - <glob pattern="*.acc"/> - </mime-type> - <mime-type type="application/vnd.amiga.ami"> - <glob pattern="*.ami"/> - </mime-type> - <mime-type type="application/vnd.android.package-archive"> - <glob pattern="*.apk"/> - </mime-type> - <mime-type type="application/vnd.anser-web-certificate-issue-initiation"> - <glob pattern="*.cii"/> - </mime-type> - <mime-type type="application/vnd.anser-web-funds-transfer-initiation"> - <glob pattern="*.fti"/> - </mime-type> - <mime-type type="application/vnd.antix.game-component"> - <glob pattern="*.atx"/> - </mime-type> - <mime-type type="application/vnd.apple.installer+xml"> - <glob pattern="*.mpkg"/> - </mime-type> - <mime-type type="application/vnd.apple.iwork"> - <sub-class-of type="application/zip"/> - <magic priority="40"> - <match value="0x504b0304140000000000" type="string" offset="0"/> - </magic> - <glob pattern="*.key"/> - <glob pattern="*.pages"/> - <glob pattern="*.numbers"/> - </mime-type> - <mime-type type="application/vnd.apple.keynote"> - <root-XML localName="presentation" namespaceURI="http://developer.apple.com/namespaces/keynote2" /> - </mime-type> - <mime-type type="application/vnd.apple.pages"> - <root-XML localName="document" namespaceURI="http://developer.apple.com/namespaces/sl" /> - </mime-type> - <mime-type type="application/vnd.apple.numbers"> - <root-XML localName="document" 
namespaceURI="http://developer.apple.com/namespaces/ls" /> - </mime-type> - <mime-type type="application/vnd.arastra.swi"> - <glob pattern="*.swi"/> - </mime-type> - <mime-type type="application/vnd.audiograph"> - <glob pattern="*.aep"/> - </mime-type> - <mime-type type="application/vnd.autopackage"/> - <mime-type type="application/vnd.avistar+xml"/> - <mime-type type="application/vnd.blueice.multipass"> - <glob pattern="*.mpm"/> - </mime-type> - <mime-type type="application/vnd.bluetooth.ep.oob"/> - <mime-type type="application/vnd.bmi"> - <glob pattern="*.bmi"/> - </mime-type> - <mime-type type="application/vnd.businessobjects"> - <glob pattern="*.rep"/> - </mime-type> - <mime-type type="application/vnd.cab-jscript"/> - <mime-type type="application/vnd.canon-cpdl"/> - <mime-type type="application/vnd.canon-lips"/> - <mime-type type="application/vnd.cendio.thinlinc.clientconf"/> - <mime-type type="application/vnd.chemdraw+xml"> - <glob pattern="*.cdxml"/> - </mime-type> - <mime-type type="application/vnd.chipnuts.karaoke-mmd"> - <glob pattern="*.mmd"/> - </mime-type> - <mime-type type="application/vnd.cinderella"> - <glob pattern="*.cdy"/> - </mime-type> - <mime-type type="application/vnd.cirpack.isdn-ext"/> - <mime-type type="application/vnd.claymore"> - <glob pattern="*.cla"/> - </mime-type> - <mime-type type="application/vnd.clonk.c4group"> - <glob pattern="*.c4g"/> - <glob pattern="*.c4d"/> - <glob pattern="*.c4f"/> - <glob pattern="*.c4p"/> - <glob pattern="*.c4u"/> - </mime-type> - <mime-type type="application/vnd.commerce-battelle"/> - <mime-type type="application/vnd.commonspace"> - <glob pattern="*.csp"/> - </mime-type> - <mime-type type="application/vnd.contact.cmsg"> - <glob pattern="*.cdbcmsg"/> - </mime-type> - <mime-type type="application/vnd.corel-draw"> - <magic priority="50"> - <match value="CDRA" type="string" offset="8" /> - </magic> - <glob pattern="*.cdr"/> - </mime-type> - <mime-type type="application/vnd.cosmocaller"> - <glob 
pattern="*.cmc"/> - </mime-type> - <mime-type type="application/vnd.crick.clicker"> - <glob pattern="*.clkx"/> - </mime-type> - <mime-type type="application/vnd.crick.clicker.keyboard"> - <glob pattern="*.clkk"/> - </mime-type> - <mime-type type="application/vnd.crick.clicker.palette"> - <glob pattern="*.clkp"/> - </mime-type> - <mime-type type="application/vnd.crick.clicker.template"> - <glob pattern="*.clkt"/> - </mime-type> - <mime-type type="application/vnd.crick.clicker.wordbank"> - <glob pattern="*.clkw"/> - </mime-type> - <mime-type type="application/vnd.criticaltools.wbs+xml"> - <glob pattern="*.wbs"/> - </mime-type> - <mime-type type="application/vnd.ctc-posml"> - <glob pattern="*.pml"/> - </mime-type> - <mime-type type="application/vnd.ctct.ws+xml"/> - <mime-type type="application/vnd.cups-pdf"/> - <mime-type type="application/vnd.cups-postscript"/> - <mime-type type="application/vnd.cups-ppd"> - <glob pattern="*.ppd"/> - </mime-type> - <mime-type type="application/vnd.cups-raster"/> - <mime-type type="application/vnd.cups-raw"/> - <mime-type type="application/vnd.curl.car"> - <glob pattern="*.car"/> - </mime-type> - <mime-type type="application/vnd.curl.pcurl"> - <glob pattern="*.pcurl"/> - </mime-type> - <mime-type type="application/vnd.cybank"/> - <mime-type type="application/vnd.data-vision.rdz"> - <glob pattern="*.rdz"/> - </mime-type> - <mime-type type="application/vnd.denovo.fcselayout-link"> - <glob pattern="*.fe_launch"/> - </mime-type> - <mime-type type="application/vnd.dir-bi.plate-dl-nosuffix"/> - <mime-type type="application/vnd.dna"> - <glob pattern="*.dna"/> - </mime-type> - <mime-type type="application/vnd.dolby.mlp"> - <glob pattern="*.mlp"/> - </mime-type> - <mime-type type="application/vnd.dolby.mobile.1"/> - <mime-type type="application/vnd.dolby.mobile.2"/> - <mime-type type="application/vnd.dpgraph"> - <glob pattern="*.dpg"/> - </mime-type> - <mime-type type="application/vnd.dreamfactory"> - <glob pattern="*.dfac"/> - </mime-type> - 
<mime-type type="application/vnd.dvb.esgcontainer"/> - <mime-type type="application/vnd.dvb.ipdcdftnotifaccess"/> - <mime-type type="application/vnd.dvb.ipdcesgaccess"/> - <mime-type type="application/vnd.dvb.ipdcroaming"/> - <mime-type type="application/vnd.dvb.iptv.alfec-base"/> - <mime-type type="application/vnd.dvb.iptv.alfec-enhancement"/> - <mime-type type="application/vnd.dvb.notif-aggregate-root+xml"/> - <mime-type type="application/vnd.dvb.notif-container+xml"/> - <mime-type type="application/vnd.dvb.notif-generic+xml"/> - <mime-type type="application/vnd.dvb.notif-ia-msglist+xml"/> - <mime-type type="application/vnd.dvb.notif-ia-registration-request+xml"/> - <mime-type type="application/vnd.dvb.notif-ia-registration-response+xml"/> - <mime-type type="application/vnd.dvb.notif-init+xml"/> - <mime-type type="application/vnd.dxr"/> - <mime-type type="application/vnd.dynageo"> - <glob pattern="*.geo"/> - </mime-type> - <mime-type type="application/vnd.ecdis-update"/> - <mime-type type="application/vnd.ecowin.chart"> - <glob pattern="*.mag"/> - </mime-type> - <mime-type type="application/vnd.ecowin.filerequest"/> - <mime-type type="application/vnd.ecowin.fileupdate"/> - <mime-type type="application/vnd.ecowin.series"/> - <mime-type type="application/vnd.ecowin.seriesrequest"/> - <mime-type type="application/vnd.ecowin.seriesupdate"/> - <mime-type type="application/vnd.emclient.accessrequest+xml"/> - <mime-type type="application/vnd.enliven"> - <glob pattern="*.nml"/> - </mime-type> - <mime-type type="application/vnd.epson.esf"> - <glob pattern="*.esf"/> - </mime-type> - <mime-type type="application/vnd.epson.msf"> - <glob pattern="*.msf"/> - </mime-type> - <mime-type type="application/vnd.epson.quickanime"> - <glob pattern="*.qam"/> - </mime-type> - <mime-type type="application/vnd.epson.salt"> - <glob pattern="*.slt"/> - </mime-type> - <mime-type type="application/vnd.epson.ssf"> - <glob pattern="*.ssf"/> - </mime-type> - <mime-type 
type="application/vnd.ericsson.quickcall"/> - <mime-type type="application/vnd.eszigno3+xml"> - <glob pattern="*.es3"/> - <glob pattern="*.et3"/> - </mime-type> - <mime-type type="application/vnd.etsi.aoc+xml"/> - <mime-type type="application/vnd.etsi.cug+xml"/> - <mime-type type="application/vnd.etsi.iptvcommand+xml"/> - <mime-type type="application/vnd.etsi.iptvdiscovery+xml"/> - <mime-type type="application/vnd.etsi.iptvprofile+xml"/> - <mime-type type="application/vnd.etsi.iptvsad-bc+xml"/> - <mime-type type="application/vnd.etsi.iptvsad-cod+xml"/> - <mime-type type="application/vnd.etsi.iptvsad-npvr+xml"/> - <mime-type type="application/vnd.etsi.iptvueprofile+xml"/> - <mime-type type="application/vnd.etsi.mcid+xml"/> - <mime-type type="application/vnd.etsi.sci+xml"/> - <mime-type type="application/vnd.etsi.simservs+xml"/> - <mime-type type="application/vnd.eudora.data"/> - <mime-type type="application/vnd.ezpix-album"> - <glob pattern="*.ez2"/> - </mime-type> - <mime-type type="application/vnd.ezpix-package"> - <glob pattern="*.ez3"/> - </mime-type> - <mime-type type="application/vnd.f-secure.mobile"/> - <mime-type type="application/vnd.fdf"> - <glob pattern="*.fdf"/> - </mime-type> - <mime-type type="application/vnd.fdsn.mseed"> - <glob pattern="*.mseed"/> - </mime-type> - <mime-type type="application/vnd.fdsn.seed"> - <glob pattern="*.seed"/> - <glob pattern="*.dataless"/> - </mime-type> - <mime-type type="application/vnd.ffsns"/> - <mime-type type="application/vnd.fints"/> - <mime-type type="application/vnd.flographit"> - <glob pattern="*.gph"/> - </mime-type> - <mime-type type="application/vnd.fluxtime.clip"> - <glob pattern="*.ftc"/> - </mime-type> - <mime-type type="application/vnd.font-fontforge-sfd"/> - <mime-type type="application/vnd.framemaker"> - <magic priority="50"> - <match value="<MakerFile" type="string" offset="0" /> - <match value="<MIFFile" type="string" offset="0" /> - <match value="<MakerDictionary" type="string" offset="0" /> - <match 
value="<MakerScreenFont" type="string" offset="0" /> - <match value="<MML" type="string" offset="0" /> - <match value="<BookFile" type="string" offset="0" /> - <match value="<Maker" type="string" offset="0" /> - </magic> - <glob pattern="*.fm"/> - <glob pattern="*.frame"/> - <glob pattern="*.maker"/> - <glob pattern="*.book"/> - <!-- in Aperture there were also the following --> - <!-- <glob pattern="*.mif"/> --> - <!-- <glob pattern="*.mf"/> --> - </mime-type> - <mime-type type="application/vnd.frogans.fnc"> - <glob pattern="*.fnc"/> - </mime-type> - <mime-type type="application/vnd.frogans.ltf"> - <glob pattern="*.ltf"/> - </mime-type> - <mime-type type="application/vnd.fsc.weblaunch"> - <glob pattern="*.fsc"/> - </mime-type> - <mime-type type="application/vnd.fujitsu.oasys"> - <glob pattern="*.oas"/> - </mime-type> - <mime-type type="application/vnd.fujitsu.oasys2"> - <glob pattern="*.oa2"/> - </mime-type> - <mime-type type="application/vnd.fujitsu.oasys3"> - <glob pattern="*.oa3"/> - </mime-type> - <mime-type type="application/vnd.fujitsu.oasysgp"> - <glob pattern="*.fg5"/> - </mime-type> - <mime-type type="application/vnd.fujitsu.oasysprs"> - <glob pattern="*.bh2"/> - </mime-type> - <mime-type type="application/vnd.fujixerox.art-ex"/> - <mime-type type="application/vnd.fujixerox.art4"/> - <mime-type type="application/vnd.fujixerox.hbpl"/> - <mime-type type="application/vnd.fujixerox.ddd"> - <glob pattern="*.ddd"/> - </mime-type> - <mime-type type="application/vnd.fujixerox.docuworks"> - <glob pattern="*.xdw"/> - </mime-type> - <mime-type type="application/vnd.fujixerox.docuworks.binder"> - <glob pattern="*.xbd"/> - </mime-type> - <mime-type type="application/vnd.fut-misnet"/> - <mime-type type="application/vnd.fuzzysheet"> - <glob pattern="*.fzs"/> - </mime-type> - <mime-type type="application/vnd.genomatix.tuxedo"> - <glob pattern="*.txd"/> - </mime-type> - <mime-type type="application/vnd.geogebra.file"> - <glob pattern="*.ggb"/> - </mime-type> - <mime-type 
type="application/vnd.geogebra.tool"> - <glob pattern="*.ggt"/> - </mime-type> - <mime-type type="application/vnd.geometry-explorer"> - <glob pattern="*.gex"/> - <glob pattern="*.gre"/> - </mime-type> - <mime-type type="application/vnd.gmx"> - <glob pattern="*.gmx"/> - </mime-type> - <mime-type type="application/vnd.google-earth.kml+xml"> - <glob pattern="*.kml"/> - </mime-type> - <mime-type type="application/vnd.google-earth.kmz"> - <glob pattern="*.kmz"/> - </mime-type> - <mime-type type="application/vnd.grafeq"> - <glob pattern="*.gqf"/> - <glob pattern="*.gqs"/> - </mime-type> - <mime-type type="application/vnd.gridmp"/> - <mime-type type="application/vnd.groove-account"> - <glob pattern="*.gac"/> - </mime-type> - <mime-type type="application/vnd.groove-help"> - <glob pattern="*.ghf"/> - </mime-type> - <mime-type type="application/vnd.groove-identity-message"> - <glob pattern="*.gim"/> - </mime-type> - <mime-type type="application/vnd.groove-injector"> - <glob pattern="*.grv"/> - </mime-type> - <mime-type type="application/vnd.groove-tool-message"> - <glob pattern="*.gtm"/> - </mime-type> - <mime-type type="application/vnd.groove-tool-template"> - <glob pattern="*.tpl"/> - </mime-type> - <mime-type type="application/vnd.groove-vcard"> - <glob pattern="*.vcg"/> - </mime-type> - <mime-type type="application/vnd.handheld-entertainment+xml"> - <glob pattern="*.zmm"/> - </mime-type> - <mime-type type="application/vnd.hbci"> - <glob pattern="*.hbci"/> - </mime-type> - <mime-type type="application/vnd.hcl-bireports"/> - <mime-type type="application/vnd.hhe.lesson-player"> - <glob pattern="*.les"/> - </mime-type> - <mime-type type="application/vnd.hp-hpgl"> - <glob pattern="*.hpgl"/> - </mime-type> - <mime-type type="application/vnd.hp-hpid"> - <glob pattern="*.hpid"/> - </mime-type> - <mime-type type="application/vnd.hp-hps"> - <glob pattern="*.hps"/> - </mime-type> - <mime-type type="application/vnd.hp-jlyt"> - <glob pattern="*.jlt"/> - </mime-type> - <mime-type 
type="application/vnd.hp-pcl"> - <glob pattern="*.pcl"/> - </mime-type> - <mime-type type="application/vnd.hp-pclxl"> - <glob pattern="*.pclxl"/> - </mime-type> - <mime-type type="application/vnd.httphone"/> - <mime-type type="application/vnd.hydrostatix.sof-data"> - <glob pattern="*.sfd-hdstx"/> - </mime-type> - <mime-type type="application/vnd.hzn-3d-crossword"> - <glob pattern="*.x3d"/> - </mime-type> - <mime-type type="application/vnd.ibm.afplinedata"/> - <mime-type type="application/vnd.ibm.electronic-media"/> - <mime-type type="application/vnd.ibm.minipay"> - <glob pattern="*.mpy"/> - </mime-type> - <mime-type type="application/vnd.ibm.modcap"> - <glob pattern="*.afp"/> - <glob pattern="*.listafp"/> - <glob pattern="*.list3820"/> - </mime-type> - <mime-type type="application/vnd.ibm.rights-management"> - <glob pattern="*.irm"/> - </mime-type> - <mime-type type="application/vnd.ibm.secure-container"> - <glob pattern="*.sc"/> - </mime-type> - <mime-type type="application/vnd.iccprofile"> - <glob pattern="*.icc"/> - <glob pattern="*.icm"/> - </mime-type> - <mime-type type="application/vnd.igloader"> - <glob pattern="*.igl"/> - </mime-type> - <mime-type type="application/vnd.immervision-ivp"> - <glob pattern="*.ivp"/> - </mime-type> - <mime-type type="application/vnd.immervision-ivu"> - <glob pattern="*.ivu"/> - </mime-type> - <mime-type type="application/vnd.informedcontrol.rms+xml"/> - <mime-type type="application/vnd.informix-visionary"/> - <mime-type type="application/vnd.intercon.formnet"> - <glob pattern="*.xpw"/> - <glob pattern="*.xpx"/> - </mime-type> - <mime-type type="application/vnd.intertrust.digibox"/> - <mime-type type="application/vnd.intertrust.nncp"/> - <mime-type type="application/vnd.intu.qbo"> - <glob pattern="*.qbo"/> - </mime-type> - <mime-type type="application/vnd.intu.qfx"> - <glob pattern="*.qfx"/> - </mime-type> - <mime-type type="application/vnd.iptc.g2.conceptitem+xml"/> - <mime-type type="application/vnd.iptc.g2.knowledgeitem+xml"/> 
- <mime-type type="application/vnd.iptc.g2.newsitem+xml"/> - <mime-type type="application/vnd.iptc.g2.packageitem+xml"/> - <mime-type type="application/vnd.ipunplugged.rcprofile"> - <glob pattern="*.rcprofile"/> - </mime-type> - <mime-type type="application/vnd.irepository.package+xml"> - <glob pattern="*.irp"/> - </mime-type> - <mime-type type="application/vnd.is-xpr"> - <glob pattern="*.xpr"/> - </mime-type> - <mime-type type="application/vnd.jam"> - <glob pattern="*.jam"/> - </mime-type> - <mime-type type="application/vnd.japannet-directory-service"/> - <mime-type type="application/vnd.japannet-jpnstore-wakeup"/> - <mime-type type="application/vnd.japannet-payment-wakeup"/> - <mime-type type="application/vnd.japannet-registration"/> - <mime-type type="application/vnd.japannet-registration-wakeup"/> - <mime-type type="application/vnd.japannet-setstore-wakeup"/> - <mime-type type="application/vnd.japannet-verification"/> - <mime-type type="application/vnd.japannet-verification-wakeup"/> - <mime-type type="application/vnd.jcp.javame.midlet-rms"> - <glob pattern="*.rms"/> - </mime-type> - <mime-type type="application/vnd.jisp"> - <glob pattern="*.jisp"/> - </mime-type> - <mime-type type="application/vnd.joost.joda-archive"> - <glob pattern="*.joda"/> - </mime-type> - <mime-type type="application/vnd.kahootz"> - <glob pattern="*.ktz"/> - <glob pattern="*.ktr"/> - </mime-type> - <mime-type type="application/vnd.kde.karbon"> - <glob pattern="*.karbon"/> - </mime-type> - <mime-type type="application/vnd.kde.kchart"> - <glob pattern="*.chrt"/> - </mime-type> - <mime-type type="application/vnd.kde.kformula"> - <glob pattern="*.kfo"/> - </mime-type> - <mime-type type="application/vnd.kde.kivio"> - <glob pattern="*.flw"/> - </mime-type> - <mime-type type="application/vnd.kde.kontour"> - <glob pattern="*.kon"/> - </mime-type> - <mime-type type="application/vnd.kde.kpresenter"> - <glob pattern="*.kpr"/> - <glob pattern="*.kpt"/> - </mime-type> - <mime-type 
type="application/vnd.kde.kspread"> - <glob pattern="*.ksp"/> - </mime-type> - <mime-type type="application/vnd.kde.kword"> - <glob pattern="*.kwd"/> - <glob pattern="*.kwt"/> - </mime-type> - <mime-type type="application/vnd.kenameaapp"> - <glob pattern="*.htke"/> - </mime-type> - <mime-type type="application/vnd.kidspiration"> - <glob pattern="*.kia"/> - </mime-type> - <mime-type type="application/vnd.kinar"> - <glob pattern="*.kne"/> - <glob pattern="*.knp"/> - </mime-type> - <mime-type type="application/vnd.koan"> - <alias type="application/x-koan"/> - <_comment>SSEYO Koan File</_comment> - <glob pattern="*.skp"/> - <glob pattern="*.skd"/> - <glob pattern="*.skt"/> - <glob pattern="*.skm"/> - </mime-type> - <mime-type type="application/vnd.kodak-descriptor"> - <glob pattern="*.sse"/> - </mime-type> - <mime-type type="application/vnd.liberty-request+xml"/> - <mime-type type="application/vnd.llamagraphics.life-balance.desktop"> - <glob pattern="*.lbd"/> - </mime-type> - <mime-type type="application/vnd.llamagraphics.life-balance.exchange+xml"> - <glob pattern="*.lbe"/> - </mime-type> - <mime-type type="application/vnd.lotus-1-2-3"> - <glob pattern="*.123"/> - </mime-type> - <mime-type type="application/vnd.lotus-approach"> - <glob pattern="*.apr"/> - </mime-type> - <mime-type type="application/vnd.lotus-freelance"> - <glob pattern="*.pre"/> - </mime-type> - <mime-type type="application/vnd.lotus-notes"> - <magic priority="50"> - <match value="0x1a0000040000" type="string" offset="0"/> - </magic> - <glob pattern="*.nsf"/> - </mime-type> - <mime-type type="application/vnd.lotus-organizer"> - <glob pattern="*.org"/> - </mime-type> - <mime-type type="application/vnd.lotus-screencam"> - <glob pattern="*.scm"/> - </mime-type> - - <mime-type type="application/vnd.lotus-wordpro"> - <magic priority="50"> - <match value="WordPro\0" type="string" offset="0" /> - <match value="WordPro\r\373" type="string" offset="0" /> - </magic> - <glob pattern="*.lwp"/> - </mime-type> - - 
<mime-type type="application/vnd.macports.portpkg"> - <glob pattern="*.portpkg"/> - </mime-type> - <mime-type type="application/vnd.marlin.drm.actiontoken+xml"/> - <mime-type type="application/vnd.marlin.drm.conftoken+xml"/> - <mime-type type="application/vnd.marlin.drm.license+xml"/> - <mime-type type="application/vnd.marlin.drm.mdcf"/> - <mime-type type="application/vnd.mcd"> - <glob pattern="*.mcd"/> - </mime-type> - <mime-type type="application/vnd.medcalcdata"> - <glob pattern="*.mc1"/> - </mime-type> - <mime-type type="application/vnd.mediastation.cdkey"> - <glob pattern="*.cdkey"/> - </mime-type> - <mime-type type="application/vnd.meridian-slingshot"/> - <mime-type type="application/vnd.mfer"> - <glob pattern="*.mwf"/> - </mime-type> - <mime-type type="application/vnd.mfmp"> - <glob pattern="*.mfm"/> - </mime-type> - <mime-type type="application/vnd.micrografx.flo"> - <glob pattern="*.flo"/> - </mime-type> - <mime-type type="application/vnd.micrografx.igx"> - <glob pattern="*.igx"/> - </mime-type> - - <mime-type type="application/vnd.mif"> - <comment>FrameMaker MIF document</comment> - <alias type="application/x-mif"/> - <alias type="application/x-frame"/> - <magic priority="50"> - <match value="\<MakerFile" type="string" offset="0" /> - <match value="\<MIFFile" type="string" offset="0" /> - <match value="\<MakerDictionary" type="string" offset="0" /> - <match value="\<MakerScreenFont" type="string" offset="0" /> - <match value="\<MML" type="string" offset="0" /> - <match value="\<Book" type="string" offset="0" /> - <match value="\<Maker" type="string" offset="0" /> - </magic> - <glob pattern="*.mif"/> - </mime-type> - - <mime-type type="application/vnd.minisoft-hp3000-save"/> - <mime-type type="application/vnd.mitsubishi.misty-guard.trustweb"/> - <mime-type type="application/vnd.mobius.daf"> - <glob pattern="*.daf"/> - </mime-type> - <mime-type type="application/vnd.mobius.dis"> - <glob pattern="*.dis"/> - </mime-type> - <mime-type 
type="application/vnd.mobius.mbk"> - <glob pattern="*.mbk"/> - </mime-type> - <mime-type type="application/vnd.mobius.mqy"> - <glob pattern="*.mqy"/> - </mime-type> - <mime-type type="application/vnd.mobius.msl"> - <glob pattern="*.msl"/> - </mime-type> - <mime-type type="application/vnd.mobius.plc"> - <glob pattern="*.plc"/> - </mime-type> - <mime-type type="application/vnd.mobius.txf"> - <glob pattern="*.txf"/> - </mime-type> - <mime-type type="application/vnd.mophun.application"> - <glob pattern="*.mpn"/> - </mime-type> - <mime-type type="application/vnd.mophun.certificate"> - <glob pattern="*.mpc"/> - </mime-type> - <mime-type type="application/vnd.motorola.flexsuite"/> - <mime-type type="application/vnd.motorola.flexsuite.adsi"/> - <mime-type type="application/vnd.motorola.flexsuite.fis"/> - <mime-type type="application/vnd.motorola.flexsuite.gotap"/> - <mime-type type="application/vnd.motorola.flexsuite.kmr"/> - <mime-type type="application/vnd.motorola.flexsuite.ttc"/> - <mime-type type="application/vnd.motorola.flexsuite.wem"/> - <mime-type type="application/vnd.motorola.iprm"/> - <mime-type type="application/vnd.mozilla.xul+xml"> - <glob pattern="*.xul"/> - </mime-type> - <mime-type type="application/vnd.ms-artgalry"> - <glob pattern="*.cil"/> - </mime-type> -<!-- this one is better served by the vide/x-ms-asf" - <mime-type type="application/vnd.ms-asf"/> - --> - <mime-type type="application/vnd.ms-cab-compressed"> - <magic priority="50"> - <match value="MSCF" type="string" offset="0" /> - </magic> - <glob pattern="*.cab"/> - </mime-type> - - <!-- http://www.iana.org/assignments/media-types/application/vnd.ms-excel --> - <mime-type type="application/vnd.ms-excel"> - <!-- Use org.apache.tika.detect.ContainerAwareDetector for more reliable detection of OLE2 documents --> - <alias type="application/msexcel" /> - <comment>Microsoft Excel Spreadsheet</comment> - <magic priority="50"> - <match value="Microsoft\ Excel\ 5.0\ Worksheet" type="string" 
offset="2080"/> - <match value="Foglio\ di\ lavoro\ Microsoft\ Exce" type="string" offset="2080"/> - <match value="Biff5" type="string" offset="2114"/> - <match value="Biff5" type="string" offset="2121"/> - <match value="\x09\x04\x06\x00\x00\x00\x10\x00" type="string" offset="0"/> - <match value="0xd0cf11e0a1b11ae1" type="string" offset="0:8"> - <match value="W\x00o\x00r\x00k\x00b\x00o\x00o\x00k" type="string" offset="1152:4096" /> - </match> - </magic> - <glob pattern="*.xls"/> - <glob pattern="*.xlm"/> - <glob pattern="*.xla"/> - <glob pattern="*.xlc"/> - <glob pattern="*.xlt"/> - <glob pattern="*.xlw"/> - <glob pattern="*.xll"/> - <glob pattern="*.xld"/> - <sub-class-of type="application/x-tika-msoffice"/> - </mime-type> - - <mime-type type="application/vnd.ms-excel.addin.macroenabled.12"> - <comment>Office Open XML Workbook Add-in (macro-enabled)</comment> - <glob pattern="*.xlam"/> - <sub-class-of type="application/x-tika-ooxml"/> - </mime-type> - - <mime-type type="application/vnd.ms-excel.sheet.macroenabled.12"> - <comment>Office Open XML Workbook (macro-enabled)</comment> - <glob pattern="*.xlsm"/> - <sub-class-of type="application/x-tika-ooxml"/> - </mime-type> - - <mime-type type="application/vnd.ms-excel.sheet.binary.macroenabled.12"> - <comment>Microsoft Excel 2007 Binary Spreadsheet</comment> - <glob pattern="*.xlsb"/> - <sub-class-of type="application/vnd.ms-excel"/> - </mime-type> - - <mime-type type="application/vnd.ms-excel.template.macroenabled.12"> - <glob pattern="*.xltm"/> - <sub-class-of type="application/x-tika-ooxml"/> - </mime-type> - - <mime-type type="application/vnd.ms-fontobject"> - <glob pattern="*.eot"/> - </mime-type> - <mime-type type="application/vnd.ms-htmlhelp"> - <alias type="application/x-chm" /> - <magic priority="50"> - <match value="ITSF" type="string" offset="0"/> - </magic> - <glob pattern="*.chm"/> - </mime-type> - <mime-type type="application/vnd.ms-ims"> - <glob pattern="*.ims"/> - </mime-type> - <mime-type 
type="application/vnd.ms-lrm"> - <glob pattern="*.lrm"/> - </mime-type> - - <mime-type type="application/vnd.ms-outlook"> - <comment>Microsoft Outlook Message</comment> - <glob pattern="*.msg" /> - <sub-class-of type="application/x-tika-msoffice"/> - </mime-type> - - <mime-type type="application/vnd.ms-outlookexpress"> - <comment>Microsoft Outlook Express 6 DBX File</comment> - <magic priority="50"> - <match value="0x4a4d463603001000" type="string" offset="0"/> - <match value="0xcfad12fec5fd746f66e3d1119a4e00c0" type="string" offset="0"/> - </magic> - <glob pattern="*.dbx" /> - </mime-type> - - <mime-type type="application/vnd.ms-outlook-pst"> - <comment>Microsoft Outlook PST File</comment> - <magic priority="50"> - <match value="0x2142444e" type="string" offset="0"/> - </magic> - <glob pattern="*.pst" /> - </mime-type> - - <mime-type type="application/vnd.ms-pki.seccat"> - <glob pattern="*.cat"/> - </mime-type> - <mime-type type="application/vnd.ms-pki.stl"> - <glob pattern="*.stl"/> - </mime-type> - <mime-type type="application/vnd.ms-playready.initiator+xml"/> - - <!-- http://www.iana.org/assignments/media-types/application/vnd.ms-powerpoint --> - <mime-type type="application/vnd.ms-powerpoint"> - <!-- Use org.apache.tika.detect.ContainerAwareDetector for more reliable detection of OLE2 documents --> - <alias type="application/mspowerpoint"/> - <comment>Microsoft Powerpoint Presentation</comment> - <magic priority="50"> - <match value="0xd0cf11e0a1b11ae1" type="string" offset="0:8"> - <match value="P\x00o\x00w\x00e\x00r\x00P\x00o\x00i\x00n\x00t\x00 D\x00o\x00c\x00u\x00m\x00e\x00n\x00t" type="string" offset="1152:4096" /> - </match> - </magic> - <glob pattern="*.ppz"/> - <glob pattern="*.ppt"/> - <glob pattern="*.pps"/> - <glob pattern="*.pot"/> - <glob pattern="*.ppa"/> - <sub-class-of type="application/x-tika-msoffice"/> - </mime-type> - - <mime-type type="application/vnd.ms-powerpoint.addin.macroenabled.12"> - <comment>Office Open XML Presentation Add-in 
(macro-enabled)</comment> - <glob pattern="*.ppam"/> - <sub-class-of type="application/x-tika-msoffice"/> - </mime-type> - - <mime-type type="application/vnd.ms-powerpoint.presentation.macroenabled.12"> - <comment>Office Open XML Presentation (macro-enabled)</comment> - <glob pattern="*.pptm"/> - <sub-class-of type="application/x-tika-msoffice"/> - </mime-type> - - <mime-type type="application/vnd.ms-powerpoint.slide.macroenabled.12"> - <glob pattern="*.sldm"/> - <sub-class-of type="application/x-tika-msoffice"/> - </mime-type> - - <mime-type type="application/vnd.ms-powerpoint.slideshow.macroenabled.12"> - <comment>Office Open XML Presentation Slideshow (macro-enabled)</comment> - <glob pattern="*.ppsm"/> - <sub-class-of type="application/x-tika-msoffice"/> - </mime-type> - - <mime-type type="application/vnd.ms-powerpoint.template.macroenabled.12"> - <glob pattern="*.potm"/> - <sub-class-of type="application/x-tika-msoffice"/> - </mime-type> - - <mime-type type="application/vnd.ms-project"> - <glob pattern="*.mpp"/> - <glob pattern="*.mpt"/> - </mime-type> - - <mime-type type="application/vnd.ms-tnef"> - <magic priority="50"> - <match value="0x223e9f78" type="little16" offset="0" /> - </magic> - </mime-type> - - <mime-type type="application/vnd.ms-wmdrm.lic-chlg-req"/> - <mime-type type="application/vnd.ms-wmdrm.lic-resp"/> - <mime-type type="application/vnd.ms-wmdrm.meter-chlg-req"/> - <mime-type type="application/vnd.ms-wmdrm.meter-resp"/> - - <mime-type type="application/vnd.ms-word.document.macroenabled.12"> - <comment>Office Open XML Document (macro-enabled)</comment> - <glob pattern="*.docm"/> - <sub-class-of type="application/x-tika-ooxml"/> - </mime-type> - - <mime-type type="application/vnd.ms-word.template.macroenabled.12"> - <comment>Office Open XML Document Template (macro-enabled)</comment> - <glob pattern="*.dotm"/> - <sub-class-of type="application/x-tika-ooxml"/> - </mime-type> - - <mime-type type="application/vnd.ms-works"> - <glob 
pattern="*.wps"/> - <glob pattern="*.wks"/> - <glob pattern="*.wcm"/> - <glob pattern="*.wdb"/> - </mime-type> - - <mime-type type="application/vnd.ms-wpl"> - <magic priority="50"> - <match value="<?wpl" type="string" offset="0" /> - </magic> - <glob pattern="*.wpl"/> - </mime-type> - - <mime-type type="application/vnd.ms-xpsdocument"> - <glob pattern="*.xps"/> - <sub-class-of type="application/zip" /> - </mime-type> - <mime-type type="application/vnd.mseq"> - <glob pattern="*.mseq"/> - </mime-type> - <mime-type type="application/vnd.msign"/> - <mime-type type="application/vnd.multiad.creator"/> - <mime-type type="application/vnd.multiad.creator.cif"/> - <mime-type type="application/vnd.music-niff"/> - <mime-type type="application/vnd.musician"> - <glob pattern="*.mus"/> - </mime-type> - <mime-type type="application/vnd.muvee.style"> - <glob pattern="*.msty"/> - </mime-type> - <mime-type type="application/vnd.ncd.control"/> - <mime-type type="application/vnd.ncd.reference"/> - <mime-type type="application/vnd.nervana"/> - <mime-type type="application/vnd.netfpx"/> - <mime-type type="application/vnd.neurolanguage.nlu"> - <glob pattern="*.nlu"/> - </mime-type> - <mime-type type="application/vnd.noblenet-directory"> - <glob pattern="*.nnd"/> - </mime-type> - <mime-type type="application/vnd.noblenet-sealer"> - <glob pattern="*.nns"/> - </mime-type> - <mime-type type="application/vnd.noblenet-web"> - <glob pattern="*.nnw"/> - </mime-type> - <mime-type type="application/vnd.nokia.catalogs"/> - <mime-type type="application/vnd.nokia.conml+wbxml"/> - <mime-type type="application/vnd.nokia.conml+xml"/> - <mime-type type="application/vnd.nokia.isds-radio-presets"/> - <mime-type type="application/vnd.nokia.iptv.config+xml"/> - <mime-type type="application/vnd.nokia.landmark+wbxml"/> - <mime-type type="application/vnd.nokia.landmark+xml"/> - <mime-type type="application/vnd.nokia.landmarkcollection+xml"/> - <mime-type type="application/vnd.nokia.n-gage.ac+xml"/> - <mime-type 
type="application/vnd.nokia.n-gage.data"> - <glob pattern="*.ngdat"/> - </mime-type> - <mime-type type="application/vnd.nokia.n-gage.symbian.install"> - <glob pattern="*.n-gage"/> - </mime-type> - <mime-type type="application/vnd.nokia.ncd"/> - <mime-type type="application/vnd.nokia.pcd+wbxml"/> - <mime-type type="application/vnd.nokia.pcd+xml"/> - <mime-type type="application/vnd.nokia.radio-preset"> - <glob pattern="*.rpst"/> - </mime-type> - <mime-type type="application/vnd.nokia.radio-presets"> - <glob pattern="*.rpss"/> - </mime-type> - <mime-type type="application/vnd.novadigm.edm"> - <glob pattern="*.edm"/> - </mime-type> - <mime-type type="application/vnd.novadigm.edx"> - <glob pattern="*.edx"/> - </mime-type> - <mime-type type="application/vnd.novadigm.ext"> - <glob pattern="*.ext"/> - </mime-type> - - <!-- =================================================================== --> - <!-- Open Document Format for Office Applications (OpenDocument) v1.0 --> - <!-- http://www.oasis-open.org/specs/index.php#opendocumentv1.0 --> - <!-- =================================================================== --> - - <mime-type type="application/vnd.oasis.opendocument.chart"> - <alias type="application/x-vnd.oasis.opendocument.chart"/> - <comment>OpenDocument v1.0: Chart document</comment> - <magic> - <match type="string" offset="0" value="PK"> - <match type="string" offset="30" - value="mimetypeapplication/vnd.oasis.opendocument.chart"/> - </match> - </magic> - <glob pattern="*.odc"/> - </mime-type> - - <mime-type type="application/vnd.oasis.opendocument.chart-template"> - <alias type="application/x-vnd.oasis.opendocument.chart-template"/> - <comment>OpenDocument v1.0: Chart document used as template</comment> - <magic> - <match type="string" offset="0" value="PK"> - <match type="string" offset="30" - value="mimetypeapplication/vnd.oasis.opendocument.chart-template"/> - </match> - </magic> - <glob pattern="*.otc"/> - </mime-type> - - <mime-type 
type="application/vnd.oasis.opendocument.database"> - <glob pattern="*.odb"/> - </mime-type> - - <mime-type type="application/vnd.oasis.opendocument.formula"> - <alias type="application/x-vnd.oasis.opendocument.formula"/> - <comment>OpenDocument v1.0: Formula document</comment> - <magic> - <match type="string" offset="0" value="PK"> - <match type="string" offset="30" - value="mimetypeapplication/vnd.oasis.opendocument.formula" /> - </match> - </magic> - <glob pattern="*.odf"/> - <sub-class-of type="application/zip" /> - </mime-type> - - <mime-type type="application/vnd.oasis.opendocument.formula-template"> - <alias type="application/x-vnd.oasis.opendocument.formula-template"/> - <comment>OpenDocument v1.0: Formula document used as template</comment> - <magic> - <match type="string" offset="0" value="PK"> - <match type="string" offset="30" - value="mimetypeapplication/vnd.oasis.opendocument.formula-template"/> - </match> - </magic> - <glob pattern="*.odft"/> - </mime-type> - - <mime-type type="application/vnd.oasis.opendocument.graphics"> - <alias type="application/x-vnd.oasis.opendocument.graphics"/> - <comment>OpenDocument v1.0: Graphics document (Drawing)</comment> - <magic> - <match type="string" offset="0" value="PK"> - <match type="string" offset="30" - value="mimetypeapplication/vnd.oasis.opendocument.graphics"/> - </match> - </magic> - <glob pattern="*.odg"/> - <sub-class-of type="application/zip" /> - </mime-type> - - <mime-type type="application/vnd.oasis.opendocument.graphics-template"> - <alias type="application/x-vnd.oasis.opendocument.graphics-template"/> - <comment>OpenDocument v1.0: Graphics document used as template</comment> - <magic> - <match type="string" offset="0" value="PK"> - <match type="string" offset="30" - value="mimetypeapplication/vnd.oasis.opendocument.graphics-template"/> - </match> - </magic> - <glob pattern="*.otg"/> - <sub-class-of type="application/zip" /> - </mime-type> - - <mime-type 
type="application/vnd.oasis.opendocument.image"> - <alias type="application/x-vnd.oasis.opendocument.image"/> - <comment>OpenDocument v1.0: Image document</comment> - <magic> - <match type="string" offset="0" value="PK"> - <match type="string" offset="30" - value="mimetypeapplication/vnd.oasis.opendocument.image"/> - </match> - </magic> - <glob pattern="*.odi"/> - </mime-type> - - <m... [truncated message content] |
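Editorial note on the OpenDocument entries in the diff above: each of them nests one match inside another, so the inner string at offset 30 is only tried once the outer "PK" at offset 0 has matched. Offset 30 is the length of the fixed part of a ZIP local file header, so the inner string is the entry name "mimetype" followed directly by the stored MIME type. The Java sketch below only illustrates that nesting on a synthetic byte array. The class and method names are hypothetical and are not part of Tika or Aperture, and the synthetic header is not a valid ZIP file.

```java
import java.nio.charset.StandardCharsets;

public class OdfMagicSketch {

    // True when the ASCII bytes of `expected` occur in `data` exactly at `offset`.
    public static boolean matchAt(byte[] data, int offset, String expected) {
        byte[] want = expected.getBytes(StandardCharsets.US_ASCII);
        if (offset < 0 || data.length < offset + want.length) {
            return false;
        }
        for (int i = 0; i < want.length; i++) {
            if (data[offset + i] != want[i]) {
                return false;
            }
        }
        return true;
    }

    // Nested rule: the inner match is evaluated only if the outer "PK" match succeeds.
    public static boolean looksLikeOpenDocument(byte[] header, String mimeType) {
        return matchAt(header, 0, "PK")
                && matchAt(header, 30, "mimetype" + mimeType);
    }

    // Hypothetical test fixture: "PK", zero filler up to offset 30, then the payload.
    public static byte[] syntheticHeader(String mimeType) {
        byte[] payload = ("mimetype" + mimeType).getBytes(StandardCharsets.US_ASCII);
        byte[] header = new byte[30 + payload.length];
        header[0] = 'P';
        header[1] = 'K';
        System.arraycopy(payload, 0, header, 30, payload.length);
        return header;
    }

    public static void main(String[] args) {
        String chart = "application/vnd.oasis.opendocument.chart";
        String image = "application/vnd.oasis.opendocument.image";
        byte[] header = syntheticHeader(chart);
        System.out.println(looksLikeOpenDocument(header, chart));
        System.out.println(looksLikeOpenDocument(header, image));
    }
}
```

Because the comparison is a plain prefix check at a fixed offset, a header written for the chart-template type would also satisfy the chart rule in this sketch. How a real detector orders or disambiguates such overlapping rules is not shown here.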
From: <my...@us...> - 2010-12-02 20:32:22
|
Revision: 2447 http://aperture.svn.sourceforge.net/aperture/?rev=2447&view=rev Author: mylka Date: 2010-12-02 20:32:13 +0000 (Thu, 02 Dec 2010) Log Message: ----------- incorporated the latest fixes from TIKA snapshot Modified Paths: -------------- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTUnresolvedProblemsTest.java aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTWorkingTest.java aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaMimeTypeIdentifierTest.java Added Paths: ----------- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/tika/ContentTypesHandler.java aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/tika/StreamingZipContainerDetector.java aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/ContentTypesHandlerTest.java aperture-addons/trunk/src/test/resources/org/semanticdesktop/ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_footnotes_docx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_protectedSheets_xlsx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_1img_xlsx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_embeded_xlsx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_formats_xlsx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_xlsb.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_xlsx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_2imgs_pptx.xml 
aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_embeded_pptx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_potm.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_ppsm.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_ppsx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_pptm.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_pptx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_thmx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testWORD_1img_docx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testWORD_3imgs_docx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testWORD_docx.xml aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testWORD_embeded_docx.xml Added: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/tika/ContentTypesHandler.java =================================================================== --- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/tika/ContentTypesHandler.java (rev 0) +++ aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/tika/ContentTypesHandler.java 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,57 @@ +package org.semanticdesktop.aperture.tika; + +import org.xml.sax.Attributes; +import org.xml.sax.SAXException; +import org.xml.sax.helpers.DefaultHandler; + +/** + * Extracts the (hopefully) only MIME type with .main or .main+xml suffix from a + * [Content_Types].xml entry in an OOXML file. 
+ * @author Antoni + * + */ +public class ContentTypesHandler extends DefaultHandler{ + + private static final String NAMESPACE = + "http://schemas.openxmlformats.org/package/2006/content-types"; + private static final String OVERRIDE = + "Override"; + private static final String DEFAULT = + "Default"; + private static final String CONTENT_TYPE = + "ContentType"; + + private String mimeType; + + public String getMimeType() { + return mimeType; + } + + @Override + public void startDocument() throws SAXException { + this.mimeType = null; + } + + @Override + public void startElement(String uri, String localName, String qName, + Attributes attributes) throws SAXException { + if ((NAMESPACE.equals(uri) && OVERRIDE.equals(localName) || + OVERRIDE.equals(qName)) || + (NAMESPACE.equals(uri) && DEFAULT.equals(localName) || + DEFAULT.equals(qName))) { + int index = attributes.getIndex(NAMESPACE, CONTENT_TYPE); + if (index < 0) { + index = attributes.getIndex(CONTENT_TYPE); + } + if (index >= 0) { + String type = attributes.getValue(index); + if (type.endsWith(".main+xml") || type.endsWith(".main")) { + this.mimeType = type.substring(0,type.lastIndexOf(".")); + if (this.mimeType.toLowerCase().endsWith("macroenabled")) { + this.mimeType = mimeType.toLowerCase() + ".12"; + } + } + } + } + } +} Property changes on: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/tika/ContentTypesHandler.java ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/tika/StreamingZipContainerDetector.java =================================================================== --- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/tika/StreamingZipContainerDetector.java (rev 0) +++ aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/tika/StreamingZipContainerDetector.java 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,65 @@ +/* + * Copyright (c) 
2010 Aduna. + * All rights reserved. + * + * Licensed under the Aperture BSD-style license. + */ +package org.semanticdesktop.aperture.tika; + +import java.io.IOException; +import java.io.InputStream; +import java.util.Iterator; +import java.util.zip.ZipEntry; +import java.util.zip.ZipException; +import java.util.zip.ZipInputStream; + +import org.apache.tika.detect.ZipContainerDetector; +import org.apache.tika.io.TikaInputStream; + +/* + * A failed idea, left in hope that it may be useful someday + */ +class StreamingZipContainerDetector extends ZipContainerDetector{ + +// private static final long serialVersionUID = -309421956260248519L; +// +// private ZipInputStream zipInputStream; +// +// @Override +// protected Iterator<? extends ZipEntry> getEntriesIterator( +// final TikaInputStream input) throws ZipException, IOException { +// zipInputStream = new ZipInputStream(input); +// return new Iterator<ZipEntry>() { +// +// public boolean hasNext() { +// // TODO Auto-generated method stub +// return false; +// } +// +// public ZipEntry next() { +// if (zipInputStream == null) { +// return null; +// } else { +// +// } +// zipInputStream.closeEntry(); +// } +// +// public void remove() { +// throw new UnsupportedOperationException(); +// } +// }; +// } +// +// /* (non-Javadoc) +// * @see org.apache.tika.detect.ZipContainerDetector#getInputStream(java.util.zip.ZipEntry) +// */ +// @Override +// protected InputStream getInputStream(ZipEntry entry) throws IOException { +// // TODO Auto-generated method stub +// return super.getInputStream(entry); +// } +// +// +// +} Property changes on: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/tika/StreamingZipContainerDetector.java ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/ContentTypesHandlerTest.java =================================================================== --- 
aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/ContentTypesHandlerTest.java (rev 0) +++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/ContentTypesHandlerTest.java 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,75 @@ +package org.semanticdesktop.aperture.tika; + +import static org.junit.Assert.assertEquals; + +import java.io.IOException; +import java.io.InputStream; + +import javax.xml.parsers.ParserConfigurationException; +import javax.xml.parsers.SAXParser; +import javax.xml.parsers.SAXParserFactory; + +import org.junit.Test; +import org.xml.sax.SAXException; + +public class ContentTypesHandlerTest { + + @Test + public void testContentTypesContentHandler() + throws ParserConfigurationException, SAXException { + testFile("tika_footnotes_docx.xml", + "application/vnd.openxmlformats-officedocument.wordprocessingml.document"); + testFile("tika_protectedSheets_xlsx.xml", + "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); + testFile("tika_testEXCEL_1img_xlsx.xml", + "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); + testFile("tika_testEXCEL_embeded_xlsx.xml", + "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); + testFile("tika_testEXCEL_formats_xlsx.xml", + "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); + testFile("tika_testEXCEL_xlsb.xml", + "application/vnd.ms-excel.sheet.binary.macroenabled.12"); + testFile("tika_testEXCEL_xlsx.xml", + "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); + testFile("tika_testPPT_2imgs_pptx.xml", + "application/vnd.openxmlformats-officedocument.presentationml.presentation"); + testFile("tika_testPPT_embeded_pptx.xml", + "application/vnd.openxmlformats-officedocument.presentationml.presentation"); + testFile("tika_testPPT_potm.xml", + "application/vnd.ms-powerpoint.template.macroenabled.12"); + testFile("tika_testPPT_ppsm.xml", + "application/vnd.ms-powerpoint.slideshow.macroenabled.12"); + 
testFile("tika_testPPT_ppsx.xml", + "application/vnd.openxmlformats-officedocument.presentationml.slideshow"); + testFile("tika_testPPT_pptm.xml", + "application/vnd.ms-powerpoint.presentation.macroenabled.12"); + testFile("tika_testPPT_pptx.xml", + "application/vnd.openxmlformats-officedocument.presentationml.presentation"); + testFile("tika_testPPT_thmx.xml", + "application/vnd.openxmlformats-officedocument.presentationml.presentation"); + testFile("tika_testWORD_1img_docx.xml", + "application/vnd.openxmlformats-officedocument.wordprocessingml.document"); + testFile("tika_testWORD_3imgs_docx.xml", + "application/vnd.openxmlformats-officedocument.wordprocessingml.document"); + testFile("tika_testWORD_docx.xml", + "application/vnd.openxmlformats-officedocument.wordprocessingml.document"); + testFile("tika_testWORD_embeded_docx.xml", + "application/vnd.openxmlformats-officedocument.wordprocessingml.document"); + } + + private void testFile(String fileName, String mimeType) + throws ParserConfigurationException, SAXException { + InputStream is = getClass().getResourceAsStream(fileName); + ContentTypesHandler handler = new ContentTypesHandler(); + SAXParserFactory f = SAXParserFactory.newInstance(); + SAXParser p = f.newSAXParser(); + try { + p.parse(is, handler); + } catch (IOException e) { + // this may happen + } + String actualMimeType = handler.getMimeType(); + assertEquals(mimeType, actualMimeType); + } + +} Property changes on: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/ContentTypesHandlerTest.java ___________________________________________________________________ Added: svn:mime-type + text/plain Modified: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTUnresolvedProblemsTest.java =================================================================== --- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTUnresolvedProblemsTest.java 2010-12-01 09:22:07 UTC (rev 2446) +++ 
aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTUnresolvedProblemsTest.java 2010-12-02 20:32:13 UTC (rev 2447) @@ -69,29 +69,6 @@ "application/vnd.ms-excel", "application/vnd.ms-excel"); // wrong, should be: "application/vnd.ms-works"); } - - /** - * Covered by TIKA-563 - * @throws IOException - */ - public void testStarOffice5_2TemplateFiles() throws IOException { - t("testVORCalcTemplate.vor", - "application/vnd.stardivision.writer", // wrong, should be "application/x-tika-msoffice - "application/x-tika-msoffice", - "application/x-tika-msoffice"); - t("testVORDrawTemplate.vor", - "application/vnd.stardivision.writer", // wrong, should be "application/x-tika-msoffice - "application/x-tika-msoffice", - "application/x-tika-msoffice"); // wrong, should be "application/x-tika-msoffice - t("testVORImpressTemplate.vor", - "application/vnd.stardivision.writer", // wrong, should be "application/x-tika-msoffice - "application/x-tika-msoffice", - "application/x-tika-msoffice"); // wrong, should be "application/x-tika-msoffice - t("testVORWriterTemplate.vor", - "application/vnd.stardivision.writer", // wrong, should be "application/x-tika-msoffice - "application/x-tika-msoffice", - "application/x-tika-msoffice"); // wrong, should be "application/x-tika-msoffice - } /* (non-Javadoc) * @see org.semanticdesktop.aperture.tika.ContainerAwareIdentificationTestCase#getDataDetector() Modified: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTWorkingTest.java =================================================================== --- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTWorkingTest.java 2010-12-01 09:22:07 UTC (rev 2446) +++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/PlainTikaMTWorkingTest.java 2010-12-02 20:32:13 UTC (rev 2447) @@ -61,9 +61,30 @@ // there is no ZipContainerDetector, so this is OK "application/x-tika-ooxml", 
"application/vnd.ms-excel.sheet.binary.macroenabled.12"); - - } + + /** + * Covered by TIKA-563 + * @throws IOException + */ + public void testStarOffice5_2TemplateFiles() throws IOException { + t("testVORCalcTemplate.vor", + "application/x-staroffice-template", + "application/x-tika-msoffice", + "application/x-staroffice-template"); + t("testVORDrawTemplate.vor", + "application/x-staroffice-template", + "application/x-tika-msoffice", + "application/x-staroffice-template"); + t("testVORImpressTemplate.vor", + "application/x-staroffice-template", + "application/x-tika-msoffice", + "application/x-staroffice-template"); + t("testVORWriterTemplate.vor", + "application/x-staroffice-template", + "application/x-tika-msoffice", + "application/x-staroffice-template"); + } /* (non-Javadoc) * @see org.semanticdesktop.aperture.tika.ContainerAwareIdentificationTestCase#getDataDetector() Modified: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaMimeTypeIdentifierTest.java =================================================================== --- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaMimeTypeIdentifierTest.java 2010-12-01 09:22:07 UTC (rev 2446) +++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/tika/TikaMimeTypeIdentifierTest.java 2010-12-02 20:32:13 UTC (rev 2447) @@ -223,13 +223,13 @@ t("rtf-staroffice-5.2.rtf", "application/rtf", "application/rtf"); t("rtf-word-2000.rtf", "application/rtf", "application/rtf"); - t("staroffice-5.2-calc-template.vor", "application/x-tika-msoffice", "application/vnd.stardivision.writer"); // UP + t("staroffice-5.2-calc-template.vor", "application/x-tika-msoffice", "application/x-staroffice-template"); t("staroffice-5.2-calc.sdc", "application/x-tika-msoffice", "application/vnd.stardivision.calc"); - t("staroffice-5.2-draw-template.vor", "application/x-tika-msoffice", "application/vnd.stardivision.writer"); // UP + t("staroffice-5.2-draw-template.vor", 
"application/x-tika-msoffice", "application/x-staroffice-template"); t("staroffice-5.2-draw.sda", "application/x-tika-msoffice", "application/vnd.stardivision.draw"); - t("staroffice-5.2-impress-template.vor", "application/x-tika-msoffice", "application/vnd.stardivision.writer"); // UP + t("staroffice-5.2-impress-template.vor", "application/x-tika-msoffice", "application/x-staroffice-template"); t("staroffice-5.2-impress.sdd", "application/x-tika-msoffice", "application/vnd.stardivision.impress"); - t("staroffice-5.2-writer-template.vor", "application/x-tika-msoffice", "application/vnd.stardivision.writer"); // UP + t("staroffice-5.2-writer-template.vor", "application/x-tika-msoffice", "application/x-staroffice-template"); t("staroffice-5.2-writer.sdw", "application/x-tika-msoffice", "application/vnd.stardivision.writer"); t("tar-test.tar","application/x-tar","application/x-tar"); Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_footnotes_docx.xml =================================================================== --- aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_footnotes_docx.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_footnotes_docx.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Override PartName="/word/footnotes.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footnotes+xml"/><Override PartName="/customXml/itemProps1.xml" ContentType="application/vnd.openxmlformats-officedocument.customXmlProperties+xml"/><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/word/document.xml" 
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/><Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/><Override PartName="/word/endnotes.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.endnotes+xml"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/word/settings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml"/><Override PartName="/word/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/><Override PartName="/word/webSettings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.webSettings+xml"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_footnotes_docx.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_protectedSheets_xlsx.xml =================================================================== --- aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_protectedSheets_xlsx.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_protectedSheets_xlsx.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Default Extension="bin" 
ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.printerSettings"/><Override PartName="/xl/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/xl/worksheets/sheet2.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Override PartName="/xl/worksheets/sheet3.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Override PartName="/xl/calcChain.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.calcChain+xml"/><Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_protectedSheets_xlsx.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_1img_xlsx.xml =================================================================== --- 
aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_1img_xlsx.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_1img_xlsx.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Default Extension="png" ContentType="image/png"/><Override PartName="/xl/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/xl/worksheets/sheet2.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Override PartName="/xl/drawings/drawing1.xml" ContentType="application/vnd.openxmlformats-officedocument.drawing+xml"/><Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Override PartName="/xl/calcChain.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.calcChain+xml"/><Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: 
aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_1img_xlsx.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_embeded_xlsx.xml =================================================================== --- aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_embeded_xlsx.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_embeded_xlsx.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Default Extension="bin" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.printerSettings"/><Default Extension="png" ContentType="image/png"/><Override PartName="/xl/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/><Default Extension="emf" ContentType="image/x-emf"/><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/><Default Extension="docx" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/xl/worksheets/sheet2.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Override PartName="/xl/drawings/drawing1.xml" 
ContentType="application/vnd.openxmlformats-officedocument.drawing+xml"/><Default Extension="pptx" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presentation"/><Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Default Extension="vml" ContentType="application/vnd.openxmlformats-officedocument.vmlDrawing"/><Override PartName="/xl/calcChain.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.calcChain+xml"/><Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/><Default Extension="doc" ContentType="application/msword"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_embeded_xlsx.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_formats_xlsx.xml =================================================================== --- aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_formats_xlsx.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_formats_xlsx.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Override PartName="/xl/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/><Default Extension="rels" 
ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_formats_xlsx.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_xlsb.xml =================================================================== --- aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_xlsb.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_xlsb.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Default Extension="bin" ContentType="application/vnd.ms-excel.sheet.binary.macroEnabled.main"/><Override PartName="/xl/worksheets/sheet2.bin" ContentType="application/vnd.ms-excel.worksheet"/><Override PartName="/xl/worksheets/sheet3.bin" ContentType="application/vnd.ms-excel.worksheet"/><Override PartName="/xl/theme/theme1.xml" 
ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/xl/worksheets/sheet1.bin" ContentType="application/vnd.ms-excel.worksheet"/><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/xl/sharedStrings.bin" ContentType="application/vnd.ms-excel.sharedStrings"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/xl/worksheets/binaryIndex3.bin" ContentType="application/vnd.ms-excel.binIndexWs"/><Override PartName="/xl/worksheets/binaryIndex1.bin" ContentType="application/vnd.ms-excel.binIndexWs"/><Override PartName="/xl/worksheets/binaryIndex2.bin" ContentType="application/vnd.ms-excel.binIndexWs"/><Override PartName="/xl/styles.bin" ContentType="application/vnd.ms-excel.styles"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_xlsb.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_xlsx.xml =================================================================== --- aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_xlsx.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_xlsx.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Override PartName="/xl/theme/theme1.xml" 
ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/xl/worksheets/sheet2.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Override PartName="/xl/worksheets/sheet3.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Override PartName="/xl/calcChain.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.calcChain+xml"/><Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testEXCEL_xlsx.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_2imgs_pptx.xml =================================================================== --- aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_2imgs_pptx.xml (rev 0) +++ 
aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_2imgs_pptx.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Override PartName="/ppt/slideLayouts/slideLayout7.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout8.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Default Extension="png" ContentType="image/png"/><Override PartName="/ppt/slideMasters/slideMaster1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideMaster+xml"/><Override PartName="/ppt/presProps.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout4.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout5.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout6.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slides/slide1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/ppt/slideLayouts/slideLayout2.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout3.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Default Extension="jpeg" ContentType="image/jpeg"/><Default Extension="rels" 
ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/ppt/presentation.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml"/><Override PartName="/ppt/slideLayouts/slideLayout1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/ppt/tableStyles.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.tableStyles+xml"/><Override PartName="/ppt/slideLayouts/slideLayout11.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/custom.xml" ContentType="application/vnd.openxmlformats-officedocument.custom-properties+xml"/><Override PartName="/ppt/slideLayouts/slideLayout10.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Default Extension="gif" ContentType="image/gif"/><Override PartName="/ppt/viewProps.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.viewProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout9.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_2imgs_pptx.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_embeded_pptx.xml =================================================================== --- 
aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_embeded_pptx.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_embeded_pptx.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Override PartName="/ppt/slideLayouts/slideLayout7.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout8.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Default Extension="png" ContentType="image/png"/><Override PartName="/ppt/slideMasters/slideMaster1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideMaster+xml"/><Override PartName="/ppt/presProps.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout4.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout5.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout6.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slides/slide1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/ppt/slideLayouts/slideLayout2.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout3.xml" 
ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Default Extension="emf" ContentType="image/x-emf"/><Default Extension="jpeg" ContentType="image/jpeg"/><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/ppt/presentation.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml"/><Override PartName="/ppt/slideLayouts/slideLayout1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Default Extension="docx" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/ppt/tableStyles.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.tableStyles+xml"/><Override PartName="/ppt/slideLayouts/slideLayout11.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/custom.xml" ContentType="application/vnd.openxmlformats-officedocument.custom-properties+xml"/><Override PartName="/ppt/slideLayouts/slideLayout10.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Default Extension="vml" ContentType="application/vnd.openxmlformats-officedocument.vmlDrawing"/><Default Extension="gif" ContentType="image/gif"/><Default Extension="xlsx" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"/><Default Extension="doc" ContentType="application/msword"/><Override PartName="/ppt/viewProps.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.viewProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout9.xml" 
ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_embeded_pptx.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_potm.xml =================================================================== --- aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_potm.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_potm.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Override PartName="/ppt/slideLayouts/slideLayout7.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout8.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideMasters/slideMaster1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideMaster+xml"/><Override PartName="/ppt/presProps.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout4.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout5.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout6.xml" 
ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/theme/theme2.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/ppt/theme/theme3.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/ppt/slides/slide1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/ppt/slideLayouts/slideLayout2.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout3.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Default Extension="jpeg" ContentType="image/jpeg"/><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/ppt/presentation.xml" ContentType="application/vnd.ms-powerpoint.template.macroEnabled.main+xml"/><Override PartName="/ppt/notesMasters/notesMaster1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.notesMaster+xml"/><Override PartName="/ppt/slideLayouts/slideLayout1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/ppt/tableStyles.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.tableStyles+xml"/><Override PartName="/ppt/handoutMasters/handoutMaster1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.handoutMaster+xml"/><Override PartName="/ppt/viewProps.xml" 
ContentType="application/vnd.openxmlformats-officedocument.presentationml.viewProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout9.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_potm.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_ppsm.xml =================================================================== --- aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_ppsm.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_ppsm.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Override PartName="/ppt/slideLayouts/slideLayout7.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout8.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideMasters/slideMaster1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideMaster+xml"/><Override PartName="/ppt/slides/slide3.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/presProps.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout4.xml" 
ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout5.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout6.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slides/slide1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/slides/slide2.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/ppt/slideLayouts/slideLayout2.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout3.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Default Extension="jpeg" ContentType="image/jpeg"/><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/ppt/presentation.xml" ContentType="application/vnd.ms-powerpoint.slideshow.macroEnabled.main+xml"/><Override PartName="/ppt/slideLayouts/slideLayout1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/ppt/tableStyles.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.tableStyles+xml"/><Override PartName="/ppt/slideLayouts/slideLayout11.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout10.xml" 
ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/viewProps.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.viewProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout9.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_ppsm.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_ppsx.xml =================================================================== --- aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_ppsx.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_ppsx.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Override PartName="/ppt/slideLayouts/slideLayout7.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout8.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideMasters/slideMaster1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideMaster+xml"/><Override PartName="/ppt/slides/slide3.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/presProps.xml" 
ContentType="application/vnd.openxmlformats-officedocument.presentationml.presProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout4.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout5.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout6.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slides/slide1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/slides/slide2.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/ppt/slideLayouts/slideLayout2.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout3.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Default Extension="jpeg" ContentType="image/jpeg"/><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/ppt/presentation.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideshow.main+xml"/><Override PartName="/ppt/slideLayouts/slideLayout1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/ppt/tableStyles.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.tableStyles+xml"/><Override 
PartName="/ppt/slideLayouts/slideLayout11.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout10.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/viewProps.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.viewProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout9.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_ppsx.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_pptm.xml =================================================================== --- aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_pptm.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_pptm.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Override PartName="/ppt/slideLayouts/slideLayout7.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout8.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideMasters/slideMaster1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideMaster+xml"/><Override PartName="/ppt/slides/slide3.xml" 
ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/presProps.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout4.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout5.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout6.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slides/slide1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/slides/slide2.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/ppt/slideLayouts/slideLayout2.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout3.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Default Extension="jpeg" ContentType="image/jpeg"/><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/ppt/presentation.xml" ContentType="application/vnd.ms-powerpoint.presentation.macroEnabled.main+xml"/><Override PartName="/ppt/slideLayouts/slideLayout1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/ppt/tableStyles.xml" 
ContentType="application/vnd.openxmlformats-officedocument.presentationml.tableStyles+xml"/><Override PartName="/ppt/slideLayouts/slideLayout11.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout10.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/viewProps.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.viewProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout9.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/></Types> \ No newline at end of file Property changes on: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_pptm.xml ___________________________________________________________________ Added: svn:mime-type + text/plain Added: aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_pptx.xml =================================================================== --- aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_pptx.xml (rev 0) +++ aperture-addons/trunk/src/test/resources/org/semanticdesktop/aperture/tika/tika_testPPT_pptx.xml 2010-12-02 20:32:13 UTC (rev 2447) @@ -0,0 +1,2 @@ +<?xml version="1.0" encoding="UTF-8" standalone="yes"?> +<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Override PartName="/ppt/slideLayouts/slideLayout7.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout8.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideMasters/slideMaster1.xml" 
ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideMaster+xml"/><Override PartName="/ppt/slides/slide3.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/presProps.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presProps+xml"/><Override PartName="/ppt/slideLayouts/slideLayout4.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout5.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slideLayouts/slideLayout6.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"/><Override PartName="/ppt/slides/slide1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/><Override PartName="/ppt/slides/slide2.xml" ContentType="application/vnd.openxmlformats-officedocumen... [truncated message content] |
From: <my...@us...> - 2011-02-28 15:16:42
|
Revision: 2462 http://aperture.svn.sourceforge.net/aperture/?rev=2462&view=rev
Author: mylka
Date: 2011-02-28 15:16:35 +0000 (Mon, 28 Feb 2011)
Log Message:
-----------
first version of a proper Wikipedia subcrawler which includes mediawiki markup stripping and a proper factory

Modified Paths:
--------------
aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawler.java
aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/x2r/wikipedia-mapping.ttl
aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerTest.java

Added Paths:
-----------
aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerFactory.java

Modified: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawler.java
===================================================================
--- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawler.java 2011-02-28 15:15:34 UTC (rev 2461)
+++ aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawler.java 2011-02-28 15:16:35 UTC (rev 2462)
@@ -1,3 +1,9 @@
+/*
+ * Copyright (c) 2011 Aduna and Deutsches Forschungszentrum fuer Kuenstliche Intelligenz DFKI GmbH.
+ * All rights reserved.
+ *
+ * Licensed under the Aperture BSD-style license.
+ */
 package org.semanticdesktop.aperture.x2r;
 
 import info.aduna.io.IOUtil;

Added: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerFactory.java
===================================================================
--- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerFactory.java (rev 0)
+++ aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerFactory.java 2011-02-28 15:16:35 UTC (rev 2462)
@@ -0,0 +1,38 @@
+/*
+ * Copyright (c) 2011 Aduna and Deutsches Forschungszentrum fuer Kuenstliche Intelligenz DFKI GmbH.
+ * All rights reserved.
+ *
+ * Licensed under the Aperture BSD-style license.
+ */
+package org.semanticdesktop.aperture.x2r;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+
+import org.semanticdesktop.aperture.subcrawler.SubCrawler;
+import org.semanticdesktop.aperture.subcrawler.SubCrawlerFactory;
+
+public class WikipediaSubCrawlerFactory implements SubCrawlerFactory {
+
+    private static final Set MIME_TYPES;
+
+    static {
+        Set<String> set = new HashSet<String>();
+        set.add("application/x-mediawiki-xml-export");
+        MIME_TYPES = Collections.unmodifiableSet(set);
+    }
+
+    public Set getSupportedMimeTypes() {
+        return MIME_TYPES;
+    }
+
+    public SubCrawler get() {
+        return new WikipediaSubCrawler();
+    }
+
+    public String getUriPrefix() {
+        return "mediawiki";
+    }
+
+}

Property changes on: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerFactory.java
___________________________________________________________________
Added: svn:mime-type
   + text/plain

Modified: aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/x2r/wikipedia-mapping.ttl
===================================================================
--- aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/x2r/wikipedia-mapping.ttl 2011-02-28 15:15:34 UTC (rev 2461)
+++ aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/x2r/wikipedia-mapping.ttl 2011-02-28 15:16:35 UTC (rev 2462)
@@ -1,6 +1,8 @@
 @prefix xml2r: <http://fivo.cyf-kr.edu.pl/trac/wiki/X2R/xml/mapping#> .
 @prefix dc: <http://purl.org/dc/elements/1.1/> .
 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
+@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
+@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .
 @prefix : <uri:test:dblp1#> .
 
 :wikipediaMapping a xml2r:Mapping ;
@@ -17,19 +19,20 @@
     xml2r:belongsToMapping :wikipediaMapping ;
     xml2r:nodeXPath "/mw:mediawiki/mw:page" ;
     xml2r:uriPattern "http://pubs.org/publications/${mw:id/text()}" ;
-    xml2r:class <http://some.cool.ontology/2008/ont#Publication> .
+    xml2r:class nfo:TextDocument .
 
 :titleBridge a xml2r:PropertyBridge ;
     xml2r:belongsToClassMap :publicationMap ;
-    xml2r:property dc:title ;
+    xml2r:property nie:title ;
     xml2r:pattern "${mw:title/text()}" .
 
-:contributorBridge a xml2r:PropertyBridge ;
-    xml2r:belongsToClassMap :publicationMap ;
-    xml2r:property dc:contributor ;
-    xml2r:pattern "${mw:revision/mw:contributor/mw:username/text()}" .
+#contributor is difficult because we can't easily create Contact instances
+#:contributorBridge a xml2r:PropertyBridge ;
+#    xml2r:belongsToClassMap :publicationMap ;
+#    xml2r:property dc:contributor ;
+#    xml2r:pattern "${mw:revision/mw:contributor/mw:username/text()}" .
 
 :textBridge a xml2r:PropertyBridge ;
     xml2r:belongsToClassMap :publicationMap ;
-    xml2r:property dc:text ;
-    xml2r:pattern "${mw:revision/mw:text/text()}" .
\ No newline at end of file
+    xml2r:property nie:plainTextContent ;
+    xml2r:pattern "${mw:revision/mw:text/text()||mediawiki}" .
\ No newline at end of file

Modified: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerTest.java
===================================================================
--- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerTest.java 2011-02-28 15:15:34 UTC (rev 2461)
+++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerTest.java 2011-02-28 15:16:35 UTC (rev 2462)
@@ -23,6 +23,8 @@
         subCrawl("ko-wiki.xml", subCrawler, handler);
         Model model = handler.getModel();
 
+        model.dump();
+
         // there are two articles in there
         assertNewModUnmod(handler, 2, 0, 0);
         // there are three property bridges + the type triples + data object type
|
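The factory added in r2462 publishes its supported MIME types as an unmodifiable set built once in a static initializer, though through a raw `Set`. The following is a minimal sketch of the same pattern with a typed `Set<String>`. `MimeTypeFactory` and `MediaWikiFactory` are hypothetical stand-ins invented for this illustration so that it compiles without Aperture on the classpath; they are not the project's `SubCrawlerFactory` interface or its implementation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for Aperture's SubCrawlerFactory: only the two
// string-valued accessors seen in the diff are modelled here.
interface MimeTypeFactory {
    Set<String> getSupportedMimeTypes();
    String getUriPrefix();
}

// Hypothetical implementation mirroring the static-initializer pattern.
final class MediaWikiFactory implements MimeTypeFactory {

    // Built once and shared by all instances; callers cannot mutate it.
    private static final Set<String> MIME_TYPES;

    static {
        Set<String> set = new HashSet<String>();
        set.add("application/x-mediawiki-xml-export");
        MIME_TYPES = Collections.unmodifiableSet(set);
    }

    public Set<String> getSupportedMimeTypes() {
        return MIME_TYPES;
    }

    public String getUriPrefix() {
        return "mediawiki";
    }
}
```

With the typed set, an attempt to add a non-`String` element is rejected at compile time, and any `add` on the returned set throws `UnsupportedOperationException` at run time.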
From: <my...@us...> - 2011-05-26 10:01:12
|
Revision: 2492 http://aperture.svn.sourceforge.net/aperture/?rev=2492&view=rev Author: mylka Date: 2011-05-26 10:01:05 +0000 (Thu, 26 May 2011) Log Message: ----------- first draft of a X2RSubCrawler which doesn't require additional programming Modified Paths: -------------- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerFactory.java aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtil.java Added Paths: ----------- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/ExampleMediawikiFileCrawler.java aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtilTest.java Added: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/ExampleMediawikiFileCrawler.java =================================================================== --- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/ExampleMediawikiFileCrawler.java (rev 0) +++ aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/ExampleMediawikiFileCrawler.java 2011-05-26 10:01:05 UTC (rev 2492) @@ -0,0 +1,166 @@ +/* + * Copyright (c) 2005 - 2008 Aduna. + * All rights reserved. + * + * Licensed under the Aperture BSD-style license. 
+ */ +package org.semanticdesktop.aperture.x2r; + +import info.aduna.io.IOUtil; + +import java.io.File; +import java.io.IOException; +import java.util.List; + +import org.apache.tika.mime.MimeTypeException; +import org.ontoware.rdf2go.exception.ModelException; +import org.semanticdesktop.aperture.accessor.impl.DefaultDataAccessorRegistry; +import org.semanticdesktop.aperture.crawler.filesystem.FileSystemCrawler; +import org.semanticdesktop.aperture.datasource.filesystem.FileSystemDataSource; +import org.semanticdesktop.aperture.examples.AbstractExampleCrawler; +import org.semanticdesktop.aperture.rdf.RDFContainer; +import org.semanticdesktop.aperture.rdf.impl.RDFContainerFactoryImpl; +import org.semanticdesktop.aperture.subcrawler.SubCrawlerRegistry; +import org.semanticdesktop.aperture.tika.TikaMimeTypeIdentifier; +import org.semanticdesktop.nepomuk.nrl.validator.ModelTester; +import org.semanticdesktop.nepomuk.nrl.validator.testers.DataObjectTreeModelTester; + +/** + * Example class that crawls a file system and stores all extracted metadata in a RDF file. 
+ * @author fluit, sauermann, klinkigt + */ +public class ExampleMediawikiFileCrawler extends AbstractExampleCrawler { + + private File rootFile; + private Boolean suppressParentChildLinks = Boolean.FALSE; + + public static final String SUPPRESS_PARENT_CHILD_LINKS_OPTION = "--suppressParentChildLinks"; + + public void crawl() throws ModelException { + if (rootFile == null) { + throw new IllegalArgumentException("root file cannot be null"); + } + + SubCrawlerRegistry reg = handler.getSubCrawlerRegistry(); + TikaMimeTypeIdentifier tmti = handler.getMimeTypeIdentifier(); + + if (reg != null && tmti != null) { + try { + X2RSubCrawlerUtil.registerXMLDatatype( + reg, + tmti, + "application/x-mediawiki-xml-export", + IOUtil.readString(getClass().getResourceAsStream( + "wikipedia-mapping.ttl")), + "<mediawiki", + "http://www.mediawiki.org/xml/export-0.4/", + "mediawiki", + null); + } + catch (Exception e) { + throw new RuntimeException(e); + } + } + + // create a data source configuration + RDFContainerFactoryImpl factory = new RDFContainerFactoryImpl(); + RDFContainer configuration = factory.newInstance("source:testsource"); + + // create the data source + FileSystemDataSource source = new FileSystemDataSource(); + source.setConfiguration(configuration); + + source.setRootFolder(rootFile.getAbsolutePath()); + source.setSuppressParentChildLinks(suppressParentChildLinks); + + // setup a crawler that can handle this type of DataSource + FileSystemCrawler crawler = new FileSystemCrawler(); + crawler.setDataSource(source); + crawler.setDataAccessorRegistry(new DefaultDataAccessorRegistry()); + crawler.setCrawlerHandler(getHandler()); + crawler.setAccessData(getAccessData()); + + // start crawling + crawler.crawl(); + } + + public void setRootFile(File rootFile) { + this.rootFile = rootFile; + } + + public File getRootFile() { + return rootFile; + } + + + /** + * The FileSystem crawler satisfies a more strict constraint + */ + @Override + public ModelTester[] 
getAdditionalModelTesters() { + return new ModelTester[] { new DataObjectTreeModelTester() }; + } + + /** + * The main method + * @param args command line arguments + * @throws ModelException + */ + public static void main(String[] args) throws Exception { + // create a new ExampleFileCrawler instance + ExampleMediawikiFileCrawler crawler = new ExampleMediawikiFileCrawler(); + + // parse the command line options + + List<String> remaining = crawler.processCommonOptions(args); + + for (String arg : remaining) { + if (arg.equals(SUPPRESS_PARENT_CHILD_LINKS_OPTION)) { + crawler.setSuppressParentChildLinks(Boolean.TRUE); + continue; + } else if (arg.startsWith("-")) { + System.err.println("Unknown option: " + arg); + crawler.exitWithUsageMessage(); + } else if (crawler.getRootFile() == null) { + crawler.setRootFile(new File(arg)); + } + else { + crawler.exitWithUsageMessage(); + } + } + + if (crawler.getRootFile() == null) { + crawler.exitWithUsageMessage(); + } + + // start crawling and exit afterwards + crawler.crawl(); + } + + @Override + protected String getSpecificExplanationPart() { + return " " + SUPPRESS_PARENT_CHILD_LINKS_OPTION + " Supress the addition of parent->child nie:hasPart triples\n" + + " <root-folder> - the directory to start crawling"; + } + + @Override + protected String getSpecificSyntaxPart() { + return "[" + SUPPRESS_PARENT_CHILD_LINKS_OPTION + "] " + "<root-folder>"; + } + + + /** + * @return Returns the suppressParentChildLinks. + */ + public synchronized Boolean getSuppressParentChildLinks() { + return suppressParentChildLinks; + } + + + /** + * @param suppressParentChildLinks The suppressParentChildLinks to set. 
+ */ + public synchronized void setSuppressParentChildLinks(Boolean suppressParentChildLinks) { + this.suppressParentChildLinks = suppressParentChildLinks; + } +} Property changes on: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/ExampleMediawikiFileCrawler.java ___________________________________________________________________ Added: svn:mime-type + text/plain Modified: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerFactory.java =================================================================== --- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerFactory.java 2011-05-26 10:00:15 UTC (rev 2491) +++ aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerFactory.java 2011-05-26 10:01:05 UTC (rev 2492) @@ -22,6 +22,7 @@ mimeType = new HashSet<String>(); mimeType.add(type); mimeType = Collections.unmodifiableSet(mimeType); + this.mapping = mapping; } public Set getSupportedMimeTypes() { Modified: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtil.java =================================================================== --- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtil.java 2011-05-26 10:00:15 UTC (rev 2491) +++ aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtil.java 2011-05-26 10:01:05 UTC (rev 2492) @@ -1,5 +1,7 @@ package org.semanticdesktop.aperture.x2r; +import info.aduna.xml.XMLUtil; + import java.io.ByteArrayInputStream; import java.io.ByteArrayOutputStream; import java.io.IOException; @@ -10,6 +12,7 @@ import org.apache.tika.mime.MimeTypeException; import org.apache.tika.mime.MimeTypes; import org.apache.tika.mime.MimeTypesEnhancer; +import org.semanticdesktop.aperture.rdf.util.XmlSafetyUtils; import org.semanticdesktop.aperture.subcrawler.SubCrawlerFactory; import org.semanticdesktop.aperture.subcrawler.SubCrawlerRegistry; import 
org.semanticdesktop.aperture.tika.TikaMimeTypeIdentifier; @@ -56,11 +59,18 @@ * @throws MimeTypeException * if the mimeTypeString is wrongly formatted */ - public void registerXMLDatatype(SubCrawlerRegistry registry, + public static void registerXMLDatatype(SubCrawlerRegistry registry, TikaMimeTypeIdentifier identifier, String mimeTypeString, String mapping, String fileHeader, String rootElementNameSpace, String rootElementName, String extension) throws MimeTypeException { + if (mimeTypeString == null) { + throw new NullPointerException("mimeTypeString cannot be null"); + } + if (mapping == null) { + throw new NullPointerException("mapping cannot be null"); + } + registry.add(new X2RSubCrawlerFactory(mimeTypeString, mapping)); MimeTypes types = identifier.getMimeTypes(); @@ -80,9 +90,10 @@ throw new RuntimeException(e); // can't happen } pw.println("<mime-info>"); - pw.println("<mime-type>"); + pw.println("<mime-type type=\"" + mimeTypeString + "\">"); if (fileHeader != null) { pw.println("<magic priority=\"50\">"); + fileHeader = XMLUtil.escapeAttributeValue(fileHeader); pw.println("<match value=\"" + fileHeader + "\" type =\"string\" offset=\"0\" />"); pw.println("</magic>"); } Added: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtilTest.java =================================================================== --- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtilTest.java (rev 0) +++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtilTest.java 2011-05-26 10:01:05 UTC (rev 2492) @@ -0,0 +1,97 @@ +/* + * Copyright (c) 2011 Aduna and Deutsches Forschungszentrum fuer Kuenstliche Intelligenz DFKI GmbH. + * All rights reserved. + * + * Licensed under the Aperture BSD-style license. 
+ */ +package org.semanticdesktop.aperture.x2r; + +import java.io.File; +import java.io.IOException; + +import info.aduna.io.FileUtil; +import info.aduna.io.IOUtil; + +import org.apache.tika.mime.MimeTypeException; +import org.junit.Test; +import org.ontoware.aifbcommons.collection.ClosableIterator; +import org.ontoware.rdf2go.model.Model; +import org.ontoware.rdf2go.model.Statement; +import org.ontoware.rdf2go.model.node.Variable; +import org.semanticdesktop.aperture.accessor.impl.DefaultDataAccessorRegistry; +import org.semanticdesktop.aperture.crawler.filesystem.FileSystemCrawler; +import org.semanticdesktop.aperture.datasource.filesystem.FileSystemDataSource; +import org.semanticdesktop.aperture.extractor.ExtractorRegistry; +import org.semanticdesktop.aperture.extractor.impl.DefaultExtractorRegistry; +import org.semanticdesktop.aperture.subcrawler.SubCrawlerRegistry; +import org.semanticdesktop.aperture.subcrawler.impl.DefaultSubCrawlerRegistry; +import org.semanticdesktop.aperture.test.ApertureTestBase; +import org.semanticdesktop.aperture.test.TestIncrementalCrawlerHandler; +import org.semanticdesktop.aperture.tika.TikaMimeTypeIdentifier; +import org.semanticdesktop.aperture.vocabulary.NIE; + +public class X2RSubCrawlerUtilTest extends ApertureTestBase { + + public void testWikipedia() throws MimeTypeException, IOException { + TikaMimeTypeIdentifier id = new TikaMimeTypeIdentifier(); + SubCrawlerRegistry reg = new DefaultSubCrawlerRegistry(); + ExtractorRegistry exReg = new DefaultExtractorRegistry(); + + X2RSubCrawlerUtil.registerXMLDatatype( + reg, + id, + "application/x-mediawiki-xml-export", + IOUtil.readString(getClass().getResourceAsStream( + "wikipedia-mapping.ttl")), + "<mediawiki", + "http://www.mediawiki.org/xml/export-0.4/", + "mediawiki", + null); + + TestIncrementalCrawlerHandler hndlr = + new TestIncrementalCrawlerHandler(reg, exReg, id, false); + + File folder = new File( + System.getProperty("java.io.tmpdir") + + File.separator + 
"tmp-folder"); + + FileUtil.deltree(folder); + assertFalse(folder.exists()); + folder.mkdirs(); + assertTrue(folder.isDirectory()); + + IOUtil.writeStream(getClass().getResourceAsStream( + "ko-wiki.xml"), new File(folder, "ko-wiki.xml")); + + FileSystemDataSource ds = new FileSystemDataSource(); + ds.setConfiguration(createRDFContainer("uri:ds")); + ds.setRootFolder(folder.getCanonicalPath()); + + FileSystemCrawler crawler = new FileSystemCrawler(); + crawler.setDataAccessorRegistry(new DefaultDataAccessorRegistry()); + crawler.setCrawlerHandler(hndlr); + crawler.setDataSource(ds); + crawler.crawl(); + + // the ko-wiki.xml file should be correctly classified + // as a mediawiki dump and should get its plainTextContent + + Model model = hndlr.getModel(); + ClosableIterator<Statement> sts = + model.findStatements(Variable.ANY, NIE.plainTextContent, Variable.ANY); + + // there should be exactly two (there are two entries in the file) + // the first entry doesn't contain any reasonable content (a test for markup removal) + // the second entry does, it's from the "Jimmy Carter" page + Statement st1 = sts.next(); + Statement st2 = sts.next(); + assertFalse(sts.hasNext()); + + String content1 = st1.getObject().toString(); + String content2 = st2.getObject().toString(); + + assertTrue(content1.contains("Jimmy Carter") || content2.contains("Jimmy Carter")); + + } + +} Property changes on: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtilTest.java ___________________________________________________________________ Added: svn:mime-type + text/plain This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
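The `X2RSubCrawlerUtil` hunk in r2492 writes a small `<mime-info>` fragment, gives `<mime-type>` a `type` attribute, and escapes the file header before it is placed in the `<match>` attribute. The sketch below reproduces only that string-building step, under stated assumptions: `MimeInfoSketch`, `buildMimeInfo` and `escapeAttr` are hypothetical names, `escapeAttr` is a hand-written stand-in for `XMLUtil.escapeAttributeValue`, and the closing tags are assumed because the hunk ends before them. Unlike the diff, which escapes only the file header, this sketch also escapes the MIME type string.

```java
// Sketch only: all names in this class are hypothetical and illustrative.
final class MimeInfoSketch {

    // Minimal attribute escaping; stands in for XMLUtil.escapeAttributeValue.
    static String escapeAttr(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the fragment; the magic block is emitted only if a header is given.
    static String buildMimeInfo(String mimeType, String fileHeader) {
        if (mimeType == null) {
            throw new NullPointerException("mimeType cannot be null");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("<mime-info>\n");
        sb.append("<mime-type type=\"").append(escapeAttr(mimeType)).append("\">\n");
        if (fileHeader != null) {
            sb.append("<magic priority=\"50\">\n");
            sb.append("<match value=\"").append(escapeAttr(fileHeader))
              .append("\" type=\"string\" offset=\"0\"/>\n");
            sb.append("</magic>\n");
        }
        // Closing tags are an assumption; the original hunk is cut off before them.
        sb.append("</mime-type>\n");
        sb.append("</mime-info>\n");
        return sb.toString();
    }
}
```

Calling `MimeInfoSketch.buildMimeInfo("application/x-mediawiki-xml-export", "<mediawiki")` yields a `<match>` element whose value is `&lt;mediawiki`, which is why the unescaped header in the earlier revision would have produced malformed XML.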
From: <my...@us...> - 2011-06-10 22:56:35
|
Revision: 2502 http://aperture.svn.sourceforge.net/aperture/?rev=2502&view=rev
Author: mylka
Date: 2011-06-10 22:56:29 +0000 (Fri, 10 Jun 2011)
Log Message:
-----------
removed the apache.tika packages in aperture-addons

Removed Paths:
-------------
aperture-addons/trunk/src/main/java/org/apache/
aperture-addons/trunk/src/test/java/org/apache/
aperture-addons/trunk/src/test/resources/org/apache/
|
From: <my...@us...> - 2011-07-11 22:18:57
|
Revision: 2508 http://aperture.svn.sourceforge.net/aperture/?rev=2508&view=rev Author: mylka Date: 2011-07-11 22:18:49 +0000 (Mon, 11 Jul 2011) Log Message: ----------- implemented a simpler, one-argument version of the X2RSubCrawlerUtil.registerXMLDataType Modified Paths: -------------- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtil.java aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/x2r/wikipedia-mapping.ttl aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerTest.java aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtilTest.java Added Paths: ----------- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/AX.java Removed Paths: ------------- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/ExampleMediawikiFileCrawler.java aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawler.java aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerFactory.java Added: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/AX.java =================================================================== --- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/AX.java (rev 0) +++ aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/AX.java 2011-07-11 22:18:49 UTC (rev 2508) @@ -0,0 +1,25 @@ +package org.semanticdesktop.aperture.x2r; + +import org.ontoware.rdf2go.model.node.URI; +import org.ontoware.rdf2go.model.node.impl.URIImpl; + +/** + * Vocabulary for Aperture-specific properties in X2R mappings, used by the + * {@link X2RSubCrawler} and {@link X2RSubCrawlerUtil}. 
+ * + * @author Antoni + * + */ +public class AX { + + /** + * The namespace + */ + public static final String NS = "http://aperture.sourceforge.net/2011/07/x2rsubcrawler#"; + + public static final URI MIMETYPE = new URIImpl(NS + "mimeType"); + + public static final URI ROOTELEMENTNAME = new URIImpl(NS + "rootElementName"); + + public static final URI ROOTELEMENTNS = new URIImpl(NS + "rootElementNameSpace"); +} Property changes on: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/AX.java ___________________________________________________________________ Added: svn:mime-type + text/plain Deleted: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/ExampleMediawikiFileCrawler.java =================================================================== --- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/ExampleMediawikiFileCrawler.java 2011-06-29 13:56:16 UTC (rev 2507) +++ aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/ExampleMediawikiFileCrawler.java 2011-07-11 22:18:49 UTC (rev 2508) @@ -1,166 +0,0 @@ -/* - * Copyright (c) 2005 - 2008 Aduna. - * All rights reserved. - * - * Licensed under the Aperture BSD-style license. 
- */ -package org.semanticdesktop.aperture.x2r; - -import info.aduna.io.IOUtil; - -import java.io.File; -import java.io.IOException; -import java.util.List; - -import org.apache.tika.mime.MimeTypeException; -import org.ontoware.rdf2go.exception.ModelException; -import org.semanticdesktop.aperture.accessor.impl.DefaultDataAccessorRegistry; -import org.semanticdesktop.aperture.crawler.filesystem.FileSystemCrawler; -import org.semanticdesktop.aperture.datasource.filesystem.FileSystemDataSource; -import org.semanticdesktop.aperture.examples.AbstractExampleCrawler; -import org.semanticdesktop.aperture.rdf.RDFContainer; -import org.semanticdesktop.aperture.rdf.impl.RDFContainerFactoryImpl; -import org.semanticdesktop.aperture.subcrawler.SubCrawlerRegistry; -import org.semanticdesktop.aperture.tika.TikaMimeTypeIdentifier; -import org.semanticdesktop.nepomuk.nrl.validator.ModelTester; -import org.semanticdesktop.nepomuk.nrl.validator.testers.DataObjectTreeModelTester; - -/** - * Example class that crawls a file system and stores all extracted metadata in a RDF file. 
- * @author fluit, sauermann, klinkigt - */ -public class ExampleMediawikiFileCrawler extends AbstractExampleCrawler { - - private File rootFile; - private Boolean suppressParentChildLinks = Boolean.FALSE; - - public static final String SUPPRESS_PARENT_CHILD_LINKS_OPTION = "--suppressParentChildLinks"; - - public void crawl() throws ModelException { - if (rootFile == null) { - throw new IllegalArgumentException("root file cannot be null"); - } - - SubCrawlerRegistry reg = handler.getSubCrawlerRegistry(); - TikaMimeTypeIdentifier tmti = handler.getMimeTypeIdentifier(); - - if (reg != null && tmti != null) { - try { - X2RSubCrawlerUtil.registerXMLDatatype( - reg, - tmti, - "application/x-mediawiki-xml-export", - IOUtil.readString(getClass().getResourceAsStream( - "wikipedia-mapping.ttl")), - "<mediawiki", - "http://www.mediawiki.org/xml/export-0.4/", - "mediawiki", - null); - } - catch (Exception e) { - throw new RuntimeException(e); - } - } - - // create a data source configuration - RDFContainerFactoryImpl factory = new RDFContainerFactoryImpl(); - RDFContainer configuration = factory.newInstance("source:testsource"); - - // create the data source - FileSystemDataSource source = new FileSystemDataSource(); - source.setConfiguration(configuration); - - source.setRootFolder(rootFile.getAbsolutePath()); - source.setSuppressParentChildLinks(suppressParentChildLinks); - - // setup a crawler that can handle this type of DataSource - FileSystemCrawler crawler = new FileSystemCrawler(); - crawler.setDataSource(source); - crawler.setDataAccessorRegistry(new DefaultDataAccessorRegistry()); - crawler.setCrawlerHandler(getHandler()); - crawler.setAccessData(getAccessData()); - - // start crawling - crawler.crawl(); - } - - public void setRootFile(File rootFile) { - this.rootFile = rootFile; - } - - public File getRootFile() { - return rootFile; - } - - - /** - * The FileSystem crawler satisfies a more strict constraint - */ - @Override - public ModelTester[] 
getAdditionalModelTesters() { - return new ModelTester[] { new DataObjectTreeModelTester() }; - } - - /** - * The main method - * @param args command line arguments - * @throws ModelException - */ - public static void main(String[] args) throws Exception { - // create a new ExampleFileCrawler instance - ExampleMediawikiFileCrawler crawler = new ExampleMediawikiFileCrawler(); - - // parse the command line options - - List<String> remaining = crawler.processCommonOptions(args); - - for (String arg : remaining) { - if (arg.equals(SUPPRESS_PARENT_CHILD_LINKS_OPTION)) { - crawler.setSuppressParentChildLinks(Boolean.TRUE); - continue; - } else if (arg.startsWith("-")) { - System.err.println("Unknown option: " + arg); - crawler.exitWithUsageMessage(); - } else if (crawler.getRootFile() == null) { - crawler.setRootFile(new File(arg)); - } - else { - crawler.exitWithUsageMessage(); - } - } - - if (crawler.getRootFile() == null) { - crawler.exitWithUsageMessage(); - } - - // start crawling and exit afterwards - crawler.crawl(); - } - - @Override - protected String getSpecificExplanationPart() { - return " " + SUPPRESS_PARENT_CHILD_LINKS_OPTION + " Supress the addition of parent->child nie:hasPart triples\n" + - " <root-folder> - the directory to start crawling"; - } - - @Override - protected String getSpecificSyntaxPart() { - return "[" + SUPPRESS_PARENT_CHILD_LINKS_OPTION + "] " + "<root-folder>"; - } - - - /** - * @return Returns the suppressParentChildLinks. - */ - public synchronized Boolean getSuppressParentChildLinks() { - return suppressParentChildLinks; - } - - - /** - * @param suppressParentChildLinks The suppressParentChildLinks to set. 
-     */
-    public synchronized void setSuppressParentChildLinks(Boolean suppressParentChildLinks) {
-        this.suppressParentChildLinks = suppressParentChildLinks;
-    }
-}
Deleted: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawler.java
===================================================================
--- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawler.java 2011-06-29 13:56:16 UTC (rev 2507)
+++ aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawler.java 2011-07-11 22:18:49 UTC (rev 2508)
@@ -1,39 +0,0 @@
-/*
- * Copyright (c) 2011 Aduna and Deutsches Forschungszentrum fuer Kuenstliche Intelligenz DFKI GmbH.
- * All rights reserved.
- *
- * Licensed under the Aperture BSD-style license.
- */
-package org.semanticdesktop.aperture.x2r;
-
-import info.aduna.io.IOUtil;
-
-import java.io.IOException;
-
-/**
- * A {@link X2RSubCrawler} implementation geared towards dealing with
- * Wikipedia dump files.
- *
- * @author Antoni
- *
- */
-public class WikipediaSubCrawler extends X2RSubCrawler {
-
-
-    private static final String MAPPING_STRING;
-
-    static {
-        try {
-            MAPPING_STRING = IOUtil.readString(
-                WikipediaSubCrawler.class
-                    .getResourceAsStream("wikipedia-mapping.ttl"));
-        }
-        catch (IOException e) {
-            throw new RuntimeException(e); // this can't happen
-        }
-    }
-
-    public WikipediaSubCrawler() {
-        super(MAPPING_STRING);
-    }
-}
Deleted: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerFactory.java
===================================================================
--- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerFactory.java 2011-06-29 13:56:16 UTC (rev 2507)
+++ aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerFactory.java 2011-07-11 22:18:49 UTC (rev 2508)
@@ -1,38 +0,0 @@
-/*
- * Copyright (c) 2011 Aduna and Deutsches Forschungszentrum fuer Kuenstliche Intelligenz DFKI GmbH.
- * All rights reserved.
- *
- * Licensed under the Aperture BSD-style license.
- */
-package org.semanticdesktop.aperture.x2r;
-
-import java.util.Collections;
-import java.util.HashSet;
-import java.util.Set;
-
-import org.semanticdesktop.aperture.subcrawler.SubCrawler;
-import org.semanticdesktop.aperture.subcrawler.SubCrawlerFactory;
-
-public class WikipediaSubCrawlerFactory implements SubCrawlerFactory {
-
-    private static final Set MIME_TYPES;
-
-    static {
-        Set<String> set = new HashSet<String>();
-        set.add("application/x-mediawiki-xml-export");
-        MIME_TYPES = Collections.unmodifiableSet(set);
-    }
-
-    public Set getSupportedMimeTypes() {
-        return MIME_TYPES;
-    }
-
-    public SubCrawler get() {
-        return new WikipediaSubCrawler();
-    }
-
-    public String getUriPrefix() {
-        return "mediawiki";
-    }
-
-}
Modified: aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtil.java
===================================================================
--- aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtil.java 2011-06-29 13:56:16 UTC (rev 2507)
+++ aperture-addons/trunk/src/main/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtil.java 2011-07-11 22:18:49 UTC (rev 2508)
@@ -1,10 +1,29 @@
 package org.semanticdesktop.aperture.x2r;
+import java.io.IOException;
+import java.io.StringReader;
+
 import org.apache.tika.mime.MimeTypeException;
+import org.ontoware.aifbcommons.collection.ClosableIterator;
+import org.ontoware.rdf2go.RDF2Go;
+import org.ontoware.rdf2go.exception.ModelRuntimeException;
+import org.ontoware.rdf2go.model.Model;
+import org.ontoware.rdf2go.model.Statement;
+import org.ontoware.rdf2go.model.Syntax;
+import org.ontoware.rdf2go.model.node.Literal;
+import org.ontoware.rdf2go.model.node.Node;
+import org.ontoware.rdf2go.model.node.Resource;
+import org.ontoware.rdf2go.model.node.URI;
+import org.ontoware.rdf2go.model.node.Variable;
+import org.ontoware.rdf2go.util.ModelUtils;
+import org.ontoware.rdf2go.vocabulary.RDF;
+import org.semanticdesktop.aperture.rdf.util.ModelUtil;
 import org.semanticdesktop.aperture.subcrawler.SubCrawlerFactory;
 import org.semanticdesktop.aperture.subcrawler.SubCrawlerRegistry;
 import org.semanticdesktop.aperture.tika.TikaMimeTypeIdentifier;
+import pl.edu.agh.x2r.xml.XML2R;
+
 /**
  * A utility class which can register an XML datatype in Aperture
  *
@@ -12,6 +31,12 @@
  *
  */
 public class X2RSubCrawlerUtil {
+
+    public static class X2RSubCrawlerUtilException extends Exception {
+        private static final long serialVersionUID = -7442049122744343323L;
+        public X2RSubCrawlerUtilException(String msg) { super(msg); }
+        public X2RSubCrawlerUtilException(Throwable e) { super(e); }
+    }
     /**
      * Registers an appropriately-initialized {@link X2RSubCrawler} in the
@@ -50,18 +75,107 @@
     public static void registerXMLDatatype(SubCrawlerRegistry registry, TikaMimeTypeIdentifier identifier, String mimeTypeString, String mapping, String fileHeader, String rootElementNameSpace,
-            String rootElementName, String extension) throws MimeTypeException {
+            String rootElementName, String extension) throws X2RSubCrawlerUtilException {
         if (mimeTypeString == null) {
-            throw new NullPointerException("mimeTypeString cannot be null");
+            throw new X2RSubCrawlerUtilException("mimeTypeString cannot be null");
         }
         if (mapping == null) {
-            throw new NullPointerException("mapping cannot be null");
+            throw new X2RSubCrawlerUtilException("mapping cannot be null");
         }
         registry.add(new X2RSubCrawlerFactory(mimeTypeString, mapping));
+        try {
         identifier.addNewDefinition(mimeTypeString, fileHeader, rootElementNameSpace, rootElementName, extension);
+        } catch (MimeTypeException e) {
+            throw new X2RSubCrawlerUtilException(e);
+        }
     }
+
+    public static void registerXMLDatatype(SubCrawlerRegistry registry,
+            TikaMimeTypeIdentifier identifier, String mapping)
+            throws X2RSubCrawlerUtilException {
+
+        Model model = RDF2Go.getModelFactory().createModel().open();
+        try {
+            try {
+                model.readFrom(new StringReader(mapping),Syntax.Turtle);
+            } catch (Exception e1) {
+                throw new X2RSubCrawlerUtilException(e1);
+            }
+
+            Resource mappingResource = findMappingResource(model);
+            String mimeTypeString = findPropertyValue(model, mappingResource, AX.MIMETYPE);
+            String rootElementName = findPropertyValue(model, mappingResource, AX.ROOTELEMENTNAME);
+            String rootElementNameSpace = findPropertyValue(model, mappingResource, AX.ROOTELEMENTNS);
+
+            if (mimeTypeString != null && registry != null) {
+                registry.add(new X2RSubCrawlerFactory(mimeTypeString, mapping));
+            }
+
+            if (mimeTypeString != null && identifier != null) {
+                try {
+                    identifier.addNewDefinition(
+                        mimeTypeString,
+                        "<" + rootElementName,
+                        rootElementNameSpace,
+                        rootElementName,
+                        null);
+                } catch (MimeTypeException e) {
+                    throw new X2RSubCrawlerUtilException(e);
+                }
+            }
+        } finally {
+            model.close();
+        }
+    }
+
+    /**
+     * Returns the value of the property. There can be at most once value
+     * and it has to be a literal.
+     *
+     * @param model
+     * @param subject
+     * @param predicate
+     * @return the value of the property
+     * @throws X2RSubCrawlerUtilException if there is more than one value
+     * or the value is not a literal
+     */
+    private static String findPropertyValue(Model model, Resource subject,
+            URI predicate) throws X2RSubCrawlerUtilException {
+        String result = null;
+        ClosableIterator<Statement> iter =
+            model.findStatements(subject, predicate, Variable.ANY);
+        if (iter.hasNext()) {
+            Node st = iter.next().getObject();
+            if (iter.hasNext()) {
+                throw new X2RSubCrawlerUtilException("There can be at most one " +
+                    "value for the " + predicate + " predicate" );
+            }
+            if (!(st instanceof Literal)) {
+                throw new X2RSubCrawlerUtilException("The value of the " + predicate +
+                    " predicate must be a literal");
+            }
+            result = st.toString();
+        }
+        return result;
+    }
+
+    private static Resource findMappingResource(Model model) throws X2RSubCrawlerUtilException {
+        ClosableIterator<Statement> iter =
+            model.findStatements(Variable.ANY, RDF.type, model.createURI(XML2R.MAPPING.toString()));
+        Resource result = null;
+        if (iter.hasNext()) {
+            Statement st = iter.next();
+            result = st.getSubject();
+            if (iter.hasNext()) {
+                throw new X2RSubCrawlerUtilException("Mapping string contains more than one mapping");
+            }
+        } else {
+            throw new X2RSubCrawlerUtilException("No mapping found in the mapping string");
+        }
+        return result;
+    }
 }
Modified: aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/x2r/wikipedia-mapping.ttl
===================================================================
--- aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/x2r/wikipedia-mapping.ttl 2011-06-29 13:56:16 UTC (rev 2507)
+++ aperture-addons/trunk/src/main/resources/org/semanticdesktop/aperture/x2r/wikipedia-mapping.ttl 2011-07-11 22:18:49 UTC (rev 2508)
@@ -1,4 +1,5 @@
 @prefix xml2r: <http://fivo.cyf-kr.edu.pl/trac/wiki/X2R/xml/mapping#> .
+@prefix ax: <http://aperture.sourceforge.net/2011/07/x2rsubcrawler#> .
 @prefix dc: <http://purl.org/dc/elements/1.1/> .
 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
 @prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@@ -13,7 +14,10 @@
     xml2r:namespaceDefinition [
         xml2r:namespacePrefix "mw" ;
         xml2r:namespaceUri "http://www.mediawiki.org/xml/export-0.4/"
-    ] .
+    ] ;
+    ax:mimeType "application/x-mediawiki-xml-export" ;
+    ax:rootElementName "mediawiki" ;
+    ax:rootElementNameSpace "http://www.mediawiki.org/xml/export-0.4/" .
 :publicationMap a xml2r:ClassMap ;
     xml2r:belongsToMapping :wikipediaMapping ;
Modified: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerTest.java
===================================================================
--- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerTest.java 2011-06-29 13:56:16 UTC (rev 2507)
+++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/WikipediaSubCrawlerTest.java 2011-07-11 22:18:49 UTC (rev 2508)
@@ -1,24 +1,23 @@
 package org.semanticdesktop.aperture.x2r;
+import info.aduna.io.IOUtil;
+
 import java.io.InputStream;
 import org.ontoware.rdf2go.model.Model;
-import org.ontoware.rdf2go.model.node.URI;
 import org.ontoware.rdf2go.model.node.impl.URIImpl;
 import org.semanticdesktop.aperture.rdf.RDFContainer;
 import org.semanticdesktop.aperture.rdf.impl.RDFContainerImpl;
 import org.semanticdesktop.aperture.subcrawler.SubCrawler;
-import org.semanticdesktop.aperture.subcrawler.SubCrawlerFactory;
-import org.semanticdesktop.aperture.subcrawler.mime.MimeSubCrawlerFactory;
 import org.semanticdesktop.aperture.test.subcrawler.SubCrawlerTestBase;
 import org.semanticdesktop.aperture.test.subcrawler.TestBasicSubCrawlerHandler;
-import org.semanticdesktop.aperture.vocabulary.NIE;
-import org.semanticdesktop.aperture.vocabulary.NMO;
 public class WikipediaSubCrawlerTest extends SubCrawlerTestBase {
     public void testWikipediaExtraction() throws Exception {
-        SubCrawler subCrawler = new WikipediaSubCrawler();
+        SubCrawler subCrawler = new X2RSubCrawlerFactory(
+            "application/x-mediawiki-xml-export",
+            IOUtil.readString(getClass().getResourceAsStream("wikipedia-mapping.ttl"))).get();
         TestBasicSubCrawlerHandler handler = new TestBasicSubCrawlerHandler();
         subCrawl("ko-wiki.xml", subCrawler, handler);
         Model model = handler.getModel();
Modified: aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtilTest.java
===================================================================
--- aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtilTest.java 2011-06-29 13:56:16 UTC (rev 2507)
+++ aperture-addons/trunk/src/test/java/org/semanticdesktop/aperture/x2r/X2RSubCrawlerUtilTest.java 2011-07-11 22:18:49 UTC (rev 2508)
@@ -32,7 +32,7 @@
 public class X2RSubCrawlerUtilTest extends ApertureTestBase {
-    public void testWikipedia() throws MimeTypeException, IOException {
+    public void testWikipedia() throws Exception {
         TikaMimeTypeIdentifier id = new TikaMimeTypeIdentifier();
         SubCrawlerRegistry reg = new DefaultSubCrawlerRegistry();
         ExtractorRegistry exReg = new DefaultExtractorRegistry();
@@ -48,6 +48,25 @@
             "mediawiki",
             null);
+        performWikipediaTest(id, reg, exReg);
+    }
+
+    public void testWikipediaOneFileSolution() throws Exception {
+        TikaMimeTypeIdentifier id = new TikaMimeTypeIdentifier();
+        SubCrawlerRegistry reg = new DefaultSubCrawlerRegistry();
+        ExtractorRegistry exReg = new DefaultExtractorRegistry();
+
+        X2RSubCrawlerUtil.registerXMLDatatype(
+            reg,
+            id,
+            IOUtil.readString(getClass().getResourceAsStream(
+                "wikipedia-mapping.ttl")));
+
+        performWikipediaTest(id, reg, exReg);
+    }
+
+    private void performWikipediaTest(TikaMimeTypeIdentifier id, SubCrawlerRegistry reg,
+            ExtractorRegistry exReg) throws IOException {
         TestIncrementalCrawlerHandler hndlr = new TestIncrementalCrawlerHandler(reg, exReg, id, false);
@@ -91,7 +110,6 @@
         String content2 = st2.getObject().toString();
         assertTrue(content1.contains("Jimmy Carter") || content2.contains("Jimmy Carter"));
-
     }
 }
|