[Ikvm-developers] Tika/IKVM Crashing after Assembly.GetExportedTypes()
From: Trevor W. <tw...@da...> - 2012-02-29 17:11:47
Hello!
I've been trying to use Tika via IKVM to extract the contents of text
files. With some help from this mailing list (thanks, guys!) I've got it
reading an MS Word (.doc) file that had been renamed with an unrelated
extension, which was the goal of using Tika instead of IFilters.
Our project includes the ability to add plug-ins (that we write) to
process files that aren't handled by IFilters or Tika. These plugins are
loaded at run time. We call Assembly.GetExportedTypes() to make sure
that the DLLs we load are valid plugins. However, after calling
asm.GetExportedTypes(), Tika/IKVM no longer works and crashes
with an odd exception.
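To illustrate the kind of check described above, here is a minimal sketch. It is not the project's actual code: the IFilePlugin interface and the PluginValidator class are hypothetical names made up for the example. Only Assembly.GetExportedTypes() and Type.IsAssignableFrom() are real .NET APIs.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical plugin contract; the real interface is not shown in this post.
public interface IFilePlugin
{
    string Extract(string filePath);
}

public static class PluginValidator
{
    // Returns true if the assembly exports at least one public, concrete
    // class that implements the (hypothetical) IFilePlugin interface.
    public static bool IsValidPlugin(Assembly asm)
    {
        return asm.GetExportedTypes().Any(t =>
            t.IsClass && !t.IsAbstract &&
            typeof(IFilePlugin).IsAssignableFrom(t));
    }
}
```

A caller would load the DLL with Assembly.LoadFrom(path) and pass the result to PluginValidator.IsValidPlugin(asm). GetExportedTypes() returns only the public types, which is why it suits a check like this.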
We're using code found online, called TikaOnDotNet, to drive Tika.
The code (C# / .NET 4.0) is as follows:
---------------------------------------------------------------------------------------------------------
// Create and test the Tika extractor
TikaOnDotNet.TextExtractor _cut = new TikaOnDotNet.TextExtractor();
TikaOnDotNet.TextExtractionResult result =
    _cut.Extract(@"D:\Work\NamedWrong\What you need for distribution_was_doc.qrt");
// Works here

System.Reflection.Assembly asm =
    System.Reflection.Assembly.LoadFrom(file.FullName);
// Works here

foreach (Type t in asm.GetExportedTypes())
{
    // Calling asm.GetExportedTypes() breaks Tika
}
---------------------------------------------------------------------------------------------------------
In the TextExtractor.cs file from TikaOnDotNet, the crash occurs when
trying to create an AutoDetectParser (when stepping through, this loads
the ClassLoader defined in MyClassLoader.cs):
---------------------------------------------------------------------------------------------------------
var parser = new AutoDetectParser(); // Crashes on this line
---------------------------------------------------------------------------------------------------------
The error is as follows:
---------------------------------------------------------------------------------------------------------
FactoryConfigurationError was unhandled
{"Provider ���\0\0\0�\0\0\0)System.Resources.ResourceReader,
mscorlibsSystem.Resources.RuntimeResourceSet, mscorlib,
Version=1.0.5000.0, Culture=neutral,
PublicKeyToken=b77a5c561934e089\0\0\0\0\0\0\0\0\0]System.Byte[],
mscorlib, Version=1.0.5000.0, Culture=neutral,
PublicKeyToken=b77a5c561934e089PADP�nY\0\0\0\0\0-\0\0l\0z\0\0\0\0\0\0\0\0\0\0����\0\0\0\0\0\0\0\0\0\0)\0\0\0g��q��
not found"}
at javax.xml.parsers.DocumentBuilderFactory.newInstance()
at org.apache.tika.mime.MimeTypesReader.read(InputStream )
at org.apache.tika.mime.MimeTypesFactory.create(InputStream inputStream)
at org.apache.tika.mime.MimeTypesFactory.create(URL url)
at org.apache.tika.mime.MimeTypesFactory.create(String filePath)
at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes()
at org.apache.tika.config.TikaConfig..ctor(CompositeParser )
at org.apache.tika.config.TikaConfig..ctor()
at org.apache.tika.config.TikaConfig.getDefaultConfig()
at org.apache.tika.parser.AutoDetectParser..ctor()
at TikaOnDotNet.TextExtractor.Extract(String filePath) in C:\<project>\Tika\TextExtractor.cs:line 43
---------------------------------------------------------------------------------------------------------
Any assistance would be greatly appreciated.
Trevor Watson