From: stack <st...@ar...> - 2005-11-21 19:06:56
I just tried it with the nutchwax 0.4.1 binary and it seems to parse PDFs fine.

The exception below is a complaint about a missing method and a missing class. Do you build nutchwax yourself, Lukáš? If so, did you match the nutch and nutchwax versions? (Nutchwax works with Nutch 0.7.) Maybe you've misnamed the extension point class? Compare against my parse-ext plugin.xml below.

Do you have your own wrapper script for the nutch process? Maybe you've left off mention of the plugin dirs? (Is parse-ext present in your plugin dir?)

St.Ack

<?xml version="1.0" encoding="UTF-8"?>
<plugin id="parse-ext"
        name="External Parser Plug-in"
        version="1.0.0"
        provider-name="nutch.org">

  <extension-point id="org.apache.nutch.parse.Parser"
                   name="Nutch Content Parser"/>

  <runtime>
    <library name="parse-ext.jar">
      <export name="*"/>
    </library>
  </runtime>

  <extension id="org.apache.nutch.parse.ext"
             name="ExtParse"
             point="org.apache.nutch.parse.Parser">
    <implementation id="ExtParser"
                    class="org.apache.nutch.parse.ext.ExtParser"
                    contentType="application/pdf"
                    pathSuffix="pdf"
                    command="/home/stack/workspace/archive-access/projects/nutch/target/distributions/nutchwax-0.4.1/bin/parse-pdf.sh"
                    timeout="30"/>
  </extension>
</plugin>

Lukáš Matějka wrote:

>Hi,
>
>I changed the path in parser-ext to my local pdf-parser.
>
>Do you have any idea?
>
>adding 103214 bytes of mimetype application/pdf http://www.inforum.cz/infomedia98/pdf/telenor.pdf
>051118 163001 SEVERE Error processing /home/nwa/nutchwax/data//queue/NEDLIB--20051116141001-00000.arc.gz
>java.lang.NoSuchMethodError: org.apache.nutch.plugin.ExtensionPoint.getExtensions()[Lorg/apache/nutch/plugin/Extensions;
>    at org.apache.nutch.parse.ext.ExtParser.<clinit>(ExtParser.java:60)
>    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>    at java.lang.reflect.Constructor.newInstance(Unknown Source)
>    at java.lang.Class.newInstance0(Unknown Source)
>    at java.lang.Class.newInstance(Unknown Source)
>    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:144)
>    at org.apache.nutch.parse.ParserFactory.getParser(ParserFactory.java:63)
>    at org.archive.access.nutch.Arc2Segment.addRecord(Arc2Segment.java:243)
>    at org.archive.access.nutch.Arc2Segment.addArc(Arc2Segment.java:146)
>    at org.archive.access.nutch.Arc2Segment.main(Arc2Segment.java:326)
>
>051118 163001 adding 80205 bytes of mimetype application/pdf http://full.nkp.cz/nkkr/pdf/0101/nk0101051.pdf
>051118 163001 SEVERE Error processing /home/nwa/nutchwax/data//queue/NEDLIB--20051116141314-00000.arc.gz
>java.lang.NoClassDefFoundError
>    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>    at java.lang.reflect.Constructor.newInstance(Unknown Source)
>    at java.lang.Class.newInstance0(Unknown Source)
>    at java.lang.Class.newInstance(Unknown Source)
>    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:144)
>    at org.apache.nutch.parse.ParserFactory.getParser(ParserFactory.java:63)
>    at org.archive.access.nutch.Arc2Segment.addRecord(Arc2Segment.java:243)
>    at org.archive.access.nutch.Arc2Segment.addArc(Arc2Segment.java:146)
>    at org.archive.access.nutch.Arc2Segment.main(Arc2Segment.java:326)
>
>thanks for comments
>
>lukas
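
A NoSuchMethodError like the one in the quoted trace generally means the class found at runtime is not the version the caller was compiled against. One way to check for that kind of mismatch is a small reflection probe that reports where a class was loaded from and which signatures it has for a given method name. The sketch below is only an illustration: the class name ClasspathCheck and its helper methods are hypothetical and not part of nutch or nutchwax, and the default class and method names are simply the ones taken from the trace. It assumes you run it with the same classpath your wrapper script gives the nutch process.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of nutch or nutchwax.
public class ClasspathCheck {

    // Returns the jar or directory the class was loaded from,
    // or a placeholder when the JVM does not report one (e.g. JDK classes).
    static String loadedFrom(Class<?> cls) {
        ProtectionDomain domain = cls.getProtectionDomain();
        CodeSource source = (domain == null) ? null : domain.getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    // Lists the full signatures (including return type) of every public
    // method with the given name, so they can be compared with the
    // descriptor printed in the NoSuchMethodError.
    static List<String> signatures(Class<?> cls, String methodName) {
        List<String> found = new ArrayList<>();
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(methodName)) {
                found.add(m.toString());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Defaults are the class and method named in the quoted stack trace.
        String className = args.length > 0 ? args[0] : "org.apache.nutch.plugin.ExtensionPoint";
        String methodName = args.length > 1 ? args[1] : "getExtensions";
        try {
            Class<?> cls = Class.forName(className);
            System.out.println(className + " loaded from " + loadedFrom(cls));
            List<String> sigs = signatures(cls, methodName);
            if (sigs.isEmpty()) {
                System.out.println("no public method named " + methodName);
            }
            for (String sig : sigs) {
                System.out.println("  " + sig);
            }
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not on this classpath");
        }
    }
}
```

If the reported location is not the nutch jar you expected, or the return type printed for getExtensions differs from the one in the error message, that would point to mismatched nutch and nutchwax builds.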