#61 Exception while processing ZIP entry

closed-fixed
None
5
2009-06-04
2008-10-09
No

I encountered the following exception while crawling a ZIP file:

Exception while processing stream of file:/D:/Work/Document%20Collections.zip
org.semanticdesktop.aperture.subcrawler.SubCrawlerException: java.lang.IllegalArgumentException
at org.semanticdesktop.aperture.subcrawler.base.AbstractArchiverSubCrawler.subCrawl(AbstractArchiverSubCrawler.java:128)
at org.semanticdesktop.aperture.crawler.base.CrawlerBase.runSubCrawler(CrawlerBase.java:460)
at info.aduna.autofocus.crawling.extract.ExtractionUtil.extract(ExtractionUtil.java:144)
at info.aduna.autofocus.crawling.extract.ExtractionUtil.extract(ExtractionUtil.java:66)
at info.aduna.autofocus.crawling.CrawlResultProcessor.interpret(CrawlResultProcessor.java:297)
at info.aduna.autofocus.crawling.CrawlResultProcessor.process(CrawlResultProcessor.java:268)
at info.aduna.autofocus.crawling.CrawlResultProcessor.objectNew(CrawlResultProcessor.java:184)
at org.semanticdesktop.aperture.crawler.base.CrawlerBase.reportNewDataObject(CrawlerBase.java:373)
at org.semanticdesktop.aperture.crawler.filesystem.FileSystemCrawler.crawlSingleFile(FileSystemCrawler.java:318)
at org.semanticdesktop.aperture.crawler.filesystem.FileSystemCrawler.crawlFileTree(FileSystemCrawler.java:186)
at org.semanticdesktop.aperture.crawler.filesystem.FileSystemCrawler.access$200(FileSystemCrawler.java:33)
at org.semanticdesktop.aperture.crawler.filesystem.FileSystemCrawler$CrawlerFileFilter.accept(FileSystemCrawler.java:357)
at java.io.File.listFiles(Unknown Source)
at org.semanticdesktop.aperture.crawler.filesystem.FileSystemCrawler.filterThroughFolderContent(FileSystemCrawler.java:228)
at org.semanticdesktop.aperture.crawler.filesystem.FileSystemCrawler.crawlFileTree(FileSystemCrawler.java:214)
at org.semanticdesktop.aperture.crawler.filesystem.FileSystemCrawler.crawlObjects(FileSystemCrawler.java:127)
at org.semanticdesktop.aperture.crawler.base.CrawlerBase.crawl(CrawlerBase.java:218)
at info.aduna.autofocus.crawling.CrawlingRepository.crawl(CrawlingRepository.java:516)
at info.aduna.autofocus.crawling.manager.CrawlManager.crawl(CrawlManager.java:369)
at info.aduna.autofocus.crawling.manager.CrawlManager.crawl(CrawlManager.java:340)
at info.aduna.autofocus.gui.AutoFocusFrame$2.run(AutoFocusFrame.java:357)
Caused by: java.lang.IllegalArgumentException
at java.util.zip.ZipInputStream.getUTF8String(Unknown Source)
at java.util.zip.ZipInputStream.readLOC(Unknown Source)
at java.util.zip.ZipInputStream.getNextEntry(Unknown Source)
at org.semanticdesktop.aperture.subcrawler.zip.ZipSubCrawler$ZipSubCrawlerInputStream.getNextEntry(ZipSubCrawler.java:29)
at org.semanticdesktop.aperture.subcrawler.base.AbstractArchiverSubCrawler.subCrawl(AbstractArchiverSubCrawler.java:119)
... 20 common frames omitted

I think this reveals a bug in Java's ZIP implementation. Nevertheless, it may be a good idea to wrap the retrieval and processing of each individual entry in a try { ... } catch (Exception e) block, in order to recover from such bugs as well as from bugs in the SubCrawlerHandler implementation. As it stands, the exception aborts the processing of the rest of the ZIP file.
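
A minimal sketch of the per-entry recovery suggested above, using only java.util.zip. The class TolerantZipWalk, the EntryHandler interface and the sampleZip helper are hypothetical names made up for this illustration; they are not Aperture code. One caveat the sketch makes explicit: a failing handler can be skipped safely, but once getNextEntry() itself throws, the stream position can no longer be trusted, so the loop stops there and keeps what was already processed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class TolerantZipWalk {

    /** Hypothetical per-entry callback; a stand-in, not Aperture's SubCrawlerHandler. */
    interface EntryHandler {
        void handle(String name, InputStream content) throws Exception;
    }

    /** Visits every entry and returns the names of the entries that could not be processed. */
    static List<String> walk(InputStream in, EntryHandler handler) {
        List<String> failed = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            while (true) {
                ZipEntry entry;
                try {
                    entry = zip.getNextEntry();
                } catch (Exception e) {
                    // Retrieval failed (for example with an IllegalArgumentException).
                    // The stream position is unreliable now, so stop, but keep earlier results.
                    failed.add("<unreadable entry>");
                    break;
                }
                if (entry == null) {
                    break; // end of archive
                }
                try {
                    handler.handle(entry.getName(), zip);
                } catch (Exception e) {
                    // A bug in the handler only costs this one entry;
                    // the next getNextEntry() call skips the rest of its data.
                    failed.add(entry.getName());
                }
            }
        } catch (IOException e) {
            // Closing the stream failed; the results collected so far are still returned.
        }
        return failed;
    }

    /** Builds a small in-memory ZIP so the sketch can run without any files. */
    static byte[] sampleZip(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                out.write(name.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        List<String> failed = walk(
                new ByteArrayInputStream(sampleZip("a.txt", "b.txt", "c.txt")),
                (name, content) -> {
                    if (name.equals("b.txt")) {
                        throw new IllegalStateException("simulated handler bug");
                    }
                    seen.add(name);
                });
        System.out.println("processed=" + seen + " failed=" + failed);
    }
}
```

In this sketch the simulated failure on b.txt does not prevent c.txt from being handled, which is the behaviour the report asks for.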

Discussion

  • Antoni Mylka

    Antoni Mylka - 2008-10-27
    • milestone: 533940 -->
     
  • Antoni Mylka

    Antoni Mylka - 2008-10-27

    Due to a lack of example files, I couldn't pinpoint this issue in time for the 1.2.0 release, so I'm postponing it.

     
  • Antoni Mylka

    Antoni Mylka - 2009-06-04

    I've updated commons-compress to 1.0 and rewrote the ZipSubCrawler to use it instead of java.util.zip. If commons-compress fails, the subcrawler falls back to java.util.zip. This solved the problem with the document collection Christiaan sent me, so I am closing this issue.

     
  • Antoni Mylka

    Antoni Mylka - 2009-06-04
    • status: open --> closed-fixed
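
The fallback strategy described in the 2009-06-04 comment (try one ZIP reader first, then fall back to java.util.zip if it fails) might be sketched as follows. This is not the actual ZipSubCrawler code: FallbackZipReader and EntryLister are hypothetical names, and the primary reader in the demo is a stub that always fails, standing in for commons-compress so that the sketch needs no external library. The point it illustrates is that the fallback has to start again from a fresh stream over the archive, because the failed reader may already have consumed part of the input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class FallbackZipReader {

    /** Hypothetical abstraction over "a library that can list the entries of a ZIP stream". */
    interface EntryLister {
        List<String> list(InputStream in) throws Exception;
    }

    /** Fallback implementation built on java.util.zip. */
    static final EntryLister JAVA_UTIL_ZIP = in -> {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    };

    /** Tries the primary reader; if it throws, re-reads the archive with the fallback. */
    static List<String> listWithFallback(byte[] archive, EntryLister primary, EntryLister fallback) {
        try {
            return primary.list(new ByteArrayInputStream(archive));
        } catch (Exception primaryFailure) {
            try {
                // Fresh stream: the primary reader may have consumed part of the old one.
                return fallback.list(new ByteArrayInputStream(archive));
            } catch (Exception fallbackFailure) {
                IllegalStateException both =
                        new IllegalStateException("both ZIP readers failed", fallbackFailure);
                both.addSuppressed(primaryFailure);
                throw both;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("doc.txt"));
            out.write(1);
            out.closeEntry();
        }
        // Stub primary reader that always fails, to exercise the fallback path.
        EntryLister failingPrimary = in -> {
            throw new IllegalArgumentException("simulated primary failure");
        };
        System.out.println(listWithFallback(bytes.toByteArray(), failingPrimary, JAVA_UTIL_ZIP));
    }
}
```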
     
