Activity for Tim Allison

  • Tim Allison modified a comment on discussion Open Discussion

    Got it. Will fix the break -> continue now. Will look into the others. We do something similar with PDF pages...we catch and store the exceptions per page but keep going. I didn't realize this was possible with jackcess/msaccess. Will update. Thank you!
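    The break -> continue change and the "catch and store the exceptions per page but keep going" pattern can be sketched roughly as follows. This is only an illustration, not the actual Tika or jackcess code: `ContinueOnError`, `ItemReader`, `Result`, and `extractAll` are hypothetical names, and the "items" stand in for tables or pages.

```java
import java.util.ArrayList;
import java.util.List;

public class ContinueOnError {

    /** Hypothetical stand-in for whatever reads a single table or page. */
    public interface ItemReader {
        String read(String name) throws Exception;
    }

    /** Holds the text that was extracted plus the per-item failures. */
    public static final class Result {
        public final List<String> contents = new ArrayList<>();
        public final List<Exception> errors = new ArrayList<>();
    }

    public static Result extractAll(List<String> names, ItemReader reader) {
        Result result = new Result();
        for (String name : names) {
            String text;
            try {
                text = reader.read(name);
            } catch (Exception e) {
                // Store the failure and move on to the next item.
                // A break here would silently drop every later item.
                result.errors.add(e);
                continue;
            }
            result.contents.add(text);
        }
        return result;
    }
}
```

    The caller can then decide what to do with `errors` (log them, attach them to metadata, rethrow the first one) after everything readable has been extracted.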

  • Tim Allison posted a comment on discussion Open Discussion

    Got it. Will fix the break -> continue now. Will look into the others. Thank you!

  • Tim Allison posted a comment on discussion Open Discussion

    Y! they do... Given that we only have a coupla hundred files and the vagaries of whatever else was going on on the server, I don't put too much stock in those numbers. But, y, it looks like quite a bit of improvement in speed.

  • Tim Allison posted a comment on discussion Open Discussion

    The results comparing our last release with the upcoming release for msaccess files are here:
    https://corpora.tika.apache.org/base/reports/tika_1_25_v_1_26_msaccess_reports.tgz

    I protected against an NPE in our new release:
    https://github.com/apache/tika/blob/branch_1x/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/JackcessExtractor.java#L150

    Everything else looks comparable. I'm attaching a list of msaccess files (some truncated). If you want to grab them, prepend: https://corpora.tika.apache.org/base/docs/...

  • Tim Allison posted a comment on discussion Open Discussion

    Whoa! Thank you!

  • Tim Allison modified a comment on discussion Open Discussion

    Y, running untrusted code on untrusted files at scale will do that to one...LOL. XML external entity attacks are different from entity expansion attacks (billion laughs).

  • Tim Allison modified a comment on discussion Open Discussion

    Y, running untrusted code on untrusted files at scale will do that to one...LOL. XML external entity attacks are different from entity expansion attacks (billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack).

  • Tim Allison posted a comment on discussion Open Discussion

    Y, running untrusted code on untrusted files at scale will do that to one... XML external entity attacks are different from entity expansion attacks (billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack).
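    To make the distinction concrete: entity expansion (billion laughs) is about internal entities blowing up in memory, while XXE is about the parser fetching external resources. A minimal sketch of guarding against both with the JDK's built-in JAXP parser might look like the following. This is an assumption-laden illustration, not what jackcess or Tika actually ship; `HardenedXml` and `newHardenedFactory` are hypothetical names, and other parser implementations (a standalone Xerces2, for example) may not accept every feature shown.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class HardenedXml {

    public static DocumentBuilderFactory newHardenedFactory()
            throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

        // Entity expansion: secure processing asks the parser to enforce
        // its limits on how far entities may expand.
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);

        // XXE: do not resolve external general or parameter entities,
        // and do not load an external DTD.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        // Related hardening: no XInclude, no inline entity expansion in the DOM.
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }
}
```

    Note that `setFeature` throws `ParserConfigurationException` when the underlying parser does not recognize a feature, so callers that must run on several parser implementations may want to set each feature in its own try/catch.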

  • Tim Allison posted a comment on discussion Open Discussion

    Thank you! Any chance you'd be willing to add this for backup protection against xxe?

  • Tim Allison posted a comment on discussion Open Discussion

    We tried to upgrade to 4.0.0 in Apache Tika, but we're running into a xerces2 problem... Any recommendations?
    https://issues.apache.org/jira/browse/TIKA-3244?focusedCommentId=17270655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17270655

    com.healthmarketscience.jackcess.crypt.InvalidCryptoConfigurationException: Failed parsing encryption descriptor
        at org.apache.tika.parser.microsoft.JackcessParserTest.testTilman(JackcessParserTest.java:103)
    Caused by: java.lang.IllegalArgumentException:...
