Got it. Will fix the break -> continue now. Will look into the others. We do something similar with PDF pages...we catch and store the exceptions per page but keep going. I didn't realize this was possible with jackcess/msaccess. Will update. Thank you!
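The per-item pattern mentioned above (catch and store the exception, then keep going rather than breaking out of the loop) might look roughly like the sketch below. All class and method names here are hypothetical and are not taken from Tika or Jackcess. The sketch assumes the per-item work fails with unchecked exceptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Tika or Jackcess.
final class PerItemErrorCollector {

    /** Holds the values that were extracted and the exceptions hit along the way. */
    static final class Result<R> {
        final List<R> values = new ArrayList<>();
        final List<Exception> errors = new ArrayList<>();
    }

    /**
     * Applies the extractor to every item. A failure on one item is recorded
     * and the loop moves on to the next item instead of stopping.
     */
    static <T, R> Result<R> processAll(Iterable<T> items,
                                       Function<? super T, ? extends R> extractor) {
        Result<R> result = new Result<>();
        for (T item : items) {
            R value;
            try {
                value = extractor.apply(item);
            } catch (RuntimeException e) {
                result.errors.add(e); // store the problem for later reporting
                continue;             // "continue", not "break": keep processing
            }
            result.values.add(value);
        }
        return result;
    }
}
```

For example, `PerItemErrorCollector.processAll(Arrays.asList("1", "x", "3"), Integer::parseInt)` collects the two parsable values and records one `NumberFormatException` for `"x"`.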
Y! they do... Given that we only have a coupla hundred files and the vagaries of whatever else was going on on the server, I don't put too much stock in those numbers. But, y, it looks like quite a bit of improvement in speed.
The results comparing our last release with the upcoming release for msaccess files are here: https://corpora.tika.apache.org/base/reports/tika_1_25_v_1_26_msaccess_reports.tgz
I protected against an NPE in our new release: https://github.com/apache/tika/blob/branch_1x/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/JackcessExtractor.java#L150
Everything else looks comparable. I'm attaching a list of msaccess files (some truncated). If you want to grab them, prepend: https://corpora.tika.apache.org/base/docs/...
Whoa! Thank you!
Y, running untrusted code on untrusted files at scale will do that to one...LOL. XML external entity (XXE) attacks are different from entity expansion attacks (billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack).
Thank you! Any chance you'd be willing to add this for backup protection against XXE?
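The thread does not show which change "this" refers to, so the following is only a sketch of one common JAXP hardening approach, not necessarily what was proposed. It assumes a Xerces-style parser (such as the JDK default) that recognizes the feature URIs below. The class `HardenedXml` and its methods are hypothetical names. Features the parser does not recognize are skipped rather than treated as fatal.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Tika or Jackcess.
final class HardenedXml {

    /** Builds a factory with DOCTYPEs and external entities switched off where supported. */
    static DocumentBuilderFactory newHardenedFactory() {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Secure processing puts limits on entity expansion (billion laughs).
        trySetFeature(f, XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE declarations blocks both XXE and entity expansion.
        trySetFeature(f, "http://apache.org/xml/features/disallow-doctype-decl", true);
        // Backup settings in case DOCTYPEs cannot be rejected outright.
        trySetFeature(f, "http://xml.org/sax/features/external-general-entities", false);
        trySetFeature(f, "http://xml.org/sax/features/external-parameter-entities", false);
        trySetFeature(f, "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        f.setXIncludeAware(false);
        f.setExpandEntityReferences(false);
        return f;
    }

    private static void trySetFeature(DocumentBuilderFactory f, String name, boolean value) {
        try {
            f.setFeature(name, value);
        } catch (ParserConfigurationException e) {
            // Assumption: an unsupported feature is tolerable because the others still apply.
        }
    }

    /** Returns true if the hardened parser accepts the document, false if it rejects it. */
    static boolean parses(String xml) {
        try {
            newHardenedFactory().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the JDK's built-in parser, a plain document such as `<a/>` should still parse, while a document that declares a DOCTYPE with an external entity should be rejected before anything is resolved.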
We tried to upgrade to 4.0.0 in Apache Tika, but we're running into a Xerces2 problem... Any recommendations?
https://issues.apache.org/jira/browse/TIKA-3244?focusedCommentId=17270655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17270655
com.healthmarketscience.jackcess.crypt.InvalidCryptoConfigurationException: Failed parsing encryption descriptor
	at org.apache.tika.parser.microsoft.JackcessParserTest.testTilman(JackcessParserTest.java:103)
Caused by: java.lang.IllegalArgumentException:...