Y. Sorry. This is a duplicate (I think) of 150. The difference is that with the file in bug 149, we used to get a similar-looking exception in version 2.1.12 of Jackcess:
"org.apache.tika.parser.CompositeParser","org.apache.tika.parser.DefaultParser","org.apache.tika.parser.microsoft.JackcessParser"],"X-TIKA:EXCEPTION:runtime":"java.lang.IllegalStateException: invalid page number 2003\n\tat com.healthmarketscience.jackcess.impl.PageChannel.validatePageNumber(PageChannel.java:203)\n\tat com.healthmarketscience.jackcess.impl.PageChannel.readPage(PageChannel.java:214)\n\tat
...
With 2.2.0, we get a similar stacktrace, but it is happening earlier...
java.lang.IllegalStateException: invalid page number 1777\n\tat com.healthmarketscience.jackcess.impl.PageChannel.validatePageNumber(PageChannel.java:203)\n\tat com.healthmarketscience.jackcess.impl.PageChannel.readPage(PageChannel.java:214)\n\tat com.healthmarketscience.jackcess.impl.LongValueColumnImpl.readLongValue(LongValueColumnImpl.java:204)\n\tat com.healthmarketscience.jackcess.impl.LongValueColumnImpl.read(LongValueColumnImpl.java:96)\n\tat com.healthmarketscience.jackcess.impl.ColumnImpl.read(ColumnImpl.java:689)\n\tat com.healthmarketscience.jackcess.impl.TableImpl.getRowColumn(TableImpl.java:847)\n\tat com.healthmarketscience.jackcess.impl.TableImpl.getRow(TableImpl.java:753)\n\tat com.healthmarketscience.jackcess.impl.TableImpl.getRow(TableImpl.java:733)\n\tat com.healthmarketscience.jackcess.impl.CursorImpl.getCurrentRow(CursorImpl.java:699)\n\tat
In bug 150, we didn't get an exception at all in 2.1.12, but we do now. But, yes, looking at the stacktraces, they both point to the validate page number step. Sorry!
Last edit: Tim Allison 2018-12-17
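For anyone comparing traces like the two above, here is a minimal sketch (not part of Tika or Jackcess) that pulls the top frame out of a JSON-escaped stack trace so two reports can be checked for the same failing step. The class name `TopFrame` and the method `topFrame` are made up for illustration, and the sketch assumes the trace uses the literal two-character sequences `\n` and `\t` as in the output quoted above.

```java
import java.util.Optional;

class TopFrame {
    // Returns the first "at ..." frame of a stack trace whose newlines and tabs
    // are escaped as the literal character pairs \n and \t (as in Tika's JSON output).
    static Optional<String> topFrame(String escapedTrace) {
        String marker = "\\n\\tat ";            // literal backslash-n, backslash-t, "at "
        int start = escapedTrace.indexOf(marker);
        if (start < 0) {
            return Optional.empty();            // no frames present in this string
        }
        start += marker.length();
        int end = escapedTrace.indexOf("\\n", start);
        String frame = end < 0
                ? escapedTrace.substring(start)
                : escapedTrace.substring(start, end);
        return Optional.of(frame.trim());
    }

    public static void main(String[] args) {
        // Truncated copies of the two traces quoted in this comment.
        String older = "java.lang.IllegalStateException: invalid page number 2003"
                + "\\n\\tat com.healthmarketscience.jackcess.impl.PageChannel.validatePageNumber(PageChannel.java:203)"
                + "\\n\\tat com.healthmarketscience.jackcess.impl.PageChannel.readPage(PageChannel.java:214)";
        String newer = "java.lang.IllegalStateException: invalid page number 1777"
                + "\\n\\tat com.healthmarketscience.jackcess.impl.PageChannel.validatePageNumber(PageChannel.java:203)"
                + "\\n\\tat com.healthmarketscience.jackcess.impl.PageChannel.readPage(PageChannel.java:214)";

        // True when both traces fail at the same top frame.
        System.out.println(topFrame(older).equals(topFrame(newer)));
    }
}
```

Matching top frames only suggest the two reports share a failing step; the page numbers and the deeper frames can still differ, as they do here.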
This is the output in Tika 1.19.1 vs 1.20-pre-rc1
Clearly this has an exception, but it looks like we were able to extract more before hitting the exception with the earlier version of Jackcess.
Duplicate of https://sourceforge.net/p/jackcess/bugs/149/