#75 Hi, I get the following error when I try to extract the text from PDF. I'm sorry but I can't attach the problematic PDF, it has confidential information.

Milestone: 0.1.2.1
Status: closed-fixed
Owner: None
Priority: 5
Updated: 2015-05-27
Created: 2015-05-14
Private: No

Hi, I get the following error when I try to extract the text from a PDF.
I'm sorry, but I can't attach the problematic PDF because it contains confidential information.

EXCEPTION: System.InvalidOperationException: Failed to compare two elements in the array. ---> System.InvalidOperationException: Nullable object must have a value.
   at System.Nullable`1.get_Value()
   at org.pdfclown.tools.TextExtractor.TextStringPositionComparer`1.Compare(T textString1, T textString2)
   at System.Collections.Generic.ArraySortHelper`1.PickPivotAndPartition(T[] keys, Int32 lo, Int32 hi, IComparer`1 comparer)
   at System.Collections.Generic.ArraySortHelper`1.IntroSort(T[] keys, Int32 lo, Int32 hi, Int32 depthLimit, IComparer`1 comparer)
   at System.Collections.Generic.ArraySortHelper`1.IntrospectiveSort(T[] keys, Int32 left, Int32 length, IComparer`1 comparer)
   at System.Collections.Generic.ArraySortHelper`1.Sort(T[] keys, Int32 index, Int32 length, IComparer`1 comparer)
   --- End of inner exception stack trace ---
   at System.Collections.Generic.ArraySortHelper`1.Sort(T[] keys, Int32 index, Int32 length, IComparer`1 comparer)
   at System.Collections.Generic.List`1.Sort(Int32 index, Int32 count, IComparer`1 comparer)
   at org.pdfclown.tools.TextExtractor.Sort(List`1 rawTextStrings, List`1 textStrings)
   at org.pdfclown.tools.TextExtractor.Extract(IContentContext contentContext)

Discussion

  • Stefano Chizzolini

    Rarely, PDF files contain empty show-text operations whose inherently-undefined bounding box was not properly handled; the fix filters out such operations.

    Fixed on 0.1.2-Fix branch (rev 218) and 0.2.0 trunk (rev 219).

    Thank you.

     
  • Stefano Chizzolini

    • status: open --> closed-fixed
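
For readers hitting the same trace on an unfixed build, here is a minimal sketch of the failure mode and of the kind of filtering the fix describes. It is not the PDF Clown source: `Rect`, `Fragment`, `PositionComparer` and `TextSorting.SortByPosition` are hypothetical stand-ins, and the row-then-column ordering is an assumption made only so the example has something to sort by.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration; these are NOT the PDF Clown types.
public readonly struct Rect
{
    public Rect(double x, double y) { X = x; Y = y; }
    public double X { get; }
    public double Y { get; }
}

public sealed class Fragment
{
    public Fragment(string text, Rect? bounds) { Text = text; Bounds = bounds; }
    public string Text { get; }
    // Null models an empty show-text operation, which has no bounding box.
    public Rect? Bounds { get; }
}

public sealed class PositionComparer : IComparer<Fragment>
{
    public int Compare(Fragment a, Fragment b)
    {
        // Reading .Value on a null Nullable<T> throws InvalidOperationException
        // ("Nullable object must have a value"); List<T>.Sort then wraps it in
        // another InvalidOperationException, as in the reported trace.
        Rect ra = a.Bounds.Value;
        Rect rb = b.Bounds.Value;
        int byRow = ra.Y.CompareTo(rb.Y);          // assumed ordering: by row first
        return byRow != 0 ? byRow : ra.X.CompareTo(rb.X);
    }
}

public static class TextSorting
{
    // Drops fragments without a bounding box before sorting, so the comparer
    // never sees a null Bounds. The input sequence is left unmodified.
    public static List<Fragment> SortByPosition(IEnumerable<Fragment> raw)
    {
        List<Fragment> sortable = raw.Where(f => f.Bounds.HasValue).ToList();
        sortable.Sort(new PositionComparer());
        return sortable;
    }
}
```

Sorting a list that contains a `Fragment` with null `Bounds` directly with `PositionComparer` fails in the way reported above, while `TextSorting.SortByPosition` returns only the fragments that have a position. Filtering before the sort, rather than making the comparer tolerate nulls, keeps the comparer a consistent ordering over the elements it actually receives.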
     
