Taking long time parsing PDF

Help
baba singh
2010-07-27
2012-09-13
  • baba singh
    2010-07-27

    Hi,

    I noticed that when a PDF is very large, OSS takes a long time to move on to
    the next content. I reduced the number of bytes read from the PDF input stream,
    but it seems that OSS still downloads the whole PDF somewhere before moving to
    the next link. The parser has already finished parsing, yet crawling of the
    next link does not start.

    Is there a solution? Maybe I am missing something.

    regards,

    bbs

     
  • I suppose you have already tried changing the "sizeLimit" value in the "parser.xml"
    file (and restarting OSS):

    <parser name="PDF parser" class="com.jaeksoft.searchlib.parser.PdfParser" sizeLimit="8388608">
    
     
  • baba singh
    2010-07-28

    Yes, I did, but I still presume it downloads the whole document. I set the
    limit to a mere 100000. It completes the parsing, but I think it does not crawl
    the next link until the complete download has finished.

    I would appreciate any hint on what I can do.
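
For anyone trying to reproduce the behaviour described in this thread, the following is a minimal Python sketch of what a size-limited read looks like when the reader really does stop at the limit. It is not OSS code: `read_limited`, `size_limit` and `chunk_size` are hypothetical names chosen for illustration, and an in-memory stream stands in for the HTTP response body. Whether the rest of a real download is still transferred afterwards depends on how the HTTP client releases the connection, which this sketch does not model.

```python
import io


def read_limited(stream, size_limit, chunk_size=8192):
    """Read at most size_limit bytes from stream, then stop.

    The remainder of the stream is deliberately left unread; nothing
    here drains it. (Hypothetical helper, for illustration only.)
    """
    buf = bytearray()
    while len(buf) < size_limit:
        # Never request more than the remaining budget.
        chunk = stream.read(min(chunk_size, size_limit - len(buf)))
        if not chunk:  # end of stream reached before the limit
            break
        buf.extend(chunk)
    return bytes(buf)


if __name__ == "__main__":
    # A 1 MB in-memory "document" standing in for a large PDF body.
    body = io.BytesIO(b"x" * 1_000_000)
    head = read_limited(body, 100_000)
    # Bytes handed to the parser, and bytes actually consumed from the stream.
    print(len(head), body.tell())
```

If the two printed numbers match, the reader itself stopped at the limit. In that case any remaining delay would have to come from somewhere else, for example from the connection being drained when it is closed, and that would be the place to look next.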