I noticed that if the PDF is very large, OSS takes a long time to parse
the next content. I reduced the size limit for reading the PDF input stream, but it seems
that OSS somewhere tries to download the whole PDF before moving to the next
link. The parser has already finished parsing, but it does not start crawling the
next link.
Is there a solution? Maybe I am missing something.
I suppose you have already tried changing the value of "sizeLimit" in the "parser.xml"
file (and restarting OSS):
<parser name="PDF parser" class="com.jaeksoft.searchlib.parser.PdfParser" sizeLimit="8388608">
Yes, I did... but I still think it downloads the whole document. I set the
limit to a mere 100000. It completes the parsing, but I think it does not crawl
the next link until the complete download has finished.
I would appreciate any hint about what I can do.
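To illustrate the difference between "parse only the first N bytes" and "download only the first N bytes", here is a minimal Java sketch of a bounded read. The class `BoundedRead` and method `readAtMost` are hypothetical and not part of OSS. Reading stops once the limit is reached, but the rest of the body is still pending on the connection. Some HTTP clients (Apache HttpClient 4.x, for example) read the remaining content when a response stream is closed so the connection can be reused, and only stop the transfer if the request is explicitly aborted. It is an assumption, not confirmed here, that the OSS crawler behaves this way, but it would match the symptom described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not OSS code): read at most `limit` bytes from a stream. */
public class BoundedRead {

    // Copies up to `limit` bytes from `in` and returns them.
    // Reading stops at the limit; any remaining bytes are left unread in `in`.
    static byte[] readAtMost(InputStream in, int limit) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int remaining = limit;
        while (remaining > 0) {
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n == -1) {
                break; // end of stream reached before the limit
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a 1,000,000-byte document body; only the first 100000 bytes are read.
        InputStream body = new ByteArrayInputStream(new byte[1_000_000]);
        byte[] head = readAtMost(body, 100_000);
        System.out.println(head.length);      // 100000
        System.out.println(body.available()); // 900000 bytes were never read
    }
}
```

In a real crawler, stopping the download early would additionally require aborting the HTTP request (or closing the underlying connection) after the bounded read, rather than letting the client drain the rest of the body.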