
no permission: PDF / unknown file format: DOC

laurel
2011-04-15
2012-09-17
  • laurel

    laurel - 2011-04-15

    Hi, a few errors. "No permission to read file" for certain PDFs. I want
    DocFetcher to read all my files! How do I fix that?

    Also, some PDFs throw an "Unable to read file" error. Not many, so maybe it's nothing.

    For a lot of DOCs (they appear to be old docs, probably from 2 or 3 versions
    ago), it says "Unknown file format". I suppose it will still index the file
    names, which will help.

    Mostly I want to know if there is a way to read all the PDFs.

    Also, I know you don't want me to index the entire hard drive, but... I want
    to. Just how much is too much?

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2011-04-15

    Hello,

    As far as I know, "No permission to read file" means the creator of the PDF
    file has set things up in such a way that you, and therefore DocFetcher, have
    permission to view the PDF file, but no permission to extract text or do some
    other fancy stuff with it.
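
    For background, here is a rough sketch of how such a restriction is usually
    encoded, assuming the PDF uses the standard security handler from the PDF
    specification. There, the permissions are stored as an integer flag field
    (the /P entry), and bit 5 controls copying or extracting text. The function
    name and the sample values are illustrative only; this is not DocFetcher or
    PDFBox code.

```python
# Permission flags of the PDF standard security handler (the /P entry).
# The spec numbers bits starting at 1, so "bit 5" has the value 1 << 4.
PRINT = 1 << 2     # bit 3: print the document
MODIFY = 1 << 3    # bit 4: modify the contents
EXTRACT = 1 << 4   # bit 5: copy or otherwise extract text and graphics
ANNOTATE = 1 << 5  # bit 6: add or modify annotations


def describe_permissions(p: int) -> dict:
    """Decode a /P value into named booleans (hypothetical helper)."""
    # /P is a signed 32-bit integer; Python's & works on negative ints
    # as if they were two's complement, so no conversion is needed.
    return {
        "print": bool(p & PRINT),
        "modify": bool(p & MODIFY),
        "extract_text": bool(p & EXTRACT),
        "annotate": bool(p & ANNOTATE),
    }


if __name__ == "__main__":
    # Sample values chosen for illustration, not read from a real file.
    print(describe_permissions(-3904))  # everything above is denied
    print(describe_permissions(-44))    # printing and extraction allowed
```

    A file can be opened for viewing even when "extract_text" is false, which
    would match the situation described above: readable on screen, but not
    available to a text extractor.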

    As for the other errors with PDFs and DOC files, these are problems only the
    authors of the respective PDF and DOC extraction libraries can fix.

    If you really, really want the PDF problems to get fixed, you could report
    them on the PDFBox issue tracker: https://issues.apache.org/jira/browse/PDFBOX

    Be prepared to submit your problematic PDF files, though, otherwise those guys
    won't be able to fix anything.

    Indexing the entire hard drive is discouraged because:

    • System files will clutter up the search results.
    • DocFetcher could crash at the end of the indexing while attempting to add folder watches to a huge number of folders.
    • System files are frequently modified, so if folder watching is turned on, this will cause DocFetcher to update its index at the same rate, thus bringing the machine to its knees.
    • DocFetcher will take longer to start up because it has to load an internal tree representation of the entire hard drive into memory.
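
    To get a feeling for "how much is too much", one option is to count how
    many folders an index root would contain, since each of them may need a
    folder watch. The following is only a sketch: the function name and the
    skip list are made up for illustration, and DocFetcher itself does not
    work this way internally as far as this thread says.

```python
import os

# Hypothetical list of system folder names to leave out of the count.
SKIP_NAMES = frozenset({
    "Windows", "Program Files", "Program Files (x86)", "$Recycle.Bin",
})


def count_folders(root, skip_names=SKIP_NAMES):
    """Count root plus all subfolders, pruning any folder named in skip_names."""
    total = 0
    for _dirpath, dirnames, _filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending
        # into the skipped folders.
        dirnames[:] = [d for d in dirnames if d not in skip_names]
        total += 1
    return total


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(count_folders(root), "folders under", root)
```

    Running it once on the whole drive and once on just the document folders
    gives a rough idea of how much extra work a full-drive index would mean.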

    Best regards

    q:-) <= Tran Nam Quang (project admin)

     
