Hi, a few errors. "No permission to read file" for certain PDFs. I want
DocFetcher to read all my files! How do I fix that?
Also, some PDFs throw an "Unable to read file" error. Not many, maybe it's nothing.
For a lot of DOCs (they appear to be old docs, probably 2 or 3 versions ago), it
says "Unknown file format". I suppose it will still index the file names, which
will help.
Mostly I want to know if there is a way to read all the PDFs.
Also, I know you don't want me to index the entire hard drive, but... I want
to. Just how much is too much?
As far as I know, "No permission to read file" means the creator of the PDF
file has set things up in such a way that you, and therefore DocFetcher, have
permission to view the PDF file, but no permission to extract text or do some
other fancy stuff with it.
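As for the other errors with PDFs and DOC files, these are problems only the
authors of the respective PDF and DOC extraction libraries can fix.
If you really, really want the PDF problems to get fixed, you could report
them on the PDFBox issue tracker: https://issues.apache.org/jira/browse/PDFBOX
Be prepared to submit your problematic PDF files, though, otherwise those guys
won't be able to fix anything.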
Indexing the entire hard drive is discouraged because:
- System files will clutter up the search results.
- DocFetcher could crash at the end of indexing while attempting to add folder watches to a huge number of folders.
- System files are frequently modified, so if folder watching is turned on, this will cause DocFetcher to update its index at the same rate, thus bringing the machine to its knees.
- DocFetcher will take longer to start up because it has to load an internal tree representation of the entire hard drive into memory.
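
As for "how much is too much": I can't give a hard number, but since two of the points above depend on the number of folders, a rough way to get a feeling for it is to count the folders under the location you want to index. Below is a minimal sketch in Python, not anything that ships with DocFetcher. The function name `count_folders` and the `skip` parameter are made up for illustration, and the folder names in the usage note are only examples.

```python
import os


def count_folders(root, skip=()):
    """Count all directories below root, ignoring any whose name is in skip.

    Illustrative helper only; it is not part of DocFetcher.
    """
    total = 0
    for _dirpath, dirnames, _filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped folders.
        dirnames[:] = [d for d in dirnames if d not in skip]
        total += len(dirnames)
    return total
```

For example, comparing `count_folders("C:\\")` with `count_folders("C:\\", skip={"Windows", "Program Files"})` shows how many of the folders are system folders you could leave out of the index.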
Best regards
q:-) <= Tran Nam Quang (project admin)