[posted and mailed]
> Dominique Fourtune wrote:
> I'm using htdig 3.1.6, to parse html pages created by Apache
> I can't merge pdf files, I always get the error message "Deleted, no excerpt"
> I'm using doc2html.pl, it is OK for .doc files, but not for pdf files
> pdf2html.pl on command line parses pdf files and creates html files
> I found this old post:
| According to Paul COURBIS:
| > When I run htmerge, I get a lot of messages :
| > Deleted, no excerpt: xxx/http...
| > What does it mean ? Why does htmerge suppress so many documents from
| > database ? As far as I understand english it seems that it means
| > there's no keyword for these pages, despite the fact that when I
| > to it there's a lot of text...
| The most common causes of this are:
| - a noindex directive somewhere in the document
| - the document was disallowed by robots.txt
| - the server_max_docs limit was reached before this document could be
| You'd need to correlate the htmerge -v output back to the htdig -v (or
| output to see which of these conditions occurred.
> I think the first reason is the right one (I have no robots.txt), but I
> need help to go further: what is a noindex directive?
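A noindex directive is usually a robots meta tag in the page's head, such as `<meta name="robots" content="noindex">`. As far as I know, ht://Dig also honors its own `htdig-noindex` meta name. To check whether a converted page carries either one, here is a minimal Python sketch. The class name, function name, and sample HTML are made up for illustration and are not part of ht://Dig.

```python
from html.parser import HTMLParser


class NoindexFinder(HTMLParser):
    """Collect <meta> tags that ask an indexer to skip the page."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        a = {k.lower(): (v or "") for k, v in attrs}
        name = a.get("name", "").lower()
        content = a.get("content", "").lower()
        if name == "htdig-noindex" or (name == "robots" and "noindex" in content):
            self.hits.append((name, content))


def find_noindex(html_text):
    """Return a list of (name, content) pairs for noindex-style meta tags."""
    finder = NoindexFinder()
    finder.feed(html_text)
    return finder.hits


# Hypothetical sample page, for illustration only.
sample = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(find_noindex(sample))
```

An empty list means no such tag was found in that page.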
But I'd rather think it's the max_doc_size
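For reference, max_doc_size is the htdig.conf attribute that caps how many bytes of each document htdig reads. A PDF cut off at that limit generally cannot be converted, so no excerpt is produced. To see which files are affected, something like the Python sketch below can list the PDFs above a given size. The function name is made up, and the limit you pass in should be whatever max_doc_size is set to in your own configuration.

```python
import os


def oversized_files(root, limit, suffix=".pdf"):
    """Return sorted (path, size) pairs for files under root larger than limit bytes."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                found.append((path, size))
    return sorted(found)
```

Calling `oversized_files("/path/to/docroot", 100000)` would list every PDF above 100000 bytes. Both the path and that figure are placeholders. If files turn up, raising max_doc_size in htdig.conf above the largest PDF and re-running htdig is the thing to try.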
One OS to rule them all | Martin Vorlaender | VMS & WNT
One OS to find them | work: mv@...
One OS to bring them all |
And in the Darkness bind them.| home: martin@...