#171 htdig ignores PDF docs when indexing

need info
htdig (103)
Tobias G. Eberle

I have ht://Dig 3.1.5 on SuSE Linux 8.1. Xpdf is installed
correctly. When building the ht://Dig databases, no PDF
document is indexed. My site www.wanilla.ch has the
following structure:

support.html -> support_article.php

The page support_article.php includes links to PDF files. It
is necessary to log in first to access
support_article.php (customer area). The login procedure
works with PHP and sessions.

To give access from outside the customer area (login), I put
a link (href) to support_article.php on a public page.

Do you have any idea why indexing ignores my PDF links?



  • Logged In: NO

    Check in the conf file if the max_doc_size is large enough
    for your file.
    Also run htdig with the -vv option and inspect the output.
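
The size check suggested above can be done mechanically. The following is a minimal Python sketch, not part of ht://Dig: it reads `max_doc_size` from configuration text and lists the files that exceed it. The function names are hypothetical, and the fallback of 100000 bytes is an assumption; check the documentation for your version to confirm the real default.

```python
import os
import re

# Assumed fallback when the attribute is absent; verify against your ht://Dig docs.
ASSUMED_DEFAULT_MAX_DOC_SIZE = 100000


def read_max_doc_size(conf_text, default=ASSUMED_DEFAULT_MAX_DOC_SIZE):
    """Return the first max_doc_size value found in the text, or the default."""
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.match(r"max_doc_size\s*:\s*(\d+)\s*$", line)
        if match:
            return int(match.group(1))
    return default


def oversized_files(paths, limit):
    """Return the paths whose size on disk exceeds limit bytes."""
    return [p for p in paths if os.path.getsize(p) > limit]
```

Any PDF reported by `oversized_files` is a candidate for raising `max_doc_size` in the conf file before running the dig again.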

  • Lachlan Andrew

    Logged In: YES

    Does the rest of the PHP page get parsed? By default,
    ht://Dig parses HTML, but not PHP. If you want to parse
    the PHP document (to get to the PDF document) then you need
    to set up the external parsers to do that.

  • Neal Richter

    Logged In: YES

    Any update? Neal

  • Logged In: YES

    I set up an external parser with the following lines in my
    configuration:
    # external parser (standard per script)
    external_parsers: application/pdf->text/html /home/wanillach/public_html/cgi-bin/doc2html.cgi
    # end external parser

    On my local machine I have installed ht://Dig version 3.1.6,
    and the external parser works correctly for HTML and PHP
    files. My provider uses ht://Dig version 3.1.5, and there the
    external parser is ignored.

    • assigned_to: nobody --> grdetil
    • status: open --> closed-works-for-me
  • Logged In: YES

    If it works in 3.1.6, then it's not a bug, is it?
    Therefore, this isn't the best forum to discuss what appears
    to be a configuration issue, or some customization that your
    provider may have made to their build of 3.1.5.

    I suggest you take a careful look at your doc2html setup on
    your provider's system to see if there's a problem there.
    For starters, it probably shouldn't be installed in cgi-bin,
    as it's not a CGI program. Also try running doc2html
    manually on that system to see if it produces output.
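
The manual run suggested above can be sketched in Python as follows. This is an illustration only, under one assumption: that the converter is invoked with four positional arguments (document path, MIME type, URL, configuration file path). The function name is hypothetical; confirm the calling convention against your converter's documentation.

```python
import subprocess


def converter_output(converter_cmd, doc_path, content_type, url, conf_path):
    """Run an external converter by hand and return (exit code, stdout text).

    Assumption: the converter takes the document path, MIME type, URL, and
    configuration file path as four positional arguments.
    """
    result = subprocess.run(
        list(converter_cmd) + [doc_path, content_type, url, conf_path],
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,
        timeout=60,
    )
    return result.returncode, result.stdout
```

A call might look like `converter_output(["/path/to/doc2html"], "/tmp/sample.pdf", "application/pdf", "http://www.example.com/sample.pdf", "/path/to/htdig.conf")`, where every path is a placeholder. Empty output or a nonzero exit code points to a problem in the converter setup rather than in the indexer.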

    If you're stuck, please take it up on the htdig-general
    mailing list, but not until you've searched that list's
    archives for troubleshooting hints, which have been
    discussed there many times before.