#103 parse_doc.pl breaks accented words

resolved
closed-fixed
other (29)
5
2002-01-28
2002-01-27
Anonymous
No

A number of people have complained about parse_doc.pl
breaking words wherever accented characters occur. The
reason for this is the lack of a

use locale;

directive in parse_doc.pl. When one inserts this
directive (just after the #! line for example),
parse_doc.pl functions as it should.
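
For anyone who wants to see the effect in isolation, here is a minimal sketch. It is not taken from parse_doc.pl; the sub names words_plain and words_locale and the sample text are made up for illustration. It assumes the input is ISO-8859-1 bytes and that the locale comes from the environment (LC_ALL, LC_CTYPE or LANG).

```perl
#!/usr/bin/perl
# Hypothetical demo, not code from parse_doc.pl.
use strict;
use warnings;

# Without "use locale": in a byte string, \w matches only ASCII word
# characters, so an accented Latin-1 byte acts as a word separator.
sub words_plain {
    my ($text) = @_;
    return $text =~ /(\w+)/g;
}

# With "use locale" (lexically scoped to this sub): \w follows the
# LC_CTYPE locale that perl picked up from the environment at startup.
sub words_locale {
    use locale;
    my ($text) = @_;
    return $text =~ /(\w+)/g;
}

my $text = "caf\xe9 cr\xe8me";    # "café crème" as ISO-8859-1 bytes

print join("|", words_plain($text)),  "\n";   # caf|cr|me
print join("|", words_locale($text)), "\n";   # depends on the active locale
```

Note that the second line only keeps the accented words whole when the environment selects a locale whose character set matches the documents. Under the C locale, both lines come out the same.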

Discussion

    • assigned_to: nobody --> grdetil
    • status: open --> closed-fixed
     
  • Logged In: YES
    user_id=149687

    Geoff added this directive to the parse_doc.pl script that will be bundled with 3.1.6, but for crying out loud
    this script has been obsolete since 3.1.4 was released over two years ago. Why are so many users still
    wasting their time with this old hack instead of switching to conv_doc.pl or doc2html.pl?

     
  • Logged In: NO

    > but for crying out loud this script has been obsolete
    > since 3.1.4 was released over two years ago. Why are
    > so many users still wasting their time with this old
    > hack instead of switching to conv_doc.pl or doc2html.pl?

    I can think of a number of reasons why parse_doc.pl is still
    in use.

    0. It is in the distribution.

    1. It works (in most cases).

    2. There is no indication in the documentation that
    parse_doc.pl is deprecated.

     
  • Logged In: YES
    user_id=149687

    Fair enough. In the 3.1.6 release, parse_doc.pl will still
    be included, just because it's likely still the only
    half-decent example of how to use the external parser
    interface in htdig, but its comments now clearly state that
    external converters are preferable. There are now at least 3
    mentions in the docs that parse_doc.pl is deprecated, and
    I've rewritten FAQ 4.8 & 4.9 to emphasize doc2html.pl over
    parse_doc.pl.