#141 Incorrect HTML parsing ?

Milestone: resolved
Status: closed-fixed
Owner: nobody
Labels: htdig (103)
Priority: 5
Created: 2002-11-06
Updated: 2002-11-06
Creator: Anonymous
Private: No

When there is a '<' in a JavaScript section of an HTML file,
the rest of the file is not parsed properly.

In the following example, the words in the body are not
indexed.

----------
<html>
<head>
<script language="JavaScript">
var foo = 0;
for (var i=0; i < 5; i++) {
foo=i;
}
</script>
</head>

<body>
All the following text won't be indexed. The < in the for
loop has been treated as an opening tag?
</body>
</html>
-----------

I presume that the '<' in the loop condition is being
treated as the start of an HTML tag.
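That guess can be illustrated with a deliberately naive model of a word extractor, run on a trimmed version of the example. This is a hypothetical sketch, not htdig's actual parser: it assumes that a tag runs from any `<` to the next `>`, and that text is skipped from a `<script>` tag until a tag named `/script` is seen. Under those assumptions, `< 5; i++) { ... }</script>` is consumed as one bogus tag, the real `</script>` is never recognized, and every word after it is dropped, which matches the reported symptom.

```python
import re

# Hypothetical, deliberately naive model (NOT htdig's code):
#   * a "tag" is anything from '<' to the next '>', and
#   * text is ignored from a <script> tag until a tag named /script is seen.
TAG_RE = re.compile(r"<([^>]*)>")
WORD_RE = re.compile(r"[A-Za-z]+")


def tag_name(tag_body: str) -> str:
    """Return the lower-cased first token of a tag's contents ('' if empty)."""
    parts = tag_body.split()
    return parts[0].lower() if parts else ""


def naive_indexed_words(html: str) -> list[str]:
    words: list[str] = []
    in_script = False
    pos = 0
    for match in TAG_RE.finditer(html):
        if not in_script:
            # Collect words from the text between the previous tag and this one.
            words.extend(WORD_RE.findall(html[pos:match.start()]))
        name = tag_name(match.group(1))
        if name == "script":
            in_script = True
        elif name == "/script":
            in_script = False
        pos = match.end()
    if not in_script:
        words.extend(WORD_RE.findall(html[pos:]))
    return words


SAMPLE = """<html>
<head>
<script language="JavaScript">
var foo = 0;
for (var i=0; i < 5; i++) {
foo=i;
}
</script>
</head>

<body>
All the following text won't be indexed.
</body>
</html>
"""

# "< 5; i++) {...}\n</script>" is read as a single tag named "5;", so the
# model never leaves script mode and the body text is lost.
print(naive_indexed_words(SAMPLE))  # []
# Without the '<' in the loop, the same model does index the body.
print(naive_indexed_words(SAMPLE.replace("i < 5", "i != 5")))
```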

I hope I've been clear enough, and that this is not
related to a configuration setting that I have
missed.

Regards

Pierre

pierre.siwek@eurocontrol.int

Discussion

  • Gilles Detillieux

    • milestone: --> resolved
    • status: open --> closed-fixed
     
  • Gilles Detillieux

    Logged In: YES
    user_id=149687

    This is a known bug. It is fixed in the 3.2.0b4 snapshots
    and will be fixed in the next 3.1.x release. The patch is here:

    ftp://ftp.ccsf.org/htdig-patches/3.1.6/JavaScript.0

    However, the problem can be avoided altogether by
    setting up in-line JavaScript properly, as described in

    http://www.htdig.org/FAQ.html#q4.26

     
  • Gilles Detillieux

    Logged In: YES
    user_id=149687

    This is also discussed in the bug report
    [ 613849 ] Problem parsing JavaScript

    The summary "Incorrect HTML parsing ?"
    is a tad misleading, as htdig parses HTML
    correctly. JavaScript is NOT HTML.

     
  • Nobody/Anonymous

    Logged In: NO

    purpose of html

     
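One way a parser can avoid this class of bug, in line with the point that JavaScript is not HTML, is to treat everything after a `<script>` start tag as raw text and search directly for the closing `</script>` rather than scanning the script for tags. The sketch below is hypothetical: it is not the JavaScript.0 patch (whose contents are not reproduced here) nor htdig's code, and the function and pattern names are illustrative only. It also treats a `<` that is not followed by a letter, `/` or `!` as plain text, and skips `<!-- ... -->` comments.

```python
import re

# Hypothetical sketch (NOT the JavaScript.0 patch or htdig's real code).
# Once a <script> start tag is seen, its contents are treated as raw text:
# the scanner jumps straight to the closing </script> instead of looking for
# tags inside the JavaScript.
WORD_RE = re.compile(r"[A-Za-z]+")
SCRIPT_END_RE = re.compile(r"</script\s*>", re.IGNORECASE)


def indexed_words(html: str) -> list[str]:
    words: list[str] = []
    pos = 0
    while True:
        lt = html.find("<", pos)
        if lt == -1:
            words.extend(WORD_RE.findall(html[pos:]))
            return words
        words.extend(WORD_RE.findall(html[pos:lt]))
        nxt = html[lt + 1:lt + 2]
        if not nxt.isalpha() and nxt not in ("/", "!"):
            # A '<' not followed by a letter, '/' or '!' is ordinary text.
            pos = lt + 1
            continue
        if html.startswith("<!--", lt):
            # Skip an HTML comment up to the next "-->".
            end = html.find("-->", lt + 4)
            pos = len(html) if end == -1 else end + 3
            continue
        gt = html.find(">", lt + 1)
        if gt == -1:
            return words  # unterminated tag: ignore the rest
        parts = html[lt + 1:gt].split()
        name = parts[0].lower() if parts else ""
        pos = gt + 1
        if name == "script":
            # Raw-text handling: resume after the closing </script> tag.
            end = SCRIPT_END_RE.search(html, pos)
            pos = len(html) if end is None else end.end()


SAMPLE = """<html>
<head>
<script language="JavaScript">
var foo = 0;
for (var i=0; i < 5; i++) {
foo=i;
}
</script>
</head>

<body>
All the following text should now be indexed. The < in this
sentence is plain text, not a tag.
</body>
</html>
"""

print(indexed_words(SAMPLE))
```

Under the same assumptions, wrapping in-line script in `<!-- ... -->` also helps a scanner that honors comments, because the `<` inside the comment is never examined as a tag start. Whether that is exactly the setup FAQ q4.26 describes is an assumption, not something this thread confirms.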
