Re: [Htmlparser-developer] Page.getLine() seems broken.

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

It's zero based, unlike the usual text editor counting.

Matthew Buckett wrote:

> Page.getLine always seems to return the previous line. Attached are 
> some  tests that show this. It seems that the documentation on 
> PageIndex says it should be the index the the first character of the 
> line but it is actually set as being the position of the newline.
>
> I've attached a fix to Page.getLine() that makes it work but I don't 
> know if the correct fix change PageIndex so that the index of the 
> start of the line is put in it instead.
>
>------------------------------------------------------------------------
>
>Index: Page.java
>===================================================================
>RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
>retrieving revision 1.51
>diff -u -r1.51 Page.java
>--- Page.java	20 Jun 2005 01:56:32 -0000	1.51
>+++ Page.java	28 Sep 2005 16:16:14 -0000
>@@ -1106,12 +1106,12 @@
>         size = mIndex.size ();
>         if (line < size)
>         {
>-            start = mIndex.elementAt (line);
>-            line++;
>-            if (line <= size)
>-                end = mIndex.elementAt (line);
>+            end = mIndex.elementAt (line);
>+            line--;
>+            if (line >= 0)
>+                start = mIndex.elementAt (line);
>             else
>-                end = mSource.offset ();
>+                start = 0;
>         }
>         else // current line
>         {
>  
>
>------------------------------------------------------------------------
>
>/* ======================================================================
>The Bodington System Software License, Version 1.0
> 
>Copyright (c) 2001 The University of Leeds.  All rights reserved.
> 
>Redistribution and use in source and binary forms, with or without
>modification, are permitted provided that the following conditions are
>met:
> 
>1.  Redistributions of source code must retain the above copyright notice,
>this list of conditions and the following disclaimer.
> 
>2.  Redistributions in binary form must reproduce the above copyright
>notice, this list of conditions and the following disclaimer in the
>documentation and/or other materials provided with the distribution.
> 
>3.  The end-user documentation included with the redistribution, if any,
>must include the following acknowledgement:  "This product includes
>software developed by the University of Leeds
>(http://www.bodington.org/)."  Alternately, this acknowledgement may
>appear in the software itself, if and wherever such third-party
>acknowledgements normally appear.
> 
>4.  The names "Bodington", "Nathan Bodington", "Bodington System",
>"Bodington Open Source Project", and "The University of Leeds" must not be
>used to endorse or promote products derived from this software without
>prior written permission. For written permission, please contact
>d.g...@le....
> 
>5.  The name "Bodington" may not appear in the name of products derived
>from this software without prior written permission of the University of
>Leeds.
> 
>THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
>WARRANTIES, INCLUDING, BUT NOT LIMITED TO,  TITLE,  THE IMPLIED WARRANTIES
>OF QUALITY  AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO
>EVENT SHALL THE UNIVERSITY OF LEEDS OR ITS CONTRIBUTORS BE LIABLE FOR
>ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
>GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
>STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
>ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
>POSSIBILITY OF SUCH DAMAGE.
>=========================================================
> 
>This software was originally created by the University of Leeds and may contain voluntary
>contributions from others.  For more information on the Bodington Open Source Project, please
>see http://bodington.org/
> 
>====================================================================== */
>
>package org.htmlparser.tests;
>
>import junit.framework.TestCase;
>
>import org.htmlparser.Node;
>import org.htmlparser.Parser;
>import org.htmlparser.filters.TagNameFilter;
>import org.htmlparser.util.NodeList;
>import org.htmlparser.util.ParserException;
>
>public class LineTests extends TestCase
>{
>    public void testGetLine1() throws ParserException   {
>        Parser parser = getParser();
>        NodeList list = parser.parse(new TagNameFilter("h1"));
>        Node node = list.elementAt(0);
>        assertEquals("<h1>Line 1</h1>\n", node.getPage().getLine(
>            node.getStartPosition()));
>    }
>    
>    public void testGetLine2() throws ParserException   {
>        Parser parser = getParser();
>        NodeList list = parser.parse(new TagNameFilter("h2"));
>        Node node = list.elementAt(0);
>        assertEquals("<h2>Line 2</h2>\n", node.getPage().getLine(
>            node.getStartPosition()));
>    }
>    
>    public void testGetLine3() throws ParserException   {
>        Parser parser = getParser();
>        NodeList list = parser.parse(new TagNameFilter("h3"));
>        Node node = list.elementAt(0);
>        assertEquals("<h3>Line 3</h3>\n", node.getPage().getLine(
>            node.getStartPosition()));
>    }
>    
>    public Parser getParser()
>    {
>        Parser parser = new Parser();
>        try
>        {
>            parser.setInputHTML(
>                "<h1>Line 1</h1>\n"+
>                "<h2>Line 2</h2>\n"+
>                "<h3>Line 3</h3>\n"
>                );
>        }
>        catch (ParserException e)
>        {
>            fail("Failed to parse");
>        }
>        return parser;
>    }
>}
>  
>