[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer PageIndex.java,1.10,1.11 package.html,1.7,1.8

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1:/tmp/cvs-serv7966

Modified Files:
	PageIndex.java package.html 
Log Message:
Doco update. Move the lexer from future tense to current.

Index: PageIndex.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/PageIndex.java,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** PageIndex.java	29 Sep 2003 00:00:39 -0000	1.10
--- PageIndex.java	26 Oct 2003 17:58:25 -0000	1.11
***************
*** 39,45 ****

  /**
!  * A sorted array of integers which are the positions of end of line characters.
!  * Maintains a list of integers which are (the positions of the first
!  * characters of each line.
   * To facilitate processing the first element should be maintained at position 0.
   * Facilities to add, remove, search and determine row and column are provided.
--- 39,43 ----

  /**
!  * A sorted array of integers, the positions of the first characters of each line.
   * To facilitate processing the first element should be maintained at position 0.
   * Facilities to add, remove, search and determine row and column are provided.

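The revised comment describes a sorted array of line-start offsets, with element 0 kept at position 0, that supports adding entries and finding row and column. A minimal sketch of that idea follows. It is not the actual PageIndex code: the class name LineIndexSketch and its methods add, row and column are hypothetical, and the binary search is an assumed lookup strategy.

```java
import java.util.Arrays;

/** Hypothetical sketch only; not the real org.htmlparser.lexer.PageIndex. */
final class LineIndexSketch {
    // Sorted offsets of the first character of each line; element 0 is always 0.
    private int[] starts = {0};
    private int size = 1;

    /** Records the offset of the first character of a line, keeping the array sorted. */
    void add(int offset) {
        if (offset < 0) throw new IllegalArgumentException("negative offset");
        int pos = Arrays.binarySearch(starts, 0, size, offset);
        if (pos >= 0) return;                 // already recorded
        int insertAt = -(pos + 1);            // insertion point reported by binarySearch
        if (size == starts.length) starts = Arrays.copyOf(starts, size * 2);
        System.arraycopy(starts, insertAt, starts, insertAt + 1, size - insertAt);
        starts[insertAt] = offset;
        size++;
    }

    /** Zero-based row of the line that contains the given character offset. */
    int row(int offset) {
        if (offset < 0) throw new IllegalArgumentException("negative offset");
        int pos = Arrays.binarySearch(starts, 0, size, offset);
        // Exact hit: the offset starts a line. Otherwise take the line start just before it.
        return pos >= 0 ? pos : -(pos + 1) - 1;
    }

    /** Zero-based column of the given character offset within its line. */
    int column(int offset) {
        return offset - starts[row(offset)];
    }
}
```

Because element 0 is always 0, every non-negative offset has a line start at or before it, so row never has to handle an "offset before the first line" case.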
Index: package.html
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/package.html,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** package.html	22 Sep 2003 02:39:59 -0000	1.7
--- package.html	26 Oct 2003 17:58:25 -0000	1.8
***************
*** 39,44 ****
  </HEAD>
  <BODY>
! The lexer package will eventually be the base level I/O subsystem.
! <EM>It is currently under development.</EM>
  <P>The lexer package is responsible for reading characters from the HTML source
  and identifying the node lexemes. For example, the HTML code below would return
--- 39,43 ----
  </HEAD>
  <BODY>
! The lexer package is the base level I/O subsystem.
  <P>The lexer package is responsible for reading characters from the HTML source
  and identifying the node lexemes. For example, the HTML code below would return
***************
*** 98,110 ****
  <DD><B>Adjacent nodes have no characters between them.</B> The list of nodes forms an
  uninterrupted chain that, by start and end definitions, completely covers the
! characters that were read from the HTML source. Despite this, the nodes are not
! stored in a linked list, but rather an array to ease any editing tasks that may
! be performed.
  <DT>Text Fidelity
! <DD>Besides complete coverage, the <B>nodes do not contain copies of the text</B>,
! but instead simply contain offsets into a single large buffer that contains the
! text read from the HTML source. Even within tags, the attributes list can
! contain whitespace, thus there is no lost whitespace or text formatting
! either outside or within tags. Upper and lower case text is preserved.
  <DT>Line Endings
  <DD><B>End of line characters are just whitespace.</B> There is no distinction
--- 97,108 ----
  <DD><B>Adjacent nodes have no characters between them.</B> The list of nodes forms an
  uninterrupted chain that, by start and end definitions, completely covers the
! characters that were read from the HTML source.
  <DT>Text Fidelity
! <DD>Besides complete coverage, the <B>nodes do not initially contain copies of 
! the text</B>, but instead simply contain offsets into a single large buffer
! that contains the text read from the HTML source. Even within tags, the
! attributes list can contain whitespace, thus there is no lost whitespace or
! text formatting either outside or within tags. Upper and lower case text is
! preserved.
  <DT>Line Endings
  <DD><B>End of line characters are just whitespace.</B> There is no distinction
***************
*** 127,138 ****
  all that's needed for a low level parse of the HTML source. In previous
  implementations, the attributes were parsed on a second scan after the initial
! tag was extracted.
  <DT>Two Jars
  <DD>For elementary operations at the node level, a minimalist jar file containing
  <B>only the lexer and base tag classes</B> is split out from the larger <CODE>htmlparser.jar</CODE>.
  In this way, simple parsing and output is handled with a jar file that is under
! 40 kilobytes, but anything beyond peephole manipulation, i.e. closing tag detection
  and other semantic reasoning will need the full set of scanners, nodes and ancillary
! classes, which now stands at 160 kilobytes.
  </DL>
  </BODY>
--- 125,137 ----
  all that's needed for a low level parse of the HTML source. In previous
  implementations, the attributes were parsed on a second scan after the initial
! tag was extracted. (Actually, for error conditions, the lexer can back up  a
! node to handle missing end tags etc.).
  <DT>Two Jars
  <DD>For elementary operations at the node level, a minimalist jar file containing
  <B>only the lexer and base tag classes</B> is split out from the larger <CODE>htmlparser.jar</CODE>.
  In this way, simple parsing and output is handled with a jar file that is under
! 45 kilobytes, but anything beyond peephole manipulation, i.e. closing tag detection
  and other semantic reasoning will need the full set of scanners, nodes and ancillary
! classes, which now stands at 210 kilobytes.
  </DL>
  </BODY>
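
The package.html text above says that adjacent nodes have no characters between them and that nodes initially hold offsets into one shared buffer rather than copies of the text. A minimal sketch of those two properties follows. It is not the htmlparser lexer: SpanSketch, Span, split and rebuild are hypothetical names, and the splitting rule (everything from '<' to the next '>' is a tag) is a deliberate simplification that ignores comments, quoted '>' characters and error recovery.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; not the real org.htmlparser.lexer classes. */
final class SpanSketch {
    /** A node that stores offsets into the shared source buffer, not a copy of the text. */
    static final class Span {
        final int start;   // inclusive offset into the source
        final int end;     // exclusive offset into the source
        final boolean tag; // true for "<...>" spans, false for text
        Span(int start, int end, boolean tag) {
            this.start = start;
            this.end = end;
            this.tag = tag;
        }
        /** Materializes the text on demand from the shared buffer. */
        String text(String source) {
            return source.substring(start, end);
        }
    }

    /** Naive split (an assumption for illustration): '<' up to the next '>' is a tag. */
    static List<Span> split(String source) {
        List<Span> spans = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            boolean tag = source.charAt(pos) == '<';
            int end;
            if (tag) {
                int close = source.indexOf('>', pos);
                end = close < 0 ? source.length() : close + 1;
            } else {
                int open = source.indexOf('<', pos);
                end = open < 0 ? source.length() : open;
            }
            spans.add(new Span(pos, end, tag));
            pos = end; // the next span starts exactly where this one ended
        }
        return spans;
    }

    /** Concatenates the spans in order; whitespace and case come back unchanged. */
    static String rebuild(String source, List<Span> spans) {
        StringBuilder sb = new StringBuilder();
        for (Span s : spans) sb.append(s.text(source));
        return sb.toString();
    }
}
```

Since each span begins where the previous one ended, rebuilding from the spans returns the original source, including whitespace inside tags and line endings in text.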