htmlparser-cvs Mailing List for HTML Parser (Page 37)

Brought to you by: derrickoswald

htmlparser-cvs — syncmail email notification of CVS commits

[Htmlparser-cvs] htmlparser/src/org/htmlparser AbstractNode.java,1.15,1.16 Node.java,1.40,1.41

From: <der...@us...> - 2003-10-05 13:50:17

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1:/tmp/cvs-serv9618

Modified Files:
	AbstractNode.java Node.java 
Log Message:
Add bean like accessors for positions on Node, AbstractNode and AbstractNodeDecorator.
Handle null page in Cursor.
Add smartquotes mode in Lexer and CompositeTagScannerHelper.
Add simple name constructor in Attribute.
Remove emptyxmltag member, replace with computing accessors in TagNode.
Removed ScriptScannerHelper and moved scanning logic to ScriptScanner.
Reworked extractImageLocn in ImageScanner
Implement extractXMLData in TagScanner.
Made virtual tags zero length in TagData.
Added push() to IteratorImpl.
Added single node constructor to NodeList.
Numerous and various test adjustments. Still 133 failures.



Index: AbstractNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/AbstractNode.java,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** AbstractNode.java	28 Sep 2003 15:33:57 -0000	1.15
--- AbstractNode.java	5 Oct 2003 13:49:40 -0000	1.16
***************
*** 176,179 ****
--- 176,215 ----
      }
  
+     /**
+      * Gets the starting position of the node.
+      * @return The start position.
+      */
+     public int getStartPosition ()
+     {
+         return (nodeBegin);
+     }
+ 
+     /**
+      * Sets the starting position of the node.
+      * @param position The new start position.
+      */
+     public void setStartPosition (int position)
+     {
+         nodeBegin = position;
+     }
+ 
+     /**
+      * Gets the ending position of the node.
+      * @return The end position.
+      */
+     public int getEndPosition ()
+     {
+         return (nodeEnd);
+     }
+ 
+     /**
+      * Sets the ending position of the node.
+      * @param position The new end position.
+      */
+     public void setEndPosition (int position)
+     {
+         nodeEnd = position;
+     }
+ 
      public abstract void accept(Object visitor);
  

Index: Node.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Node.java,v
retrieving revision 1.40
retrieving revision 1.41
diff -C2 -d -r1.40 -r1.41
*** Node.java	22 Sep 2003 02:39:58 -0000	1.40
--- Node.java	5 Oct 2003 13:49:40 -0000	1.41
***************
*** 120,129 ****
--- 120,156 ----
      /**
       * Returns the beginning position of the tag.
+      * <br>deprecated Use {@link #getEndPosition}
       */
      public abstract int elementBegin();
+ 
      /**
       * Returns the ending position fo the tag
+      * <br>deprecated Use {@link #getEndPosition}
       */
      public abstract int elementEnd();
+ 
+     /**
+      * Gets the starting position of the node.
+      * @return The start position.
+      */
+     public abstract int getStartPosition ();
+ 
+     /**
+      * Sets the starting position of the node.
+      * @param position The new start position.
+      */
+     public abstract void setStartPosition (int position);
+ 
+     /**
+      * Gets the ending position of the node.
+      * @return The end position.
+      */
+     public abstract int getEndPosition ();
+ 
+     /**
+      * Sets the ending position of the node.
+      * @param position The new end position.
+      */
+     public abstract void setEndPosition (int position);
  
      public abstract void accept(Object visitor);
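
As a reading aid, the bean-like position accessors added in this commit can be sketched on their own. This is a minimal illustration only: `PositionedNode`, `SimpleNode` and `PositionAccessorDemo` are hypothetical stand-ins, not the real `org.htmlparser` types, and only the four accessor signatures and the `nodeBegin`/`nodeEnd` field names are taken from the diff above.

```java
// Hypothetical stand-in for the accessor part of Node; not the real interface.
interface PositionedNode {
    int getStartPosition();
    void setStartPosition(int position);
    int getEndPosition();
    void setEndPosition(int position);
}

// Hypothetical stand-in for AbstractNode, keeping the nodeBegin/nodeEnd fields.
class SimpleNode implements PositionedNode {
    private int nodeBegin;
    private int nodeEnd;

    SimpleNode(int begin, int end) {
        nodeBegin = begin;
        nodeEnd = end;
    }

    public int getStartPosition() { return nodeBegin; }
    public void setStartPosition(int position) { nodeBegin = position; }
    public int getEndPosition() { return nodeEnd; }
    public void setEndPosition(int position) { nodeEnd = position; }
}

public class PositionAccessorDemo {
    // Moves both ends of a node by the same offset, using only the accessors.
    static void shift(PositionedNode node, int delta) {
        node.setStartPosition(node.getStartPosition() + delta);
        node.setEndPosition(node.getEndPosition() + delta);
    }

    public static void main(String[] args) {
        SimpleNode node = new SimpleNode(10, 25);
        shift(node, 2);
        System.out.println(node.getStartPosition() + ".." + node.getEndPosition());
    }
}
```

With getter/setter pairs like these, code that only needs positions can work against the interface instead of reaching into the fields directly.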

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/scannersTests AppletScannerTest.java,1.27,1.28 FormScannerTest.java,1.33,1.34 FrameScannerTest.java,1.27,1.28 ImageScannerTest.java,1.32,1.33 JspScannerTest.java,1.28,1.29 LabelScannerTest.java,1.36,1.37 LinkScannerTest.java,1.38,1.39 MetaTagScannerTest.java,1.29,1.30 OptionTagScannerTest.java,1.29,1.30 ScriptScannerTest.java,1.40,1.41 TagScannerTest.java,1.30,1.31 TitleScannerTest.java,1.28,1.29

From: <der...@us...> - 2003-10-05 13:50:00

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests
In directory sc8-pr-cvs1:/tmp/cvs-serv9618/tests/scannersTests

Modified Files:
	AppletScannerTest.java FormScannerTest.java 
	FrameScannerTest.java ImageScannerTest.java 
	JspScannerTest.java LabelScannerTest.java LinkScannerTest.java 
	MetaTagScannerTest.java OptionTagScannerTest.java 
	ScriptScannerTest.java TagScannerTest.java 
	TitleScannerTest.java 
Log Message:
Add bean like accessors for positions on Node, AbstractNode and AbstractNodeDecorator.
Handle null page in Cursor.
Add smartquotes mode in Lexer and CompositeTagScannerHelper.
Add simple name constructor in Attribute.
Remove emptyxmltag member, replace with computing accessors in TagNode.
Removed ScriptScannerHelper and moved scanning logic to ScriptScanner.
Reworked extractImageLocn in ImageScanner
Implement extractXMLData in TagScanner.
Made virtual tags zero length in TagData.
Added push() to IteratorImpl.
Added single node constructor to NodeList.
Numerous and various test adjustments. Still 133 failures.
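
Several of the test adjustments in this message raise the expected node counts, and OptionTagScannerTest now skips `StringNode` instances before asserting on tags. The sketch below shows that skip-then-assert pattern in isolation. It assumes the extra nodes are text nodes sitting between tags, which the diffs suggest but do not state. `DemoNode`, `TextNode`, `DemoTag` and `SkipTextNodesDemo` are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal node types for illustration; not org.htmlparser classes.
interface DemoNode {}

class TextNode implements DemoNode {
    final String text;
    TextNode(String text) { this.text = text; }
}

class DemoTag implements DemoNode {
    final String name;
    DemoTag(String name) { this.name = name; }
}

public class SkipTextNodesDemo {
    // Collects the tag nodes and ignores text nodes, in document order.
    static List<DemoTag> tagsOnly(List<DemoNode> nodes) {
        List<DemoTag> tags = new ArrayList<>();
        for (DemoNode node : nodes) {
            if (node instanceof TextNode)
                continue; // same idea as the StringNode check in the test
            if (node instanceof DemoTag)
                tags.add((DemoTag) node);
        }
        return tags;
    }

    public static void main(String[] args) {
        List<DemoNode> nodes = Arrays.asList(
            new DemoTag("OPTION"), new TextNode("\n"), new DemoTag("OPTION"));
        System.out.println(tagsOnly(nodes).size() + " tags out of " + nodes.size() + " nodes");
    }
}
```

Filtering by type keeps the assertions about tags independent of how many whitespace-only nodes the lexer happens to emit.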



Index: AppletScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/AppletScannerTest.java,v
retrieving revision 1.27
retrieving revision 1.28
diff -C2 -d -r1.27 -r1.28
*** AppletScannerTest.java	22 Sep 2003 02:40:08 -0000	1.27
--- AppletScannerTest.java	5 Oct 2003 13:49:54 -0000	1.28
***************
*** 61,66 ****
          }
          testHTML+=
!             "</APPLET>\n"+
!             "</HTML>";
          createParser(testHTML);
  
--- 61,65 ----
          }
          testHTML+=
!             "</APPLET></HTML>";
          createParser(testHTML);
  

Index: FormScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/FormScannerTest.java,v
retrieving revision 1.33
retrieving revision 1.34
diff -C2 -d -r1.33 -r1.34
*** FormScannerTest.java	22 Sep 2003 02:40:09 -0000	1.33
--- FormScannerTest.java	5 Oct 2003 13:49:54 -0000	1.34
***************
*** 149,153 ****
          parser.addScanner(new FormScanner("",parser));
  
!         parseAndAssertNodeCount(2);
      }
      /**
--- 149,153 ----
          parser.addScanner(new FormScanner("",parser));
  
!         parseAndAssertNodeCount(3);
      }
      /**
***************
*** 263,272 ****
          parser.addScanner(new FormScanner("",parser));
          parser.addScanner(new LinkScanner());
!         parseAndAssertNodeCount(6);
!         assertTrue("Fifth Node is a link",node[4] instanceof LinkTag);
!         LinkTag linkTag = (LinkTag)node[4];
!         assertEquals("Link Text","Yahoo!\r\n",linkTag.getLinkText());
          assertEquals("Link URL","http://www.yahoo.com",linkTag.getLink());
!         assertType("Sixth Node",FormTag.class,node[5]);
      }
  
--- 263,272 ----
          parser.addScanner(new FormScanner("",parser));
          parser.addScanner(new LinkScanner());
!         parseAndAssertNodeCount(8);
!         assertTrue("Seventh Node is a link",node[6] instanceof LinkTag);
!         LinkTag linkTag = (LinkTag)node[6];
!         assertEquals("Link Text","Yahoo!\n",linkTag.getLinkText());
          assertEquals("Link URL","http://www.yahoo.com",linkTag.getLink());
!         assertType("Eigth Node",FormTag.class,node[7]);
      }
  

Index: FrameScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/FrameScannerTest.java,v
retrieving revision 1.27
retrieving revision 1.28
diff -C2 -d -r1.27 -r1.28
*** FrameScannerTest.java	22 Sep 2003 02:40:09 -0000	1.27
--- FrameScannerTest.java	5 Oct 2003 13:49:54 -0000	1.28
***************
*** 49,59 ****
          parser.addScanner(new FrameScanner(""));
  
!         parseAndAssertNodeCount(4);
  
-         assertTrue("Node 1 should be Frame Tag",node[1] instanceof FrameTag);
          assertTrue("Node 2 should be Frame Tag",node[2] instanceof FrameTag);
  
!         FrameTag frameTag1 = (FrameTag)node[1];
!         FrameTag frameTag2 = (FrameTag)node[2];
          assertEquals("Frame 1 Locn","http://www.google.com/test/demo_bc_top.html",frameTag1.getFrameLocation());
          assertEquals("Frame 1 Name","topFrame",frameTag1.getFrameName());
--- 49,59 ----
          parser.addScanner(new FrameScanner(""));
  
!         parseAndAssertNodeCount(7);
  
          assertTrue("Node 2 should be Frame Tag",node[2] instanceof FrameTag);
+         assertTrue("Node 4 should be Frame Tag",node[4] instanceof FrameTag);
  
!         FrameTag frameTag1 = (FrameTag)node[2];
!         FrameTag frameTag2 = (FrameTag)node[4];
          assertEquals("Frame 1 Locn","http://www.google.com/test/demo_bc_top.html",frameTag1.getFrameLocation());
          assertEquals("Frame 1 Name","topFrame",frameTag1.getFrameName());

Index: ImageScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/ImageScannerTest.java,v
retrieving revision 1.32
retrieving revision 1.33
diff -C2 -d -r1.32 -r1.33
*** ImageScannerTest.java	28 Sep 2003 15:33:59 -0000	1.32
--- ImageScannerTest.java	5 Oct 2003 13:49:54 -0000	1.33
***************
*** 73,81 ****
      public void testExtractImageLocnInvertedCommasBug() throws ParserException
      {
!         fail ("not implemented");
! //        Tag tag = new Tag(new TagData(0,0,"img width=638 height=53 border=0 usemap=\"#m\" src=http://us.a1.yimg.com/us.yimg.com/i/ww/m5v5.gif alt=Yahoo",""));
! //        String url = "c:\\cvs\\html\\binaries\\yahoo.htm";
! //        ImageScanner scanner = new ImageScanner("-i",new LinkProcessor());
! //        assertEquals("Extracted Image Locn","http://us.a1.yimg.com/us.yimg.com/i/ww/m5v5.gif",scanner.extractImageLocn(tag,url));
      }
  
--- 73,84 ----
      public void testExtractImageLocnInvertedCommasBug() throws ParserException
      {
!         String locn = "http://us.a1.yimg.com/us.yimg.com/i/ww/m5v5.gif";
!         createParser ("<img width=638 height=53 border=0 usemap=\"#m\" src=" + locn + " alt=Yahoo>");
!         // Register the image scanner
!         parser.addScanner(new ImageScanner("-i",new LinkProcessor()));
!         parseAndAssertNodeCount(1);
!         assertTrue("Node identified should be HTMLImageTag",node[0] instanceof ImageTag);
!         ImageTag imageTag = (ImageTag)node[0];
!         assertEquals("Expected Image Locn",locn,imageTag.getImageURL());
      }
  

Index: JspScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/JspScannerTest.java,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** JspScannerTest.java	22 Sep 2003 02:40:10 -0000	1.28
--- JspScannerTest.java	5 Oct 2003 13:49:54 -0000	1.29
***************
*** 53,59 ****
          // Register the Jsp Scanner
          parser.addScanner(new JspScanner("-j"));
!         parseAndAssertNodeCount(4);
!         // The first node should be an HTMLJspTag
!         assertTrue("Third should be an HTMLJspTag",node[2] instanceof JspTag);
          JspTag tag = (JspTag)node[2];
          assertEquals("tag contents","=object",tag.getText());
--- 53,59 ----
          // Register the Jsp Scanner
          parser.addScanner(new JspScanner("-j"));
!         parseAndAssertNodeCount(5);
!         // The first node should be an JspTag
!         assertTrue("Third should be an JspTag",node[2] instanceof JspTag);
          JspTag tag = (JspTag)node[2];
          assertEquals("tag contents","=object",tag.getText());
***************
*** 78,82 ****
              "return value;\n" +
              "}\n" +
!             "%>\n");
          Parser.setLineSeparator("\r\n");
          // Register the Jsp Scanner
--- 78,82 ----
              "return value;\n" +
              "}\n" +
!             "%>");
          Parser.setLineSeparator("\r\n");
          // Register the Jsp Scanner

Index: LabelScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/LabelScannerTest.java,v
retrieving revision 1.36
retrieving revision 1.37
diff -C2 -d -r1.36 -r1.37
*** LabelScannerTest.java	22 Sep 2003 02:40:11 -0000	1.36
--- LabelScannerTest.java	5 Oct 2003 13:49:54 -0000	1.37
***************
*** 141,145 ****
                                      "<LABEL>Mailcity\n</LABEL>"+
                                      "<LABEL>\nIndiatimes\n</LABEL>"+
!                                     "<LABEL>\nRediff\n</LABEL>\n"+
                                      "<LABEL>Cricinfo" +
                                      "<LABEL value=\"Microsoft Passport\">" +
--- 141,145 ----
                                      "<LABEL>Mailcity\n</LABEL>"+
                                      "<LABEL>\nIndiatimes\n</LABEL>"+
!                                     "<LABEL>\nRediff\n</LABEL>"+
                                      "<LABEL>Cricinfo" +
                                      "<LABEL value=\"Microsoft Passport\">" +

Index: LinkScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/LinkScannerTest.java,v
retrieving revision 1.38
retrieving revision 1.39
diff -C2 -d -r1.38 -r1.39
*** LinkScannerTest.java	28 Sep 2003 15:33:59 -0000	1.38
--- LinkScannerTest.java	5 Oct 2003 13:49:54 -0000	1.39
***************
*** 69,73 ****
          );
          parser.registerScanners();
!         parseAndAssertNodeCount(6);
          // The first node should be a Tag
          assertTrue("First node should be a Tag",node[0] instanceof Tag);
--- 69,73 ----
          );
          parser.registerScanners();
!         parseAndAssertNodeCount(5);
          // The first node should be a Tag
          assertTrue("First node should be a Tag",node[0] instanceof Tag);
***************
*** 398,404 ****
          // Register the image scanner
          parser.registerScanners();
!         parseAndAssertNodeCount(7);
!         assertTrue("Node 4 should be a link tag",node[4] instanceof LinkTag);
!         LinkTag linkTag = (LinkTag)node[4];
          assertEquals("Resolved Link","http://www.abc.com/home.cfm",linkTag.getLink());
          assertEquals("Resolved Link Text","Home",linkTag.getLinkText());
--- 398,404 ----
          // Register the image scanner
          parser.registerScanners();
!         parseAndAssertNodeCount(11);
!         assertTrue("Node 4 should be a link tag",node[6] instanceof LinkTag);
!         LinkTag linkTag = (LinkTag)node[6];
          assertEquals("Resolved Link","http://www.abc.com/home.cfm",linkTag.getLink());
          assertEquals("Resolved Link Text","Home",linkTag.getLinkText());

Index: MetaTagScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/MetaTagScannerTest.java,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** MetaTagScannerTest.java	28 Sep 2003 15:33:59 -0000	1.29
--- MetaTagScannerTest.java	5 Oct 2003 13:49:54 -0000	1.30
***************
*** 55,89 ****
          parser.addScanner(scanner);
  
!         parseAndAssertNodeCount(11);
!         assertTrue("Node 5 should be End Tag",node[5] instanceof Tag && ((Tag)node[5]).isEndTag ());
!         assertTrue("Node 6 should be META Tag",node[6] instanceof MetaTag);
          MetaTag metaTag;
!         metaTag = (MetaTag) node[6];
!         assertEquals("Meta Tag 6 Name","description",metaTag.getMetaTagName());
!         assertEquals("Meta Tag 6 Contents","Protecting the internet community through technology, not legislation.  SpamCop eliminates spam.  Automatically file spam reports with the network administrators who can stop spam at the source.  Subscribe, and filter your email through powerful statistical analysis before it reaches your inbox.",metaTag.getMetaContent());
  
!         assertTrue("Node 7 should be META Tag",node[7] instanceof MetaTag);
!         assertTrue("Node 8 should be META Tag",node[8] instanceof MetaTag);
!         assertTrue("Node 9 should be META Tag",node[9] instanceof MetaTag);
  
!         metaTag = (MetaTag) node[7];
!         assertEquals("Meta Tag 7 Name","keywords",metaTag.getMetaTagName());
!         assertEquals("Meta Tag 7 Contents","SpamCop spam cop email filter abuse header headers parse parser utility script net net-abuse filter mail program system trace traceroute dns",metaTag.getMetaContent());
!         assertNull("Meta Tag 7 Http-Equiv",metaTag.getHttpEquiv());
  
!         metaTag = (MetaTag) node[8];
!         assertEquals("Meta Tag 8 Name","language",metaTag.getMetaTagName());
!         assertEquals("Meta Tag 8 Contents","en",metaTag.getMetaContent());
!         assertNull("Meta Tag 8 Http-Equiv",metaTag.getHttpEquiv());
  
!         metaTag = (MetaTag) node[9];
!         assertEquals("Meta Tag 9 Name","owner",metaTag.getMetaTagName());
!         assertEquals("Meta Tag 9 Contents","se...@ad...",metaTag.getMetaContent());
!         assertNull("Meta Tag 9 Http-Equiv",metaTag.getHttpEquiv());
  
!         metaTag = (MetaTag) node[10];
!         assertNull("Meta Tag 10 Name",metaTag.getMetaTagName());
!         assertEquals("Meta Tag 10 Contents","text/html; charset=ISO-8859-1",metaTag.getMetaContent());
!         assertEquals("Meta Tag 10 Http-Equiv","content-type",metaTag.getHttpEquiv());
  
          assertEquals("This Scanner",scanner,metaTag.getThisScanner());
--- 55,90 ----
          parser.addScanner(scanner);
  
!         parseAndAssertNodeCount(18);
!         assertTrue("Node 7 should be End Tag",node[7] instanceof Tag && ((Tag)node[7]).isEndTag ());
!         assertTrue("Node 9 should be META Tag",node[9] instanceof MetaTag);
          MetaTag metaTag;
!         metaTag = (MetaTag) node[9];
!         assertEquals("Meta Tag 9 Name","description",metaTag.getMetaTagName());
!         assertEquals("Meta Tag 9 Contents","Protecting the internet community through technology, not legislation.  SpamCop eliminates spam.  Automatically file spam reports with the network administrators who can stop spam at the source.  Subscribe, and filter your email through powerful statistical analysis before it reaches your inbox.",metaTag.getMetaContent());
  
!         assertTrue("Node 11 should be META Tag",node[11] instanceof MetaTag);
!         assertTrue("Node 13 should be META Tag",node[13] instanceof MetaTag);
!         assertTrue("Node 15 should be META Tag",node[15] instanceof MetaTag);
!         assertTrue("Node 17 should be META Tag",node[17] instanceof MetaTag);
  
!         metaTag = (MetaTag) node[11];
!         assertEquals("Meta Tag 11 Name","keywords",metaTag.getMetaTagName());
!         assertEquals("Meta Tag 11 Contents","SpamCop spam cop email filter abuse header headers parse parser utility script net net-abuse filter mail program system trace traceroute dns",metaTag.getMetaContent());
!         assertNull("Meta Tag 11 Http-Equiv",metaTag.getHttpEquiv());
  
!         metaTag = (MetaTag) node[13];
!         assertEquals("Meta Tag 13 Name","language",metaTag.getMetaTagName());
!         assertEquals("Meta Tag 13 Contents","en",metaTag.getMetaContent());
!         assertNull("Meta Tag 13 Http-Equiv",metaTag.getHttpEquiv());
  
!         metaTag = (MetaTag) node[15];
!         assertEquals("Meta Tag 15 Name","owner",metaTag.getMetaTagName());
!         assertEquals("Meta Tag 15 Contents","se...@ad...",metaTag.getMetaContent());
!         assertNull("Meta Tag 15 Http-Equiv",metaTag.getHttpEquiv());
  
!         metaTag = (MetaTag) node[17];
!         assertNull("Meta Tag 17 Name",metaTag.getMetaTagName());
!         assertEquals("Meta Tag 17 Contents","text/html; charset=ISO-8859-1",metaTag.getMetaContent());
!         assertEquals("Meta Tag 17 Http-Equiv","content-type",metaTag.getHttpEquiv());
  
          assertEquals("This Scanner",scanner,metaTag.getThisScanner());

Index: OptionTagScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/OptionTagScannerTest.java,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** OptionTagScannerTest.java	22 Sep 2003 02:40:11 -0000	1.29
--- OptionTagScannerTest.java	5 Oct 2003 13:49:54 -0000	1.30
***************
*** 32,35 ****
--- 32,36 ----
  
  import org.htmlparser.Node;
+ import org.htmlparser.StringNode;
  import org.htmlparser.scanners.OptionTagScanner;
  import org.htmlparser.tags.OptionTag;
***************
*** 65,71 ****
          createParser(testHTML,"http://www.google.com/test/index.html");
          parser.addScanner(scanner);
!         parseAndAssertNodeCount(9);
          for(int j=0;j<i;j++)
          {
              assertTrue("Node " + j + " should be Option Tag",node[j] instanceof OptionTag);
              OptionTag OptionTag = (OptionTag) node[j];
--- 66,74 ----
          createParser(testHTML,"http://www.google.com/test/index.html");
          parser.addScanner(scanner);
!         parseAndAssertNodeCount(10);
          for(int j=0;j<i;j++)
          {
+             if (node[j] instanceof StringNode)
+                 continue;
              assertTrue("Node " + j + " should be Option Tag",node[j] instanceof OptionTag);
              OptionTag OptionTag = (OptionTag) node[j];

Index: ScriptScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/ScriptScannerTest.java,v
retrieving revision 1.40
retrieving revision 1.41
diff -C2 -d -r1.40 -r1.41
*** ScriptScannerTest.java	22 Sep 2003 02:40:11 -0000	1.40
--- ScriptScannerTest.java	5 Oct 2003 13:49:54 -0000	1.41
***************
*** 75,79 ****
      public void testScanBug() throws ParserException
      {
!         createParser("<SCRIPT LANGUAGE=\"JavaScript\" SRC=\"../js/DetermineBrowser.js\"></SCRIPT>","http://www.google.com/test/index.html");
          // Register the image scanner
          parser.addScanner(new ScriptScanner("-s"));
--- 75,80 ----
      public void testScanBug() throws ParserException
      {
!         String src = "\"../js/DetermineBrowser.js\"";
!         createParser("<SCRIPT LANGUAGE=\"JavaScript\" SRC=" + src + "></SCRIPT>","http://www.google.com/test/index.html");
          // Register the image scanner
          parser.addScanner(new ScriptScanner("-s"));
***************
*** 84,88 ****
          Hashtable table = scriptTag.getAttributes();
          String srcExpected = (String)table.get("SRC");
!         assertEquals("Expected SRC value","../js/DetermineBrowser.js",srcExpected);
      }
  
--- 85,89 ----
          Hashtable table = scriptTag.getAttributes();
          String srcExpected = (String)table.get("SRC");
!         assertEquals("Expected SRC value",src,srcExpected);
      }
  
***************
*** 100,111 ****
      public void testScanBugWG() throws ParserException
      {
          StringBuffer sb1 = new StringBuffer();
!         sb1.append("<body><script language=\"javascript\">\r\n");
!         sb1.append("if(navigator.appName.indexOf(\"Netscape\") != -1)\r\n");
!         sb1.append(" document.write ('xxx');\r\n");
!         sb1.append("else\r\n");
!         sb1.append(" document.write ('yyy');\r\n");
!         sb1.append("</script>\r\n");
!         String testHTML1 = new String(sb1.toString());
  
          createParser(testHTML1,"http://www.google.com/test/index.html");
--- 101,116 ----
      public void testScanBugWG() throws ParserException
      {
+         StringBuffer sb2 = new StringBuffer();
+         sb2.append("\r\nif(navigator.appName.indexOf(\"Netscape\") != -1)\r\n");
+         sb2.append(" document.write ('xxx');\r\n");
+         sb2.append("else\r\n");
+         sb2.append(" document.write ('yyy');\r\n");
+         String testHTML2 = sb2.toString();
+ 
          StringBuffer sb1 = new StringBuffer();
!         sb1.append("<body><script language=\"javascript\">");
!         sb1.append(testHTML2);
!         sb1.append("</script>");
!         String testHTML1 = sb1.toString();
  
          createParser(testHTML1,"http://www.google.com/test/index.html");
***************
*** 116,126 ****
          parseAndAssertNodeCount(2);
  
-         StringBuffer sb2 = new StringBuffer();
-         sb2.append("\r\nif(navigator.appName.indexOf(\"Netscape\") != -1)\r\n");
-         sb2.append(" document.write ('xxx');\r\n");
-         sb2.append("else\r\n");
-         sb2.append(" document.write ('yyy');\r\n");
-         String testHTML2 = new String(sb2.toString());
- 
          assertTrue("Node should be a script tag",node[1]
          instanceof ScriptTag);
--- 121,124 ----
***************
*** 144,148 ****
          //parser.addScanner(new HTMLScriptScanner("-s"));
  
!         parseAndAssertNodeCount(1);
          assertTrue("Node should be a script tag",node[0]
          instanceof ScriptTag);
--- 142,146 ----
          //parser.addScanner(new HTMLScriptScanner("-s"));
  
!         parseAndAssertNodeCount(2);
          assertTrue("Node should be a script tag",node[0]
          instanceof ScriptTag);
***************
*** 150,155 ****
  
      public void testScanScriptWithComments() throws ParserException {
!         createParser("<SCRIPT Language=\"JavaScript\">\n"+
!                           "<!--\n"+
                            "  function validateForm()\n"+
                            "  {\n"+
--- 148,152 ----
  
      public void testScanScriptWithComments() throws ParserException {
!         String expectedCode = "\n<!--\n"+
                            "  function validateForm()\n"+
                            "  {\n"+
***************
*** 159,163 ****
                            "     return true;\n"+
                            "  }\n"+
!                           "// -->\n"+
                            "</SCRIPT>","http://www.hardwareextreme.com/");
          // Register the image scanner
--- 156,161 ----
                            "     return true;\n"+
                            "  }\n"+
!                           "// -->\n";
!         createParser("<SCRIPT Language=\"JavaScript\">"+expectedCode+
                            "</SCRIPT>","http://www.hardwareextreme.com/");
          // Register the image scanner
***************
*** 169,181 ****
          ScriptTag scriptTag = (ScriptTag)node[0];
          String scriptCode = scriptTag.getScriptCode();
-         String expectedCode = "\r\n<!--\r\n"+
-                           "  function validateForm()\r\n"+
-                           "  {\r\n"+
-                           "     var i = 10;\r\n"+
-                           "     if(i < 5)\r\n"+
-                           "     i = i - 1 ; \r\n"+
-                           "     return true;\r\n"+
-                           "  }\r\n"+
-                           "// -->\r\n";
          assertStringEquals("Expected Code",expectedCode,scriptCode);
      }
--- 167,170 ----
***************
*** 558,563 ****
  
      public void testScanScriptWithTagsInComment() throws ParserException {
!         String javascript = "// This is javascript with <li> tag in the comment";
!         createParser("<script>\n"+ javascript + "\n</script>");
          parser.registerScanners();
          parseAndAssertNodeCount(1);
--- 547,552 ----
  
      public void testScanScriptWithTagsInComment() throws ParserException {
!         String javascript = "\n// This is javascript with <li> tag in the comment\n";
!         createParser("<script>"+ javascript + "</script>");
          parser.registerScanners();
          parseAndAssertNodeCount(1);
***************
*** 565,583 ****
          ScriptTag scriptTag = (ScriptTag)node[0];
          String scriptCode = scriptTag.getScriptCode();
!         String expectedCode =
!             wrapLineSeperatorAround(javascript);
!         assertStringEquals("Expected Code",expectedCode,scriptCode);
!     }
! 
!     private String wrapLineSeperatorAround(String javascript) {
!         return Parser.getLineSeparator()+javascript+Parser.getLineSeparator();
      }
  
      public void testScanScriptWithJavascriptLineEndings() throws ParserException {
          String javascript =
              "var s = \"This is a string \\\n" +
!             "that spans multiple lines;";
!         createParser("<script>\n"+ javascript + "\n</script>");
!         Parser.setLineSeparator("\n");
          parser.registerScanners();
          parseAndAssertNodeCount(1);
--- 554,566 ----
          ScriptTag scriptTag = (ScriptTag)node[0];
          String scriptCode = scriptTag.getScriptCode();
!         assertStringEquals("Expected Code",javascript,scriptCode);
      }
  
      public void testScanScriptWithJavascriptLineEndings() throws ParserException {
          String javascript =
+             "\n" +
              "var s = \"This is a string \\\n" +
!             "that spans multiple lines;\"\n";
!         createParser("<script>"+ javascript + "</script>");
          parser.registerScanners();
          parseAndAssertNodeCount(1);
***************
*** 585,598 ****
          ScriptTag scriptTag = (ScriptTag)node[0];
          String scriptCode = scriptTag.getScriptCode();
! 
!         String expectedCode =
!             wrapLineSeperatorAround(javascript);
!         assertStringEquals("Expected Code",expectedCode,scriptCode);
      }
  
  
      public void testScanScriptWithTags() throws ParserException {
!         String javascript = "Anything inside the script tag should be unchanged, even <li> and other html tags";
!         createParser("<script>\n"+ javascript + "\n</script>");
          parser.registerScanners();
          parseAndAssertNodeCount(1);
--- 568,578 ----
          ScriptTag scriptTag = (ScriptTag)node[0];
          String scriptCode = scriptTag.getScriptCode();
!         assertStringEquals("Expected Code",javascript,scriptCode);
      }
  
  
      public void testScanScriptWithTags() throws ParserException {
!         String javascript = "\nAnything inside the script tag should be unchanged, even <li> and other html tags\n";
!         createParser("<script>"+ javascript + "</script>");
          parser.registerScanners();
          parseAndAssertNodeCount(1);
***************
*** 600,606 ****
          ScriptTag scriptTag = (ScriptTag)node[0];
          String scriptCode = scriptTag.getScriptCode();
!         String expectedCode =
!             wrapLineSeperatorAround(javascript);
!         assertStringEquals("Expected Code",expectedCode,scriptCode);
      }
  
--- 580,584 ----
          ScriptTag scriptTag = (ScriptTag)node[0];
          String scriptCode = scriptTag.getScriptCode();
!         assertStringEquals("Expected Code",javascript,scriptCode);
      }
  

Index: TagScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/TagScannerTest.java,v
retrieving revision 1.30
retrieving revision 1.31
diff -C2 -d -r1.30 -r1.31
*** TagScannerTest.java	28 Sep 2003 15:33:59 -0000	1.30
--- TagScannerTest.java	5 Oct 2003 13:49:54 -0000	1.31
***************
*** 31,36 ****
--- 31,38 ----
  import org.htmlparser.Node;
  import org.htmlparser.Parser;
+ import org.htmlparser.lexer.Lexer;
  import org.htmlparser.scanners.TagScanner;
  import org.htmlparser.tags.Tag;
+ import org.htmlparser.tags.data.TagData;
  import org.htmlparser.tests.ParserTestCase;
  import org.htmlparser.util.NodeIterator;
***************
*** 53,98 ****
  
      public void testExtractXMLData() throws ParserException {
!         fail ("not implemented");
! //        createParser(
! //            "<MESSAGE>\n"+
! //            "Abhi\n"+
! //            "Sri\n"+
! //            "</MESSAGE>");
! //        Parser.setLineSeparator("\r\n");
! //        NodeIterator e = parser.elements();
! //
! //        Node node = e.nextNode();
! //        try {
! //            String result = TagScanner.extractXMLData(node,"MESSAGE",parser.getReader());
! //            assertEquals("Result","Abhi\r\nSri\r\n",result);
! //        }
! //        catch (ParserException ex) {
! //            assertTrue(e.toString(),false);
! //        }
      }
  
      public void testExtractXMLDataSingle() throws ParserException {
!         fail ("not implemented");
! //        createParser(
! //            "<MESSAGE>Test</MESSAGE>");
! //        NodeIterator e = parser.elements();
! //
! //        Node node = (Node)e.nextNode();
! //        try {
! //            String result = TagScanner.extractXMLData(node,"MESSAGE",parser.getReader());
! //            assertEquals("Result","Test",result);
! //        }
! //        catch (ParserException ex) {
! //            assertTrue(e.toString(),false);
! //        }
      }
  
!     public void testTagExtraction()
      {
!         fail ("not implemented");
! //        String testHTML = "<AREA \n coords=0,0,52,52 href=\"http://www.yahoo.com/r/c1\" shape=RECT>";
! //        createParser(testHTML);
! //        Tag tag = Tag.find(parser.getReader(),testHTML,0);
! //        assertNotNull(tag);
      }
  
--- 55,97 ----
  
      public void testExtractXMLData() throws ParserException {
!         createParser(
!             "<MESSAGE>\n"+
!             "Abhi\n"+
!             "Sri\n"+
!             "</MESSAGE>");
!         Parser.setLineSeparator("\r\n");
!         NodeIterator e = parser.elements();
! 
!         Node node = e.nextNode();
!         try {
!             String result = TagScanner.extractXMLData (node, "MESSAGE", e);
!             assertEquals("Result","\nAbhi\nSri\n",result);
!         }
!         catch (ParserException ex) {
!             assertTrue(e.toString(),false);
!         }
      }
  
      public void testExtractXMLDataSingle() throws ParserException {
!         createParser(
!             "<MESSAGE>Test</MESSAGE>");
!         NodeIterator e = parser.elements();
! 
!         Node node = (Node)e.nextNode();
!         try {
!             String result = TagScanner.extractXMLData (node, "MESSAGE", e);
!             assertEquals("Result","Test",result);
!         }
!         catch (ParserException ex) {
!             assertTrue(e.toString(),false);
!         }
      }
  
!     public void testTagExtraction() throws ParserException
      {
!         String testHTML = "<AREA \n coords=0,0,52,52 href=\"http://www.yahoo.com/r/c1\" shape=RECT>";
!         createParser(testHTML);
!         Tag tag = (Tag)parser.elements ().nextNode ();
!         assertNotNull(tag);
      }
  
***************
*** 117,132 ****
  
      public void testRemoveChars2() {
!         fail ("not implemented");
! //        String test = "hello\r\nworld\r\n\tqsdsds";
! //        TagScanner scanner = new TagScanner() {
! //            public Tag scan(Tag tag,String url,NodeReader reader,String currLine) { return null;}
! //            public boolean evaluate(String s,TagScanner previousOpenScanner) { return false; }
! //            public String [] getID() {
! //                return null;
! //            }
! //
! //        };
! //        String result = scanner.removeChars(test,"\r\n");
! //        assertEquals("Removing Chars","helloworld\tqsdsds",result);
      }
  
--- 116,128 ----
  
      public void testRemoveChars2() {
!         String test = "hello\r\nworld\r\n\tqsdsds";
!         TagScanner scanner = new TagScanner() {
!             public Tag scan(Tag tag,String url,Lexer lexer) { return null;}
!             public boolean evaluate(String s,TagScanner previousOpenScanner) { return false; }
!             public String [] getID() { return null; }
!             protected Tag createTag(TagData tagData, Tag tag, String url) { return null; }
!         };
!         String result = scanner.removeChars(test,"\r\n");
!         assertEquals("Removing Chars","helloworld\tqsdsds",result);
      }
  

Index: TitleScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/TitleScannerTest.java,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** TitleScannerTest.java	22 Sep 2003 02:40:11 -0000	1.28
--- TitleScannerTest.java	5 Oct 2003 13:49:54 -0000	1.29
***************
*** 86,90 ****
          TitleScanner titleScanner = new TitleScanner("-t");
          parser.addScanner(titleScanner);
!         parseAndAssertNodeCount(7);
          assertTrue("Third tag should be a title tag",node[2] instanceof TitleTag);
          TitleTag titleTag = (TitleTag)node[2];
--- 86,90 ----
          TitleScanner titleScanner = new TitleScanner("-t");
          parser.addScanner(titleScanner);
!         parseAndAssertNodeCount(8);
          assertTrue("Third tag should be a title tag",node[2] instanceof TitleTag);
          TitleTag titleTag = (TitleTag)node[2];

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/tagTests AppletTagTest.java,1.28,1.29 DoctypeTagTest.java,1.28,1.29 JspTagTest.java,1.31,1.32 LinkTagTest.java,1.35,1.36 MetaTagTest.java,1.29,1.30 ScriptTagTest.java,1.30,1.31

From: <der...@us...> - 2003-10-05 13:50:00

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests
In directory sc8-pr-cvs1:/tmp/cvs-serv9618/tests/tagTests

Modified Files:
	AppletTagTest.java DoctypeTagTest.java JspTagTest.java 
	LinkTagTest.java MetaTagTest.java ScriptTagTest.java 
Log Message:
Add bean like accessors for positions on Node, AbstractNode and AbstractNodeDecorator.
Handle null page in Cursor.
Add smartquotes mode in Lexer and CompositeTagScannerHelper.
Add simple name constructor in Attribute.
Remove emptyxmltag member, replace with computing accessors in TagNode.
Removed ScriptScannerHelper and moved scanning logic to ScriptScanner.
Reworked extractImageLocn in ImageScanner
Implement extractXMLData in TagScanner.
Made virtual tags zero length in TagData.
Added push() to IteratorImpl.
Added single node constructor to NodeList.
Numerous and various test adjustments. Still 133 failures.



Index: AppletTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/AppletTagTest.java,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** AppletTagTest.java	22 Sep 2003 02:40:12 -0000	1.28
--- AppletTagTest.java	5 Oct 2003 13:49:54 -0000	1.29
***************
*** 55,67 ****
          createParser(testHTML);
          parser.registerScanners();
!         parseAndAssertNodeCount(2);
          assertTrue("Node should be an applet tag",node[0] instanceof AppletTag);
          // Check the data in the applet tag
          AppletTag appletTag = (AppletTag)node[0];
          String expectedRawString =
!         "<APPLET CODE=\"Myclass.class\" CODEBASE=\"www.kizna.com\" ARCHIVE=\"test.jar\">\r\n"+
!         "<PARAM VALUE=\"Value1\" NAME=\"Param1\">\r\n"+
!         "<PARAM VALUE=\"Somik\" NAME=\"Name\">\r\n"+
!         "<PARAM VALUE=\"23\" NAME=\"Age\">\r\n"+
          "</APPLET>";
          assertStringEquals("toHTML()",expectedRawString,appletTag.toHtml());
--- 55,67 ----
          createParser(testHTML);
          parser.registerScanners();
!         parseAndAssertNodeCount(3);
          assertTrue("Node should be an applet tag",node[0] instanceof AppletTag);
          // Check the data in the applet tag
          AppletTag appletTag = (AppletTag)node[0];
          String expectedRawString =
!         "<APPLET CODE=Myclass.class ARCHIVE=test.jar CODEBASE=www.kizna.com>\n"+
!         "<PARAM NAME=\"Param1\" VALUE=\"Value1\">\n"+
!         "<PARAM NAME=\"Name\" VALUE=\"Somik\">\n"+
!         "<PARAM NAME=\"Age\" VALUE=\"23\">\n"+
          "</APPLET>";
          assertStringEquals("toHTML()",expectedRawString,appletTag.toHtml());
***************
*** 82,86 ****
          createParser(testHTML);
          parser.registerScanners();
!         parseAndAssertNodeCount(2);
          assertTrue("Node should be an applet tag",node[0] instanceof AppletTag);
          AppletTag appletTag = (AppletTag)node[0];
--- 82,86 ----
          createParser(testHTML);
          parser.registerScanners();
!         parseAndAssertNodeCount(3);
          assertTrue("Node should be an applet tag",node[0] instanceof AppletTag);
          AppletTag appletTag = (AppletTag)node[0];
***************
*** 88,95 ****
          // Check the data in the applet tag
          String expectedRawString =
!         "<APPLET CODE=\"Myclass.class\" CODEBASE=\"htmlparser.sourceforge.net\" ARCHIVE=\"test.jar\">\r\n"+
!         "<PARAM VALUE=\"Value1\" NAME=\"Param1\">\r\n"+
!         "<PARAM VALUE=\"Somik\" NAME=\"Name\">\r\n"+
!         "<PARAM VALUE=\"23\" NAME=\"Age\">\r\n"+
          "</APPLET>";
          assertStringEquals("toHTML()",expectedRawString,appletTag.toHtml());
--- 88,95 ----
          // Check the data in the applet tag
          String expectedRawString =
!         "<APPLET CODE=Myclass.class ARCHIVE=test.jar CODEBASE=htmlparser.sourceforge.net>\n"+
!         "<PARAM NAME=\"Param1\" VALUE=\"Value1\">\n"+
!         "<PARAM NAME=\"Name\" VALUE=\"Somik\">\n"+
!         "<PARAM NAME=\"Age\" VALUE=\"23\">\n"+
          "</APPLET>";
          assertStringEquals("toHTML()",expectedRawString,appletTag.toHtml());
***************
*** 110,114 ****
          createParser(testHTML);
          parser.registerScanners();
!         parseAndAssertNodeCount(2);
          assertTrue("Node should be an applet tag",node[0] instanceof AppletTag);
          AppletTag appletTag = (AppletTag)node[0];
--- 110,114 ----
          createParser(testHTML);
          parser.registerScanners();
!         parseAndAssertNodeCount(3);
          assertTrue("Node should be an applet tag",node[0] instanceof AppletTag);
          AppletTag appletTag = (AppletTag)node[0];
***************
*** 116,123 ****
          // Check the data in the applet tag
          String expectedRawString =
!         "<APPLET CODE=\"Myclass.class\" CODEBASE=\"www.kizna.com\" ARCHIVE=\"htmlparser.jar\">\r\n"+
!         "<PARAM VALUE=\"Value1\" NAME=\"Param1\">\r\n"+
!         "<PARAM VALUE=\"Somik\" NAME=\"Name\">\r\n"+
!         "<PARAM VALUE=\"23\" NAME=\"Age\">\r\n"+
          "</APPLET>";
          assertStringEquals("toHTML()",expectedRawString,appletTag.toHtml());
--- 116,123 ----
          // Check the data in the applet tag
          String expectedRawString =
!         "<APPLET CODE=Myclass.class ARCHIVE=htmlparser.jar CODEBASE=htmlparser.sourceforge.net>\n"+
!         "<PARAM NAME=\"Param1\" VALUE=\"Value1\">\n"+
!         "<PARAM NAME=\"Name\" VALUE=\"Somik\">\n"+
!         "<PARAM NAME=\"Age\" VALUE=\"23\">\n"+
          "</APPLET>";
          assertStringEquals("toHTML()",expectedRawString,appletTag.toHtml());
***************
*** 138,142 ****
          createParser(testHTML);
          parser.registerScanners();
!         parseAndAssertNodeCount(2);
          assertTrue("Node should be an applet tag",node[0] instanceof AppletTag);
          AppletTag appletTag = (AppletTag)node[0];
--- 138,142 ----
          createParser(testHTML);
          parser.registerScanners();
!         parseAndAssertNodeCount(3);
          assertTrue("Node should be an applet tag",node[0] instanceof AppletTag);
          AppletTag appletTag = (AppletTag)node[0];
***************
*** 144,151 ****
          // Check the data in the applet tag
          String expectedRawString =
!         "<APPLET CODE=\"MyOtherClass.class\" CODEBASE=\"www.kizna.com\" ARCHIVE=\"test.jar\">\r\n"+
!         "<PARAM VALUE=\"Value1\" NAME=\"Param1\">\r\n"+
!         "<PARAM VALUE=\"Somik\" NAME=\"Name\">\r\n"+
!         "<PARAM VALUE=\"23\" NAME=\"Age\">\r\n"+
          "</APPLET>";
          assertStringEquals("toHTML()",expectedRawString,appletTag.toHtml());
--- 144,151 ----
          // Check the data in the applet tag
          String expectedRawString =
!         "<APPLET CODE=MyOtherClass.class ARCHIVE=htmlparser.jar CODEBASE=htmlparser.sourceforge.net>\n"+
!         "<PARAM NAME=\"Param1\" VALUE=\"Value1\">\n"+
!         "<PARAM NAME=\"Name\" VALUE=\"Somik\">\n"+
!         "<PARAM NAME=\"Age\" VALUE=\"23\">\n"+
          "</APPLET>";
          assertStringEquals("toHTML()",expectedRawString,appletTag.toHtml());
***************
*** 166,170 ****
          createParser(testHTML);
          parser.registerScanners();
!         parseAndAssertNodeCount(2);
          assertTrue("Node should be an applet tag",node[0] instanceof AppletTag);
          AppletTag appletTag = (AppletTag)node[0];
--- 166,170 ----
          createParser(testHTML);
          parser.registerScanners();
!         parseAndAssertNodeCount(3);
          assertTrue("Node should be an applet tag",node[0] instanceof AppletTag);
          AppletTag appletTag = (AppletTag)node[0];
***************
*** 178,184 ****
          // Check the data in the applet tag
          String expectedRawString =
!         "<APPLET CODE=\"Myclass.class\" CODEBASE=\"www.kizna.com\" ARCHIVE=\"test.jar\">\r\n"+
!         "<PARAM VALUE=\"Two\" NAME=\"Second\">"+
          "<PARAM VALUE=\"One\" NAME=\"First\">"+
          "<PARAM VALUE=\"3\" NAME=\"Third\">"+
          "</APPLET>";
--- 178,184 ----
          // Check the data in the applet tag
          String expectedRawString =
!         "<APPLET CODE=MyOtherClass.class ARCHIVE=htmlparser.jar CODEBASE=htmlparser.sourceforge.net>\n"+
          "<PARAM VALUE=\"One\" NAME=\"First\">"+
+         "<PARAM VALUE=\"Two\" NAME=\"Second\">"+
          "<PARAM VALUE=\"3\" NAME=\"Third\">"+
          "</APPLET>";

Index: DoctypeTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/DoctypeTagTest.java,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** DoctypeTagTest.java	22 Sep 2003 02:40:12 -0000	1.28
--- DoctypeTagTest.java	5 Oct 2003 13:49:54 -0000	1.29
***************
*** 52,58 ****
          createParser(testHTML);
          parser.registerScanners();
!         parseAndAssertNodeCount(9);
!         // The node should be an HTMLLinkTag
!         assertTrue("Node should be a HTMLDoctypeTag",node[0] instanceof DoctypeTag);
          DoctypeTag docTypeTag = (DoctypeTag)node[0];
          assertStringEquals("toHTML()","<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0//EN\">",docTypeTag.toHtml());
--- 52,58 ----
          createParser(testHTML);
          parser.registerScanners();
!         parseAndAssertNodeCount(16);
!         // The node should be an DoctypeTag
!         assertTrue("Node should be a DoctypeTag",node[0] instanceof DoctypeTag);
          DoctypeTag docTypeTag = (DoctypeTag)node[0];
          assertStringEquals("toHTML()","<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0//EN\">",docTypeTag.toHtml());

Index: JspTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/JspTagTest.java,v
retrieving revision 1.31
retrieving revision 1.32
diff -C2 -d -r1.31 -r1.32
*** JspTagTest.java	22 Sep 2003 02:40:12 -0000	1.31
--- JspTagTest.java	5 Oct 2003 13:49:54 -0000	1.32
***************
*** 77,94 ****
          // Register the Jsp Scanner
          parser.addScanner(new JspScanner("-j"));
!         parseAndAssertNodeCount(5);
!         // The first node should be an HTMLJspTag
!         assertTrue("Node 1 should be an HTMLJspTag",node[0] instanceof JspTag);
          JspTag tag = (JspTag)node[0];
          assertStringEquals("Contents of the tag","@ taglib uri=\"/WEB-INF/struts.tld\" prefix=\"struts\" ",tag.getText());
  
          // The second node should be a normal tag
!         assertTrue("Node 2 should be an Tag",node[1] instanceof Tag);
!         Tag htag = (Tag)node[1];
          assertStringEquals("Contents of the tag","jsp:useBean id=\"transfer\" scope=\"session\" class=\"com.bank.PageBean\"",htag.getText());
          assertStringEquals("html","<JSP:USEBEAN ID=\"transfer\" SCOPE=\"session\" CLASS=\"com.bank.PageBean\"/>",htag.toHtml());
!         // The third node should be an HTMLJspTag
!         assertTrue("Node 3 should be an HTMLJspTag",node[2] instanceof JspTag);
!         JspTag tag2 = (JspTag)node[2];
          String expected = "\r\n"+
              "    org.apache.struts.util.BeanUtils.populate(transfer, request);\r\n"+
--- 77,94 ----
          // Register the Jsp Scanner
          parser.addScanner(new JspScanner("-j"));
!         parseAndAssertNodeCount(8);
!         // The first node should be an JspTag
!         assertTrue("Node 1 should be an JspTag",node[0] instanceof JspTag);
          JspTag tag = (JspTag)node[0];
          assertStringEquals("Contents of the tag","@ taglib uri=\"/WEB-INF/struts.tld\" prefix=\"struts\" ",tag.getText());
  
          // The second node should be a normal tag
!         assertTrue("Node 3 should be a normal Tag",node[2] instanceof Tag);
!         Tag htag = (Tag)node[2];
          assertStringEquals("Contents of the tag","jsp:useBean id=\"transfer\" scope=\"session\" class=\"com.bank.PageBean\"",htag.getText());
          assertStringEquals("html","<JSP:USEBEAN ID=\"transfer\" SCOPE=\"session\" CLASS=\"com.bank.PageBean\"/>",htag.toHtml());
!         // The third node should be an JspTag
!         assertTrue("Node 5 should be an JspTag",node[4] instanceof JspTag);
!         JspTag tag2 = (JspTag)node[4];
          String expected = "\r\n"+
              "    org.apache.struts.util.BeanUtils.populate(transfer, request);\r\n"+
***************
*** 139,152 ****
          // Register the Jsp Scanner
          parser.addScanner(new JspScanner("-j"));
!         parseAndAssertNodeCount(5);
!         // The first node should be an HTMLJspTag
!         assertTrue("Node 1 should be an HTMLJspTag",node[0] instanceof JspTag);
          JspTag tag = (JspTag)node[0];
          assertEquals("Raw String of the first JSP tag","<%@ taglib uri=\"/WEB-INF/struts.tld\" prefix=\"struts\" %>",tag.toHtml());
  
  
!         // The third node should be an HTMLJspTag
!         assertTrue("Node 2 should be an HTMLJspTag",node[2] instanceof JspTag);
!         JspTag tag2 = (JspTag)node[2];
          String expected = "<%\r\n"+
              "    org.apache.struts.util.BeanUtils.populate(transfer, request);\r\n"+
--- 139,152 ----
          // Register the Jsp Scanner
          parser.addScanner(new JspScanner("-j"));
!         parseAndAssertNodeCount(8);
!         // The first node should be an JspTag
!         assertTrue("Node 1 should be an JspTag",node[0] instanceof JspTag);
          JspTag tag = (JspTag)node[0];
          assertEquals("Raw String of the first JSP tag","<%@ taglib uri=\"/WEB-INF/struts.tld\" prefix=\"struts\" %>",tag.toHtml());
  
  
!         // The third node should be an JspTag
!         assertTrue("Node 5 should be an JspTag",node[4] instanceof JspTag);
!         JspTag tag2 = (JspTag)node[4];
          String expected = "<%\r\n"+
              "    org.apache.struts.util.BeanUtils.populate(transfer, request);\r\n"+

Index: LinkTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/LinkTagTest.java,v
retrieving revision 1.35
retrieving revision 1.36
diff -C2 -d -r1.35 -r1.36
*** LinkTagTest.java	28 Sep 2003 15:33:59 -0000	1.35
--- LinkTagTest.java	5 Oct 2003 13:49:54 -0000	1.36
***************
*** 276,286 ****
          parser.addScanner(new LinkScanner("-l"));
  
!         parseAndAssertNodeCount(9);
!         assertTrue("First Node should be a HTMLLinkTag",node[0] instanceof LinkTag);
          LinkTag linkTag = (LinkTag)node[0];
          assertStringEquals("Link Raw Text","<A HREF=\"mailto:so...@ya...\">hello</A>",linkTag.toHtml());
!         assertTrue("Eighth Node should be a HTMLLinkTag",node[7] instanceof LinkTag);
!         linkTag = (LinkTag)node[7];
!         assertStringEquals("Link Raw Text","<A HREF=\"http://ads.samachar.com/bin/redirect/tech.txt?http://www.samachar.com/tech\r\nnical.html\"> Journalism 3.0</A>",linkTag.toHtml());
      }
  
--- 276,286 ----
          parser.addScanner(new LinkScanner("-l"));
  
!         parseAndAssertNodeCount(10);
!         assertTrue("First Node should be a LinkTag",node[0] instanceof LinkTag);
          LinkTag linkTag = (LinkTag)node[0];
          assertStringEquals("Link Raw Text","<A HREF=\"mailto:so...@ya...\">hello</A>",linkTag.toHtml());
!         assertTrue("Ninth Node should be a HTMLLinkTag",node[8] instanceof LinkTag);
!         linkTag = (LinkTag)node[8];
!         assertStringEquals("Link Raw Text","<A HREF=\"http://ads.samachar.com/bin/redirect/tech.txt?http://www.samachar.com/tech\nnical.html\"> Journalism 3.0</A>",linkTag.toHtml());
      }
  

Index: MetaTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/MetaTagTest.java,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** MetaTagTest.java	22 Sep 2003 02:40:12 -0000	1.29
--- MetaTagTest.java	5 Oct 2003 13:49:54 -0000	1.30
***************
*** 53,63 ****
          parser.registerScanners();
  
!         parseAndAssertNodeCount(9);
!         assertTrue("Node 5 should be META Tag",node[4] instanceof MetaTag);
          MetaTag metaTag;
!         metaTag = (MetaTag) node[4];
!         assertStringEquals("Meta Tag 4 Name","description",metaTag.getMetaTagName());
!         assertStringEquals("Meta Tag 4 Contents","Protecting the internet community through technology, not legislation.  SpamCop eliminates spam.  Automatically file spam reports with the network administrators who can stop spam at the source.  Subscribe, and filter your email through powerful statistical analysis before it reaches your inbox.",metaTag.getMetaContent());
!         assertStringEquals("toHTML()","<META CONTENT=\"Protecting the internet community through technology, not legislation.  SpamCop eliminates spam.  Automatically file spam reports with the network administrators who can stop spam at the source.  Subscribe, and filter your email through powerful statistical analysis before it reaches your inbox.\" NAME=\"description\">",metaTag.toHtml());
      }
  }
--- 53,63 ----
          parser.registerScanners();
  
!         parseAndAssertNodeCount(16);
!         assertTrue("Node 8 should be META Tag",node[7] instanceof MetaTag);
          MetaTag metaTag;
!         metaTag = (MetaTag) node[7];
!         assertStringEquals("Meta Tag 7 Name","description",metaTag.getMetaTagName());
!         assertStringEquals("Meta Tag 7 Contents","Protecting the internet community through technology, not legislation.  SpamCop eliminates spam.  Automatically file spam reports with the network administrators who can stop spam at the source.  Subscribe, and filter your email through powerful statistical analysis before it reaches your inbox.",metaTag.getMetaContent());
!         assertStringEquals("toHTML()","<META name=\"description\" content=\"Protecting the internet community through technology, not legislation.  SpamCop eliminates spam.  Automatically file spam reports with the network administrators who can stop spam at the source.  Subscribe, and filter your email through powerful statistical analysis before it reaches your inbox.\">",metaTag.toHtml());
      }
  }

Index: ScriptTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/ScriptTagTest.java,v
retrieving revision 1.30
retrieving revision 1.31
diff -C2 -d -r1.30 -r1.31
*** ScriptTagTest.java	28 Sep 2003 15:33:59 -0000	1.30
--- ScriptTagTest.java	5 Oct 2003 13:49:54 -0000	1.31
***************
*** 99,110 ****
      public void testToHTMLWG() throws ParserException
      {
          StringBuffer sb1 = new StringBuffer();
!         sb1.append("<body><script language=\"javascript\">\r\n");
!         sb1.append("if(navigator.appName.indexOf(\"Netscape\") != -1)\r\n");
!         sb1.append(" document.write ('xxx');\r\n");
!         sb1.append("else\r\n");
!         sb1.append(" document.write ('yyy');\r\n");
!         sb1.append("</script>\r\n");
!         String testHTML1 = new String(sb1.toString());
  
          createParser(testHTML1);
--- 99,116 ----
      public void testToHTMLWG() throws ParserException
      {
+         StringBuffer sb2 = new StringBuffer();
+         sb2.append("<script language=\"javascript\">\r\n");
+         sb2.append("if(navigator.appName.indexOf(\"Netscape\") != -1)\r\n");
+         sb2.append(" document.write ('xxx');\r\n");
+         sb2.append("else\r\n");
+         sb2.append(" document.write ('yyy');\r\n");
+         sb2.append("</script>");
+         String expectedHTML = sb2.toString();
+ 
          StringBuffer sb1 = new StringBuffer();
!         sb1.append("<body>");
!         sb1.append(expectedHTML);
!         sb1.append("\r\n");
!         String testHTML1 = sb1.toString();
  
          createParser(testHTML1);
***************
*** 113,132 ****
          parser.addScanner(new ScriptScanner("-s"));
  
! 
!         StringBuffer sb2 = new StringBuffer();
!         sb2.append("<SCRIPT LANGUAGE=\"javascript\">\r\n");
!         sb2.append("if(navigator.appName.indexOf(\"Netscape\") != -1)\r\n");
!         sb2.append(" document.write ('xxx');\r\n");
!         sb2.append("else\r\n");
!         sb2.append(" document.write ('yyy');\r\n");
!         sb2.append("</SCRIPT>");
!         String expectedHTML = new String(sb2.toString());
! 
!         parseAndAssertNodeCount(2);
          assertTrue("Node should be a script tag",node[1]
          instanceof ScriptTag);
!         // Check the data in the applet tag
!         ScriptTag scriptTag = (ScriptTag)node
!         [1];
          assertStringEquals("Expected Script Code",expectedHTML,scriptTag.toHtml());
      }
--- 119,127 ----
          parser.addScanner(new ScriptScanner("-s"));
  
!         parseAndAssertNodeCount(3);
          assertTrue("Node should be a script tag",node[1]
          instanceof ScriptTag);
!         // Check the data in the script tag
!         ScriptTag scriptTag = (ScriptTag)node[1];
          assertStringEquals("Expected Script Code",expectedHTML,scriptTag.toHtml());
      }
***************
*** 161,168 ****
          parser.addScanner(new ScriptScanner("-s"));
          //parser.registerScanners();
!         parseAndAssertNodeCount(1);
          assertTrue("Node should be a script tag",node[0] instanceof ScriptTag);
          ScriptTag scriptTag = (ScriptTag)node[0];
!         assertStringEquals("Script toHTML()","<SCRIPT LANGUAGE=\"javascript\">\r\nvar lower = '<%=lowerValue%>';\r\n</SCRIPT>",scriptTag.toHtml());
      }
  
--- 156,163 ----
          parser.addScanner(new ScriptScanner("-s"));
          //parser.registerScanners();
!         parseAndAssertNodeCount(2);
          assertTrue("Node should be a script tag",node[0] instanceof ScriptTag);
          ScriptTag scriptTag = (ScriptTag)node[0];
!         assertStringEquals("Script toHTML()","<script language=\"javascript\">\nvar lower = '<%=lowerValue%>';\n</script>",scriptTag.toHtml());
      }

[Htmlparser-cvs] htmlparser/src/org/htmlparser/util IteratorImpl.java,1.29,1.30 NodeList.java,1.44,1.45

From: <der...@us...> - 2003-10-05 13:50:00

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util
In directory sc8-pr-cvs1:/tmp/cvs-serv9618/util

Modified Files:
	IteratorImpl.java NodeList.java 
Log Message:
Add bean like accessors for positions on Node, AbstractNode and AbstractNodeDecorator.
Handle null page in Cursor.
Add smartquotes mode in Lexer and CompositeTagScannerHelper.
Add simple name constructor in Attribute.
Remove emptyxmltag member, replace with computing accessors in TagNode.
Removed ScriptScannerHelper and moved scanning logic to ScriptScanner.
Reworked extractImageLocn in ImageScanner
Implement extractXMLData in TagScanner.
Made virtual tags zero length in TagData.
Added push() to IteratorImpl.
Added single node constructor to NodeList.
Numerous and various test adjustments. Still 133 failures.



Index: IteratorImpl.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/IteratorImpl.java,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** IteratorImpl.java	28 Sep 2003 15:33:59 -0000	1.29
--- IteratorImpl.java	5 Oct 2003 13:49:54 -0000	1.30
***************
*** 75,78 ****
--- 75,87 ----
  
      /**
+      * Makes <code>node</code> the next <code>Node</code> that will be returned.
+      * @param node The node to return next.
+      */
+     public void push (Node node)
+     {
+         preRead.insertElementAt (node, 0);
+     }
+ 
+     /**
       * Check if more nodes are available.
       * @return <code>true</code> if a call to <code>nextHTMLNode()</code> will succeed.
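
Note on push(): the added method inserts at index 0 of the pre-read buffer, so a pushed node is handed back before anything else already buffered and before anything still unread from the source. The following is a minimal sketch of that push-back behaviour, not the IteratorImpl code itself. The class PushbackIterator, its generic element type and the hasMore()/next() method names are hypothetical stand-ins chosen for illustration; only the insertElementAt (node, 0) call is taken from the diff above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Vector;

/** Hypothetical stand-in for an iterator with a pre-read buffer. */
class PushbackIterator<T> {
    private final Iterator<T> source;                 // underlying element source (assumption)
    private final Vector<T> preRead = new Vector<T>(); // elements to return before reading the source

    PushbackIterator(Iterator<T> source) {
        this.source = source;
    }

    /** Makes item the next element returned, ahead of anything already buffered. */
    void push(T item) {
        preRead.insertElementAt(item, 0);
    }

    /** True if a call to next() will succeed. */
    boolean hasMore() {
        return !preRead.isEmpty() || source.hasNext();
    }

    /** Returns buffered elements first, then falls back to the source. */
    T next() {
        if (!preRead.isEmpty()) {
            return preRead.remove(0);
        }
        return source.next();
    }
}

class PushbackDemo {
    public static void main(String[] args) {
        PushbackIterator<String> it =
            new PushbackIterator<String>(Arrays.asList("a", "b").iterator());
        String first = it.next();   // consumes "a"
        it.push(first);             // hands "a" back
        while (it.hasMore()) {
            System.out.println(it.next()); // prints "a", then "b"
        }
    }
}
```

Because each push goes to the front, pushing several elements returns them in last-in, first-out order; a caller that wants to restore a sequence in its original order would push it back in reverse.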

Index: NodeList.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/NodeList.java,v
retrieving revision 1.44
retrieving revision 1.45
diff -C2 -d -r1.44 -r1.45
*** NodeList.java	22 Sep 2003 02:40:15 -0000	1.44
--- NodeList.java	5 Oct 2003 13:49:54 -0000	1.45
***************
*** 51,54 ****
--- 51,64 ----
      }
  
+     /**
+      * Create a one element node list.
+      * @param node The initial node to add.
+      */
+     public NodeList(Node node)
+     {
+         this ();
+         add (node);
+     }
+         
      public void add(Node node) {
          if (size==capacity)

[Htmlparser-cvs] htmlparser/src/org/htmlparser/scanners CompositeTagScanner.java,1.68,1.69 ImageScanner.java,1.29,1.30 LinkScanner.java,1.53,1.54 ScriptScanner.java,1.40,1.41 TagScanner.java,1.41,1.42

From: <der...@us...> - 2003-10-05 13:49:59

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners
In directory sc8-pr-cvs1:/tmp/cvs-serv9618/scanners

Modified Files:
	CompositeTagScanner.java ImageScanner.java LinkScanner.java 
	ScriptScanner.java TagScanner.java 
Log Message:
Add bean like accessors for positions on Node, AbstractNode and AbstractNodeDecorator.
Handle null page in Cursor.
Add smartquotes mode in Lexer and CompositeTagScannerHelper.
Add simple name constructor in Attribute.
Remove emptyxmltag member, replace with computing accessors in TagNode.
Removed ScriptScannerHelper and moved scanning logic to ScriptScanner.
Reworked extractImageLocn in ImageScanner
Implement extractXMLData in TagScanner.
Made virtual tags zero length in TagData.
Added push() to IteratorImpl.
Added single node constructor to NodeList.
Numerous and various test adjustments. Still 133 failures.



Index: CompositeTagScanner.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/CompositeTagScanner.java,v
retrieving revision 1.68
retrieving revision 1.69
diff -C2 -d -r1.68 -r1.69
*** CompositeTagScanner.java	28 Sep 2003 15:33:58 -0000	1.68
--- CompositeTagScanner.java	5 Oct 2003 13:49:52 -0000	1.69
***************
*** 216,229 ****
      public abstract Tag createTag(TagData tagData, CompositeTagData compositeTagData) throws ParserException;
  
!     public final boolean isTagToBeEndedFor(Tag tag) {
!         boolean isEndTag = tag.isEndTag ();
!         String tagName = tag.getTagName();
!         if (isEndTag)
!             tagName = tagName.substring (1);
!         if (
!                 ( isEndTag && endTagEnderSet.contains(tagName)) ||
!                 (!isEndTag &&    tagEnderSet.contains(tagName))
!             )
!         return true; else return false;
      }
  
--- 216,232 ----
      public abstract Tag createTag(TagData tagData, CompositeTagData compositeTagData) throws ParserException;
  
!     public final boolean isTagToBeEndedFor(Tag tag)
!     {
!         String name;
!         boolean ret;
! 
!         ret = false;
!         name = tag.getTagName ();
!         if (tag.isEndTag ())
!             ret = endTagEnderSet.contains (name);
!         else
!             ret = tagEnderSet.contains (name);
!         
!         return (ret);
      }
  

Index: ImageScanner.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/ImageScanner.java,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** ImageScanner.java	28 Sep 2003 15:33:58 -0000	1.29
--- ImageScanner.java	5 Oct 2003 13:49:52 -0000	1.30
***************
*** 33,36 ****
--- 33,38 ----
  //////////////////
  import java.util.Hashtable;
+ import java.util.Vector;
+ import org.htmlparser.lexer.nodes.Attribute;
  
  import org.htmlparser.tags.ImageTag;
***************
*** 66,104 ****
          this.processor = processor;
      }
!   /**
!    * Extract the location of the image, given the string to be parsed, and the url
!    * of the html page in which this tag exists.
!    * @param tag The tag with the 'SRC' attribute.
!    * @param url URL of web page being parsed.
!    */
!     public String extractImageLocn (Tag tag,String url) throws ParserException
      {
          String ret;
-         Hashtable table;
  
          ret = "";
!         try
          {
!             table = tag.getAttributes ();
!             ret =  (String)table.get ("SRC");
!             if (null != ret)
              {
!                 ret = ParserUtils.removeChars (ret, '\n');
!                 ret = ParserUtils.removeChars (ret, '\r');
!                 ret = processor.extract (ret, url);
              }
-             else
-                 ret = "";
-         }
-         catch (Exception e)
-         {
-             throw new ParserException (
-                 "ImageScanner.extractImageLocn() : "
-                     + "Error in extracting image location, relativeLink = "
-                     + ret
-                     + ", url = "
-                     + url,
-                 e);
          }
          
          return (ret);
--- 68,172 ----
          this.processor = processor;
      }
! 
!    /**
!     * Extract the location of the image
!     * Given the tag (with attributes), and the url of the html page in which
!     * this tag exists, perform best effort to extract the 'intended' URL.
!     * Attempts to handle such attributes as:
!     * <pre>
!     * &lt;IMG SRC=http://www.redgreen.com&gt; - normal
!     * &lt;IMG SRC =http://www.redgreen.com&gt; - space between attribute name and equals sign
!     * &lt;IMG SRC= http://www.redgreen.com&gt; - space between equals sign and attribute value
!     * &lt;IMG SRC = http://www.redgreen.com&gt; - space both sides of equals sign
!     * </pre>
!     * @param tag The tag with the 'SRC' attribute.
!     * @param url URL of web page being parsed.
!     */
!     public String extractImageLocn (Tag tag, String url) throws ParserException
      {
+         Vector attributes;
+         int size;
+         Attribute attribute;
+         String string;
+         String data;
+         int state;
+         String name;
          String ret;
  
          ret = "";
!         state = 0;
!         attributes = tag.getAttributesEx ();
!         size = attributes.size ();
!         for (int i = 0; (i < size) && (state < 3); i++)
          {
!             attribute = (Attribute)attributes.elementAt (i);
!             string = attribute.getName ();
!             data = attribute.getValue ();
!             switch (state)
              {
!                 case 0: // looking for 'src'
!                     if (null != string)
!                     {
!                         name = string.toUpperCase ();
!                         if (name.equals ("SRC"))
!                         {
!                             state = 1;
!                             if (null != data)
!                             {
!                                 if ("".equals (data))
!                                     state = 2; // empty attribute, SRC= 
!                                 else
!                                 {
!                                     ret = data;
!                                     i = size; // exit fast
!                                 }
!                             }
! 
!                         }
!                         else if (name.startsWith ("SRC"))
!                         {
!                             // missing equals sign
!                             ret = string.substring (3);
!                             state = 0; // go back to searching for SRC
!                             // because, maybe we found SRCXXX
!                             // where XXX isn't a URL
!                         }
!                     }
!                     break;
!                 case 1: // looking for equals sign
!                     if (null != string)
!                     {
!                         if (string.startsWith ("="))
!                         {
!                             state = 2;
!                             if (1 < string.length ())
!                             {
!                                 ret = string.substring (1);
!                                 state = 0; // keep looking ?
!                             }
!                             else if (null != data)
!                             {
!                                 ret = string.substring (1);
!                                 state = 0; // keep looking ?
!                             }
!                         }
!                     }
!                     break;
!                 case 2: // looking for a valueless attribute that could be a relative or absolute URL
!                     if (null != string)
!                     {
!                         if (null == data)
!                             ret = string;
!                         state = 0; // only check first non-whitespace item
!                         // not every valid attribute after an equals
!                     }
!                     break;
!                 default:
!                     throw new IllegalStateException ("we're not supposed to be in state " + state);
              }
          }
+         ret = ParserUtils.removeChars (ret, '\n');
+         ret = ParserUtils.removeChars (ret, '\r');
+         ret = processor.extract (ret, url);
          
          return (ret);
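
Editorial note: the reworked extractImageLocn above walks the attribute list with a small state machine so that stray whitespace around the equals sign does not lose the URL. The following is a simplified, self-contained sketch of that idea, not the committed code. The Attr class, the SrcExtractor class and the way the tokens are split (whitespace as a null-named token, "SRC=" as a name with an empty value) are assumptions made for illustration. Unlike the original, it returns on the first hit and does not handle the "missing equals sign" case.

```java
import java.util.List;

/** Hypothetical stand-in for the lexer's Attribute: a name plus an optional value. */
final class Attr {
    final String name;   // null for a pure-whitespace token (assumption)
    final String value;  // null when the token carries no value
    Attr(String name, String value) { this.name = name; this.value = value; }
}

/** Simplified sketch of a three-state scan for the SRC value. */
final class SrcExtractor {
    static String extractSrc(List<Attr> attributes) {
        int state = 0; // 0: looking for SRC, 1: looking for '=', 2: looking for the value
        for (Attr a : attributes) {
            String name = a.name;
            if (name == null)
                continue; // skip whitespace tokens
            switch (state) {
                case 0:
                    if (name.equalsIgnoreCase("SRC")) {
                        if (a.value == null)
                            state = 1;            // "SRC" alone, expect '=' later
                        else if (a.value.isEmpty())
                            state = 2;            // "SRC=" with the value in the next token
                        else
                            return a.value;       // normal case
                    }
                    break;
                case 1:
                    if (name.startsWith("=")) {
                        if (name.length() > 1)
                            return name.substring(1); // "=http://..." in one token
                        state = 2;                    // bare "=", value follows
                    }
                    break;
                case 2:
                    if (a.value == null)
                        return name;              // valueless token taken as the URL
                    state = 0;                    // only check the first item after '='
                    break;
                default:
                    throw new IllegalStateException("unexpected state " + state);
            }
        }
        return "";
    }
}
```

Under these tokenization assumptions, all four forms listed in the Javadoc above yield the same URL, and a tag without SRC yields the empty string.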

Index: LinkScanner.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/LinkScanner.java,v
retrieving revision 1.53
retrieving revision 1.54
diff -C2 -d -r1.53 -r1.54
*** LinkScanner.java	22 Sep 2003 02:40:00 -0000	1.53
--- LinkScanner.java	5 Oct 2003 13:49:53 -0000	1.54
***************
*** 53,57 ****
      private static final String MATCH_NAME [] = {"A"};
      public static final String LINK_SCANNER_ID = "A";
-     public static final String DIRTY_TAG_MESSAGE=" is a dirty link tag - the tag was not closed. \nWe encountered an open tag, before the previous end tag was found.\nCorrecting this..";
      private LinkProcessor processor;
      private final static String ENDERS [] = { "TD","TR","FORM","LI","BODY", "HTML" };
--- 53,56 ----

Index: ScriptScanner.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/ScriptScanner.java,v
retrieving revision 1.40
retrieving revision 1.41
diff -C2 -d -r1.40 -r1.41
*** ScriptScanner.java	28 Sep 2003 15:33:58 -0000	1.40
--- ScriptScanner.java	5 Oct 2003 13:49:53 -0000	1.41
***************
*** 28,44 ****
  
  package org.htmlparser.scanners;
! /////////////////////////
! // HTML Parser Imports //
! /////////////////////////
! import org.htmlparser.*;
  import org.htmlparser.lexer.Lexer;
! import org.htmlparser.parserHelper.*;
! import org.htmlparser.tags.*;
! import org.htmlparser.tags.data.*;
! import org.htmlparser.util.*;
  /**
!  * The HTMLScriptScanner identifies javascript code
   */
- 
  public class ScriptScanner extends CompositeTagScanner {
      private static final String SCRIPT_END_TAG = "</SCRIPT>";
--- 28,50 ----
  
  package org.htmlparser.scanners;
! 
! import java.util.Vector;
! import org.htmlparser.Node;
! import org.htmlparser.Parser;
! import org.htmlparser.RemarkNode;
! import org.htmlparser.StringNode;
  import org.htmlparser.lexer.Lexer;
! import org.htmlparser.lexer.nodes.NodeFactory;
! import org.htmlparser.tags.ScriptTag;
! import org.htmlparser.tags.Tag;
! import org.htmlparser.tags.data.CompositeTagData;
! import org.htmlparser.tags.data.TagData;
! import org.htmlparser.util.NodeList;
! import org.htmlparser.util.ParserException;
! 
  /**
!  * The ScriptScanner handles javascript code.
!  * It gathers all interior nodes into one undifferentiated string node.
   */
  public class ScriptScanner extends CompositeTagScanner {
      private static final String SCRIPT_END_TAG = "</SCRIPT>";
***************
*** 68,84 ****
      }
  
!     public Tag scan (Tag tag, Lexer lexer)
!         throws ParserException {
!         try {
!             ScriptScannerHelper helper =
!                 new ScriptScannerHelper(tag, lexer, this);
!             return helper.scan();
  
          }
!         catch (Exception e) {
!             throw new ParserException("Error in ScriptScanner: ",e);
          }
-     }
  
  
      /**
--- 74,188 ----
      }
  
!     /**
!      * Scan for script.
!      * Accumulates nodes returned from the lexer, until &lt;/SCRIPT&gt;,
!      * &lt;BODY&gt; or &lt;HTML&gt; is encountered. Replaces the node factory
!      * in the lexer with a new Parser to avoid other scanners missing their 
!      * end tags and accumulating even the &lt;/SCRIPT&gt;.
!      */
!     public Tag scan (Tag tag, String url, Lexer lexer)
!         throws ParserException
!     {
!         Node node;
!         boolean done;
!         int position;
!         StringNode last;
!         Tag end;
!         NodeFactory factory;
!         TagData data;
!         Tag ret;
! 
!         done = false;
!         last = null;
!         end = null;
!         factory = lexer.getNodeFactory ();
!         lexer.setNodeFactory (new Parser ()); // no scanners on a new Parser right?
!         try
!         {
!             do
!             {
!                 position = lexer.getPosition ();
!                 node = lexer.nextNode (true);
!                 if (null == node)
!                     break;
!                 else
!                     if (node instanceof Tag)
!                         if (   ((Tag)node).isEndTag ()
!                             && ((Tag)node).getTagName ().equals (MATCH_NAME[0]))
!                         {
!                             end = (Tag)node;
!                             done = true;
!                         }
!                         else if (isTagToBeEndedFor ((Tag)node))
!                         {
!                             lexer.setPosition (position);
!                             done = true;
!                         }
!                         else
!                         {
!                             // must be a string, even though it looks like a tag
!                             if (null != last)
!                                 // append it to the previous one
!                                 last.setEndPosition (node.elementEnd ());
!                             else
!                                 // TODO: need to remove this cast
!                                 last = (StringNode)lexer.createStringNode (lexer, node.elementBegin (), node.elementEnd ());
!                         }
!                     else if (node instanceof RemarkNode)
!                     {
!                         if (null != last)
!                             last.setEndPosition (node.getEndPosition ());
!                         else
!                             // TODO: need to remove this cast
!                             last = (StringNode)lexer.createStringNode (lexer, node.elementBegin (), node.elementEnd ());
!                     }
!                     else // StringNode
!                     {
!                         if (null != last)
!                             last.setEndPosition (node.getEndPosition ());
!                         else
!                             // TODO: need to remove this cast
!                             last = (StringNode)node;
!                     }
! 
!             }
!             while (!done);
! 
!             // build new string tag if required
!             if (null == last)
!                 // TODO: need to remove this cast
!                 last = (StringNode)factory.createStringNode (lexer, position, position);
!             // build new end tag if required
!             if (null == end)
!             {
!                 data =  new TagData(
!                     "/" + tag.getTagName (),
!                     tag.getEndPosition (),
!                     new Vector (),
!                     lexer.getPage ().getUrl (),
!                     false);
!                 end = new Tag (data);
! //TODO: use the factory: end = factory.createTagNode (mLexer, last.getEndPosition (), last.getEndPosition () + 
!             }
!             data =  new TagData(
!                 lexer.getPage (),
!                 tag.elementBegin(),
!                 end.elementEnd(),
!                 tag.getAttributesEx (),
!                 lexer.getPage ().getUrl (),
!                 tag.isEmptyXmlTag ());
  
+             ret = createTag(
+                 data,
+                 new CompositeTagData(tag, end, new NodeList (last))
+                 );
          }
!         finally
!         {
!             lexer.setNodeFactory (factory);
          }
  
+         return (ret);
+     }
  
      /**
***************
*** 87,95 ****
       * @return String containing the end tag to search for, i.e. &lt;/SCRIPT&gt;
       */
!     public String getEndTag() {
          return SCRIPT_END_TAG;
      }
- 
- 
- 
  }
--- 191,197 ----
       * @return String containing the end tag to search for, i.e. &lt;/SCRIPT&gt;
       */
!     public String getEndTag()
!     {
          return SCRIPT_END_TAG;
      }
  }
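
Editorial note: the new scan method above temporarily replaces the lexer's node factory and restores it in a finally block, so the original factory comes back even if scanning throws. A minimal sketch of that swap-and-restore pattern follows. TokenFactory, MiniLexer and ScriptGatherer are hypothetical names invented for this sketch and are much simpler than the real Lexer and NodeFactory; the sketch also consumes the end tag and ignores the BODY/HTML enders handled in the commit.

```java
import java.util.List;

/** Hypothetical factory; the real NodeFactory has a different shape. */
interface TokenFactory {
    String create(String raw);
}

/** Hypothetical, much-reduced lexer that builds tokens through a replaceable factory. */
final class MiniLexer {
    private final List<String> raw;
    private int position = 0;
    private TokenFactory factory;

    MiniLexer(List<String> raw, TokenFactory factory) {
        this.raw = raw;
        this.factory = factory;
    }

    TokenFactory getFactory() { return factory; }
    void setFactory(TokenFactory factory) { this.factory = factory; }

    /** Returns the next token, or null at end of input. */
    String next() {
        return position < raw.size() ? factory.create(raw.get(position++)) : null;
    }
}

final class ScriptGatherer {
    /** Concatenates tokens up to the end tag, always restoring the lexer's factory. */
    static String gather(MiniLexer lexer, String endTag) {
        TokenFactory saved = lexer.getFactory();
        lexer.setFactory(raw -> raw); // pass-through: no further interpretation while inside the script
        try {
            StringBuilder text = new StringBuilder();
            String token;
            while ((token = lexer.next()) != null && !token.equalsIgnoreCase(endTag))
                text.append(token);
            return text.toString();
        } finally {
            lexer.setFactory(saved); // runs on normal return and on exception
        }
    }
}
```

The point of the finally block is that callers see the same factory after gather returns as before they called it, whatever happened in between.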

Index: TagScanner.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/TagScanner.java,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** TagScanner.java	28 Sep 2003 15:33:58 -0000	1.41
--- TagScanner.java	5 Oct 2003 13:49:53 -0000	1.42
***************
*** 43,46 ****
--- 43,47 ----
  import org.htmlparser.tags.Tag;
  import org.htmlparser.tags.data.TagData;
+ import org.htmlparser.util.NodeIterator;
  import org.htmlparser.util.ParserException;
  import org.htmlparser.util.ParserFeedback;
***************
*** 142,188 ****
      return true;
    }
!   
! //  public static String extractXMLData(Node node, String tagName, NodeReader reader) throws ParserException{
! //    try {
! //      String xmlData = "";
! //
! //      boolean xmlTagFound = isXMLTagFound(node, tagName);
! //      if (xmlTagFound) {
! //        try{
! //          do {
! //            node = reader.readElement();
! //            if (node!=null) {
! //              if (node instanceof StringNode) {
! //                StringNode stringNode = (StringNode)node;
! //                if (xmlData.length()>0) xmlData+=" ";
! //            xmlData += stringNode.getText();
! //          } else if (!(node instanceof org.htmlparser.tags.EndTag))
! //            xmlTagFound = false;
! //        }
! //      }
! //      while (node instanceof StringNode);
! //
! //    }
! //
! //    catch (Exception e) {
! //        throw new ParserException("HTMLTagScanner.extractXMLData() : error while trying to find xml tag",e);
! //        }
! //      }
! //      if (xmlTagFound) {
! //          if (node!=null) {
! //            if (node instanceof org.htmlparser.tags.EndTag) {
! //              org.htmlparser.tags.EndTag endTag = (org.htmlparser.tags.EndTag)node;
! //              if (!endTag.getText().equals(tagName)) xmlTagFound = false;
! //            }
! //
! //          }
! //
! //      }
! //      if (xmlTagFound) return xmlData; else return null;
! //    }
! //    catch (Exception e) {
! //        throw new ParserException("HTMLTagScanner.extractXMLData() : Error occurred while trying to extract xml tag",e);
! //        }
! //      }
  
      public String getFilter() {
--- 143,215 ----
      return true;
    }
! 
!   /**
!    * Pull the text between two matching capitalized 'XML' tags.
!    * @deprecated This reads ahead on your iterator and doesn't put them back if it's not an XML tag.
!    */
!   public static String extractXMLData (Node node, String tagName, NodeIterator iterator)
!     throws
!         ParserException
!   {
!       try
!       {
!           String xmlData = "";
!           
!           boolean xmlTagFound = isXMLTagFound (node, tagName);
!           if (xmlTagFound)
!           {
!               try
!               {
!                   do
!                   {
!                       node = iterator.nextNode ();
!                       if (node!=null)
!                       {
!                           if (node instanceof StringNode)
!                           {
!                               StringNode stringNode = (StringNode)node;
!                               if (xmlData.length ()>0)
!                                 xmlData+=" ";
!                               xmlData += stringNode.getText ();
!                           }
!                           else
!                               if (!(node instanceof Tag && ((Tag)node).isEndTag ()))
!                                 xmlTagFound = false;
!                       }
!                   }
!                   while (node instanceof StringNode);
!                   
!               }
!               
!               catch (Exception e)
!               {
!                   throw new ParserException ("TagScanner.extractXMLData() : error while trying to find xml tag",e);
!               }
!           }
!           // check end tag matches start tag
!           if (xmlTagFound)
!           {
!               if (node!=null)
!               {
!                   if (node instanceof Tag && ((Tag)node).isEndTag ())
!                   {
!                       Tag endTag = (Tag)node;
!                       if (!endTag.getTagName ().equals (tagName))
!                           xmlTagFound = false;
!                   }
!                   
!               }
!               
!           }
!           if (xmlTagFound)
!              return xmlData;
!           else
!               return null;
!       }
!       catch (Exception e)
!       {
!           throw new ParserException ("TagScanner.extractXMLData() : Error occurred while trying to extract xml tag",e);
!       }
!   }
  
      public String getFilter() {

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests ParserTest.java,1.43,1.44 ParserTestCase.java,1.31,1.32

From: <der...@us...> - 2003-10-05 13:49:59

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1:/tmp/cvs-serv9618/tests

Modified Files:
	ParserTest.java ParserTestCase.java 
Log Message:
Add bean like accessors for positions on Node, AbstractNode and AbstractNodeDecorator.
Handle null page in Cursor.
Add smartquotes mode in Lexer and CompositeTagScannerHelper.
Add simple name constructor in Attribute.
Remove emptyxmltag member, replace with computing accessors in TagNode.
Removed ScriptScannerHelper and moved scanning logic to ScriptScanner.
Reworked extractImageLocn in ImageScanner
Implement extractXMLData in TagScanner.
Made virtual tags zero length in TagData.
Added push() to IteratorImpl.
Added single node constructor to NodeList.
Numerous and various test adjustments. Still 133 failures.



Index: ParserTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTest.java,v
retrieving revision 1.43
retrieving revision 1.44
diff -C2 -d -r1.43 -r1.44
*** ParserTest.java	28 Sep 2003 15:33:58 -0000	1.43
--- ParserTest.java	5 Oct 2003 13:49:53 -0000	1.44
***************
*** 626,631 ****
              node.collectInto(collectionList,LinkTag.class);
          }
!         // NOTE: the link within the script is also found... this may be debatable
!         assertEquals("Size of collection vector should be 12",12,collectionList.size());
          // All items in collection vector should be links
          for (SimpleNodeIterator e = collectionList.elements();e.hasMoreNodes();) {
--- 626,630 ----
              node.collectInto(collectionList,LinkTag.class);
          }
!         assertEquals("Size of collection vector should be 11",11,collectionList.size());
          // All items in collection vector should be links
          for (SimpleNodeIterator e = collectionList.elements();e.hasMoreNodes();) {

Index: ParserTestCase.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTestCase.java,v
retrieving revision 1.31
retrieving revision 1.32
diff -C2 -d -r1.31 -r1.32
*** ParserTestCase.java	3 Oct 2003 02:15:20 -0000	1.31
--- ParserTestCase.java	5 Oct 2003 13:49:53 -0000	1.32
***************
*** 32,35 ****
--- 32,36 ----
  import java.io.StringReader;
  import java.util.Iterator;
+ import java.util.Vector;
  
  import junit.framework.TestCase;
***************
*** 44,48 ****
--- 45,51 ----
  import org.htmlparser.tags.InputTag;
  import org.htmlparser.tags.Tag;
+ import org.htmlparser.tags.data.TagData;
  import org.htmlparser.util.DefaultParserFeedback;
+ import org.htmlparser.util.IteratorImpl;
  import org.htmlparser.util.NodeIterator;
  import org.htmlparser.util.ParserException;
***************
*** 151,154 ****
--- 154,158 ----
                          " \n\n**** COMPLETE STRING ACTUAL***\n" + actual
                      );
+                     System.out.println ("string differs, expected \"" + expected + "\", actual \"" + actual + "\"");
                      fail(errorMsg.toString());
              }
***************
*** 171,174 ****
--- 175,180 ----
              msg.append("-->\n").append(node[i].toHtml()).append("\n");
          }
+         if (nodeCountExpected != nodeCount)
+             System.out.println ("node count differs, expected " + nodeCountExpected + ", actual " + nodeCount);
          assertEquals("Number of nodes parsed didn't match, nodes found were :\n"+msg.toString(),nodeCountExpected,nodeCount);
      }
***************
*** 230,235 ****
                  nextActualNode
              );
!             fixIfXmlEndTag(resultParser, nextActualNode);
!             fixIfXmlEndTag(expectedParser, nextExpectedNode);
              assertSameType(displayMessage, nextExpectedNode, nextActualNode);
              assertTagEquals(displayMessage, nextExpectedNode, nextActualNode);
--- 236,241 ----
                  nextActualNode
              );
!             fixIfXmlEndTag(actualIterator, nextActualNode);
!             fixIfXmlEndTag(expectedIterator, nextExpectedNode);
              assertSameType(displayMessage, nextExpectedNode, nextActualNode);
              assertTagEquals(displayMessage, nextExpectedNode, nextActualNode);
***************
*** 288,293 ****
      }
  
!     private void fixIfXmlEndTag (Parser parser, Node node)
      {
          if (node instanceof Tag)
          {
--- 294,300 ----
      }
  
!     private void fixIfXmlEndTag (NodeIterator iterator, Node node)
      {
+         TagData data;
          if (node instanceof Tag)
          {
***************
*** 295,307 ****
              if (tag.isEmptyXmlTag())
              {
!                 System.out.println (tag);
! //                // Add end tag
! //                String currLine = parser.getReader().getCurrentLine();
! //                int pos = parser.getReader().getLastReadPosition();
! //                currLine =
! //                    currLine.substring(0,pos+1)+
! //                    "</"+tag.getTagName()+">"+
! //                    currLine.substring(pos+1,currLine.length());
! //                parser.getReader().changeLine(currLine);
              }
          }
--- 302,311 ----
              if (tag.isEmptyXmlTag())
              {
!                 tag.setEmptyXmlTag (false);
!                 data = new TagData
!                     ("/" + tag.getTagName (), tag.elementEnd (), new Vector (), "", false);
!                 node = new Tag (data);
!                 // cheat here and poink the new node into the iterator
!                 ((IteratorImpl)iterator).push (node);
              }
          }
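
Editorial note: fixIfXmlEndTag above relies on the push() method added to IteratorImpl in this commit, which lets the test insert a synthetic end tag so that it is the next node returned. IteratorImpl itself is not shown in this message, so the following is only a sketch of how such a push-back iterator can be built. PushbackIterator is a hypothetical name and uses java.util.Iterator rather than the library's NodeIterator.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical push-back wrapper; not the real IteratorImpl. */
final class PushbackIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    // Pushed items are served before the source, most recently pushed first.
    // ArrayDeque rejects null, so null items cannot be pushed in this sketch.
    private final Deque<T> pushed = new ArrayDeque<>();

    PushbackIterator(Iterator<T> source) {
        this.source = source;
    }

    /** Makes the given item the next one returned by next(). */
    void push(T item) {
        pushed.push(item);
    }

    @Override
    public boolean hasNext() {
        return !pushed.isEmpty() || source.hasNext();
    }

    @Override
    public T next() {
        return pushed.isEmpty() ? source.next() : pushed.pop();
    }
}
```

With this shape, a caller that has just read an empty XML tag can push a matching end tag and continue iterating as if the input had contained it.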

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tags/data TagData.java,1.33,1.34

From: <der...@us...> - 2003-10-05 13:49:58

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/data
In directory sc8-pr-cvs1:/tmp/cvs-serv9618/tags/data

Modified Files:
	TagData.java 
Log Message:
Add bean like accessors for positions on Node, AbstractNode and AbstractNodeDecorator.
Handle null page in Cursor.
Add smartquotes mode in Lexer and CompositeTagScannerHelper.
Add simple name constructor in Attribute.
Remove emptyxmltag member, replace with computing accessors in TagNode.
Removed ScriptScannerHelper and moved scanning logic to ScriptScanner.
Reworked extractImageLocn in ImageScanner
Implement extractXMLData in TagScanner.
Made virtual tags zero length in TagData.
Added push() to IteratorImpl.
Added single node constructor to NodeList.
Numerous and various test adjustments. Still 133 failures.



Index: TagData.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/data/TagData.java,v
retrieving revision 1.33
retrieving revision 1.34
diff -C2 -d -r1.33 -r1.34
*** TagData.java	3 Oct 2003 02:15:20 -0000	1.33
--- TagData.java	5 Oct 2003 13:49:53 -0000	1.34
***************
*** 32,35 ****
--- 32,36 ----
  import org.htmlparser.lexer.Cursor;
  import org.htmlparser.lexer.Page;
+ import org.htmlparser.lexer.nodes.Attribute;
  import org.htmlparser.util.ParserException;
  
***************
*** 68,76 ****
              null,
              tagBegin,
!             tagBegin + name.length () + 2 + (isXmlEndTag ? 1 : 0),
              attributes,
              urlBeingParsed,
              isXmlEndTag);
!         // todo: add attribute sizes
      }
      
--- 69,80 ----
              null,
              tagBegin,
!             tagBegin, // a virtual node has no length, + name.length () + 2 + (isXmlEndTag ? 1 : 0),
!                       // was a todo: add attribute sizes to length
              attributes,
              urlBeingParsed,
              isXmlEndTag);
!         if (null != name && (0 == attributes.size ()))
!             attributes.insertElementAt (new Attribute (name), 0);
!             
      }

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tags/data TagData.java,1.32,1.33

From: <der...@us...> - 2003-10-03 02:15:26

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/data
In directory sc8-pr-cvs1:/tmp/cvs-serv23938/tags/data

Modified Files:
	TagData.java 
Log Message:
Fix all testcases generating exceptions. Still 160 failures.



Index: TagData.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/data/TagData.java,v
retrieving revision 1.32
retrieving revision 1.33
diff -C2 -d -r1.32 -r1.33
*** TagData.java	28 Sep 2003 15:33:58 -0000	1.32
--- TagData.java	3 Oct 2003 02:15:20 -0000	1.33
***************
*** 30,34 ****
--- 30,36 ----
  
  import java.util.Vector;
+ import org.htmlparser.lexer.Cursor;
  import org.htmlparser.lexer.Page;
+ import org.htmlparser.util.ParserException;
  
  public class TagData {
***************
*** 121,124 ****
--- 123,136 ----
          tagBegin = 0;
          tagEnd = tagContents.length ();
+         // TODO: this really needs work
+         try
+         {
+             Cursor cursor = new Cursor (mPage, tagBegin);
+             for (int i = tagBegin; i < tagEnd; i++)
+                 mPage.getCharacter (cursor);
+         }
+         catch (ParserException pe)
+         {
+         }
          mAttributes = attributes;
          urlBeingParsed = url;

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer/nodes StringNode.java,1.8,1.9

From: <der...@us...> - 2003-10-03 02:15:26

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1:/tmp/cvs-serv23938/lexer/nodes

Modified Files:
	StringNode.java 
Log Message:
Fix all testcases generating exceptions. Still 160 failures.



Index: StringNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes/StringNode.java,v
retrieving revision 1.8
retrieving revision 1.9
diff -C2 -d -r1.8 -r1.9
*** StringNode.java	28 Sep 2003 15:33:57 -0000	1.8
--- StringNode.java	3 Oct 2003 02:15:19 -0000	1.9
***************
*** 70,73 ****
--- 70,83 ----
          nodeBegin = 0;
          nodeEnd = text.length ();
+         // TODO: this really needs work
+         try
+         {
+             Cursor cursor = new Cursor (mPage, nodeBegin);
+             for (int i = nodeBegin; i < nodeEnd; i++)
+                 mPage.getCharacter (cursor);
+         }
+         catch (ParserException pe)
+         {
+         }
      }

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tags CompositeTag.java,1.57,1.58 FrameSetTag.java,1.26,1.27 SelectTag.java,1.28,1.29

From: <der...@us...> - 2003-10-03 02:15:25

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags
In directory sc8-pr-cvs1:/tmp/cvs-serv23938/tags

Modified Files:
	CompositeTag.java FrameSetTag.java SelectTag.java 
Log Message:
Fix all testcases generating exceptions. Still 160 failures.



Index: CompositeTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/CompositeTag.java,v
retrieving revision 1.57
retrieving revision 1.58
diff -C2 -d -r1.57 -r1.58
*** CompositeTag.java	28 Sep 2003 19:30:04 -0000	1.57
--- CompositeTag.java	3 Oct 2003 02:15:19 -0000	1.58
***************
*** 119,137 ****
      }
  
!     protected void putChildrenInto(StringBuffer sb) {
!         Node node,prevNode=startTag;
!         for (SimpleNodeIterator e=children();e.hasMoreNodes();) {
!             node = e.nextNode();
!             if (prevNode!=null) {
!                 if (prevNode.elementEnd()>node.elementBegin()) {
!                     // Its a new line
!                     sb.append(Parser.getLineSeparator());
!                 }
!             }
!             sb.append(node.toHtml());
!             prevNode=node;
!         }
!         if (prevNode.elementEnd()>endTag.elementBegin()) {
!             sb.append(Parser.getLineSeparator());
          }
      }
--- 119,129 ----
      }
  
!     protected void putChildrenInto(StringBuffer sb)
!     {
!         Node node;
!         for (SimpleNodeIterator e = children (); e.hasMoreNodes ();)
!         {
!             node = e.nextNode ();
!             sb.append (node.toHtml ());
          }
      }

Index: FrameSetTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/FrameSetTag.java,v
retrieving revision 1.26
retrieving revision 1.27
diff -C2 -d -r1.26 -r1.27
*** FrameSetTag.java	22 Sep 2003 02:40:01 -0000	1.26
--- FrameSetTag.java	3 Oct 2003 02:15:20 -0000	1.27
***************
*** 29,32 ****
--- 29,33 ----
  package org.htmlparser.tags;
  
+ import org.htmlparser.Node;
  import org.htmlparser.tags.data.CompositeTagData;
  import org.htmlparser.tags.data.TagData;
***************
*** 78,85 ****
      public FrameTag getFrame(String frameName) {
          boolean found = false;
          FrameTag frameTag=null;
!         for (SimpleNodeIterator e=frames.elements();e.hasMoreNodes() && !found;) {
!             frameTag = (FrameTag)e.nextNode();
!             if (frameTag.getFrameName().toUpperCase().equals(frameName.toUpperCase())) found = true;
          }
          if (found)
--- 79,93 ----
      public FrameTag getFrame(String frameName) {
          boolean found = false;
+         Node node;
          FrameTag frameTag=null;
!         for (SimpleNodeIterator e=frames.elements();e.hasMoreNodes() && !found;)
!         {
!             node = e.nextNode();
!             if (node instanceof FrameTag)
!             {
!                 frameTag = (FrameTag)node;
!                 if (frameTag.getFrameName().toUpperCase().equals(frameName.toUpperCase()))
!                     found = true;
!             }
          }
          if (found)

Index: SelectTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/SelectTag.java,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** SelectTag.java	22 Sep 2003 02:40:01 -0000	1.28
--- SelectTag.java	3 Oct 2003 02:15:20 -0000	1.29
***************
*** 29,32 ****
--- 29,33 ----
  package org.htmlparser.tags;
  
+ import org.htmlparser.Node;
  
  import org.htmlparser.tags.data.CompositeTagData;
***************
*** 64,67 ****
--- 65,69 ----
          StringBuffer lString;
          NodeList children;
+         Node node;
  
          lString = new StringBuffer(ParserUtils.toString(this));
***************
*** 69,74 ****
          for(int i=0;i<children.size(); i++)
          {
!             OptionTag optionTag = (OptionTag)children.elementAt(i);
!             lString.append(optionTag.toString()).append("\n");
          }
  
--- 71,80 ----
          for(int i=0;i<children.size(); i++)
          {
!             node = children.elementAt(i);
!             if (node instanceof OptionTag)
!             {
!                 OptionTag optionTag = (OptionTag)node;
!                 lString.append(optionTag.toString()).append("\n");
!             }
          }
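
The FrameSetTag and SelectTag hunks above both replace an unconditional cast with an instanceof check, presumably so that a child of some other node type no longer raises a ClassCastException. A minimal sketch of that pattern follows. Node, OptionNode, TextNode and describeOptions are hypothetical stand-ins invented for illustration; they are not htmlparser classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical types for illustration; none of these are htmlparser classes. */
final class GuardedCastSketch
{
    interface Node
    {
    }

    static final class OptionNode implements Node
    {
        final String label;

        OptionNode (String label)
        {
            this.label = label;
        }
    }

    static final class TextNode implements Node
    {
    }

    /** Append one line per OptionNode child, silently skipping any other node type. */
    static String describeOptions (List<Node> children)
    {
        StringBuilder ret = new StringBuilder ();
        for (Node node : children)
        {
            // Check the runtime type before casting instead of assuming it.
            if (node instanceof OptionNode)
                ret.append (((OptionNode)node).label).append ("\n");
        }
        return (ret.toString ());
    }

    public static void main (String[] args)
    {
        List<Node> children = new ArrayList<Node> ();
        children.add (new OptionNode ("one"));
        children.add (new TextNode ());
        children.add (new OptionNode ("two"));
        System.out.print (describeOptions (children));
    }
}
```

Skipping unexpected children keeps the loop from throwing, at the cost of ignoring them without any report.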

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests AllTests.java,1.51,1.52 ParserTestCase.java,1.30,1.31

From: <der...@us...> - 2003-10-03 02:15:25

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1:/tmp/cvs-serv23938/tests

Modified Files:
	AllTests.java ParserTestCase.java 
Log Message:
Fix all testcases generating exceptions. Still 160 failures.



Index: AllTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/AllTests.java,v
retrieving revision 1.51
retrieving revision 1.52
diff -C2 -d -r1.51 -r1.52
*** AllTests.java	22 Sep 2003 02:40:03 -0000	1.51
--- AllTests.java	3 Oct 2003 02:15:20 -0000	1.52
***************
*** 101,118 ****
      }
  
!     public static TestSuite suite() {
!         TestSuite suite = new TestSuite("HTMLParser Tests");
!         TestSuite basic = new TestSuite("Basic Tests");
!         basic.addTestSuite(ParserTest.class);
!         suite.addTest(basic);
!         suite.addTest(org.htmlparser.tests.scannersTests.AllTests.suite());
!         suite.addTest(org.htmlparser.tests.utilTests.AllTests.suite());
!         suite.addTest(org.htmlparser.tests.tagTests.AllTests.suite());
!         suite.addTest(org.htmlparser.tests.visitorsTests.AllTests.suite());
!         suite.addTest(org.htmlparser.tests.parserHelperTests.AllTests.suite());
!         suite.addTest(org.htmlparser.tests.nodeDecoratorTests.AllTests.suite());
!         suite.addTest(AssertXmlEqualsTest.suite());
!         suite.addTest(LineNumberAssignedByNodeReaderTest.suite());
!         return suite;
      }
  }
--- 101,124 ----
      }
  
!     public static TestSuite suite()
!     {
!         TestSuite suite;
!         TestSuite sub;
!         
!         suite = new TestSuite ("HTMLParser Tests");
!         sub = new TestSuite ("Basic Tests");
!         sub.addTestSuite (ParserTest.class);
!         sub.addTestSuite (AssertXmlEqualsTest.class);
!         sub.addTestSuite (FunctionalTests.class);
!         sub.addTestSuite (LineNumberAssignedByNodeReaderTest.class);
!         suite.addTest (sub);
!         suite.addTest (org.htmlparser.tests.scannersTests.AllTests.suite ());
!         suite.addTest (org.htmlparser.tests.utilTests.AllTests.suite ());
!         suite.addTest (org.htmlparser.tests.tagTests.AllTests.suite ());
!         suite.addTest (org.htmlparser.tests.visitorsTests.AllTests.suite ());
!         suite.addTest (org.htmlparser.tests.parserHelperTests.AllTests.suite ());
!         suite.addTest (org.htmlparser.tests.nodeDecoratorTests.AllTests.suite ());
! 
!         return (suite);
      }
  }

Index: ParserTestCase.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTestCase.java,v
retrieving revision 1.30
retrieving revision 1.31
diff -C2 -d -r1.30 -r1.31
*** ParserTestCase.java	2 Oct 2003 23:48:53 -0000	1.30
--- ParserTestCase.java	3 Oct 2003 02:15:20 -0000	1.31
***************
*** 224,228 ****
              nextExpectedNode = getNextNodeUsing(expectedIterator);
              nextActualNode = getNextNodeUsing(actualIterator);
! 
              assertStringValueMatches(
                  displayMessage,
--- 224,228 ----
              nextExpectedNode = getNextNodeUsing(expectedIterator);
              nextActualNode = getNextNodeUsing(actualIterator);
!             assertNotNull (nextActualNode);
              assertStringValueMatches(
                  displayMessage,
***************
*** 288,297 ****
      }
  
!     private void fixIfXmlEndTag(Parser parser, Node node) {
!         if (node instanceof Tag) {
              Tag tag = (Tag)node;
!             if (tag.isEmptyXmlTag()) {
! // oh crap...
!                 // Add end tag
  //                String currLine = parser.getReader().getCurrentLine();
  //                int pos = parser.getReader().getLastReadPosition();
--- 288,300 ----
      }
  
!     private void fixIfXmlEndTag (Parser parser, Node node)
!     {
!         if (node instanceof Tag)
!         {
              Tag tag = (Tag)node;
!             if (tag.isEmptyXmlTag())
!             {
!                 System.out.println (tag);
! //                // Add end tag
  //                String currLine = parser.getReader().getCurrentLine();
  //                int pos = parser.getReader().getLastReadPosition();
***************
*** 347,351 ****
          while (i.hasNext()) {
              String key = (String)i.next();
!             if (key=="/") continue;
              String expectedValue =
                  expectedTag.getAttribute(key);
--- 350,354 ----
          while (i.hasNext()) {
              String key = (String)i.next();
!             if (key.trim().equals ("/")) continue;
              String expectedValue =
                  expectedTag.getAttribute(key);
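
The last hunk above swaps a reference comparison (key=="/") for a value comparison (key.trim().equals ("/")). A small sketch of why the two can differ: == on strings is true only when both operands are the same object, which a key built at run time generally is not. The class and method names below are made up for illustration.

```java
/** Hypothetical helper for illustration; not part of ParserTestCase. */
final class SlashKeySketch
{
    /** Old check: true only if key is the very same object as the literal. */
    static boolean isSlashByIdentity (String key)
    {
        return (key == "/");
    }

    /** New check: compares contents and tolerates surrounding whitespace. */
    static boolean isSlashByValue (String key)
    {
        return (key.trim ().equals ("/"));
    }

    public static void main (String[] args)
    {
        // new String (...) forces a distinct object with the same contents.
        String key = new String ("/");
        System.out.println (isSlashByIdentity (key));
        System.out.println (isSlashByValue (key));
    }
}
```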

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer/nodes TagNode.java,1.13,1.14

From: <der...@us...> - 2003-10-03 00:20:56

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1:/tmp/cvs-serv2396/lexer/nodes

Modified Files:
	TagNode.java 
Log Message:
Updated tag line numbers test.
***** Line numbers reported by tags are now zero based, not one based. *****
Strip off possible ending slash in tag name.



Index: TagNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes/TagNode.java,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -d -r1.13 -r1.14
*** TagNode.java	2 Oct 2003 23:48:53 -0000	1.13
--- TagNode.java	3 Oct 2003 00:20:44 -0000	1.14
***************
*** 365,369 ****
       * <em>
       * Note: This value is converted to uppercase and does not
!      * begin with "/" if it is an end tag.
       * To get at the original text of the tag name use
       * {@link #getRawTagName getRawTagName()}.
--- 365,370 ----
       * <em>
       * Note: This value is converted to uppercase and does not
!      * begin with "/" if it is an end tag. Nor does it end with
!      * a slash in the case of an XML type tag.
       * To get at the original text of the tag name use
       * {@link #getRawTagName getRawTagName()}.
***************
*** 387,390 ****
--- 388,393 ----
                  if (ret.startsWith ("/"))
                      ret = ret.substring (1);
+                 if (ret.endsWith ("/"))
+                     ret = ret.substring (0, ret.length () - 1);
              }
          }
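
The name handling described in the Javadoc above (uppercase, no leading slash for an end tag, no trailing slash for an XML type tag) can be sketched in isolation as follows. TagNameSketch and normalize are hypothetical names, and the sketch works on a plain string rather than on the attribute list that TagNode actually uses.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the htmlparser API. */
final class TagNameSketch
{
    /**
     * Uppercase a raw tag name, then drop a leading slash (end tag)
     * and a trailing slash (XML-style empty tag).
     */
    static String normalize (String raw)
    {
        String ret = raw.toUpperCase (Locale.ENGLISH);
        if (ret.startsWith ("/"))
            ret = ret.substring (1);
        if (ret.endsWith ("/"))
            ret = ret.substring (0, ret.length () - 1);
        return (ret);
    }

    public static void main (String[] args)
    {
        System.out.println (normalize ("/custom"));
        System.out.println (normalize ("Custom/"));
    }
}
```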

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests LineNumberAssignedByNodeReaderTest.java,1.22,1.23

From: <der...@us...> - 2003-10-03 00:20:56

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1:/tmp/cvs-serv2396/tests

Modified Files:
	LineNumberAssignedByNodeReaderTest.java 
Log Message:
Updated tag line numbers test.
***** Line numbers reported by tags are now zero based, not one based. *****
Strip off possible ending slash in tag name.



Index: LineNumberAssignedByNodeReaderTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/LineNumberAssignedByNodeReaderTest.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** LineNumberAssignedByNodeReaderTest.java	22 Sep 2003 02:40:04 -0000	1.22
--- LineNumberAssignedByNodeReaderTest.java	3 Oct 2003 00:20:44 -0000	1.23
***************
*** 57,70 ****
       */
      public void testLineNumbers() throws ParserException {
!         testLineNumber("<Custom/>", 1, 0, 1, 1);
!         testLineNumber("<Custom />", 1, 0, 1, 1);
!         testLineNumber("<Custom></Custom>", 1, 0, 1, 1);
!         testLineNumber("<Custom>Content</Custom>", 1, 0, 1, 1);
!         testLineNumber("<Custom>Content<Custom></Custom>", 1, 0, 1, 1);
          testLineNumber(
              "<Custom>\n" +
              "   Content\n" +
              "</Custom>",
!             1, 0, 1, 3
          );
          testLineNumber(
--- 57,70 ----
       */
      public void testLineNumbers() throws ParserException {
!         testLineNumber("<Custom/>", 1, 0, 0, 0);
!         testLineNumber("<Custom />", 1, 0, 0, 0);
!         testLineNumber("<Custom></Custom>", 1, 0, 0, 0);
!         testLineNumber("<Custom>Content</Custom>", 1, 0, 0, 0);
!         testLineNumber("<Custom>Content<Custom></Custom>", 1, 0, 0, 0);
          testLineNumber(
              "<Custom>\n" +
              "   Content\n" +
              "</Custom>",
!             1, 0, 0, 2
          );
          testLineNumber(
***************
*** 73,77 ****
              "   Content\n" +
              "</Custom>",
!             2, 1, 2, 4
          );
          testLineNumber(
--- 73,77 ----
              "   Content\n" +
              "</Custom>",
!             2, 1, 1, 3
          );
          testLineNumber(
***************
*** 80,84 ****
              "   <Custom>SubContent</Custom>\n" +
              "</Custom>",
!             2, 1, 2, 4
          );
          char[] oneHundredNewLines = new char[100];
--- 80,84 ----
              "   <Custom>SubContent</Custom>\n" +
              "</Custom>",
!             2, 1, 1, 3
          );
          char[] oneHundredNewLines = new char[100];
***************
*** 90,94 ****
              "   <Custom>SubContent</Custom>\n" +
              "</Custom>",
!             2, 1, 102, 104
          );
      }
--- 90,94 ----
              "   <Custom>SubContent</Custom>\n" +
              "</Custom>",
!             2, 1, 101, 103
          );
      }
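
To make the zero based convention in the updated expectations concrete, here is a minimal sketch that derives a line number from a character offset by counting the newlines before it. This is an assumption-laden illustration only: the parser's real line tracking is not shown in this commit, lineOf is a hypothetical name, and only '\n' is treated as a line break.

```java
/** Hypothetical helper for illustration; not the parser's line tracking. */
final class LineNumberSketch
{
    /** Zero based line of the character at offset: the count of '\n' before it. */
    static int lineOf (String text, int offset)
    {
        int line = 0;
        for (int i = 0; i < offset && i < text.length (); i++)
            if ('\n' == text.charAt (i))
                line++;
        return (line);
    }

    public static void main (String[] args)
    {
        String html = "<Custom>\n   Content\n</Custom>";
        System.out.println (lineOf (html, 0));
        System.out.println (lineOf (html, html.indexOf ("</Custom>")));
    }
}
```

Under this convention a tag on the first line reports line 0, and the end tag in the three line example reports line 2, which agrees with the 0 and 2 in the revised test arguments above.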

[Htmlparser-cvs] htmlparser/src/org/htmlparser Parser.java,1.64,1.65

From: <der...@us...> - 2003-10-03 00:20:56

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1:/tmp/cvs-serv2396

Modified Files:
	Parser.java 
Log Message:
Updated tag line numbers test.
***** Line numbers reported by tags are now zero based, not one based. *****
Strip off possible ending slash in tag name.



Index: Parser.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Parser.java,v
retrieving revision 1.64
retrieving revision 1.65
diff -C2 -d -r1.64 -r1.65
*** Parser.java	29 Sep 2003 00:00:38 -0000	1.64
--- Parser.java	3 Oct 2003 00:20:44 -0000	1.65
***************
*** 603,607 ****
          try
          {
!             if (null == scanners.get ("-m"))
              {
                  addScanner (new MetaTagScanner ("-m"));
--- 603,607 ----
          try
          {
!             if (null == scanners.get ("META"))
              {
                  addScanner (new MetaTagScanner ("-m"));
***************
*** 644,648 ****
          {
              if (remove_scanner)
!                 scanners.remove ("-m");
          }
  
--- 644,648 ----
          {
              if (remove_scanner)
!                 scanners.remove ("META");
          }

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/tagTests TagTest.java,1.44,1.45

From: <der...@us...> - 2003-10-02 23:49:05

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests
In directory sc8-pr-cvs1:/tmp/cvs-serv28867/tests/tagTests

Modified Files:
	TagTest.java 
Log Message:
Moved SpecialHashtable to util.
Fixed some attribute bugs and some test cases.



Index: TagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/TagTest.java,v
retrieving revision 1.44
retrieving revision 1.45
diff -C2 -d -r1.44 -r1.45
*** TagTest.java	28 Sep 2003 15:33:59 -0000	1.44
--- TagTest.java	2 Oct 2003 23:48:53 -0000	1.45
***************
*** 40,43 ****
--- 40,44 ----
  import org.htmlparser.util.NodeIterator;
  import org.htmlparser.util.ParserException;
+ import org.htmlparser.util.SpecialHashtable;
  
  public class TagTest extends ParserTestCase
***************
*** 54,63 ****
       */
      public void testBodyTagBug1() throws ParserException {
!         createParser("<BODY aLink=#ff0000 bgColor=#ffffff link=#0000cc onload=setfocus() text=#000000\nvLink=#551a8b>");
          parseAndAssertNodeCount(1);
          // The node should be an Tag
          assertTrue("Node should be a Tag",node[0] instanceof Tag);
          Tag tag = (Tag)node[0];
!         assertEquals("Contents of the tag","BODY aLink=#ff0000 bgColor=#ffffff link=#0000cc onload=setfocus() text=#000000\r\nvLink=#551a8b",tag.getText());
      }
  
--- 55,67 ----
       */
      public void testBodyTagBug1() throws ParserException {
!         String body = "BODY aLink=#ff0000 bgColor=#ffffff link=#0000cc "
!             + "onload=setfocus() text=#000000\nvLink=#551a8b";
!         createParser("<" + body + ">");
          parseAndAssertNodeCount(1);
          // The node should be an Tag
          assertTrue("Node should be a Tag",node[0] instanceof Tag);
          Tag tag = (Tag)node[0];
!         String text = tag.getText();
!         assertEquals("Contents of the tag",body,text);
      }
  
***************
*** 71,79 ****
       */
      public void testLargeTagBug() throws ParserException {
!         createParser(
!             "<MYTAG abcd\n"+
              "efgh\n"+
              "ijkl\n"+
!             "mnop>"
          );
          parseAndAssertNodeCount(1);
--- 75,84 ----
       */
      public void testLargeTagBug() throws ParserException {
!         String mytag = "MYTAG abcd\n"+
              "efgh\n"+
              "ijkl\n"+
!             "mnop";
!         createParser(
!             "<" + mytag + ">"
          );
          parseAndAssertNodeCount(1);
***************
*** 81,85 ****
          assertTrue("Node should be a Tag",node[0] instanceof Tag);
          Tag tag = (Tag)node[0];
!         assertEquals("Contents of the tag","MYTAG abcd\r\nefgh\r\nijkl\r\nmnop",tag.getText());
  
  
--- 86,90 ----
          assertTrue("Node should be a Tag",node[0] instanceof Tag);
          Tag tag = (Tag)node[0];
!         assertEquals("Contents of the tag",mytag,tag.getText());
  
  
***************
*** 152,156 ****
                  tag = (Tag)node;
                  h = tag.getAttributes();
!                 a = (String)h.get(Tag.TAGNAME);
                  href = (String)h.get("HREF");
                  myValue = (String)h.get("MYPARAMETER");
--- 157,161 ----
                  tag = (Tag)node;
                  h = tag.getAttributes();
!                 a = (String)h.get(SpecialHashtable.TAGNAME);
                  href = (String)h.get("HREF");
                  myValue = (String)h.get("MYPARAMETER");
***************
*** 222,226 ****
                  tag = (Tag)node;
                  h = tag.getAttributes();
!                 a = (String)h.get(Tag.TAGNAME);
                  href = (String)h.get("HREF");
                  myValue = (String)h.get("MYPARAMETER");
--- 227,231 ----
                  tag = (Tag)node;
                  h = tag.getAttributes();
!                 a = (String)h.get(SpecialHashtable.TAGNAME);
                  href = (String)h.get("HREF");
                  myValue = (String)h.get("MYPARAMETER");
***************
*** 290,294 ****
                  tag = (Tag)node;
                  h = tag.getAttributes();
!                 a = (String)h.get(Tag.TAGNAME);
                  nice = (String)h.get("YOURPARAMETER");
                  assertEquals ("Link tag (A)",a,"A");
--- 295,299 ----
                  tag = (Tag)node;
                  h = tag.getAttributes();
!                 a = (String)h.get(SpecialHashtable.TAGNAME);
                  nice = (String)h.get("YOURPARAMETER");
                  assertEquals ("Link tag (A)",a,"A");
***************
*** 354,376 ****
  
      public void testToHTML() throws ParserException {
!         String testHTML = new String(
!             "<MYTAG abcd\n"+
              "efgh\n"+
              "ijkl\n"+
!             "mnop>\n"+
              "<TITLE>Hello</TITLE>\n"+
!             "<A HREF=\"Hello.html\">Hey</A>"
!         );
          createParser(testHTML);
!         parseAndAssertNodeCount(7);
          // The node should be an Tag
          assertTrue("1st Node should be a Tag",node[0] instanceof Tag);
          Tag tag = (Tag)node[0];
!         assertStringEquals("toHTML()","<MYTAG EFGH ABCD MNOP IJKL>",tag.toHtml());
!         assertTrue("2nd Node should be a Tag",node[1] instanceof Tag);
!         assertTrue("5th Node should be a Tag",node[4] instanceof Tag);
!         tag = (Tag)node[1];
          assertEquals("Raw String of the tag","<TITLE>",tag.toHtml());
!         tag = (Tag)node[4];
          assertEquals("Raw String of the tag","<A HREF=\"Hello.html\">",tag.toHtml());
      }
--- 359,381 ----
  
      public void testToHTML() throws ParserException {
!         String tag1 = "<MYTAG abcd\n"+
              "efgh\n"+
              "ijkl\n"+
!             "mnop>";
!         String testHTML = tag1 +
!             "\n"+
              "<TITLE>Hello</TITLE>\n"+
!             "<A HREF=\"Hello.html\">Hey</A>";
          createParser(testHTML);
!         parseAndAssertNodeCount(9);
          // The node should be an Tag
          assertTrue("1st Node should be a Tag",node[0] instanceof Tag);
          Tag tag = (Tag)node[0];
!         assertStringEquals("toHTML()",tag1,tag.toHtml());
!         assertTrue("3rd Node should be a Tag",node[2] instanceof Tag);
!         assertTrue("5th Node should be a Tag",node[6] instanceof Tag);
!         tag = (Tag)node[2];
          assertEquals("Raw String of the tag","<TITLE>",tag.toHtml());
!         tag = (Tag)node[6];
          assertEquals("Raw String of the tag","<A HREF=\"Hello.html\">",tag.toHtml());
      }
***************
*** 676,680 ****
  
          Hashtable tempHash = tag.getAttributes ();
!         tempHash.put ("BORDER","1");
          tag.setAttributes (tempHash);
  
--- 681,685 ----
  
          Hashtable tempHash = tag.getAttributes ();
!         tempHash.put ("BORDER","\"1\"");
          tag.setAttributes (tempHash);

[Htmlparser-cvs] htmlparser/src/org/htmlparser/util SpecialHashtable.java,NONE,1.1 ParserUtils.java,1.31,1.32

From: <der...@us...> - 2003-10-02 23:49:05

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util
In directory sc8-pr-cvs1:/tmp/cvs-serv28867/util

Modified Files:
	ParserUtils.java 
Added Files:
	SpecialHashtable.java 
Log Message:
Moved SpecialHashtable to util.
Fixed some attribute bugs and some test cases.



--- NEW FILE: SpecialHashtable.java ---
// HTMLParser Library v1_4_20030921 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
//
// For any questions or suggestions, you can write to me at :
// Email :so...@in...
//
// Postal Address :
// Somik Raha
// Extreme Programmer & Coach
// Industrial Logic Corporation
// 2583 Cedar Street, Berkeley,
// CA 94708, USA
// Website : http://www.industriallogic.com

package org.htmlparser.util;

import java.util.Hashtable;

/**
 * Acts like a regular Hashtable, except some values are translated in get(Object).
 * Specifically, <code>NULLVALUE</code> is translated to <code>null</code> and
 * <code>NOTHING</code> is translated to <code>""</code>.
 * This is done for backwards compatibility: users are expecting a Hashtable,
 * but Tag.toHtml needs to know when there is no attribute value (&lt;TAG ATTRIBUTE&gt;)
 * and when the value was not present (&lt;TAG ATTRIBUTE=&gt;).
 */
public class SpecialHashtable extends Hashtable
{
    /**
     * Special key for the tag name.
     */
    public final static String TAGNAME = "$<TAGNAME>$";

    /**
     * Special value for a null attribute value.
     */
    public final static String NULLVALUE = "$<NULL>$";

    /**
     * Special value for an empty attribute value.
     */
    public final static String NOTHING = "$<NOTHING>$";

    /**
     * Constructs a new, empty hashtable with a default initial capacity (11)
     * and load factor, which is 0.75.
     */
    public SpecialHashtable ()
    {
        super ();
    }

    /**
     * Constructs a new, empty hashtable with the specified initial capacity
     * and default load factor, which is 0.75.
     */
    public SpecialHashtable (int initialCapacity)
    {
        super (initialCapacity);
    }

    /**
     * Constructs a new, empty hashtable with the specified initial capacity
     * and the specified load factor.
     */
    public SpecialHashtable (int initialCapacity, float loadFactor)
    {
        super (initialCapacity, loadFactor);
    }

    /**
     * Returns the value to which the specified key is mapped in this hashtable.
     * This is translated to provide backwards compatibility.
     * @return The translated value of the attribute. <em>This will be
     * <code>null</code> if the attribute is a stand-alone attribute.</em>
     */
    public Object get (Object key)
    {
        Object ret;

        ret = getRaw (key);
        if (NULLVALUE == ret)
            ret = null;
        else if (NOTHING == ret)
            ret = "";

        return (ret);
    }

    /**
     * Returns the raw (untranslated) value to which the specified key is
     * mapped in this hashtable.
     */
    public Object getRaw (Object key)
    {
        return (super.get (key));
    }
}
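
A short usage sketch of the translation described in the class comment. java.util.Hashtable rejects null values, which is why sentinel strings stand in for "no value" and "empty value". TranslatingTable is a hypothetical stand-in written for this example, not the committed class. It also differs in one deliberate way: it compares with equals, whereas the committed get() compares references with ==, which matches only when the stored value is the same String object as the constant.

```java
import java.util.Hashtable;

/** Hypothetical stand-in for SpecialHashtable; for illustration only. */
final class TranslatingTable extends Hashtable<String, String>
{
    private static final long serialVersionUID = 1L;

    static final String NULLVALUE = "$<NULL>$";
    static final String NOTHING = "$<NOTHING>$";

    /** Translated lookup: the sentinels come back as null or "". */
    @Override
    public synchronized String get (Object key)
    {
        String ret = getRaw (key);
        if (NULLVALUE.equals (ret))
            ret = null;
        else if (NOTHING.equals (ret))
            ret = "";
        return (ret);
    }

    /** Untranslated lookup: returns exactly what was stored. */
    public synchronized String getRaw (Object key)
    {
        return (super.get (key));
    }

    public static void main (String[] args)
    {
        TranslatingTable table = new TranslatingTable ();
        table.put ("CHECKED", NULLVALUE); // stand-alone attribute
        table.put ("TYPE", NOTHING);      // attribute with an empty value
        System.out.println (table.get ("CHECKED"));
        System.out.println (table.getRaw ("CHECKED"));
    }
}
```

Callers that only use get() see an ordinary table, while code that regenerates the tag text can call getRaw() to tell the two cases apart.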

Index: ParserUtils.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/ParserUtils.java,v
retrieving revision 1.31
retrieving revision 1.32
diff -C2 -d -r1.31 -r1.32
*** ParserUtils.java	28 Sep 2003 15:33:59 -0000	1.31
--- ParserUtils.java	2 Oct 2003 23:48:54 -0000	1.32
***************
*** 40,44 ****
  
      public static String toString(Tag tag) {
!         String tagName = tag.getAttribute(Tag.TAGNAME);
          Hashtable attrs = tag.getAttributes();
  
--- 40,44 ----
  
      public static String toString(Tag tag) {
!         String tagName = tag.getRawTagName ();
          Hashtable attrs = tag.getAttributes();
  
***************
*** 50,54 ****
              String key = (String) e.nextElement();
              String value = (String) attrs.get(key);
!             if (!key.equalsIgnoreCase(Tag.TAGNAME) && value.length() > 0)
                  lString.append(key).append(" : ").append(value).append("\n");
          }
--- 50,54 ----
              String key = (String) e.nextElement();
              String value = (String) attrs.get(key);
!             if (!key.equalsIgnoreCase(SpecialHashtable.TAGNAME) && value.length() > 0)
                  lString.append(key).append(" : ").append(value).append("\n");
          }

[Htmlparser-cvs] htmlparser/src/org/htmlparser/parserHelper SpecialHashtable.java,1.6,NONE

From: <der...@us...> - 2003-10-02 23:49:05

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/parserHelper
In directory sc8-pr-cvs1:/tmp/cvs-serv28867/parserHelper

Removed Files:
	SpecialHashtable.java 
Log Message:
Moved SpecialHashtable to util.
Fixed some attribute bugs and some test cases.



--- SpecialHashtable.java DELETED ---

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/temporaryFailures AttributeParserTest.java,1.15,1.16

From: <der...@us...> - 2003-10-02 23:49:03

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/temporaryFailures
In directory sc8-pr-cvs1:/tmp/cvs-serv28867/tests/temporaryFailures

Modified Files:
	AttributeParserTest.java 
Log Message:
Moved SpecialHashtable to util.
Fixed some attribute bugs and some test cases.



Index: AttributeParserTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/temporaryFailures/AttributeParserTest.java,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** AttributeParserTest.java	28 Sep 2003 15:33:59 -0000	1.15
--- AttributeParserTest.java	2 Oct 2003 23:48:54 -0000	1.16
***************
*** 41,44 ****
--- 41,45 ----
  import org.htmlparser.tags.data.TagData;
  import org.htmlparser.tests.ParserTestCase;
+ import org.htmlparser.util.SpecialHashtable;
  
  public class AttributeParserTest extends ParserTestCase {
***************
*** 96,100 ****
      public void testValueMissing() {
          getParameterTableFor("INPUT type=\"checkbox\" name=\"Authorize\" value=\"Y\" checked");
!         assertEquals("Name of Tag","INPUT",table.get(Tag.TAGNAME));
          assertEquals("Type","checkbox",table.get("TYPE"));
          assertEquals("Name","Authorize",table.get("NAME"));
--- 97,101 ----
      public void testValueMissing() {
          getParameterTableFor("INPUT type=\"checkbox\" name=\"Authorize\" value=\"Y\" checked");
!         assertEquals("Name of Tag","INPUT",table.get(SpecialHashtable.TAGNAME));
          assertEquals("Type","checkbox",table.get("TYPE"));
          assertEquals("Name","Authorize",table.get("NAME"));
***************
*** 116,120 ****
          String value1 = (String)table.get(key1);
          assertEquals("Expected value 1", "Remarks",value1);
!         String key2 = Tag.TAGNAME;
          assertEquals("Expected Value 2","TEXTAREA",table.get(key2));
      }
--- 117,121 ----
          String value1 = (String)table.get(key1);
          assertEquals("Expected value 1", "Remarks",value1);
!         String key2 = SpecialHashtable.TAGNAME;
          assertEquals("Expected Value 2","TEXTAREA",table.get(key2));
      }
***************
*** 122,126 ****
      public void testNullTag(){
          getParameterTableFor("INPUT type=");
!         assertEquals("Name of Tag","INPUT",table.get(Tag.TAGNAME));
          assertEquals("Type","",table.get("TYPE"));
      }
--- 123,127 ----
      public void testNullTag(){
          getParameterTableFor("INPUT type=");
!         assertEquals("Name of Tag","INPUT",table.get(SpecialHashtable.TAGNAME));
          assertEquals("Type","",table.get("TYPE"));
      }
***************
*** 149,153 ****
              "tag name",
              "A",
!             (String)table.get(Tag.TAGNAME)
          );
      }
--- 150,154 ----
              "tag name",
              "A",
!             (String)table.get(SpecialHashtable.TAGNAME)
          );
      }
***************
*** 164,168 ****
      public void testEmptyTag () {
          getParameterTableFor("");
!         assertNotNull ("No Tag.TAGNAME",table.get(Tag.TAGNAME));
      }
  
--- 165,169 ----
      public void testEmptyTag () {
          getParameterTableFor("");
!         assertNotNull ("No Tag.TAGNAME",table.get(SpecialHashtable.TAGNAME));
      }
  
***************
*** 196,200 ****
          {
              getParameterTableFor("body onLoad=defaultStatus=''");
!             String name = (String)table.get(Tag.TAGNAME);
              assertNotNull ("No Tag.TAGNAME", name);
              assertStringEquals("tag name parsed incorrectly", "BODY", name);
--- 197,201 ----
          {
              getParameterTableFor("body onLoad=defaultStatus=''");
!             String name = (String)table.get(SpecialHashtable.TAGNAME);
              assertNotNull ("No Tag.TAGNAME", name);
              assertStringEquals("tag name parsed incorrectly", "BODY", name);

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer/nodes Attribute.java,1.8,1.9 TagNode.java,1.12,1.13

From: <der...@us...> - 2003-10-02 23:49:03

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1:/tmp/cvs-serv28867/lexer/nodes

Modified Files:
	Attribute.java TagNode.java 
Log Message:
Moved SpecialHashtable to util.
Fixed some attribute bugs and some test cases.



Index: Attribute.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes/Attribute.java,v
retrieving revision 1.8
retrieving revision 1.9
diff -C2 -d -r1.8 -r1.9
*** Attribute.java	22 Sep 2003 02:39:59 -0000	1.8
--- Attribute.java	2 Oct 2003 23:48:53 -0000	1.9
***************
*** 148,156 ****
      {
          if (null == mName)
!             if (0 <= mNameStart)
                  mName = mPage.getText (mNameStart, mNameEnd);
          return (mName);
      }
  
      /**
       * Get the value of the attribute.
--- 148,166 ----
      {
          if (null == mName)
!             if ((null != mPage) && (0 <= mNameStart))
                  mName = mPage.getText (mNameStart, mNameEnd);
          return (mName);
      }
  
+     /**
+      * Predicate to determine if this attribute is whitespace.
+      * @return <code>true</code> if this attribute is whitespace,
+      * <code>false</code> if it is a real attribute.
+      */
+     public boolean isWhitespace ()
+     {
+         return (null == getName ());
+     }
+         
      /**
       * Get the value of the attribute.

Index: TagNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes/TagNode.java,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** TagNode.java	28 Sep 2003 15:33:57 -0000	1.12
--- TagNode.java	2 Oct 2003 23:48:53 -0000	1.13
***************
*** 32,68 ****
  import java.util.Hashtable;
  import java.util.Vector;
- import org.htmlparser.lexer.Cursor;
  
  import org.htmlparser.lexer.Page;
- import org.htmlparser.parserHelper.SpecialHashtable;
  import org.htmlparser.util.NodeList;
  import org.htmlparser.util.ParserException;
  import org.htmlparser.util.Translate;
  
  /**
!  * TagNode represents a generic tag. This class allows users to register specific
!  * tag scanners, which can identify links, or image references. This tag asks the
!  * scanners to run over the text, and identify. It can be used to dynamically
!  * configure a parser.
!  * @author Kaarle Kaila 23.10.2001
   */
! public class TagNode extends AbstractNode
  {
!     public static final String TYPE = "TAG";
!     /**
!      * Constant used as value for the value of the tag name
!      * in parseParameters  (Kaarle Kaila 3.8.2001)
!      */
!     public final static String TAGNAME = "$<TAGNAME>$";
!     public final static String EMPTYTAG = "$<EMPTYTAG>$";
!     public final static String NULLVALUE = "$<NULL>$";
!     public final static String NOTHING = "$<NOTHING>$";
!     private final static String EMPTY_STRING="";
! 
!     private boolean emptyXmlTag = false;
  
      /**
       * The tag attributes.
!      * Objects of type Attribute.
       */
      protected Vector mAttributes;
--- 32,57 ----
  import java.util.Hashtable;
  import java.util.Vector;
  
+ import org.htmlparser.lexer.Cursor;
+ import org.htmlparser.lexer.Lexer;
  import org.htmlparser.lexer.Page;
  import org.htmlparser.util.NodeList;
  import org.htmlparser.util.ParserException;
+ import org.htmlparser.util.SpecialHashtable;
  import org.htmlparser.util.Translate;
  
  /**
!  * TagNode represents a generic tag.
!  * 
   */
! public class TagNode
!     extends
!         AbstractNode
  {
!     private boolean emptyXmlTag;
  
      /**
       * The tag attributes.
!      * Objects of type {@link Attribute}.
       */
      protected Vector mAttributes;
***************
*** 108,111 ****
--- 97,108 ----
  
      /**
+      * Create an empty tag.
+      */
+     public TagNode ()
+     {
+         this (null, -1, -1, new Vector ());
+     }
+ 
+     /**
       * Create a tag with the location and attributes provided
       * @param page The page this tag was read from.
***************
*** 119,131 ****
          super (page, start, end);
          mAttributes = attributes;
!     }
! 
!     /**
!      * Create an empty tag.
!      */
!     public TagNode ()
!     {
!         super (null, -1, -1);
!         mAttributes = new Vector ();
      }
  
--- 116,120 ----
          super (page, start, end);
          mAttributes = attributes;
!         emptyXmlTag = false;
      }
  
***************
*** 138,165 ****
      public String getAttribute (String name)
      {
-         Vector attributes;
-         int size;
          Attribute attribute;
-         String string;
          String ret;
  
          ret = null;
  
!         attributes = getAttributesEx ();
!         if (name.equalsIgnoreCase (TAGNAME))
!             ret = ((Attribute)attributes.elementAt (0)).getName ();
          else
          {
!             size = attributes.size ();
!             for (int i = 1; i < size; i++)
!             {
!                 attribute = (Attribute)attributes.elementAt (i);
!                 string = attribute.getName ();
!                 if ((null != string) && name.equalsIgnoreCase (string))
!                 {
!                     ret = attribute.getValue ();
!                     i = size; // exit fast
!                 }
!             }
          }
  
--- 127,142 ----
      public String getAttribute (String name)
      {
          Attribute attribute;
          String ret;
  
          ret = null;
  
!         if (name.equalsIgnoreCase (SpecialHashtable.TAGNAME))
!             ret = ((Attribute)getAttributesEx ().elementAt (0)).getName ();
          else
          {
!             attribute = getAttributeEx (name);
!             if (null != attribute)
!                 ret = attribute.getValue ();
          }
  
***************
*** 243,246 ****
--- 220,255 ----
  
      /**
+      * Returns the attribute with the given name.
+      * @param name Name of attribute, case insensitive.
+      * @return The attribute or null if it does
+      * not exist.
+      */
+     public Attribute getAttributeEx (String name)
+     {
+         Vector attributes;
+         int size;
+         Attribute attribute;
+         String string;
+         Attribute ret;
+ 
+         ret = null;
+ 
+         attributes = getAttributesEx ();
+         size = attributes.size ();
+         for (int i = 0; i < size; i++)
+         {
+             attribute = (Attribute)attributes.elementAt (i);
+             string = attribute.getName ();
+             if ((null != string) && name.equalsIgnoreCase (string))
+             {
+                 ret = attribute;
+                 i = size; // exit fast
+             }
+         }
+ 
+         return (ret);
+     }
+ 
+     /**
       * Set an attribute.
       * This replaces an attribute of the same name.
***************
*** 252,255 ****
--- 261,265 ----
          boolean replaced;
          Vector attributes;
+         int length;
          String name;
          Attribute test;
***************
*** 258,262 ****
          replaced = false;
          attributes = getAttributesEx ();
!         if (0 < attributes.size ())
          {
              name = attribute.getName ();
--- 268,273 ----
          replaced = false;
          attributes = getAttributesEx ();
!         length =  attributes.size ();
!         if (0 < length)
          {
              name = attribute.getName ();
***************
*** 274,278 ****
--- 285,294 ----
          }
          if (!replaced)
+         {
+             // add whitespace between attributes
+             if (!((Attribute)attributes.elementAt (length - 1)).isWhitespace ())
+                 attributes.addElement (new Attribute ((String)null, " ", (char)0));
              attributes.addElement (attribute);
+         }
      }
  
***************
*** 296,300 ****
      public Vector getAttributesEx ()
      {
!         return mAttributes;
      }
  
--- 312,316 ----
      public Vector getAttributesEx ()
      {
!         return (mAttributes);
      }
  
***************
*** 317,326 ****
              // special handling for the node name
              attribute = (Attribute)attributes.elementAt (0);
!             ret.put (TAGNAME, attribute.getName ().toUpperCase ());
              // the rest
              for (int i = 1; i < attributes.size (); i++)
              {
                  attribute = (Attribute)attributes.elementAt (i);
!                 if (null != attribute.getName ())
                  {
                      if (0 != attribute.getQuote ())
--- 333,342 ----
              // special handling for the node name
              attribute = (Attribute)attributes.elementAt (0);
!             ret.put (SpecialHashtable.TAGNAME, attribute.getName ().toUpperCase ());
              // the rest
              for (int i = 1; i < attributes.size (); i++)
              {
                  attribute = (Attribute)attributes.elementAt (i);
!                 if (!attribute.isWhitespace ())
                  {
                      if (0 != attribute.getQuote ())
***************
*** 330,337 ****
                          value = attribute.getValue ();
                          if ((null != value) && value.equals (""))
!                             value = NOTHING;
                      }
                      if (null == value)
!                         value = NULLVALUE;
                      ret.put (attribute.getName ().toUpperCase (), value);
                  }
--- 346,353 ----
                          value = attribute.getValue ();
                          if ((null != value) && value.equals (""))
!                             value = SpecialHashtable.NOTHING;
                      }
                      if (null == value)
!                         value = SpecialHashtable.NULLVALUE;
                      ret.put (attribute.getName ().toUpperCase (), value);
                  }
***************
*** 339,343 ****
          }
          else
!             ret.put (TAGNAME, "");
  
          return (ret);
--- 355,359 ----
          }
          else
!             ret.put (SpecialHashtable.TAGNAME, "");
  
          return (ret);
***************
*** 348,356 ****
       * <p>
       * <em>
!      * Note: This value is converted to uppercase.
!      * To get at the original case version of the tag name use:
!      * <pre>
!      * getAttribute (TagNode.TAGNAME);
!      * </pre>
       * </em>
       * @return The tag name.
--- 364,371 ----
       * <p>
       * <em>
!      * Note: This value is converted to uppercase and does not
!      * begin with "/" if it is an end tag.
!      * To get at the original text of the tag name use
!      * {@link #getRawTagName getRawTagName()}.
       * </em>
       * @return The tag name.
***************
*** 358,366 ****
      public String getTagName ()
      {
          String ret;
  
!         ret = getAttribute (TAGNAME).toUpperCase ();
!         if (ret.startsWith ("/")) // end tag
!             ret = ret.substring (1);
  
          return (ret);
--- 373,411 ----
      public String getTagName ()
      {
+         Vector attributes;
          String ret;
  
!         ret = null;
!         
!         attributes = getAttributesEx ();
!         if (0 != attributes.size ())
!         {
!             ret = getRawTagName ();
!             if (null != ret)
!             {
!                 ret = ret.toUpperCase ();
!                 if (ret.startsWith ("/"))
!                     ret = ret.substring (1);
!             }
!         }
! 
!         return (ret);
!     }
! 
!     /**
!      * Return the name of this tag.
!      * @return The tag name or null if this tag contains nothing or only
!      * whitespace.
!      */
!     public String getRawTagName ()
!     {
!         Vector attributes;
!         String ret;
! 
!         ret = null;
!         
!         attributes = getAttributesEx ();
!         if (0 != attributes.size ())
!             ret = ((Attribute)attributes.elementAt (0)).getName ();
  
          return (ret);
***************
*** 401,405 ****
      public String getText ()
      {
!         return (mPage.getText (elementBegin () + 1, elementEnd () - 1));
      }
  
--- 446,456 ----
      public String getText ()
      {
!         String ret;
!         
!         //ret = mPage.getText (elementBegin () + 1, elementEnd () - 1);
!         ret = toHtml ();
!         ret = ret.substring (1, ret.length () - 1);
!         
!         return (ret);
      }
  
***************
*** 433,438 ****
              else
                  quote = (char)0;
!             attribute = new Attribute (key, value, quote);
!             att.addElement (attribute);
          }
          this.mAttributes = att;
--- 484,500 ----
              else
                  quote = (char)0;
!             if (key.equals (SpecialHashtable.TAGNAME))
!             {
!                 attribute = new Attribute (value, null, quote);
!                 att.insertElementAt (attribute, 0);
!             }
!             else
!             {
!                 // add whitespace between attributes
!                 attribute = new Attribute ((String)null, " ", (char)0);
!                 att.addElement (attribute);
!                 attribute = new Attribute (key, value, quote);
!                 att.addElement (attribute);
!             }
          }
          this.mAttributes = att;
***************
*** 489,500 ****
      public void setText (String text)
      {
!         mPage = new Page (text);
!         nodeBegin = 0;
!         nodeEnd = text.length ();
      }
  
      public String toPlainTextString ()
      {
!         return (EMPTY_STRING);
      }
  
--- 551,580 ----
      public void setText (String text)
      {
!         Lexer lexer;
!         TagNode output;
!         
!         lexer = new Lexer (text);
!         try
!         {
!             output = (TagNode)lexer.nextNode ();
!             mPage = output.getPage ();
!             nodeBegin = output.elementBegin ();
!             nodeEnd = output.elementEnd ();
!             mAttributes = output.getAttributesEx ();
!         }
!         catch (ParserException pe)
!         {
!             throw new IllegalArgumentException (pe.getMessage ());
!         }
      }
  
+     /**
+      * Get the plain text from this node.
+      * @return An empty string (tag contents do not display in a browser).
+      * If you want this tag's HTML equivalent, use {@link #toHtml toHtml()}.
+      */
      public String toPlainTextString ()
      {
!         return ("");
      }
  
***************
*** 584,592 ****
      }
  
-     public String getType ()
-     {
-         return TYPE;
-     }
- 
      /**
       * Is this an empty xml tag of the form<br>
--- 664,667 ----
***************
*** 604,610 ****
      }
  
      public boolean isEndTag ()
      {
!         return ('/' == getAttribute (TAGNAME).toUpperCase ().charAt (0));
      }
  }
--- 679,693 ----
      }
  
+     /**
+      * Predicate to determine if this tag is an end tag (i.e. &lt;/HTML&gt;).
+      * @return <code>true</code> if this tag is an end tag.
+      */
      public boolean isEndTag ()
      {
!         String raw;
!         
!         raw = getRawTagName ();
! 
!         return ((null == raw) ? false : ('/' == raw.charAt (0)));
      }
  }
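
Note on the TagNode.java change above: the diff stores the tag name as the first entry of the attribute vector, represents whitespace as an Attribute with a null name, and routes lookups through getAttributeEx. The following is a minimal, self-contained sketch of that arrangement, not the library's code. AttrSketch and TagSketch are hypothetical stand-ins invented for illustration, and the guards against an empty list and an empty raw name are additions of the sketch that the diff itself does not show.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical stand-in for the Attribute class in the diff. */
final class AttrSketch {
    final String name;   // null marks a whitespace pseudo-attribute
    final String value;

    AttrSketch(String name, String value) {
        this.name = name;
        this.value = value;
    }

    /** Mirrors the isWhitespace() predicate: no name means whitespace. */
    boolean isWhitespace() {
        return name == null;
    }
}

/** Hypothetical tag: index 0 holds the tag name, later entries are attributes or whitespace. */
final class TagSketch {
    private final List<AttrSketch> attributes = new ArrayList<>();

    /** Case-insensitive lookup that skips whitespace entries; null if absent. */
    AttrSketch getAttributeEx(String name) {
        for (AttrSketch a : attributes) {
            if (!a.isWhitespace() && name.equalsIgnoreCase(a.name)) {
                return a;
            }
        }
        return null;
    }

    /** Replace an attribute of the same name, or append it after a whitespace separator. */
    void setAttribute(AttrSketch attribute) {
        // Index 0 is reserved for the tag name (an assumption of this sketch).
        for (int i = 1; i < attributes.size(); i++) {
            AttrSketch existing = attributes.get(i);
            if (!existing.isWhitespace() && existing.name.equalsIgnoreCase(attribute.name)) {
                attributes.set(i, attribute);
                return;
            }
        }
        // Guard the empty case before inspecting the last element.
        if (!attributes.isEmpty() && !attributes.get(attributes.size() - 1).isWhitespace()) {
            attributes.add(new AttrSketch(null, " "));
        }
        attributes.add(attribute);
    }

    /** Raw name as written (may start with "/"), or null for an empty tag. */
    String getRawTagName() {
        return attributes.isEmpty() ? null : attributes.get(0).name;
    }

    /** End-tag test that tolerates a null or empty raw name. */
    boolean isEndTag() {
        String raw = getRawTagName();
        return raw != null && !raw.isEmpty() && raw.charAt(0) == '/';
    }

    int size() {
        return attributes.size();
    }
}
```

In this sketch, adding SRC to a tag that holds only its name yields three entries (name, whitespace, attribute), and setting src again replaces the existing entry instead of appending a second one.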

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests ParserTestCase.java,1.29,1.30

From: <der...@us...> - 2003-10-02 23:49:03

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1:/tmp/cvs-serv28867/tests

Modified Files:
	ParserTestCase.java 
Log Message:
Moved SpecialHashtable to util.
Fixed some attribute bugs and some test cases.



Index: ParserTestCase.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTestCase.java,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** ParserTestCase.java	28 Sep 2003 15:33:58 -0000	1.29
--- ParserTestCase.java	2 Oct 2003 23:48:53 -0000	1.30
***************
*** 48,51 ****
--- 48,52 ----
  import org.htmlparser.util.ParserException;
  import org.htmlparser.util.ParserUtils;
+ import org.htmlparser.util.SpecialHashtable;
  
  public class ParserTestCase extends TestCase {
***************
*** 325,329 ****
              String actualValue =
                  actualTag.getAttribute(key);
!             if (key==Tag.TAGNAME) {
                  expectedValue = ParserUtils.removeChars(expectedValue,'/');
                  actualValue = ParserUtils.removeChars(actualValue,'/');
--- 326,330 ----
              String actualValue =
                  actualTag.getAttribute(key);
!             if (key==SpecialHashtable.TAGNAME) {
                  expectedValue = ParserUtils.removeChars(expectedValue,'/');
                  actualValue = ParserUtils.removeChars(actualValue,'/');
***************
*** 351,355 ****
              String actualValue =
                  actualTag.getAttribute(key);
!             if (key==Tag.TAGNAME) {
                  expectedValue = ParserUtils.removeChars(expectedValue,'/');
                  actualValue = ParserUtils.removeChars(actualValue,'/');
--- 352,356 ----
              String actualValue =
                  actualTag.getAttribute(key);
!             if (key==SpecialHashtable.TAGNAME) {
                  expectedValue = ParserUtils.removeChars(expectedValue,'/');
                  actualValue = ParserUtils.removeChars(actualValue,'/');
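
Note on the ParserTestCase.java change above: the test compares the key with == against SpecialHashtable.TAGNAME. In Java that is a reference comparison, so it holds only when the key is the very same String object as the constant, which may well be the intent if the keys always originate from that constant. The sketch below is a self-contained illustration of the difference. KeyCompareSketch is a hypothetical class; only the constant's text is taken from the TagNode diff.

```java
/** Hypothetical holder mirroring the TAGNAME constant seen in the diffs. */
final class KeyCompareSketch {
    static final String TAGNAME = "$<TAGNAME>$";

    /** Reference comparison: true only when both variables point at the same object. */
    static boolean sameReference(String key) {
        return key == TAGNAME;
    }

    /** Content comparison: true whenever the characters match. */
    static boolean sameContent(String key) {
        return TAGNAME.equals(key);
    }

    public static void main(String[] args) {
        String copy = new String(TAGNAME); // equal text, distinct object
        System.out.println(sameReference(TAGNAME)); // true
        System.out.println(sameReference(copy));    // false
        System.out.println(sameContent(copy));      // true
    }
}
```

If keys can ever be built at run time (read from a file or concatenated, for example), the equals form is the safer choice.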

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tags Tag.java,1.51,1.52

From: <der...@us...> - 2003-10-02 23:49:02

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags
In directory sc8-pr-cvs1:/tmp/cvs-serv28867/tags

Modified Files:
	Tag.java 
Log Message:
Moved SpecialHashtable to util.
Fixed some attribute bugs and some test cases.



Index: Tag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/Tag.java,v
retrieving revision 1.51
retrieving revision 1.52
diff -C2 -d -r1.51 -r1.52
*** Tag.java	30 Sep 2003 02:12:34 -0000	1.51
--- Tag.java	2 Oct 2003 23:48:53 -0000	1.52
***************
*** 38,46 ****
  import org.htmlparser.lexer.Page;
  import org.htmlparser.lexer.nodes.TagNode;
- import org.htmlparser.parserHelper.SpecialHashtable;
  import org.htmlparser.scanners.TagScanner;
  import org.htmlparser.tags.data.TagData;
  import org.htmlparser.util.NodeList;
  import org.htmlparser.util.ParserException;
  import org.htmlparser.visitors.NodeVisitor;
  
--- 38,46 ----
  import org.htmlparser.lexer.Page;
  import org.htmlparser.lexer.nodes.TagNode;
  import org.htmlparser.scanners.TagScanner;
  import org.htmlparser.tags.data.TagData;
  import org.htmlparser.util.NodeList;
  import org.htmlparser.util.ParserException;
+ import org.htmlparser.util.SpecialHashtable;
  import org.htmlparser.visitors.NodeVisitor;

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer Page.java,1.17,1.18

From: <der...@us...> - 2003-09-30 02:12:39

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1:/tmp/cvs-serv7647/lexer

Modified Files:
	Page.java 
Log Message:
Doco update. Privatize tag fields leading up to removal.



Index: Page.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -d -r1.17 -r1.18
*** Page.java	29 Sep 2003 00:00:39 -0000	1.17
--- Page.java	30 Sep 2003 02:12:34 -0000	1.18
***************
*** 98,102 ****
  
      /**
!      * Construct an empty page reading.
       */
      public Page ()
--- 98,102 ----
  
      /**
!      * Construct an empty page.
       */
      public Page ()
***************
*** 169,172 ****
--- 169,179 ----
      //
  
+     /**
+      * Serialize the page.
+      * There are two modes to serializing a page based on the connected state.
+      * If connected, the URL and the current offset are saved, while if
+      * disconnected, the underlying source is saved.
+      * @param out The object stream to store this object in.
+      */
      private void writeObject (ObjectOutputStream out)
          throws
***************
*** 204,207 ****
--- 211,219 ----
      }
  
+     /**
+      * Deserialize the page.
+      * @see #writeObject
+      * @param in The object stream to decode.
+      */
      private void readObject (ObjectInputStream in)
          throws
***************
*** 275,278 ****
--- 287,292 ----
      /**
       * Set the URLConnection to be used by this page.
+      * Starts reading from the given connection.
+      * This also resets the current url.
       * @param connection The connection to use.
       * It will be connected by this method.
***************
*** 337,341 ****
      /**
       * Get the URL for this page.
!      * @return The url for the connection, or <code>null</code> if there is none.
       */
      public String getUrl ()
--- 351,359 ----
      /**
       * Get the URL for this page.
!      * This is only available if the page has a connection
!      * (<code>getConnection()</code> returns non-null), or the document base has
!      * been set via a call to <code>setUrl()</code>.
!      * @return The url for the connection, or <code>null</code> if there is
!      * no conenction or the document base has not been set.
       */
      public String getUrl ()
***************
*** 618,624 ****
  
      /**
!      * Try and extract the character set from the HTTP header.
!      * @param connection The connection with the charset info.
!      * @return The character set name to use for this HTML page.
       */
      public void setEncoding (String character_set)
--- 636,643 ----
  
      /**
!      * Resets this page and begins reading from the source with the
!      * given character set.
!      * @param character_set The character set to use to convert bytes into
!      * characters.
       */
      public void setEncoding (String character_set)
***************
*** 632,637 ****
          {
              stream.reset ();
!             mIndex = new PageIndex (this);
!             mSource = new Source (stream, character_set);
          }
          catch (IOException ioe)
--- 651,659 ----
          {
              stream.reset ();
!             if (!getEncoding ().equals (character_set))
!             {
!                 mSource = new Source (stream, character_set);
!                 mIndex = new PageIndex (this);
!             }
          }
          catch (IOException ioe)
***************
*** 639,687 ****
              throw new ParserException (ioe.getMessage (), ioe);
          }
-         
- // code from Parser:
- 
- //     /* If there is no connection (getConnection() returns null) it simply sets
- //     * the character set name stored in the parser (Note: the lexer object
- //     * which must have been set in the constructor or by <code>setLexer()</code>,
- //     * may or may not be using this character set).
- ////     * Otherwise (getConnection() doesn't return null) it does this by reopening the
- ////     * input stream of the connection and creating a reader that uses this
- ////     * character set. In this case, this method sets two of the fields in the
- ////     * parser object; <code>character_set</code> and <code>reader</code>.
- ////     * It does not adjust <code>resourceLocn</code>, <code>url_conn</code>,
- ////     * <code>scanners</code> or <code>feedback</code>. The two fields are set
- ////     * atomicly by this method, either they are both set or none of them is set.
- ////     * Trying to set the encoding to null or an empty string is a noop.
- ////     * @exception ParserException If the opening of the reader
- //     */
- //        String chs;
- //        BufferedInputStream in;
- //
- //        if ((null != encoding) && !"".equals (encoding))
- //            if (null == getConnection ())
- //                character_set = encoding;
- //            else
- //            {
- //                chs = getEncoding ();
- //                in = input;
- //                try
- //                {
- //                    character_set = encoding;
- //                    if (null != getLexer ())
- //                        getLexer ().getPage ().setCharset (encoding);
- //                }
- //                catch (IOException ioe)
- //                {
- //                    String msg = "setEncoding() : Error in opening a connection to " + getConnection ().getURL ().toExternalForm ();
- //                    ParserException ex = new ParserException (msg, ioe);
- //                    feedback.error (msg, ex);
- //                    character_set = chs;
- //                    input = in;
- //                    throw ex;
- //                }
- //            }
- //    }
- //
      }
  
--- 661,664 ----
***************
*** 734,737 ****
--- 711,716 ----
       * @return The text from <code>start</code> to <code>end</code>.
       * @see #getText(StringBuffer, int, int)
+      * @exception IllegalArgumentException If an attempt is made to get
+      * characters ahead of the current source offset (character position).
       */
      public String getText (int start, int end)
***************
*** 752,755 ****
--- 731,736 ----
       * (exclusive, i.e. the character at the ending position is not included),
       * zero based.
+      * @exception IllegalArgumentException If an attempt is made to get
+      * characters ahead of the current source offset (character position).
       */
      public void getText (StringBuffer buffer, int start, int end)

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tags Tag.java,1.50,1.51

From: <der...@us...> - 2003-09-30 02:12:38

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags
In directory sc8-pr-cvs1:/tmp/cvs-serv7647/tags

Modified Files:
	Tag.java 
Log Message:
Doco update. Privatize tag fields leading up to removal.



Index: Tag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/Tag.java,v
retrieving revision 1.50
retrieving revision 1.51
diff -C2 -d -r1.50 -r1.51
*** Tag.java	28 Sep 2003 19:30:04 -0000	1.50
--- Tag.java	30 Sep 2003 02:12:34 -0000	1.51
***************
*** 53,58 ****
  public class Tag extends TagNode
  {
!     TagScanner mScanner;
!     TagData mData;
  
      public Tag (TagNode node, TagScanner scanner)
--- 53,58 ----
  public class Tag extends TagNode
  {
!     private TagScanner mScanner;
!     private TagData mData;
  
      public Tag (TagNode node, TagScanner scanner)

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer Cursor.java,1.9,1.10 Lexer.java,1.10,1.11 Page.java,1.16,1.17 PageIndex.java,1.9,1.10 Source.java,1.10,1.11

From: <der...@us...> - 2003-09-29 22:02:39

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1:/tmp/cvs-serv32344/lexer

Modified Files:
	Cursor.java Lexer.java Page.java PageIndex.java Source.java 
Log Message:
Fix broken serializability.



Index: Cursor.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Cursor.java,v
retrieving revision 1.9
retrieving revision 1.10
diff -C2 -d -r1.9 -r1.10
*** Cursor.java	28 Sep 2003 15:33:57 -0000	1.9
--- Cursor.java	29 Sep 2003 00:00:38 -0000	1.10
***************
*** 33,36 ****
--- 33,37 ----
  package org.htmlparser.lexer;
  
+ import java.io.Serializable;
  import org.htmlparser.util.sort.Ordered;
  
***************
*** 39,43 ****
   * This class remembers the page it came from and its position within the page.
   */
! public class Cursor implements Ordered, Cloneable
  {
      /**
--- 40,48 ----
   * This class remembers the page it came from and its position within the page.
   */
! public class Cursor
!     implements
!         Serializable,
!         Ordered,
!         Cloneable
  {
      /**

Index: Lexer.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Lexer.java,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** Lexer.java	28 Sep 2003 15:33:57 -0000	1.10
--- Lexer.java	29 Sep 2003 00:00:38 -0000	1.11
***************
*** 34,37 ****
--- 34,38 ----
  
  import java.io.IOException;
+ import java.io.Serializable;
  import java.net.MalformedURLException;
  import java.net.URL;
***************
*** 59,62 ****
--- 60,64 ----
  public class Lexer
      implements
+         Serializable,
          NodeFactory
  {
***************
*** 75,78 ****
--- 77,90 ----
       */
      protected NodeFactory mFactory;
+ 
+     /**
+      * Creates a new instance of a Lexer.
+      */
+     public Lexer ()
+     {
+         setPage (new Page (""));
+         setCursor (new Cursor (getPage (), 0));
+         setNodeFactory (this);
+     }
  
      /**

Index: Page.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
retrieving revision 1.16
retrieving revision 1.17
diff -C2 -d -r1.16 -r1.17
*** Page.java	28 Sep 2003 15:33:57 -0000	1.16
--- Page.java	29 Sep 2003 00:00:39 -0000	1.17
***************
*** 35,38 ****
--- 35,39 ----
  import java.io.*;
  import java.io.IOException;
+ import java.io.Serializable;
  import java.lang.reflect.*;
  import java.net.*;
***************
*** 46,49 ****
--- 47,52 ----
   */
  public class Page
+     implements
+         Serializable
  {
      /**
***************
*** 75,79 ****
       * The connection this page is coming from or <code>null</code>.
       */
!     protected URLConnection mConnection;
  
      /**
--- 78,82 ----
       * The connection this page is coming from or <code>null</code>.
       */
!     protected transient URLConnection mConnection;
  
      /**
***************
*** 95,98 ****
--- 98,109 ----
  
      /**
+      * Construct an empty page reading.
+      */
+     public Page ()
+     {
+         this ("");
+     }
+ 
+     /**
       * Construct a page reading from a URL connection.
       * @param connection A fully conditioned connection. The connect()
***************
*** 154,157 ****
--- 165,257 ----
      }
  
+     //
+     // Serialization support
+     //
+ 
+     private void writeObject (ObjectOutputStream out)
+         throws
+             IOException
+     {
+         String href;
+         Source source;
+         PageIndex index;
+ 
+         // two cases, reading from a URL and not
+         if (null != getConnection ())
+         {
+             out.writeBoolean (true);
+             out.writeInt (mSource.offset ()); // need to preread this much
+             href = getUrl ();
+             out.writeObject (href);
+             setUrl (getConnection ().getURL ().toExternalForm ());
+             source = getSource ();
+             mSource = null; // don't serialize the source if we can avoid it
+             index = mIndex;
+             mIndex = null; // will get recreated; valid for the new page anyway?
+             out.defaultWriteObject ();
+             mSource = source;
+             mIndex = index;
+         }
+         else
+         {
+             out.writeBoolean (false);
+             href = getUrl ();
+             out.writeObject (href);
+             setUrl (null); // don't try and read a bogus URL
+             out.defaultWriteObject ();
+             setUrl (href);
+         }
+     }
+ 
+     private void readObject (ObjectInputStream in)
+         throws
+             IOException,
+             ClassNotFoundException
+     {
+         boolean fromurl;
+         int offset;
+         String href;
+         URL url;
+         Cursor cursor;
+ 
+         fromurl = in.readBoolean ();
+         if (fromurl)
+         {
+             offset = in.readInt ();
+             href = (String)in.readObject ();
+             in.defaultReadObject ();
+             // open the URL
+             if (null != getUrl ())
+             {
+                 url = new URL (getUrl ());
+                 try
+                 {
+                     setConnection (url.openConnection ());
+                 }
+                 catch (ParserException pe)
+                 {
+                     throw new IOException (pe.getMessage ());
+                 }
+             }
+             cursor = new Cursor (this, 0);
+             for (int i = 0; i < offset; i++)
+                 try
+                 {
+                     getCharacter (cursor);
+                 }
+                 catch (ParserException pe)
+                 {
+                     throw new IOException (pe.getMessage ());
+                 }
+             setUrl (href);
+         }
+         else
+         {
+             href = (String)in.readObject ();
+             in.defaultReadObject ();
+             setUrl (href);
+         }
+     }
+ 
      /**
       * Reset the page by resetting the source of characters.
***************
*** 189,193 ****
          
  
-         mUrl = null;
          mConnection = connection;
          try
--- 289,292 ----
***************
*** 232,235 ****
--- 331,335 ----
              throw new ParserException (ioe.getMessage (), ioe);
          }
+         mUrl = connection.getURL ().toExternalForm ();
          mIndex = new PageIndex (this);
      }
***************
*** 241,252 ****
      public String getUrl ()
      {
-         URLConnection connection;
-         if (null == mUrl)
-         {
-             connection = getConnection ();
-             if (null != connection)
-                 mUrl = connection.getURL ().toExternalForm ();
-         }
-         
          return (mUrl);
      }
--- 341,344 ----
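
The writeObject method in the diff above saves a field, sets it to null, calls defaultWriteObject and then restores the field, so that state which can be fetched again is left out of the stream. A minimal sketch of that idea follows. CachedText and its fields are hypothetical names used only for illustration, not classes from htmlparser. The sketch calls defaultWriteObject first and writes no extra flag, which is a simplification of what the commit does.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class: text that may or may not have a location to reload from.
class CachedText implements Serializable
{
    private static final long serialVersionUID = 1L;

    private final String mLocation; // null when the text was supplied directly
    private String mText;

    CachedText (String location, String text)
    {
        mLocation = location;
        mText = text;
    }

    String getLocation ()
    {
        return (mLocation);
    }

    String getText ()
    {
        return (mText);
    }

    private void writeObject (ObjectOutputStream out)
        throws
            IOException
    {
        String text;

        text = mText;
        if (null != mLocation)
            mText = null; // can be fetched again, so don't serialize it
        try
        {
            out.defaultWriteObject ();
        }
        finally
        {
            mText = text; // leave the live object as it was
        }
    }
}
```

A copy read back from the stream has a null text whenever a location was present, so the reader has to reload it, much as the readObject in the diff reopens the connection.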

Index: PageIndex.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/PageIndex.java,v
retrieving revision 1.9
retrieving revision 1.10
diff -C2 -d -r1.9 -r1.10
*** PageIndex.java	22 Sep 2003 02:39:59 -0000	1.9
--- PageIndex.java	29 Sep 2003 00:00:39 -0000	1.10
***************
*** 33,36 ****
--- 33,37 ----
  package org.htmlparser.lexer;
  
+ import java.io.Serializable;
  import org.htmlparser.util.sort.Ordered;
  import org.htmlparser.util.sort.Sort;
***************
*** 46,50 ****
   * does not incur the overhead of an <code>Integer</code> object per element.
   */
! public class PageIndex implements Sortable
  {
      /**
--- 47,54 ----
   * does not incur the overhead of an <code>Integer</code> object per element.
   */
! public class PageIndex
!     implements
!         Serializable,
!         Sortable
  {
      /**

Index: Source.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Source.java,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** Source.java	28 Sep 2003 15:33:57 -0000	1.10
--- Source.java	29 Sep 2003 00:00:39 -0000	1.11
***************
*** 29,36 ****
--- 29,40 ----
  package org.htmlparser.lexer;
  
+ import java.io.ByteArrayInputStream;
  import java.io.IOException;
  import java.io.InputStream;
  import java.io.InputStreamReader;
+ import java.io.ObjectInputStream;
+ import java.io.ObjectOutputStream;
  import java.io.Reader;
+ import java.io.Serializable;
  import java.io.UnsupportedEncodingException;
  
***************
*** 46,50 ****
   *
   */
! public class Source extends Reader
  {
      /**
--- 50,58 ----
   *
   */
! public class Source
!     extends
!         Reader
!     implements
!         Serializable
  {
      /**
***************
*** 61,65 ****
       * The stream of bytes.
       */
!     protected InputStream mStream;
  
      /**
--- 69,73 ----
       * The stream of bytes.
       */
!     protected transient InputStream mStream;
  
      /**
***************
*** 71,75 ****
       * The converter from bytes to characters.
       */
!     protected InputStreamReader mReader;
  
      /**
--- 79,83 ----
       * The converter from bytes to characters.
       */
!     protected transient InputStreamReader mReader;
  
      /**
***************
*** 143,146 ****
--- 151,189 ----
      }
  
+     //
+     // Serialization support
+     //
+ 
+     private void writeObject (ObjectOutputStream out)
+         throws
+             IOException
+     {
+         int offset;
+         char[] buffer;
+ 
+         if (null != mStream)
+         {
+             // remember the offset, drain the input stream, restore the offset
+             offset = mOffset;
+             buffer = new char[4096];
+             while (-1 != read (buffer))
+                 ;
+             mOffset = offset;
+         }
+         
+         out.defaultWriteObject ();
+     }
+ 
+     private void readObject (ObjectInputStream in)
+         throws
+             IOException,
+             ClassNotFoundException
+     {
+         in.defaultReadObject ();
+         if (null != mBuffer) // buffer is null when destroy's been called
+             // pretend we're open, mStream goes null when exhausted
+             mStream = new ByteArrayInputStream (new byte[0]);
+     }
+ 
      /**
       * Get the input stream being used.
***************
*** 421,424 ****
--- 464,476 ----
          mOffset = 0;
          mMark = -1;
+     }
+ 
+     /**
+      * Get the position (in characters).
+      * @return The number of characters that have been read.
+      */
+     public int offset ()
+     {
+         return (mOffset);
      }
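
The Source changes above mark the stream fields transient, drain the remaining input into the buffer before writing, and substitute an empty stream when reading back. The sketch below shows the same pattern on a small stand-in class. DrainingSource is a hypothetical name, it buffers bytes rather than characters, and it is not the htmlparser implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical stand-in for Source: keeps every byte it has read in a buffer.
class DrainingSource implements Serializable
{
    private static final long serialVersionUID = 1L;

    private transient InputStream mStream; // streams are not serializable
    private byte[] mBuffer = new byte[0];  // everything read so far
    private int mOffset;                   // read position within mBuffer

    DrainingSource (InputStream stream)
    {
        mStream = stream;
    }

    // Return the next byte, or -1 at end of input.
    int read ()
        throws
            IOException
    {
        int b;

        if (mOffset == mBuffer.length)
        {   // buffer exhausted, pull one more byte from the stream
            b = (null == mStream) ? -1 : mStream.read ();
            if (-1 == b)
            {
                mStream = null; // stream goes null when exhausted
                return (-1);
            }
            mBuffer = Arrays.copyOf (mBuffer, mBuffer.length + 1);
            mBuffer[mOffset] = (byte)b;
        }
        return (mBuffer[mOffset++] & 0xff);
    }

    int offset ()
    {
        return (mOffset);
    }

    private void writeObject (ObjectOutputStream out)
        throws
            IOException
    {
        int offset;

        if (null != mStream)
        {
            // remember the offset, drain the input stream, restore the offset
            offset = mOffset;
            while (-1 != read ())
                ;
            mOffset = offset;
        }
        out.defaultWriteObject ();
    }

    private void readObject (ObjectInputStream in)
        throws
            IOException,
            ClassNotFoundException
    {
        in.defaultReadObject ();
        // all the data is in the buffer; an empty stream stands in for the original
        mStream = new ByteArrayInputStream (new byte[0]);
    }
}
```

Because the whole input ends up in the serialized buffer, a deserialized copy continues reading from the saved offset without needing the original stream.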

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/utilTests BeanTest.java,1.42,1.43

From: <der...@us...> - 2003-09-29 21:58:10

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests
In directory sc8-pr-cvs1:/tmp/cvs-serv32344/tests/utilTests

Modified Files:
	BeanTest.java 
Log Message:
Fix broken serializability.



Index: BeanTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests/BeanTest.java,v
retrieving revision 1.42
retrieving revision 1.43
diff -C2 -d -r1.42 -r1.43
*** BeanTest.java	22 Sep 2003 02:40:13 -0000	1.42
--- BeanTest.java	29 Sep 2003 00:00:39 -0000	1.43
***************
*** 46,49 ****
--- 46,51 ----
  import org.htmlparser.beans.LinkBean;
  import org.htmlparser.beans.StringBean;
+ import org.htmlparser.lexer.Lexer;
+ import org.htmlparser.lexer.Page;
  import org.htmlparser.tests.*;
  import org.htmlparser.util.NodeIterator;
***************
*** 108,112 ****
          {
              out = new PrintWriter (new FileWriter (file));
!             out.println (html);
              out.close ();
              bean.setURL (file.getAbsolutePath ());
--- 110,114 ----
          {
              out = new PrintWriter (new FileWriter (file));
!             out.print (html);
              out.close ();
              bean.setURL (file.getAbsolutePath ());
***************
*** 125,129 ****
      }
  
!     public void testZeroArgConstructor ()
          throws
              IOException,
--- 127,159 ----
      }
  
!     public void testZeroArgPageConstructor ()
!         throws
!             IOException,
!             ClassNotFoundException,
!             ParserException
!     {
!         Page page;
!         byte[] data;
! 
!         page = new Page ();
!         data = pickle (page);
!         page = (Page)unpickle (data);
!     }
! 
!     public void testZeroArgLexerConstructor ()
!         throws
!             IOException,
!             ClassNotFoundException,
!             ParserException
!     {
!         Lexer lexer;
!         byte[] data;
! 
!         lexer = new Lexer ();
!         data = pickle (lexer);
!         lexer = (Lexer)unpickle (data);
!     }
! 
!     public void testZeroArgParserConstructor ()
          throws
              IOException,

1 message has been excluded from this view by a project administrator.
