Thread: [Htmlparser-developer] HTMLTag patch
From: <tez...@ya...> - 2003-05-20 15:03:46
Hi, this is further to my bug report via the SF site.

Basically, setParsed() wasn't affecting the actual output of the Node thereafter. This made it a real pain to highlight HTML, the example here being making tables have a border of 1 to show them.

Patch attached. It has some debugging commented out; you'll want to get rid of this. I put a patch for the testing code on the SourceForge bug report.

Cheers,

Terry.

--------------------

*** HTMLTag.java 2003/05/20 12:33:42 1.1
--- HTMLTag.java 2003/05/20 14:52:42
***************
*** 273,283 ****
  }
  /**
   * Sets the parsed.
!  * @param parsed The parsed to set
   */
  public void setParsed(Hashtable parsed) {
      this.parsed = parsed;
  }
  /**
   * Sets the strictTags.
   * @param strictTags The strictTags to set
--- 273,306 ----
  }
  /**
   * Sets the parsed.
!  * Note: There is no guarantee that the attributes will be:
!  * in the same order or case as originally.
!  * This isn't expected to be a problem, but then again
!  * it never is, is it?
!  * Also: This currently makes no effort to place the attribute
!  * in quotes if necessary. You have to take care of that
!  * yourself
!  * @param parsed The hash of (key,value) attribute pairs to set
   */
  public void setParsed(Hashtable parsed) {
      this.parsed = parsed;
+
+     setText((String) parsed.get(this.TAGNAME)); //Set the tag first
+     for(Enumeration e = parsed.keys(); e.hasMoreElements();) {
+         String temp = (String) e.nextElement();
+         if (!temp.equals(this.TAGNAME)) { //Don't add the tagname again
+             append(" " + temp + '=' + ((String) parsed.get(temp)));
+
+             //Debug
+             //System.out.println("setParsed appending key: " + temp + " to value: " + ((String) parsed.get(temp)));
+         }
+     }
+
+     //Debug
+     //System.out.println("setParsed: completed, now text is:" + getText());
+
  }
+
  /**
   * Sets the strictTags.
   * @param strictTags The strictTags to set

=====
------------------------------------------------------------
Terry Alexis Lurie          | 'Something witty that doesn't
Freelance Computer Engineer | look good with variable
United Kingdom              | width fonts' - Most nerds

__________________________________________________
It's Samaritans' Week. Help Samaritans help others.
Call 08709 000032 to give or donate online now at http://www.samaritans.org/support/donations.shtm
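The patch above rebuilds the tag text from the attribute table, and its own comment notes two gaps: attribute order is not preserved and values are not quoted. A minimal sketch of how both might be handled is below. Everything in it is hypothetical and not part of HTMLParser: the class name, the sentinel value used for TAGNAME, and the rule for deciding when a value needs quotes are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HTMLParser: rebuilds the text of a tag
// from a table of attributes, in the spirit of the setParsed() patch.
class TagTextSketch {
    // Assumed sentinel key for the tag name; the patch refers to this.TAGNAME.
    static final String TAGNAME = "TAGNAME_KEY";

    // Builds "NAME attr=value ..." from the table. The caller is assumed to
    // have stored the tag name under TAGNAME.
    static String rebuild(Map<String, String> parsed) {
        StringBuilder text = new StringBuilder(parsed.get(TAGNAME)); // tag name first
        for (Map.Entry<String, String> entry : parsed.entrySet()) {
            if (entry.getKey().equals(TAGNAME)) {
                continue; // don't add the tag name again
            }
            text.append(' ')
                .append(entry.getKey())
                .append('=')
                .append(quoteIfNeeded(entry.getValue()));
        }
        return text.toString();
    }

    // Leaves simple tokens bare and wraps everything else in double quotes.
    // The "simple token" pattern is an assumption, not a rule from the library.
    static String quoteIfNeeded(String value) {
        if (value.matches("[A-Za-z0-9._-]+")) {
            return value;
        }
        return '"' + value.replace("\"", "&quot;") + '"';
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, unlike Hashtable.
        Map<String, String> parsed = new LinkedHashMap<>();
        parsed.put(TAGNAME, "TABLE");
        parsed.put("BORDER", "1");
        parsed.put("SUMMARY", "focus table");
        System.out.println(rebuild(parsed));
    }
}
```

Using an insertion-ordered map is one way to keep attributes in their original order; whether that matters depends on how the output is compared or displayed.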
From: <tez...@ya...> - 2003-05-20 16:14:38
Hmm, well that breaks everything under the sun...

I have re-corrected it on my side by changing this addition into a new method, resetParsed(). So it is more of a helper function than a major change.

Obviously I've blundered in here half-cocked. Should I submit further stuff off the CVS or the 1.2 code base? I'm a bit loath to use the CVS in production, so any patches I do I'm inclined to do off 1.2.

Thoughts?

If you want the diff that implements resetParsed() and the appropriate test, just email me.

Cheers,

Terry.
From: Somik R. <so...@ya...> - 2003-05-21 03:03:57
Hi Terry,

Just curious - why do you need to call setParsed()? Are you trying to take all tables and ensure that they have a border of "1"?

Regards,
Somik
From: <tez...@ya...> - 2003-05-21 09:10:46
Yes, I'd like to be able to programmatically set certain attributes. It's for a highlighted step-by-step through a web-rip, so the focus table is border=1 or whatever [very uncommon these days], the rest is as is.

I've been doing this in Perl's HTML::Parse for a while, but am now shifting to Java because of work.

Terry.
From: Derrick O. <Der...@ro...> - 2003-05-21 12:07:24
Terry,

You should really switch to the 1.3 codebase; version 1.2 is very long in the tooth and a final release of 1.3 is imminent. These problems you are encountering don't seem to be present any more, and you would have a more sympathetic ear.

Derrick
From: <tez...@ya...> - 2003-05-21 12:15:28
Right. That was definitely the answer I was looking for. Hopefully I'll be able to use my talents for good rather than evil.

I'm just averse to using bleeding edge in production, but now that I'm sort of familiar with the scope of the project, I think it is worth the small risk.

Terry.
From: Somik R. <so...@ya...> - 2003-05-21 23:44:19
You should not be using setParsed. Instead, all you have to do is use setAttribute on TableTag, like so:

tableTag.setAttribute("BORDER",1);

Then, make a call to tableTag.toHtml(), and it should show up.

Regards,
Somik
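The idea in the advice above is that a tag renders its HTML from its attribute table, so a change made through setAttribute shows up the next time toHtml() is called. The sketch below illustrates that idea with a small stand-in class. It is not the library's TableTag: the class name SketchTag, the choice to take attribute values as strings, and the upper-casing of attribute names are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Stand-in for illustration only; this is not the library's TableTag.
class SketchTag {
    private final String name;
    private final Map<String, String> attributes = new LinkedHashMap<>();

    SketchTag(String name) {
        this.name = name;
    }

    // Attribute names are stored upper-cased (an assumption of this sketch),
    // so "border" and "BORDER" refer to the same entry.
    void setAttribute(String key, String value) {
        attributes.put(key.toUpperCase(Locale.ROOT), value);
    }

    // Renders from the attribute table, so every setAttribute call is
    // reflected in the output without any separate "reset" step.
    String toHtml() {
        StringBuilder html = new StringBuilder("<").append(name);
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            html.append(' ').append(a.getKey())
                .append("=\"").append(a.getValue()).append('"');
        }
        return html.append('>').toString();
    }

    public static void main(String[] args) {
        SketchTag table = new SketchTag("TABLE");
        table.setAttribute("width", "100%");
        table.setAttribute("BORDER", "1"); // highlight the focus table
        System.out.println(table.toHtml());
    }
}
```

Because the output is always derived from the table, there is no cached text that can fall out of step with the attributes, which is the problem the original setParsed() patch was working around.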
From: <tez...@ya...> - 2003-05-21 10:32:08
A patch for HTMLTagTest.java.

When you call registerScanners, the attributes aren't printed properly. Here in this test case you get

<A EN="" =="" HREF="http://www.google.com/webhp?hl"></A>

from

<a href=http://www.google.com/webhp?hl=en>

See how you get the bogus attributes EN="" and =="" ? This doesn't occur if you don't call registerScanners();

Terry

-------

public void testHTMLOutputOfDifficultLinksWithRegisterScanners() throws HTMLParserException {
    createParser("<a href=http://www.google.com/webhp?hl=en>"); //Straight out of a real world example
    // assertTrue("Node should be a HTMLLinkTag",node[0] instanceof HTMLLinkTag);
    parser.registerScanners(); // Register standard scanners (Very Important)
    String stringTemp="";
    for (HTMLEnumeration e = parser.elements(); e.hasMoreNodes();) {
        HTMLNode newNode = e.nextHTMLNode(); // Get the next HTML Node
        stringTemp = newNode.toHTML();
        System.out.println(stringTemp);
    }
    assertEquals("Parsed text should be","<a href=http://www.google.com/webhp?hl=en>",stringTemp);
}
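The bogus EN="" and =="" attributes look like what you would get if an unquoted value were split at every '=' rather than only at the first one. I don't know how the scanners actually tokenise the tag, so the following is only a sketch of that failure mode and one way around it; the class and method names are made up for illustration.

```java
// Hypothetical helper, not part of HTMLParser: splits a single
// "name=value" token taken from inside a tag.
class AttributeSplitSketch {
    // Splits on the first '=' only, so any later '=' characters
    // (as in a URL query string) stay inside the value.
    static String[] split(String token) {
        int eq = token.indexOf('=');
        if (eq < 0) {
            return new String[] { token, "" }; // attribute with no value
        }
        return new String[] { token.substring(0, eq), token.substring(eq + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("href=http://www.google.com/webhp?hl=en");
        System.out.println(parts[0]); // attribute name
        System.out.println(parts[1]); // attribute value, including "hl=en"
    }
}
```

With this approach the whole URL, including the trailing hl=en, ends up as the value of href instead of being broken into extra attributes.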