htmlparser-developer Mailing List for HTML Parser (Page 8)
Brought to you by:
derrickoswald
From: Derrick O. <der...@au...> - 2003-10-01 14:06:07
|
Are there any opinions regarding Peter Lin's proposal to make htmlparser an official Jakarta project?

-----Original Message-----
From: peter lin [mailto:jmw...@ya...]
Sent: September 30, 2003 11:39 PM
To: Derrick Oswald
Subject: RE: question about using HTMLParser in Apache JMeter

I haven't found out the exact policy. Assuming the policy as I described is the official policy, is that OK with the developers of HTMLParser? I would like to help make HTMLParser an official Jakarta project. Does that sound appealing to you? I don't know the process for making it an official Jakarta project, but I can look into it and get the details to you.

Thanks again for your kindness and assistance. I know you and the other developers have put a lot of blood and sweat into the code. Plus, having it as an official Jakarta project would give it a ton of exposure, since Jakarta now accounts for a huge percentage of Apache's traffic. Also, I believe the ScrapeTags in the taglib project could benefit from HTMLParser. If I remember correctly, it uses Tidy also and suffers from the same performance limitations.

peter lin

Derrick Oswald <der...@au...> wrote:

So, you're taking a snapshot? I would have thought you would just include the jar file, and build it into the JMeter project, i.e. use Ant's zipfileset. If not, what's the procedure for updates?

-----Original Message-----
From: peter lin [mailto:jmw...@ya...]
Sent: September 30, 2003 9:48 AM
To: Derrick Oswald
Subject: RE: question about using HTMLParser in Apache JMeter

Hi derrick,

I talked to the maintainer of JMeter and got the information on the process. From my understanding of Apache guidelines as explained by Mike Stover, it goes something like this.

1. Add Apache license to the source files
2. make sure all license and copyright information required by HTMLParser developers are present
3. big huge thanks to HTMLParser developers posted on JMeter
4. I will do code clean up so it conforms to JMeter code guidelines
5. check in code to JMeter cvs
6. change relevant code in JMeter to use HTMLParser

Basically, Apache requires that donations give the foundation a non-exclusive license to the software. If that is ok with all the developers, I will continue with the process. I've started running some benchmarks. When I am done I will send you the full results with source, so you can post it on the HTMLParser site. Thanks again for your generosity.

peter lin |
From: Derrick O. <der...@au...> - 2003-10-01 13:58:41
|
Peter Lin, who works on the JMeter project, has performed some benchmarks that indicate htmlparser is 40% to 600% faster than JTidy. See the wiki item http://htmlparser.sourceforge.net/docs/index.php/Benchmarks. |
From: Derrick O. <Der...@Ro...> - 2003-10-01 11:57:39
|
Working through some of the unit test failures from the lexer integration I came upon this one:

    /**
     * Bug reported by Gordon Deudney 2002-03-15
     * Nested JSP Tags were not working
     */
    public void testNestedTags() throws ParserException {
        String s = "input type=\"text\" value=\"<%=\"test\"%>\" name=\"text\"";
        String line = "<"+s+">";
        createParser(line);
        parseAndAssertNodeCount(1);
        assertTrue("The node found should have been an Tag",node[0] instanceof Tag);
        Tag tag = (Tag) node[0];
        assertEquals("Tag Contents",s,tag.getText());
    }

This implies that handling jsp would need a different mode in the lexer, presumably because the JSP processing happens prior to HTML processing. It might also mean that attribute values, string and remark nodes could have children, if we wish to actually parse jsp tags rather than just pass them through.

Looking at two tags:

    <tagname attribute="<%= some text">
    <tagname attribute="<%= "hello world" %>">

The first is legitimate HTML syntax, but illegal JSP. The second is legitimate JSP syntax but illegal HTML.

Either the test is wrong and it should use single quotes:

    String s = "input type=\"text\" value=\"<%='test'%>\" name=\"text\"";

or there will need to be a toggle on the lexer to switch to 'handling jsp' mode.

Any thoughts? |
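The 'handling jsp' toggle discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the htmlparser Lexer: the class JspAwareTagScanner and the method findTagEnd are hypothetical names. With the toggle on, everything between <% and %> is skipped as opaque text, so quotes and '>' characters inside a JSP expression do not affect where the tag ends.

```java
/** Hypothetical sketch of a lexer toggle for JSP; not part of htmlparser. */
final class JspAwareTagScanner {
    private JspAwareTagScanner() {}

    /**
     * Returns the index of the '>' that closes the tag starting at index 0,
     * or -1 if no closing '>' is found.
     *
     * @param html    text beginning with the '<' of a tag
     * @param jspMode when true, <% ... %> blocks are skipped without interpretation
     */
    static int findTagEnd(String html, boolean jspMode) {
        char quote = 0; // 0 means "not inside a quoted attribute value"
        int i = 0;
        while (i < html.length()) {
            if (jspMode && html.startsWith("<%", i)) {
                // Skip the whole JSP block, whatever quotes it contains.
                int close = html.indexOf("%>", i + 2);
                if (close < 0) {
                    return -1; // unterminated JSP block
                }
                i = close + 2;
                continue;
            }
            char c = html.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // closing quote of the attribute value
                }
            } else if (c == '"' || c == '\'') {
                quote = c; // opening quote of an attribute value
            } else if (c == '>') {
                return i; // unquoted '>' ends the tag
            }
            i++;
        }
        return -1;
    }
}
```

For a tag such as <a title="<%= "x > y" %>">, the scan with the toggle off stops at the '>' inside the expression, because the inner double quote is taken as the end of the attribute value; with the toggle on it runs to the final '>'.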
From: Derrick O. <der...@au...> - 2003-09-29 20:00:13
|
Peter,

Yes, you have permission. In fact we would be honoured and endeavor to assist you in any way necessary.

It's funny you should mention images and DOM. The latest versions of htmlparser include an example application that does a very similar task: getting the images behind thumbnails (see lib/thumbelina.jar or package org.htmlparser.lexerapplications.thumbelina). It uses the low level Lexer package to avoid having to form the entire document model. I would check to see if something like this meets your needs.

If you need more than that (i.e. table parsing, balancing end tags, etc.) you'll have to go with the full parser. Unfortunately, the Lexer hasn't been completely integrated into the parser yet and the current CVS snapshot is a bit of a mess. With a bit of patience, this too will come to pass.

As far as performance comparisons go, I've only heard anecdotal evidence that htmlparser is faster. I suppose this could be an area of investigation.

Derrick

-----Original Message-----
From: peter lin [mailto:jmw...@ya...]
Sent: September 29, 2003 8:53 AM
To: Derrick Oswald
Subject: question about using HTMLParser in Apache JMeter

Hi derrick,

I am a committer on Apache's Jakarta JMeter project. I was wondering if we can get permission to use HTMLParser. Since the Apache foundation can't use LGPL code without permission, I'm hoping you're open to the idea.

Here is a quick description of how I want to use it. JMeter currently is a load testing tool for HTTP, FTP, JDBC and Java. The HTTP plugin uses JTidy to parse the HTML and extract the images for download.

Test plans with more than 20 clients perform poorly because of the high cost of DOM. JTidy generates DOM documents. One trick is to turn off download images in JMeter, but that doesn't solve the real problem. I want to replace JTidy with HTMLParser. I haven't done any performance comparison yet, but I'm guessing it should use less memory.

Has anyone done a performance comparison between JTidy and HTMLParser?

peter lin |
From: Derrick O. <Der...@Ro...> - 2003-09-29 19:55:06
|
OK, it's started...

I've integrated the low level lexer code into the main parser code. Many things aren't working anymore. Of the 448 unit tests 213 of them fail and 14 show exception faults. But the upside is 211 of the tests pass. So I'm dropping my current snapshot, opening it up to those who may wish to assist. See the TODO section.

Big changes
===========

A lot of files have been removed
--------------------------------
htmlparser/NodeReader.java
    this is the primary class that's being replaced by Lexer; the method nextNode() replaces readElement()
htmlparser/RemarkNodeParser.java
    remark nodes are now parsed in the Lexer main loop
htmlparser/parserHelper/AttributeParser.java
    attributes are now parsed by the lexer before the tag is created, manipulated as a Vector of Attribute objects
htmlparser/parserHelper/StringParser.java
    string nodes are now parsed by the lexer
htmlparser/parserHelper/TagParser.java
    tags are now parsed by the lexer
htmlparser/tags/EndTag.java
    this class was replaced by a call to the new isEndTag() method on the Tag class

I labeled the repository with tag "PriorToLexerIntegration" just in case you want to retrieve a file that's no longer there.

Class Derivations
-----------------
The StringNode, RemarkNode and tags.Tag classes now derive from their lexeme counterparts in lexer.nodes instead of the other way around.

NodeFactory
-----------
The beginnings of a node factory interface are included. This was added so the lexer could return 'visitable' nodes to the parser. The parser acts as its own node factory, as does the Lexer.

NodeCount
---------
The node count for parsing goes up in most cases because every whitespace (i.e. newline) now counts as a StringNode. This has whacked out a lot of the tests that were expecting fewer nodes or a certain type of node at a particular index.

Attributes
----------
Attributes now maintain their order and case. The count of attributes also went up because whitespace is maintained within tags too. The storage in a Vector means the element 0 Attribute is actually the name of the tag, rather than having the $TAGNAME entry in a HashTable.

TODO
=====

visitEndTag()
-----------------
The visitEndTag() method on the visitor interface should be put back. I shouldn't have removed it when EndTag was removed. Instead the accept() in Tag should dispatch to visitTag() or visitEndTag() based on isEndTag().

Serializable
--------------
The Parser needs to be made serializable again. This involves a transient field down on the Source, I think, rather than having the whole Lexer transient in the Parser.

TagData
-------
This has been reworked to allow it to limp along under the new system, but it should really be removed. I think the reason for it (reduce the number of arguments to tag constructors) no longer applies, and a lot of the code could be easier to read if the Tag was more bean-like and had a zero args constructor with appropriate accessors.

Helpers
-------
I desperately want to get rid of these 'helper' classes. They are just obfuscating the code.

Node Factory
------------
The factory concept needs to be extended with a TagFactory (extending NodeFactory) that has the signatures for creating all the possible types of tags there are, and then this needs to be used by all the scanners to create their specific tags.

Scanners
--------
The scanners may not be working; it's hard to tell without the unit tests running. I'm not sure that CompositeTagScanner is completely all right yet. It probably needs to be reworked based on the lexer.

Unit Tests
----------
As mentioned, many of the unit tests expect toHtml() to produce capitalized and rearranged output. And parseAndAssertNodeCount() is expected not to include so many whitespace nodes. These need to be addressed.

Documentation
-------------
As of now, it's more likely that the javadocs are lying to you than providing any helpful advice. This needs to be reworked completely.

As you can see there's lots of work to do, so anyone with a death wish can jump in. I'll be working my way from top to bottom of the TODO list and committing and notifying the developer list after each of them. So go ahead and do a take from CVS and jump in the middle with anything that appeals. Keep the list posted and update your CVS tree often (or subscribe to the htmlparser-cvs mailing list for interrupt driven notification rather than polled notification).

Derrick |
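The first TODO item in the message above, having accept() in Tag dispatch to visitTag() or visitEndTag() based on isEndTag(), might look something like the sketch below. The types SketchTag, SketchVisitor and RecordingVisitor are hypothetical stand-ins written for this illustration; they are not the htmlparser classes, and the real signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical visitor with separate callbacks for start tags and end tags. */
interface SketchVisitor {
    void visitTag(SketchTag tag);
    void visitEndTag(SketchTag tag);
}

/** Hypothetical tag: one class represents both <p> and </p>. */
final class SketchTag {
    private final String text; // contents between '<' and '>', e.g. "p" or "/p"

    SketchTag(String text) {
        this.text = text;
    }

    String getText() {
        return text;
    }

    /** Assumption for the sketch: an end tag is one whose text starts with '/'. */
    boolean isEndTag() {
        return text.startsWith("/");
    }

    /** Dispatches to the matching visitor callback instead of relying on an EndTag class. */
    void accept(SketchVisitor visitor) {
        if (isEndTag()) {
            visitor.visitEndTag(this);
        } else {
            visitor.visitTag(this);
        }
    }
}

/** Records which callback was invoked, to make the dispatch visible. */
final class RecordingVisitor implements SketchVisitor {
    final List<String> events = new ArrayList<>();

    @Override
    public void visitTag(SketchTag tag) {
        events.add("start:" + tag.getText());
    }

    @Override
    public void visitEndTag(SketchTag tag) {
        events.add("end:" + tag.getText());
    }
}
```

The point of the design is that existing visitors which only care about end tags keep working even though the separate EndTag class is gone.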
From: Derrick O. <Der...@Ro...> - 2003-09-29 17:38:09
|
Fixed up the serializability.

TODO
=====

TagData
-------
This has been reworked to allow it to limp along under the new system, but it should really be removed. I think the reason for it (reduce the number of arguments to tag constructors) no longer applies, and a lot of the code could be easier to read if the Tag was more bean-like and had a zero args constructor with appropriate accessors.

Helpers
-------
I desperately want to get rid of these 'helper' classes. They are just obfuscating the code.

Node Factory
------------
The factory concept needs to be extended with a TagFactory (extending NodeFactory) that has the signatures for creating all the possible types of tags there are, and then this needs to be used by all the scanners to create their specific tags.

Scanners
--------
The scanners may not be working; it's hard to tell without the unit tests running. I'm not sure that CompositeTagScanner is completely all right yet. It probably needs to be reworked based on the lexer.

Unit Tests
----------
As mentioned, many of the unit tests expect toHtml() to produce capitalized and rearranged output. And parseAndAssertNodeCount() is expected not to include so many whitespace nodes. These need to be addressed.

Documentation
-------------
As of now, it's more likely that the javadocs are lying to you than providing any helpful advice. This needs to be reworked completely.

As you can see there's lots of work to do, so anyone with a death wish can jump in. I'll be working my way from top to bottom of the TODO list and committing and notifying the developer list after each of them. So go ahead and do a take from CVS and jump in the middle with anything that appeals. Keep the list posted and update your CVS tree often (or subscribe to the htmlparser-cvs mailing list for interrupt driven notification rather than polled notification). |
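The Node Factory item above, a TagFactory that extends NodeFactory and is used by the scanners to create their tags, could take roughly the following shape. All names here (SketchNode, SketchNodeFactory, SketchTagFactory, DefaultSketchFactory, SketchLinkScanner) and the two tag-creation methods are assumptions made for the illustration; the message does not specify the actual signatures.

```java
/** Hypothetical minimal node; the real Node interface is much richer. */
interface SketchNode {
    String toHtml();
}

/** Hypothetical base factory: what the lexer needs in order to create nodes. */
interface SketchNodeFactory {
    SketchNode createStringNode(String text);
}

/** Hypothetical extension with one creation method per kind of tag. */
interface SketchTagFactory extends SketchNodeFactory {
    SketchNode createLinkTag(String href);
    SketchNode createImageTag(String src);
}

/** Default factory; an application could substitute its own implementation. */
final class DefaultSketchFactory implements SketchTagFactory {
    @Override
    public SketchNode createStringNode(String text) {
        return () -> text;
    }

    @Override
    public SketchNode createLinkTag(String href) {
        return () -> "<a href=\"" + href + "\">";
    }

    @Override
    public SketchNode createImageTag(String src) {
        return () -> "<img src=\"" + src + "\">";
    }
}

/** A scanner asks the factory for its tag instead of calling a constructor. */
final class SketchLinkScanner {
    private final SketchTagFactory factory;

    SketchLinkScanner(SketchTagFactory factory) {
        this.factory = factory;
    }

    SketchNode scan(String href) {
        return factory.createLinkTag(href);
    }
}
```

Because the scanner only sees the factory interface, swapping in a different factory changes the tag classes produced without touching scanner code.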
From: Derrick O. <Der...@Ro...> - 2003-09-29 11:52:49
|
Fixed up the broken visitor logic. Added some docs on NodeVisitor.

TODO
=====

Serializable
--------------
The Parser needs to be made serializable again. This involves a transient field down on the Source, I think, rather than having the whole Lexer transient in the Parser.

TagData
-------
This has been reworked to allow it to limp along under the new system, but it should really be removed. I think the reason for it (reduce the number of arguments to tag constructors) no longer applies, and a lot of the code could be easier to read if the Tag was more bean-like and had a zero args constructor with appropriate accessors.

Helpers
-------
I desperately want to get rid of these 'helper' classes. They are just obfuscating the code.

Node Factory
------------
The factory concept needs to be extended with a TagFactory (extending NodeFactory) that has the signatures for creating all the possible types of tags there are, and then this needs to be used by all the scanners to create their specific tags.

Scanners
--------
The scanners may not be working; it's hard to tell without the unit tests running. I'm not sure that CompositeTagScanner is completely all right yet. It probably needs to be reworked based on the lexer.

Unit Tests
----------
As mentioned, many of the unit tests expect toHtml() to produce capitalized and rearranged output. And parseAndAssertNodeCount() is expected not to include so many whitespace nodes. These need to be addressed.

Documentation
-------------
As of now, it's more likely that the javadocs are lying to you than providing any helpful advice. This needs to be reworked completely.

As you can see there's lots of work to do, so anyone with a death wish can jump in. I'll be working my way from top to bottom of the TODO list and committing and notifying the developer list after each of them. So go ahead and do a take from CVS and jump in the middle with anything that appeals. Keep the list posted and update your CVS tree often (or subscribe to the htmlparser-cvs mailing list for interrupt driven notification rather than polled notification). |
From: du du <tel...@ya...> - 2003-09-20 06:48:33
|
I want to write a piece of code that auto-fills a web page form. I tried to use NodeVisitor, but I am puzzled by the following:

1) With this code:

    String [] tagsToBeFound = {"FORM","INPUT"};
    TagFindingVisitor visitor = new TagFindingVisitor(tagsToBeFound);
    parser.visitAllNodesWith(visitor);
    Node [] allformTags = visitor.getTags(0);
    FormTag formtag = (FormTag)allformTags[0];
    Node [] allinputTags = visitor.getTags(1);
    InputTag inputtag = (InputTag)allinputTags[0];

there is a java.lang.ClassCastException: org.htmlparser.tags.Tag. Why?

2) If I write a customized visitor, how do I write visitFormTag and visitInputTag so as to collect all the form tags and input tags together?

3) If I use RemarkNode to mark a form tag and its related input tags together, how do I decide the parameter tagContents?

Thanks for any hints. |
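The exception message in question 1 names org.htmlparser.tags.Tag, which suggests that the objects handed back were plain Tag instances rather than FormTag or InputTag instances; why that happened is not answered in this thread. Independently of the cause, the collecting visitor asked about in question 2 can guard each downcast with instanceof. The sketch below uses hypothetical stand-in classes (DemoTag, DemoFormTag, DemoInputTag, FormCollector), not the htmlparser API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generic tag, standing in for a plain Tag. */
class DemoTag {
    final String name;

    DemoTag(String name) {
        this.name = name;
    }
}

/** Hypothetical specialised tags, standing in for FormTag and InputTag. */
final class DemoFormTag extends DemoTag {
    DemoFormTag() {
        super("FORM");
    }
}

final class DemoInputTag extends DemoTag {
    DemoInputTag() {
        super("INPUT");
    }
}

/** Collects form and input tags in one pass, checking the runtime type before casting. */
final class FormCollector {
    final List<DemoFormTag> forms = new ArrayList<>();
    final List<DemoInputTag> inputs = new ArrayList<>();
    final List<DemoTag> unexpected = new ArrayList<>();

    void visit(DemoTag tag) {
        if (tag instanceof DemoFormTag) {
            forms.add((DemoFormTag) tag);
        } else if (tag instanceof DemoInputTag) {
            inputs.add((DemoInputTag) tag);
        } else {
            // A generic tag named "FORM" lands here; an unguarded cast to
            // DemoFormTag would throw ClassCastException instead.
            unexpected.add(tag);
        }
    }
}
```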
From: Derrick O. <Der...@ro...> - 2003-09-03 11:05:38
|
The LineCount property indicates the line being processed; it starts at 1 and increases as lines are read while nodes are retrieved. It's not a count of the number of lines in a file or page. That should be available after reading all nodes.

Derrick

zheng zhen wrote:

>I'm a beginner htmlparser developer. It would be
>appreciated if somebody could give me some hints. Here is the
>code:
>NodeReader nodeR = new NodeReader(new FileReader(new
>File("C:/temp/b.html")),1000);
>
>System.out.println("nodeR.getLineCount():"+nodeR.getLineCount());
>
>The problem is why nodeR.getLineCount() is always 1.
>
>thanks again
>
>zz |
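The same "progress, not total" behaviour can be seen with the JDK's own java.io.LineNumberReader, used here only as an analogy for NodeReader; the class LineProgressDemo is a hypothetical name for this sketch. Note one difference from the description above: LineNumberReader starts counting at 0 rather than 1.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Shows that a reader's line number reflects how far reading has progressed. */
final class LineProgressDemo {
    private LineProgressDemo() {}

    /**
     * Returns the line number reported before anything is read,
     * followed by the line number reported after each line is read.
     */
    static List<Integer> lineNumbersWhileReading(String text) {
        List<Integer> seen = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            seen.add(reader.getLineNumber()); // nothing consumed yet
            while (reader.readLine() != null) {
                seen.add(reader.getLineNumber()); // grows as lines are consumed
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }
}
```

Asking for the line number immediately after constructing the reader, as in the quoted code, therefore reports only the starting value; the total is known once everything has been read.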
From: Derrick O. <Der...@ro...> - 2003-09-01 13:16:32
|
Please welcome Christopher Bird. Chris has been programming since '67, using such languages as IBM 360 Assembler, Basic, PL/I, Pascal, C, Smalltalk, Java and, most recently, C#. Chris has taught database design and system development methods and practices. He makes his living doing IT work - matching technologies to business strategy - including b2b integration, IP telephony and business continuity. He is also retained by an investment bank to look at technology deals - from the technology's perspective. He uses HTMLParser in several projects, including one that sends HTML page text content to his cell phone via SMS. He is a member of the IEEE Computer Society with a special interest in Software Engineering and Model Driven Development.

Derrick |
From: Derrick O. <Der...@ro...> - 2003-09-01 02:28:44
|
I've uploaded a draft of the Java coding standards for your perusal:

http://htmlparser.sourceforge.net/articles/Java%20Coding%20Standards.doc
http://htmlparser.sourceforge.net/articles/Java%20Coding%20Standards.pdf
http://htmlparser.sourceforge.net/articles/Java%20Coding%20Standards.html

Comments? |
From: Derrick O. <Der...@ro...> - 2003-08-30 23:35:16
|
Chris,

Maybe you misconstrued the open source paradigm; it's only slightly organized anarchy. You can do whatever you want, with or without my, or anyone else's, permission. It's not my code, I only rent.

BTW, you don't lose the power of inheritance, you only constrain it to an interface-driven methodology.

Derrick

Christopher Bird wrote:

> Great, thanks.
>
> Yes I had thought that a factory mechanism would be a good way as well - almost a decorator pattern at that point, I think. Actually, that whole idea suggests a development paradigm for OO projects in general. Of course you lose the power of inheritance (and the native engine performance opportunities), but you gain a great deal of flexibility.
>
> I would love to hear some consensus (or at least informed opinions) on this.
>
> I also plan (with your permission) to write to the IEEE Software Engineering magazine (I am an IEEE member) and ask for opinions there too. I would like your permission because I would like to reference this concrete example. I would be glad to submit the letter to you before sending it - I am not interested in making waves, but am always interested in finding ways to make our profession better. Since the HTMLParser is such a well executed piece of software, it strikes me that it would make a good example for the letter.
>
> Regards
>
> Chris
>
>> From: Derrick Oswald <Der...@ro...>
>> To: sea...@ho...
>> CC: htm...@li...
>> Subject: Re: Adding methods to Tag
>> Date: Fri, 29 Aug 2003 22:20:09 -0400
>>
>> Chris,
>>
>> I'm opening this up to a wider audience, because it may have been solved before, or might be of interest to others with the same problem.
>>
>> The basic problem is how to add functionality like supportsColor() to base classes, like Tag, without recompiling the whole class hierarchy.
>>
>> One way would be to join the htmlparser project as a developer and just add it, if it's germane to others besides yourself. If it's not, then a bolt-on is needed.
>>
>> One way to handle this problem is a 'Factory' mechanism. A 'deep-in-the-bowels' class would ask the 'factory' for a tag, i.e. factory.makeTag ("Form"). So you would wedge your own factory in there. Choosing the factory is usually done with a Class.forName() where the string specifying the class comes from a configuration setting. With some design effort, we should be able to come up with a definition for a factory class and a suitable set of interfaces which the whole project would be refactored to use, i.e. the IFormTag interface extends the ICompositeTag interface and adds form related methods; the ICompositeTag interface extends the IBaseTag interface and adds child accessors; and nothing references FormTag directly except the factory.
>>
>> So then there is the problem of your factory supplying your special tag that implements IFormTag *and* IColorSupport when makeTag ("Form") is called. Most of what you need is already written in FormTag, you just need to add a couple of methods. I think this is where dynamic proxies come in: http://java.sun.com/j2se/1.3/docs/guide/reflection/proxy.html. The InvocationHandler would determine if the target method comes from IColorSupport, and if so perform the needful directly. Otherwise it would delegate to the wrapped tag object. This means the whole htmlparser project shuttles wrapped tag objects around and doesn't know it, till they bubble up to your code where you cast them to an IColorSupport and invoke the supportsColor() method:
>>
>>     Parser.setTagFactory ("ChrisBirdFactory");
>>     Parser parser = new Parser (url);
>>     parser.registerScanners ();
>>     for (NodeIterator e = parser.elements (); e.hasMoreNodes (); )
>>     {
>>         Node node = e.nextNode ();
>>         if (node instanceof Tag) // I presume all tags, but not nodes, support IColorSupport
>>             ((IColorSupport)node).supportsColor ();
>>     }
>>
>> Derrick
>>
>> Christopher Bird wrote:
>>
>>> Thanks for the reply, that is my dilemma.
>>>
>>> I am an old Smalltalk programmer from years gone by and have always used what is sometimes called responsibility driven development. So (at least in my head), the responsibility for knowing that a tag supports color or BG color is the tag's responsibility, and not the responsibility of some agent acting on the tag.
>>>
>>> The trouble with that style of development, especially for "packaged" software is that you(I) find yourself(myself) in a bind like this one.
>>>
>>> Indeed I had to recompile the whole package! But since I have the source in my project (to help me learn the intricacies of certain behaviors - especially the creation of handlers for very complex tags) that was no big deal for me. However now I am in violation of protocol for OpenSource, I am sure.
>>>
>>> This really gets to the crux of OOness and Open Source development. When there are requirements for classes high in the inheritance hierarchy and they do "rightfully" belong there how does one get them there - short term to overcome a specific issue, and long term as part of the overall release cycle of the product.
>>>
>>> I am probably not the first to wonder this!
>>>
>>> BTW, I love the implementation. It took some mind-bending to get used to it at first - again separating the responsibilities out so I can factor my solutions properly was initially a challenge, but I have become very productive.
>>>
>>> Thank you so much for an excellent piece of technology.
>>>
>>> Regards
>>>
>>> Chris
>>>
>>>> From: Derrick Oswald <Der...@ro...>
>>>> To: Christopher Bird <se...@us...>
>>>> Subject: Re: Adding methods to Tag
>>>> Date: Fri, 29 Aug 2003 07:41:08 -0400
>>>>
>>>> Chris,
>>>>
>>>> It's unclear how your ChrisTag method would work without recompiling the whole package. The Tag class extends AbstractNode, so presumably ChrisTag would extend AbstractNode and add the methods you want, then Tag would extend ChrisTag. You would still need to 'fix' each new release by doctoring Tag.
>>>>
>>>> The best way is probably to have a class external to everything with the static methods needed (see Tag.breaksFlow() for example code):
>>>>
>>>>     class ColorKnowledge {
>>>>         public static boolean supportsColor (Node node)
>>>>         { return (listofNodesSupportingForegroundColor.contains(node.getText().toUpperCase()));}
>>>>         ...
>>>>
>>>> If it's generic enough, submit it and we'll add it to Node.
>>>>
>>>> Derrick
>>>>
>>>> Christopher Bird wrote:
>>>>
>>>>> I am new to using OpenSource code. I have found it very helpful, and am using the HTMLParser for a number of purposes.
>>>>>
>>>>> I am wanting to add some code to Tag - especially the following two methods:
>>>>>
>>>>>     public boolean supportsColor()
>>>>>     /* returns true iff the color attribute is valid for this tag */
>>>>>
>>>>>     public boolean supportsBGColor ()
>>>>>     /* Returns true iff the bgColor attribute is valid for this tag */
>>>>>
>>>>> I would be happy if you guys were to add that, but failing that what is the process if I have to do it myself? There may be a bunch of other things that I will want to add to Tag - for handling some of my own custom behaviors.
>>>>>
>>>>> I can see a couple of ways of doing this (none pretty). One is to create a new ChrisTag superclass and change Tag's implements clause to implements ChrisTag. I can then define my methods there. Of course the community doesn't get the benefit (? dubious in some cases, I fear) of my additions.
>>>>>
>>>>> The other obvious way is simply to add the methods to Tag itself. I am not wild about doing that either because as I download new editions of HTMLParser, my changes get lost - especially since I am a solo practitioner at the moment and am not using a source code management system.
>>>>>
>>>>> Any assistance would be gratefully appreciated - both to the short term (immediate) problem and to the general question.
>>>>>
>>>>> Thanks in advance
>>>>>
>>>>> Chris Bird
>>>> |
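The dynamic-proxy idea described in the thread above can be tried out with the JDK's java.lang.reflect.Proxy. The sketch below is a minimal, self-contained illustration and not htmlparser code: IBaseTag and IColorSupport follow the interface names floated in the thread, while PlainTag, ProxyTagFactory, makeTag's signature and the list of color-capable tags are assumptions introduced for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for the IBaseTag interface proposed in the thread. */
interface IBaseTag {
    String getTagName();
}

/** The bolt-on capability that the library itself knows nothing about. */
interface IColorSupport {
    boolean supportsColor();
}

/** Hypothetical ordinary tag with no notion of color. */
final class PlainTag implements IBaseTag {
    private final String name;

    PlainTag(String name) {
        this.name = name;
    }

    @Override
    public String getTagName() {
        return name;
    }
}

/** Hypothetical factory that wraps every tag in a proxy adding IColorSupport. */
final class ProxyTagFactory {
    // Assumption for illustration: only these tags accept a color attribute.
    private static final Set<String> COLOR_TAGS =
            new HashSet<>(Arrays.asList("FONT", "BASEFONT"));

    private ProxyTagFactory() {}

    static IBaseTag makeTag(String name) {
        final IBaseTag wrapped = new PlainTag(name);
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (method.getDeclaringClass() == IColorSupport.class) {
                    // The bolt-on method is answered by the handler itself.
                    return COLOR_TAGS.contains(wrapped.getTagName().toUpperCase(Locale.ROOT));
                }
                try {
                    // Everything else is delegated to the wrapped tag.
                    return method.invoke(wrapped, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow what the wrapped tag threw
                }
            }
        };
        return (IBaseTag) Proxy.newProxyInstance(
                IBaseTag.class.getClassLoader(),
                new Class<?>[] {IBaseTag.class, IColorSupport.class},
                handler);
    }
}
```

Code that only knows IBaseTag passes the proxy around unchanged; code that knows about the bolt-on casts it to IColorSupport and calls supportsColor(), much as in the loop quoted above.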
From: Joshua K. <jo...@in...> - 2003-08-30 23:28:52
|
Derrick,

Thanks for sharing this email. I have some opinions on this -- can't email them at the moment. Will do soon...

regards
jk |
From: Derrick O. <Der...@ro...> - 2003-08-30 02:20:43
|
Chris,

I'm opening this up to a wider audience, because it may have been solved before, or might be of interest to others with the same problem.

The basic problem is how to add functionality like supportsColor() to base classes, like Tag, without recompiling the whole class hierarchy.

One way would be to join the htmlparser project as a developer and just add it, if it's germane to others besides yourself. If it's not, then a bolt-on is needed.

One way to handle this problem is a 'Factory' mechanism. A 'deep-in-the-bowels' class would ask the 'factory' for a tag, i.e. factory.makeTag ("Form"). So you would wedge your own factory in there. Choosing the factory is usually done with a Class.forName() where the string specifying the class comes from a configuration setting. With some design effort, we should be able to come up with a definition for a factory class and a suitable set of interfaces which the whole project would be refactored to use, i.e. the IFormTag interface extends the ICompositeTag interface and adds form-related methods; the ICompositeTag interface extends the IBaseTag interface and adds child accessors; and nothing references FormTag directly except the factory.

So then there is the problem of your factory supplying your special tag that implements IFormTag *and* IColorSupport when makeTag ("Form") is called. Most of what you need is already written in FormTag; you just need to add a couple of methods. I think this is where dynamic proxies come in: http://java.sun.com/j2se/1.3/docs/guide/reflection/proxy.html. The InvocationHandler would determine if the target method comes from IColorSupport, and if so perform the needful directly. Otherwise it would delegate to the wrapped tag object. This means the whole htmlparser project shuttles wrapped tag objects around and doesn't know it, till they bubble up to your code where you cast them to an IColorSupport and invoke the supportsColor() method:

    Parser.setTagFactory ("ChrisBirdFactory");
    Parser parser = new Parser (url);
    parser.registerScanners ();
    for (NodeIterator e = parser.elements (); e.hasMoreNodes (); )
    {
        Node node = e.nextNode ();
        // I presume all tags, but not nodes, support IColorSupport
        if (node instanceof Tag)
            ((IColorSupport)node).supportsColor ();
    }

Derrick

Christopher Bird wrote:

> Thanks for the reply, that is my dilemma.
>
> I am an old Smalltalk programmer from years gone by and have always used what is sometimes called responsibility driven development. So (at least in my head), the responsibility for knowing that a tag supports color or BG color is the tag's responsibility, and not the responsibility of some agent acting on the tag.
>
> The trouble with that style of development, especially for "packaged" software is that you(I) find yourself(myself) in a bind like this one.
>
> Indeed I had to recompile the whole package! But since I have the source in my project (to help me learn the intricacies of certain behaviors - especially the creation of handlers for very complex tags) that was no big deal for me. However now I am in violation of protocol for OpenSource, I am sure.
>
> This really gets to the crux of OOness and Open Source development. When there are requirements for classes high in the inheritance hierarchy and they do "rightfully" belong there how does one get them there - short term to overcome a specific issue, and long term as part of the overall release cycle of the product.
>
> I am probably not the first to wonder this!
>
> BTW, I love the implementation.
> It took some mind-bending to get used to it at first - again separating the responsibilities out so I can factor my solutions properly was initially a challenge, but I have become very productive.
>
> Thank you so much for an excellent piece of technology.
>
> Regards
>
> Chris
>
>> From: Derrick Oswald <Der...@ro...>
>> To: Christopher Bird <se...@us...>
>> Subject: Re: Adding methods to Tag
>> Date: Fri, 29 Aug 2003 07:41:08 -0400
>>
>> Chris,
>>
>> It's unclear how your ChrisTag method would work without recompiling the whole package. The Tag class extends AbstractNode, so presumably ChrisTag would extend AbstractNode and add the methods you want, then Tag would extend ChrisTag. You would still need to 'fix' each new release by doctoring Tag.
>>
>> The best way is probably to have a class external to everything with the static methods needed (see Tag.breaksFlow() for example code):
>>
>>     class ColorKnowledge {
>>         public static boolean supportsColor (Node node)
>>         {
>>             return (listofNodesSupportingForegroundColor.contains(node.getText().toUpperCase()));
>>         }
>>         ...
>>
>> If it's generic enough, submit it and we'll add it to Node.
>>
>> Derrick
>>
>> Christopher Bird wrote:
>>
>>> I am new to using OpenSource code. I have found it very helpful, and am using the HTMLParser for a number of purposes.
>>>
>>> I am wanting to add some code to Tag - especially the following two methods:
>>>
>>>     public boolean supportsColor()
>>>     /* Returns true iff the color attribute is valid for this tag */
>>>
>>>     public boolean supportsBGColor ()
>>>     /* Returns true iff the bgColor attribute is valid for this tag */
>>>
>>> I would be happy if you guys were to add that, but failing that what is the process if I have to do it myself? There may be a bunch of other things that I will want to add to Tag - for handling some of my own custom behaviors.
>>>
>>> I can see a couple of ways of doing this (none pretty). One is to create a new ChrisTag superclass and change Tag's implements clause to implements ChrisTag. I can then define my methods there. Of course the community doesn't get the benefit (? dubious in some cases, I fear) of my additions.
>>>
>>> The other obvious way is simply to add the methods to Tag itself. I am not wild about doing that either because as I download new editions of HTMLParser, my changes get lost - especially since I am a solo practitioner at the moment and am not using a source code management system.
>>>
>>> Any assistance would be gratefully appreciated - both to the short term (immediate) problem and to the general question.
>>>
>>> Thanks in advance
>>>
>>> Chris Bird |
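The factory-plus-dynamic-proxy idea in Derrick's message above can be sketched in plain Java. This is only a minimal sketch under stated assumptions, not htmlparser code: IBaseTag, IColorSupport, SimpleTag, ColorSupportHandler and ColorTagFactory are hypothetical names invented for illustration, and the set of color-capable tag names is just an example. The only real APIs used are the JDK's java.lang.reflect.Proxy and InvocationHandler.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-ins for the interfaces proposed in the thread;
// none of these types exist in htmlparser.
interface IBaseTag {
    String getTagName();
}

interface IColorSupport {
    boolean supportsColor();
}

// Plays the role of an unmodified library tag such as FormTag.
class SimpleTag implements IBaseTag {
    private final String name;

    SimpleTag(String name) {
        this.name = name;
    }

    @Override
    public String getTagName() {
        return name;
    }
}

// Answers IColorSupport calls itself and delegates everything else
// to the wrapped tag.
class ColorSupportHandler implements InvocationHandler {
    // Illustrative list only (assumption), not a complete table.
    private static final Set<String> COLOR_TAGS =
            new HashSet<>(Arrays.asList("FONT", "BASEFONT"));

    private final IBaseTag wrapped;

    ColorSupportHandler(IBaseTag wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (method.getDeclaringClass() == IColorSupport.class) {
            // The bolt-on behavior lives here, outside the library class.
            return COLOR_TAGS.contains(wrapped.getTagName().toUpperCase(Locale.ROOT));
        }
        try {
            return method.invoke(wrapped, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow what the wrapped tag threw
        }
    }
}

// Hypothetical factory: callers get an object that implements both interfaces.
class ColorTagFactory {
    static IBaseTag makeTag(String name) {
        IBaseTag real = new SimpleTag(name);
        return (IBaseTag) Proxy.newProxyInstance(
                IBaseTag.class.getClassLoader(),
                new Class<?>[] { IBaseTag.class, IColorSupport.class },
                new ColorSupportHandler(real));
    }
}

class ProxyDemo {
    public static void main(String[] args) {
        IBaseTag tag = ColorTagFactory.makeTag("font");
        System.out.println(tag.getTagName());
        System.out.println(((IColorSupport) tag).supportsColor());
    }
}
```

Code that only knows IBaseTag keeps passing the proxy around unchanged; only the caller that needs the extra method casts to IColorSupport. The trade-off is that this works for interfaces only, which is why the message proposes refactoring the project to reference tags through interfaces first.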
From: Couball, J. <jam...@co...> - 2003-08-27 17:01:13
|
Although I personally prefer tabs, I would +1 any consistent coding style.

FWIW, you may want to loosely enforce coding standards through the use of the Checkstyle ant task (see http://checkstyle.sourceforge.net/). This could produce a report of violations without really impacting the project. An example report is here: http://maven.apache.org/checkstyle-report.html.

Sincerely,
James.

-----Original Message-----
From: Fernando Machado [mailto:fn...@ne...]
Sent: Tuesday, August 26, 2003 11:13 PM
To: htm...@li...

Hi all,

+1 for Sun Coding Standard

Regards,
-fmc

> Subject: AW: [Htmlparser-developer] tabs
> Date: Tue, 26 Aug 2003 09:37:49 +0200
> From: "Holger Stenzhorn" <Hol...@xt...>
> To: <htm...@li...>
> Reply-To: htm...@li...
(...)
> How about the original one from Sun (http://java.sun.com/docs/codeconv/)?
(...)
>
> Cheers,
> Holger
>
> From: "Somik Raha" <so...@ya...>
> To: <htm...@li...>
> Subject: Re: [Htmlparser-developer] Re: Htmlparser-developer digest, Vol 1 #255 - 1 msg
> Date: Tue, 26 Aug 2003 20:32:31 -0400
> Reply-To: htm...@li...
>
> Hi Folks,
(...)
>
> I would personally prefer to maintain the tabs, and follow the Sun
> Microsystems java coding standard.
> http://java.sun.com/docs/codeconv/html/CodeConvTOC.doc.html
>
(...)
>
> Regards
> Somik
>

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
Htmlparser-developer mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-developer |
From: zheng z. <mon...@ya...> - 2003-08-27 14:46:39
|
I'm a beginner with htmlparser development. I would appreciate it if somebody could give me some hints. Here is the code:

    NodeReader nodeR = new NodeReader(new FileReader(new File("C:/temp/b.html")), 1000);
    System.out.println("nodeR.getLineCount():" + nodeR.getLineCount());

The problem is: why does nodeR.getLineCount() always return 1?

Thanks again,
zz |
From: Derrick O. <Der...@ro...> - 2003-08-27 11:21:03
|
Tabs are an issue because of the ambiguity. A space is a space. A tab can be anything. The original intent of tabs (in typewriters) was to allow quick columnar alignment. This obviously doesn't work in electronic documents where the interpretation is dependent on the program used. Try opening a file formatted with a tabstop setting of 4 in a program (like notepad) that has a hard-coded tabstop spacing of 8. Some think that tabs conserve disk space (one tab is worth 8 spaces, right?) but hard disk space is pennies a megabyte and compression programs handle lots of spaces very nicely for transmission. I just think they are an anachronism that's long since lost its usefulness.

The Sun "Code Conventions for the Java Programming Language" available at http://java.sun.com/docs/codeconv is a good basis from which to start. I'm adjusting it a bit to account for open source, cvs and htmlparser specifics. I disagree with some of its suggestions though, like:

    Four spaces should be used as the unit of indentation. The exact construction of the indentation (spaces vs. tabs) is unspecified. Tabs must be set exactly every 8 spaces (not 4).

I mean this is just sloppy. It should be:

    Four spaces should be used as the unit of indentation. The use of tabs, vertical tab, form-feed, carriage-return and other control characters, other than newline, to control displayed formatting is forbidden.

I hope to provide a rationale for where I have differed within the document.

Derrick

Somik Raha wrote:

>Hi Derrick,
>
>Hmm.. If you mean that Eclipse will auto-convert tabs to spaces, maybe I misunderstood. As long as one does not have to press space four times...
>
>By the way, just curious as to why tabs are a problem in the first place..
>
>>I'm working on a Java Coding Standards document.
>
>What do you think of the Sun coding standard?
>
>Cheers,
>Somik
>
>----- Original Message -----
>From: "Derrick Oswald" <Der...@ro...>
>To: <htm...@li...>
>Sent: Tuesday, August 26, 2003 9:09 PM
>Subject: Re: [Htmlparser-developer] tabs
>
>>Somik,
>>
>>The (rather meager) response to an earlier poll indicated that Eclipse was the most popular by far, followed by a few NetBeans and JBuilder users.
>>
>>Replacing tabs with spaces is automatic in modern editors and IDEs. For Eclipse, you need to make the settings in the Java/Editor Typing tab. You also need to make the setting in the Java/Code Formatter Style tab. For Netbeans use Tools-Options-Editing-Editor Settings-Java Editor-Java Indentation Engine-...-Expand Tabs To Spaces-True. I don't know how to do it in JBuilder, but I know it can be done.
>>
>>I'm working on a Java Coding Standards document.
>>
>>Derrick |
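Derrick's point about tab-stop ambiguity in the message above is easy to demonstrate: the same tab-indented line occupies different columns depending on the tab stop the viewer assumes. Below is a minimal sketch, assuming a hypothetical TabExpander helper (not part of htmlparser) that pads each tab out to the next tab stop rather than substituting a fixed number of spaces.

```java
// Hypothetical helper for illustration; not part of htmlparser.
class TabExpander {
    // Expands every tab in a single line (no embedded newlines) to spaces,
    // padding up to the next multiple of tabStop columns.
    static String expandTabs(String line, int tabStop) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                // out.length() is the current column because the line
                // contains no newlines.
                int pad = tabStop - (out.length() % tabStop);
                for (int j = 0; j < pad; j++) {
                    out.append(' ');
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "\tif (x)\t// comment";
        // Same bytes, two different renderings.
        System.out.println("[" + expandTabs(line, 4) + "]");
        System.out.println("[" + expandTabs(line, 8) + "]");
    }
}
```

Padding to the next tab stop, instead of replacing each tab with a fixed run of spaces, is what keeps text after a mid-line tab aligned the way the original author saw it, provided the converter uses the tab stop that author had configured.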
From: Fernando M. <fn...@ne...> - 2003-08-27 06:12:32
|
Hi all,

+1 for Sun Coding Standard

Regards,
-fmc

> Subject: AW: [Htmlparser-developer] tabs
> Date: Tue, 26 Aug 2003 09:37:49 +0200
> From: "Holger Stenzhorn" <Hol...@xt...>
> To: <htm...@li...>
> Reply-To: htm...@li...
(...)
> How about the original one from Sun (http://java.sun.com/docs/codeconv/)?
(...)
>
> Cheers,
> Holger
>
> From: "Somik Raha" <so...@ya...>
> To: <htm...@li...>
> Subject: Re: [Htmlparser-developer] Re: Htmlparser-developer digest, Vol 1 #255 - 1 msg
> Date: Tue, 26 Aug 2003 20:32:31 -0400
> Reply-To: htm...@li...
>
> Hi Folks,
(...)
>
> I would personally prefer to maintain the tabs, and follow the Sun
> Microsystems java coding standard.
> http://java.sun.com/docs/codeconv/html/CodeConvTOC.doc.html
>
(...)
>
> Regards
> Somik
> |
From: Somik R. <so...@ya...> - 2003-08-27 02:11:18
|
Hi Derrick,

Hmm.. If you mean that Eclipse will auto-convert tabs to spaces, maybe I misunderstood. As long as one does not have to press space four times...

By the way, just curious as to why tabs are a problem in the first place..

> I'm working on a Java Coding Standards document.

What do you think of the Sun coding standard?

Cheers,
Somik

----- Original Message -----
From: "Derrick Oswald" <Der...@ro...>
To: <htm...@li...>
Sent: Tuesday, August 26, 2003 9:09 PM
Subject: Re: [Htmlparser-developer] tabs

> Somik,
>
> The (rather meager) response to an earlier poll indicated that Eclipse was the most popular by far, followed by a few NetBeans and JBuilder users.
>
> Replacing tabs with spaces is automatic in modern editors and IDEs. For Eclipse, you need to make the settings in the Java/Editor Typing tab. You also need to make the setting in the Java/Code Formatter Style tab. For Netbeans use Tools-Options-Editing-Editor Settings-Java Editor-Java Indentation Engine-...-Expand Tabs To Spaces-True. I don't know how to do it in JBuilder, but I know it can be done.
>
> I'm working on a Java Coding Standards document.
>
> Derrick
>
> Somik Raha wrote:
>
> >Hi Folks,
> >
> >For what it's worth, tabs are an incredibly useful and standard way of formatting code. A lot of folks use Eclipse, and pressing the tab key every so many seconds comes really naturally. It also reduces the risk of RSI (pressing space four times as opposed to tab once). Note that the space key is a big killer - it really hurts your thumb in the long run. A state-of-the-art ergonomic keyboard that I am trying to adjust to takes the space key away from the thumb. (Does this reason sound silly? Look at your fingers, do you feel any pain in your thumbs? Or your shoulder or your neck? Do you want to avoid surgery?)
> >
> >It would also be good to know what IDE most developers on this project use.
> >
> >I would personally prefer to maintain the tabs, and follow the Sun Microsystems java coding standard.
> >http://java.sun.com/docs/codeconv/html/CodeConvTOC.doc.html
> >
> >There are pieces of code where the braces are not consistent.
> >
> >I agree with Fernando - we should have an "official" coding standard that is clearly communicated on the site.
> >
> >Finally, the coding standard is for active developers who must feel comfortable with it. My views are secondary to the active developers, as I have ceased to contribute beyond an occasional code-cleanup.
> >
> >Regards
> >Somik
> >
> >Industrial Logic, Inc.
> >Somik Raha
> >Extreme Programmer & Coach
> >http://industriallogic.com
> >http://industrialxp.org
> >866-540-8336 (toll free)
> >510-540-8336 (phone)
> >
> >.. the major danger in vertical thinking is not that of being trapped by the obvious but of failing to realize that one may be trapped by the obvious. It is not a matter of avoiding vertical thinking but of using it and at the same time being aware that it might be necessary to escape from a particular way of looking at a situation.
> >
> >--- Edward De Bono in Lateral Thinking, Chapter 16, Analogies |
From: Derrick O. <Der...@ro...> - 2003-08-27 01:10:13
|
Somik,

The (rather meager) response to an earlier poll indicated that Eclipse was the most popular by far, followed by a few NetBeans and JBuilder users.

Replacing tabs with spaces is automatic in modern editors and IDEs. For Eclipse, you need to make the settings in the Java/Editor Typing tab. You also need to make the setting in the Java/Code Formatter Style tab. For Netbeans use Tools-Options-Editing-Editor Settings-Java Editor-Java Indentation Engine-...-Expand Tabs To Spaces-True. I don't know how to do it in JBuilder, but I know it can be done.

I'm working on a Java Coding Standards document.

Derrick

Somik Raha wrote:

>Hi Folks,
>
>For what it's worth, tabs are an incredibly useful and standard way of formatting code. A lot of folks use Eclipse, and pressing the tab key every so many seconds comes really naturally. It also reduces the risk of RSI (pressing space four times as opposed to tab once). Note that the space key is a big killer - it really hurts your thumb in the long run. A state-of-the-art ergonomic keyboard that I am trying to adjust to takes the space key away from the thumb. (Does this reason sound silly? Look at your fingers, do you feel any pain in your thumbs? Or your shoulder or your neck? Do you want to avoid surgery?)
>
>It would also be good to know what IDE most developers on this project use.
>
>I would personally prefer to maintain the tabs, and follow the Sun Microsystems java coding standard.
>http://java.sun.com/docs/codeconv/html/CodeConvTOC.doc.html
>
>There are pieces of code where the braces are not consistent.
>
>I agree with Fernando - we should have an "official" coding standard that is clearly communicated on the site.
>
>Finally, the coding standard is for active developers who must feel comfortable with it. My views are secondary to the active developers, as I have ceased to contribute beyond an occasional code-cleanup.
>
>Regards
>Somik
>
>Industrial Logic, Inc.
>Somik Raha
>Extreme Programmer & Coach
>http://industriallogic.com
>http://industrialxp.org
>866-540-8336 (toll free)
>510-540-8336 (phone)
>
>.. the major danger in vertical thinking is not that of being trapped by the obvious but of failing to realize that one may be trapped by the obvious. It is not a matter of avoiding vertical thinking but of using it and at the same time being aware that it might be necessary to escape from a particular way of looking at a situation.
>
>--- Edward De Bono in Lateral Thinking, Chapter 16, Analogies |
From: Somik R. <so...@ya...> - 2003-08-27 00:32:15
|
Hi Folks,

For what it's worth, tabs are an incredibly useful and standard way of formatting code. A lot of folks use Eclipse, and pressing the tab key every so many seconds comes really naturally. It also reduces the risk of RSI (pressing space four times as opposed to tab once). Note that the space key is a big killer - it really hurts your thumb in the long run. A state-of-the-art ergonomic keyboard that I am trying to adjust to takes the space key away from the thumb. (Does this reason sound silly? Look at your fingers, do you feel any pain in your thumbs? Or your shoulder or your neck? Do you want to avoid surgery?)

It would also be good to know what IDE most developers on this project use.

I would personally prefer to maintain the tabs, and follow the Sun Microsystems java coding standard.
http://java.sun.com/docs/codeconv/html/CodeConvTOC.doc.html

There are pieces of code where the braces are not consistent.

I agree with Fernando - we should have an "official" coding standard that is clearly communicated on the site.

Finally, the coding standard is for active developers who must feel comfortable with it. My views are secondary to the active developers, as I have ceased to contribute beyond an occasional code-cleanup.

Regards
Somik

Industrial Logic, Inc.
Somik Raha
Extreme Programmer & Coach
http://industriallogic.com
http://industrialxp.org
866-540-8336 (toll free)
510-540-8336 (phone)

.. the major danger in vertical thinking is not that of being trapped by the obvious but of failing to realize that one may be trapped by the obvious. It is not a matter of avoiding vertical thinking but of using it and at the same time being aware that it might be necessary to escape from a particular way of looking at a situation.

--- Edward De Bono in Lateral Thinking, Chapter 16, Analogies |
From: Holger S. <Hol...@xt...> - 2003-08-26 07:38:33
|
Hi,

The proposed standard of an indent of 4 sounds good. We at our company actually use an indent of 2. Would that be ok too?

I also support Fernando in his view of more complete coding standards. How about the original one from Sun (http://java.sun.com/docs/codeconv/)?

Also: Many if not most classes are pretty well documented, but some aren't. I would like to actively join this project again, so that might help a lot. ;-)

Cheers,
Holger

-----Original Message-----
From: Derrick Oswald [mailto:Der...@ro...]
Sent: Tuesday, 26 August 2003 04:56
To: htm...@li...
Subject: [Htmlparser-developer] tabs

I'm thinking of making a gratuitous change to nearly all the htmlparser source files -- replace tabs with spaces.

I've been using a tabstop of 4 and my guess is some others have been using 8. This is too much in my opinion, but the point is there seems to be too much ambiguity in the repository at the moment about whether to use tabs or not and how many spaces they represent and hence how much indent is applied when entering a block of code. Maybe it's my fault. I've been a 4 space person ever since moving away from the old DOS text screens, where it was two spaces, and only because screen real-estate was so precious. So the code I've inserted must look horrendous for those with an 8 spacing.

How about arbitrarily dictating that no tabs are allowed, and the indent is 4? Just set a standard and adhere to it.

I know every editor in use has a 'replace tabs with spaces' option and it's just a matter of some people turning that feature on. I can correct the existing files in a few minutes (correctly adding the number of spaces to get to the next tab stop, not just globally substituting spaces for tabs).

I know this is a religious issue, so I'll gladly offer to convince anyone my way is correct and theirs is wrong, and trump anyone's code drop with one that doesn't contain tabs until they give up. Harrumph!

Derrick |
From: Fernando M. <fn...@ne...> - 2003-08-26 03:39:47
|
Hi,

What do you think about setting complete coding standards? Not only spaces but comments, functions, for's, if's, etc.

My $0.02.

Regards,
-fmac

> Message: 1
> Date: Mon, 25 Aug 2003 22:55:34 -0400
> From: Derrick Oswald <Der...@ro...>
> To: htm...@li...
> Subject: [Htmlparser-developer] tabs
> Reply-To: htm...@li...
>
> I'm thinking of making a gratuitous change to nearly all the htmlparser
> source files -- replace tabs with spaces.
>
> I've been using a tabstop of 4 and my guess is some others have been
> using 8. This is too much in my opinion, but the point is there seems to
> be too much ambiguity in the repository at the moment about whether to
> use tabs or not and how many spaces they represent and hence how much
> indent is applied when entering a block of code. Maybe it's my fault.
> I've been a 4 space person ever since moving away from the old DOS text
> screens, where it was two spaces, and only because screen real-estate
> was so precious. So the code I've inserted must look horrendous for
> those with an 8 spacing.
>
> How about arbitrarily dictating that no tabs are allowed, and the indent
> is 4? Just set a standard and adhere to it.
>
> I know every editor in use has a 'replace tabs with spaces' option and
> it's just a matter of some people turning that feature on. I can
> correct the existing files in a few minutes (correctly adding the number
> of spaces to get to the next tab stop, not just globally substituting
> spaces for tabs).
>
> I know this is a religious issue, so I'll gladly offer to convince
> anyone my way is correct and theirs is wrong, and trump anyone's code
> drop with one that doesn't contain tabs until they give up. Harrumph!
>
> Derrick |
From: Derrick O. <Der...@ro...> - 2003-08-26 02:56:14
|
I'm thinking of making a gratuitous change to nearly all the htmlparser source files -- replace tabs with spaces.

I've been using a tabstop of 4 and my guess is some others have been using 8. This is too much in my opinion, but the point is there seems to be too much ambiguity in the repository at the moment about whether to use tabs or not and how many spaces they represent and hence how much indent is applied when entering a block of code. Maybe it's my fault. I've been a 4 space person ever since moving away from the old DOS text screens, where it was two spaces, and only because screen real-estate was so precious. So the code I've inserted must look horrendous for those with an 8 spacing.

How about arbitrarily dictating that no tabs are allowed, and the indent is 4? Just set a standard and adhere to it.

I know every editor in use has a 'replace tabs with spaces' option and it's just a matter of some people turning that feature on. I can correct the existing files in a few minutes (correctly adding the number of spaces to get to the next tab stop, not just globally substituting spaces for tabs).

I know this is a religious issue, so I'll gladly offer to convince anyone my way is correct and theirs is wrong, and trump anyone's code drop with one that doesn't contain tabs until they give up. Harrumph!

Derrick |
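For anyone wondering what "adding the number of spaces to get to the next tab stop" means in practice, here is a minimal sketch. It is not the tool Derrick used; the class name TabExpander and the method expandTabs are made up for illustration, and it assumes a fixed tab width with one column per character.

```java
// Hypothetical helper, not part of htmlparser: expands tabs to tab stops.
final class TabExpander {
    private TabExpander() {}

    /**
     * Replaces each tab with as many spaces as are needed to reach the
     * next multiple of tabWidth, so text after the tab stays aligned.
     */
    static String expandTabs(String text, int tabWidth) {
        if (tabWidth <= 0) {
            throw new IllegalArgumentException("tabWidth must be positive");
        }
        StringBuilder out = new StringBuilder(text.length());
        int column = 0; // zero-based column on the current line
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t') {
                // Distance from the current column to the next tab stop.
                int spaces = tabWidth - (column % tabWidth);
                for (int s = 0; s < spaces; s++) {
                    out.append(' ');
                }
                column += spaces;
            } else if (c == '\n' || c == '\r') {
                out.append(c);
                column = 0; // a new line restarts the column count
            } else {
                out.append(c);
                column++;
            }
        }
        return out.toString();
    }
}
```

With a tab width of 4, "ab\tc" becomes "ab" plus two spaces plus "c", whereas a global substitution of four spaces per tab would push "c" out to column 6 and break the alignment.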
From: zheng z. <mon...@ya...> - 2003-08-22 18:13:44
|
Hello everyone,

I want to get a DOM tree after parsing a web page. I saw that the Parser can register scanners with registerDomScanners(), but I see no difference compared with recreateReader(). Does anybody know how to use it? |