Re: [Htmlparser-user] IFRAME and LINK tags
From: Ian M. <ian...@gm...> - 2006-02-23 15:12:19
This project is still alive, if under slow development. There are still a number of checkins being made fairly often, and we are possibly going to branch for a 1.6 release.

The name LinkTag has indeed been taken for the anchor tag, but we can't change it now for backwards-compatibility reasons. I think we might want to make LinkTag support <link> tags, and have a boolean method that says whether it's an anchor or not. In fact, reading the W3C spec on this (http://www.w3.org/TR/REC-html40/struct/links.html), this seems like it might be the right thing to do.

Can I get some feedback from some of the other devs on whether this seems like a good idea? It looks to me like it is probably the best way to do it, semantically and practically.

Other things that look like they should be done (devs: please shout if you don't want any of this done):

- add support for the data: and view-source: protocols
- deprecate setMailLink and setJavascriptLink in favour of setLink
- add get/set for rel and rev attributes

Ian

On 23/02/06, Luís Manuel dos Santos Gomes <lui...@gm...> wrote:
> Hello,
>
> I cannot migrate all my work to the C#/.NET platform, although HTML
> parsing is a core functionality of my project.
> I'm coding a crawler to feed our natural language research group with
> corpus from the web. Currently I'm still evaluating options for the
> HTML parsing module. I have developed my own HTML scanner based on
> Java regexps, but it is much too difficult to maintain and extend
> (after all, it can be a project by itself).
>
> My needs are far beyond the simple link extraction/modification. I
> must handle every single tag that may reference an external resource
> (and that includes IFrame). This includes parsing embedded CSS
> imports. Embedded Javascript is still a problem...
>
> Anyway, the BIG question is: is this project alive?
> I know it is an open source project that is supported by people's free
> will, and I find that _very_ _meritorious_.
> I'm asking this question because I have to make a decision now.
>
> I still would appreciate some feedback on the subject of this thread (the
> original post follows)
>
> Luís
>
> On Feb 15, 2006, at 4:30 PM, Third Eye wrote:
>
> > Hi!
> > We did implement IFrameTag and named the class as IFrameTag. Our
> > implementation is a .Net port of this library and we have added some of
> > our own enhancements.
> > If you are interested, you can download it from
> >
> > http://www.netomatix.com
> >
> > Naveen
> >
> > On 2/15/06, Luís Manuel dos Santos Gomes <lui...@gm...>
> > wrote:
> >> Hi everybody.
> >>
> >> This is my first post to this list.
> >> I'm replacing my own html processing code (regex based) with
> >> HTMLParser.
> >> The examples have been a great help!
> >>
> >> I need to handle IFRAME and LINK tags. The link tag is often used to
> >> include external CSS.
> >> The name "LinkTag" has already been taken for the anchor tags! How
> >> should I name the class to handle the LINK tags?
> >> Has anybody implemented the IframeTag and the "TrueLinkTag" classes?
> >> I could do this and would be glad to contribute it to the project.
> >> I'm using the version 20051112. I've not checked out from CVS because
> >> I need a stable package.
> >>
> >> Cheers!
> >>
> >> Luís Gomes
> >> (from Portugal)
> >>
> >>
> >> -------------------------------------------------------
> >> This SF.net email is sponsored by: Splunk Inc. Do you grep through
> >> log files
> >> for problems? Stop! Download the new AJAX search engine that makes
> >> searching your log files as easy as surfing the web. DOWNLOAD
> >> SPLUNK!
> >> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
> >> _______________________________________________
> >> Htmlparser-user mailing list
> >> Htm...@li...
> >> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> >>
> >
> >
> > --
> > Naveen K Kohli
> > http://www.netomatix.com
>
> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking scripting language
> that extends applications into web and mobile media. Attend the live webcast
> and join the prime developer group breaking into this new coding territory!
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
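
For anyone who wants to comment on the proposal at the top of this thread (one LinkTag class covering both <a> and <link>, a boolean to tell them apart, rel/rev accessors, and a single setLink instead of per-protocol setters), here is a rough, self-contained sketch of what that could look like. It is only an illustration: SketchLinkTag, isAnchor, getScheme and isStylesheet are made-up names, the class does not extend or use anything from HTMLParser, and the rule that rel="stylesheet" marks external CSS is an assumption taken from common usage rather than from the library.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for the proposed LinkTag; NOT the HTMLParser class. */
class SketchLinkTag {
    private final String tagName; // stored upper-cased: "A" or "LINK"
    private final Map<String, String> attributes = new LinkedHashMap<>();

    SketchLinkTag(String tagName) {
        this.tagName = tagName.toUpperCase(Locale.ROOT);
    }

    /** True for <a>, false for <link>: the boolean method suggested in the thread. */
    boolean isAnchor() {
        return "A".equals(tagName);
    }

    /** Both elements keep their target in href, so one accessor pair serves both. */
    String getLink() { return attributes.get("href"); }
    void setLink(String url) { attributes.put("href", url); }

    String getRel() { return attributes.get("rel"); }
    void setRel(String rel) { attributes.put("rel", rel); }
    String getRev() { return attributes.get("rev"); }
    void setRev(String rev) { attributes.put("rev", rev); }

    /**
     * Lower-cased URL scheme ("mailto", "data", "view-source", ...), or ""
     * when the link is missing or relative. Deriving the scheme from the link
     * is one way to make separate mail/javascript setters unnecessary.
     */
    String getScheme() {
        String link = getLink();
        if (link == null) return "";
        String trimmed = link.trim();
        int colon = trimmed.indexOf(':');
        if (colon <= 0) return "";
        String scheme = trimmed.substring(0, colon);
        // A scheme is a letter followed by letters, digits, '+', '-' or '.'.
        if (!scheme.matches("[A-Za-z][A-Za-z0-9+.-]*")) return "";
        return scheme.toLowerCase(Locale.ROOT);
    }

    /** Assumed rule of thumb: a <link> whose rel lists "stylesheet" is external CSS. */
    boolean isStylesheet() {
        String rel = getRel();
        if (isAnchor() || rel == null) return false;
        for (String token : rel.trim().split("\\s+")) {
            if (token.equalsIgnoreCase("stylesheet")) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SketchLinkTag css = new SketchLinkTag("link");
        css.setLink("style.css");
        css.setRel("stylesheet");
        System.out.println(css.isAnchor() + " " + css.isStylesheet());
    }
}
```

With this shape a crawler could treat every tag the same way: read getLink(), then branch on isAnchor(), getScheme() and rel/rev as needed. Whether the real LinkTag should work like this is exactly the question put to the other devs above.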