Re: [Htmlparser-user] IFRAME and LINK tags
From: <lui...@gm...> - 2006-02-23 03:03:32
Hello,

I cannot migrate all my work to the C#/.NET platform, even though HTML parsing is a core part of my project. I'm coding a crawler to feed our natural language research group with a corpus from the web, and I'm still evaluating options for the HTML parsing module. I have developed my own HTML scanner based on Java regexps, but it is too difficult to maintain and extend (after all, it could be a project in itself).

My needs go far beyond simple link extraction/modification. I must handle every single tag that may reference an external resource (and that includes IFRAME). This includes parsing embedded CSS imports. Embedded JavaScript is still a problem...

Anyway, the BIG question is: is this project alive? I know it is an open source project supported by people of their own free will, and I find that _very_ _meritorious_. I'm asking because I have to make a decision now.

I would still appreciate some feedback on the subject of this thread (the original post follows).

Luís

On Feb 15, 2006, at 4:30 PM, Third Eye wrote:
> Hi!
> We did implement it, and named the class IFrameTag. Our
> implementation is a .NET port of this library, and we have added some
> of our own enhancements.
> If you are interested, you can download it from
>
> http://www.netomatix.com
>
> Naveen
>
> On 2/15/06, Luís Manuel dos Santos Gomes <lui...@gm...> wrote:
>> Hi everybody.
>>
>> This is my first post to this list.
>> I'm replacing my own HTML processing code (regex based) with
>> HTMLParser. The examples have been a great help!
>>
>> I need to handle IFRAME and LINK tags. The LINK tag is often used to
>> include external CSS.
>> The name "LinkTag" has already been taken for the anchor tags! What
>> should I name the class that handles LINK tags?
>> Has anybody implemented the IframeTag and "TrueLinkTag" classes?
>> I could do this and would be glad to contribute it to the project.
>> I'm using version 20051112. I haven't checked out from CVS because
>> I need a stable package.
>>
>> Cheers!
>>
>> Luís Gomes
>> (from Portugal)
>
> --
> Naveen K Kohli
> http://www.netomatix.com
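Luís notes that his crawler must also follow the imports inside embedded CSS, not just IFRAME and LINK targets. The following is a minimal sketch of that one step, assuming the stylesheet text has already been pulled out of a STYLE element or fetched through a LINK tag. It is not part of HTMLParser; it uses only java.util.regex and java.net.URI, and the class name CssImports is hypothetical. Being regex-based, it handles only the common `@import url(...)` and `@import "..."` forms and ignores CSS comments, escapes and malformed URLs.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not an HTMLParser class): finds the targets of
 * CSS @import rules and resolves them against the document's URL.
 */
class CssImports {
    // Matches @import url("a.css"), @import url(a.css), @import "a.css", @import 'a.css'.
    // Group 1 holds the url(...) form, group 2 the bare quoted-string form.
    private static final Pattern IMPORT = Pattern.compile(
        "@import\\s+(?:url\\(\\s*['\"]?([^'\")\\s]+)['\"]?\\s*\\)|['\"]([^'\"]+)['\"])",
        Pattern.CASE_INSENSITIVE);

    /** Returns the absolute URIs of all @import targets found in css. */
    static List<URI> extract(String css, URI base) {
        List<URI> result = new ArrayList<>();
        Matcher m = IMPORT.matcher(css);
        while (m.find()) {
            // Exactly one of the two groups participates in any given match.
            String target = m.group(1) != null ? m.group(1) : m.group(2);
            // Relative references are resolved the same way a browser would
            // resolve them against the page URL.
            result.add(base.resolve(target.trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        String css = "@import url(\"print.css\") print;\n"
                   + "@IMPORT 'theme/dark.css';\n"
                   + "body { color: black; }";
        URI base = URI.create("http://example.com/site/index.html");
        for (URI u : extract(css, base)) {
            System.out.println(u);
        }
    }
}
```

Running `main` prints `http://example.com/site/print.css` and `http://example.com/site/theme/dark.css`. Note that `URI.resolve` throws `IllegalArgumentException` for targets containing characters such as unescaped spaces, so a real crawler would want to catch that. A proper CSS tokenizer would be more robust than a regex here, for the same maintainability reasons Luís gives for moving away from his regex-based HTML scanner.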