htmlparser-user Mailing List for HTML Parser (Page 42)

This project is still alive, if under slow development. A number of
checkins are still being made fairly often, and we are possibly going
to branch for a 1.6 release.

The name LinkTag has indeed been taken for the anchor tag, but we can't
change it now for backwards compatibility reasons.

I think we might want to make LinkTag support <link> tags, and have a
boolean method that says if it's an anchor or not. In fact, reading
the W3C spec on this
(http://www.w3.org/TR/REC-html40/struct/links.html) this seems like it
might be the right thing to do.
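
To make the idea concrete, here is a rough sketch of one class serving
both <a> and <link>, with a boolean method to tell them apart. The class
name UnifiedLinkTagSketch and all of its methods are hypothetical; this
is a stand-alone illustration and not the existing LinkTag code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in, NOT the real org.htmlparser LinkTag class.
final class UnifiedLinkTagSketch {
    private final String tagName; // stored upper-cased: "A" or "LINK"
    private final Map<String, String> attributes = new HashMap<>();

    UnifiedLinkTagSketch(String tagName) {
        this.tagName = tagName.toUpperCase(Locale.ROOT);
        if (!this.tagName.equals("A") && !this.tagName.equals("LINK")) {
            throw new IllegalArgumentException("expected A or LINK, got " + tagName);
        }
    }

    // Attribute names are case-insensitive in HTML, so normalise them.
    UnifiedLinkTagSketch setAttribute(String name, String value) {
        attributes.put(name.toLowerCase(Locale.ROOT), value);
        return this;
    }

    String getAttribute(String name) {
        return attributes.get(name.toLowerCase(Locale.ROOT));
    }

    // The proposed boolean: true for <a>, false for <link>.
    boolean isAnchor() {
        return tagName.equals("A");
    }

    // Both element types carry their target in href.
    String getLink() {
        return getAttribute("href");
    }

    public static void main(String[] args) {
        UnifiedLinkTagSketch anchor =
            new UnifiedLinkTagSketch("a").setAttribute("HREF", "page.html");
        UnifiedLinkTagSketch css =
            new UnifiedLinkTagSketch("link").setAttribute("href", "style.css");
        System.out.println(anchor.isAnchor() + " " + anchor.getLink());
        System.out.println(css.isAnchor() + " " + css.getLink());
    }
}
```

Existing callers that only ever see anchors would keep working, and
callers interested in stylesheets could filter on !isAnchor().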

Can I get some feedback from some of the other devs on this? Does it
seem like a good idea to do it this way? It looks to me like it is
probably the best way to do it, both semantically and practically.

Other things that look like they should be done (devs: please shout if
you don't want any of this done):

- add support for the data: and view-source: protocols
- deprecate setMailLink and setJavascriptLink in favour of setLink
- add get/set for rel and rev attributes
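
As a sketch of the first two items: if setLink is the only setter, the
mail/javascript/data/view-source flags can be derived from the URL
scheme instead of being set separately. LinkSchemeSketch and its method
names are made up for illustration and are not current library code;
rel and rev are left out here.

```java
import java.util.Locale;

// Hypothetical illustration, NOT part of the htmlparser API.
final class LinkSchemeSketch {
    private String link = "";
    private String scheme = "";

    // Single entry point: the scheme is derived, never set by hand.
    void setLink(String link) {
        this.link = link.trim();
        this.scheme = schemeOf(this.link);
    }

    String getLink() { return link; }
    String getScheme() { return scheme; }

    boolean isMailLink() { return scheme.equals("mailto"); }
    boolean isJavascriptLink() { return scheme.equals("javascript"); }
    boolean isDataLink() { return scheme.equals("data"); }
    boolean isViewSourceLink() { return scheme.equals("view-source"); }

    // Returns the lower-cased scheme, or "" when the text before the
    // first ':' is not a letter followed by letters, digits, '+', '-'
    // or '.' (for example a relative path such as "/a:b").
    static String schemeOf(String link) {
        int colon = link.indexOf(':');
        if (colon <= 0) {
            return "";
        }
        String candidate = link.substring(0, colon);
        for (int i = 0; i < candidate.length(); i++) {
            char c = candidate.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean other = (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.';
            if (!letter && !(i > 0 && other)) {
                return "";
            }
        }
        return candidate.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        LinkSchemeSketch tag = new LinkSchemeSketch();
        tag.setLink("view-source:http://example.com/");
        System.out.println(tag.getScheme() + " " + tag.isViewSourceLink());
    }
}
```

Deriving the flags this way means a link can never be marked as a mail
link while holding a javascript: URL, which the separate setters allow.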

Ian

On 23/02/06, Luís Manuel dos Santos Gomes <lui...@gm...> wrote:
> Hello,
>
> I cannot migrate all my work to the C#/.NET platform, although HTML
> parsing is a core functionality of my project.
> I'm coding a crawler to feed our natural language research group with
> corpus from the web. Currently I'm still evaluating options for the
> HTML parsing module. I have developed my own HTML scanner based on
> Java regexps, but it is too difficult to maintain and extend
> (after all, it could be a project by itself).
>
> My needs go far beyond simple link extraction/modification. I
> must handle every single tag that may reference an external resource
> (and that includes IFrame). This includes parsing embedded CSS
> imports. Embedded Javascript is still a problem...
>
> Anyway, the BIG question is: is this project alive?
> I know it is an open source project that is supported by people of
> their own free will, and I find that _very_ _meritorious_.
> I'm asking this question because I have to make a decision now.
>
> I would still appreciate some feedback on the subject of this thread
> (the original post follows)
>
> Luís
>
> On Feb 15, 2006, at 4:30 PM, Third Eye wrote:
>
> > Hi!
> > We did implement IFrameTag and named the class IFrameTag. Our
> > implementation is a .NET port of this library and we have added some
> > of our own enhancements.
> > If you are interested, you can download it from
> >
> > http://www.netomatix.com
> >
> > Naveen
> >
> > On 2/15/06, Luís Manuel dos Santos Gomes <lui...@gm...>
> > wrote:
> >> Hi everybody.
> >>
> >> This is my first post to this list.
> >> I'm replacing my own html processing code (regex based) with
> >> HTMLParser.
> >> The examples have been a great help!
> >>
> >> I need to handle IFRAME and LINK tags. The link tag is often used to
> >> include external CSS.
> >> The name "LinkTag" has already been taken for the anchor tags! How
> >> should I name the class to handle the LINK tags?
> >> Has anybody implemented the IframeTag and the "TrueLinkTag" classes?
> >> I could do this and would be glad to contribute it to the project.
> >> I'm using version 20051112. I've not checked out from CVS because
> >> I need a stable package.
> >>
> >> Cheers!
> >>
> >> Luís Gomes
> >> (from Portugal)
> >
> >
> > --
> > Naveen K Kohli
> > http://www.netomatix.com

htmlparser-user — The user mailing list for users of the htmlparser library