htmlparser-user Mailing List for HTML Parser (Page 13)
Brought to you by:
derrickoswald
From: Sandeep K. G. <san...@gm...> - 2010-06-14 17:19:56

Thanks Derrick... I switched to HTMLParser in Python in the meantime. So now I invoke the Python code and redirect its output back to my Java file. :)

Sandeep.

On Mon, Jun 14, 2010 at 10:16 PM, Derrick Oswald <der...@gm...> wrote:

> Also add htmllexer.jar to your classpath.
>
> On Mon, Jun 14, 2010 at 8:12 AM, Sandeep Kumar Gupta <san...@gm...> wrote:
>
>> Hello Everyone,
>> I need to parse a JSP file, so I am starting with an HTML parser, but it
>> seems the binary distribution that I downloaded from SourceForge does not
>> have all the classes mentioned in the JavaDoc (I put htmlparser.jar in the
>> classpath of my Eclipse project). What am I doing wrong?
>>
>> And are there initial snippets that would get me started with the HTML
>> parser?
>>
>> --
>> Sandeep

--
Sandeep

From: Derrick O. <der...@gm...> - 2010-06-14 16:46:31

Also add htmllexer.jar to your classpath.

On Mon, Jun 14, 2010 at 8:12 AM, Sandeep Kumar Gupta <san...@gm...> wrote:

> Hello Everyone,
> I need to parse a JSP file, so I am starting with an HTML parser, but it
> seems the binary distribution that I downloaded from SourceForge does not
> have all the classes mentioned in the JavaDoc (I put htmlparser.jar in the
> classpath of my Eclipse project). What am I doing wrong?
>
> And are there initial snippets that would get me started with the HTML
> parser?
>
> --
> Sandeep

From: Sandeep K. G. <san...@gm...> - 2010-06-14 06:13:24

Hello Everyone,

I need to parse a JSP file, so I am starting with an HTML parser, but it seems the binary distribution that I downloaded from SourceForge does not have all the classes mentioned in the JavaDoc (I put htmlparser.jar in the classpath of my Eclipse project). What am I doing wrong?

And are there initial snippets that would get me started with the HTML parser?

--
Sandeep

From: Derrick O. <der...@gm...> - 2010-05-31 15:18:44

Is it null or empty? If it's null, it may be because the textInPage variable is local to that block.

On Mon, May 31, 2010 at 12:08 PM, karanjit cheema <kar...@gm...> wrote:

> hi
> I tried extracting text using the following code:
>
>     try {
>         Parser parser = new Parser (urlConnection);
>         TextExtractingVisitor visitor = new TextExtractingVisitor();
>         parser.visitAllNodesWith(visitor);
>         String textInPage = visitor.getExtractedText();
>     }
>     catch (ParserException pe)
>     {
>         pe.printStackTrace ();
>     }
>
> The field textInPage always turns out to be empty. Can anyone tell
> what the problem is?
>
> warm regards
> Karanjit Cheema

From: karanjit c. <kar...@gm...> - 2010-05-31 10:17:18

hi

I tried extracting text using the following code:

    try {
        Parser parser = new Parser (urlConnection);
        TextExtractingVisitor visitor = new TextExtractingVisitor();
        parser.visitAllNodesWith(visitor);
        String textInPage = visitor.getExtractedText();
    }
    catch (ParserException pe)
    {
        pe.printStackTrace ();
    }

The field textInPage always turns out to be empty. Can anyone tell what the problem is?

warm regards
Karanjit Cheema

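The scoping point raised in the reply can be sketched in plain Java. This is a minimal sketch, not the poster's actual fix: the `Callable` parameter is a hypothetical stand-in for the `visitor.getExtractedText()` call so the example runs without the HTMLParser jars, and the class and method names are invented for illustration.

```java
import java.util.concurrent.Callable;

class TextInPageScope {
    /**
     * Runs the extraction step and returns its result. The variable is
     * declared before the try block, so it is still visible afterwards.
     */
    static String extract(Callable<String> extractor) {
        String textInPage = "";            // declared OUTSIDE the try block
        try {
            // In the post this would be visitor.getExtractedText().
            textInPage = extractor.call();
        } catch (Exception e) {
            // The post catches ParserException; Exception is used here only
            // because the stand-in is a plain Callable.
            System.err.println("extraction failed: " + e);
        }
        return textInPage;                 // still in scope here
    }

    public static void main(String[] args) {
        System.out.println(extract(() -> "Hello from the page"));
    }
}
```

If the variable is declared inside the `try` block, as in the posted code, it cannot be read once the block ends, so any field or variable of the same name outside the block is never assigned.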
From: Derrick O. <der...@gm...> - 2010-05-21 16:35:14

It should be found under HTMLParser-2.0-SNAPSHOT.

On Fri, May 21, 2010 at 7:15 AM, Akihiko M <ams...@gm...> wrote:

> Hi
>
> Htmlparser is a wonderful library. I always use it.
> By the way, I want to use htmlparser 2.0 with maven2,
> but it is not registered in the maven2 central
> repository (http://repo2.maven.org/maven2/).
>
> I hope that htmlparser 2.0 or 2.x will be registered soon.
>
> When will it be registered?
>
> Thanks,
>
> Akihiko

From: Akihiko M <ams...@gm...> - 2010-05-21 05:15:50

Hi

Htmlparser is a wonderful library. I always use it.
By the way, I want to use htmlparser 2.0 with maven2, but it is not registered in the maven2 central repository (http://repo2.maven.org/maven2/).

I hope that htmlparser 2.0 or 2.x will be registered soon.

When will it be registered?

Thanks,

Akihiko

From: Akihiko M <ams...@gm...> - 2010-05-21 05:02:51
|
From: Misha K. <mis...@gm...> - 2010-05-13 14:59:56

Dear All:

I have successfully gotten the SAX parser of HTML Parser to work with dom4j so that I can use XPath expressions with HTMLParser. I like HTMLParser as it is considerably faster than other frameworks.

However, I have two issues:

1) Is it possible, in any way, to change the SAX parser to report _lowercase_ tag names?

2) For some reason, although I get a valid tree in dom4j, I cannot seem to find elements with XPath that I can see exist when browsing the tree (for example a TD element with a certain classpath, no matter whether or not I capitalize TD).

Has anyone used HTMLParser with dom4j successfully?

Thank you
Misha

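One possible approach to the first question, independent of HTMLParser itself, is to wrap the SAX reader in a standard `org.xml.sax.helpers.XMLFilterImpl` that lower-cases element names before they reach the tree builder. The following is only a sketch under assumptions: it uses the JDK's own SAX parser as the parent reader so that it runs on its own, and whether `org.htmlparser.sax.XMLReader` behaves the same way as a parent is untested here. `LowercaseNames`, `LowercaseFilter` and `elementNames` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class LowercaseNames {
    /** SAX filter that lower-cases element names before passing events on. */
    static class LowercaseFilter extends XMLFilterImpl {
        LowercaseFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            // Attribute names are passed through unchanged in this sketch.
            super.startElement(uri, local.toLowerCase(Locale.ROOT),
                    qName.toLowerCase(Locale.ROOT), atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            super.endElement(uri, local.toLowerCase(Locale.ROOT),
                    qName.toLowerCase(Locale.ROOT));
        }
    }

    /** Parses the markup through the filter and returns the element names seen. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // Assumption: the JDK parser stands in for the HTML-aware reader.
            XMLReader parent = factory.newSAXParser().getXMLReader();
            LowercaseFilter filter = new LowercaseFilter(parent);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    names.add(qName);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<TABLE><TR><TD>x</TD></TR></TABLE>"));
    }
}
```

Because XPath name tests are case-sensitive, normalising the case at the SAX level means an expression such as `//td` can be written one way regardless of how the source page capitalised its tags. Namespace handling, which the TagSoup post in this thread also mentions, is a separate concern that this sketch does not address.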
From: Misha K. <mis...@gm...> - 2010-05-12 15:41:41

[java] nu.xom.XMLException: org.htmlparser.sax.XMLReader does not support the entity resolution features XOM requires.

Any ideas? Has anyone tried with dom4j? I would love to have XPath support with HTMLParser, as it is quite fast.

Thank you!
Misha

From: Misha K. <mis...@gm...> - 2010-05-12 09:56:27

Dear All:

I am currently using TagSoup with XOM to get XPath support as described here:
http://nicklothian.com/blog/2006/09/11/using-xpath-on-real-world-html-documents/

It seems to work well except for the following namespace problem:
http://www.supermind.org/blog/613/dom4j-xpath-tagsoup-namespaces-sweet

I noticed HTMLParser is, in my test, the fastest available, and has SAX parser support:
http://htmlparser.sourceforge.net/javadoc/org/htmlparser/sax/package-summary.html

Has anyone used this with XOM? Any luck? Is it better/worse (i.e., slower/faster) than TagSoup or other alternatives?

Thank you
Misha

From: Pony N. <nth...@gm...> - 2010-04-25 02:25:34
|
http://secure-power.org/home/index.php -- Pony Onthusitse Nthatsi P O Box 26496 Game City GABORONE +267 3133832 +267 71467530 |
From: Kesavanarayanan, R. <Ram...@Pe...> - 2010-03-25 21:20:02

Hi all,

I have an XML file which we parse using htmlParser in conjunction with CSRF. It seems that the following line in the XML file

        I.<span class="math">x^2 + x - 2 </span><br />

is being displayed as junk characters and not translated into spaces. This is happening only on my Linux box and not on my Windows box. I am using FF3.6 / IE7 / IE8 etc.

Let me know if I need to do anything special to ensure that the Unicode characters are displayed correctly.

Regards | Ramesh Kesavanarayanan | 319-354-9200 ext 215785 / 215972 (O) | / 319-621-7641 (M) | ram...@pe...

From: Derrick O. <der...@gm...> - 2010-03-23 22:04:31

Also include the lexer.jar.

On 3/23/10, Gazihan Işıldak <gaz...@gm...> wrote:

> hi,
>
> i'm developing a rest web service in java.
>
> i'm using htmlparser library on it.
>
> but when i try to run service i'm getting this exception. i can build it
> successfully. and org.htmlparser.beans.StringBean class exists in project.
>
>> exception
>>
>> javax.servlet.ServletException: java.lang.RuntimeException: WEB9033:
>> Unable to load class with name [org.htmlparser.beans.StringBean],
>> reason: java.lang.NoClassDefFoundError: org/htmlparser/visitors/NodeVisitor
>>
>> root cause
>>
>> java.lang.RuntimeException: WEB9033: Unable to load class with name
>> [org.htmlparser.beans.StringBean], reason:
>> java.lang.NoClassDefFoundError: org/htmlparser/visitors/NodeVisitor
>>
>> root cause
>>
>> java.lang.NoClassDefFoundError: org/htmlparser/visitors/NodeVisitor
>>
>> root cause
>>
>> java.lang.ClassNotFoundException: org.htmlparser.visitors.NodeVisitor
>
> i checked that htmlparser.jar exists in server.
>
> what should i do to achieve this?

From: Gazihan I. <gaz...@gm...> - 2010-03-23 21:51:13

hi,

i'm developing a rest web service in java.

i'm using htmlparser library on it.

but when i try to run service i'm getting this exception. i can build it successfully. and org.htmlparser.beans.StringBean class exists in project.

> exception
>
> javax.servlet.ServletException: java.lang.RuntimeException: WEB9033:
> Unable to load class with name [org.htmlparser.beans.StringBean],
> reason: java.lang.NoClassDefFoundError: org/htmlparser/visitors/NodeVisitor
>
> root cause
>
> java.lang.RuntimeException: WEB9033: Unable to load class with name
> [org.htmlparser.beans.StringBean], reason:
> java.lang.NoClassDefFoundError: org/htmlparser/visitors/NodeVisitor
>
> root cause
>
> java.lang.NoClassDefFoundError: org/htmlparser/visitors/NodeVisitor
>
> root cause
>
> java.lang.ClassNotFoundException: org.htmlparser.visitors.NodeVisitor

i checked that htmlparser.jar exists in server.

what should i do to achieve this?

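A quick way to confirm which classes are actually visible at run time is to probe for them by name. The sketch below uses only `Class.forName` from the JDK; the two class names in `main` are the ones from the stack trace in this thread, and `ClasspathCheck` and `isLoadable` are hypothetical names chosen for illustration.

```java
class ClasspathCheck {
    /** Returns true if the named class can be located by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError for missing dependencies.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.htmlparser.beans.StringBean",
            "org.htmlparser.visitors.NodeVisitor"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```

Inside a servlet container the check is only meaningful when it runs within the deployed web application, because the application has its own class loader; a jar that is present on the server but not visible to that loader will still be reported as missing.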
From: Derrick O. <der...@gm...> - 2010-02-18 18:13:35

Not in the node list. But there is a Tag getEndTag (); method to get it.

On Thu, Feb 18, 2010 at 11:08 AM, Rajorshi Biswas <raj...@in...> wrote:

> Hi,
> For 'known' tags, it seems that HTMLParser does not visit the 'end' tags
> (e.g. "P" tag). But for tags that aren't directly supported, such as the
> "STRONG" tag, the parser does return the end tag in the nodelist.
>
> Is there a way to ask the parser to retrieve the "end" tags for known
> classes as well?
>
> Thanks,
> Raj

From: Rajorshi B. <raj...@in...> - 2010-02-18 10:08:48

Hi,

For 'known' tags, it seems that HTMLParser does not visit the 'end' tags (e.g. "P" tag). But for tags that aren't directly supported, such as the "STRONG" tag, the parser does return the end tag in the nodelist.

Is there a way to ask the parser to retrieve the "end" tags for known classes as well?

Thanks,
Raj

From: Rajorshi B. <raj...@in...> - 2010-02-18 06:01:49

Thanks much! TextNode is something I missed out on. (I didn't realize text inside a node was modeled as a Node - silly me)

Original message
From: Derrick Oswald <der...@gm...>
Date: 18 Feb 10 11:04:51
Subject: Re: [Htmlparser-user] query on how to read "data" for a particular TagNode
To: Rajorshi Biswas, htmlparser user list

> If you have the div tag, then since it is a composite node, "\nfoo\n"
> will be the first child:
> divtag.getChildren().elementAt(0)
>
> On Thu, Feb 18, 2010 at 5:16 AM, Rajorshi Biswas wrote:
>
>> Hello,
>> I am new to htmlparser, so please forgive me if this is a naive question. I
>> have an HTML fragment for which I need to determine if the first visible
>> text is in bold or not.
>>
>> For this, I am trying to get the 'first' text content of the fragment.
>> Suppose the fragment is of the following form:
>>
>> <div>
>> foo
>> <p>something</p>
>> <span>something else</span>
>> </div>
>>
>> My question is: how do I get the "data" portion of the 'div'. That is, when
>> I arrive at the "div" node (Div object), I wish to retrieve the content of
>> the div WITHOUT its children elements - I wish to retrieve "foo" in this
>> case.
>>
>> I could not find an API in the Node/TagNode classes for this. Could anyone
>> please help me out here?
>>
>> Thanks in advance!
>> Raj

From: Derrick O. <der...@gm...> - 2010-02-18 05:34:59

If you have the div tag, then since it is a composite node, "\nfoo\n" will be the first child:

divtag.getChildren().elementAt(0)

On Thu, Feb 18, 2010 at 5:16 AM, Rajorshi Biswas <raj...@in...> wrote:

> Hello,
> I am new to htmlparser, so please forgive me if this is a naive question. I
> have an HTML fragment for which I need to determine if the first visible
> text is in bold or not.
>
> For this, I am trying to get the 'first' text content of the fragment.
> Suppose the fragment is of the following form:
>
> <div>
> foo
> <p>something</p>
> <span>something else</span>
> </div>
>
> My question is: how do I get the "data" portion of the 'div'. That is, when
> I arrive at the "div" node (Div object), I wish to retrieve the content of
> the div WITHOUT its children elements - I wish to retrieve "foo" in this
> case.
>
> I could not find an API in the Node/TagNode classes for this. Could anyone
> please help me out here?
>
> Thanks in advance!
> Raj

From: Rajorshi B. <raj...@in...> - 2010-02-18 04:17:11

Hello,

I am new to htmlparser, so please forgive me if this is a naive question. I have an HTML fragment for which I need to determine if the first visible text is in bold or not.

For this, I am trying to get the 'first' text content of the fragment. Suppose the fragment is of the following form:

<div>
foo
<p>something</p>
<span>something else</span>
</div>

My question is: how do I get the "data" portion of the 'div'. That is, when I arrive at the "div" node (Div object), I wish to retrieve the content of the div WITHOUT its children elements - I wish to retrieve "foo" in this case.

I could not find an API in the Node/TagNode classes for this. Could anyone please help me out here?

Thanks in advance!
Raj

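The idea in this thread, that the text directly inside an element is itself a child node, can be illustrated with the W3C DOM that ships with the JDK. This is an analogy rather than HTMLParser code: it assumes the fragment is well-formed XML, which real-world HTML often is not, and `FirstTextChild` and `ownLeadingText` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FirstTextChild {
    /**
     * Returns the trimmed text that precedes the first child element of the
     * root, or "" when the root does not start with a text node.
     */
    static String ownLeadingText(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Node first = root.getFirstChild();
            if (first != null && first.getNodeType() == Node.TEXT_NODE) {
                return first.getNodeValue().trim();   // "\nfoo\n" becomes "foo"
            }
            return "";
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fragment = "<div>\nfoo\n<p>something</p>\n<span>something else</span>\n</div>";
        System.out.println(ownLeadingText(fragment));
    }
}
```

The same shape applies to the answer given in the thread: take the first child of the div and check that it is a text node before reading its content, since a div that begins with another element has no leading text of its own.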
From: Derrick O. <der...@gm...> - 2010-02-17 20:47:35

Have a look at the site capturer code. It basically does what you want, I think.

2010/2/17 Wagner Montalvão Camarão <wag...@gm...>

> Hello everyone,
>
> I'm new here and I'd like to get some directions about the htmlparser
> usage. I need to go through an HTML document and update each link (href
> content) with a new one. For example, <a href="www.google.com"> will become
> <a href="www.mysite.com/click?link=www.google.com">
>
> First I tried using javax.xml.parsers and org.w3c.dom, but the problem
> is that I get an exception if I try to parse XHTML that is not valid
> according to the W3C. I can't work like this because some users may post
> HTML generated by designer tools whose formatting may not be valid.
>
> I could write some regex to do this but I heard about htmlparser and I
> would like to know if it can help me with this.
>
> Any suggestion will be appreciated.
>
> Thank you
>
> Wagner Montalvão Camarão

From: Wagner M. C. <wag...@gm...> - 2010-02-17 19:35:04

Hello everyone,

I'm new here and I'd like to get some directions about the htmlparser usage. I need to go through an HTML document and update each link (href content) with a new one. For example, <a href="www.google.com"> will become <a href="www.mysite.com/click?link=www.google.com">

First I tried using javax.xml.parsers and org.w3c.dom, but the problem is that I get an exception if I try to parse XHTML that is not valid according to the W3C. I can't work like this because some users may post HTML generated by designer tools whose formatting may not be valid.

I could write some regex to do this but I heard about htmlparser and I would like to know if it can help me with this.

Any suggestion will be appreciated.

Thank you

Wagner Montalvão Camarão

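Whichever parser ends up locating the links, the rewriting step itself is a small string transformation. The sketch below covers only that step and says nothing about how the href is read from or written back to the document. The prefix comes from the example in the post; percent-encoding the original link is an added assumption, made so that links containing `?` or `&` survive as a single query parameter. `ClickTracker` and `rewrite` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ClickTracker {
    // Redirect prefix taken from the example in the post.
    static final String PREFIX = "www.mysite.com/click?link=";

    /** Wraps the original href in the click-tracking URL. */
    static String rewrite(String originalHref) {
        try {
            // Assumption: the target is URL-encoded so it stays one parameter.
            return PREFIX + URLEncoder.encode(originalHref, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite("www.google.com"));
        System.out.println(rewrite("http://a.b/?q=1&r=2"));
    }
}
```

For the simple link in the post the result matches the requested form exactly, since `www.google.com` contains no characters that need encoding. The click handler on the receiving side would then have to decode the `link` parameter before redirecting.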
From: Joshua K. <jo...@in...> - 2010-02-12 23:09:55

On Sat, Dec 12, 2009 at 10:02 AM, Derrick Oswald <der...@gm...> wrote:

> This has been replaced by the main program in
> org.htmlparser.beans.StringBean.

I never did get that name - StringBean. It extracts strings but it isn't called a StringExtractor. Hmmmm...

best
jk

From: David P. C. <dav...@gm...> - 2010-02-12 22:00:22

Great, thanks for the answer! (I just saw it now)
The library seems great!

One question: it seems that it does not handle div elements correctly. A div element is a block element <http://www.webdesignfromscratch.com/html-css/css-block-and-inline.php> (by default), and thus it should render a new line.

For example, with this html file:
++++++++++++++++++
<html> <body> test1 test2 <div>test3</div> test4 <span>test5</span> <span>test6</span> </body> </html>
++++++++++++++++++

it should produce:
++++++++++++++++++
test1 test2 test3 test4 test5 test6
++++++++++++++++++
note the new line between test3 and test4.

However, StringBean produces the following:
++++++++++++++++++
test1 test2 test3 test4 test5 test6
++++++++++++++++++

It handles the new lines correctly for text and span nodes, but not for divs.

Is that the intended effect? If so, is it possible to override this (add a new line for block elements)?

Regards,
David Portabella

On Sat, Dec 12, 2009 at 10:02 AM, Derrick Oswald <der...@gm...> wrote:

> This has been replaced by the main program in
> org.htmlparser.beans.StringBean.
>
> Sorry for the misdirection
>
> On Wed, Dec 9, 2009 at 11:18 PM, David Portabella Clotet <dav...@gm...> wrote:
>
>> Hello,
>>
>> In the website: http://htmlparser.sourceforge.net/samples.html
>> there is info about the "StringExtractor" example:
>> ++++++++++++++++++
>> String Extractor
>> Extract text from a web page.
>> org.htmlparser.parserapplications.StringExtractor
>> bin/stringextractor http://website_url
>> ++++++++++++++++++
>>
>> However, I did not find this example in either of these two downloads:
>> HTMLParser-2.0-SNAPSHOT-src.zip
>> HTMLParser-2.0-SNAPSHOT-bin.zip
>>
>> Can you please tell me where to find the StringExtractor example?
>>
>> Best regards,
>> David Portabella

From: Derrick O. <der...@gm...> - 2010-02-10 06:03:33

Silly question, but did you change mIds to be "B"? It has to be upper case.

On Wed, Feb 10, 2010 at 12:55 AM, Ben Rose <be...@in...> wrote:

> Greetings,
>
> I would like to use bold tags ("b") with org.htmlparser, but have been
> unable to get them to work as expected.
>
> I added and registered a BoldTag class that is an exact copy of
> org.htmlparser.tags.ParagraphTag (with the "B" added to mEnders) but
> have not been able to make it work as expected.
>
> When looking at the children of the body node from the following html
> string, there are 3 children (<b>, <a>, </b>) instead of the single <b>
> as I would expect.
>
> "<html><body><b><a href='test.com'>Test</a></b></body></html>"
>
> Am I missing something here?
>
> -Ben Rose
