htmlparser-user Mailing List for HTML Parser (Page 35)
Brought to you by:
derrickoswald
|
From: Eugeny N D. <bo...@re...> - 2006-07-28 21:30:11
|
Hello,
I'm trying to parse the page http://www.vu.lt/lt/naujienos/337/ but HtmlParser fails with this error:
ERROR org.htmlparser.util.EncodingChangeException: character mismatch (new: ? [0x2013] != old: [0xe2?]) for encoding change from ISO-8859-1 to UTF-8 at character offset 218
[junit] org.htmlparser.util.EncodingChangeException: character mismatch (new: ? [0x2013] != old: [0xe2?]) for encoding change from ISO-8859-1 to UTF-8 at character offset 218
[junit] at org.htmlparser.lexer.InputStreamSource.setEncoding(InputStreamSource.java:280)
[junit] at org.htmlparser.lexer.Page.setEncoding(Page.java:865)
[junit] at org.htmlparser.tags.MetaTag.doSemanticAction(MetaTag.java:150)
[junit] at org.htmlparser.scanners.TagScanner.scan(TagScanner.java:69)
[junit] at org.htmlparser.scanners.CompositeTagScanner.scan(CompositeTagScanner.java:160)
[junit] at org.htmlparser.util.IteratorImpl.nextNode(IteratorImpl.java:92)
[junit] at org.htmlparser.Parser.extractAllNodesThatMatch(Parser.java:768)
at this line:
Lexer lexer = new Lexer(new Page(document, encoding));
Parser parser = new Parser(lexer);
---->NodeList list = parser.extractAllNodesThatMatch(new InterestedTagsFilter());<----
I don't know the document encoding initially, so it is null. Could somebody please advise?
--
Eugene N Dzhurinsky
|
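The numbers in the exception text are consistent with one byte sequence being decoded two ways: U+2013 (an en dash) is the three bytes E2 80 93 in UTF-8, and the first of those bytes read as ISO-8859-1 is the character 0xe2. The sketch below uses only the JDK to show that; `EncodingMismatchDemo` and `firstCharHex` are names made up for illustration and are not part of HTMLParser. That the page was first read as ISO-8859-1 and then switched to UTF-8 by its META tag is an assumption drawn from the stack trace.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    /** Decodes the bytes with the given charset and returns the first character as hex. */
    static String firstCharHex(byte[] bytes, Charset charset) {
        String decoded = new String(bytes, charset);
        return String.format("0x%x", (int) decoded.charAt(0));
    }

    public static void main(String[] args) {
        // U+2013 (en dash) encoded as UTF-8 is the byte sequence E2 80 93.
        byte[] raw = "\u2013".getBytes(StandardCharsets.UTF_8);

        // Read as ISO-8859-1, each byte becomes one character; the first is 0xe2.
        System.out.println(firstCharHex(raw, StandardCharsets.ISO_8859_1)); // 0xe2

        // Read as UTF-8, the three bytes become the single character 0x2013.
        System.out.println(firstCharHex(raw, StandardCharsets.UTF_8));      // 0x2013
    }
}
```

If that reading is right, passing the correct charset to the Page up front (when it is known, for example from the HTTP headers) would avoid the mid-stream change.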
|
From: Eugeny N D. <bo...@re...> - 2006-07-28 21:24:31
|
Hello!
I'm trying to parse this page and extract all the links in it: http://www.vu.lt/lt/naujienos/337/
For some reason the link to the PDF file looks like this:
http://www.vu.lt/site_files/InfS/Naujienos/istorik??%20dienos.pdf
which is wrong. It seems the wrong charset was used?
Here is the part of my code which does the parsing:
public LinkedList parseDocument(InputStream document, String encoding) {
    try {
        Lexer lexer = new Lexer(new Page(document, encoding));
        String href;
        try {
            lexer.reset();
            if (banner != null)
                validateBanner(lexer);
            lexer.reset();
            Parser parser = new Parser(lexer);
            NodeList list = null;
            try {
                list = parser.extractAllNodesThatMatch(new InterestedTagsFilter());
            } catch (EncodingChangeException e) {
                log.warn(e);
                lexer.reset();
                lexer.getPage().setEncoding(parser.getEncoding());
                list = parser.extractAllNodesThatMatch(new InterestedTagsFilter());
            }
            for (SimpleNodeIterator it = list.elements(); it.hasMoreNodes();) {
                TagNode node = (TagNode) it.nextNode();
                href = null;
                if (LinkTag.class.equals(node.getClass()) && validateLink((LinkTag) node)) {
                    href = ((LinkTag) node).getLink();
                } else if (ImageTag.class.equals(node.getClass())
                        || FrameTag.class.equals(node.getClass())) {
                    href = node.getAttribute("src");
                } else if (TitleTag.class.equals(node.getClass())) {
                    title = ((TitleTag) node).getTitle();
                } else if (BaseHrefTag.class.equals(node.getClass())) {
                    try {
                        baseTag = getBaseURL(new URI(((BaseHrefTag) node).getBaseUrl(), false));
                    } catch (URIException e2) {
                    }
                } else if (MetaTag.class.equals(node.getClass())
                        && "refresh".equalsIgnoreCase(((MetaTag) node).getHttpEquiv())) {
                    String URL = ((MetaTag) node).getMetaContent();
                    if (URL != null && URL.length() > 0) {
                        String arr[] = URL.split("URL=");
                        if (arr != null && arr.length == 2)
                            href = arr[1];
                    }
                }
                if (href != null && href.length() > 0) {
                    if (log.isDebugEnabled())
-------> log.debug(href); <-----------
                    results.add(getURL(StringEscapeUtils.unescapeHtml(getEscapedURL(href.trim()))));
                }
            }
            this.encoding = parser.getEncoding();
            if (log.isDebugEnabled())
                log.debug(this.encoding);
        } catch (ParserException e1) {
            log.error(e1, e1);
        }
    } catch (UnsupportedEncodingException e) {
        log.error(e, e);
    }
    return results;
}
On the marked line the application logs
/site_files/InfS/Naujienos/istorik??%20dienos.pdf
What could be wrong there?
--
Eugene N Dzhurinsky
|
|
From: Xue-Feng Y. <jus...@ya...> - 2006-07-28 21:14:58
|
I am trying to modify the TextNodes in a lexer using TextNode.setText(String). Then I tried to print the lexer with
Page toPage=lexer.getPage();
String toString=toPage.getText();
System.out.println(toString);
The page was unchanged.
__________________________________________________
Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
|
|
From: kavorka <the...@gm...> - 2006-07-28 20:53:56
|
Hi Oswald,
I have another question. In HTMLParser, is it possible to extract only the
text of a web page? The StringExtractor program also extracts the link
text in the page; I want to extract the "pure" text. Can I do that?
Thanks,
Murat
On 7/25/06, kavorka <the...@gm...> wrote:
>
> Hi Oswald,
>
> Thanks a lot for your help.
>
> Murat
>
>
> On 7/24/06, Derrick Oswald <Der...@ro...> wrote:
> >
> > Kavorka,
> >
> > This should give you the meta tag, from which you can get the
> > information you want:
> >
> > NodeList nodes = parser.parse (null);
> > NodeList metas = nodes.extractAllNodesThatMatch (new TagNameFilter
> > ("META"));
> > MetaTag meta = (MetaTag)metas.elementAt (0);
> > System.out.println (meta);
> >
> > Derrick
> >
> > kavorka wrote:
> >
> > > Hi all,
> > > I'm new to HTML-parser. I used the sample programs to understand how I can
> > > find the meta data of the page but I couldn't get them to work. Do you have any
> > > code samples that find the meta data of a page using HTMLParser?
> > > Thank you
> > > best regards
> > >
> >
> >
> >
> > -------------------------------------------------------------------------
> > Take Surveys. Earn Cash. Influence the Future of IT
> > Join SourceForge.net's Techsay panel and you'll get the chance to share
> > your
> > opinions on IT & business topics through brief surveys -- and earn cash
> > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> >
> > _______________________________________________
> > Htmlparser-user mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> >
>
>
|
|
From: Xue-Feng Y. <jus...@ya...> - 2006-07-28 19:43:07
|
I am trying to modify the TextNodes in a lexer using TextNode.setText(String). Then I tried to print the lexer with
Page toPage=lexer.getPage();
String toString=toPage.getText();
System.out.println(toString);
The page was unchanged. Does anyone have an idea how to modify a lexer, or simply an HTML page?
Thanks,
Xue-Feng
|
|
From: kavorka <the...@gm...> - 2006-07-25 08:49:52
|
Hi Oswald,
Thanks a lot for your help.
Murat
On 7/24/06, Derrick Oswald <Der...@ro...> wrote:
>
> Kavorka,
>
> This should give you the meta tag, from which you can get the
> information you want:
>
> NodeList nodes = parser.parse (null);
> NodeList metas = nodes.extractAllNodesThatMatch (new TagNameFilter
> ("META"));
> MetaTag meta = (MetaTag)metas.elementAt (0);
> System.out.println (meta);
>
> Derrick
>
> kavorka wrote:
>
> > Hi all,
> > I'm new to HTML-parser. I used the sample programs to understand how I can
> > find the meta data of the page but I couldn't get them to work. Do you have any
> > code samples that find the meta data of a page using HTMLParser?
> > Thank you
> > best regards
> >
>
>
>
|
|
From: Derrick O. <Der...@Ro...> - 2006-07-24 03:25:32
|
Kavorka,
This should give you the meta tag, from which you can get the
information you want:
NodeList nodes = parser.parse (null);
NodeList metas = nodes.extractAllNodesThatMatch (new TagNameFilter
("META"));
MetaTag meta = (MetaTag)metas.elementAt (0);
System.out.println (meta);
Derrick
kavorka wrote:
> Hi all,
> I'm new to HTML-parser. I used the sample programs to understand how I can
> find the meta data of the page but I couldn't get them to work. Do you have any
> code samples that find the meta data of a page using HTMLParser?
> Thank you
> best regards
>
|
|
From: Derrick O. <Der...@Ro...> - 2006-07-24 03:20:33
|
Eugeny,
In general, you probably want to look at the filter package. Try running the filterbuilder application (the startup script is in the bin directory) and read the help and tutorial. Using this application you can create a Java program that selects only the 'sometext' you want.
Derrick
Eugeny N Dzhurinsky wrote:
>Hello!
>I need to search for HTML code in a page, for instance the code to search
>looks like this:
>
><div class="someclass"><a href="somelocation" ><img src="image/here"
>border="0"></a></div><span style="style2">sometext</span>
>
>This code could be placed as single line or formatted somehow, containing one
>or more linebreaks.
>
>I need also to track situation while this code is commented out, or placed
>outside <body> section.
>
>For now I created a Lexer instance for document and for this code, comparing
>them token by token, but may be there is some better way?
|
|
From: Ian M. <ian...@gm...> - 2006-07-19 15:23:22
|
HTMLParser is usually capable of parsing just an HTML fragment.
Call setInputHTML(html) on a Parser instance and then parse(null).
Ian
On 7/14/06, Dennis Gesker <ge...@al...> wrote:
> Since it was just a string I added html and body tags and it seems I'm
> on my way.
>
> str = "<html><body>" + str + "</body></html>";
>
> --Dennis
>
> Dennis Gesker wrote:
> > I would like to parse a portion of html that I have in a buffer
> > (String), that is to say not a complete page. The string contains an
> > html table only.
> >
> > Could someone point to or provide some sample code for how to parse just
> > a fragment of html?
> >
> > Dennis
> >
> >
> >
>
> --
> Dennis R. Gesker
> email: de...@al...
> Key Id: 0xEFA10A51
>
>
>
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>
|
|
From: kavorka <the...@gm...> - 2006-07-19 11:44:23
|
Hi all,
I'm new to HTML-parser. I used the sample programs to understand how I can find the meta data of the page but I couldn't get them to work. Do you have any code samples that find the meta data of a page using HTMLParser?
Thank you
best regards
|
|
From: Eugeny N D. <bo...@re...> - 2006-07-17 07:30:43
|
Hello!
I need to search for HTML code in a page. For instance, the code to search for looks like this:
<div class="someclass"><a href="somelocation" ><img src="image/here" border="0"></a></div><span style="style2">sometext</span>
This code could be on a single line or formatted somehow, containing one or more line breaks.
I also need to detect the cases where this code is commented out or placed outside the <body> section.
For now I create a Lexer instance for the document and one for this code and compare them token by token, but maybe there is a better way?
--
Eugene N Dzhurinsky
|
|
From: Dennis G. <ge...@al...> - 2006-07-14 20:15:09
|
Since it was just a string I added html and body tags and it seems I'm on my way.
str = "<html><body>" + str + "</body></html>";
--Dennis
Dennis Gesker wrote:
> I would like to parse a portion of html that I have in a buffer
> (String), that is to say not a complete page. The string contains an
> html table only.
>
> Could someone point to or provide some sample code for how to parse just
> a fragment of html?
>
> Dennis
--
Dennis R. Gesker
email: de...@al...
Key Id: 0xEFA10A51
|
|
From: Dennis G. <ge...@al...> - 2006-07-14 20:07:47
|
I would like to parse a portion of html that I have in a buffer (String), that is to say not a complete page. The string contains an html table only.
Could someone point to or provide some sample code for how to parse just a fragment of html?
Dennis
--
Dennis R. Gesker
email: de...@al...
Key Id: 0xEFA10A51
|
|
From: Derrick O. <Der...@Ro...> - 2006-07-01 23:34:18
|
This should give you the "Content":
NodeList nodes = parser.parse (null);
NodeList metas = nodes.extractAllNodesThatMatch (new TagNameFilter
("META"));
System.out.println (((MetaTag)metas.elementAt (0)).getMetaContent ());
vasantha reddy wrote:
> Hi,
>
> I am using HTML parser in my project. The HTML
> parser doesn't give the contents of the meta tag as its output. I need the
> content of the meta tag. Is there any method that I can use to get the
> content of a particular tag by giving the tag name as input?
>
> Thank you,
> Regards,
> Vasantha
>
>
|
|
From: vasantha r. <hi_...@ya...> - 2006-06-29 09:32:50
|
Hi,
I am using HTML parser in my project. The HTML parser doesn't give the contents of the meta tag as its output. I need the content of the meta tag. Is there any method that I can use to get the content of a particular tag by giving the tag name as input?
Thank you,
Regards,
Vasantha
|
|
From: Derrick O. <Der...@Ro...> - 2006-06-28 16:42:34
|
Using the NotFilter directly like that probably won't help you. As written it would return a NodeList of top-level nodes (no Remarks, of course), but these would include Remarks as children.
What you probably want to do is override RemarkNode to return nothing from toHtml() and set this as the default node for Remarks on the PrototypicalNodeFactory via setRemarkPrototype(). Then when you call toHtml() on the NodeList returned from a straight parse, the contents will have the Remarks removed.
Mark Stark wrote:
>hello,
>
>can anyone explain how to use a NodeFilter?
>
>NodeList l = p.parse(new NotFilter());
>
>i would like to parse all nodes/tags but RemarkTags/RemarkNodes
>
>thanks a lot
|
|
From: Mark S. <htm...@ey...> - 2006-06-28 09:42:19
|
hello,
can anyone explain how to use a NodeFilter?
NodeList l = p.parse(new NotFilter());
I would like to parse all nodes/tags except RemarkTags/RemarkNodes.
thanks a lot
|
|
From: Mark S. <htm...@ey...> - 2006-06-27 13:45:57
|
Thanks, it works
Derrick Oswald wrote:
> I think you need something like this...
>
> Parser parser = new Parser ();
> parser.setInputHTML (html);
> NodeList list = parser.parse (null);
> list.visitAllNodesWith (new SegmentReplacingVisitor ());
> System.out.println (list.toHtml ());
>
> ...assuming the SegmentReplacingVisitor does the right stuff to the
> right nodes.
|
|
From: Derrick O. <Der...@Ro...> - 2006-06-26 22:14:59
|
I think you need something like this...
Parser parser = new Parser ();
parser.setInputHTML (html);
NodeList list = parser.parse (null);
list.visitAllNodesWith (new SegmentReplacingVisitor ());
System.out.println (list.toHtml ());
...assuming the SegmentReplacingVisitor does the right stuff to the right nodes.
Mark Stark wrote:
>I don't understand. I've got an html file, manipulate some tags with my
>visitor, perhaps change an attribute value. And then? Serialize the
>toHtml() output into a file and that's it?
|
|
From: Mark S. <htm...@ey...> - 2006-06-26 18:35:47
|
I tried to use it this way:
Parser p = new Parser();
p.setInputHTML(normalHTML);
System.out.println(">>>>>>> "+p.parse(null).toHtml());
TestVisitor s = new TestVisitor();
p.parse(null).visitAllNodesWith(s);
System.out.println("<<<<<<<" +p.parse(null).toHtml());
In TestVisitor
public void visitStringNode (Text string)
{
string.setText("XXX");
}
The first System.out prints the original html, second System.out prints
out nothing but the "<<<<". Have you any suggestions? :)
Thanks a lot
Ian Macfarlane wrote:
> If you do Parser.parse() you get a NodeList, and then if you call the
> toHtml() method it will return a String of the reparsed code.
>
> Ian
>
> On 6/26/06, Mark Stark <htm...@ey...> wrote:
>> hi,
>>
>> is it possible to change certain stirngs in a html file, without
>> "rebuilding" it? I would like to change some string in a html file, like
>> a dictionary lookup, but i dont want to rebuild the html file like its
>> done in URLModifyingVisitor.
>>
>> Can anybody give me a hint? I've still got
>>
>> Parser p = new Parser();
>> p.setInputHTML(html);
>> SegmentReplacingVisitor s = new SegmentReplacingVisitor();
>> p.visitAllNodesWith(s);
>>
>> and do some text.setText() within the SegmentReplacingVisitor each time
>> i visit a StringNode. How to save it into a translated.htm file without
>> rebuilding it?
>>
>> thanks a lot
>>
>>
>>
>
>
>
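One reading of the empty second printout above is that every parse(null) call consumes the parser's input, so a later call on the same Parser starts at end of input and returns an empty list, and the modified nodes from the earlier call are thrown away. The JDK-only sketch below imitates that behaviour; `OneShotParser` is a hypothetical stand-in written for this illustration and is not HTMLParser's API.

```java
import java.util.ArrayList;
import java.util.List;

public class ParseOnceDemo {
    /** Hypothetical stand-in for a parser that can read its input only once. */
    static final class OneShotParser {
        private final String text;
        private int position = 0; // how much of the input has been consumed

        OneShotParser(String text) {
            this.text = text;
        }

        /** Returns the words not yet consumed, then marks the whole input as consumed. */
        List<String> parse() {
            String rest = text.substring(position);
            position = text.length();
            List<String> words = new ArrayList<>();
            for (String word : rest.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
            return words;
        }
    }

    /** Parses once, edits the kept result, then shows what a second parse yields. */
    static String demo() {
        OneShotParser parser = new OneShotParser("hello world");
        List<String> nodes = parser.parse();  // parse once and keep the list
        nodes.replaceAll(word -> "XXX");      // edit the nodes that were kept
        List<String> again = parser.parse();  // input is already consumed
        return nodes + " " + again;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [XXX, XXX] []
    }
}
```

Under that assumption the pattern to follow is: parse once, keep the returned list, run the visitor on that list, and serialize that same list.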
|
|
From: Mark S. <htm...@ey...> - 2006-06-26 16:54:30
|
I don't understand. I've got an html file and manipulate some tags with my visitor, perhaps changing an attribute value. And then? Do I serialize the toHtml() output into a file, and that's it?
Ian Macfarlane wrote:
> If you do Parser.parse() you get a NodeList, and then if you call the
> toHtml() method it will return a String of the reparsed code.
>
> Ian
|
|
From: Ian M. <ian...@gm...> - 2006-06-26 08:11:06
|
If you do Parser.parse() you get a NodeList, and then if you call the toHtml() method it will return a String of the reparsed code.
Ian
On 6/26/06, Mark Stark <htm...@ey...> wrote:
> hi,
>
> is it possible to change certain strings in a html file, without
> "rebuilding" it? I would like to change some string in a html file, like
> a dictionary lookup, but i dont want to rebuild the html file like its
> done in URLModifyingVisitor.
>
> Can anybody give me a hint? I've still got
>
> Parser p = new Parser();
> p.setInputHTML(html);
> SegmentReplacingVisitor s = new SegmentReplacingVisitor();
> p.visitAllNodesWith(s);
>
> and do some text.setText() within the SegmentReplacingVisitor each time
> i visit a StringNode. How to save it into a translated.htm file without
> rebuilding it?
>
> thanks a lot
|
|
From: Mark S. <htm...@ey...> - 2006-06-26 07:54:29
|
hi,
is it possible to change certain strings in an html file without "rebuilding" it? I would like to change some strings in an html file, like a dictionary lookup, but I don't want to rebuild the html file the way it's done in URLModifyingVisitor.
Can anybody give me a hint? So far I've got
Parser p = new Parser();
p.setInputHTML(html);
SegmentReplacingVisitor s = new SegmentReplacingVisitor();
p.visitAllNodesWith(s);
and I do some text.setText() within the SegmentReplacingVisitor each time I visit a StringNode. How do I save the result into a translated.htm file without rebuilding it?
thanks a lot
|
|
From: Ian M. <ian...@gm...> - 2006-06-23 14:14:59
|
Dear Ryan,
This was discussed, and the constructors/methods were even hammered out, on
the dev mailing list about a month or two ago. However, I've not had the
time to implement it yet.
Please feel free to implement the agreed-upon design (or comment on it).
Unfortunately it looks like the SourceForge message archives are down for
now, so I'll forward you the entire thread to your own address (no need to
copy it all here).
Best wishes
Ian Macfarlane
On 6/22/06, Ryan Smith <rs...@li...> wrote:
>
> <div a=foobar b=aybabtu></div>
> <div a=foodbar b=aybabtu></div>
>
>
> Is there a way to use HasAttributeFilter("a","foo")
>
> But have the value match *NOT EXACT* ?
>
> Meaning, it would be nice if the value could be a regular expression, so
> i could match all attributes with the word "foo"
> instead of having to match thte attribute EXACTLY
>
> Is this possible? can i add a patch?
>
>
> Also, while im at it, the HtmlParserUtils has nice methods like trim
> tags etc...
> But i use a custom parser with my own tags defined, and the util class
> provides no way to give your own parser, (it called new Parser() :( )
> Can support be added to this? ( i have a patch if anyone wants it )
>
>
> Thanks
> -Ryan Smith
> Software Developer
> Foreclosure.com
>
>
|
|
From: Ryan S. <rs...@li...> - 2006-06-22 15:00:33
|
<div a=foobar b=aybabtu></div>
<div a=foodbar b=aybabtu></div>
Is there a way to use HasAttributeFilter("a","foo")
but have the value match *NOT EXACTLY*?
Meaning, it would be nice if the value could be a regular expression, so
I could match all attributes containing the word "foo"
instead of having to match the attribute EXACTLY.
Is this possible? Can I add a patch?
Also, while I'm at it, the HtmlParserUtils has nice methods like trim
tags etc...
But I use a custom parser with my own tags defined, and the util class
provides no way to give it your own parser (it calls new Parser() :( )
Can support be added for this? (I have a patch if anyone wants it.)
Thanks
-Ryan Smith
Software Developer
Foreclosure.com
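The difference between an exact match and a "contains a pattern" match can be sketched with java.util.regex alone. `AttributeRegexDemo` and `attributeMatches` below are hypothetical names for illustration only and are not an HTMLParser filter; the attribute maps stand in for the two div tags above. A custom filter could presumably apply the same test to each tag's attribute value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class AttributeRegexDemo {
    /** True if the named attribute exists and the pattern is found anywhere in its value. */
    static boolean attributeMatches(Map<String, String> attributes, String name, Pattern pattern) {
        String value = attributes.get(name);
        return value != null && pattern.matcher(value).find();
    }

    public static void main(String[] args) {
        // Stand-ins for <div a=foobar b=aybabtu> and <div a=foodbar b=aybabtu>.
        Map<String, String> first = new LinkedHashMap<>();
        first.put("a", "foobar");
        first.put("b", "aybabtu");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("a", "foodbar");
        second.put("b", "aybabtu");

        Pattern containsFoo = Pattern.compile("foo");
        Pattern exactlyFoo = Pattern.compile("^foo$");

        System.out.println(attributeMatches(first, "a", containsFoo));  // true
        System.out.println(attributeMatches(second, "a", containsFoo)); // true
        System.out.println(attributeMatches(first, "a", exactlyFoo));   // false
    }
}
```

Using find() rather than matches() is the design choice that gives substring semantics; anchoring the pattern with ^ and $ recovers the exact comparison.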
|