htmlparser-user Mailing List for HTML Parser (Page 43)
Brought to you by:
derrickoswald
From: Marc C. <mc...@ja...> - 2006-01-24 12:13:53
|
Hi,

Thanks for your response. I tried the code below to see whether it would work, given your comment:

    StringBuffer finalContents = new StringBuffer();

    // Generate final output
    for (NodeIterator e = list.elements (); e.hasMoreNodes (); ) {
        Node node = e.nextNode ();
        if ( node.getEndPosition() == node.getStartPosition() ) {
            log.debug ( " IGNORED node : " + node.toHtml());
            continue;
        }
        if (node instanceof TagNode) {
            if ( ((TagNode)node).getTagEnd() == ((TagNode)node).getTagBegin() ) {
                log.debug ( " IGNORED node : " + node.toHtml());
                continue;
            }
        }
        finalContents.append(node.toHtml());
    }

This didn't seem to make any difference. The positions of the virtual tags must have been corrected at an earlier stage in htmlparser. I have started looking at the htmlparser source to see where this occurs.

Kind Regards,
Mark

-----Original Message-----
From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
Sent: 23 January 2006 12:37
To: htm...@li...
Subject: Re: [Htmlparser-user] Parsing malformed HTML whilst still leaving it intact

This has been a requested task for two years now:
http://sourceforge.net/pm/task.php?group_project_id=21601&group_id=24399&func=browse

The virtual tags that are added have the start position the same as the end position, so a smarter toHtml() could recognize them that way and avoid outputting them.

Marc Candle wrote:

>Hi,
>
>I'm parsing snippets of HTML pages at a time, making some changes and then
>outputting back to HTML. The problem with HTML snippets is that they will be
>malformed since some closing tags, for example, will be missing.
>
>The Parser seems to automatically correct the malformed HTML by adding
>closing tags. Is it possible to prevent it from doing so? Or can it at least
>notify me when it does so, so that before reconstructing the modified HTML
>output I can simply delete them?
>
>An alternative would be to use the Lexer but then I lose all the
>hierarchical features of the Parser, which is not an option.
>
>This is similar to the general problem brought up in
>http://sourceforge.net/mailarchive/message.php?msg_id=12635550 .
>
>Kind Regards
>
>Mark
|
From: Flynn, L. A. (Lori) <la...@lu...> - 2006-01-23 17:16:39
|
Greetings.

I am new to HTML. I need to log in to an https webpage via a script. I have got as far as using HTMLParser to get the webpage, then POST back to it with the received cookie and the four input fields required. (Two of the fields were hidden variables.) However, the login is not working, since I just get back the same initial webpage.

Is it possible to use HTMLParser for https connections?

Regards,
Lori
|
From: Derrick O. <Der...@Ro...> - 2006-01-23 12:37:23
|
This has been a requested task for two years now:
http://sourceforge.net/pm/task.php?group_project_id=21601&group_id=24399&func=browse

The virtual tags that are added have the start position the same as the end position, so a smarter toHtml() could recognize them that way and avoid outputting them.

Marc Candle wrote:

>Hi,
>
>I'm parsing snippets of HTML pages at a time, making some changes and then
>outputting back to HTML. The problem with HTML snippets is that they will be
>malformed since some closing tags, for example, will be missing.
>
>The Parser seems to automatically correct the malformed HTML by adding
>closing tags. Is it possible to prevent it from doing so? Or can it at least
>notify me when it does so, so that before reconstructing the modified HTML
>output I can simply delete them?
>
>An alternative would be to use the Lexer but then I lose all the
>hierarchical features of the Parser, which is not an option.
>
>This is similar to the general problem brought up in
>http://sourceforge.net/mailarchive/message.php?msg_id=12635550 .
>
>Kind Regards
>
>Mark
|
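The idea in this reply (a tag the parser inserted has equal start and end positions, so output code can skip it) can be sketched without the library. Everything below is a hypothetical stand-in, not the HTMLParser API: `SketchTag` models a tag with source offsets, and `render` plays the role of the "smarter toHtml()". The sketch also assumes, which the thread does not confirm, that an inserted end tag hangs off its parent tag rather than appearing among the top-level nodes, so the start == end test is applied while walking the tree.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parsed tag; NOT the HTMLParser API. */
class SketchTag {
    final String name;
    final int start;   // offset of the tag's first character in the source
    final int end;     // offset just past the tag's last character
    final List<Object> children = new ArrayList<>(); // SketchTag or String text
    SketchTag endTag;  // closing tag, either real or inserted by the parser

    SketchTag(String name, int start, int end) {
        this.name = name;
        this.start = start;
        this.end = end;
    }

    /** A tag the parser invented occupies no characters in the source. */
    boolean isVirtual() {
        return start == end;
    }
}

class VirtualTagSketch {
    /** Serializes a tag and its subtree, omitting zero-width (inserted) end tags. */
    static String render(SketchTag tag) {
        StringBuilder out = new StringBuilder();
        out.append('<').append(tag.name).append('>');
        for (Object child : tag.children) {
            if (child instanceof SketchTag) {
                out.append(render((SketchTag) child)); // recurse below the top level
            } else {
                out.append(child); // plain text
            }
        }
        if (tag.endTag != null && !tag.endTag.isVirtual()) {
            out.append("</").append(tag.endTag.name).append('>');
        }
        return out.toString();
    }

    /** Models the snippet <b>bold<i>both</b>, where the parser supplied a missing </i>. */
    static SketchTag sample() {
        SketchTag bold = new SketchTag("b", 0, 3);
        bold.children.add("bold");
        SketchTag italic = new SketchTag("i", 7, 10);
        italic.children.add("both");
        italic.endTag = new SketchTag("i", 14, 14); // virtual: zero width
        bold.children.add(italic);
        bold.endTag = new SketchTag("b", 14, 18);   // real end tag from the source
        return bold;
    }
}
```

Rendering `VirtualTagSketch.sample()` returns the snippet as it was written, `<b>bold<i>both</b>`, without the `</i>` that a correcting parser would add. A loop that only inspects top-level nodes would never see the zero-width tag in this model, because it is reachable only through its parent.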
From: Marc C. <mc...@ja...> - 2006-01-21 22:48:05
|
Hi,

I'm parsing snippets of HTML pages at a time, making some changes and then outputting back to HTML. The problem with HTML snippets is that they will be malformed, since some closing tags, for example, will be missing.

The Parser seems to automatically correct the malformed HTML by adding closing tags. Is it possible to prevent it from doing so? Or can it at least notify me when it does so, so that I can simply delete them before reconstructing the modified HTML output?

An alternative would be to use the Lexer, but then I lose all the hierarchical features of the Parser, which is not an option.

This is similar to the general problem brought up in
http://sourceforge.net/mailarchive/message.php?msg_id=12635550 .

Kind Regards

Mark
|
From: Derrick O. <Der...@Ro...> - 2006-01-20 03:53:01
|
The recursive extract should get *all* nodes matching the filter in a flattened list. That is, there may be parents and their children all in the same NodeList... you have to traverse the parent-child relations to find out which is which.

Fuhrmann, Michael wrote:

>Hi it's me again.
>
>Is there a possibility to get all nodes in an array no matter if parent or child?
>Or do I have to loop through all nodes from parent to child in order to change their attributes?
>
>Thanks for your help
>Michael
>
>-----Original Message-----
>From: htm...@li... [mailto:htm...@li...] On Behalf Of Fuhrmann, Michael
>Sent: Wednesday, 18 January 2006 14:28
>To: htm...@li...
>Subject: RE: [Htmlparser-user] Change Attributes of TDs and TRs
>
>Hi,
>
>When I use the method you suggested, the tables nodelist contains nothing.
>Do you have an idea why?
>
>NodeList all_nodes = parser.parse(null);
>NodeFilter all_tables = new NodeClassFilter(TableTag.class);
>NodeList tables = all_nodes.extractAllNodesThatMatch(all_tables);
>
>The list all_nodes contains the whole site, but when I use the nodefilter nothing stays in it......
>
>Thanks and best regards
>Michael
>
>-----Original Message-----
>From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
>Sent: Thursday, 12 January 2006 15:46
>To: htm...@li...
>Subject: Re: [Htmlparser-user] Change Attributes of TDs and TRs
>
>Gather all the nodes into a list using no filter:
>   NodeList all_nodes = parser.parse (null);
>
>Then use the table filter on the whole list, process the nodes, and then
>turn it back into a string:
>   NodeList tables = all_nodes.extractAllNodesThatMatch (all_tables);
>   ... process the tables list...
>   System.out.println (all_nodes.toHtml ());
>
>Fuhrmann, Michael wrote:
>
>>Thanks for your support!
>>But actually I don't want to parse the whole thing twice.
>>My problem is that the page I want to parse contains many tables.
>>Unfortunately these tables contain other tables and so on.......
>>Now what I want to do is to change several attributes of the tds and trs for all tables.
>>The aim is to clean up the "dirty" html code in order to generate a pdf finally.
>>My thought was to make a for loop which goes through all table tags.
>>Or do you know a better solution?
>>
>>-----Original Message-----
>>From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
>>Sent: Thursday, 12 January 2006 01:24
>>To: htm...@li...
>>Subject: Re: [Htmlparser-user] Change Attributes of TDs and TRs
>>
>>By the way, after this call:
>>   NodeList list = parser.parse (all_tables);
>>the parser will be at the end of the page and return no more nodes.
>>So, this:
>>   // Separate all table tags
>>   for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
>>       e.nextNode ().collectInto (list,all_tables);
>>doesn't do anything.
>>
>>You can use:
>>   parser.reset ();
>>to start again, if that is what you really want to do, but in your case
>>you would get duplicates of everything.
>>
>>Third Eye wrote:
>>
>>>Table tag object already has a function to get the rows and TableRow
>>>has a function to get columns. You don't need to iterate yourself.
>>>
>>>On 1/11/06, Fuhrmann, Michael <mic...@sa...> wrote:
>>>
>>>>Hi All!
>>>>
>>>>I want to change several attributes of the td and tr tags of certain tables
>>>>but I don't know if I do it the right way.
>>>>The problem is that I find the right table (only tables with ids) but I
>>>>don't reach the td or tr tags….
>>>>My code looks like that:
>>>>
>>>>public void cleanDokument(HttpServletRequest request, HttpServletResponse response) throws IOException
>>>>{
>>>>    // Get the calling HTML document, define the Writer and open the connection
>>>>    URLConnection connection;
>>>>    URL request_url = new URL(request.getHeader("referer").toString());
>>>>
>>>>    PrintWriter out = response.getWriter();
>>>>    connection = (HttpURLConnection)request_url.openConnection ();
>>>>
>>>>    try
>>>>    {
>>>>        Parser parser = new Parser ();
>>>>        parser.setConnection(connection);
>>>>
>>>>        NodeFilter all_tables = new TagNameFilter("table");
>>>>        NodeList list = parser.parse (all_tables);
>>>>        Node[] nodelist;
>>>>
>>>>        // Separate all table tags
>>>>        for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
>>>>            e.nextNode ().collectInto (list,all_tables);
>>>>
>>>>        nodelist=list.toNodeArray();
>>>>
>>>>        for (int h=0; h<nodelist.length;h++)
>>>>        {
>>>>            if (nodelist[h] instanceof TableTag)
>>>>            {
>>>>                // for loop for the td's and tr's
>>>>                if(((TableTag)nodelist[h]).getAttribute("id")!= null)
>>>>                {
>>>>                    for (int i=0; i<nodelist.length; i++)
>>>>                    {
>>>>                        out.println(nodelist.toString());
>>>>                        if(nodelist[i] instanceof TableRow)
>>>>                        {
>>>>                            out.println("Row found!");
>>>>                            ((TableRow)nodelist[i]).removeAttribute ("nowrap");
>>>>                        }
>>>>                        else if (nodelist[i] instanceof TableColumn)
>>>>                        {
>>>>                            out.println("Column found!");
>>>>                            ((TableColumn)nodelist[i]).removeAttribute ("nowrap");
>>>>                        }
>>>>                    }
>>>>                    out.println(nodelist[h].toHtml());
>>>>                }
>>>>            }
>>>>            else if(nodelist[h] instanceof TableRow || nodelist[h] instanceof TableColumn)
>>>>            {
>>>>                out.println("Else reached!");
>>>>                out.println(((TableRow)nodelist[h]).getText());
>>>>            }
>>>>        }
>>>>        //makePdf(out,response);
>>>>    }
>>>>    catch(Exception e)
>>>>    {
>>>>        out.println("Error while parsing!");
>>>>        e.printStackTrace(out);
>>>>    }
>>>>}
>>>>
>>>>Does my nodelist contain the tr and td tags? Is it right to say instanceof
>>>>TableRow????
>>>>
>>>>Many thanks and best regards
>>>>Michael
>>>>
>>>--
>>>Naveen K Kohli
>>>http://www.netomatix.com
|
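The point of the reply at the top of this message, that a recursive extract yields one flat list in which parents and their children sit side by side, can be illustrated with a small self-contained model. `FlatNode`, `collect` and `nestingDepth` are hypothetical names made up for this sketch; they are not HTMLParser classes or methods. The sketch assumes each node keeps a link to its parent, which is what makes it possible to tell an outer table from a nested one after flattening.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parsed node; NOT the HTMLParser API. */
class FlatNode {
    final String name;
    FlatNode parent;
    final List<FlatNode> children = new ArrayList<>();

    FlatNode(String name) {
        this.name = name;
    }

    /** Appends a child, records its parent link, and returns the child. */
    FlatNode add(FlatNode child) {
        child.parent = this;
        children.add(child);
        return child;
    }
}

class FlattenSketch {
    /** Collects every node called `wanted`, at any depth, in document order. */
    static void collect(FlatNode node, String wanted, List<FlatNode> out) {
        if (node.name.equals(wanted)) {
            out.add(node);
        }
        for (FlatNode child : node.children) {
            collect(child, wanted, out);
        }
    }

    /** Counts how many ancestors of `node` are themselves in the flattened list. */
    static int nestingDepth(FlatNode node, List<FlatNode> matches) {
        int depth = 0;
        for (FlatNode p = node.parent; p != null; p = p.parent) {
            if (matches.contains(p)) {
                depth++;
            }
        }
        return depth;
    }

    /** Builds html > body > table > tr > td > table (a table nested in a table). */
    static FlatNode sample() {
        FlatNode html = new FlatNode("html");
        FlatNode outer = html.add(new FlatNode("body")).add(new FlatNode("table"));
        outer.add(new FlatNode("tr")).add(new FlatNode("td")).add(new FlatNode("table"));
        return html;
    }
}
```

Collecting `"table"` from `FlattenSketch.sample()` puts both tables in the same list; `nestingDepth` is 0 for the outer table and 1 for the inner one, which is one way to "find out which is which" without changing the flat result.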
From: Fuhrmann, M. <mic...@sa...> - 2006-01-19 10:57:51
|
Hi it's me again.

Is there a possibility to get all nodes in an array no matter if parent or child?
Or do I have to loop through all nodes from parent to child in order to change their attributes?

Thanks for your help
Michael

-----Original Message-----
From: htm...@li... [mailto:htm...@li...] On Behalf Of Fuhrmann, Michael
Sent: Wednesday, 18 January 2006 14:28
To: htm...@li...
Subject: RE: [Htmlparser-user] Change Attributes of TDs and TRs

Hi,

When I use the method you suggested, the tables nodelist contains nothing.
Do you have an idea why?

NodeList all_nodes = parser.parse(null);
NodeFilter all_tables = new NodeClassFilter(TableTag.class);
NodeList tables = all_nodes.extractAllNodesThatMatch(all_tables);

The list all_nodes contains the whole site, but when I use the nodefilter nothing stays in it......

Thanks and best regards
Michael
|
From: Derrick O. <Der...@Ro...> - 2006-01-18 23:27:02
|
Sorry, I wasn't thinking. You need to use the recursive flag (second parameter) to dig down into the list: NodeList tables = all_nodes.extractAllNodesThatMatch (all_tables, true); Fuhrmann, Michael wrote: >Hi, > >When I use the method you suggested me the tables nodelist contains nothing. >Do you have an idea why? > >NodeList all_nodes = parser.parse(null); >NodeFilter all_tables = new NodeClassFilter(TableTag.class); >NodeList tables = all_nodes.extractAllNodesThatMatch(all_tables); > >The list all_nodes contains the whole site but when I use the nodefilter nothing stays in it...... > >Thanks and best regards >Michael > >-----Original Message----- >From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald >Sent: Donnerstag, 12. Januar 2006 15:46 >To: htm...@li... >Subject: Re: [Htmlparser-user] Change Attributes of TDs and TRs > >Gather all the nodes into a list using no filter: > NodeList all_nodes = parser.parse (null); > >Then use the table filter on the whole list, process the nodes, and then >turn it back into a string: > NodeList tables = all_nodes.extractAllNodesThatMatch (all_tables); > ... process the tables list... > System.out.println (all_nodes.toHtml ()); > > > >Fuhrmann, Michael wrote: > > > >>Thanx for you support! >>But actually I don't want to parse the whole thing twice. >>My problem is that the page I want to parse contains many tables. >>Unfortunately these tables contain other tables and so on....... >>Now what I want to do is to change several attributes of the tds and trs for all tables. >>The aim is to cleanup the "dirty" html code in order to generate a pdf finally. >>My thought was to make a for loop which goes through all table tags. >>Or do you know a better solution? >> >>-----Original Message----- >>From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald >>Sent: Donnerstag, 12. Januar 2006 01:24 >>To: htm...@li... 
>>Subject: Re: [Htmlparser-user] Change Attributes of TDs and TRs >> >>By the way, after this call: >> NodeList list = parser.parse (all_tables); >>the parser will be at the end of the page and return no more nodes. >>So, this: >> // Seperate all table tags >> * for* (NodeIterator e = parser.elements (); >>e.hasMoreNodes ();) >> e.nextNode ().collectInto (list,all_tables); >>doesn't do anything. >> >>You can use: >> parser.reset (); >>to start again, if that is what you really want to do, but in your case >>you would get duplicates of everything. >> >> >>Third Eye wrote: >> >> >> >> >> >>>Table tag object already has a fucntion to get the rows and TableRow >>>has function to get columns. You don't need to iterate yourself. >>> >>>On 1/11/06, Fuhrmann, Michael <mic...@sa...> wrote: >>> >>> >>> >>> >>> >>> >>>>Hi All! >>>> >>>>I want to change several attributes of the td and tr tags of certain tables >>>>but I don't know if do it the right way. >>>>The problem is that I find the right table (only tables with ids) but I >>>>don't reach the td or tr tags…. 
>>>>My code looks like that: >>>> >>>>public void cleanDokument(HttpServletRequest >>>>request,HttpServletResponse response) throws IOException >>>> { >>>> // Get the calling HTML Document define the Writer and open >>>>the connection >>>> URLConnection connection; >>>> URL request_url = new >>>>URL(request.getHeader("referer").toString()); >>>> >>>> PrintWriter out = response.getWriter(); >>>> connection = >>>>(HttpURLConnection)request_url.openConnection (); >>>> >>>> try >>>> { >>>> Parser parser = new Parser (); >>>> parser.setConnection(connection); >>>> >>>> NodeFilter all_tables = new TagNameFilter("table"); >>>> NodeList list = parser.parse (all_tables); >>>> Node[] nodelist; >>>> >>>> // Seperate all table tags >>>> for (NodeIterator e = parser.elements (); e.hasMoreNodes >>>>();) >>>> e.nextNode ().collectInto (list,all_tables); >>>> >>>> nodelist=list.toNodeArray(); >>>> >>>> for (int h=0; h<nodelist.length;h++) >>>> { >>>> if (nodelist[h] instanceof TableTag) >>>> { >>>> //for schleife f r die td's und tr's >>>> >>>>if(((TableTag)nodelist[h]).getAttribute("id")!= null) >>>> { >>>> for (int i=0; i<nodelist.length; >>>>i++) >>>> { >>>> >>>>out.println(nodelist.toString()); >>>> if(nodelist[i] instanceof >>>>TableRow) >>>> { >>>> out.println("Row >>>>found!"); >>>> >>>>((TableRow)nodelist[i]).removeAttribute ("nowrap"); >>>> } >>>> else if (nodelist[i] >>>>instanceof TableColumn) >>>> { >>>> out.println("Column >>>>found!"); >>>> >>>>((TableColumn)nodelist[i]).removeAttribute ("nowrap"); >>>> } >>>> } >>>> out.println(nodelist[h].toHtml()); >>>> } >>>> } >>>> else if(nodelist[h] instanceof TableRow || >>>>nodelist[h] instanceof TableColumn) >>>> { >>>> out.println("Else erreicht!"); >>>> >>>>out.println(((TableRow)nodelist[h]).getText()); >>>> } >>>> } >>>> //makePdf(out,response); >>>> } >>>> catch(Exception e) >>>> { >>>> out.println("Fehler beim Parsen!"); >>>> e.printStackTrace(out); >>>> } >>>> } >>>> >>>>Does my nodelist contain the tr and td tags? 
Is it right to say instanceof >>>>TableRow???? >>>> >>>>Many thanks and best regards >>>>Michael >>>> >>>-- >>>Naveen K Kohli >>>http://www.netomatix.com >_______________________________________________ >Htmlparser-user mailing list >Htm...@li... >https://lists.sourceforge.net/lists/listinfo/htmlparser-user |
From: Fuhrmann, M. <mic...@sa...> - 2006-01-18 13:28:42
|
Hi,

When I use the method you suggested me the tables nodelist contains nothing.
Do you have an idea why?

    NodeList all_nodes = parser.parse(null);
    NodeFilter all_tables = new NodeClassFilter(TableTag.class);
    NodeList tables = all_nodes.extractAllNodesThatMatch(all_tables);

The list all_nodes contains the whole site but when I use the nodefilter nothing stays in it......

Thanks and best regards
Michael
|
From: Siddhartha L. <sid...@sl...> - 2006-01-17 15:47:19
|
Hi,
I want to use htmlparser to filter tags from an existing html string
from a list of an allowable tags. I want to remove all the unallowed
tags but keep the text in it. Since I am making the modification in
opencms build, I cannot update or upgrade the htmlparser version from
1.5

But the problem is the childs gets repeated.
Like
I have html where span tags are allowed
<span>Hello<span>
the result is <span>Hello</span>Hello
I made a NodeVisitor class

package org.opencms.workplace.editors;

import org.htmlparser.Tag;
import org.htmlparser.Text;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.visitors.NodeVisitor;

<code>
public final class TextVisitor extends NodeVisitor {

    NodeList mNodeList;
    boolean addTagText = true;

    public TextVisitor() {
        super();
    }

    public TextVisitor(boolean recurseChildren) {
        super(recurseChildren);
    }

    public TextVisitor(boolean recurseChildren, boolean recurseSelf) {
        super(recurseChildren, recurseSelf);
        mNodeList = new NodeList();
        addTagText = true;
    }

    public void visitTag(Tag tag) {
        if(tag.getTagName().trim().equalsIgnoreCase("A")){
            mNodeList.add(tag);
            addTagText = false;
        }else if(tag.getTagName().trim().equalsIgnoreCase("B") ||
                tag.getTagName().trim().equalsIgnoreCase("I") ||
                tag.getTagName().trim().equalsIgnoreCase("U") ||
                tag.getTagName().trim().equalsIgnoreCase("P") ||
                tag.getTagName().trim().equalsIgnoreCase("SPAN") ||
                tag.getTagName().trim().equalsIgnoreCase("BR")){
            mNodeList.add(tag);
        }
    }

    public void visitEndTag(Tag tag) {
        if(tag.getTagName().trim().equalsIgnoreCase("A")){
            mNodeList.add(tag);
            addTagText = true;
        }else if(tag.getTagName().trim().equalsIgnoreCase("B") ||
                tag.getTagName().trim().equalsIgnoreCase("I") ||
                tag.getTagName().trim().equalsIgnoreCase("U") ||
                tag.getTagName().trim().equalsIgnoreCase("P") ||
                tag.getTagName().trim().equalsIgnoreCase("SPAN")||
                tag.getTagName().trim().equalsIgnoreCase("BR")){
            mNodeList.add(tag);
        }
    }

    public void visitStringNode(Text text) {
        if(addTagText){
            mNodeList.add(text);
        }
    }

    public String toString() {
        return mNodeList.toHtml();
    }

}
</code>

Can anybody help me.

Thanks
Siddhartha
|
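One possible reading of the repeated "Hello" reported above: if a tag node renders itself together with its children, then adding the whole tag to the output list and afterwards adding its text child again emits the text twice. The sketch below is a toy model under that assumption. `ToyNode`, `ToyText`, `ToyTag` and both `collect...` helpers are hypothetical names made up for illustration and are not part of the htmlparser API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these classes are hypothetical stand-ins,
// not the org.htmlparser node classes.
final class VisitorDuplicationDemo {

    interface ToyNode {
        String toHtml();
    }

    /** Plain text leaf. */
    static final class ToyText implements ToyNode {
        final String text;
        ToyText(String text) { this.text = text; }
        public String toHtml() { return text; }
    }

    /** Composite tag: rendering it also renders all of its children. */
    static final class ToyTag implements ToyNode {
        final String name;
        final List<ToyNode> children = new ArrayList<>();
        ToyTag(String name, ToyNode... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }
        public String toHtml() {
            StringBuilder sb = new StringBuilder("<" + name + ">");
            for (ToyNode child : children) {
                sb.append(child.toHtml());
            }
            return sb.append("</").append(name).append(">").toString();
        }
    }

    /** Collects the whole tag and then visits its children again (duplicates text). */
    static String collectWholeTags(ToyNode node) {
        StringBuilder out = new StringBuilder();
        if (node instanceof ToyTag) {
            out.append(node.toHtml()); // already contains the children
            for (ToyNode child : ((ToyTag) node).children) {
                out.append(collectWholeTags(child));
            }
        } else {
            out.append(node.toHtml());
        }
        return out.toString();
    }

    /** Emits only the delimiters of allowed tags; children are rendered exactly once. */
    static String collectDelimitersOnly(ToyNode node, List<String> allowed) {
        if (!(node instanceof ToyTag)) {
            return node.toHtml();
        }
        ToyTag tag = (ToyTag) node;
        boolean keep = allowed.contains(tag.name);
        StringBuilder out = new StringBuilder();
        if (keep) {
            out.append("<").append(tag.name).append(">");
        }
        for (ToyNode child : tag.children) {
            out.append(collectDelimitersOnly(child, allowed));
        }
        if (keep) {
            out.append("</").append(tag.name).append(">");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ToyNode span = new ToyTag("span", new ToyText("Hello"));
        System.out.println(collectWholeTags(span)); // <span>Hello</span>Hello
        ToyNode page = new ToyTag("div", span);
        System.out.println(collectDelimitersOnly(page, Arrays.asList("span"))); // <span>Hello</span>
    }
}
```

If the real visitor behaves like `collectWholeTags`, writing out only the start and end delimiters of the allowed tags, rather than the tag node as a whole, would be one way to avoid the repetition.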
From: Derrick O. <Der...@Ro...> - 2006-01-16 23:10:11
|
The filter isn't going to re-arrange your page for you.
The best you can do is get the list of tags and then remove them from their parent's child list:

    for (int i = 0; i < arr.length; i++)
    {
        filter1 = new TagNameFilter (arr[i]);
        filter2 = new NotFilter (filter1);
        NodeList x = n.extractAllNodesThatMatch (filter2);
        for (int j = 0; j < x.size (); j++)
        {
            Node tag = x.elementAt (j);
            tag.getParent ().getChildren ().remove (tag);
        }
    }

or something like that. No guarantees about removing something that's already been removed though.

Srikrishna Swaminathan wrote:

> hi,
> my name is srikrishna, i am a college student.
> i am quite new to html parser.
> i am trying to use html parser to filter out certain tags in a html page.
> with the not filter i am able to take out tags like script, img.
> but the problem is that the tags are removed only if they are outside any
> other tags.
> eg: if there is an img tag inside a table or td tag, the img tag is not
> removed.
> i don't want to remove the table or td tag, but i want to remove the
> img, script tags inside the table tags.
> the initial code i wrote was:
>
>     String[] arr = new String[]{"img","input","br","span","script","noscript","b","a href",};
>     try {
>         Parser parser = new Parser ();
>         parser.setURL ("targetin.html");
>         NodeList n = parser.parse(null);
>         NodeFilter filter1 = null;
>         NotFilter filter2 = null;
>         for (int i = 0; i < arr.length; i++) {
>             filter1 = new TagNameFilter (arr[i]);
>             filter2 = new NotFilter(filter1);
>             n = n.extractAllNodesThatMatch(filter2);
>         }
>         String s1 = n.toHtml();
>
> This code removes all tags outside table, tr, td tags, but if they are
> inside a table tag, it is not able to do it.
|
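The "remove them from their parent's child list" idea can be shown on a toy tree. This is only a sketch under stated assumptions: `ToyNode`, `tag`, `text`, `removeTags` and `toHtml` are hypothetical helpers written for this example, not htmlparser classes or methods. It selects the unwanted tags themselves and descends into every surviving node, so a nested `img` inside a `td` is reached as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Toy model only: ToyNode is a hypothetical stand-in, not an org.htmlparser class.
final class NestedTagRemovalDemo {

    static final class ToyNode {
        final String name; // tag name, or null for a text node
        final String text; // text content, or null for a tag
        final List<ToyNode> children = new ArrayList<>();

        private ToyNode(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    static ToyNode tag(String name, ToyNode... kids) {
        ToyNode node = new ToyNode(name, null);
        node.children.addAll(Arrays.asList(kids));
        return node;
    }

    static ToyNode text(String text) {
        return new ToyNode(null, text);
    }

    /** Removes unwanted tags at every depth by editing each parent's child list. */
    static void removeTags(List<ToyNode> nodes, Set<String> unwanted) {
        Iterator<ToyNode> it = nodes.iterator();
        while (it.hasNext()) {
            ToyNode node = it.next();
            if (node.name != null && unwanted.contains(node.name)) {
                it.remove(); // drop the tag together with its subtree
            } else {
                removeTags(node.children, unwanted); // keep the node, clean inside it
            }
        }
    }

    static String toHtml(List<ToyNode> nodes) {
        StringBuilder sb = new StringBuilder();
        for (ToyNode node : nodes) {
            if (node.name == null) {
                sb.append(node.text);
            } else {
                sb.append("<").append(node.name).append(">")
                  .append(toHtml(node.children))
                  .append("</").append(node.name).append(">");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<ToyNode> page = new ArrayList<ToyNode>(Arrays.asList(
                tag("table", tag("tr", tag("td", tag("img"), text("cell")))),
                tag("script", text("alert(1)"))));
        removeTags(page, new HashSet<String>(Arrays.asList("img", "script")));
        System.out.println(toHtml(page)); // <table><tr><td>cell</td></tr></table>
    }
}
```

Removing through the iterator of the list being walked sidesteps the "already been removed" worry mentioned above, because each node is visited once and removed at most once.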
From: Srikrishna S. <swa...@gm...> - 2006-01-16 06:52:50
|
hi,
my name is srikrishna, i am a college student.
i am quite new to html parser.
i am trying to use html parser to filter out certain tags in a html page.
with the not filter i am able to take out tags like script, img.
but the problem is that the tags are removed only if they are outside any other tags.
eg: if there is an img tag inside a table or td tag, the img tag is not removed.
i don't want to remove the table or td tag, but i want to remove the img, script tags inside the table tags.
the initial code i wrote was:

    String[] arr = new String[]{"img","input","br","span","script","noscript","b","a href",};
    try {
        Parser parser = new Parser ();
        parser.setURL ("targetin.html");
        NodeList n = parser.parse(null);
        NodeFilter filter1 = null;
        NotFilter filter2 = null;
        for (int i = 0; i < arr.length; i++) {
            filter1 = new TagNameFilter (arr[i]);
            filter2 = new NotFilter(filter1);
            n = n.extractAllNodesThatMatch(filter2);
        }
        String s1 = n.toHtml();

This code removes all tags outside table, tr, td tags, but if they are inside a table tag, it is not able to do it.
|
From: Derrick O. <Der...@Ro...> - 2006-01-12 14:46:26
|
Gather all the nodes into a list using no filter:

    NodeList all_nodes = parser.parse (null);

Then use the table filter on the whole list, process the nodes, and then turn it back into a string:

    NodeList tables = all_nodes.extractAllNodesThatMatch (all_tables);
    ... process the tables list...
    System.out.println (all_nodes.toHtml ());

Fuhrmann, Michael wrote:

>Thanx for you support!
>But actually I don't want to parse the whole thing twice.
>My problem is that the page I want to parse contains many tables.
>Unfortunately these tables contain other tables and so on.......
>Now what I want to do is to change several attributes of the tds and trs for all tables.
>The aim is to cleanup the "dirty" html code in order to generate a pdf finally.
>My thought was to make a for loop which goes through all table tags.
>Or do you know a better solution?
|
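Whether filtering "the whole list" finds any tables depends on whether the match looks only at the nodes in the list itself or also descends into their children. As far as I recall, htmlparser's NodeList also offers a two-argument `extractAllNodesThatMatch` with a recursive flag, but please treat that as an assumption to check against the Javadoc of your version. The toy model below only illustrates the difference; `ToyTag` and `match` are hypothetical names, not library calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: ToyTag and match() are hypothetical, not the htmlparser API.
final class RecursiveMatchDemo {

    static final class ToyTag {
        final String name;
        final List<ToyTag> children;
        ToyTag(String name, ToyTag... kids) {
            this.name = name;
            this.children = Arrays.asList(kids);
        }
    }

    /**
     * Returns the nodes called 'name'. Without 'recursive' only the nodes in
     * the given list are tested; with it, all descendants are tested as well.
     */
    static List<ToyTag> match(List<ToyTag> nodes, String name, boolean recursive) {
        List<ToyTag> found = new ArrayList<>();
        for (ToyTag node : nodes) {
            if (node.name.equals(name)) {
                found.add(node);
            }
            if (recursive) {
                found.addAll(match(node.children, name, true));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        ToyTag inner = new ToyTag("table");
        ToyTag outer = new ToyTag("table", new ToyTag("tr", new ToyTag("td", inner)));
        // The only top-level node is <html>; both tables sit below it.
        List<ToyTag> page = Arrays.asList(new ToyTag("html", new ToyTag("body", outer)));
        System.out.println(match(page, "table", false).size()); // 0
        System.out.println(match(page, "table", true).size());  // 2
    }
}
```

In this model a page whose tables all live inside `html` and `body` yields an empty result for the top-level-only match, and the nested table is found only by the recursive one.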
From: Fuhrmann, M. <mic...@sa...> - 2006-01-12 13:49:31
|
Thanx for you support!
But actually I don't want to parse the whole thing twice.
My problem is that the page I want to parse contains many tables.
Unfortunately these tables contain other tables and so on.......
Now what I want to do is to change several attributes of the tds and trs for all tables.
The aim is to cleanup the "dirty" html code in order to generate a pdf finally.
My thought was to make a for loop which goes through all table tags.
Or do you know a better solution?
|
From: Derrick O. <Der...@Ro...> - 2006-01-12 00:24:37
|
By the way, after this call:

    NodeList list = parser.parse (all_tables);

the parser will be at the end of the page and will return no more nodes. So this:

    // Separate all table tags
    for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
        e.nextNode ().collectInto (list, all_tables);

doesn't do anything. You can use:

    parser.reset ();

to start again, if that is what you really want to do, but in your case you would get duplicates of everything.

Third Eye wrote:

> Table tag object already has a function to get the rows and TableRow
> has a function to get columns. You don't need to iterate yourself. |
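The behaviour described in this reply (a parse consumes the page, a later iteration finds nothing, and a reset followed by a second pass yields duplicates) can be modelled without the library. The sketch below is only a toy model: `OnePassSource` and `OnePassDemo` are hypothetical classes, and the method names `parse`, `elements` and `reset` merely mirror the calls mentioned in the thread, not HTML Parser's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical single-pass source: each node is handed out once until reset() is called.
class OnePassSource {
    private final List<String> nodes;
    private int cursor = 0;

    OnePassSource(List<String> nodes) { this.nodes = nodes; }

    // Consumes everything from the current position to the end, keeping the matches.
    List<String> parse(String wanted) {
        List<String> found = new ArrayList<>();
        while (cursor < nodes.size()) {
            String node = nodes.get(cursor++);
            if (node.equals(wanted)) found.add(node);
        }
        return found;
    }

    // Iterates over whatever has not been consumed yet.
    Iterator<String> elements() {
        return new Iterator<String>() {
            public boolean hasNext() { return cursor < nodes.size(); }
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return nodes.get(cursor++);
            }
        };
    }

    // Moves back to the start of the page.
    void reset() { cursor = 0; }
}

class OnePassDemo {
    public static void main(String[] args) {
        OnePassSource source = new OnePassSource(Arrays.asList("table", "p", "table"));

        List<String> list = source.parse("table");
        System.out.println(list.size());                  // size after the first pass

        System.out.println(source.elements().hasNext());  // is anything left to iterate?

        source.reset();
        list.addAll(source.parse("table"));               // second pass into the same list
        System.out.println(list.size());                  // size after collecting again
    }
}
```

In this model the second pass adds the same two entries again, which is the duplication the reply warns about.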
From: Third E. <nav...@gm...> - 2006-01-11 13:31:52
|
Table tag object already has a function to get the rows, and TableRow has a function to get columns. You don't need to iterate yourself.

--
Naveen K Kohli
http://www.netomatix.com |
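The point of this reply is that a list filtered on "table" holds only the tables; rows and cells are reached by descending from each table, not by scanning the same list for other types. A minimal sketch of that traversal follows. It uses a hypothetical `Tag` class as a stand-in for parsed tags (it is not HTML Parser's `TableTag`, `TableRow` or `TableColumn`), so only the shape of the walk carries over.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parsed tag: a name, attributes, and child tags.
class Tag {
    final String name;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<Tag> children = new ArrayList<>();

    Tag(String name) { this.name = name; }

    Tag attr(String key, String value) { attributes.put(key, value); return this; }

    Tag add(Tag child) { children.add(child); return this; }
}

class NowrapCleaner {
    // Removes "nowrap" from every tr and td below the given tag by walking its
    // children recursively; returns how many attributes were removed.
    static int removeNowrap(Tag parent) {
        int removed = 0;
        for (Tag child : parent.children) {
            boolean rowOrCell = child.name.equals("tr") || child.name.equals("td");
            if (rowOrCell && child.attributes.remove("nowrap") != null) {
                removed++;
            }
            removed += removeNowrap(child);
        }
        return removed;
    }

    public static void main(String[] args) {
        Tag table = new Tag("table").attr("id", "t1")
            .add(new Tag("tr").attr("nowrap", "nowrap")
                .add(new Tag("td").attr("nowrap", "nowrap"))
                .add(new Tag("td")));

        // What a "table" filter would return: tables only, no rows or cells.
        List<Tag> tables = Collections.singletonList(table);

        for (Tag t : tables) {
            if (t.attributes.get("id") != null) {   // only tables with ids
                System.out.println("removed " + removeNowrap(t) + " nowrap attributes");
            }
        }
    }
}
```

With the real library the same walk would go through the row and column accessors the reply mentions instead of a generic child list.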
From: Fuhrmann, M. <mic...@sa...> - 2006-01-11 13:26:16
|
Hi All!

I want to change several attributes of the td and tr tags of certain tables, but I don't know if I am doing it the right way.
The problem is that I find the right table (only tables with ids) but I don't reach the td or tr tags.
My code looks like this:

    public void cleanDokument(HttpServletRequest request, HttpServletResponse response) throws IOException
    {
        // Get the calling HTML document, define the Writer and open the connection
        URLConnection connection;
        URL request_url = new URL(request.getHeader("referer").toString());

        PrintWriter out = response.getWriter();
        connection = (HttpURLConnection)request_url.openConnection ();

        try
        {
            Parser parser = new Parser ();
            parser.setConnection(connection);

            NodeFilter all_tables = new TagNameFilter("table");
            NodeList list = parser.parse (all_tables);
            Node[] nodelist;

            // Separate all table tags
            for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
                e.nextNode ().collectInto (list,all_tables);

            nodelist=list.toNodeArray();

            for (int h=0; h<nodelist.length;h++)
            {
                if (nodelist[h] instanceof TableTag)
                {
                    // for loop for the td's and tr's
                    if(((TableTag)nodelist[h]).getAttribute("id")!= null)
                    {
                        for (int i=0; i<nodelist.length; i++)
                        {
                            out.println(nodelist.toString());
                            if(nodelist[i] instanceof TableRow)
                            {
                                out.println("Row found!");
                                ((TableRow)nodelist[i]).removeAttribute ("nowrap");
                            }
                            else if (nodelist[i] instanceof TableColumn)
                            {
                                out.println("Column found!");
                                ((TableColumn)nodelist[i]).removeAttribute ("nowrap");
                            }
                        }
                        out.println(nodelist[h].toHtml());
                    }
                }
                else if(nodelist[h] instanceof TableRow || nodelist[h] instanceof TableColumn)
                {
                    out.println("Else reached!");
                    out.println(((TableRow)nodelist[h]).getText());
                }
            }
            //makePdf(out,response);
        }
        catch(Exception e)
        {
            out.println("Error while parsing!");
            e.printStackTrace(out);
        }
    }

Does my nodelist contain the tr and td tags? Is it right to say instanceof TableRow?

Many thanks and best regards
Michael |
From: Third E. <nav...@gm...> - 2006-01-10 02:19:52
|
Are you looking for the Option tag that is selected at load time, i.e. the one that has the SELECTED attribute? I don't know how well this will work, because most of the time the OPTION tag's state is set by JavaScript in the DOM.

On 1/9/06, wo...@bl... <wo...@bl...> wrote:
> Hello,
>
> Could anyone advise me on the best way to go about querying a SelectTag for
> which of the child OptionTags is selected, if indeed any are?
> I was looking for something like an isSelected() method within the OptionTag
> but didn't see anything. It's very possible I've missed something basic;
> if so, apologies and thanks in advance.

--
Naveen K Kohli
http://www.netomatix.com |
From: <wo...@bl...> - 2006-01-10 01:13:20
|
Hello,

Could anyone advise me on the best way to go about querying a SelectTag for which of the child OptionTags is selected, if indeed any are? I was looking for something like an isSelected() method within the OptionTag but didn't see anything. It's very possible I've missed something basic; if so, apologies and thanks in advance. |
From: Rajat S. <rs...@ai...> - 2006-01-06 22:24:28
|
Thanks.
This works just fine!

-----Original Message-----
From: htm...@li... [mailto:htm...@li...] On Behalf Of Third Eye
Sent: Friday, January 06, 2006 11:42 AM
To: htm...@li...
Subject: Re: [Htmlparser-user] Best way to parse this html code

Using AndFilter should help you to get to that value. |
From: Third E. <nav...@gm...> - 2006-01-06 16:41:56
|
Using AndFilter should help you to get to that value.
Here is some sample code. Sorry, it is not Java, but it should give you the idea.

    static void GetValuesFromTestPage()
    {
        // Open the saved page and wrap it for the parser
        FileStream obFile = new FileStream("TestPage.htm", FileMode.Open);
        Source obSource = new InputStreamSource(obFile);
        Page obPage = new Page(obSource);
        Lexer obLexer = new Lexer(obPage);
        Parser obParser = new Parser(obLexer);

        // Match <input> tags whose name attribute is "displayName"
        NodeFilter obTagFilter = new TagNameFilter("input");
        NodeFilter obAttribFilter = new HasAttributeFilter("name", "displayName");
        NodeFilter andFilter = new AndFilter(obTagFilter, obAttribFilter);
        NodeList inputs = obParser.ExtractAllNodesThatMatch(andFilter);

        // Print the value attribute when exactly one tag matched
        if (inputs != null && inputs.Count == 1)
        {
            INode obNode = inputs[0];
            Console.WriteLine(((ITag)obNode).GetAttribute("value"));
        }
        obFile.Close();
    }

--
Naveen K Kohli
http://www.netomatix.com |
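The idea behind the AndFilter suggestion is a logical AND of two tests: "the tag is an input" and "its name attribute equals displayName". The sketch below shows that composition in plain Java with `java.util.function.Predicate`. It is an illustration only: `SimpleTag` and `AndFilterSketch` are hypothetical names, the tag list is built by hand from the posted HTML, and none of this is the library's own filter API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical stand-in for a parsed tag; not an HTML Parser class.
class SimpleTag {
    final String name;
    final Map<String, String> attributes = new HashMap<>();

    // Attributes are passed as alternating key/value strings.
    SimpleTag(String name, String... keyValues) {
        this.name = name;
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            attributes.put(keyValues[i], keyValues[i + 1]);
        }
    }
}

class AndFilterSketch {
    // Accepts tags with the given name, ignoring case.
    static Predicate<SimpleTag> tagName(String name) {
        return tag -> tag.name.equalsIgnoreCase(name);
    }

    // Accepts tags whose attribute `key` has exactly the given value.
    static Predicate<SimpleTag> hasAttribute(String key, String value) {
        return tag -> value.equals(tag.attributes.get(key));
    }

    // Returns the "value" attribute of the first tag accepted by the filter, if any.
    static Optional<String> firstValue(List<SimpleTag> tags, Predicate<SimpleTag> filter) {
        return tags.stream().filter(filter).findFirst().map(t -> t.attributes.get("value"));
    }

    public static void main(String[] args) {
        List<SimpleTag> tags = Arrays.asList(
            new SimpleTag("select", "name", "policyToAdd"),
            new SimpleTag("input", "type", "text", "name", "displayName",
                          "value", "NELogsTransferPolicy1"));

        // Both conditions must hold, as with an AND of a tag-name and an attribute filter.
        Predicate<SimpleTag> both = tagName("input").and(hasAttribute("name", "displayName"));
        System.out.println(firstValue(tags, both).orElse("not found"));
    }
}
```

Keeping the two tests separate and combining them at the call site is what makes either one reusable with other conditions.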
From: Rajat S. <rs...@ai...> - 2006-01-06 15:32:33
|
Hi all,

Looking for the best way to parse the HTML code below and extract the string "NELogsTransferPolicy1", which is almost at the last line of this code.

Thanks,
Raj

<table>
  <tr>
    <td>
      <b>Select Policy To Add : </b>
    </td>
    <td>
      <select name="policyToAdd" onChange="setName()" >
        <option value="NELogsTransferPolicy#com.ems.policies.NELogsTransferPolicy" >NELogsTransferPolicy</option>
        <option value="NEConfigFileTransferPolicy#com.ems.policies.NEConfigFileTransferPolicy" >NEConfigFileTransferPolicy</option>
        <option value="LogLvlCfgPolicy#com.ems.policies.LogLvlCfgPolicy" >LogLvlCfgPolicy</option>
        <option value="FileCleanupPolicy#com.ems.policies.FileCleanupPolicy" >FileCleanupPolicy</option>
        <option value="AlarmTableCleanupPolicy#com.ems.policies.AlertTableCleanup" >AlarmTableCleanupPolicy</option>
        <option value="EventTableCleanupPolicy#com.ems.policies.EventTableCleanup" >EventTableCleanupPolicy</option>
      </select>
    </td>
  </tr>
  <tr>
    <td>
      <b>Instance Name : </b>
    </td>
    <td>
      <input type="text" size=40 maxlength=100 name="displayName" value="NELogsTransferPolicy1">
    </td>
  </tr>
</table> |
From: Axel <ax...@gm...> - 2006-01-01 14:23:05
|
Maybe this is also an alternative: https://xhtmlrenderer.dev.java.net/featurelist.html |
From: Derrick O. <Der...@Ro...> - 2005-12-29 14:41:53
|
Ted,

I don't think HTML Parser is what you need. Its primary use case is programmatic extraction of information from web pages, i.e. spidering, with some facilities for rewriting. As far as I know, nobody is using HTML Parser as the parsing component of JEditorPane, and I don't believe anyone has written a browser based on it.

You might want to try some of the Java-based browsers, e.g. shogun <http://sourceforge.net/projects/shogun>, JXWB <http://sourceforge.net/projects/jxwb>, or others <http://sourceforge.net/softwaremap/trove_list.php?form_cat=91> that purport to do what you want.

Derrick |
From: Third E. <nav...@gm...> - 2005-12-29 14:12:27
|
HTMLParser is not a browser, so it is not possible to get the coordinates and positions of elements directly. You may have to write an add-in on top of the parser that loads the parser's output into some UI controls and then reads the positions from them.

Just curious, is there some specific functionality you are looking for by knowing the coordinates?

Naveen

On 12/22/05, Gurpreet Sachdeva <gur...@gm...> wrote:
> Thanks for the reply Naveen.
>
> >>> HTML parser will give you position if site is using absolute positioning
> >>> and proper coordinates have been set in the STYLE attribute.
>
> How do we capture that information through HTML Parser?
> Let's say I need the coordinates of each element on http://news.bbc.co.uk.
> How do I achieve that?
>
> Thanks for your help,
> Gurpreet Singh
>
> On 12/22/05, Naveen Kohli <nav...@gm...> wrote:
> > HTML parser will give you the position if the site is using absolute positioning
> > and proper coordinates have been set in the STYLE attribute. Otherwise, no,
> > HTML parser can't give you the coordinates.
> >
> > Naveen
> >
> > ________________________________
> > From: htm...@li... [mailto:htm...@li...] On Behalf Of Gurpreet Sachdeva
> > Sent: Thursday, December 22, 2005 5:51 AM
> > To: htm...@li...
> > Subject: [Htmlparser-user] coordinates of text rendered on browser.
> >
> > Hi guys,
> >
> > I have a basic query. Does HTML Parser give me the coordinates of text as
> > rendered in the browser?
> >
> > When I tried this:
> > java -jar lib/htmlparser.jar http://news.bbc.co.uk A
> >
> > It gave something:
> >
> > LinkData
> > --------
> > 0 Txt (55846[3258,81],55858[3258,93]): News sources
> > *** END of LinkData ***
> > Link to : http://www.bbc.co.uk/info/; titled : About the BBC; begins at : 55875; ends at : 55934, AccessKey=null
> > LinkData
> > --------
> > 0 Txt (55934[3259,65],55947[3259,78]): About the BBC
> > *** END of LinkData ***
> > Link to : http://news.bbc.co.uk/newswatch/ukfs/hi/feedback/default.stm; titled : Contact us; begins at : 55964; ends at : 56067, AccessKey=null
> > LinkData
> > --------
> > 0 Txt (56067[3260,109],56077[3260,119]): Contact us
> > *** END of LinkData ***
> >
> > In the End...
> > Do these numbers (56067[3260,109],56077[3260,119]) refer to the coordinates?
> > If not, can I in some way get the coordinates as rendered in a standard
> > browser (Mozilla/Firefox)?
> >
> > Thanks and Regards,
> > Gurpreet Singh
>
> --
> Thanks and Regards,
> GSS

--
Naveen K Kohli
http://www.netomatix.com |
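On the numbers in the quoted output: in each pair the difference between the two leading values equals the length of the text (55858 - 55846 = 12 for "News sources", 55947 - 55934 = 13 for "About the BBC"), and the second bracketed value grows by the same amount. That is consistent with reading them as a character offset in the page source followed by [line, column], i.e. positions in the HTML text rather than on screen. The sketch below computes such a triple for a small hand-written page. It assumes zero-based lines and columns, which the thread does not confirm, and `SourcePosition` is a hypothetical helper, not part of the library.

```java
class SourcePosition {
    // Returns {line, column}, both zero-based, for a character offset in text.
    static int[] lineAndColumn(String text, int offset) {
        if (offset < 0 || offset > text.length()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        int line = 0;
        int lineStart = 0;
        // Count the newlines before the offset and remember where the last line began.
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') {
                line++;
                lineStart = i + 1;
            }
        }
        return new int[] { line, offset - lineStart };
    }

    // Formats a position as offset[line,column].
    static String describe(String text, int offset) {
        int[] pos = lineAndColumn(text, offset);
        return offset + "[" + pos[0] + "," + pos[1] + "]";
    }

    public static void main(String[] args) {
        String html = "<html>\n<body>\n<a href=\"x\">Contact us</a>\n</body>\n</html>";
        String label = "Contact us";
        int start = html.indexOf(label);
        int end = start + label.length();
        System.out.println("Txt (" + describe(html, start) + "," + describe(html, end) + "): " + label);
    }
}
```

Positions of this kind say where a node sits in the source text; they carry no layout information, which matches the answer that rendered coordinates need a browser or UI component.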
From: Ted B. <r.t...@ro...> - 2005-12-29 01:13:33
|
I have read that the HTML parser that is used within JEditorPane is seriously broken. In reading the archive of this list, the impression is created that using HTMLParser with JEditorPane is problematic at best, although there seems to be little recent material on this issue.

HTMLParser comes highly recommended. However, it does me no good if I can't figure out how to get started in order to use it to render generic web pages. I have a need for a Java component (perhaps an applet, or an application that can be launched using Web Start) that will display web pages (most of my users will need read-only access to the documents rendered, but there is a second category of user that needs read/write access to documents). I do not need, or want, to have to deal with examining the data parsed by the parser. I really don't want to write a class to render the output produced by HTMLParser. I just want to make a web page viewer (or, better, a web browser that supports basic scripting using e.g. JScript) that uses HTMLParser to make it more robust than the default parser used in JEditorPane.

On the face of it, none of the example applications show me how to do this, although it is possible that I missed something.

To do what I need done, do I need anything other than HTMLParser? Or can it be that HTMLParser includes functions to render generic web pages on, e.g., a JFrame? In either case, where can I find an example program that shows me how to do what I need to do to get started?

Once I have a start, the next phase will involve using a WYSIWYG editor web page and a servlet that uses HTMLParser to validate web pages created using the WYSIWYG editor web page, and send the user intelligible error messages when the user tries to create something HTMLParser doesn't understand. Or maybe there is already something out there that will do what I need to do (preferably open source). Any ideas/recommendations?

Thanks,

Ted

R.E. (Ted) Byers, Ph.D., Ed.D.
R & D Decision Support Software
http://www.randddecisionsupportsolutions.com/ |