Re: [Htmlparser-user] Change Attributes of TDs and TRs
From: Derrick O. <Der...@Ro...> - 2006-01-20 03:53:01
The recursive extract should get *all* nodes matching the filter in a flattened list.
That is, there may be parents and their children all in the same NodeList...
you have to traverse the parent-child relations to find out which is which.

Fuhrmann, Michael wrote:

>Hi it's me again.
>
>Is there a possibility to get all nodes in an array no matter if parent or child?
>Or do I have to loop through all nodes from parent to child in order to change their attributes?
>
>Thanks for your help
>Michael
>
>-----Original Message-----
>From: htm...@li... [mailto:htm...@li...] On Behalf Of Fuhrmann, Michael
>Sent: Wednesday, 18 January 2006 14:28
>To: htm...@li...
>Subject: RE: [Htmlparser-user] Change Attributes of TDs and TRs
>
>Hi,
>
>When I use the method you suggested, the tables nodelist contains nothing.
>Do you have an idea why?
>
>NodeList all_nodes = parser.parse(null);
>NodeFilter all_tables = new NodeClassFilter(TableTag.class);
>NodeList tables = all_nodes.extractAllNodesThatMatch(all_tables);
>
>The list all_nodes contains the whole site but when I use the nodefilter nothing stays in it......
>
>Thanks and best regards
>Michael
>
>-----Original Message-----
>From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
>Sent: Thursday, 12 January 2006 15:46
>To: htm...@li...
>Subject: Re: [Htmlparser-user] Change Attributes of TDs and TRs
>
>Gather all the nodes into a list using no filter:
>    NodeList all_nodes = parser.parse (null);
>
>Then use the table filter on the whole list, process the nodes, and then
>turn it back into a string:
>    NodeList tables = all_nodes.extractAllNodesThatMatch (all_tables);
>    ... process the tables list...
>    System.out.println (all_nodes.toHtml ());
>
>Fuhrmann, Michael wrote:
>
>>Thanks for your support!
>>But actually I don't want to parse the whole thing twice.
>>My problem is that the page I want to parse contains many tables.
>>Unfortunately these tables contain other tables and so on.......
>>Now what I want to do is to change several attributes of the tds and trs for all tables.
>>The aim is to clean up the "dirty" html code in order to generate a pdf finally.
>>My thought was to make a for loop which goes through all table tags.
>>Or do you know a better solution?
>>
>>-----Original Message-----
>>From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
>>Sent: Thursday, 12 January 2006 01:24
>>To: htm...@li...
>>Subject: Re: [Htmlparser-user] Change Attributes of TDs and TRs
>>
>>By the way, after this call:
>>    NodeList list = parser.parse (all_tables);
>>the parser will be at the end of the page and return no more nodes.
>>So, this:
>>    // Separate all table tags
>>    for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
>>        e.nextNode ().collectInto (list,all_tables);
>>doesn't do anything.
>>
>>You can use:
>>    parser.reset ();
>>to start again, if that is what you really want to do, but in your case
>>you would get duplicates of everything.
>>
>>Third Eye wrote:
>>
>>>The TableTag object already has a function to get the rows, and TableRow
>>>has a function to get the columns. You don't need to iterate yourself.
>>>
>>>On 1/11/06, Fuhrmann, Michael <mic...@sa...> wrote:
>>>
>>>>Hi All!
>>>>
>>>>I want to change several attributes of the td and tr tags of certain tables
>>>>but I don't know if I do it the right way.
>>>>The problem is that I find the right table (only tables with ids) but I
>>>>don't reach the td or tr tags….
>>>>My code looks like that:
>>>>
>>>>public void cleanDokument(HttpServletRequest request, HttpServletResponse response) throws IOException
>>>>{
>>>>    // Get the calling HTML document, define the writer and open the connection
>>>>    URLConnection connection;
>>>>    URL request_url = new URL(request.getHeader("referer").toString());
>>>>
>>>>    PrintWriter out = response.getWriter();
>>>>    connection = (HttpURLConnection)request_url.openConnection ();
>>>>
>>>>    try
>>>>    {
>>>>        Parser parser = new Parser ();
>>>>        parser.setConnection(connection);
>>>>
>>>>        NodeFilter all_tables = new TagNameFilter("table");
>>>>        NodeList list = parser.parse (all_tables);
>>>>        Node[] nodelist;
>>>>
>>>>        // Separate all table tags
>>>>        for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
>>>>            e.nextNode ().collectInto (list,all_tables);
>>>>
>>>>        nodelist=list.toNodeArray();
>>>>
>>>>        for (int h=0; h<nodelist.length;h++)
>>>>        {
>>>>            if (nodelist[h] instanceof TableTag)
>>>>            {
>>>>                // for loop over the tds and trs
>>>>                if(((TableTag)nodelist[h]).getAttribute("id")!= null)
>>>>                {
>>>>                    for (int i=0; i<nodelist.length; i++)
>>>>                    {
>>>>                        out.println(nodelist.toString());
>>>>                        if(nodelist[i] instanceof TableRow)
>>>>                        {
>>>>                            out.println("Row found!");
>>>>                            ((TableRow)nodelist[i]).removeAttribute ("nowrap");
>>>>                        }
>>>>                        else if (nodelist[i] instanceof TableColumn)
>>>>                        {
>>>>                            out.println("Column found!");
>>>>                            ((TableColumn)nodelist[i]).removeAttribute ("nowrap");
>>>>                        }
>>>>                    }
>>>>                    out.println(nodelist[h].toHtml());
>>>>                }
>>>>            }
>>>>            else if(nodelist[h] instanceof TableRow || nodelist[h] instanceof TableColumn)
>>>>            {
>>>>                out.println("Else reached!");
>>>>                out.println(((TableRow)nodelist[h]).getText());
>>>>            }
>>>>        }
>>>>        //makePdf(out,response);
>>>>    }
>>>>    catch(Exception e)
>>>>    {
>>>>        out.println("Error while parsing!");
>>>>        e.printStackTrace(out);
>>>>    }
>>>>}
>>>>
>>>>Does my nodelist contain the tr and td tags?
>>>>Is it right to say instanceof TableRow????
>>>>
>>>>Many thanks and best regards
>>>>Michael
>>>
>>>--
>>>Naveen K Kohli
>>>http://www.netomatix.com
>>
>>-------------------------------------------------------
>>This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
>>for problems? Stop! Download the new AJAX search engine that makes
>>searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
>>http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
>>_______________________________________________
>>Htmlparser-user mailing list
>>Htm...@li...
>>https://lists.sourceforge.net/lists/listinfo/htmlparser-user
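
To make the reply at the top of this message concrete, here is a minimal sketch of what a recursive extract produces and how the parent links tell the results apart. It deliberately does not use the htmlparser classes: `Node`, `extractAll`, `nestingDepth` and `sample` are hypothetical stand-ins written only for this illustration. In the library itself, recursion is, as far as I know, requested with the two-argument form `extractAllNodesThatMatch (filter, true)`, and a node's parent is reached with `getParent ()`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a parsed page; these are NOT the htmlparser classes.
class FlattenedExtractSketch {

    /** Hypothetical node: a tag name, a link to the parent, and children. */
    static final class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    /** Recursive extract: every node called `name`, at any depth, in one flat list. */
    static List<Node> extractAll(Node root, String name) {
        List<Node> out = new ArrayList<>();
        collect(root, name, out);
        return out;
    }

    private static void collect(Node node, String name, List<Node> out) {
        if (node.name.equals(name)) {
            out.add(node); // parents and their matching descendants end up side by side
        }
        for (Node child : node.children) {
            collect(child, name, out);
        }
    }

    /** Walk up the parent links and count the enclosing nodes called `name`. */
    static int nestingDepth(Node node, String name) {
        int depth = 0;
        for (Node p = node.parent; p != null; p = p.parent) {
            if (p.name.equals(name)) {
                depth++;
            }
        }
        return depth;
    }

    /** Builds html > table > tr > td > table, i.e. a table nested inside a cell. */
    static Node sample() {
        Node html = new Node("html", null);
        Node outer = new Node("table", html);
        Node row = new Node("tr", outer);
        Node cell = new Node("td", row);
        new Node("table", cell);
        return html;
    }

    public static void main(String[] args) {
        for (Node table : extractAll(sample(), "table")) {
            System.out.println("table nested inside " + nestingDepth(table, "table") + " other table(s)");
        }
    }
}
```

The flat list holds both the outer and the inner table, and only the walk up the parent chain shows which one is nested. A one-argument `extractAllNodesThatMatch (filter)` most likely looks at the top-level nodes only, which would explain the empty `tables` list reported in the quoted message.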