Re: [Htmlparser-user] Change Attributes of TDs and TRs
From: Derrick O. <Der...@Ro...> - 2006-01-12 14:46:26
Gather all the nodes into a list using no filter:

    NodeList all_nodes = parser.parse (null);

Then use the table filter on that whole list, process the nodes, and finally turn the full node list back into a string:

    NodeList tables = all_nodes.extractAllNodesThatMatch (all_tables);
    ... process the tables list ...
    System.out.println (all_nodes.toHtml ());

Fuhrmann, Michael wrote:
>Thanks for your support!
>But actually I don't want to parse the whole thing twice.
>My problem is that the page I want to parse contains many tables.
>Unfortunately these tables contain other tables, and so on...
>Now what I want to do is to change several attributes of the tds and trs for all tables.
>The aim is to clean up the "dirty" HTML code so that I can finally generate a PDF.
>My thought was to write a for loop that goes through all table tags.
>Or do you know a better solution?
>
>-----Original Message-----
>From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
>Sent: Thursday, January 12, 2006 01:24
>To: htm...@li...
>Subject: Re: [Htmlparser-user] Change Attributes of TDs and TRs
>
>By the way, after this call:
>    NodeList list = parser.parse (all_tables);
>the parser will be at the end of the page and return no more nodes.
>So, this:
>    // Separate all table tags
>    for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
>        e.nextNode ().collectInto (list, all_tables);
>doesn't do anything.
>
>You can use:
>    parser.reset ();
>to start again, if that is what you really want to do, but in your case
>you would get duplicates of everything.
>
>
>Third Eye wrote:
>
>>The TableTag object already has a function to get the rows, and TableRow
>>has a function to get the columns. You don't need to iterate yourself.
>>
>>On 1/11/06, Fuhrmann, Michael <mic...@sa...> wrote:
>>
>>>Hi All!
>>>
>>>I want to change several attributes of the td and tr tags of certain tables,
>>>but I don't know if I am doing it the right way.
>>>The problem is that I find the right table (only tables with ids), but I
>>>don't reach the td or tr tags...
>>>My code looks like this:
>>>
>>>public void cleanDokument(HttpServletRequest request, HttpServletResponse response) throws IOException
>>>{
>>>    // Get the calling HTML document, define the Writer and open the connection
>>>    URLConnection connection;
>>>    URL request_url = new URL(request.getHeader("referer").toString());
>>>
>>>    PrintWriter out = response.getWriter();
>>>    connection = (HttpURLConnection)request_url.openConnection ();
>>>
>>>    try
>>>    {
>>>        Parser parser = new Parser ();
>>>        parser.setConnection(connection);
>>>
>>>        NodeFilter all_tables = new TagNameFilter("table");
>>>        NodeList list = parser.parse (all_tables);
>>>        Node[] nodelist;
>>>
>>>        // Separate all table tags
>>>        for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
>>>            e.nextNode ().collectInto (list, all_tables);
>>>
>>>        nodelist = list.toNodeArray();
>>>
>>>        for (int h = 0; h < nodelist.length; h++)
>>>        {
>>>            if (nodelist[h] instanceof TableTag)
>>>            {
>>>                // for loop for the td's and tr's
>>>                if (((TableTag)nodelist[h]).getAttribute("id") != null)
>>>                {
>>>                    for (int i = 0; i < nodelist.length; i++)
>>>                    {
>>>                        out.println(nodelist.toString());
>>>                        if (nodelist[i] instanceof TableRow)
>>>                        {
>>>                            out.println("Row found!");
>>>                            ((TableRow)nodelist[i]).removeAttribute ("nowrap");
>>>                        }
>>>                        else if (nodelist[i] instanceof TableColumn)
>>>                        {
>>>                            out.println("Column found!");
>>>                            ((TableColumn)nodelist[i]).removeAttribute ("nowrap");
>>>                        }
>>>                    }
>>>                    out.println(nodelist[h].toHtml());
>>>                }
>>>            }
>>>            else if (nodelist[h] instanceof TableRow || nodelist[h] instanceof TableColumn)
>>>            {
>>>                out.println("Else branch reached!");
>>>                out.println(((TableRow)nodelist[h]).getText());
>>>            }
>>>        }
>>>        //makePdf(out, response);
>>>    }
>>>    catch (Exception e)
>>>    {
>>>        out.println("Error while parsing!");
>>>        e.printStackTrace(out);
>>>    }
>>>}
>>>
>>>Does my nodelist contain the tr and td tags? Is it right to test
>>>instanceof TableRow?
>>>
>>>Many thanks and best regards
>>>Michael
>>
>>--
>>Naveen K Kohli
>>http://www.netomatix.com
>
>_______________________________________________
>Htmlparser-user mailing list
>Htm...@li...
>https://lists.sourceforge.net/lists/listinfo/htmlparser-user
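The single-pass idea in Derrick's reply (parse the page once, reach the tr and td tags of every table including nested ones, change them in place, then print the whole node list) can be sketched as follows. This is an illustration under stated assumptions, not HTMLParser code: so that it runs with only the JDK, it swaps in the built-in org.w3c.dom API for HTMLParser's NodeList and TableTag classes, which means it only accepts well-formed XHTML, not the "dirty" HTML discussed in this thread. The class TableCleaner and the method stripAttribute are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: uses the JDK DOM API (not HTMLParser) and assumes well-formed XHTML.
public class TableCleaner {

    public static String stripAttribute(String xhtml, String attribute) throws Exception {
        // Parse the page exactly once.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        // getElementsByTagName searches every descendant, so rows and cells
        // of nested tables are found without a second parse or a reset.
        // DOM tag and attribute names are case-sensitive, unlike HTML.
        for (String tag : new String[] {"tr", "td"}) {
            NodeList elements = doc.getElementsByTagName(tag);
            for (int i = 0; i < elements.getLength(); i++) {
                ((Element) elements.item(i)).removeAttribute(attribute);
            }
        }

        // The elements were changed in place, so serializing the whole
        // document yields the cleaned page (the role of all_nodes.toHtml ()).
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<table id=\"outer\"><tr nowrap=\"nowrap\"><td nowrap=\"nowrap\">"
                + "<table><tr><td nowrap=\"nowrap\">inner</td></tr></table>"
                + "</td></tr></table>";
        System.out.println(stripAttribute(page, "nowrap"));
    }
}
```

With HTMLParser itself the same shape would be parser.parse (null) for the full list, extractAllNodesThatMatch (all_tables) for the tables, the TableTag.getRows () and TableRow.getColumns () accessors Third Eye mentions to reach the tr and td tags, removeAttribute ("nowrap") on each, and all_nodes.toHtml () at the end. Because the extracted tags are the same objects that sit in all_nodes, the changes show up in the final output without parsing twice.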