Re: [Htmlparser-user] Change Attributes of TDs and TRs


Gather all the nodes into a list using no filter:
   NodeList all_nodes = parser.parse (null);

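Then apply the table filter to the whole list, process the matching
nodes, and turn the full list back into a string. The extracted nodes
are the same objects that all_nodes holds, so your changes show up in
the output: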
   NodeList tables = all_nodes.extractAllNodesThatMatch (all_tables);
    ... process the tables list...
   System.out.println (all_nodes.toHtml ());
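
To make the "same objects" point concrete, here is a small sketch of the idea. It deliberately does not use the HTMLParser classes: Tag, collect and cleanedPage are hypothetical stand-ins written only for this illustration. It collects tr and td nodes by reference from a page with a nested table, removes nowrap from them, and then serializes the whole page once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in model, NOT the HTMLParser API: a tag with attributes and children.
class TableCleanupSketch {

    static final class Tag {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Tag> children = new ArrayList<>();

        Tag(String name) { this.name = name; }

        Tag attr(String key, String value) { attributes.put(key, value); return this; }

        Tag add(Tag child) { children.add(child); return this; }

        // Serialize this tag and everything below it.
        String toHtml() {
            StringBuilder sb = new StringBuilder("<").append(name);
            attributes.forEach((k, v) ->
                    sb.append(' ').append(k).append("=\"").append(v).append('"'));
            sb.append('>');
            for (Tag child : children) sb.append(child.toHtml());
            return sb.append("</").append(name).append('>').toString();
        }
    }

    // Collect every tag with the given name, at any depth, by reference (no copies).
    static void collect(Tag node, String name, List<Tag> out) {
        if (node.name.equals(name)) out.add(node);
        for (Tag child : node.children) collect(child, name, out);
    }

    // Build a page with a nested table, strip "nowrap" from all tr/td tags,
    // and return the HTML of the whole page.
    static String cleanedPage() {
        Tag inner = new Tag("table")
                .add(new Tag("tr").add(new Tag("td").attr("nowrap", "nowrap")));
        Tag outer = new Tag("table").attr("id", "t1")
                .add(new Tag("tr").attr("nowrap", "nowrap")
                        .add(new Tag("td").add(inner)));
        Tag page = new Tag("body").add(outer);

        List<Tag> cells = new ArrayList<>();
        collect(page, "tr", cells);
        collect(page, "td", cells);
        // Editing the collected tags edits the page, because they are the same objects.
        for (Tag t : cells) t.attributes.remove("nowrap");

        return page.toHtml();
    }

    public static void main(String[] args) {
        System.out.println(cleanedPage());
    }
}
```

Note that the sketch collects at any depth, which is what nested tables need. With the library itself, if memory serves, the one-argument extractAllNodesThatMatch only looks at the top level of the list, and there is a two-argument overload that takes a recursive flag; please check the NodeList Javadoc before relying on that.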

Fuhrmann, Michael wrote:

>Thanks for your support!
>But actually I don't want to parse the whole thing twice.
>My problem is that the page I want to parse contains many tables.
>Unfortunately these tables contain other tables, and so on.
>What I want to do is change several attributes of the tds and trs in all tables.
>The aim is to clean up the "dirty" HTML code in order to finally generate a PDF.
>My thought was to write a for loop that goes through all table tags.
>Or do you know a better solution?
>
>-----Original Message-----
>From: htm...@li... [mailto:htm...@li...] On Behalf Of Derrick Oswald
>Sent: Thursday, 12 January 2006 01:24
>To: htm...@li...
>Subject: Re: [Htmlparser-user] Change Attributes of TDs and TRs
>
>By the way, after this call: 
>  NodeList list = parser.parse (all_tables);
>the parser will be at the end of the page and return no more nodes.
>So, this:
>  // Separate all table tags
>  for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
>      e.nextNode ().collectInto (list, all_tables);
>doesn't do anything.
>
>You can use:
>  parser.reset ();
>to start again, if that is what you really want to do, but in your case 
>you would get duplicates of everything.
>
>
>Third Eye wrote:
>
>  
>
>>The TableTag object already has a function to get the rows, and TableRow
>>has a function to get the columns. You don't need to iterate yourself.
>>
>>On 1/11/06, Fuhrmann, Michael <mic...@sa...> wrote:
>> 
>>
>>    
>>
>>>Hi All!
>>>
>>>I want to change several attributes of the td and tr tags of certain tables,
>>>but I don't know if I am doing it the right way.
>>>The problem is that I find the right tables (only tables with ids), but I
>>>don't reach the td or tr tags.
>>>My code looks like this:
>>>
>>>public void cleanDokument(HttpServletRequest request, HttpServletResponse response) throws IOException
>>>{
>>>    // Get the calling HTML document, define the Writer and open the connection
>>>    URLConnection connection;
>>>    URL request_url = new URL(request.getHeader("referer").toString());
>>>
>>>    PrintWriter out = response.getWriter();
>>>    connection = (HttpURLConnection)request_url.openConnection ();
>>>
>>>    try
>>>    {
>>>        Parser parser = new Parser ();
>>>        parser.setConnection(connection);
>>>
>>>        NodeFilter all_tables = new TagNameFilter("table");
>>>        NodeList list = parser.parse (all_tables);
>>>        Node[] nodelist;
>>>
>>>        // Separate all table tags
>>>        for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
>>>            e.nextNode ().collectInto (list,all_tables);
>>>
>>>        nodelist=list.toNodeArray();
>>>
>>>        for (int h=0; h<nodelist.length;h++)
>>>        {
>>>            if (nodelist[h] instanceof TableTag)
>>>            {
>>>                // for loop for the tds and trs
>>>                if(((TableTag)nodelist[h]).getAttribute("id")!= null)
>>>                {
>>>                    for (int i=0; i<nodelist.length; i++)
>>>                    {
>>>                        out.println(nodelist.toString());
>>>                        if(nodelist[i] instanceof TableRow)
>>>                        {
>>>                            out.println("Row found!");
>>>                            ((TableRow)nodelist[i]).removeAttribute ("nowrap");
>>>                        }
>>>                        else if (nodelist[i] instanceof TableColumn)
>>>                        {
>>>                            out.println("Column found!");
>>>                            ((TableColumn)nodelist[i]).removeAttribute ("nowrap");
>>>                        }
>>>                    }
>>>                    out.println(nodelist[h].toHtml());
>>>                }
>>>            }
>>>            else if(nodelist[h] instanceof TableRow || nodelist[h] instanceof TableColumn)
>>>            {
>>>                out.println("Else reached!");
>>>                out.println(((TableRow)nodelist[h]).getText());
>>>            }
>>>        }
>>>        //makePdf(out,response);
>>>    }
>>>    catch(Exception e)
>>>    {
>>>        out.println("Error while parsing!");
>>>        e.printStackTrace(out);
>>>    }
>>>}
>>>
>>>Does my nodelist contain the tr and td tags? Is it right to test with
>>>instanceof TableRow?
>>>
>>>Many thanks and best regards
>>>Michael
>>>   
>>>
>>>      
>>>
>>--
>>Naveen K Kohli
>>http://www.netomatix.com