<input> not getting parsed inside table

Help
2003-10-17
  • Kedar Panse

    Kedar Panse - 2003-10-17

    I am using HTML Parser v1.3 to parse HTML.  For some reason, it is ignoring the <input> tags inside a table.  Is there something I am missing?
    The program, sample file, and output are below:

    <pre>
    Java Program:

    import org.htmlparser.Node;
    import org.htmlparser.Parser;
    import org.htmlparser.scanners.*;
    import org.htmlparser.tags.*;
    import org.htmlparser.util.*;
    import java.util.*;

    public class HTMLParserbugtest{
       
       
        private Parser parser;

        public static void main(String[] args){
            if(args.length>0){
            HTMLParserbugtest parsetest=new HTMLParserbugtest();
            parsetest.IntializeParser(args[0]);
                }
                else{
                    System.out.println("Give me something to parse");
                    }
           
            }
       
       
       
       
       
       
       
        //Initialize the html parser   
    private void IntializeParser(String location){
            try{
           
            this.parser = new Parser(location,null);
            this.parser.registerScanners();
            this.extractInputTags();
           
            }
            catch(ParserException pe){
                        pe.printStackTrace(System.err);
                    }
           
        }
       
       
    //extract from the html what we want
    private void extractInputTags(){
           
                   
            try{
                      
                      
                for(NodeIterator nodes= this.parser.elements();nodes.hasMoreNodes();){
                    Node node=nodes.nextNode();
                           
                    if(node instanceof FormTag){
                       
                    FormTag formTag=(FormTag)node;
                        String formName=formTag.getFormName();
                       String formAction=formTag.getFormLocation();

                    //if the form doesn't have a name, assign a placeholder
                    if((formName == null) || (formName.equals(""))){
                        formName="MYFORMWITHNONAMEONIT";
                        }
                    System.out.println("Parsing form "+formName);
                    boolean firstpage=false;
                    NodeList inputtags=formTag.getFormInputs();
                   
                    NodeIterator inputnodes=inputtags.elements();
                    while(inputnodes.hasMoreNodes()){
                        InputTag inputtag=(InputTag)inputnodes.nextNode();
                       
                        Hashtable mytable=inputtag.getAttributes();
                        String name=(String)mytable.get("NAME");
                        String value=(String)mytable.get("VALUE");
                        String type=(String)mytable.get("TYPE");
                       
                        System.out.println("Name: "+name+" Value: "+value);
                        if("hidden".equalsIgnoreCase(type)){ // null-safe: TYPE may be absent
                           
                            if(null != value){
                                    System.out.println("HIDDEN Name: "+name+" Value: "+value);
                                    }
                            }
                        }
                       
                        }
                       
                        if(node instanceof LinkTag){
                            LinkTag linkTag=(LinkTag)node;
                              if(linkTag.isHTTPLikeLink())
                              {
                              String linktext=linkTag.getLinkText();
                              String link=linkTag.getLink();
                            System.out.println("Link text: "+linktext+" Link: "+link);
                           
                              }
                }               
                    }
           
            }//End try
            catch(ParserException pe){
                pe.printStackTrace(System.err);
                }
           
            }
        }
    ===================================
    HTML File:

    <html>
    <body>
    <form action="/cgi-bin/test.pl" method="post">
    <table><tr><td>
    <INPUT type=hidden NAME="test1" VALUE="insidetable">
    </td></tr>
    </table>

    <INPUT type=hidden NAME="Test2" VALUE="outsidetable">
    <INPUT type=hidden name="a" value="b">
    </form>
    </body>
    </html>
    ==================================
    Output:

    Parsing form MYFORMWITHNONAMEONIT
    Name: Test2 Value: outsidetable
    HIDDEN Name: Test2 Value: outsidetable
    Name: a Value: b
    HIDDEN Name: a Value: b

    </pre>


     
    • Derrick Oswald

      Derrick Oswald - 2003-10-17

      Looks like a bug, although the code looks correct.
      The input tags come from a recursive examination of all the children:

              this.formInputList = compositeTagData.getChildren().searchFor(InputTag.class, true);

      That second argument of 'true' says recursive.
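
To make the recursive flag concrete, here is a minimal, self-contained sketch. It is a toy model only: `ToyNode`, `searchFor`, and `buildSampleForm` are hypothetical stand-ins written for this illustration, not the HTML Parser classes, and the tree is built by hand to mirror the sample file (one input nested in a table, two directly under the form). It assumes the parser produced a tree shaped like the markup, which is exactly what seems to be in question here.

```java
import java.util.ArrayList;
import java.util.List;

class RecursiveSearchSketch {

    /** Hypothetical stand-in for a parsed tag; not an htmlparser class. */
    static final class ToyNode {
        final String name;
        final List<ToyNode> children = new ArrayList<>();

        ToyNode(String name, ToyNode... kids) {
            this.name = name;
            for (ToyNode kid : kids) {
                children.add(kid);
            }
        }
    }

    /**
     * Collects descendants of parent whose name matches.
     * With recursive == false only direct children are examined.
     */
    static List<ToyNode> searchFor(ToyNode parent, String name, boolean recursive) {
        List<ToyNode> found = new ArrayList<>();
        for (ToyNode child : parent.children) {
            if (child.name.equals(name)) {
                found.add(child);
            }
            if (recursive) {
                found.addAll(searchFor(child, name, true));
            }
        }
        return found;
    }

    /** Hand-built tree mirroring the sample HTML file in the question. */
    static ToyNode buildSampleForm() {
        return new ToyNode("form",
            new ToyNode("table",
                new ToyNode("tr",
                    new ToyNode("td", new ToyNode("input")))),
            new ToyNode("input"),
            new ToyNode("input"));
    }

    public static void main(String[] args) {
        ToyNode form = buildSampleForm();
        System.out.println("direct children only: " + searchFor(form, "input", false).size());
        System.out.println("recursive: " + searchFor(form, "input", true).size());
    }
}
```

In this model the non-recursive search finds only the two inputs outside the table, the same two that show up in the reported output, while the recursive search finds all three. That is why the missing input is surprising when the real call passes 'true'.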

      In most of these anomalous cases I find the table is not correctly formed, so a row or column has consumed too little or too much. This case looks too simple for that to have happened.

      As a workaround, try using just a bald "Node [] list = parser.extractAllNodesThatAre(InputTag.class);".

      In any case it looks like you should file a bug report.

       
