
Parsing PHP

octav
2003-10-27
  • octav

    octav - 2003-10-27

    I need to parse some PHP files, so I wrote a scanner for PHP:
    import org.htmlparser.tags.Tag;
    import org.htmlparser.tags.data.TagData;
    import org.htmlparser.util.ParserException;
    import org.htmlparser.scanners.TagScanner;

    public class PhpScanner extends TagScanner {

        public PhpScanner() {
            super();
        }

        public PhpScanner(String filter) {
            super(filter);
        }

        public String [] getID() {
            String [] ids = new String[3];
            ids[0] = "?";
            ids[1] = "?=";
            ids[2] = "?php";
            return ids;
        }

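        protected Tag createTag(TagData tagData, Tag tag, String url)
            throws ParserException {
            // The contents of a PHP tag begin and end with '?' (e.g. "?php echo 1; ?");
            // strip both so that PhpTag.toHtml() can put the delimiters back.
            String tagContents = tagData.getTagContents();
            tagData.setTagContents(tagContents.substring(1, tagContents.length() - 1));
            return new PhpTag(tagData);
        }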

    }

    and a PhpTag class:

    import org.htmlparser.tags.Tag;
    import org.htmlparser.tags.data.TagData;

    public class PhpTag extends Tag {

        public PhpTag(TagData tagData) {
            super(tagData);
        }

        // Restore the PHP delimiters that PhpScanner stripped from the contents.
        public String toHtml() {
            return "<?" + tagContents + "?>";
        }

        public String toString() {
            return "Php Tag : " + tagContents + "; begins at : " + elementBegin() + "; ends at : " + elementEnd();
        }
    }
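A standalone sketch (it does not use the htmlparser library) of the round trip the two classes rely on. It assumes, as the substring call in createTag implies, that the tag contents handed to the scanner are everything between '<' and '>', so "<?php echo 1; ?>" arrives as "?php echo 1; ?". The class and method names are hypothetical.

```java
// Hypothetical illustration of PhpScanner.createTag + PhpTag.toHtml; not htmlparser code.
public class PhpContentsDemo {

    // Same operation as in PhpScanner.createTag: drop the leading and trailing '?'.
    // Assumption: contents look like "?php ... ?" (everything between '<' and '>').
    static String stripQuestionMarks(String tagContents) {
        return tagContents.substring(1, tagContents.length() - 1);
    }

    // Mirrors PhpTag.toHtml(): wrap the stripped contents in PHP delimiters again.
    static String toHtml(String stripped) {
        return "<?" + stripped + "?>";
    }

    public static void main(String[] args) {
        String contents = "?php echo 1; ?";
        String stripped = stripQuestionMarks(contents);
        System.out.println("[" + stripped + "]");  // [php echo 1; ]
        System.out.println(toHtml(stripped));      // <?php echo 1; ?>
    }
}
```

Stripping in the scanner and re-adding in toHtml() means the original markup is reproduced unchanged, while the tag object only holds the PHP code itself.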

    Then I tried to register the PHP scanner, without success: every PHP node in the document came back as a StringNode.
    I don't know whether this is a bug, but the only workaround I found was to modify NodeReader.java, changing

    if ('/' == ch || '%' == ch || Character.isLetter (ch) || '!' == ch)

    to

    if ('/' == ch || '%' == ch || '?' == ch || Character.isLetter (ch) || '!' == ch)

    I made the same change to StringParser.java, in the same beginTag method, and now it looks OK: I can register my PhpScanner.
    Could all this be done in another way?
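To show why the one-character change matters, here is a minimal standalone sketch, not the actual NodeReader code. It assumes the reader, after seeing '<', inspects the next character and only treats the input as a tag if that character passes the condition; otherwise the text becomes a StringNode. The two methods copy the condition before and after the change; their names are hypothetical.

```java
// Hypothetical sketch of the tag-start check; the conditions are copied from the post.
public class TagStartCheck {

    // Condition before the change: '?' is not accepted, so "<?..." is read as text.
    static boolean startsTagOriginal(char ch) {
        return '/' == ch || '%' == ch || Character.isLetter(ch) || '!' == ch;
    }

    // Condition after the change: '?' is accepted, so "<?..." can start a tag.
    static boolean startsTagPatched(char ch) {
        return '/' == ch || '%' == ch || '?' == ch || Character.isLetter(ch) || '!' == ch;
    }

    public static void main(String[] args) {
        String input = "<?php echo 1; ?>";
        // Assumption: the decision is made on the character right after '<'.
        char next = input.charAt(1);
        System.out.println("original: " + (startsTagOriginal(next) ? "tag" : "text")); // original: text
        System.out.println("patched:  " + (startsTagPatched(next) ? "tag" : "text"));  // patched:  tag
    }
}
```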

     
    • Derrick Oswald

      Derrick Oswald - 2003-10-27

      I think that's the only way. The low-level NodeReader has to recognize the input as a tag before your scanner ever sees it.

       
    • octav

      octav - 2003-10-27

      Thanks for the answer. Given the large community of PHP developers, I think the ability to parse PHP should be included by default. Anyway, good job; I'm happy I found this parser.

       
    • octav

      octav - 2003-10-27

      I still have a problem: I cannot parse "<?php". It looks like the scanner hashmap is somehow altered and "<?php" is no longer matched to a scanner. I don't know what is happening. Can someone help? I really don't have much time for this one. Thanks.

       
    • octav

      octav - 2003-10-27

      Me again :) Every new scanner has to return its IDs in uppercase. I didn't know that; now everything is OK:
      public String [] getID() {
          String [] ids = new String[3];
          ids[0] = "?";
          ids[1] = "?=";
          ids[2] = "?PHP"; // "?php" is wrong: IDs must be uppercase
          return ids;
      }
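One plausible explanation, sketched standalone below: if the parser uppercases the tag name it reads before looking it up in its scanner map, an ID registered as "?php" can never match, while "?" and "?=" are unaffected because they contain no letters. The uppercasing lookup is an assumption inferred from the behaviour described in this thread, not taken from the htmlparser source; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical scanner registry; not the htmlparser implementation.
public class ScannerRegistrySketch {

    private final Map<String, String> scanners = new HashMap<>();

    // Registers the scanner under each ID exactly as returned by getID().
    void register(String scannerName, String[] ids) {
        for (String id : ids) {
            scanners.put(id, scannerName);
        }
    }

    // Assumption: the tag name read from the document is uppercased before the lookup.
    String lookup(String tagName) {
        return scanners.get(tagName.toUpperCase(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        ScannerRegistrySketch lower = new ScannerRegistrySketch();
        lower.register("PhpScanner", new String[] {"?", "?=", "?php"});
        System.out.println(lower.lookup("?php")); // null: "?PHP" is not a key

        ScannerRegistrySketch upper = new ScannerRegistrySketch();
        upper.register("PhpScanner", new String[] {"?", "?=", "?PHP"});
        System.out.println(upper.lookup("?php")); // PhpScanner
    }
}
```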

       
