HTML Parser / Discussion / Open Discussion: Parsing php

Parsing php

Forum: Open Discussion

Creator: octav

Created: 2003-10-27

Updated: 2003-10-27

octav - 2003-10-27

I need to parse some PHP files, so I wrote a scanner for PHP:
import org.htmlparser.tags.Tag;
import org.htmlparser.tags.data.TagData;
import org.htmlparser.util.ParserException;
import org.htmlparser.scanners.TagScanner;

public class PhpScanner extends TagScanner {

    public PhpScanner() {
        super();
    }

    public PhpScanner(String filter) {
        super(filter);
    }

    public String [] getID() {
        String [] ids = new String[3];
        ids[0] = "?";
        ids[1] = "?=";
        ids[2] = "?php";
        return ids;
    }

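    protected Tag createTag(TagData tagData, Tag tag, String url)
        throws ParserException {
        // Assuming the contents are the text between '<' and '>' (e.g. "?php ... ?"),
        // strip the leading and trailing '?'; PhpTag.toHtml() puts them back.
        String tagContents = tagData.getTagContents();
        tagData.setTagContents(tagContents.substring(1, tagContents.length() - 1));
        return new PhpTag(tagData);
    }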

}

and a PhpTag class:

import org.htmlparser.tags.Tag;
import org.htmlparser.tags.data.TagData;

public class PhpTag extends Tag
{
    public PhpTag(TagData tagData)
    {
        super(tagData);
    }

    // Restore the PHP delimiters that PhpScanner stripped from the contents.
    public String toHtml()
    {
        return "<?" + tagContents + "?>";
    }

    public String toString()
    {
        return "Php Tag : " + tagContents + "; begins at : " + elementBegin() + "; ends at : " + elementEnd();
    }
}
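To make the trimming in createTag and the reconstruction in toHtml concrete, here is a small standalone sketch. It assumes, as the code above implies, that the raw tag contents are the text between '<' and '>', so for <?php echo $x; ?> they are "?php echo $x; ?". The class and method names are hypothetical and do not use the HTML Parser API.

```java
public class PhpContentsDemo {

    // Mirrors PhpScanner.createTag: drop the leading '?' and the trailing '?'.
    static String stripDelimiters(String rawContents) {
        return rawContents.substring(1, rawContents.length() - 1);
    }

    // Mirrors PhpTag.toHtml: put the PHP delimiters back around the stored contents.
    static String toHtml(String tagContents) {
        return "<?" + tagContents + "?>";
    }

    public static void main(String[] args) {
        String raw = "?php echo $x; ?";   // assumed raw contents of <?php echo $x; ?>
        String stored = stripDelimiters(raw);
        System.out.println("[" + stored + "]");  // prints [php echo $x; ]
        System.out.println(toHtml(stored));      // prints <?php echo $x; ?>
    }
}
```

The round trip is only lossless when the contents really end with '?'; a PHP block missing its closing "?>" would lose its last character.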

Then I tried to register the PHP scanner, without success: every PHP node in the document was returned as a StringNode.
I don't know whether this is a bug, but the only workaround I found was to modify NodeReader.java, changing
if ('/' == ch || '%' == ch || Character.isLetter (ch) || '!' == ch)
to
if ('/' == ch || '%' == ch || '?' == ch || Character.isLetter (ch) || '!' == ch)
I made the same change in the beginTag method of StringParser.java. Now it seems to work, and I can register my PhpScanner.
Could this be done some other way?
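The workaround works because the low-level reader decides whether a '<' opens a tag by looking at the character that follows it; without '?' in that test, <?php ... ?> is read as plain text and becomes a StringNode before any scanner is consulted. The sketch below reproduces only that decision in isolation. The class and method names are hypothetical; only the two conditions come from the NodeReader.java lines quoted above.

```java
public class TagStartCheck {

    // Original condition: '<' starts a tag only if followed by '/', '%', a letter or '!'.
    static boolean startsTagOriginal(char ch) {
        return '/' == ch || '%' == ch || Character.isLetter(ch) || '!' == ch;
    }

    // Patched condition: '?' is accepted too, so "<?", "<?=" and "<?php" are read as tags.
    static boolean startsTagPatched(char ch) {
        return '/' == ch || '%' == ch || '?' == ch || Character.isLetter(ch) || '!' == ch;
    }

    // Character immediately after the first '<' (assumes the input contains "<" plus one char).
    static char afterLessThan(String html) {
        return html.charAt(html.indexOf('<') + 1);
    }

    public static void main(String[] args) {
        String[] samples = { "<?php echo 1; ?>", "<?= $x ?>", "<p>", "< 3" };
        for (String s : samples) {
            char ch = afterLessThan(s);
            System.out.println(s + " -> original: " + startsTagOriginal(ch)
                    + ", patched: " + startsTagPatched(ch));
        }
    }
}
```

Only the PHP samples change outcome; ordinary tags and a bare "< 3" in text are treated exactly as before.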

- Derrick Oswald - 2003-10-27
  
  I think that's the only way. The low-level NodeReader has to recognize it as a tag before your scanner ever sees it.
  
- octav - 2003-10-27
  
  Thanks for the answer. Given the large community of PHP developers, I think PHP parsing should be supported by default. Anyway, good job; I'm happy I found this parser.
  
- octav - 2003-10-27
  
  I still have a problem: I cannot parse "<?php". It looks like the scanner hashmap is somehow altered, and "<?php" is no longer recognized by any scanner. I don't know what is happening. Can someone help? I really don't have much time for this one. Thanks.
  
- octav - 2003-10-27
  
  Me again :) Each new scanner has to return its IDs in uppercase. I didn't know that; now everything is OK, so:
  public String [] getID() {
      String [] ids = new String[3];
      ids[0] = "?";
      ids[1] = "?=";
      ids[2] = "?PHP"; // "?php" does not work
      return ids;
  }
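The uppercase requirement suggests that the parser upper-cases the tag name before looking it up in the scanner hashmap, so an ID registered as "?php" is never matched. That normalization step is inferred from the fix above, not confirmed in this thread. The sketch below models it with a plain HashMap; the class and method names are hypothetical and are not the HTML Parser API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ScannerLookup {

    // Maps each registered ID, stored exactly as given, to a scanner name.
    private final Map<String, String> scanners = new HashMap<>();

    void register(String scannerName, String... ids) {
        for (String id : ids) {
            scanners.put(id, scannerName);
        }
    }

    // Assumed lookup: take the first token of the tag contents and upper-case it.
    String find(String tagContents) {
        String name = tagContents.trim().split("\\s+", 2)[0].toUpperCase(Locale.ENGLISH);
        return scanners.get(name);
    }

    public static void main(String[] args) {
        ScannerLookup lower = new ScannerLookup();
        lower.register("PhpScanner", "?", "?=", "?php");
        System.out.println(lower.find("?php echo 1; ?"));  // null: key "?PHP" is missing

        ScannerLookup upper = new ScannerLookup();
        upper.register("PhpScanner", "?", "?=", "?PHP");
        System.out.println(upper.find("?php echo 1; ?"));  // PhpScanner
    }
}
```

Under this model, "?" and "?=" worked from the start because upper-casing leaves them unchanged, which matches the symptom that only "<?php" failed.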
  