I need to modify the source code slightly so that <p>, <i> and <b> tags become the parent of another tag if they come before it. Also, I want to treat the double quote " and single quote ' characters as tags, so that anything written in quotes becomes the child of a new tag, something like <dquote> or <squote>.
Which classes should I modify? Where is the hierarchy stored in the classes? Which class states that <p> does not contain (and cannot be the parent of) any other tag?
Thanks,
Kemal
If I understand your requirements correctly, you would need to create three classes for P, I and B that extend CompositeTag (this is the base class of tags that contain other tags). You would then register instances of these with a node factory as shown here: http://htmlparser.sourceforge.net/wiki/index.php/CustomTagLinks
i.e.
PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
factory.registerTag (new MyPTag ());
factory.registerTag (new MyITag ());
factory.registerTag (new MyBTag ());
parser.setNodeFactory (factory);
This may not yield satisfactory results, though, since these tags are often not terminated the way the CompositeTagScanner expects. You may want to experiment with the enders and end tag enders lists (what the tag returns from getEnders () and getEndTagEnders ()) a bit.
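To make the role of the two lists a little more concrete, here is a toy model in plain Java. It is not HTML Parser's scanner, just a sketch of the idea: an open <p> is closed implicitly when an opening tag from its enders list, or an end tag from its end tag enders list, shows up. The class EnderDemo, its token format and the contents of the lists are all made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only (hypothetical names); it does not use the HTML Parser library.
public class EnderDemo {
    public static final class Node {
        public final String name;
        public final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        @Override public String toString() {
            return children.isEmpty() ? name : name + children;
        }
    }

    // Tags that may contain children in this toy model.
    static final Set<String> COMPOSITE =
        new HashSet<>(Arrays.asList("BODY", "DIV", "P", "B", "I"));
    // Assumed lists for P: opening tags and end tags that implicitly close it.
    static final Set<String> P_ENDERS =
        new HashSet<>(Arrays.asList("P", "DIV"));
    static final Set<String> P_END_TAG_ENDERS =
        new HashSet<>(Arrays.asList("BODY", "HTML"));

    // Tokens are upper-case tag names; "/NAME" is an end tag, "#text" is a leaf.
    public static Node build(String... tokens) {
        Node root = new Node("ROOT");
        Deque<Node> open = new ArrayDeque<>();
        open.push(root);
        for (String token : tokens) {
            boolean isEnd = token.startsWith("/");
            String name = isEnd ? token.substring(1) : token;
            // Implicitly close an open P when one of its enders arrives.
            boolean closesP = isEnd ? P_END_TAG_ENDERS.contains(name)
                                    : P_ENDERS.contains(name);
            if (open.peek().name.equals("P") && closesP) {
                open.pop();
            }
            if (isEnd) {
                // Explicit end tag: close the innermost tag if it matches.
                if (open.peek().name.equals(name)) {
                    open.pop();
                }
            } else {
                Node node = new Node(name);
                open.peek().children.add(node);
                if (COMPOSITE.contains(name)) {
                    open.push(node); // later nodes become its children
                }
            }
        }
        return root;
    }
}
```

With these lists, EnderDemo.build ("P", "#text", "P", "B", "/B", "/P") gives ROOT[P[#text], P[B]]: the second P closes the first one, while B is not an ender and so becomes a child of P. Taking a name out of the enders list is what lets that tag nest inside the paragraph instead of ending it.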
The quotes are a bit more problematic. You can try modifying the CompositeTagScanner: where it would normally return a string node, examine the text for quotes. If one is found, truncate the string node to the part of the string before the quote (node.setEndPosition ()) and back the lexer up to the quote's start position using lexer.setPosition ().
Then, on the next call, the scanner would normally return a string that starts with a quote (this is what you've just set up by doing the above). Replace that node with one of your own choosing, such as <DQUOTE>, containing the quoted string, and adjust the string and lexer to the end of the quote, as you did for the start of the quote.
Without actually doing it for you I can't be more specific.
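For what it's worth, the splitting logic on its own can be sketched in plain Java without touching the parser. This is only an illustration of where the cut points fall, not a patch to CompositeTagScanner; the class QuoteSplitter and the segment kinds TEXT, DQUOTE and SQUOTE are made-up names. It also assumes a quote with no matching close is ordinary text, so a stray apostrophe (as in "don't") turns the rest of the string into plain text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a run of text into plain and quoted segments.
public class QuoteSplitter {
    public static final class Segment {
        public final String kind; // "TEXT", "DQUOTE" or "SQUOTE"
        public final String text; // segment content, without the quote characters
        public Segment(String kind, String text) {
            this.kind = kind;
            this.text = text;
        }
        @Override public String toString() {
            return kind + "[" + text + "]";
        }
    }

    public static List<Segment> split(String text) {
        List<Segment> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int open = nextQuote(text, pos);
            // No quote left, or no matching close: the rest is plain text.
            int close = open < 0 ? -1 : text.indexOf(text.charAt(open), open + 1);
            if (close < 0) {
                out.add(new Segment("TEXT", text.substring(pos)));
                break;
            }
            // Text before the quote corresponds to the shortened string node.
            if (open > pos) {
                out.add(new Segment("TEXT", text.substring(pos, open)));
            }
            // The quoted part corresponds to the custom quote node.
            String kind = text.charAt(open) == '"' ? "DQUOTE" : "SQUOTE";
            out.add(new Segment(kind, text.substring(open + 1, close)));
            // Resume after the closing quote, like repositioning the lexer.
            pos = close + 1;
        }
        return out;
    }

    // Index of the next single or double quote at or after 'from', or -1.
    private static int nextQuote(String text, int from) {
        for (int i = from; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"' || c == '\'') {
                return i;
            }
        }
        return -1;
    }
}
```

Inside the scanner, the same positions would be the values passed to node.setEndPosition () and lexer.setPosition () as described above.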
Hi,
I'm trying to create a new tag for
<p>
so that I can extract it properly.
I understand that this new tag needs to extend
the CompositeTag class since it has an end tag.
So what I did was:
public class PTag extends CompositeTag
{
}
What are the methods I need to implement in this class
so that it becomes a <p> tag?
Thanks
See other examples in the tags package, but mostly:
public String[] getIds ()