How can I detect whether the text being read is inside an h1..h6 tag? I'm currently writing a search engine (on top of Lucene) and I need to access those fields. Text inside h1..h6 tags is usually more important than body text, so knowing which text sits in a heading lets the search engine adjust its results.
If you have the text node and want to find out whether it's enclosed in a <H1></H1> tag, you could walk up the parent chain checking for the H1 tag, something like this:
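import org.htmlparser.Node;
import org.htmlparser.Tag;
Node node; // the node you are checking, e.g. a TextNode
boolean inH1 = false;
// getParent() returns a Node, so make sure it is a Tag before asking for its name
Node parent = node.getParent ();
while (!inH1 && (null != parent))
    if ((parent instanceof Tag) && "H1".equals (((Tag)parent).getTagName ())) // getTagName() is upper case
        inH1 = true;
    else
        parent = parent.getParent ();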
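The question asks about h1..h6, not only H1. As a minimal sketch of the same parent walk generalized to all six heading levels, here is a self-contained version that returns the level of the nearest enclosing heading (0 if there is none). HeadingCheck, SimpleNode and headingLevel are hypothetical names: SimpleNode only mimics the getParent()/tag-name information the walk needs and is not the HTML Parser API, so in real code you would use Node/Tag with the instanceof check shown above.

```java
import java.util.regex.Pattern;

// Sketch only: SimpleNode is a hypothetical stand-in for a parser node,
// NOT org.htmlparser.Node. It models just the tag name and the parent link.
class HeadingCheck {

    static final class SimpleNode {
        final String tagName;    // null for text nodes
        final SimpleNode parent; // null for the root

        SimpleNode(String tagName, SimpleNode parent) {
            this.tagName = tagName;
            this.parent = parent;
        }
    }

    // Matches H1..H6 in either letter case.
    private static final Pattern HEADING = Pattern.compile("[Hh][1-6]");

    // Walks up the parent chain and returns the level (1..6) of the
    // nearest enclosing heading, or 0 if the node is not inside one.
    static int headingLevel(SimpleNode node) {
        for (SimpleNode p = node.parent; p != null; p = p.parent) {
            if (p.tagName != null && HEADING.matcher(p.tagName).matches()) {
                return p.tagName.charAt(1) - '0';
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Models: <body><h2><b>Title</b></h2><p>Body text</p></body>
        SimpleNode body = new SimpleNode("BODY", null);
        SimpleNode h2 = new SimpleNode("H2", body);
        SimpleNode bold = new SimpleNode("B", h2);
        SimpleNode title = new SimpleNode(null, bold);
        SimpleNode para = new SimpleNode("P", body);
        SimpleNode text = new SimpleNode(null, para);

        System.out.println(headingLevel(title)); // prints 2
        System.out.println(headingLevel(text));  // prints 0
    }
}
```

Returning the level rather than a boolean means the indexer can, if you want, weight H1 text above H6 text instead of treating all headings the same.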
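If you don't have the node yet and want to get everything enclosed in a <H1></H1> tag, use a filter, something like this (passing true makes HasParentFilter check every ancestor, not just the direct parent, so text inside e.g. <H1><B>...</B></H1> is found too):
NodeList list = parser.extractAllNodesThatMatch (new HasParentFilter (new TagNameFilter ("H1"), true));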
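The filter approach can cover all six headings as well: in HTML Parser you would combine six TagNameFilters with an OrFilter and wrap that in a recursive HasParentFilter. To keep the example runnable without the library, the sketch below models the same composition with java.util.function.Predicate over a hypothetical SimpleNode tree. HeadingFilterSketch, SimpleNode, IS_HEADING, hasAncestor, extractAll and headingText are illustrative names that only play the roles of OrFilter, HasParentFilter and extractAllNodesThatMatch; they are not those classes' real implementations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch only: SimpleNode is a hypothetical stand-in, NOT the org.htmlparser API.
class HeadingFilterSketch {

    static final class SimpleNode {
        final String tagName; // null for text nodes
        final String text;    // non-null only for text nodes
        final SimpleNode parent;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String tagName, String text, SimpleNode parent) {
            this.tagName = tagName;
            this.text = text;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this); // build the tree as nodes are created
            }
        }
    }

    // Plays the role of an OrFilter over TagNameFilter("H1") .. TagNameFilter("H6").
    static final Predicate<SimpleNode> IS_HEADING =
            n -> n.tagName != null && n.tagName.matches("(?i)h[1-6]");

    // Plays the role of a recursive HasParentFilter: true if any ancestor matches.
    static Predicate<SimpleNode> hasAncestor(Predicate<SimpleNode> test) {
        return n -> {
            for (SimpleNode p = n.parent; p != null; p = p.parent) {
                if (test.test(p)) {
                    return true;
                }
            }
            return false;
        };
    }

    // Depth-first collection of every node accepted by the filter,
    // analogous to extractAllNodesThatMatch.
    static List<SimpleNode> extractAll(SimpleNode root, Predicate<SimpleNode> filter) {
        List<SimpleNode> out = new ArrayList<>();
        collect(root, filter, out);
        return out;
    }

    private static void collect(SimpleNode n, Predicate<SimpleNode> filter, List<SimpleNode> out) {
        if (filter.test(n)) {
            out.add(n);
        }
        for (SimpleNode c : n.children) {
            collect(c, filter, out);
        }
    }

    // Text of every text node enclosed, at any depth, in H1..H6.
    static List<String> headingText(SimpleNode root) {
        Predicate<SimpleNode> isText = n -> n.text != null;
        List<String> result = new ArrayList<>();
        for (SimpleNode n : extractAll(root, isText.and(hasAncestor(IS_HEADING)))) {
            result.add(n.text);
        }
        return result;
    }

    public static void main(String[] args) {
        // Models: <body><h1>Hello <b>World</b></h1><p>Body</p></body>
        SimpleNode body = new SimpleNode("BODY", null, null);
        SimpleNode h1 = new SimpleNode("H1", null, body);
        new SimpleNode(null, "Hello ", h1);
        SimpleNode b = new SimpleNode("B", null, h1);
        new SimpleNode(null, "World", b);
        SimpleNode p = new SimpleNode("P", null, body);
        new SimpleNode(null, "Body", p);

        System.out.println(headingText(body)); // prints [Hello , World]
    }
}
```

Because the ancestor check is recursive, "World" is found even though its direct parent is <b>; a direct-parent-only check would miss it.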