Hey,
I'm using htmlparser to cut newspaper articles out of web pages.
But I ran into a problem using collectInto() of the Tag class.
All nodes in the resulting NodeList are childless!
Why do the nodes lose their children? Is it a bug or a feature?
All I want is to pick out some <p> tags and get their complete text content.
But without children there is no text in them.
I hope someone can help me.
Thanks