Thomas Sparber

Introduction

I think the best way to explain a library is by using an example:

~~~~~~~~

#include <htmlparser.h>
#include <htmlelement.h>
#include <string>
#include <iostream>

using namespace std;

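// Deliberately sloppy HTML: an unclosed <p>, a mismatched closing tag,
// and a CDATA section containing characters that are special in HTML
const string page =
"<html>"
" <head>"
"  <title>Testpage</title>"
" </head>"
" <body>"
"  <h1>Big title</h1>"
"  <p>Here I forgot the closing tag"
"  <p>And here I used the wrong closing type</div>"
"  <p>Very special characters: <![CDATA[<>&!]]><br/>new line</p>"
"  <p>This should be displayed inline.</p>"
" </body>"
"</html>";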

// Filter passed to getFormattedString: only <p> elements are accepted
bool checkElement(const HTMLelement &e)
{
    return e.name == "p";
}

int main()
{
    HTMLparser p;
    HTMLelement e;
    p.parse(page, e);

    // getFormattedString removes all the tags
    // and represents the page as plain text
    cout << e.getFormattedString() << endl;

    // It is also possible to print only certain elements
    cout << e.getFormattedString(&checkElement) << endl;

    // createHTML serializes the parsed tree back to HTML
    cout << e.createHTML() << endl;

    return 0;
}
~~~~~~~~

Output:

~~~~~~~
Testpage

Big title
Here I forgot the closing tag
And here I used the wrong closing type
Very special characters: <>&!
new line
This should be displayed inline.

Here I forgot the
And here I used the wrong closing type
Very special characters: <>&!
new line
should be displayed inline.

<html> <head> Testpage </head> <body>

Big title

Here I forgot the closing tag

And here I used the wrong closing type

Very special characters: <![CDATA[<>&!]]>

new line

This should be displayed inline.

</body></html>
~~~~~~~

Working with HTMLelement

If you look at the header file htmlelement.h, you can see that HTMLelement basically provides five properties:

  • type: An enum with one of the following values:
    • tag: The HTMLelement is a normal tag such as <div>, <a>, <script>, <h1>, ...
    • text: The HTMLelement is a piece of text; content holds the text value.
    • cdata: Similar to text, except that it can contain characters that are only allowed inside a CDATA section (such as < and &).
  • name: A string with the name of the tag. This is only valid if type == tag.
  • metaData: A std::map of strings holding the tag's attributes. For example, for <a href="..."> metaData contains the href.
  • content: As described above, this string holds the text if type is text or cdata.
  • children: A std::list containing the element's child HTMLelements.

Yes, it really is so simple! :-)

