PDF stores page-layout information, whereas HTML (what a browser reads) is content-flow oriented. Transforming a page-layout format into an HTML document requires a full page layout analysis. pdf2xml provides only the first step.
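To make the gap between the two models concrete, here is a minimal sketch of the most naive form of layout analysis: turning positioned text tokens back into lines of flowing text. The `Token` class, its `x`/`y` fields, and the `tokens_to_lines` helper are hypothetical names made up for illustration; they are not pdf2xml's actual output schema or API.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical positioned text fragment (not pdf2xml's real schema)."""
    text: str
    x: float  # left edge of the token
    y: float  # vertical position, measured from the top of the page


def tokens_to_lines(tokens, y_tolerance=2.0):
    """Group tokens whose y positions are close, then order each group by x."""
    lines = []
    for tok in sorted(tokens, key=lambda t: (t.y, t.x)):
        # Compare against the first token of the current line.
        if lines and abs(lines[-1][0].y - tok.y) <= y_tolerance:
            lines[-1].append(tok)
        else:
            lines.append([tok])
    return [
        " ".join(t.text for t in sorted(line, key=lambda t: t.x))
        for line in lines
    ]


tokens = [
    Token("world", 60, 10.5),
    Token("Hello", 10, 10),
    Token("Second", 10, 30),
    Token("line", 70, 30),
]
print(tokens_to_lines(tokens))
```

This only works for a single column of plain text. On a two-column page, the same grouping would merge the left and right columns into one line, which is the kind of case that calls for a full layout analysis (columns, paragraphs, headings, reading order) before HTML can be produced.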