Re: [Podofo-users] podofo vs. libpoppler: What should a pdf-parsing application use?


Frank Küster wrote:

> Switching to libpoppler would be much easier in terms of code changes,
> since it is an xpdf fork, but it has other drawbacks: Most importantly,
> the C-only API/ABI is not well-defined and therefore unstable.

PoDoFo's API doesn't change very fast, but it *is* changing. 0.4.0 is
not source or binary compatible with current SVN due to the removal of
some previously public symbols (including the filter implementations),
the restructuring of the PDF parser code, a total rewrite of delayed
object loading, and major alterations to data input/output streams.

It's trivial to bring code up to date with the changes between 0.4.0 and
SVN, but supporting multiple PoDoFo versions in the same codebase would
*NOT* be attractive.

Personally, I don't think there's much chance the API will be frozen any
time soon (too many fairly high-priority TODO items _require_ API
changes), so from a stability viewpoint it's not especially attractive.

As for the ABI, that's completely in flux at present. I expect some
significant work will need to be done before it's possible to even
consider trying to present a stable ABI for PoDoFo. Ensuring a stable
ABI for a C++ library is _hard_ at the best of times (one of the few
really good efforts I've seen is Trolltech's Qt, and even they make
mistakes) - and PoDoFo just isn't written with that as a significant
consideration.
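For readers wondering why this is so hard: adding or reordering a class's data members changes its size and field offsets, which silently breaks binaries compiled against the old headers. Qt's main defence is the "d-pointer" (pimpl) idiom, where the public class holds nothing but a pointer to a private implementation struct. The sketch below is a minimal, hypothetical illustration of that idiom; `PageInfo` and its members are made up for this example and are not part of PoDoFo. In a real library the `Impl` definition and the member function bodies would live in a .cpp file that clients never see. Note that it only protects data layout; adding virtual functions or changing inline code can still break the ABI.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical class for illustration only; not part of PoDoFo.
// "Public header" part: clients see only an opaque pointer.
class PageInfo {
public:
    explicit PageInfo(std::string label);
    ~PageInfo();                              // defined where Impl is complete
    PageInfo(PageInfo&&) noexcept;
    PageInfo& operator=(PageInfo&&) noexcept;

    const std::string& label() const;
    int rotation() const;
    void setRotation(int degrees);

private:
    struct Impl;                 // declared, but not defined, in the header
    std::unique_ptr<Impl> d;     // the only data member, so sizeof(PageInfo) never changes
};

// "Private .cpp" part: fields can be added here in a later release
// without changing the layout that client binaries were compiled against.
struct PageInfo::Impl {
    std::string label;
    int rotation = 0;
};

PageInfo::PageInfo(std::string label) : d(std::make_unique<Impl>()) {
    d->label = std::move(label);
}
PageInfo::~PageInfo() = default;
PageInfo::PageInfo(PageInfo&&) noexcept = default;
PageInfo& PageInfo::operator=(PageInfo&&) noexcept = default;

const std::string& PageInfo::label() const { return d->label; }
int PageInfo::rotation() const { return d->rotation; }

// Normalise to the range [0, 360), e.g. -90 becomes 270.
void PageInfo::setRotation(int degrees) { d->rotation = ((degrees % 360) + 360) % 360; }
```

The price is an extra heap allocation and indirection per object, which is one reason libraries not designed around ABI stability from the start rarely retrofit it everywhere.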

On top of this, I don't think 0.4.0 is really solid enough to use as the
foundation for a long-term stable branch, so the best they could be
offered right now is probably SVN (which also has a few known issues).

From the stability perspective it seems that PoDoFo is probably no
better than libpoppler, in that unfortunately neither can presently
offer a stable API or ABI. PoDoFo could be used by maintaining an
in-tree copy, but that's not an especially attractive way to work and
almost inevitably leads to repeated minor forks followed by annoying and
long-put-off merges. *cough*XFree86*cough*.

I don't envy the pdfTeX guys, since I honestly don't think there's an
obvious best choice here. I'd like to believe that PoDoFo will be the
best choice down the track given its goals and design, but right now...
I'm really not sure what their best option is. I can tell them that no
matter which library they choose, they'll probably want to use
PoDoFoBrowser when debugging their PDFs ;-)

> Although
> poppler developers are open to providing a C-only, non-rendering library
> with a properly defined API, there doesn't seem to be anyone around who
> will actually write the code and documentation. And the poppler API
> isn't documented well at all, anyway.
>
> Therefore I wonder whether you consider podofo as an alternative, for
> production use and multiplatform (including Windows and OS/2).

PoDoFo SVN builds on Windows and Linux (it should handle any semi-modern
*nix fine).

On Windows it's compatible with MSVC++ 8 and MinGW (Cygwin should also
work with a little bit of tweaking). I haven't tried older MSVC++
versions, but the build system supports them and I'd love to hear
reports from anyone who does. I would not expect problems.

I've been trying to get it to build on Mac OS X, but there's some broken
Mac-specific font code in there that I need Leonard's help with. I'm
hoping this can be sorted out before 0.5.0 so PoDoFo will build on the
big three platforms and their variants.

Given this, cross-platform portability will not be a problem.

> What pdftex does with the xpdf code currently has been described by its
> maintainer:
>
> ,----
> | pdfTeX uses xpdf to include pages from pdfs
> | into the pdfs it produces, so we need everything to be able to parse,
> | copy and manipulate the document, its pages and their objects.
> `----

PoDoFo suits that description perfectly - in fact, it's close to a
description of the intended purpose of the library. It has the advantage
of being quite well documented and not tied to the demands of a
renderer, so things in the library are actually fairly easy to extend,
improve and fix.

Perhaps most importantly, it provides built-in support for loading
documents, modifying them in-memory, and saving them. It also supports
extracting pages and merging documents, though there are some
indications a bit more debugging is needed in those areas.
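As a rough illustration of why copying pages between documents (what pdfTeX needs) is more than a byte copy: a page is an indirect object whose entries (contents, resources, parent) refer to other objects by number, so importing it means copying everything it reaches and renumbering those references so they don't collide with the target document's own objects. The sketch below is a toy model under that assumption; the `Obj`/`Doc` types and `importObject` are hypothetical and are neither PoDoFo's nor poppler's API. It follows every reference, whereas a real importer also has to decide which references to follow.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model, not PoDoFo's or poppler's API:
// an object is a payload plus the object numbers it references.
struct Obj {
    std::string payload;
    std::vector<int> refs;
};
using Doc = std::map<int, Obj>;  // object number -> object

// Copy object `srcNum` and everything reachable from it into `dst`, giving each
// copy a fresh number above dst's highest. `remap` records old -> new numbers so
// shared objects and cycles (e.g. a /Parent back-pointer) are copied only once.
int importObject(const Doc& src, Doc& dst, int srcNum, std::map<int, int>& remap) {
    auto done = remap.find(srcNum);
    if (done != remap.end()) return done->second;

    int newNum = dst.empty() ? 1 : dst.rbegin()->first + 1;
    remap[srcNum] = newNum;   // record before recursing, to break cycles
    dst[newNum] = Obj{};      // reserve the number so recursive calls skip it

    const Obj& original = src.at(srcNum);
    std::vector<int> newRefs;
    for (int ref : original.refs)
        newRefs.push_back(importObject(src, dst, ref, remap));

    dst[newNum] = Obj{original.payload, newRefs};
    return newNum;
}
```

Reusing the same `remap` across several calls keeps a resource shared by multiple imported pages (a font, say) from being copied more than once.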

On the other hand, I don't think it has anywhere near as much developer
manpower behind it as libpoppler, and it is less widely deployed and tested.

--
Craig Ringer
