Re: [Podofo-users] podofo vs. libpoppler: What should a pdf-parsing application use?
From: Craig R. <cr...@po...> - 2007-02-02 18:43:09
Frank Küster wrote:

> Switching to libpoppler would be much easier in terms of code changes,
> since it is a xpdf fork, but it has other drawbacks: Most importantly,
> the C-only API/ABI is not well-defined and therefore unstable.

PoDoFo's API doesn't change very fast, but it *is* changing. 0.4.0 is
not source or binary compatible with current SVN due to the removal of
some previously public symbols (including the filter implementations),
the restructuring of the PDF parser code, a total rewrite of delayed
object loading, and major alterations to data input/output streams.

It's trivial to bring code up to date with the changes between 0.4.0 and
SVN, but supporting multiple PoDoFo versions in the same codebase would
*NOT* be attractive.

Personally I don't think there's much chance the API will be frozen any
time soon (too many fairly high priority TODO items _require_ API
changes), so from a stability viewpoint it's not especially attractive.

As for the ABI, that's completely in flux at present. I expect some
significant work will need to be done before it's possible to even
consider trying to present a stable ABI for PoDoFo. Ensuring a stable
ABI for a C++ library is _hard_ at the best of times (one of the few
really good efforts I've seen is TrollTech's Qt, and even they make
mistakes) - and PoDoFo just isn't written with that as a significant
consideration.

On top of this, I don't think 0.4.0 is really solid enough to use as the
foundation for a long-term stable branch, so the best they could be
offered right now is probably svn (which also has a few known issues).

From the stability perspective it seems that PoDoFo is probably no
better than libpoppler, in that unfortunately neither can presently
offer a stable API or ABI.
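
To make the ABI point above concrete: one common way C++ libraries work
toward a stable ABI (Qt's d-pointers are the best-known case) is the
pimpl idiom, where the public class holds only a pointer to a private
implementation struct. The sketch below is only an illustration under
that assumption; Document, Impl and their members are hypothetical names
and are not PoDoFo, poppler or Qt API.

```cpp
// Sketch of the pimpl ("d-pointer") idiom. All names are hypothetical.
#include <memory>
#include <string>
#include <utility>

// ---- what would live in the installed public header ----
class Document {
public:
    explicit Document(std::string title);
    ~Document();                        // defined where Impl is complete
    Document(const Document&) = delete;
    Document& operator=(const Document&) = delete;

    const std::string& title() const;
    int pageCount() const;
    void appendPage();

private:
    struct Impl;                        // opaque to library users
    std::unique_ptr<Impl> d;            // the only data member callers see
};

// ---- what would live in the library's own .cpp file ----
struct Document::Impl {
    std::string title;
    int pages = 0;
    // Members can be added here later without changing Document's layout.
};

Document::Document(std::string title) : d(new Impl) {
    d->title = std::move(title);
}

Document::~Document() = default;

const std::string& Document::title() const { return d->title; }

int Document::pageCount() const { return d->pages; }

void Document::appendPage() { ++d->pages; }
```

Because callers only ever see a class containing one pointer, fields can
be added to Impl in a later release without changing the size or layout
that already-compiled applications depend on. It does nothing for the
other ways to break a C++ ABI (changing virtual functions, signatures or
inline code), which is part of why this is hard to get right.
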
PoDoFo could be used by maintaining an in-tree copy, but that's not an
especially attractive way to work and almost inevitably leads to
repeated minor forks followed by annoying and long-put-off merges.
*cough*XFree86*cough*.

I don't envy the PdfTeX guys, since I honestly don't think there's an
obvious best choice here. I'd like to believe that PoDoFo will be the
best choice down the track given its goals and design, but right now ...
I'm really not sure what their best option is. I can tell them that no
matter which library they choose they'll probably want to use
PoDoFoBrowser when debugging their PDF ;-)

> Although
> poppler developers are open to provide a C-only, non-rendering library
> with properly defined API, there doesn't seem to be anyone around who
> will actually write the code and documentation. And the poppler API
> isn't documented well at all, anyway.
>
> Therefore I wonder whether you consider podofo as an alternative, for
> production use and multiplatform (including Windows and OS/2).

PoDoFo SVN builds on Windows and Linux (it should handle any semi-modern
*nix fine).

On Windows it's compatible with MSVC++ 8 and MinGW (cygwin should also
work with a little bit of tweaking). I haven't tried older MSVC++
versions, but the build system supports them and I'd love to hear of
reports. I would not expect problems.

I've been trying to get it to build on Mac OS X but there's some broken
Mac specific font code in there that I need Leonard's help with. I'm
hoping this can be sorted out before 0.5.0 so PoDoFo will build on the
big three platforms and their variants.

Given this, cross-platform portability will not be a problem.

> What pdftex does with the xpdf code currently has been described by its
> maintainer:
>
> ,----
> | pdfTeX uses xpdf to include pages from pdfs
> | into the pdfs it produces, so we need everything to be able to parse,
> | copy and manipulate the document, its pages and their objects.
> `----

PoDoFo suits that description perfectly - in fact, it's close to a
description of the intended purpose of the library. It has the advantage
of being quite well documented and not tied to the demands of a
renderer, so things in the library are actually fairly easy to extend,
improve and fix.

Perhaps most importantly, it provides built-in support for loading
documents, modifying them in-memory, and saving them. It also supports
extracting pages and merging documents, though there are some
indications a bit more debugging is needed in those areas.

On the other hand, I don't think it has anywhere near as much developer
manpower behind it as libpoppler and is less widely deployed and tested.

--
Craig Ringer