Xournal

Tracker: Patches

5 Poppler PDF rendering - ID: 2824175
Last Update: Settings changed ( auroux )

This patch uses the Poppler PDF library to render PDFs for xournal. Perhaps
the most noticeable improvements are that it fixes some off-by-one errors in
the original code that made PDFs look fuzzy, is a lot faster, and consumes a
lot less memory. In the long term, using Poppler is desirable because it
enables text searching and, hopefully someday, storing annotations in the
PDF.

Additionally, it strips out the asynchronous PDF rendering code, which called
pdftoppm as an external process. (Poppler is many times faster than pdftoppm
and can render on the fly in reasonable time.) Instead it renders one page at
a time, when the page is viewed, the same as evince. This also drastically
reduces memory consumption when viewing large PDFs. (I have some 800-page
textbooks as PDFs that I would like to annotate, but opening them causes
xournal to use ~1 GB of memory.)

This patch is not finished, but it now works well enough that no
functionality is lost. In particular, before it is incorporated into xournal
I want to:

* Simplify the bgpdf code. In particular, the BgPdfPage struct is no longer
needed and the BgPdf struct can be simplified.

* Use is_visible to figure out which pages need to be rendered.

* Free the memory of pages that are not visible.

* Fix what is probably a logic error when loading more than one PDF file
successively in a xournal session.
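
The two to-do items about is_visible and freeing memory could look roughly like the following. This is a minimal sketch in plain C, not code from the patch: PageCache, render_page and update_visible are hypothetical names, and render_page only allocates a blank buffer where a real implementation would call the renderer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-page cache entry; pixels is NULL when not rendered. */
typedef struct {
    unsigned char *pixels;
    size_t         nbytes;
} PageCache;

/* Placeholder renderer (assumption): returns a zeroed buffer instead of
 * a real rendered page. */
unsigned char *render_page(int page_no, size_t nbytes)
{
    (void)page_no;
    return calloc(nbytes ? nbytes : 1, 1);
}

/* Render the pages in [first_visible, last_visible] that are missing and
 * free every page outside that range.
 * Returns the number of pages that hold a buffer afterwards. */
int update_visible(PageCache *pages, int npages,
                   int first_visible, int last_visible, size_t page_bytes)
{
    int i, resident = 0;

    assert(pages != NULL && npages >= 0);
    for (i = 0; i < npages; i++) {
        int visible = (i >= first_visible && i <= last_visible);

        if (visible && pages[i].pixels == NULL) {
            pages[i].pixels = render_page(i, page_bytes);
            pages[i].nbytes = pages[i].pixels ? page_bytes : 0;
        } else if (!visible && pages[i].pixels != NULL) {
            free(pages[i].pixels);          /* page scrolled out of view */
            pages[i].pixels = NULL;
            pages[i].nbytes = 0;
        }
        if (pages[i].pixels != NULL)
            resident++;
    }
    return resident;
}
```

Calling update_visible after every scroll or zoom keeps memory use proportional to the number of visible pages rather than to the length of the document.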

I am indebted to Mike Ter Louw for creating the original version of this
patch.

If you test this patch, please report your experience and any bugs you
encounter!


Bob McElrath ( mcelrath ) - 2009-07-20 10:05

Priority: 5
Status: Closed
Resolution: None
Assigned: Nobody/Anonymous
Category: None
Group: None
Visibility: Public


Comments ( 19 )




Date: 2009-10-03 02:07
Sender: auroux (Project Admin)

Ok, release 0.4.5 is out and it includes a modified version of this patch.
It's still missing a mechanism to limit the amount of RAM taken, but so
does every single piece of code out there for the time being => it'll wait
until next release.

I'm closing this patch, but you can reopen it (or open a bug about the
memory usage issue, though I'm already aware of it) if you feel that it is
necessary.

Denis


Date: 2009-09-05 07:03
Sender: nobody

Poppler's speed depends on the hardware and on the content of the PDF file.
With one scan in particular, Hume's Enquiry
(http://www.archive.org/details/enquiryconcernin00humerich), I have to wait
5 seconds for every page to load while scrolling. I guess it would take
longer on my Eee PC (downclocked to 600 MHz) or a Nokia tablet. So I think
preloading does make a lot of sense.

Btw, this new update changes everything, now that I can annotate the 300th
page without loading the 299 pages which precede it. And how timely, school
begins in a few days. Thanks!


Date: 2009-09-01 01:24
Sender: auroux (Project Admin)

Hi Bob,

1. Names of the options: I agree they're not always clear, and will need
some rationalizing. Your suggestions are overall pretty good; I have a high
level of inertia and it takes me time to get used to new ideas, but expect
some of your advice to slowly make it into future releases. Regarding the
inappropriate explanation of the "progressive backgrounds" option in the
config file, in fact I recently changed it to "just-in-time update of page
backgrounds", but the comments don't get updated when one re-saves an
existing config file.

2. Crash by out-of-memory: this is an issue even without "progressive", so
I'll need to add a memory limit. A single page at 10x zoom level already
occupies about 150 MB of memory... (and needs about 3 times that to
generate via poppler)

3. Poppler might be fast enough for you, but it definitely isn't anywhere
near real-time (sure, it's faster than pdftoppm, no question). Ideally the
rendering would be done completely in background, as was the case with the
old pdftoppm code; the right way would clearly be to use threads, and I'm
not too confident that I can get it right on the first attempt, so it'll
have to be done later.

4. About the various ways of opening PDF files: I really don't want
File-Open to show PDFs (or other files that could be annotated) unless one
specifically asks for them. It's a recipe for confusion, causing you to
open a PDF when you wanted to open a XOJ with the same name (e.g. a XOJ that
annotates the PDF, or a XOJ that was exported to PDF) and end up with
multiple incompatible versions of your work files. So, officially File-Open
does not open PDFs, and the only way is "Annotate PDF". (Of course, like any
good bureaucratic system, Xournal is actually more flexible than its
official policy.)

5. As far as I know, poppler does not generate PDF code or render anything
to PDF. It can generate PS code, and could be used to print to PDF via
poppler + cairo + gtkprint, but that's neither elegant nor safe. Along the
same lines: where did you find out that poppler can load more than PDF (I
think you mentioned PS and SVG)? As far as I know, all it can read is PDF,
and all it outputs is cairo, pixbuf, or PS (except perhaps a very limited
manipulation of PDF forms). Please let me know if I'm wrong.

Denis
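
The figure of about 150 MB in point 2 can be reproduced with simple arithmetic. The sketch below is only an illustration under assumptions that are not stated in the thread: a US Letter page of 612 x 792 points, one pixel per point at zoom 1, and a packed buffer of 3 bytes per pixel with no row padding. The function name page_buffer_bytes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate the bytes needed to hold one rendered page.
 * Assumptions: sizes are in points, zoom is pixels per point, and the
 * buffer is packed (no padding at the end of each row). */
size_t page_buffer_bytes(double width_pt, double height_pt,
                         double zoom, size_t bytes_per_pixel)
{
    size_t width_px, height_px;

    assert(width_pt > 0.0 && height_pt > 0.0 && zoom > 0.0);
    width_px  = (size_t)(width_pt * zoom + 0.5);   /* round to nearest pixel */
    height_px = (size_t)(height_pt * zoom + 0.5);
    return width_px * height_px * bytes_per_pixel;
}
```

For page_buffer_bytes(612.0, 792.0, 10.0, 3) this gives 6120 x 7920 x 3 = 145,411,200 bytes, which is in line with the estimate quoted above. Memory grows with the square of the zoom factor, so doubling the zoom quadruples the buffer.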



Date: 2009-08-30 22:15
Sender: mcelrath

Hi Denis,

The majority of the options in the menus have never been obvious to me (nor,
I suspect, to most users). The config file says "progressive scaling of
bitmap backgrounds", which also does not indicate that it does what you
just wrote. Sorry for my confusion here, but I doubt I'm the only one.

I think if the program crashes, regardless of the user's settings, this is
clearly a failure case. Is there ever a good reason *not* to operate in
the "progressive background" mode you indicate? The only thing I can think
is that it slightly improves speed when scrolling, but poppler is so fast I
don't think this is much of a concern. Perhaps a better option would be
"number of pages to pre-render", with a reasonable default. In any case,
there should be some kind of bound on memory usage so that it does not
crash. (Indeed, with "progressive backgrounds" checked, it behaves as I
expect.) Similarly, if a user scrolls through an entire long document in
progressive mode, it should also free pages, so as to prevent the
crash-by-memory-overuse error.

If not simply speed, what is the use case where disabling "progressive
backgrounds" is desirable?

On #4, I don't think anyone has a clue what the difference is among these
three options. How about a Journal->Paper Style->Image option instead,
combined with a single "File open" that can open anything?

And why is printing hard? Poppler should be able to render any of these
formats to pdf. (I haven't looked at this at all...)
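
The "number of pages to pre-render" idea, combined with a bound on memory, could be sketched like this. It is a hedged illustration in plain C, not code from xournal: PageWindow and prerender_window are hypothetical names, and every page is assumed to need the same number of bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive range of page indices that should be kept rendered. */
typedef struct {
    int first;
    int last;
} PageWindow;

/* Keep `prerender` pages on each side of the current page, then shrink
 * the window until it fits in budget_bytes. The current page is always
 * kept, even if it alone exceeds the budget. */
PageWindow prerender_window(int current, int npages, int prerender,
                            size_t page_bytes, size_t budget_bytes)
{
    PageWindow w;
    size_t max_pages;

    assert(npages > 0 && current >= 0 && current < npages && prerender >= 0);

    w.first = current - prerender;
    if (w.first < 0)
        w.first = 0;
    w.last = current + prerender;
    if (w.last > npages - 1)
        w.last = npages - 1;

    max_pages = page_bytes ? budget_bytes / page_bytes : (size_t)npages;
    if (max_pages < 1)
        max_pages = 1;

    while ((size_t)(w.last - w.first + 1) > max_pages) {
        /* Drop the page farthest from the current one; on a tie, drop
         * the page behind, since scrolling forward is the likelier case. */
        if (current - w.first >= w.last - current)
            w.first++;
        else
            w.last--;
    }
    return w;
}
```

Pages outside the returned window are candidates for freeing, so scrolling through a long document never holds more than the budget allows.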



Date: 2009-08-30 21:53
Sender: auroux (Project Admin)

Hi Bob,

1) That's because you insist on disabling "progressive backgrounds" which
is a default option and GOOD. The point of that option is precisely to load
the background files as they become visible. Please enable this option and
see if it does what you want.

2) Likewise.

3) Agreed. Xournal wouldn't quite know yet what to do with those formats
though when it comes to printing/exporting. Another imminent change is the
switch from libgnomeprint to gtkprint, which will allow the use of poppler
to render backgrounds for printing in a sensible manner. After that's done,
PS and SVG will make more sense. I'm not quite sure that SVG is something
everyone would want to annotate (but it can be done anyway).

4) File->Open and File->Annotate PDF differ in principle; it is only for
convenience that File->Open can open a PDF at all, it's not really what one
should be doing. Journal->Load Background is meant for loading bitmap
backgrounds and I agree it shouldn't be offering to load PDFs anymore.

"Antialiased bitmaps" will disappear soon. "Progressive backgrounds" *is*
important, see above. (Both behaviors are useful in different use cases,
though I expect most people will want it turned on).

5) I don't think bgpdf is responsible for extra memory use. It only
contains GdkPixbufs that are already loaded into the gnome-canvas, so no
extra memory should be taken. On the other hand it saves tons of memory
when the same PDF page is present multiple times, e.g. if you're using a
one-page PDF file as your favorite custom paper pad and writing on 50 pages
of that same paper.

Please let me know if this makes sense. I agree points #3 and #4 will
require more work, though The Right Thing is not yet clear enough in my
mind to make it into the next release.

Thanks!
Denis
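
The memory saving described in point 5 comes from several journal pages pointing at one rendered background instead of each holding a copy. A minimal sketch of that sharing with a reference count, in plain C: SharedBg and its functions are hypothetical stand-ins, not the actual bgpdf structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared background: one pixel buffer, many users. */
typedef struct {
    unsigned char *pixels;
    int            refcount;
} SharedBg;

/* Allocate a background with one reference; returns NULL on failure. */
SharedBg *shared_bg_new(size_t nbytes)
{
    SharedBg *bg = malloc(sizeof *bg);

    if (bg == NULL)
        return NULL;
    bg->pixels = calloc(nbytes ? nbytes : 1, 1);
    if (bg->pixels == NULL) {
        free(bg);
        return NULL;
    }
    bg->refcount = 1;
    return bg;
}

/* Another page starts using the same background: no new pixel buffer. */
SharedBg *shared_bg_ref(SharedBg *bg)
{
    assert(bg != NULL && bg->refcount > 0);
    bg->refcount++;
    return bg;
}

/* A page stops using the background.
 * Returns 1 if this was the last user and the buffer was freed. */
int shared_bg_unref(SharedBg *bg)
{
    assert(bg != NULL && bg->refcount > 0);
    if (--bg->refcount > 0)
        return 0;
    free(bg->pixels);
    free(bg);
    return 1;
}
```

With this scheme, 50 pages written on the same one-page PDF "paper" cost one buffer plus 50 references.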



Date: 2009-08-29 05:37
Sender: mcelrath

Great!!! You also fixed a couple bugs I was planning to attack this
weekend. ;)

1) You seem to be loading the entire document at startup. This is baaad
as it will crash if you load a document that's too long. Better to only
load a page or two ahead of the current viewing position.

2) Zooming in will also cause a crash (because it will re-render the
entire document at the larger size and fill memory).

3) Poppler can load a lot more than PDF but it isn't enabled. (PS, SVG,
etc)

4) We should consolidate the menus: There are three ways to load a pdf
(File->Open, File->Annotate PDF, and Journal->Load Background). The last
one seems broken anyway. Along with #3 we should just have "Open" and add
all the file formats xournal/poppler support to the dialog. Furthermore I
think with poppler "Antialiased bitmaps" and "progressive backgrounds" are
obsolete.

5) You kept bgpdf, which causes the crashes in #1 and #2. :(



Date: 2009-08-29 01:06
Sender: auroux (Project Admin)

I've incorporated poppler code (loosely based on this patch) into the CVS
head. Please test prior to wider release. Should be stable and not have any
major regressions compared to either 0.4.2.1 or your patch.
Thanks!
Denis


Date: 2009-08-15 21:59
Sender: arose

Unchecking Antialiased Bitmaps and Progressive Backgrounds helped with that.
There is, however, a similar issue: opening a PDF in a session where a PDF
has previously been opened results in a gray document with a busy cursor,
and then clicking on the document results in a segfault.


Date: 2009-08-13 21:56
Sender: nobody

Make sure "Antialiased Bitmaps" is unchecked.


Date: 2009-08-13 21:50
Sender: nobody

I seem to be having a show stopper here on Ubuntu 9.04: both zooming and
opening a .xoj that's an annotated PDF break xournal. I get a gray document
with a cursor that occasionally shows that it's busy, but the document is
never displayed.


Date: 2009-07-26 15:53
Sender: mcelrath

Simplifying bgpdf is making my head hurt. We have many functions with
overlapping and duplicated duties:
    init_pdf
    rescale_bg_pixmaps
    update_page_stuff
    update_bg
    update_canvas_bg
    render_pdf_page
    create_page_with_bg

I think I can replace these with just two functions:
    open_bg (detects file type ps/pdf/svg/xoj and does things from init_pdf)
    update_bg (calls poppler to render if view or zoom has changed,
        replacing update_canvas_bg, render_pdf_page, rescale_bg_pixmaps)
In the process, this removes the BgPdf data structure and global, as well as
BgPdfPage, in favor of keeping everything in struct Background. The open_bg
function is also intended to unify the two menu options "Annotate PDF" and
"Open Journal" in favor of just "Open", which can do both, as well as
annotate other file types poppler is capable of loading.

Thoughts?

Yes, mostly I'm whining because I spent most of the day staring at this
code, now my tree is broken, and it just made my head hurt. I think I'll
have to start over. Maybe I'm trying to do too much at once.
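
The file-type detection step proposed for open_bg might look like the sketch below. This is only one possible approach, assuming dispatch on the file extension; the enum BgFileType and the function detect_bg_type are hypothetical names, and a more robust version would inspect the file contents instead.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { BG_UNKNOWN, BG_PDF, BG_PS, BG_SVG, BG_XOJ } BgFileType;

/* Case-insensitive comparison of two extensions (without the dot). */
static int ext_equals(const char *ext, const char *want)
{
    while (*ext != '\0' && *want != '\0') {
        if (tolower((unsigned char)*ext) != tolower((unsigned char)*want))
            return 0;
        ext++;
        want++;
    }
    return *ext == '\0' && *want == '\0';
}

/* Guess the file type from the text after the last dot in the path. */
BgFileType detect_bg_type(const char *path)
{
    const char *dot;

    assert(path != NULL);
    dot = strrchr(path, '.');
    if (dot == NULL)
        return BG_UNKNOWN;
    if (ext_equals(dot + 1, "pdf")) return BG_PDF;
    if (ext_equals(dot + 1, "ps"))  return BG_PS;
    if (ext_equals(dot + 1, "svg")) return BG_SVG;
    if (ext_equals(dot + 1, "xoj")) return BG_XOJ;
    return BG_UNKNOWN;
}
```

A single "Open" handler could then switch on the returned value and either load a journal or set up an annotation background.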



Date: 2009-07-25 15:27
Sender: schmolch

Since I applied the patch I am no longer able to use "Attach file to xournal"
when I open a PDF:

sascha@hund:~/code/xournal-test$ LANG=C xournal

(xournal:18062): Gtk-WARNING **: GtkSpinButton: setting an adjustment with non-zero page size is deprecated

(xournal:18062): Gtk-WARNING **: GtkSpinButton: setting an adjustment with non-zero page size is deprecated

(xournal:18062): Gtk-WARNING **: GtkSpinButton: setting an adjustment with non-zero page size is deprecated

(xournal:18062): GLib-WARNING **: GError set over the top of a previous GError or uninitialized memory.
This indicates a bug in someone's code. You must ensure an error is NULL before it's set.
The overwriting error message was: The pathname 'bg.pdf' is not an absolute path
Segmentation fault (core dumped)
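
The last warning in the log says that 'bg.pdf' is not an absolute path, which suggests that a relative filename reached a routine that requires an absolute one. One way to guard against that is to resolve the name against a base directory first. The sketch below is an assumption about a possible fix, not the patch's code: make_absolute is a hypothetical helper, it handles Unix-style paths only, and the caller supplies the base directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write an absolute version of `path` into `out`.
 * A path that already starts with '/' is copied unchanged; anything else
 * is joined onto `base_dir`.
 * Returns 0 on success, -1 if `out` is too small. */
int make_absolute(const char *base_dir, const char *path,
                  char *out, size_t outlen)
{
    int n;

    assert(base_dir != NULL && path != NULL && out != NULL);
    if (path[0] == '/') {
        n = snprintf(out, outlen, "%s", path);
    } else {
        size_t blen = strlen(base_dir);
        /* Avoid a doubled slash when base_dir already ends with one. */
        const char *sep = (blen > 0 && base_dir[blen - 1] == '/') ? "" : "/";

        n = snprintf(out, outlen, "%s%s%s", base_dir, sep, path);
    }
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

In a real program the base directory would typically be the directory of the .xoj file or the current working directory, depending on how the attached file is meant to be located.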



Date: 2009-07-25 14:28
Sender: schmolch

No, that's exactly it. If I try to load another PDF in the same session it
often fails.



Date: 2009-07-25 14:27
Sender: mcelrath

I'm not aware of any PDFs it fails to load. I'm aware it fails if you try to
load a second PDF in the same session (have to fix that...). But if you have
a different bug, could you let me know how it is failing and/or send me the
pdf/xoj that is causing trouble?


Date: 2009-07-25 13:37
Sender: schmolch

I am very excited about this patch.
I tried it and it works very well.
The PDFs look great, display fast and use little memory, just like in
evince.
It failed to load one or two PDFs and segfaulted after a while, but still, I
love where this is going and hope that xournal will soon have this
built in.

Thank you to everyone for their great work!



Date: 2009-07-21 09:13
Sender: mcelrath

I think both "Antialiased Bitmaps" and "Progressive Backgrounds" should go
away. The GnomeCanvas is antialiased itself, whether we want it or not.
Do you agree? If not, I need to understand better what they do...


Date: 2009-07-21 08:13
Sender: auroux (Project Admin)

Quick note: this patch only seems to work reasonably well if you disable
"Progressive backgrounds" in the Options menu (if you don't see that
option, first uncheck "Shorten menus"). If you experience lots of slowness
and refresh bugs, this is the cause.

Denis


Date: 2009-07-21 06:53
Sender: ajray

Two things:
1) I had to copy the trunk 'xournal' directory twice into xournal-test and
xournal-poppler. I have no idea which is which.

2) The patch failed against xournal-test on the configure.in script. It's
an easy fix (there's now a conditional on what I think is the Maemo build,
so it was easy to spot and put in by hand).

I'm not noticing any big improvements, but I just checked against a nice
clean LaTeX book I was reading, which renders nicely everywhere. I also
have subpixel rendering turned on (which might be improving it some too).
And I forgot to check memory usage (whoops).


Date: 2009-07-20 22:11
Sender: mcelrath

Added version 0.0.6, which fixes a g_object_unref error, removes some
extraneous debug output, and actually renders the last page of a pdf file.






Attached File ( 1 )

Filename                           Description
poppler-pdf-rendering-0.0.6.patch  Poppler PDF rendering

Changes ( 5 )

Field         Old Value                                  Date              By
close_date    -                                          2009-10-03 02:07  auroux
status_id     Open                                       2009-10-03 02:07  auroux
File Deleted  335762:                                    2009-07-20 22:12  mcelrath
File Added    335842: poppler-pdf-rendering-0.0.6.patch  2009-07-20 22:12  mcelrath
File Added    335762: poppler-pdf-rendering-0.0.5.patch  2009-07-20 10:05  mcelrath