
#31 Poppler PDF rendering

closed
nobody
None
5
2015-03-02
2009-07-20
No

This patch uses the Poppler PDF library to render PDFs for xournal. Perhaps
the most noticeable improvements are that it fixes some off-by-one errors in
the original code that made PDFs look fuzzy, is a lot faster, and consumes a
lot less memory. In the long term, using Poppler is desirable because it
enables text searching and, hopefully someday, storing annotations in the PDF.

Additionally, it strips out the asynchronous PDF rendering code, which called
pdftoppm as an external process. (Poppler is many times faster than pdftoppm
and can render on the fly in reasonable time.) Instead it renders one page at a
time, when the page is viewed, the same as evince. This also drastically reduces
memory consumption when viewing large PDFs. (I have some 800-page
textbooks as PDFs that I would like to annotate, but they cause xournal to use
~1 GB of memory to open them.)

This patch is not finished, but it now works well enough that no functionality
is lost. In particular, before incorporating it into xournal I want to:

* Simplify the bgpdf code. In particular the BgPdfPage struct is not needed
anymore and the BgPdf struct can be simplified.

* Use is_visible to figure out which pages need to be rendered.

* Free memory of pages that are not visible.

* There is probably some logic error if you try to load more than one pdf file
successively in a xournal session.
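
The two to-do items about visibility (render only what is visible, free what is not) can be sketched in a few lines. This is only an illustration in Python, not the patch's C code: the `Page` class, the `visible_pages` and `update_rendered` functions, and the `render` callback standing in for the Poppler call are all hypothetical names, and pages are assumed to be stacked vertically with known offsets.

```python
from dataclasses import dataclass


@dataclass
class Page:
    top: float     # y offset of the page within the journal, in canvas units
    height: float  # page height, in the same units


def visible_pages(pages, view_top, view_height):
    """Return indices of pages overlapping the viewport [view_top, view_top + view_height)."""
    view_bottom = view_top + view_height
    return [
        i for i, p in enumerate(pages)
        if p.top < view_bottom and p.top + p.height > view_top
    ]


def update_rendered(rendered, pages, view_top, view_height, render):
    """Render newly visible pages and drop bitmaps of pages that scrolled away.

    `rendered` maps page index -> bitmap; `render` is a hypothetical callback
    standing in for the actual Poppler rendering call.
    """
    wanted = set(visible_pages(pages, view_top, view_height))
    for i in list(rendered):
        if i not in wanted:
            del rendered[i]          # free pages that are no longer visible
    for i in sorted(wanted):
        if i not in rendered:
            rendered[i] = render(i)  # render only when the page comes into view
    return rendered
```

Calling `update_rendered` on every scroll or zoom event keeps memory proportional to the number of pages on screen rather than to the length of the document.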

I am indebted to Mike Ter Louw for creating the original version of this patch.

If you test this patch, please report your experience and any bugs you
encounter!

Discussion

  • Bob McElrath

    Bob McElrath - 2009-07-20

    Added version 0.0.6, which fixes a g_object_unref error, removes some extraneous debug output, and actually renders the last page of a pdf file.

     
  • Bob McElrath

    Bob McElrath - 2009-07-20

    Poppler PDF rendering

     
  • Alex Ray

    Alex Ray - 2009-07-21

    Two things:
    1) I had to copy the trunk 'xournal' directory twice into xournal-test and xournal-poppler. I have no idea which is which.

    2) The patch failed against xournal-test on the configure.in script. It's an easy fix (there's now a conditional on what I think is the Maemo build, so it was easy to spot and put in by hand).

    I'm not noticing any big improvements, but I just checked against a nice clean LaTeX book I was reading, which renders nicely everywhere. I also have subpixel rendering turned on (which might be improving it some too). And I forgot to check memory usage (whoops).

     
  • Denis Auroux

    Denis Auroux - 2009-07-21

    Quick note: this patch only seems to work reasonably well if you disable "Progressive backgrounds" in the Options menu (if you don't see that option, first uncheck "Shorten menus"). If you experience lots of slowness and refresh bugs, this is the cause.

    Denis

     
  • Bob McElrath

    Bob McElrath - 2009-07-21

    I think both "Antialiased Bitmaps" and "Progressive Backgrounds" should go away. The GnomeCanvas is antialiased itself, whether we want it or not. Do you agree? If not, I need to understand better what they do...

     
  • Anonymous

    Anonymous - 2009-07-25

    I am very excited about this patch.
    I tried it and it works very well.
    The PDFs look great, display fast and use little memory, just like in evince.
    It failed to load one or two PDFs and segfaulted after a while, but still, I love where this is going and hope that xournal will soon have this built in.

    Thank you to everyone for their great work!

     
  • Bob McElrath

    Bob McElrath - 2009-07-25

    I'm not aware of any PDFs it fails to load. I am aware that it fails if you try to load a second PDF in the same session (I have to fix that...). But if you have a different bug, could you let me know how it is failing and/or send me the pdf/xoj that is causing trouble?

     
  • Anonymous

    Anonymous - 2009-07-25

    No, that's exactly it. If I try to load another PDF in the same session it often fails.

     
  • Anonymous

    Anonymous - 2009-07-25

    Since I applied the patch I am no longer able to use "Attach file to xournal" when I open a PDF:

    sascha@hund:~/code/xournal-test$ LANG=C xournal

    (xournal:18062): Gtk-WARNING **: GtkSpinButton: setting an adjustment with non-zero page size is deprecated

    (xournal:18062): Gtk-WARNING **: GtkSpinButton: setting an adjustment with non-zero page size is deprecated

    (xournal:18062): Gtk-WARNING **: GtkSpinButton: setting an adjustment with non-zero page size is deprecated

    (xournal:18062): GLib-WARNING **: GError set over the top of a previous GError or uninitialized memory.
    This indicates a bug in someone's code. You must ensure an error is NULL before it's set.
    The overwriting error message was: The pathname 'bg.pdf' is not an absolute path
    Segmentation fault (core dumped)
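
A note on the log above: the message complains that 'bg.pdf' is not an absolute path, and Poppler's GLib API opens documents by URI, which can only be built from an absolute filename. One plausible reading (an assumption, not something confirmed in this thread) is that the attached background's relative name is being converted to a URI without first being made absolute. The fix would belong in the C code; the Python sketch below only illustrates the idea, and `to_file_uri` is a hypothetical helper name.

```python
from pathlib import Path


def to_file_uri(filename):
    """Turn a possibly relative filename into an absolute file:// URI."""
    # resolve() anchors a relative name such as 'bg.pdf' at the current
    # working directory; as_uri() on a relative path raises ValueError,
    # much like the "not an absolute path" error in the log.
    return Path(filename).resolve().as_uri()
```

For an attached background the name should presumably be resolved against the directory of the .xoj file rather than the working directory; that detail is left out here.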

     
  • Bob McElrath

    Bob McElrath - 2009-07-26

    Simplifying bgpdf is making my head hurt. We have many functions with
    overlapping and duplicated duties:
    init_pdf
    rescale_bg_pixmaps
    update_page_stuff
    update_bg
    update_canvas_bg
    render_pdf_page
    create_page_with_bg

    I think I can replace these with just two functions:
    open_bg (detects file type ps/pdf/svg/xoj and does things from init_pdf)
    update_bg (calls poppler to render if view or zoom has changed, replacing
    update_canvas_bg, render_pdf_page, rescale_bg_pixmaps)
    In the process removing the BgPdf data structure and global, as well as
    BgPdfPage, in favor of keeping everything in struct Background. The open_bg is
    also intended to unify the two menu options "Annotate PDF" and "Open Journal"
    in favor of just "open" which can do both, as well as annotate other file types
    poppler is capable of loading.

    Thoughts?

    Yes, mostly I'm whining because I spent most of the day staring at this code,
    now my tree is broken, and it just made my head hurt. I think I'll have to
    start over. Maybe I'm trying to do too much at once.
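
The file type detection proposed for open_bg could look roughly like the sketch below. It is written in Python for brevity, not in xournal's C, and `detect_bg_type` is a hypothetical name. It relies on the usual file signatures ("%PDF-" for PDF, "%!" for PostScript, the gzip magic number for .xoj) and assumes that an .xoj file is gzip-compressed XML whose root element is `<xournal`.

```python
import gzip
import io
import zlib


def detect_bg_type(data):
    """Guess the file type from its leading bytes.

    Returns 'pdf', 'ps', 'xoj', 'svg', or None if the type is not recognized.
    """
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"%!"):
        return "ps"
    if data.startswith(b"\x1f\x8b"):  # gzip magic number
        try:
            head = gzip.GzipFile(fileobj=io.BytesIO(data)).read(512)
        except (OSError, EOFError, zlib.error):
            return None
        # Assumption: a journal is gzipped XML with an <xournal> root element.
        return "xoj" if b"<xournal" in head else None
    if b"<svg" in data[:1024]:
        return "svg"
    return None
```

Sniffing the content rather than trusting the extension would also sidestep the case of a PDF and a XOJ that share a base name.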

     
  • Nobody/Anonymous

    I seem to have hit a show stopper here on Ubuntu 9.04: both zooming and opening a .xoj that's an annotated PDF break xournal. I get a gray document with a cursor that occasionally shows that it's busy, but the document is never displayed.

     
  • Nobody/Anonymous

    Make sure "Antialiased Bitmaps" is unchecked.

     
  • Artis Rozentāls

    Unchecking Antialiased Bitmaps and Progressive Backgrounds helped with that. There is, however, a similar issue: opening a PDF in a session where a PDF has previously been opened results in a gray document with a busy cursor, and then clicking on the document results in a segfault.

     
  • Denis Auroux

    Denis Auroux - 2009-08-29

    I've incorporated poppler code (loosely based on this patch) into the CVS head. Please test prior to wider release. Should be stable and not have any major regressions compared to either 0.4.2.1 or your patch.
    Thanks!
    Denis

     
  • Bob McElrath

    Bob McElrath - 2009-08-29

    Great!!! You also fixed a couple bugs I was planning to attack this weekend. ;)

    1) You seem to be loading the entire document at startup. This is baaad as it will crash if you load a document that's too long. Better to only load a page or two ahead of the current viewing position.

    2) Zooming in will also cause a crash (because it will re-render the entire document at the larger size and fill memory)

    3) Poppler can load a lot more than PDF but it isn't enabled. (PS, SVG, etc)

    4) We should consolidate the menus: There are three ways to load a pdf (File->Open, File->Annotate PDF, and Journal->Load Background). The last one seems broken anyway. Along with #3 we should just have "Open" and add all the file formats xournal/poppler support to the dialog. Furthermore I think with poppler "Antialiased bitmaps" and "progressive backgrounds" are obsolete.

    5) You kept bgpdf, which causes the crashes in #1 and #2. :(

     
  • Denis Auroux

    Denis Auroux - 2009-08-30

    Hi Bob,

    1) That's because you insist on disabling "progressive backgrounds" which is a default option and GOOD. The point of that option is precisely to load the background files as they become visible. Please enable this option and see if it does what you want.

    2) Likewise.

    3) Agreed. Xournal wouldn't quite know yet what to do with those formats though when it comes to printing/exporting. Another imminent change is the switch from libgnomeprint to gtkprint, which will allow the use of poppler to render backgrounds for printing in a sensible manner. After that's done, PS and SVG will make more sense. I'm not quite sure that SVG is something everyone would want to annotate (but it can be done anyway).

    4) File->Open and File->Annotate PDF differ in principle; it is only for convenience that File->Open can open a PDF at all, it's not really what one should be doing. Journal->Load Background is meant for loading bitmap backgrounds and I agree it shouldn't be offering to load PDFs anymore.

    "Antialiased bitmaps" will disappear soon. "Progressive backgrounds" *is* important, see above. (Both behaviors are useful in different use cases, though I expect most people will want it turned on).

    5) I don't think bgpdf is responsible for extra memory use. It only contains GdkPixbufs that are already loaded into the gnome-canvas, so no extra memory should be taken. On the other hand it saves tons of memory when the same PDF page is present multiple times, e.g. if you're using a one-page PDF file as your favorite custom paper pad and writing on 50 pages of that same paper.

    Please let me know if this makes sense. I agree points #3 and #4 will require more work, though The Right Thing is not yet clear enough in my mind to make it into the next release.

    Thanks!
    Denis
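
The sharing described in point 5 can be illustrated with a small cache keyed by PDF page number and zoom, so that many journal pages using the same PDF page hold one bitmap between them. This is a Python sketch of the idea only, not the bgpdf code: `SharedRenderCache` and the `render` callback are hypothetical names.

```python
class SharedRenderCache:
    """Keep one rendered bitmap per (PDF page, zoom), shared by all journal pages."""

    def __init__(self, render):
        self._render = render   # hypothetical callback: (pdf_page, zoom) -> bitmap
        self._bitmaps = {}

    def get(self, pdf_page, zoom):
        key = (pdf_page, zoom)
        if key not in self._bitmaps:
            # First journal page to ask for this PDF page triggers the render;
            # every later request returns the same object.
            self._bitmaps[key] = self._render(pdf_page, zoom)
        return self._bitmaps[key]

    def __len__(self):
        return len(self._bitmaps)
```

With a one-page PDF used as paper for 50 journal pages, the cache holds a single entry and the renderer runs once.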

     
  • Bob McElrath

    Bob McElrath - 2009-08-30

    Hi Denis,

    The majority of the options in the menus have never been obvious to me (nor, I suspect, to most users). The config file says: "progressive scaling of bitmap backgrounds", which also does not indicate that it does what you just wrote. Sorry for my confusion here, but I doubt I'm the only one.

    I think if the program crashes, regardless of the user's settings, this is clearly a failure case. Is there ever a good reason *not* to operate in the "progressive background" mode you indicate? The only thing I can think of is that it slightly improves speed when scrolling, but poppler is so fast I don't think this is much of a concern. Perhaps a better option would be "number of pages to pre-render", with a reasonable default. In any case, there should be some kind of bound on memory usage so that it does not crash. (Indeed, with "progressive backgrounds" checked, it behaves as I expect.) Similarly, if a user scrolls through an entire long document in progressive mode, it should also free pages, so as to prevent the crash-by-memory-overuse error.

    If not simply speed, what is the use case where disabling "progressive backgrounds" is desirable?

    On #4, I don't think anyone has a clue what the difference is among these three options. How about a Journal->Paper Style->Image option instead, combined with a single "File open" that can open anything?

    And why is printing hard? Poppler should be able to render any of these formats to pdf. (I haven't looked at this at all...)
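
The "number of pages to pre-render" idea combined with a bound on memory could be sketched as a least-recently-used cache with a byte budget. This is an illustration in Python rather than a proposal for the actual C code; `BoundedPageCache`, `show`, `prerender`, and the `render` callback are hypothetical names, and bitmaps are modeled as byte strings so that `len()` gives their size.

```python
from collections import OrderedDict


class BoundedPageCache:
    """LRU cache of rendered pages that never keeps more than max_bytes."""

    def __init__(self, render, max_bytes):
        self._render = render          # hypothetical callback: page_no -> bytes
        self._max_bytes = max_bytes
        self._pages = OrderedDict()    # page_no -> bitmap, oldest first
        self.used_bytes = 0

    def get(self, page_no):
        if page_no in self._pages:
            self._pages.move_to_end(page_no)   # mark as most recently used
            return self._pages[page_no]
        bitmap = self._render(page_no)
        self._pages[page_no] = bitmap
        self.used_bytes += len(bitmap)
        # Evict least recently used pages, but keep the one just rendered.
        while self.used_bytes > self._max_bytes and len(self._pages) > 1:
            _, old = self._pages.popitem(last=False)
            self.used_bytes -= len(old)
        return bitmap

    def show(self, page_no, n_pages, prerender=1):
        """Return the current page and render up to `prerender` pages ahead."""
        bitmap = self.get(page_no)
        for ahead in range(page_no + 1, min(page_no + 1 + prerender, n_pages)):
            self.get(ahead)
        if page_no in self._pages:
            self._pages.move_to_end(page_no)   # current page is evicted last
        return bitmap

    def cached_pages(self):
        return sorted(self._pages)
```

Scrolling through a long document then costs at most the budget (or one page, if a single page exceeds it), whatever the document length.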

     
  • Denis Auroux

    Denis Auroux - 2009-09-01

    Hi Bob,

    1. Names of the options: I agree they're not always clear, and will need some rationalizing. Your suggestions are overall pretty good; I have a high level of inertia and it takes me time to get used to new ideas, but expect some of your advice to slowly make it into future releases. Regarding the inappropriate explanation of the "progressive backgrounds" option in the config file, in fact I recently changed it to "just-in-time update of page backgrounds", but the comments don't get updated when one re-saves an existing config file.

    2. Crash by out-of-memory: this is an issue even without "progressive", so I'll need to add a memory limit. A single page at 10x zoom level already occupies about 150 MB of memory... (and needs about 3 times that to generate via poppler)

    3. Poppler might be fast enough for you, but it definitely isn't anywhere near real-time (sure, it's faster than pdftoppm, no question). Ideally the rendering would be done completely in background, as was the case with the old pdftoppm code; the right way would clearly be to use threads, and I'm not too confident that I can get it right on the first attempt, so it'll have to be done later.

    4. About the various ways of opening PDF files: I really don't want File-Open to show PDFs (or other files that could be annotated) unless one specifically asks for them. It's a recipe for confusion, causing you to open a PDF when you wanted to open a XOJ with the same name (e.g. a XOJ that annotates the PDF, or a XOJ that was exported to PDF) and end up with multiple incompatible versions of your work files. So, officially File-Open does not open PDFs, and the only way is "Annotate PDF". (Of course, like any good bureaucratic system, Xournal is actually more flexible than its official policy.)

    5. As far as I know, poppler does not generate PDF code or render anything to PDF. It can generate PS code, and could be used to print to PDF via poppler + cairo + gtkprint, but that's neither elegant nor safe. Along the same lines: where did you find out that poppler can load more than PDF (I think you mentioned PS and SVG)? As far as I know, all it can read is PDF, and all it outputs is cairo, pixbuf, or PS (except perhaps a very limited manipulation of PDF forms). Please let me know if I'm wrong.

    Denis
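
The figure in point 2 is easy to check with back-of-the-envelope arithmetic. The sketch below assumes an A4 page (595 x 842 points), one pixel per point at 1x zoom, and 3 bytes per pixel (RGB without alpha); none of these values are stated in the thread, and `bitmap_bytes` is a hypothetical helper.

```python
def bitmap_bytes(width_pt, height_pt, zoom, bytes_per_pixel=3):
    """Estimate the size of an uncompressed page bitmap.

    Assumes one pixel per point at zoom 1.0; the real scale factor in
    xournal may differ.
    """
    width_px = round(width_pt * zoom)
    height_px = round(height_pt * zoom)
    return width_px * height_px * bytes_per_pixel


a4_at_10x = bitmap_bytes(595, 842, 10)   # 5950 x 8420 pixels, 3 bytes each
print(a4_at_10x / 1e6)                   # about 150 (MB)
```

Since both dimensions scale with the zoom factor, memory grows with its square: the same page at 1x needs only about 1.5 MB.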

     
  • Nobody/Anonymous

    Poppler's speed depends on the hardware and on the content of the PDF file. To scroll through a scan of Hume's Enquiry (http://www.archive.org/details/enquiryconcernin00humerich) in particular, I have to wait 5 seconds for every page to load. I guess it would be longer if I used my Eee PC (downclocked to 600 MHz) or a Nokia tablet. So, I think preloading does make a lot of sense.

    Btw, this new update changes everything, now that I can annotate the 300th page without loading the 299 pages which precede it. And how timely, school begins in a few days. Thanks!

     
  • Denis Auroux

    Denis Auroux - 2009-10-03

    Ok, release 0.4.5 is out and it includes a modified version of this patch. It's still missing a mechanism to limit the amount of RAM taken, but so is every single piece of code out there for the time being => it'll wait until the next release.

    I'm closing this patch, but you can reopen it (or open a bug about the memory usage issue, though I'm already aware of it) if you feel that it is necessary.

    Denis

     
  • Denis Auroux

    Denis Auroux - 2009-10-03
    • status: open --> closed
     
