
#55 More a suggestion than a bug report

Status: open
Owner: nobody
Labels: None
Priority: 5
Created: 2008-09-05
Updated: 2008-09-05
Private: No

Hello,

I have used xournal ever since my MIT days. It is a unique application (wish it was Qt based) that has allowed me to organize my annotations to research papers and reduce the clutter in my office.

Recently, over the summer, we (mis)used the services of a few undergrads to scan in some old lab manuals that are still used by the members of our group from time to time. The manuals in question are almost tattered after years of use. Now, these images were scanned using xsane, then unpaper'ed in a script, followed by some manual unpaper in gscan2pdf, and then saved to PDF.

One of the manuals is 1400 pages long, and needs to be heavily annotated due to changes in practices over the years. The original annotations are on a separate sheet that has also faded. However, when I open this in xournal, the application stops, complaining that it cannot run pdftoppm anymore. Suspecting that this was a memory issue, I tried running it on a system with more RAM, and sure enough, it ran for longer (36 pages) but stopped again. Reducing the resolution from 144 dpi to 72 dpi in .xournal did not help much.

I am aware that xournal tries to load all the pages of the input PDF document into memory, a fact that has caused some consternation among affected users on sourceforge. Elegant solutions like intelligent loading and unloading of ppm files have been suggested. This is a good solution, but it will take time to code, and if the author of the application (who cannot be thanked enough) is to be taken at his word, it might never be coded. In the meantime, could a simpler, slower solution be implemented that saves all the ppm files in a user-defined tmp directory (just like gscan2pdf, which appears to have no issues handling large documents)? These ppm files could be loaded one at a time, and then discarded as soon as the user turns to another page. I could help, but it has been at least 10 years since I have written anything non-trivial in C/C++, and in any case, my coding strengths lie in matlab / fortran 95, not C.

Just a suggestion. What do you think?

Discussion

  • Madhusudan Singh

    • priority: 5 --> 9
     
  • Nobody/Anonymous

    Logged In: NO

    This is already part of the to-do list -- but doing it more smartly so that the backgrounds don't take too long to load, nor clutter the hard disk (your 1400 pages would probably use about 10 GB of hard disk space if all stored on disk), and after switching to poppler instead of pdftoppm in order to make the rendering more efficient.

    I'm not optimistic about Xournal ever behaving well with a 1000+ page document though. In all cases I would recommend segmenting things and annotating one chapter at a time. One thing you can easily do, if you don't want to split the PDF, is create a bunch of .xoj files that only annotate a subset of pages. The minimal xoj file that will do this is just the following type:

    <?xml version="1.0" standalone="no"?>
    <xournal>
    <page width="612.00" height="792.00">
    <background type="pdf" domain="absolute" filename="/home/auroux/file.pdf" pageno="521" />
    <layer>
    </layer>
    </page>
    <page width="612.00" height="792.00">
    <background type="pdf" pageno="522" />
    <layer>
    </layer>
    </page>
    <page width="612.00" height="792.00">
    <background type="pdf" pageno="523" />
    <layer>
    </layer>
    </page>
    </xournal>

    This annotates pages 521-523 of /home/auroux/file.pdf, which are letter-size pages (612x792 points).
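    Writing these stub files by hand for every chapter would be tedious, so here is a quick sketch (not part of Xournal, just an illustration) that generates one such .xoj for a given page range. It assumes, as Xournal does when saving, that .xoj files are gzip-compressed XML; the page size, path, and output name are taken from the example above.

    ```python
    import gzip

    # Sketch: generate a minimal .xoj annotating pages first..last of a PDF.
    # Page size defaults to letter (612x792 points), as in the example above.
    def write_xoj(pdf_path, first, last, out_path, width=612.0, height=792.0):
        parts = ['<?xml version="1.0" standalone="no"?>', '<xournal>']
        for n in range(first, last + 1):
            # Only the first page carries the absolute path; later pages inherit it.
            if n == first:
                bg = ('<background type="pdf" domain="absolute" '
                      'filename="%s" pageno="%d" />' % (pdf_path, n))
            else:
                bg = '<background type="pdf" pageno="%d" />' % n
            parts += ['<page width="%.2f" height="%.2f">' % (width, height),
                      bg, '<layer>', '</layer>', '</page>']
        parts.append('</xournal>')
        # .xoj files are gzipped XML, so write through gzip.
        with gzip.open(out_path, 'wt') as f:
            f.write('\n'.join(parts) + '\n')

    write_xoj("/home/auroux/file.pdf", 521, 523, "pages-521-523.xoj")
    ```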

    Denis

     
  • Denis Auroux

    Denis Auroux - 2008-09-05
    • priority: 9 --> 5
     
  • Denis Auroux

    Denis Auroux - 2008-09-05

    Logged In: YES
    user_id=1482965
    Originator: NO

    Also, tracker item moved back to "feature requests", which is where it belongs.
    Denis

     
  • Madhusudan Singh

    Thanks for the quick response.

    Managing a bunch of separate xoj files will be about as messy as managing a bunch of separate pdf files. What does one do if one needs to save a pdf copy of the annotated document? No doubt it can be done using pdfjoin / pdftk etc., but it is still going to be a mess.
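    For what it's worth, stitching the per-chapter PDFs back together is scriptable. A hedged sketch using the third-party pypdf library (not mentioned in the thread; pdftk or pdfjoin would do the same job from the shell), with blank one-page PDFs standing in for the annotated chapter exports:

    ```python
    from pypdf import PdfReader, PdfWriter

    # Create two tiny stand-in PDFs (in practice these would be the
    # annotated chapter exports from xournal).
    for name in ("chapter1.pdf", "chapter2.pdf"):
        w = PdfWriter()
        w.add_blank_page(width=612, height=792)  # letter size, in points
        with open(name, "wb") as f:
            w.write(f)

    # Concatenate the parts into a single document.
    merged = PdfWriter()
    for name in ("chapter1.pdf", "chapter2.pdf"):
        merged.append(name)
    with open("merged.pdf", "wb") as f:
        merged.write(f)
    ```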

    Instead, why not arrive at the following compromise:

    Provide an additional editing mode called the long file mode. This kicks in either by user configuration or simply by comparing the number of pages in the target document against the result of dividing the available system RAM (conservatively allocating 80% of it) by the average memory footprint of a single pnm image generated by xournal. Either way, what happens in the long file mode is the following:

    Xournal loads the next 10 and the previous 5 pages into system RAM. It also writes pnm images for the next 20 and the previous 20 pages to the tmp directory. When the user advances to the next page, it updates each page list accordingly. These numbers of pages should be configurable, so users can set them to suit their situation.

    This scheme is not particularly intelligent, requires little or no bookkeeping (I imagine that the index of the currently open page is a variable that xournal already tracks :) ), and will also avoid using too much hard disk space (or at least give the user control over how much is used). The RAM footprint will have an upper limit configurable by the user. To make things more idiot-proof, the long file mode could be off by default, so that users who need it have to explicitly activate it before first use.
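    The page-window idea above can be sketched in a few lines. Python is used here only for brevity (Xournal itself is C); render_page is a placeholder for the actual pdftoppm/poppler call, and the on-disk tier of 20 pages each way is omitted:

    ```python
    # Sketch of the proposed "long file mode": keep a small sliding window of
    # rendered pages in RAM and evict everything outside it on page change.
    class PageWindowCache:
        def __init__(self, ram_ahead=10, ram_behind=5):
            self.ram_ahead = ram_ahead    # pages kept ahead of the current one
            self.ram_behind = ram_behind  # pages kept behind it
            self.ram = {}                 # pageno -> rendered bitmap

        def render_page(self, n):
            # Placeholder for the real pdftoppm/poppler rendering call.
            return "bitmap-for-page-%d" % n

        def goto(self, current):
            wanted = range(max(1, current - self.ram_behind),
                           current + self.ram_ahead + 1)
            for n in list(self.ram):
                if n not in wanted:
                    del self.ram[n]       # evict pages outside the window
            for n in wanted:
                self.ram.setdefault(n, self.render_page(n))
            return self.ram[current]
    ```

    As described above, the only bookkeeping is the current page index; the window sizes are the user-configurable knobs.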

    This will not affect people who use xournal for small documents in any way, while extending the power of xournal for people who need it for bigger documents. What do you think?

     
