Menu

#34 improve usability on slow content

SVN
closed-accepted
nobody
None
5
2014-09-10
2014-05-22
No

The attached series of patch improve MComix on big and/or slow archives (like HD PDF files), for which a single page extraction can take seconds.

The break down is as follow:

[PATCH 1/12] minor cleanup in file handler:

Remove dead code.

[PATCH 2/12] improve logging traces:

Add thread name for easier multi-threading debugging.

[PATCH 3/12] add debug trace on thread join at exit

[PATCH 4/12] small cleanup in file_handler

Remove unnecessary call to imagehandler.close after a call to
close_file (which already calls it).

[PATCH 5/12] fix callback helper code

Currently, with the following code:

from mcomix import callback

class Test:

    @callback.Callback
    def callback(self, data):
    pass

t1 = Test()
t1.callback += lambda str: sys.stdout.write('t1: %s\n' % str)
t2 = Test()
t2.callback += lambda str: sys.stdout.write('t2: %s\n' % str)

t1.callback('1')
t2.callback('2')

The output is:

t1: 1
t2: 1
t1: 2
t2: 2

Which means that the callbacks are called across object instances! This
fix the callback code so that callbacks are correctly bound to their
object. In the example above the output becomes:

t1: 1
t2: 2

[PATCH 6/12] add worker thread class

[PATCH 7/12] rework extractor and page cache threading:

Improve threading so pages around the current selected one are given
priority for extraction and caching. This prevent possible long waits
when starting to read far from the beginning, or when jumping (forward
or back) many pages. Previously this was only handled when starting to
read an archive, and never updated when changing page: if the archive is
big and/or extraction is slow, the extractor thread could take a long
time to catch up to the current page.

[PATCH 8/12] avoid possible long hang in page selection dialog:

If a page is not yet available, don't wait for it so we can create the
preview, as it may hang the interface for a considerable time
(minutes) when the archive is and big and/or the extraction is slow.

[PATCH 9/12] don't hang waiting for pages in double page mode:

When trying to determine if a page is a virtual double page, don't wait
for pages (and block the interface) if they are not yet available.

[PATCH 10/12] rework thumbnail generation threading

  • don't block the thumbbar thumbnail generation thread if a page is not
    yet available
  • but do try to load visible thumbbar thumbnails again when a new page
    becomes available
  • don't use more than 1 thread for thumbnail generation
  • stop thumbnail generation when hidden (thumbbar/book library)

[PATCH 11/12] improve page selection dialog reactivity

Use asynchronous mode for image preview generation.

[PATCH 12/12] update page selection preview on page availability

12 Attachments

Related

Patches: #32

Discussion

1 2 > >> (Page 1 of 2)
  • Oddegamra

    Oddegamra - 2014-05-22

    Hey,

    great work. Couldn't find any problems in 1~9, so they have all been merged.

    For the remaining patches:

    • 10, rework thumbnail generation threading: I'm wondering what your thoughts/reasons behind this change are, especially cutting down the threading. Generally, threading doesn't make much sense in Python due to the interpreter lock, but under these particular circumstances, where each thread is largely IO-bound, the benefits are still substantial (the interpreter lock is released when a thread waits for IO). While your patch fixed the long-standing problem that scrolling up/down or re-sorting would often queue up an enormous amount of thumbnail requests, thus delaying the display of the currently visible thumbnails, thumbnail generation is also noticeably slower now, especially on PCs that can make use of multiple threads (4+core CPUs).

    • 11~12: Using the scollbar on the right side of the thumbnail to scroll down/up quickly currently breaks the selector dialog. A large amount of threads are spawned, and the thumbnail no longer updates, even when using the page selection text box.

     
  • Benoit Pierre

    Benoit Pierre - 2014-05-22
    • regarding 10: you're absolutely right, attached an updated mini-series with multiple-threads support

    • regarding 11~12: I did test this, as I feared that moving the scrollbar could indeed lead to too many threads, but it does work fine on my end. I'll look into improving the code: I think I'll just add a worker thread (with a flush before each add) to prevent spawning more than 1 thread.

    By the way, on the subject of thumbnail generation, I saw this code (in thumbbar.py):

    if sys.platform == 'win32':
        # Prevents flickering on update
        self._treeview.unset_flags(gtk.DOUBLE_BUFFERED)
    

    And noticed it does prevent flickering on Linux (with PyGTK 2.24) too. Looking at the Subversion log, the platform check was added with this commit message:

    Less flicker in one way, but bad in so many other ways. Thus, reverted.

    But there is no further explanation as to what was troublesome. Do you have any idea what was wrong?

     
  • Ark

    Ark - 2014-05-22

    I'm sorry for trouble, as the commit you refer to is "mine". The problem I had (and still have in r1016) looks like this: I run MComix on Linux with a GTK theme that uses an almost-white color for the default background while my MComix shows a black background. That is, the background colors are totally different from each other. (I don't know whether this is actually important here but you might need similar settings to reproduce this.) Now, every time MComix needs to create a thumbnail because there is no cached one, the entire thumbnailer flashes (which means it turns to a bright color for just a moment, presumably the default GTK background color or simply white, and than switches back to black again). Thus, when I simply try and read a book, the thumbnailer flashes every time I switch to the next page because this is usually the first time the corresponding thumbnail comes into view (due to implicit thumbnailer scrolling) and thus has to be created. (However, going back to pages I've read works fine because the thumbnails for these pages are already cached.)

    This is really annoying. So I tried to fix this by modifying the lines of code you quoted. It kinda worked somehow (the flashing wasn't that prominent anymore), but it had other flickering quirks so I reverted the changes I made. This is where the commit message you quoted comes from.

    I hope this explains that commit message.

    Thank you very much for the patches. :-)

     
    • Benoit Pierre

      Benoit Pierre - 2014-05-22

      Using a theme with a white background indeed make it worse. I don't see any issue with self._treeview.unset_flags(gtk.DOUBLE_BUFFERED) though. Can you elaborate on the other flickering quirks when using it?

       
  • Benoit Pierre

    Benoit Pierre - 2014-05-23

    One final mini-series:

    Add concurrent extractions support to archive extractor.

    This is supported when using an external program to do the extraction, like for PDF, or when using 7-Zip. The preference setting is conservatively set to 1, and available in the "Advanced" tab of the preferences dialog (under a new "Threading" section).

    N.B.:
    - the support is for concurrent extractions from a single archive, not support for extracting from multiple archives at the same time.
    - the series is dependent upon the 0002-improve-worker-thread-class.patch patch

    Here is some resulting timings:

    • CBR 171Mo. 146 pages (~1.2Mo 1523x2343):

      • RarArchive, 1 thread: 0m03s
      • SevenZipArchive, 1 thread: 1m47s
      • SevenZipArchive, 3 thread: 0m35s
    • CBZ 171Mo. 146 pages (~1.2Mo 1523x2343)

      • ZipArchive, 1 thread: 0m01s
      • ZipExecArchive, 1 thread: 0m04s
      • ZipExecArchive, 3 thread: 0m02s
      • SevenZipArchive, 1 thread: 0m07s
      • SevenZipArchive, 3 thread: 0m02s
    • PDF 925Mo, 132 pages (3975x6113)

      • 1 thread: 9m30s
      • 3 threads: 3m15s
      • 8 threads: 1m50s
     
  • Oddegamra

    Oddegamra - 2014-05-23

    Pretty awesome. All commited. I took the liberty to consolidate threading with caching in the preferences, since adding another section takes away so much space that looks wasted on the other tabs. Thanks again for your effort!

     
  • Oddegamra

    Oddegamra - 2014-05-23
    • status: open --> closed-accepted
     
  • Benoit Pierre

    Benoit Pierre - 2014-05-23

    Great, thanks!

     
  • Ark

    Ark - 2014-05-23

    @Benoit Pierre : I'm not sure, actually, but when I enable self._treeview.unset_flags(gtk.DOUBLE_BUFFERED), it seems like the heavy flashing is gone but some other kind of flickering occurs, more like thin horizontal lines that appear here and there. Those lines appear even if all thumbnails are cached, but they are less noticeable in this case. So it's not a perfect solution but the flashing is definitely worse.

    Maybe there were some other problems I cannot reprocude anymore, I don't know. I changed the code so self._treeview.unset_flags(gtk.DOUBLE_BUFFERED) is enabled for all platforms now. Thank you for pointing this out.

    By the way, there is another bug:

    Traceback (most recent call last):
      File "/tmp/mcomix-code_svn/mcomix/dialog_handler.py", line 23, in open_dialog
        dialog_windows[ name_of_dialog ][0] = _dialog[1](window)
      File "/tmp/mcomix-code_svn/mcomix/properties_dialog.py", line 81, in __init__
        thumb = window.imagehandler.get_thumbnail(width=200, height=128)
      File "/tmp/mcomix-code_svn/mcomix/image_handler.py", line 474, in get_thumbnail
        if not self._wait_on_page(page, check_only=nowait):
      File "/tmp/mcomix-code_svn/mcomix/image_handler.py", line 514, in _wait_on_page
        index = page - 1
    TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
    
     
  • Oddegamra

    Oddegamra - 2014-05-24

    Playing around a bit more with the changes, there seems to be an endless wait in one of the threads somewhere in the thumbnail sidebar, or the underlying thumbnail view, probably related to the the functionality that remembers the last read position.

    Steps to reproduce:
    - Enable "Store information about recently opened files" in the preferences.
    - Open the thumbnail sidebar.
    - Open an archive.
    - Skip forward one page and close the archive (so that MComix remembers that page 2 was last read)
    - Open the same archive again.
    - You should be asked now if you want to continue reading from page 2. Press yes or no, doesn't seem to matter.
    - The sidebar now doesn't display any thumbnails, and MComix can only be killed with SIGKILL as one thread cannot be joined on exit.

    Not sure yet how to deal with the problem.

     
  • Benoit Pierre

    Benoit Pierre - 2014-05-24

    That's my normal use case, and it works fine for me. Can you apply the attached patch and enable traces? It changes the thread names to be more descriptive. If one of the thread is hung somewhere, maybe the trace at exit when trying to join it will give us more info.

     
  • Ark

    Ark - 2014-05-24

    I'm not sure whether this is the same thing you're already talking about, but you'll run into trouble whenever you're trying to open an archive that is "solid" (7z terminology). Do we still use the 7z stdout to get the extracted file(s)? In this case, I think the best way to handle this is to extract the entire archive at once, i.e.

    7z x -p -so -- archive.7z
    

    This will dump the whole (uncompressed) content of the entire archive (that is, all files) to stdout. You must split stdout by counting bytes and creating new files (or directories) as soon as the file sizes listed by

    7z l -p -so -- archive.7z
    

    match for the respective files in that order. (At least I believe that the order the files are listed by 7z l is identical to the order 7z x dumps the contents of the files to stdout.) This command also tells you whether an archive is solid (Solid = +) or not (Solid = -). I don't know whether 7z supports files with names that contain \n or \r but if it does, this might be a problem. Various (Unicode) encodings might be problematic, too. But since we are only interested in the sizes and not the names of the files, this shouldn't be that much of an issue.

    Hope that helps.

     

    Last edit: Ark 2014-05-24
  • Oddegamra

    Oddegamra - 2014-05-24

    The hanging thread and the missing thumbnails in the sidebar are not connected, but separate issues.

    As for the sidebar, the problem might have been around for much longer and I just didn't notice, as I never use it myself.
    The problem occurs like this: FileHandler.open_file calls into _open_archive. This call potentially blocks the main thread to open the confirmation window ("Continue reading...") while the extractor is already running in another thread. Unfortunately, the extractor sends callbacks to file_handler and image_handler that files are available, but image_handler ignores these calls, as the list of available images hasn't been set yet. These images are only set after _open_archive returns. I added a workaround for the problem.

    Concerning the thread, I noticed from the debug output that MComix was always waiting for the extraction thread that is spawned from the thumbnailer thread, which is used in the preview image in the "Open file" dialog. I added extractor.close() to the thumbnailer, which in turn now calls extractor.stop(). This should hopefully fix that issue as well.

     
  • Ark

    Ark - 2014-05-24

    @Benoit Pierre: Thanks for the "Properties" patch and all the other patches so far. Did you consider joining the project (if this is okay for Oddegamra)?

    I experimented a bit with the thumbbar and it seems that those white lines appear close to where you move the mouse cursor. That is, when you move the cursor up and down along the thumbbar very often and very quickly, you can see these lines quite easily. This did not happen with the old code, though. I don't know the reason, but the new code somehow appears to be less "double buffering" than the old one. Tearing or vsync related issues or something, I guess. I really don't know, sorry. However, I think the flashing is worse so we should keep the current "solution".

     
  • Benoit Pierre

    Benoit Pierre - 2014-05-24

    @Oddegamra: Great. Would it be possible to commit the thread names patch? I does make debugging easier.

    @Ark: I created 2 7z archives, one solid and the other not, and both work fine with MComix. Extraction is however really slow: it seems 7z is scanning the whole archive every time, so it take as much time to decompress one entry as to decompress the whole archive!

     
  • Benoit Pierre

    Benoit Pierre - 2014-05-24

    @Ark: your welcome!

    Are you saying just hovering the mouse over the thumbnails trigger those lines?

     
  • Ark

    Ark - 2014-05-24

    The extraction time should depend on whether the archive is solid or not. If the user requests to extract a file at position n within a solid archive, 7z will extract all the n-1 files before the file in question first. If we do this with every file in the archive, we will actually extract around n*n/2 files which is terribly slow. Thus, we'll need a switch:

    • If the archive is not solid, random access is available so we can simply extract the file in question by using the code we already have.
    • However, if the archive is solid, there is no random access available so we should do something like this to extract the archive in linear time:
        # pseudocode
        observers = list_of_observers_to_be_notified_when_file_becomes_available()
        archive_list = open_archive_content_list_stream("archive.7z")
        archive_content_stream = open_archive_content_stream("archive.7z")
    
        foreach file_in_archive_list do
            filesize = size(file_in_archive_list)
            filename = name(file_in_archive_list)
            file_stream = open_file(filename)
            copy_bytes_from_stream_to(archive_content_stream, file_stream, filesize)
            close(file_stream)
            notify_file_is_available(observers, filename)
    
        close(archive_content_stream)
        close(archive_list)
    

    @Benoit Pierre: Yes, hovering is sufficient to make them appear, at least for my configuration. I can't tell whether this is a hardware issue or a driver issue or a GTK rendering engine issue or something else.

     
  • Ark

    Ark - 2014-05-24

    If I use PageUp, PageDown, End or similar keys too quickly to switch between different books, I either get an "empty" book (i.e. no images seem to be available until I switch to another book) or I get this:

    Traceback (most recent call last):
      File "/tmp/mcomix-code_svn/mcomix/thumbnail_view.py", line 102, in _pixbuf_finished
        iter = model.get_iter(path)
    ValueError: invalid tree path
    

    This error message appears either once or several hundred times at once (depending on the number of pages in the current book, I guess) whenever this happens.

     
  • Benoit Pierre

    Benoit Pierre - 2014-05-24
    • solid 7z archives: you're right, I thought one of the archive I created was not solid, but since it's the default, both actually were. I agree about your pseudo-code.

    • flickering: what theme/engine do you use? Does a thumbnail change when hovering over it (e.g. gets highlighted)?

    • quick switch between archives: I can reproduce the exception (but not the empty book). The attached patch fix the exception case, can you check if it works for you please?

     
  • Ark

    Ark - 2014-05-24
    • flickering: The theme/engine I use is called Aurora (which seems to work just like Clearlooks). I don't see any hovering effects.

    • switching: Thanks for the patch. This seems to be fixed now. About the "empty" book: Actually, I'm not sure myself anymore, maybe there was simply a high latency I didn't notice. I'll bring this up again if it happens again.

     
  • Ark

    Ark - 2014-07-19

    There are at least two bugs in the new extraction code:

    1. Whenever I try to open an archive that contains an archive itself, I constantly run into an error message "'set' object does not support item assignment". There is no output to stdout or stderr when this happens. Also, it seems to happen with any type of archives. The extraction itself seems to be performed, as it takes a long time to open a quite large archive. However, the GUI is frozen and after that, it looks like you never tried to open anything. (The only thing you see then is said error message.)
    2. The temporary mcomix directory is still present after closing MComix, containing only one image or so. This does not always happen, and I guess this happens if you close MComix while there are still thumbnails being created (or images being extracted for caching or thumbnailing). Looks like a race condition or something.
     
  • Benoit Pierre

    Benoit Pierre - 2014-07-19

    Here is a fix for 1. But so far I cannot reproduce 2.

     
  • Benoit Pierre

    Benoit Pierre - 2014-07-19

    I think I fixed 2 (see attached patch): when creating an archive thumbnail, not all code paths close the extractor and/or delete the temporary directory. Also handled is the case were the archive cannot be opened (e.g. we're missing the required library/executable for support).

     

    Last edit: Benoit Pierre 2014-07-19
1 2 > >> (Page 1 of 2)

Log in to post a comment.