The attached series of patch improve MComix on big and/or slow archives (like HD PDF files), for which a single page extraction can take seconds.
The break down is as follow:
[PATCH 1/12] minor cleanup in file handler:
Remove dead code.
[PATCH 2/12] improve logging traces:
Add thread name for easier multi-threading debugging.
[PATCH 3/12] add debug trace on thread join at exit
[PATCH 4/12] small cleanup in file_handler
Remove unnecessary call to imagehandler.close after a call to
close_file (which already calls it).
[PATCH 5/12] fix callback helper code
Currently, with the following code:
from mcomix import callback class Test: @callback.Callback def callback(self, data): pass t1 = Test() t1.callback += lambda str: sys.stdout.write('t1: %s\n' % str) t2 = Test() t2.callback += lambda str: sys.stdout.write('t2: %s\n' % str) t1.callback('1') t2.callback('2')
The output is:
t1: 1 t2: 1 t1: 2 t2: 2
Which means that the callbacks are called across object instances! This
fix the callback code so that callbacks are correctly bound to their
object. In the example above the output becomes:
t1: 1 t2: 2
[PATCH 6/12] add worker thread class
[PATCH 7/12] rework extractor and page cache threading:
Improve threading so pages around the current selected one are given
priority for extraction and caching. This prevent possible long waits
when starting to read far from the beginning, or when jumping (forward
or back) many pages. Previously this was only handled when starting to
read an archive, and never updated when changing page: if the archive is
big and/or extraction is slow, the extractor thread could take a long
time to catch up to the current page.
[PATCH 8/12] avoid possible long hang in page selection dialog:
If a page is not yet available, don't wait for it so we can create the
preview, as it may hang the interface for a considerable time
(minutes) when the archive is and big and/or the extraction is slow.
[PATCH 9/12] don't hang waiting for pages in double page mode:
When trying to determine if a page is a virtual double page, don't wait
for pages (and block the interface) if they are not yet available.
[PATCH 10/12] rework thumbnail generation threading
[PATCH 11/12] improve page selection dialog reactivity
Use asynchronous mode for image preview generation.
[PATCH 12/12] update page selection preview on page availability
Hey,
great work. Couldn't find any problems in 1~9, so they have all been merged.
For the remaining patches:
10, rework thumbnail generation threading: I'm wondering what your thoughts/reasons behind this change are, especially cutting down the threading. Generally, threading doesn't make much sense in Python due to the interpreter lock, but under these particular circumstances, where each thread is largely IO-bound, the benefits are still substantial (the interpreter lock is released when a thread waits for IO). While your patch fixed the long-standing problem that scrolling up/down or re-sorting would often queue up an enormous amount of thumbnail requests, thus delaying the display of the currently visible thumbnails, thumbnail generation is also noticeably slower now, especially on PCs that can make use of multiple threads (4+core CPUs).
11~12: Using the scollbar on the right side of the thumbnail to scroll down/up quickly currently breaks the selector dialog. A large amount of threads are spawned, and the thumbnail no longer updates, even when using the page selection text box.
regarding 10: you're absolutely right, attached an updated mini-series with multiple-threads support
regarding 11~12: I did test this, as I feared that moving the scrollbar could indeed lead to too many threads, but it does work fine on my end. I'll look into improving the code: I think I'll just add a worker thread (with a flush before each add) to prevent spawning more than 1 thread.
By the way, on the subject of thumbnail generation, I saw this code (in thumbbar.py):
And noticed it does prevent flickering on Linux (with PyGTK 2.24) too. Looking at the Subversion log, the platform check was added with this commit message:
But there is no further explanation as to what was troublesome. Do you have any idea what was wrong?
I'm sorry for trouble, as the commit you refer to is "mine". The problem I had (and still have in r1016) looks like this: I run MComix on Linux with a GTK theme that uses an almost-white color for the default background while my MComix shows a black background. That is, the background colors are totally different from each other. (I don't know whether this is actually important here but you might need similar settings to reproduce this.) Now, every time MComix needs to create a thumbnail because there is no cached one, the entire thumbnailer flashes (which means it turns to a bright color for just a moment, presumably the default GTK background color or simply white, and than switches back to black again). Thus, when I simply try and read a book, the thumbnailer flashes every time I switch to the next page because this is usually the first time the corresponding thumbnail comes into view (due to implicit thumbnailer scrolling) and thus has to be created. (However, going back to pages I've read works fine because the thumbnails for these pages are already cached.)
This is really annoying. So I tried to fix this by modifying the lines of code you quoted. It kinda worked somehow (the flashing wasn't that prominent anymore), but it had other flickering quirks so I reverted the changes I made. This is where the commit message you quoted comes from.
I hope this explains that commit message.
Thank you very much for the patches. :-)
Using a theme with a white background indeed make it worse. I don't see any issue with
self._treeview.unset_flags(gtk.DOUBLE_BUFFERED)
though. Can you elaborate on the other flickering quirks when using it?And here is an updated min-series for 11~12.
Edit: update second patch (need to unregister callback on close)
Last edit: Benoit Pierre 2014-05-23
One final mini-series:
Add concurrent extractions support to archive extractor.
This is supported when using an external program to do the extraction, like for PDF, or when using 7-Zip. The preference setting is conservatively set to 1, and available in the "Advanced" tab of the preferences dialog (under a new "Threading" section).
N.B.:
- the support is for concurrent extractions from a single archive, not support for extracting from multiple archives at the same time.
- the series is dependent upon the
0002-improve-worker-thread-class.patch
patchHere is some resulting timings:
CBR 171Mo. 146 pages (~1.2Mo 1523x2343):
CBZ 171Mo. 146 pages (~1.2Mo 1523x2343)
PDF 925Mo, 132 pages (3975x6113)
Pretty awesome. All commited. I took the liberty to consolidate threading with caching in the preferences, since adding another section takes away so much space that looks wasted on the other tabs. Thanks again for your effort!
Great, thanks!
@Benoit Pierre : I'm not sure, actually, but when I enable
self._treeview.unset_flags(gtk.DOUBLE_BUFFERED)
, it seems like the heavy flashing is gone but some other kind of flickering occurs, more like thin horizontal lines that appear here and there. Those lines appear even if all thumbnails are cached, but they are less noticeable in this case. So it's not a perfect solution but the flashing is definitely worse.Maybe there were some other problems I cannot reprocude anymore, I don't know. I changed the code so
self._treeview.unset_flags(gtk.DOUBLE_BUFFERED)
is enabled for all platforms now. Thank you for pointing this out.By the way, there is another bug:
Oops! Here is a patch to fix this regression.
Playing around a bit more with the changes, there seems to be an endless wait in one of the threads somewhere in the thumbnail sidebar, or the underlying thumbnail view, probably related to the the functionality that remembers the last read position.
Steps to reproduce:
- Enable "Store information about recently opened files" in the preferences.
- Open the thumbnail sidebar.
- Open an archive.
- Skip forward one page and close the archive (so that MComix remembers that page 2 was last read)
- Open the same archive again.
- You should be asked now if you want to continue reading from page 2. Press yes or no, doesn't seem to matter.
- The sidebar now doesn't display any thumbnails, and MComix can only be killed with SIGKILL as one thread cannot be joined on exit.
Not sure yet how to deal with the problem.
That's my normal use case, and it works fine for me. Can you apply the attached patch and enable traces? It changes the thread names to be more descriptive. If one of the thread is hung somewhere, maybe the trace at exit when trying to join it will give us more info.
I'm not sure whether this is the same thing you're already talking about, but you'll run into trouble whenever you're trying to open an archive that is "solid" (7z terminology). Do we still use the 7z stdout to get the extracted file(s)? In this case, I think the best way to handle this is to extract the entire archive at once, i.e.
This will dump the whole (uncompressed) content of the entire archive (that is, all files) to stdout. You must split stdout by counting bytes and creating new files (or directories) as soon as the file sizes listed by
match for the respective files in that order. (At least I believe that the order the files are listed by
7z l
is identical to the order7z x
dumps the contents of the files to stdout.) This command also tells you whether an archive is solid (Solid = +
) or not (Solid = -
). I don't know whether 7z supports files with names that contain\n
or\r
but if it does, this might be a problem. Various (Unicode) encodings might be problematic, too. But since we are only interested in the sizes and not the names of the files, this shouldn't be that much of an issue.Hope that helps.
Last edit: Ark 2014-05-24
The hanging thread and the missing thumbnails in the sidebar are not connected, but separate issues.
As for the sidebar, the problem might have been around for much longer and I just didn't notice, as I never use it myself.
The problem occurs like this: FileHandler.open_file calls into _open_archive. This call potentially blocks the main thread to open the confirmation window ("Continue reading...") while the extractor is already running in another thread. Unfortunately, the extractor sends callbacks to file_handler and image_handler that files are available, but image_handler ignores these calls, as the list of available images hasn't been set yet. These images are only set after _open_archive returns. I added a workaround for the problem.
Concerning the thread, I noticed from the debug output that MComix was always waiting for the extraction thread that is spawned from the thumbnailer thread, which is used in the preview image in the "Open file" dialog. I added extractor.close() to the thumbnailer, which in turn now calls extractor.stop(). This should hopefully fix that issue as well.
@Benoit Pierre: Thanks for the "Properties" patch and all the other patches so far. Did you consider joining the project (if this is okay for Oddegamra)?
I experimented a bit with the thumbbar and it seems that those white lines appear close to where you move the mouse cursor. That is, when you move the cursor up and down along the thumbbar very often and very quickly, you can see these lines quite easily. This did not happen with the old code, though. I don't know the reason, but the new code somehow appears to be less "double buffering" than the old one. Tearing or vsync related issues or something, I guess. I really don't know, sorry. However, I think the flashing is worse so we should keep the current "solution".
@Oddegamra: Great. Would it be possible to commit the thread names patch? I does make debugging easier.
@Ark: I created 2 7z archives, one solid and the other not, and both work fine with MComix. Extraction is however really slow: it seems 7z is scanning the whole archive every time, so it take as much time to decompress one entry as to decompress the whole archive!
@Ark: your welcome!
Are you saying just hovering the mouse over the thumbnails trigger those lines?
The extraction time should depend on whether the archive is solid or not. If the user requests to extract a file at position
n
within a solid archive, 7z will extract all then-1
files before the file in question first. If we do this with every file in the archive, we will actually extract aroundn*n/2
files which is terribly slow. Thus, we'll need a switch:@Benoit Pierre: Yes, hovering is sufficient to make them appear, at least for my configuration. I can't tell whether this is a hardware issue or a driver issue or a GTK rendering engine issue or something else.
If I use PageUp, PageDown, End or similar keys too quickly to switch between different books, I either get an "empty" book (i.e. no images seem to be available until I switch to another book) or I get this:
This error message appears either once or several hundred times at once (depending on the number of pages in the current book, I guess) whenever this happens.
solid 7z archives: you're right, I thought one of the archive I created was not solid, but since it's the default, both actually were. I agree about your pseudo-code.
flickering: what theme/engine do you use? Does a thumbnail change when hovering over it (e.g. gets highlighted)?
quick switch between archives: I can reproduce the exception (but not the empty book). The attached patch fix the exception case, can you check if it works for you please?
flickering: The theme/engine I use is called Aurora (which seems to work just like Clearlooks). I don't see any hovering effects.
switching: Thanks for the patch. This seems to be fixed now. About the "empty" book: Actually, I'm not sure myself anymore, maybe there was simply a high latency I didn't notice. I'll bring this up again if it happens again.
There are at least two bugs in the new extraction code:
Here is a fix for 1. But so far I cannot reproduce 2.
I think I fixed 2 (see attached patch): when creating an archive thumbnail, not all code paths close the extractor and/or delete the temporary directory. Also handled is the case were the archive cannot be opened (e.g. we're missing the required library/executable for support).
Last edit: Benoit Pierre 2014-07-19