
#173 Hashing/removing of files causes UI unresponsiveness, sometimes for excessive periods of time.

Milestone: 2.7.11.0
Status: new
Owner: nobody
Labels: None
Component: Library
Priority: major
Version: 2.5.5.0
Type: defect
Updated: 2017-09-18
Created: 2011-07-19
Creator: Jake Creely
Private: No

This is worse with fast hashing on, though the episodes are briefer. The UI becomes stuttery to completely unresponsive, or stuttery with one or more periods of full lock-up, while files are hashing. Moreover, this can continue for an excessive length of time, for instance more than half a minute after a 30-kilobyte file finishes downloading.

Oddly, there are a few correlations with the duration of the lock-up that have no obvious reason to exist. File size is the only one with an obvious explanation (or cumulative file size, if several downloads complete at almost the same time). The others include:

"seismic gap": If most of the freezes have been brief recently, there's an elevated chance that the next one will be quite a lot longer than typical for the file size.

Size of download list: the average freeze duration seems a lot longer when the download list has 800 pending items than when it has 300, even after controlling for download size. It seems Shareaza is not just marking the one file Completed but doing some sort of processing over the entire list. Finding a new item to try to download, perhaps? But it should hit one very quickly, and even scanning 800 items checking an in-memory flag, counter, or priority index should not take any human-perceptible amount of time on modern machinery.

Size of library: the correlation here may be weak, but I don't remember this happening as much when I had 800 library files as it does now that I have 80,000.

This should also be irrelevant. Any library-related data structures that need to be updated when a file has been hashed and added should have efficient indexing: trees or, preferably given what's just been done to the file, hash dictionaries. Even a linear list of 80,000 items is quick to search on modern hardware, which makes me wonder if it's doing something particularly stupid, like an in-place insert into a growable array, which means copying an expected 40,000 items down one slot. Trees and hash dictionaries avoid this problem as well (a sketch follows).
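
To make that concrete, here is a minimal sketch (hypothetical names, not Shareaza's actual classes) of a library index kept in a hash dictionary keyed by the file's SHA-1. Insertion and lookup are constant-time on average, so adding one freshly hashed file never touches the other 80,000 entries:

    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <utility>

    // Hypothetical library record; the real one would hold far more fields.
    struct LibraryEntry {
        std::string path;
        std::uint64_t size = 0;
    };

    // Index keyed by SHA-1 (hex string here for simplicity).
    using LibraryIndex = std::unordered_map<std::string, LibraryEntry>;

    // O(1) on average: no shifting of existing entries, unlike an
    // in-place insert into a sorted growable array.
    void AddHashedFile(LibraryIndex& index, const std::string& sha1, LibraryEntry entry) {
        index[sha1] = std::move(entry);
    }

    // O(1) average lookup when a query or a duplicate check needs the entry.
    const LibraryEntry* FindBySha1(const LibraryIndex& index, const std::string& sha1) {
        auto it = index.find(sha1);
        return it == index.end() ? nullptr : &it->second;
    }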

There is also a lengthy freeze when files are removed from the library. It seems to correlate with the number of files removed at one time and, possibly, the size of the library. More poor choices of data structures and algorithms? Deletion from an array, without some sort of freelist, is also going to copy tens of thousands of items, and if the deletions are done sequentially the same items get copied again and again. The alternative is to mark items for deletion in one pass, then compact in a second pass: walk up to the first deleted item, keep a to-index there and a from-index one higher, and every time the from-index points to a kept item, copy it to the to-index and bump both; when it points to a deleted item, bump only the from-index. This takes 1/N the time on average when deleting N items from an array. BUT USE A TREE OR HASH DICTIONARY INSTEAD ANYWAY!
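
That two-pass compaction is essentially the standard erase-remove idiom. A minimal sketch (illustrative only, not the library's actual code):

    #include <algorithm>
    #include <cstddef>
    #include <string>
    #include <utility>
    #include <vector>

    struct Entry {
        std::string path;
        bool markedForDeletion = false;
    };

    // Single-pass compaction via the standard library: each kept entry is
    // moved at most once, instead of once per deleted item ahead of it.
    void RemoveMarked(std::vector<Entry>& entries) {
        entries.erase(
            std::remove_if(entries.begin(), entries.end(),
                           [](const Entry& e) { return e.markedForDeletion; }),
            entries.end());
    }

    // Hand-rolled equivalent of the to-index / from-index walk described above.
    void RemoveMarkedManually(std::vector<Entry>& entries) {
        std::size_t to = 0;
        for (std::size_t from = 0; from < entries.size(); ++from) {
            if (!entries[from].markedForDeletion) {
                if (to != from)
                    entries[to] = std::move(entries[from]);
                ++to;
            }
        }
        entries.resize(to);
    }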

The longer unresponsive periods also show very low I/O and CPU activity as reported by Process Explorer. With fast hashing on, I'd expect either I/O or CPU to be maxed out during hashing, but that seems not to be the case, unless hashing spends a lot of its time waiting on locks.

Also, a lot of the threads involved appear to run at priorities higher than the Shareaza process's own priority, so much so that even giving Shareaza the lowest priority available through Process Explorer doesn't stop it making other applications stutter while it's hashing with fast hashing on. Fast hashing should raise the hashing priority above the other Shareaza threads, EXCEPT the UI thread, but not way WAY up there. In fact, perhaps it should lower the priorities of all the other non-UI Shareaza threads instead of raising the hashing thread's priority.
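
A rough sketch of that last idea using the Win32 API (the function names here are hypothetical, not Shareaza's code): drop the worker threads to below-normal priority and leave the UI and hashing threads at normal, so the foreground stays responsive.

    #include <windows.h>
    #include <vector>

    // Hypothetical list of handles to the non-UI worker threads.
    void DemoteWorkerThreads(const std::vector<HANDLE>& workerThreads) {
        for (HANDLE hThread : workerThreads) {
            // BELOW_NORMAL keeps the workers running but yields readily to the
            // UI thread and to other interactive processes.
            SetThreadPriority(hThread, THREAD_PRIORITY_BELOW_NORMAL);
        }
    }

    // The hashing thread itself could simply stay at normal priority
    // rather than being boosted above everything else.
    void ConfigureHashingThread() {
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_NORMAL);
    }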

Discussion

  • Jake Creely

    Jake Creely - 2011-07-19

    Process Explorer shows some other odd behaviors during Shareaza file hashing, by the way. For example, a svchost.exe sometimes shows spikes in CPU usage -- this particular svchost hosts DCOM and Plug and Pray. DCOM is involved in launching dllhost processes, and the other thing that commonly shows up is a succession of short-lived dllhost.exe tasks, each briefly using some CPU, whose launch command has a CLSID corresponding to Windows's thumbnail generator.

    This is odd, because normally that shows up when JPEG, PNG, GIF, BMP, or TIF files are created or modified in an open Explorer window set to a Large Icons view. Here it shows up even with no Explorer window open at all, with neither the download destination nor the incomplete folder open in Explorer, or with the destination folder open but set to, say, Small Icons view.

    Explorer itself shows spikes of up to 20% CPU use on a fairly beefy machine during hashing as well.

    So it looks like Shareaza is doing more than hashing, and is outsourcing some of that to some rather inefficient Microsoft crudware to boot. I'm guessing it's metadata extraction and Shareaza's own thumbnail generation that's (part of) the culprit here. Of course you can disable metadata extraction, but then all your files look like spam to other Shareaza users. :)

    Here are some suggestions:

    1. Don't bother with metadata extraction if the download directory isn't shared, or if a file is moved to a non-shared library folder. If a file is created in or moved to a shared library folder and extraction hasn't been done yet, do it then.

    2. Remember the metadata and the other hashes, keyed by SHA-1, when a file disappears from the library, and don't redo extraction if the file reappears. When a file is added to a library folder, calculate its SHA-1 first; if all that other stuff is already remembered for that SHA-1, just reuse it. Otherwise, compute it.

    3. Use your own thumbnail generator code instead of shelling out to Explorer's DCOM module for it. You've got an inbuilt previewer already, so you have decoders for image formats, possibly even for some Explorer doesn't support. Decode to a raster, bicubic-downsample it to the thumbnail size you want, and do it efficiently inside Shareaza's own process instead of over some IPC that Shareaza blocks waiting for (which would explain the periods of nearly-zero CPU and I/O by Shareaza during hashing).

    4. Make the hashing not @&! application-modal! I don't see why other Shareaza threads all seem to block waiting for a file to hash, and I doubt it's intended behavior, since it effectively makes the default, non-fast hashing really terrible. Nothing *needs* to block while a file hashes. Even if the library is open to that directory and the file is scrolled into view, the library can note that it's hashing, register interest in that file, and update the display when notified that the hashing is done. If a query comes in that the file might match, either remember it and send out a hit if the file hashes and matches, or just forget about it. The "files you already have" filter can likewise be notified asynchronously: if a result pane has this filter set, it can register interest in all file-hashing operations while displayed and refilter everything when a hash completes. (A sketch of this kind of notification follows.)
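
    As a minimal illustration of point 4 (hypothetical names, not Shareaza's actual classes): a small notification hub that lets the library view, result panes, and the query matcher register interest in a file and get called back when its hashing completes, instead of blocking.

        #include <functional>
        #include <mutex>
        #include <string>
        #include <unordered_map>
        #include <utility>
        #include <vector>

        class HashNotifier {
        public:
            using Callback = std::function<void(const std::string& path,
                                                const std::string& sha1)>;

            // A UI pane or the query matcher registers interest in a file.
            void RegisterInterest(const std::string& path, Callback cb) {
                std::lock_guard<std::mutex> lock(m_mutex);
                m_waiters[path].push_back(std::move(cb));
            }

            // Called by the hashing thread when it finishes a file.
            void NotifyHashed(const std::string& path, const std::string& sha1) {
                std::vector<Callback> callbacks;
                {
                    std::lock_guard<std::mutex> lock(m_mutex);
                    auto it = m_waiters.find(path);
                    if (it == m_waiters.end())
                        return;
                    callbacks = std::move(it->second);
                    m_waiters.erase(it);
                }
                for (auto& cb : callbacks)
                    cb(path, sha1);  // e.g. refresh the library view, refilter results
            }

        private:
            std::mutex m_mutex;
            std::unordered_map<std::string, std::vector<Callback>> m_waiters;
        };

    In a real GUI the callback would post a message to the UI thread rather than touch the display directly, but nothing here requires any other thread to block while the hash is computed.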

     
  • Jake Creely

    Jake Creely - 2011-07-19

    P.S. If hashing really must for some reason be modal, can we at least have a proper modal dialog box with a progress indicator instead of an application that simply seems to be hung?!

    P.P.S. Do we really, really need porn ads on this page? Or any ads at all, really? Right now there's a banner at the top showing "Be Naughty" and a girl in her underwear. Strictly NSFW, so it's fortunate I'm not filing these bug reports from work. And the machine I'm using is certified 100% adware free, so it's not a browser hijack object I've picked up, either.

     
  • Branko Radovanovic

    Okay, my take on this...

    1. Unresponsiveness while adding/removing files is caused by super-heavy disk I/O. I now have an SSD, and it works wonders; the problem is gone.
    2. Why super-heavy disk I/O? My bet is on less than optimal (to put it nicely) algorithms and/or library data structures. The problem lies with writing to the library; it has little or nothing to do with the actual hashing process.
    3. The slowdown is roughly proportional to the size of the library (the number of entries, rather than their size!). Effectively, this means that building a library of n items takes time proportional to n squared, since each of the n additions costs on the order of n.
    4. The slow/fast hashing switch does not make sense to me, at least not the way it is implemented right now. Slow hashing will hash a 1 GB file in, say, 3 minutes, while fast hashing will do it in, say, 30 seconds. However, neither will hog your machine, because the sticky part is not the hashing process, but rather writing a new entry to the library. If one tries to add 10,000 100 KB files to the library (again 1 GB, "the same thing"), it will hog the machine, slow hashing or not (fast hashing might be only slightly worse). What is this switch good for, then?
    5. Some promising workarounds are already described in the previous posts. Moving hashing to a background thread seems to be the easiest one.
    6. If "add the entry" effectively meant "write the entire library again from scratch", we'd have precisely the described problem. That's not what library code does (well, at least I hope it's not), but it might be something on the same order (i.e. the number of writes appears to be proportional to the size of the library).
    7. If I'm right in point 6, then batching (if it's possible) would help immensely: e.g. first hash 10 items, then write all 10 at once to the library, rather than hash and write 10 times. (A sketch follows after this list.)
    8. It is hard to suggest solutions if one doesn't know how the library code works. The developers should help us with this one. The solution might even be trivially easy.
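
    As a rough sketch of the batching idea in point 7 (the function names here are hypothetical placeholders, not the real library interface), hash a handful of files first, then commit them to the library in a single write, instead of rewriting library state once per file:

        #include <cstddef>
        #include <string>
        #include <vector>

        struct HashedFile {
            std::string path;
            std::string sha1;
        };

        // Placeholder stand-ins for the real hashing and library-write operations.
        HashedFile HashFile(const std::string& path) {
            return { path, /* sha1 = */ "..." };
        }
        void WriteEntriesToLibrary(const std::vector<HashedFile>& batch) {
            // In the real code this would be one write (or transaction) covering
            // the whole batch, rather than one per file.
            (void)batch;
        }

        // Hash a batch of files, then commit them all at once.
        void AddFilesBatched(const std::vector<std::string>& paths, std::size_t batchSize = 10) {
            std::vector<HashedFile> batch;
            batch.reserve(batchSize);
            for (const std::string& path : paths) {
                batch.push_back(HashFile(path));
                if (batch.size() == batchSize) {
                    WriteEntriesToLibrary(batch);
                    batch.clear();
                }
            }
            if (!batch.empty())
                WriteEntriesToLibrary(batch);
        }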
     
  • raspopov

    raspopov - 2013-12-14
    • Milestone: --> 2.8.0.0
     
  • raspopov

    raspopov - 2015-10-04
    • Milestone: 2.8.0.0 --> 2.8.10.0
     
  • raspopov

    raspopov - 2017-09-18
    • Milestone: 2.8.10.0 --> 2.7.11.0
     
