
#173 Hashing/removing of files causes UI unresponsiveness, sometimes for excessive periods of time.

Milestone: 2.7.11.0
Status: new
Owner: nobody
Labels: None
Component: Library
Priority: major
Version: 2.5.5.0
Type: defect
Updated: 2017-09-18
Created: 2011-07-19
Creator: Jake Creely
Private: No

This is worse with fast hashing on, though the episodes are briefer. The UI becomes stuttery to completely unresponsive, or stuttery with one or more periods of full lock-up, while files are hashing. Moreover, this can continue for an excessive length of time, for instance more than half a minute after a 30-kilobyte file finishes downloading.

Oddly, there are a few correlations with the duration of the lock-up that have no obvious reason to exist. File size is the only one with an obvious explanation (or cumulative file size, if several downloads complete at almost the same time). The others include:

"seismic gap": If most of the freezes have been brief recently, there's an elevated chance that the next one will be quite a lot longer than typical for the file size.

Size of download list: the average freeze duration seems a lot longer when the download list has 800 pending items than when it has 300, even after controlling for download size. It seems Shareaza is not just marking the one file Completed but doing some sort of processing over the entire list. Finding a new item to try to download, perhaps? But it should hit one very quickly, and even scanning 800 items checking an in-memory flag, counter, or priority index should not take any human-perceptible amount of time on modern machinery.

Size of library: the correlation here may be weak, but I don't remember this happening as much when I had 800 library files as it does now that I have 80,000.

This should also be irrelevant. Any library-related data structures that need to be updated when a file has been hashed and added should have efficient indexing: trees or, preferably given what's just been done to the file, hash dictionaries. Even a linear list of 80,000 items is quick to search on modern hardware, which makes me wonder if it's doing something particularly stupid, like an in-place insert into a growable array, which means copying an expected 40,000 items down one slot. Trees and hash dictionaries avoid this problem as well (a sketch follows).
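
To make that concrete, here is a minimal sketch (hypothetical names, not Shareaza's actual classes) of a library index kept in a hash dictionary keyed by the file's SHA-1. Insertion and lookup are constant-time on average, so adding one freshly hashed file never touches the other 80,000 entries:

    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <utility>

    // Hypothetical library record; the real one would hold far more fields.
    struct LibraryEntry {
        std::string path;
        std::uint64_t size = 0;
    };

    // Index keyed by SHA-1 (hex string here for simplicity).
    using LibraryIndex = std::unordered_map<std::string, LibraryEntry>;

    // O(1) on average: no shifting of existing entries, unlike an
    // in-place insert into a sorted growable array.
    void AddHashedFile(LibraryIndex& index, const std::string& sha1, LibraryEntry entry) {
        index[sha1] = std::move(entry);
    }

    // O(1) average lookup when a query or a duplicate check needs the entry.
    const LibraryEntry* FindBySha1(const LibraryIndex& index, const std::string& sha1) {
        auto it = index.find(sha1);
        return it == index.end() ? nullptr : &it->second;
    }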

There is also a lengthy freeze when files are removed from the library. It seems to correlate with the number of files removed at one time and, possibly, the size of the library. More poor choices of data structures and algorithms? Deletion from an array, without some sort of freelist, is also going to copy tens of thousands of items, and if the deletions are done sequentially the same items get copied again and again. The alternative is to mark items for deletion in one pass, then compact in a second pass: walk up to the first deleted item, keep a to-index there and a from-index one higher, and every time the from-index points to a kept item, copy it to the to-index and bump both; when it points to a deleted item, bump only the from-index. This takes 1/N the time on average when deleting N items from an array. BUT USE A TREE OR HASH DICTIONARY INSTEAD ANYWAY!
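
That two-pass compaction is essentially the standard erase-remove idiom. A minimal sketch (illustrative only, not the library's actual code):

    #include <algorithm>
    #include <cstddef>
    #include <string>
    #include <utility>
    #include <vector>

    struct Entry {
        std::string path;
        bool markedForDeletion = false;
    };

    // Single-pass compaction via the standard library: each kept entry is
    // moved at most once, instead of once per deleted item ahead of it.
    void RemoveMarked(std::vector<Entry>& entries) {
        entries.erase(
            std::remove_if(entries.begin(), entries.end(),
                           [](const Entry& e) { return e.markedForDeletion; }),
            entries.end());
    }

    // Hand-rolled equivalent of the to-index / from-index walk described above.
    void RemoveMarkedManually(std::vector<Entry>& entries) {
        std::size_t to = 0;
        for (std::size_t from = 0; from < entries.size(); ++from) {
            if (!entries[from].markedForDeletion) {
                if (to != from)
                    entries[to] = std::move(entries[from]);
                ++to;
            }
        }
        entries.resize(to);
    }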

The longer unresponsive periods also show very low I/O and CPU activity as reported by Process Explorer. With fast hashing on, I'd expect either I/O or CPU to be maxed out during hashing, but that seems not to be the case, unless hashing spends a lot of its time waiting on locks.

Also, a lot of the threads involved appear to run at priorities higher than the Shareaza process's own priority, so much so that even giving Shareaza the lowest priority available through Process Explorer doesn't stop it making other applications stutter while it's hashing with fast hashing on. Fast hashing should raise the hashing priority above the other Shareaza threads, EXCEPT the UI thread, but not way WAY up there. In fact, perhaps it should lower the priorities of all the other non-UI Shareaza threads instead of raising the hashing thread's priority.
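
A rough sketch of that last idea using the Win32 API (the function names here are hypothetical, not Shareaza's code): drop the worker threads to below-normal priority and leave the UI and hashing threads at normal, so the foreground stays responsive.

    #include <windows.h>
    #include <vector>

    // Hypothetical list of handles to the non-UI worker threads.
    void DemoteWorkerThreads(const std::vector<HANDLE>& workerThreads) {
        for (HANDLE hThread : workerThreads) {
            // BELOW_NORMAL keeps the workers running but yields readily to the
            // UI thread and to other interactive processes.
            SetThreadPriority(hThread, THREAD_PRIORITY_BELOW_NORMAL);
        }
    }

    // The hashing thread itself could simply stay at normal priority
    // rather than being boosted above everything else.
    void ConfigureHashingThread() {
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_NORMAL);
    }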

Discussion

  • Jake Creely

    Jake Creely - 2011-07-19

    Process Explorer shows some other odd behaviors during Shareaza file hashing, by the way. For example, a svchost.exe sometimes shows spikes in CPU usage -- this particular svchost hosts DCOM and Plug and Pray. DCOM is involved in launching dllhost processes, and the other thing that commonly shows up is a succession of short-lived dllhost.exe tasks, each briefly using some CPU, whose launch command has a CLSID corresponding to Windows's thumbnail generator.

    This is odd, because normally that shows up when JPEG, PNG, GIF, BMP, or TIF files are created or modified in an open Explorer window set to a Large Icons view. Here it shows up even with no Explorer window open at all, with neither the download destination nor the incomplete folder open in Explorer, or with the destination folder open but set to, say, Small Icons view.

    Explorer itself shows spikes of up to 20% CPU use on a fairly beefy machine during hashing as well.

    So it looks like Shareaza is doing more than hashing, and is outsourcing some of that to some rather inefficient Microsoft crudware to boot. I'm guessing it's metadata extraction and Shareaza's own thumbnail generation that's (part of) the culprit here. Of course you can disable metadata extraction, but then all your files look like spam to other Shareaza users. :)

    Here are some suggestions:

    1. Don't bother with metadata extraction if the download directory isn't shared, or if a file is moved to a non-shared library folder. If a file is created in or moved to a shared library folder and extraction hasn't been done yet, do it then.

    2. Remember the metadata and the other hashes, keyed by SHA-1, when a file disappears from the library, and don't redo extraction if the file reappears. When a file is added to a library folder, calculate its SHA-1 first; if all that other stuff is already remembered for that SHA-1, just reuse it. Otherwise, compute it.

    3. Use your own thumbnail generator code instead of shelling out to Explorer's DCOM module for it. You've got an inbuilt previewer already, so you have decoders for image formats, possibly even for some Explorer doesn't support. Decode to a raster, bicubic-downsample it to the thumbnail size you want, and do it efficiently inside Shareaza's own process instead of over some IPC that Shareaza blocks waiting for (which would explain the periods of nearly-zero CPU and I/O by Shareaza during hashing).

    4. Make the hashing not @&! application-modal! I don't see why other Shareaza threads all seem to block waiting for a file to hash, and I doubt it's intended behavior, since it effectively makes the default, non-fast hashing really terrible. Nothing *needs* to block while a file hashes. Even if the library is open to that directory and the file is scrolled into view, the library can note that it's hashing, register interest in that file, and update the display when notified that the hashing is done. If a query comes in that the file might match, either remember it and send out a hit if the file hashes and matches, or just forget about it. The "files you already have" filter can likewise be notified asynchronously: if a result pane has this filter set, it can register interest in all file-hashing operations while displayed and refilter everything when a hash completes. (A sketch of this kind of notification follows.)
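
    As a minimal illustration of point 4 (hypothetical names, not Shareaza's actual classes): a small notification hub that lets the library view, result panes, and the query matcher register interest in a file and get called back when its hashing completes, instead of blocking.

        #include <functional>
        #include <mutex>
        #include <string>
        #include <unordered_map>
        #include <utility>
        #include <vector>

        class HashNotifier {
        public:
            using Callback = std::function<void(const std::string& path,
                                                const std::string& sha1)>;

            // A UI pane or the query matcher registers interest in a file.
            void RegisterInterest(const std::string& path, Callback cb) {
                std::lock_guard<std::mutex> lock(m_mutex);
                m_waiters[path].push_back(std::move(cb));
            }

            // Called by the hashing thread when it finishes a file.
            void NotifyHashed(const std::string& path, const std::string& sha1) {
                std::vector<Callback> callbacks;
                {
                    std::lock_guard<std::mutex> lock(m_mutex);
                    auto it = m_waiters.find(path);
                    if (it == m_waiters.end())
                        return;
                    callbacks = std::move(it->second);
                    m_waiters.erase(it);
                }
                for (auto& cb : callbacks)
                    cb(path, sha1);  // e.g. refresh the library view, refilter results
            }

        private:
            std::mutex m_mutex;
            std::unordered_map<std::string, std::vector<Callback>> m_waiters;
        };

    In a real GUI the callback would post a message to the UI thread rather than touch the display directly, but nothing here requires any other thread to block while the hash is computed.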

     
  • Jake Creely

    Jake Creely - 2011-07-19

    P.S. If hashing really must for some reason be modal, can we at least have a proper modal dialog box with a progress indicator instead of an application that simply seems to be hung?!

    P.P.S. Do we really, really need porn ads on this page? Or any ads at all, really? Right now there's a banner at the top showing "Be Naughty" and a girl in her underwear. Strictly NSFW, so it's fortunate I'm not filing these bug reports from work. And the machine I'm using is certified 100% adware free, so it's not a browser hijack object I've picked up, either.

     
  • Branko Radovanovic

    Okay, my take on this...

    1. Unresponsiveness while adding/removing files is caused by super-heavy disk I/O. I now have an SSD, and it works wonders; the problem is gone.
    2. Why super-heavy disk I/O? My bet is on less than optimal (to put it nicely) algorithms and/or library data structures. The problem lies with writing to the library; it has little or nothing to do with the actual hashing process.
    3. The slowdown is roughly proportional to the size of the library (the number of entries, rather than their size!). Effectively, this means that building a library of n items takes time proportional to n squared, since each of the n additions costs on the order of n.
    4. The slow/fast hashing switch does not make sense to me, at least not the way it is implemented right now. Slow hashing will hash a 1 GB file in, say, 3 minutes, while fast hashing will do it in, say, 30 seconds. However, neither will hog your machine, because the sticky part is not the hashing process, but rather writing a new entry to the library. If one tries to add 10,000 100 KB files to the library (again 1 GB, "the same thing"), it will hog the machine, slow hashing or not (fast hashing might be only slightly worse). What is this switch good for, then?
    5. Some promising workarounds are already described in the previous posts. Moving hashing to a background thread seems to be the easiest one.
    6. If "add the entry" effectively meant "write the entire library again from scratch", we'd have precisely the described problem. That's not what library code does (well, at least I hope it's not), but it might be something on the same order (i.e. the number of writes appears to be proportional to the size of the library).
    7. If I'm right in point 6, then batching (if it's possible) would help immensely: e.g. first hash 10 items, then write all 10 at once to the library, rather than hash and write 10 times. (A sketch follows after this list.)
    8. It is hard to suggest solutions if one doesn't know how the library code works. The developers should help us with this one. The solution might even be trivially easy.
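
    As a rough sketch of the batching idea in point 7 (the function names here are hypothetical placeholders, not the real library interface), hash a handful of files first, then commit them to the library in a single write, instead of rewriting library state once per file:

        #include <cstddef>
        #include <string>
        #include <vector>

        struct HashedFile {
            std::string path;
            std::string sha1;
        };

        // Placeholder stand-ins for the real hashing and library-write operations.
        HashedFile HashFile(const std::string& path) {
            return { path, /* sha1 = */ "..." };
        }
        void WriteEntriesToLibrary(const std::vector<HashedFile>& batch) {
            // In the real code this would be one write (or transaction) covering
            // the whole batch, rather than one per file.
            (void)batch;
        }

        // Hash a batch of files, then commit them all at once.
        void AddFilesBatched(const std::vector<std::string>& paths, std::size_t batchSize = 10) {
            std::vector<HashedFile> batch;
            batch.reserve(batchSize);
            for (const std::string& path : paths) {
                batch.push_back(HashFile(path));
                if (batch.size() == batchSize) {
                    WriteEntriesToLibrary(batch);
                    batch.clear();
                }
            }
            if (!batch.empty())
                WriteEntriesToLibrary(batch);
        }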
     
  • raspopov

    raspopov - 2013-12-14
    • Milestone: --> 2.8.0.0
     
  • raspopov

    raspopov - 2015-10-04
    • Milestone: 2.8.0.0 --> 2.8.10.0
     
  • raspopov

    raspopov - 2017-09-18
    • Milestone: 2.8.10.0 --> 2.7.11.0
     
