Here's a patch to common/awardClass.py that seems to fix things for me. NB: I've tried to avoid triggering whitespace changes on unrelated lines, so hopefully it should be clean enough to merge. NB: This doesn't handle potential cases such as a name like "Foo+Bar", only ones where the + is at the end of the name. I dunno if the former cases could be resolved, but the only cases I know of at the moment all fall into the latter basket.
Update: a quick test on a local dev system (albeit with slightly old code, although I suspect this stuff hasn't changed recently) shows: What gets stored in the database (specifically the award_author column in the awards table) looks correct, so it would seem that it's how that value is manipulated that needs work. Now that I've looked into this, I remember that multiple award recipients are handled differently from authors of a story - there's just a single string value with + used as a separator. In theory,...
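For illustration only, here is a minimal sketch of how a +-separated recipient string could be split while keeping a trailing + as part of a name. This is not the code from the attached patch: the function name split_award_authors is hypothetical, and it assumes the separator is a bare + with no surrounding spaces. Like the patch, it cannot recover a name such as "Foo+Bar" where the + is in the middle.

```python
def split_award_authors(value):
    """Split a '+'-separated recipient string into individual names.

    Hypothetical helper, not the real awardClass.py code. An empty
    segment after splitting means the previous name itself ended
    with a '+', so the '+' is glued back onto that name.
    """
    names = []
    for segment in value.split("+"):
        if segment:
            # Ordinary name segment.
            names.append(segment)
        elif names:
            # Empty segment: the preceding name ended with '+'.
            names[-1] += "+"
    return names


if __name__ == "__main__":
    print(split_award_authors("Alice+Bob"))   # two ordinary names
    print(split_award_authors("Foo+"))        # one name ending in '+'
    print(split_award_authors("Foo++Bar"))    # 'Foo+' followed by 'Bar'
```

A name with a + in the middle still comes back as two separate names, which matches the limitation noted above.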
Award nominees with a plus sign in their names are handled incorrectly
Here's the test file - I couldn't get the file picker to let me select files from different directories. FWIW, I also documented how you might be able to use this code in a Python REPL session here: http://www.isfdb.org/wiki/index.php/User:ErsatzCulture/RunningScriptsStandalone This isn't massively useful, especially if you're not using a CLI-friendly environment (i.e. Windows ;-)
OK, so here is a patch that implements (much of) this... there should be three files, but the file picker is being a bit awkward, so I might have to attach them to multiple comments. First off isfdb.py has SPECIAL_AUTHORS_TO_IGNORE defined at the end. This includes the names of the author IDs used by some of the reports - but I haven't altered those reports to make use of this. Secondly, mod/marque.py has had an overhaul to break out the code that queries for the top authors, and now makes use of...
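As a rough sketch of the idea (not the contents of the attached files), filtering query results against a shared ignore list might look like the following. The values in SPECIAL_AUTHORS_TO_IGNORE here are illustrative guesses, the real constant in isfdb.py may hold different entries or a different type, and filter_generic_authors along with the (name, count) row shape are hypothetical.

```python
# Illustrative values only; the real list in isfdb.py may differ.
SPECIAL_AUTHORS_TO_IGNORE = frozenset(
    {"unknown", "uncredited", "various", "Anonymous"}
)


def filter_generic_authors(rows, ignore=SPECIAL_AUTHORS_TO_IGNORE):
    """Drop rows whose author name is in the ignore set.

    Hypothetical helper: `rows` is assumed to be an iterable of
    (author_name, count) tuples, e.g. from a "top authors" query.
    """
    return [row for row in rows if row[0] not in ignore]


if __name__ == "__main__":
    sample = [("unknown", 40), ("Jane Example", 12), ("various", 9)]
    print(filter_generic_authors(sample))
```

Keeping the list in one place would let the reports and the front page share the same definition of a "generic" author.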
Top forthcoming does not exclude all "generic" authors
Re. your points (skipping #1) OK, I'll post something there, although I'm concerned that it'll get bogged down in a "perfect-is-the-enemy-of-good"/"choosing beggars" discussion. My concern here is that trying to do the right thing in all cases is liable to run into awkward edge cases. I haven't looked into the data to see what nasties might lurk, but things that crossed my mind might be cases where some existing values have pipes and some don't, and if you have weird values like "123|456" and "234",...
Implementation of this change attached. As with my previously submitted patch, this is on a version of the file that has all the tabs & spaces made consistent, so will show up with hundreds of changes in a diff. Here is a diff against a version of the file that had the whitespace made consistent, but before any changes were made, that makes it clearer what I've done:
common $ diff viewers.py viewers.py.ConsistentSpacing
199,202c199,200
< elif Label == 'Binding': # aka "Format"
< if not value or value...