The cleanup report "Author Names with Invalid Data or an Unrecognized Suffix" does not check name suffix validity correctly. The first two bullet points displayed at the top of the report page are fine. However, the third one, which says "Unrecognized suffixes or other cases where a period is adjacent to a letter. The list of currently recognized suffixes includes [...]", is inaccurate. What the logic actually does is:
This has little to do with "looking for unrecognized suffixes". Worse, the current logic skips any "bad" name that also happens to include a "recognized" suffix.
We want to correct the current logic. First, we need to create a central list of "recognized suffixes". It will be a globally scoped tuple in common/isfdb.py. This will affect this cleanup report as well as mod/common, which checks suffixes when determining provisional author directory values.
Next, we will correct the logic of the existing cleanup report to strip all recognized suffixes from the canonical name before checking whether the name still contains a period followed by a letter/digit.
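A minimal sketch of the corrected check, for illustration only. The tuple name RECOGNIZED_SUFFIXES, the function names, and the ", Suffix" convention are assumptions; of the listed suffixes only "J.D." is confirmed by this report, the others are placeholders. The real report may well do this work in SQL rather than in Python.

```python
import re

# Hypothetical name and contents: only "J.D." is confirmed by this report;
# the remaining entries are illustrative placeholders.
RECOGNIZED_SUFFIXES = ("Jr.", "Sr.", "M.D.", "Ph.D.", "J.D.")

# A period immediately followed by a letter or a digit
PERIOD_THEN_ALNUM = re.compile(r"\.[A-Za-z0-9]")

def strip_recognized_suffixes(name):
    """Remove every trailing ', <suffix>' whose suffix is recognized."""
    stripped = True
    while stripped:
        stripped = False
        for suffix in RECOGNIZED_SUFFIXES:
            tail = ", " + suffix
            if name.endswith(tail):
                name = name[:-len(tail)]
                stripped = True
    return name

def has_suspicious_period(name):
    """True if the name, minus recognized suffixes, still has '.x' in it."""
    remainder = strip_recognized_suffixes(name)
    return PERIOD_THEN_ALNUM.search(remainder) is not None
```

Because the suffixes are stripped before the check, a name such as "J.R. Smith, Jr." is still reported on account of "J.R.", whereas the old logic would have skipped it once it saw the recognized suffix.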
Finally, we will create a new cleanup report. It will look for canonical names containing a comma followed by a string that is not a recognized suffix.
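The new report's test could be sketched roughly as follows. Again, RECOGNIZED_SUFFIXES and find_unrecognized_suffix are hypothetical names, the suffix list is a placeholder apart from "J.D.", and treating each comma-separated segment as a separate suffix candidate is an assumption rather than something this report specifies.

```python
# Hypothetical name and contents: only "J.D." is confirmed by this report;
# the remaining entries are illustrative placeholders.
RECOGNIZED_SUFFIXES = ("Jr.", "Sr.", "M.D.", "Ph.D.", "J.D.")

def find_unrecognized_suffix(name):
    """Return the first comma-separated segment after the base name that is
    not a recognized suffix, or None if every segment is recognized."""
    segments = [part.strip() for part in name.split(",")[1:]]
    for segment in segments:
        if segment not in RECOGNIZED_SUFFIXES:
            return segment
    return None

# Names for which the function returns something other than None
# would appear on the report.
candidates = ["John Smith, Jr.", "John Smith, Esq", "Smith, John"]
flagged = [n for n in candidates if find_unrecognized_suffix(n) is not None]
```

Here "John Smith, Esq" and "Smith, John" would be flagged, while "John Smith, Jr." would not.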
Anonymous
Diff:
Part 1 - Created a centralized list of recognized suffixes; fixed the logic in the current cleanup report; added J.D. to the list of recognized suffixes:
Implemented in SVN 791 on 2021-10-26. Keeping the Bug report open.
Part 2 - Cleanup report 'Author Names with an Unrecognized Suffix' created:
Installed in SVN 792 on 2021-10-26. Closing the Bug report.
Part 3 - regression bug fix: edit/cleanup_report.py. Installed in SVN 793 on 2021-10-27.