#786 Cleanup reports do not check name suffix validity correctly

v1.0 (example)
closed-fixed
None
5
2021-10-27
2021-09-30
Ahasuerus
No

The cleanup report "Author Names with Invalid Data or an Unrecognized Suffix" does not check name suffix validity correctly. The first two bullet points displayed at the top of the report page are fine. However, the third one, which says "Unrecognized suffixes or other cases where a period is adjacent to a letter. The list of currently recognized suffixes includes [...]", is inaccurate. What the logic actually does is:

  1. Look for canonical names where a period or a comma is followed by a letter
  2. Add the name to the report unless the name ends with one of the "recognized" suffixes

This has little to do with "looking for unrecognized suffixes". Worse, the current logic will skip any "bad" names which also happen to include a "recognized" suffix.
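
The flaw in the two steps above can be reproduced with a minimal Python sketch. This is not the report's actual code: the function name, the regular expression, and the contents of RECOGNIZED_SUFFIXES are assumptions made for illustration.

```python
import re

# Hypothetical subset of suffixes; the real list is maintained in the ISFDB code.
RECOGNIZED_SUFFIXES = ("Jr.", "Sr.", "M.D.", "Ph.D.")


def flagged_by_old_logic(name):
    """Mimic the two steps described above (illustrative only)."""
    # Step 1: look for a period or a comma immediately followed by a letter.
    if not re.search(r"[.,][A-Za-z]", name):
        return False
    # Step 2: report the name unless it ends with a recognized suffix.
    return not name.endswith(RECOGNIZED_SUFFIXES)


print(flagged_by_old_logic("John Q.Public"))       # True: missing space is caught
print(flagged_by_old_logic("John Q.Public, Jr."))  # False: same error is skipped
```

The second call shows the problem: the trailing "Jr." hides the missing space after "Q." from the report.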

We want to correct the current logic. First, we need to create a central list of "recognized suffixes". It will be a globally scoped tuple in common/isfdb.py. This will affect this cleanup report as well as mod/common, which checks suffixes when determining provisional author directory values.
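
As a rough sketch of what a shared, module-level list could look like, the Python below defines the suffixes once and has a helper consume them. The name RECOGNIZED_SUFFIXES, the helper split_suffix, and every suffix other than "J.D." (which this ticket mentions) are assumptions, not the contents of common/isfdb.py.

```python
# Defined once at module level so every caller sees the same list.
# Hypothetical contents; only "J.D." is named in this ticket.
RECOGNIZED_SUFFIXES = ("Jr.", "Sr.", "M.D.", "Ph.D.", "J.D.")


def split_suffix(name):
    """Return (base_name, suffix); suffix is '' when none is recognized."""
    for suffix in RECOGNIZED_SUFFIXES:
        ending = ", " + suffix
        if name.endswith(ending):
            return name[:-len(ending)], suffix
    return name, ""


print(split_suffix("John Smith, Jr."))  # ('John Smith', 'Jr.')
print(split_suffix("John Smith"))       # ('John Smith', '')
```

A caller that derives a directory value from the last name could use the base name returned here, while the cleanup report could iterate over the same tuple, so a new suffix only has to be added in one place.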

Next, we will correct the logic of the existing cleanup report to strip all recognized suffixes from the canonical name before checking whether the name still contains a period followed by a letter/digit.
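
The corrected check might look roughly like the Python sketch below, which strips recognized suffixes first and only then looks for a period followed by a letter or digit. The function names and the suffix list are assumptions; the actual report may well do this in SQL rather than in Python.

```python
import re

# Hypothetical contents; only "J.D." is named in this ticket.
RECOGNIZED_SUFFIXES = ("Jr.", "Sr.", "M.D.", "Ph.D.", "J.D.")


def strip_recognized_suffixes(name):
    """Remove every recognized suffix from the end of the name."""
    stripped = True
    while stripped:
        stripped = False
        for suffix in RECOGNIZED_SUFFIXES:
            if name.endswith(suffix):
                # Drop the suffix plus the comma/space that preceded it.
                name = name[:-len(suffix)].rstrip(" ,")
                stripped = True
    return name


def flagged_by_new_logic(name):
    """Flag names that still contain a period followed by a letter/digit."""
    remainder = strip_recognized_suffixes(name)
    return re.search(r"\.[A-Za-z0-9]", remainder) is not None


print(flagged_by_new_logic("John Q.Public, Jr."))  # True: no longer skipped
print(flagged_by_new_logic("John Smith, M.D."))    # False: suffix is recognized
```

Because the suffix is removed before the pattern is applied, a valid "M.D." no longer triggers the check and a valid "Jr." no longer masks an error elsewhere in the name.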

Finally, we will create a new cleanup report. It will look for canonical names with a comma which is followed by a string which is not a recognized suffix.
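
One possible reading of the new report's rule, sketched in Python: every comma-separated piece after the base name must be a recognized suffix, otherwise the name is reported. The function name, the suffix list, and the handling of chained suffixes are assumptions rather than the report's actual implementation.

```python
# Hypothetical contents; only "J.D." is named in this ticket.
RECOGNIZED_SUFFIXES = ("Jr.", "Sr.", "M.D.", "Ph.D.", "J.D.")


def has_unrecognized_suffix(name):
    """Report names where text after a comma is not a recognized suffix."""
    if "," not in name:
        return False
    # Everything after the first comma, split into candidate suffixes.
    candidates = [part.strip() for part in name.split(",")[1:]]
    return any(part not in RECOGNIZED_SUFFIXES for part in candidates)


print(has_unrecognized_suffix("John Smith, Jr."))  # False
print(has_unrecognized_suffix("John Smith, Esq"))  # True
```

Note that this sketch compares suffixes exactly, so "Jr" without a period would also be reported.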

Discussion

  • Ahasuerus

    Ahasuerus - 2021-10-26
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -7,6 +7,6 @@
    
     We want to correct the current logic. First, we need to create a central list of "recognized suffixes". It will be a globally scoped tuple in common/isfdb.py. This will affect this cleanup report as well as mod/common, which checks suffixes when determining provisional author directory values.
    
    -Next, we will correct the logic of the existing cleanup report to strip all recognized suffixes from the canonical name before checking whether the name still contains a period or a comma followed by a letter/digit.
    +Next, we will correct the logic of the existing cleanup report to strip all recognized suffixes from the canonical name before checking whether the name still contains a period followed by a letter/digit.
    
     Finally, we will create a new cleanup report. It will look for canonical names with a comma which is followed by a string which is not a recognized suffix.
    
    • assigned_to: Ahasuerus
     
  • Ahasuerus

    Ahasuerus - 2021-10-26

    Part 1 - Created a centralized list of recognized suffixes; fixed the logic in the current cleanup report; added J.D. to the list of recognized suffixes:

    common/isfdb.py
    edit/cleanup_report.py
    mod/common.py
    nightly/nightly_job.py
    

    Implemented in SVN 791 on 2021-10-26. Keeping the Bug report open.

     
  • Ahasuerus

    Ahasuerus - 2021-10-27
    • status: open --> closed-fixed
     
  • Ahasuerus

    Ahasuerus - 2021-10-27

    Part 2 - Cleanup report 'Author Names with an Unrecognized Suffix' created:

    edit/cleanup_lib.py
    edit/cleanup_report.py
    nightly/nightly_job.py
    

    Installed in SVN 792 on 2021-10-26. Closing the Bug report.

     
  • Ahasuerus

    Ahasuerus - 2021-10-27

    Part 3 - regression bug fix: edit/cleanup_report.py. Installed in SVN 793 on 2021-10-27.

     
