Menu

#739 A few records in the canonical_author table refer to non-existent authors

v1.0 (example)
closed-fixed
9
2019-10-20
2019-10-03
No

Minor referential integrity issue I stumbled across in one of my projects that uses a local copy of ISFDB - records in canonical_author that point to non-existent author records.

MariaDB [isfdb]> select ca.author_id, author_canonical from canonical_author ca left outer join authors a on a.author_id = ca.author_id WHERE author_canonical IS NULL;
+-----------+------------------+
| author_id | author_canonical |
+-----------+------------------+
| 11231 | NULL |
| 19035 | NULL |
| 19035 | NULL |
| 19035 | NULL |
| 32527 | NULL |
| 32576 | NULL |
+-----------+------------------+
6 rows in set (1.34 sec)

I don't think that this is at all visible to the users of the site - the title which alerted me to this is Gordon Dickson's Spacepaw, and I don't see anything in the view or edit webpages that looks wrong or gives any indication of these bogus records.

http://www.isfdb.org/cgi-bin/title.cgi?431031

As such, I'm logging this ticket not to ask anyone to remove these records from the database or suchlike, but in case anyone wants to check the edit history - which AFAIK isn't in the database download, but does exist? - for these records to see if there's anything to worry about and/or if there's any cron-job report that could maybe benefit from reporting on records like these?

Alternatively, please close this ticket if it's a known issue and/or something you don't think is worth worrying about.

Discussion

  • Ahasuerus

    Ahasuerus - 2019-10-07

    Early on, we had a fair number of problems with referential integrity. As I recall, some of them were due to server stability issues and some were due to user processes erroring out in the middle of filing a transaction.

    At one point I fixed a bunch of database integrity issues using scripts, but it looks like I missed these 6 records. Considering the fact that the author IDs are in the 11,000-33,000 range and we are up to author ID 300,000, it's probably safe to say that these are very old problems. So much has changed in the last 10 years that whatever server/software issues caused thes records to get "stuck" in the first place no longer apply.

    If you would like to create a script to delete these 6 records as a simple exercise, I will be happy to prettify the code, commit it to the repository and then run it on the main server.

     
  • ErsatzCulture

    ErsatzCulture - 2019-10-07

    OK, I'll be going away for a few days shortly - will be online, but won't have anything resembling a useful dev environment to hand - and will pick this issue back up when I return.

     
    • ErsatzCulture

      ErsatzCulture - 2019-10-19

      Attached is a Python script to report and - with -d argument - delete the offending records. Probably over-engineered for something that will (ideally) only ever be run once, but hopefully it's a reasonable compromise between the idioms/style I'm familiar with and those used in the ISFDB code.

      NB: as stated in the header, I have only tested this with Python 2.7.15 (and MariaDB 10.1.33) - but I've avoided (AFAIK) libraries that are only in newer Pythons - in particular, using getopt rather than optparse or argparse.

      This runs OK for me when placed in the scripts/ directory, which I assume is the preferred home for this sort of thing.

      scripts $ ./remove_orphaned_canonical_authors.py -h
      ./remove_orphaned_canonical_authors.py [-h] [-d]
      Report on or delete orphaned canonical_author records
      -d : Delete the orphaned records (default behaviour is dry-run/report only)
      -h : Print this help information
      scripts $ ./remove_orphaned_canonical_authors.py
      1. canonical_author record with ca_id=235146 is orphaned; title URL is http://localhost/cgi-bin/title.cgi?198701 , (broken) author URL is/was http://localhost/cgi-bin/ea.cgi?19035
      2. canonical_author record with ca_id=224977 is orphaned; title URL is http://localhost/cgi-bin/title.cgi?190423 , (broken) author URL is/was http://localhost/cgi-bin/ea.cgi?11231
      3. canonical_author record with ca_id=503021 is orphaned; title URL is http://localhost/cgi-bin/title.cgi?192212 , (broken) author URL is/was http://localhost/cgi-bin/ea.cgi?19035
      4. canonical_author record with ca_id=502481 is orphaned; title URL is http://localhost/cgi-bin/title.cgi?431031 , (broken) author URL is/was http://localhost/cgi-bin/ea.cgi?19035
      5. canonical_author record with ca_id=1549460 is orphaned; title URL is http://localhost/cgi-bin/title.cgi?1291354 , (broken) author URL is/was http://localhost/cgi-bin/ea.cgi?32527
      6. canonical_author record with ca_id=1670420 is orphaned; title URL is http://localhost/cgi-bin/title.cgi?1395397 , (broken) author URL is/was http://localhost/cgi-bin/ea.cgi?32576
      scripts $ ./remove_orphaned_canonical_authors.py -d
      Executing DELETE FROM canonical_author WHERE ca_id in (235146, 224977, 503021, 502481, 1549460, 1670420);

      Checksums:

      scripts $ sha256sum remove_orphaned_canonical_authors.py
      6b81cd7f018b966018b6063629316a84008001805618bc605d7ab4f489133002 remove_orphaned_canonical_authors.py

      scripts $ md5sum remove_orphaned_canonical_authors.py
      1ba9f9c9c0cb662c387e02149df75405 remove_orphaned_canonical_authors.py

      scripts $ sha1sum remove_orphaned_canonical_authors.py
      612dafb0b5b202601ef166b60cc36eb9fe65ba34 remove_orphaned_canonical_authors.py

       
  • Ahasuerus

    Ahasuerus - 2019-10-19

    I've tried running the script under Python 2.5.4. It errored out with the following message:

    remove_orphaned_canonical_authors.py:91: Warning: 'as' will become a reserved keyword in Python 2.6
    File "remove_orphaned_canonical_authors.py", line 91
    except getopt.GetoptError as err:
    ^
    SyntaxError: invalid syntax

     
    • ErsatzCulture

      ErsatzCulture - 2019-10-20

      Sorry about that - forgot about the alternative exception handling syntax not being available in 2.5.

      Attached is a (hopefully) fixed version, here's the diff:

      scripts $ diff remove_orphaned_canonical_authors.py remove_orphaned_canonical_authors.py~
      91c91
      < except getopt.GetoptError, err:


          except getopt.GetoptError as err:
      

      I have tested this on a fairly borked installation of Python 2.5.4, insofar as it fixes the syntax error, but then bombs out on a failure to import the MySQL module - but based on earlier pain with trying to get certain parts of stdlib to compile, I don't think I'll be able to cross that particular hurdle with that version of Python on the distro I'm currently using.

       
  • Ahasuerus

    Ahasuerus - 2019-10-20
    • status: open --> closed-fixed
    • assigned_to: ErsatzCulture
     
  • Ahasuerus

    Ahasuerus - 2019-10-20

    Fixed in scripts/remove_orphaned_canonical_authors.py, installed in SVN 465 on 2019-10-20. Closing the Bug.

     

Anonymous
Anonymous

Add attachments
Cancel





MongoDB Logo MongoDB