Minor referential integrity issue I stumbled across in one of my projects that uses a local copy of ISFDB - records in canonical_author that point to non-existent author records.
MariaDB [isfdb]> select ca.author_id, author_canonical from canonical_author ca left outer join authors a on a.author_id = ca.author_id WHERE author_canonical IS NULL;
+-----------+------------------+
| author_id | author_canonical |
+-----------+------------------+
| 11231 | NULL |
| 19035 | NULL |
| 19035 | NULL |
| 19035 | NULL |
| 32527 | NULL |
| 32576 | NULL |
+-----------+------------------+
6 rows in set (1.34 sec)
I don't think that this is at all visible to the users of the site - the title which alerted me to this is Gordon Dickson's Spacepaw, and I don't see anything in the view or edit webpages that looks wrong or gives any indication of these bogus records.
http://www.isfdb.org/cgi-bin/title.cgi?431031
As such, I'm logging this ticket not to ask anyone to remove these records from the database or suchlike, but in case anyone wants to check the edit history - which AFAIK isn't in the database download, but does exist? - for these records to see if there's anything to worry about and/or if there's any cron-job report that could maybe benefit from reporting on records like these?
Alternatively, please close this ticket if it's a known issue and/or something you don't think is worth worrying about.
Anonymous
Early on, we had a fair number of problems with referential integrity. As I recall, some of them were due to server stability issues and some were due to user processes erroring out in the middle of filing a transaction.
At one point I fixed a bunch of database integrity issues using scripts, but it looks like I missed these 6 records. Considering the fact that the author IDs are in the 11,000-33,000 range and we are up to author ID 300,000, it's probably safe to say that these are very old problems. So much has changed in the last 10 years that whatever server/software issues caused thes records to get "stuck" in the first place no longer apply.
If you would like to create a script to delete these 6 records as a simple exercise, I will be happy to prettify the code, commit it to the repository and then run it on the main server.
OK, I'll be going away for a few days shortly - will be online, but won't have anything resembling a useful dev environment to hand - and will pick this issue back up when I return.
Attached is a Python script to report and - with -d argument - delete the offending records. Probably over-engineered for something that will (ideally) only ever be run once, but hopefully it's a reasonable compromise between the idioms/style I'm familiar with and those used in the ISFDB code.
NB: as stated in the header, I have only tested this with Python 2.7.15 (and MariaDB 10.1.33) - but I've avoided (AFAIK) libraries that are only in newer Pythons - in particular, using getopt rather than optparse or argparse.
This runs OK for me when placed in the scripts/ directory, which I assume is the preferred home for this sort of thing.
Checksums:
scripts $ sha256sum remove_orphaned_canonical_authors.py
6b81cd7f018b966018b6063629316a84008001805618bc605d7ab4f489133002 remove_orphaned_canonical_authors.py
scripts $ md5sum remove_orphaned_canonical_authors.py
1ba9f9c9c0cb662c387e02149df75405 remove_orphaned_canonical_authors.py
scripts $ sha1sum remove_orphaned_canonical_authors.py
612dafb0b5b202601ef166b60cc36eb9fe65ba34 remove_orphaned_canonical_authors.py
I've tried running the script under Python 2.5.4. It errored out with the following message:
remove_orphaned_canonical_authors.py:91: Warning: 'as' will become a reserved keyword in Python 2.6
File "remove_orphaned_canonical_authors.py", line 91
except getopt.GetoptError as err:
^
SyntaxError: invalid syntax
Sorry about that - forgot about the alternative exception handling syntax not being available in 2.5.
Attached is a (hopefully) fixed version, here's the diff:
I have tested this on a fairly borked installation of Python 2.5.4, insofar as it fixes the syntax error, but then bombs out on a failure to import the MySQL module - but based on earlier pain with trying to get certain parts of stdlib to compile, I don't think I'll be able to cross that particular hurdle with that version of Python on the distro I'm currently using.
Fixed in scripts/remove_orphaned_canonical_authors.py, installed in SVN 465 on 2019-10-20. Closing the Bug.