Create a cleanup report to find mismatches between a publication's title and its reference title. The exact algorithm remains to be determined, but note that "TITLE TITLE: ..." and "TITLE TITLE (...)" are valid variations for publication titles. "SERIES: TITLE" and "SERIES: TITLE: SUBTITLE" are also common, although perhaps suboptimal.
Proposed algorithm:
1. Find all pubs whose "reference" title doesn't contain the pub's title
2. Calculate the two strings' similarity
3. Add the pub to the cleanup report if the calculated similarity value is less than 50%
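The three steps above could be sketched roughly as follows. This is only an illustration, not the code that was installed: the FR does not say how similarity is calculated, so the sketch assumes Python's difflib.SequenceMatcher ratio and a case-insensitive comparison. The function names and the (pub title, reference title) input pairs are hypothetical.

```python
from difflib import SequenceMatcher


def title_similarity(pub_title, ref_title):
    """Return a similarity score between 0.0 and 1.0 (assumed metric)."""
    return SequenceMatcher(None, pub_title.lower(), ref_title.lower()).ratio()


def find_mismatches(pubs, threshold=0.5):
    """pubs is an iterable of (pub_title, ref_title) pairs (hypothetical input)."""
    report = []
    for pub_title, ref_title in pubs:
        # Step 1: skip pubs whose reference title contains the pub's title.
        # Case-insensitive containment is an assumption of this sketch.
        if pub_title.lower() in ref_title.lower():
            continue
        # Steps 2 and 3: report the pub only if similarity is below 50%.
        if title_similarity(pub_title, ref_title) < threshold:
            report.append((pub_title, ref_title))
    return report
```

With this sketch, a pair such as ("Stories Vol 1", "Stories Volume 1") fails the containment check but scores well above the threshold, so it stays off the report, while two unrelated titles are flagged.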
Anonymous
Original version (CHAPBOOKs only) implemented in:
Installed in r2015-132 on 2015-09-10. Keeping open.
Part 2 - ignore punctuation. Implemented in:
Installed in r2015-137 on 2015-09-16. Keeping the FR open.
Part 3 - Add OMNIBUSES, delete exclamation points from the list of ignored punctuation characters. Implemented in:
Installed in r2015-141 on 2015-09-18. Keeping the FR open.
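For illustration, the punctuation handling from Parts 2 and 3 might look something like the sketch below. The actual list of ignored characters is not given in this FR, so the sketch assumes Python's string.punctuation with the exclamation point removed. The name normalize_title is hypothetical.

```python
import string

# Assumed list of ignored characters: all ASCII punctuation except "!".
IGNORED_PUNCTUATION = string.punctuation.replace("!", "")
_STRIP_TABLE = str.maketrans("", "", IGNORED_PUNCTUATION)


def normalize_title(title):
    """Drop ignored punctuation and collapse runs of whitespace."""
    return " ".join(title.translate(_STRIP_TABLE).split())
```

Both titles would be passed through a function like this before the containment and similarity checks, so that "SERIES: TITLE" and "SERIES TITLE" compare as equal while "Help!" keeps its exclamation point.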
Part 4 - Updated the message displayed at the top of the page to mention OMNIBUSes. Implemented in edit/cleanup_report.py 1.4, installed in r2015-142 on 2015-09-18.
Part 5 - Added the ability to ignore pubs in edit/cleanup_report.py 1.6. Installed in r2015-145 on 2015-09-21.
Part 6 - Added the rest of the publication types in:
Installed in r2016-170 on 2016-09-29. Closing.