#788 Create a cleanup report to find 'Suspected Duplicate Authors'

Approved
open-accepted
None
5
2016-01-17
2015-02-19
Ahasuerus
No

Create a cleanup report to find 'Suspected Duplicate Authors'.

Discussion

  • Ahasuerus

    Ahasuerus - 2015-02-19

    Implemented in:

    mod/cleanup_report.py 1.26
    mod/common.py 1.50
    nightly/nightly_update.py 1.93
    scripts/add_cleanup_id_2.sql 1.1
    

    Installed in r2015-055 on 2015-02-19. The current version covers authors whose names start with the letters 'I', 'O', 'Q', 'U', 'V', 'X', 'Y', and 'Z'. It uses the Hamming distance algorithm. In the future we will add the remaining letters of the alphabet and consider implementing other, more time-consuming algorithms such as Jaro distance.
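
    For readers unfamiliar with the approach, here is a minimal sketch of a Hamming-distance check restricted to a set of first letters. It is not the code in mod/cleanup_report.py: the function names, the threshold of one differing character, and the bucketing by first letter and name length are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(a, b))


def suspected_duplicates(names, letters, max_distance=1):
    """Return sorted pairs of names that start with one of `letters`,
    have the same length, and differ in at most `max_distance` positions.

    The threshold and the bucketing scheme are assumptions of this sketch.
    """
    # Only names in the same bucket (first letter, length) can be compared.
    buckets = defaultdict(list)
    for name in names:
        if name and name[0].upper() in letters:
            buckets[(name[0].upper(), len(name))].append(name)

    pairs = []
    for bucket in buckets.values():
        for a, b in combinations(sorted(bucket), 2):
            # Distance 0 means identical strings, which are not reported.
            if 0 < hamming_distance(a, b) <= max_distance:
                pairs.append((a, b))
    return sorted(pairs)


# Hypothetical sample data; 'A' is not in the covered set, so the
# "Adam" pair is skipped.
sample = ["Ivan Petrov", "Ivan Petrow", "Adam Smith", "Adam Smyth"]
covered = set("IOQUVXYZ")
result = suspected_duplicates(sample, covered)
```

    Comparing only names of equal length within one first-letter bucket keeps the number of pairwise comparisons down, which may be part of why the letters were rolled out gradually.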

     
  • Anonymous

    Anonymous - 2015-02-26

    Added the letter 'F' in:

    mod/cleanup_report.py 1.28
    nightly/nightly_update.py 1.95
    

    Installed in r2015-058 on 2015-02-25.

     
  • Ahasuerus

    Ahasuerus - 2015-02-26

    Added the letter 'W' in:

    mod/cleanup_report.py 1.29
    nightly/nightly_update.py 1.96
    

    Installed in r2015-059 on 2015-02-26.

     
  • Ahasuerus

    Ahasuerus - 2015-03-03

    Added 'K' and 'H' and moved the report to a separate weekly run:

    mod/cleanup_report.py 1.30
    nightly/nightly_update.py 1.97
    

    Installed in r2015-061 on 2015-03-02.

     
  • Ahasuerus

    Ahasuerus - 2015-03-03

    Corrected a bug in nightly/nightly_update.py 1.98. Installed in r2015-062 on 2015-03-02.

     
  • Ahasuerus

    Ahasuerus - 2015-03-04

    Corrected a bug where weekly processing collided with nightly processing in nightly/nightly_update.py 1.99. Installed in r2015-064 on 2015-03-04.

     
  • Ahasuerus

    Ahasuerus - 2015-03-11

    Added the letter 'T' in:

    mod/cleanup_report.py 1.31
    nightly/nightly_update.py 1.100
    

    Installed in r2015-067 on 2015-03-10.

     
  • Ahasuerus

    Ahasuerus - 2015-03-13

    Added the letter 'G' in:

    mod/cleanup_report.py 1.32
    nightly/nightly_update.py 1.101
    

    Installed in r2015-069 on 2015-03-12.

     
  • Ahasuerus

    Ahasuerus - 2015-03-16

    Added 'E' in:

    mod/cleanup_report.py 1.36
    nightly/nightly_update.py 1.105
    

    Installed in r2015-076 on 2015-03-16.

     
  • Ahasuerus

    Ahasuerus - 2015-04-14

    Added 'B' in:

    mod/cleanup_report.py 1.38
    nightly/nightly_update.py 1.107
    

    Installed in r2015-083 on 2015-04-14.

     
  • Anonymous

    Anonymous - 2015-04-27

    Added 'P' in:

    mod/cleanup_report.py 1.41
    nightly/nightly_update.py 1.110
    

    Installed in r2015-087 on 2015-04-26.

     
  • Ahasuerus

    Ahasuerus - 2015-05-05

    Added 'L' in:

    mod/cleanup_report.py 1.43
    nightly/nightly_update.py 1.112
    

    Installed in r2015-090 on 2015-05-05.

     
  • Ahasuerus

    Ahasuerus - 2015-07-14

    Added 'S' in:

    mod/cleanup_report.py 1.49
    nightly/nightly_update.py 1.117
    

    Installed in r2015-101 on 2015-07-14.

     
  • Ahasuerus

    Ahasuerus - 2015-08-04

    Added 'D' in:

    mod/cleanup_report.py 1.51
    nightly/nightly_update.py 1.118
    

    Installed in r2015-106 on 2015-08-04.

     
  • Ahasuerus

    Ahasuerus - 2015-09-10
    • status: open --> open-accepted
     
  • Ahasuerus

    Ahasuerus - 2015-09-26

    Added 'A' in:

    edit/cleanup_report.py 1.7
    nightly/nightly_update.py 1.132
    

    Installed in r2015-146 on 2015-09-26.

     
  • Ahasuerus

    Ahasuerus - 2015-10-15

    Added 'R' in:

    edit/cleanup_report.py 1.10
    nightly/nightly_update.py 1.135
    

    Installed in r2015-171 on 2015-10-15. 'J' and 'M' are still outstanding.

     
  • Ahasuerus

    Ahasuerus - 2015-12-05

    Added 'M' in:

    edit/cleanup_report.py 1.17
    nightly/nightly_update.py 1.141
    

    Installed in r2015-246 on 2015-12-05. 'J' is still outstanding.

     
  • Ahasuerus

    Ahasuerus - 2016-01-17

    Added 'J' in:

    edit/cleanup_report.py 1.19
    nightly/nightly_update.py 1.143
    

    Installed in r2016-002 on 2016-01-16. Keeping the FR open since we may also want to flag suspected duplicate authors whose names differ only in the first letter. Preliminary logic:

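    -- Authors starting with 'Z' paired with any other author whose name
    -- is identical from the second character onward
    select a1.author_id, a1.author_canonical, a2.author_canonical
    from authors a1
    join authors a2
      on substr(a1.author_canonical, 2) = substr(a2.author_canonical, 2)
     and a1.author_id != a2.author_id
    where substr(a1.author_canonical, 1, 1) = 'Z';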

    We may also want to check authors whose names start with non-alpha characters like apostrophe.
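
    The preliminary logic above can also be sketched in Python by grouping names on everything after the first character, which avoids comparing every pair. This is only an illustration: the function name and the (author_id, name) row shape are hypothetical, and nothing here is taken from the installed scripts. Because the first character is not restricted to letters, names starting with an apostrophe or another non-alpha character are grouped the same way.

```python
from collections import defaultdict


def differing_first_char(authors):
    """Group (author_id, name) rows by the name minus its first character
    and return the groups that contain more than one distinct name."""
    by_suffix = defaultdict(list)
    for author_id, name in authors:
        if len(name) > 1:  # a one-character name has no suffix to compare
            by_suffix[name[1:]].append((author_id, name))

    groups = []
    for rows in by_suffix.values():
        # Keep only groups where the first character actually differs.
        if len({name for _, name in rows}) > 1:
            groups.append(sorted(rows))
    return sorted(groups)


# Hypothetical rows: the first two differ only in the first letter.
rows = [(1, "Zane Grey"), (2, "Jane Grey"), (3, "Jane Gray")]
groups = differing_first_char(rows)
```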

     
