Implemented in:
Installed in r2015-055 on 2015-02-19. The current version covers authors whose names start with the letters 'I', 'O', 'Q', 'U', 'V', 'X', 'Y', and 'Z'. It uses the Hamming distance algorithm. In the future we will add the remaining letters of the alphabet and consider other, more time-consuming algorithms such as Jaro distance.
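The Hamming distance between two equal-length strings is the number of positions at which their characters differ. Below is a minimal Python sketch of how a per-letter "suspected duplicate authors" check could apply it. The helper names `hamming` and `suspect_pairs`, the threshold of one differing character, the rule that only same-length names are compared, and the sample names are all assumptions for illustration. This is not the ISFDB implementation.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def suspect_pairs(names, letter, max_distance=1):
    """Yield pairs of names starting with `letter` that differ in at most
    `max_distance` positions.

    Hypothetical helper: the threshold and the same-length restriction are
    illustrative assumptions.
    """
    bucket = sorted(n for n in names if n[:1].upper() == letter.upper())
    for a, b in combinations(bucket, 2):
        # Hamming distance is undefined for different lengths, so skip those pairs.
        if len(a) == len(b) and 0 < hamming(a, b) <= max_distance:
            yield a, b


# Made-up sample data.
authors = ["Isaac Asimov", "Isaac Asimow", "Ian Watson", "Poul Anderson"]
print(list(suspect_pairs(authors, "I")))  # [('Isaac Asimov', 'Isaac Asimow')]
```

Because Hamming distance is only defined for strings of the same length, a name with one extra or missing character is never paired with its correct spelling. Jaro similarity does not have that restriction.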
Added the letter 'F' in:
Installed in r2015-058 on 2015-02-25.
Added the letter 'W' in:
Installed in r2015-059 on 2015-02-26.
Added 'K' and 'H' and moved the report to a separate weekly run:
Installed in r2015-061 on 2015-03-02.
Corrected a bug in nightly/nightly_update.py 1.98. Installed in r2015-062 on 2015-03-02.
Corrected a bug that caused weekly processing to collide with nightly processing in nightly/nightly_update.py 1.99. Installed in r2015-064 on 2015-03-04.
Added the letter 'T' in:
Installed in r2015-067 on 2015-03-10.
Added the letter 'G' in:
Installed in r2015-069 on 2015-03-12.
Added 'E' in:
Installed in r2015-076 on 2015-03-16.
Added 'B' in:
Installed in r2015-083 on 2015-04-14.
Added 'P' in:
Installed in r2015-087 on 2015-04-26.
Added 'L' in:
Installed in r2015-090 on 2015-05-05.
Added 'S' in:
Installed in r2015-101 on 2015-07-14.
Added 'D' in:
Installed in r2015-106 on 2015-08-04.
Added 'A' in:
Installed in r2015-146 on 2015-09-26.
Added 'R' in:
Installed in r2015-171 on 2015-10-15. 'J' and 'M' are still outstanding.
Added 'M' in:
Installed in r2015-246 on 2015-12-05. 'J' is still outstanding.
Added 'J' in:
Installed in r2016-002 on 2016-01-16. Keeping the FR open since we may want to add suspect duplicate authors where the first letter is different. Preliminary logic:
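-- Authors whose canonical names differ only in the first character ('Z' shown as an example)
select a1.author_id, a1.author_canonical, a2.author_id, a2.author_canonical
from authors a1
join authors a2 on a1.author_id != a2.author_id
where substr(a1.author_canonical,1,1)='Z'
and substr(a2.author_canonical,1,1)!='Z'
and substr(a1.author_canonical,2)=substr(a2.author_canonical,2);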
We may also want to check authors whose names start with non-alphabetic characters such as an apostrophe.
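The preliminary query runs a self-join once per first letter. The same "differs only in the first character" check can also be done in a single pass by grouping names on everything after their first character. Below is a minimal Python sketch of that idea. It assumes the canonical names have already been loaded into a list. The function name `first_letter_variants` and the sample names are hypothetical.

```python
from collections import defaultdict


def first_letter_variants(names):
    """Group names that are identical except for their first character.

    Equivalent to matching substr(name, 2) while the first character differs.
    No first-letter filter is applied, so names beginning with a
    non-alphabetic character are compared too.
    """
    by_tail = defaultdict(set)
    for name in names:
        if len(name) > 1:
            # Exact repeats collapse into one set entry and never pair with themselves.
            by_tail[name[1:]].add(name)
    # Only tails shared by two or more distinct names are suspects.
    return {tail: sorted(group) for tail, group in by_tail.items() if len(group) > 1}


# Made-up sample data.
names = ["Jack Vance", "Zack Vance", "Poul Anderson", "Jack Vance"]
print(first_letter_variants(names))  # {'ack Vance': ['Jack Vance', 'Zack Vance']}
```

A dictionary keyed on the tail replaces the quadratic self-join with one hash lookup per name, so every first letter is covered in one run.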