#1114 Duplicate finder fooled by hyphens

Approved
open
nobody
None
4
2017-11-30
2017-11-27
No

If two titles differ only in the type of hyphen, e.g. " - " vs. "—" (hyphen vs. m-dash), the difference is not detected by the "Similar Title Mode" duplicate title check (it is caught by an "Aggressive Title Mode" search). It would be helpful if this very simple difference between titles were caught by the "Similar" check. An example, which I have left unmerged, can be seen by running a duplicate check for the author "Roger L. Rogers". Rogers has two titles to his credit, but should have only one.

Update by Ahasuerus: It turns out that the Similar Title mode of the Duplicate Finder already treats hyphens and m-dashes as the same character. The problem with Roger L. Rogers is that one title has spaces around the hyphen while the other title doesn't have spaces around the m-dash. We'll need to update the software to handle this case.
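
One way to handle the spacing case is to normalize each title before comparing: map every dash-like character to a plain hyphen and drop any whitespace around it. The following is only a minimal sketch of that idea, not the Duplicate Finder's actual code. The names `normalize_title` and `similar_titles` are hypothetical, the choice of which Unicode dashes to include is an assumption, and the sample titles are made up.

```python
import re

# Hypothetical helper, not taken from the real Duplicate Finder.
# Matches an ASCII hyphen, the Unicode dashes U+2010..U+2015
# (which include the en dash and the em dash) or the minus sign U+2212,
# together with any whitespace on either side.
_DASH_WITH_SPACES = re.compile(r"\s*[-\u2010-\u2015\u2212]\s*")


def normalize_title(title: str) -> str:
    """Return a comparison key that ignores dash type and dash spacing."""
    # "Foo - Bar" and "Foo\u2014Bar" both become "Foo-Bar".
    collapsed = _DASH_WITH_SPACES.sub("-", title)
    # Collapse remaining runs of whitespace and ignore letter case.
    return " ".join(collapsed.split()).casefold()


def similar_titles(first: str, second: str) -> bool:
    """True if the two titles differ only in dash style, spacing or case."""
    return normalize_title(first) == normalize_title(second)


if __name__ == "__main__":
    # Made-up titles: spaced hyphen vs. unspaced em dash.
    print(similar_titles("Foo - Bar", "Foo\u2014Bar"))
```

Under these assumptions a title with no dash at all, such as "Foo Bar", still compares as different from "Foo-Bar", so the check stays narrower than an aggressive match.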

Discussion

  • Ahasuerus

    Ahasuerus - 2017-11-30
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1 +1,3 @@
     If two titles differ only in the type of hypen, e.g. " - " vs. "—" (hypen vs. M-dash), this is not detected by "Similar Title Mode" duplicate title check (it is caught by an "Aggressive Title Mode" search). I suggest that it really would be helpful if this very simple difference between titles were caught by the "Similar" check. An example, which I have left unmerged, is doing a duplicate check for the author "Roger L. Rogers". Rogers only has two titles to his credit, but should only have one.
    +
    +Update by Ahasuerus: It turns out that the Similar Title mode of the Duplicate Finder alread treats hypen and m-dash as the same character. The problem with Roger L. Rogers is that one title has spaces around the hypen while the other title doesn't have spaces around the m-dash. We'll need to update the software to handle this case.
    
     
  • Ahasuerus

    Ahasuerus - 2017-11-30
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,3 +1,3 @@
     If two titles differ only in the type of hypen, e.g. " - " vs. "—" (hypen vs. M-dash), this is not detected by "Similar Title Mode" duplicate title check (it is caught by an "Aggressive Title Mode" search). I suggest that it really would be helpful if this very simple difference between titles were caught by the "Similar" check. An example, which I have left unmerged, is doing a duplicate check for the author "Roger L. Rogers". Rogers only has two titles to his credit, but should only have one.
    
    -Update by Ahasuerus: It turns out that the Similar Title mode of the Duplicate Finder alread treats hypen and m-dash as the same character. The problem with Roger L. Rogers is that one title has spaces around the hypen while the other title doesn't have spaces around the m-dash. We'll need to update the software to handle this case.
    +Update by Ahasuerus: It turns out that the Similar Title mode of the Duplicate Finder already treats hyphens and m-dashes as the same character. The problem with Roger L. Rogers is that one title has spaces around the hyphen while the other title doesn't have spaces around the m-dash. We'll need to update the software to handle this case.
    
     
