
#278 Search on backslash characters fails

Group: v1.0 (example)
Status: closed-fixed
Priority: 3
Updated: 2022-06-08
Created: 2012-10-28
Creator: Ahasuerus
Private: No

Title and publication searches that include the backslash (\) character in the search string come back with results whose titles end with the percent (%) character.

2013-12-26 update: It turns out that the MySQL LIKE operator (unlike the "=" operator) requires all backslashes to be escaped twice. db.escape_string can't do it because it doesn't know whether the string will be used with a "=" or a LIKE, so we need to do the double-escaping ourselves.
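The sketch below models the two passes MySQL makes over a LIKE operand: first it reads the quoted string literal, which consumes one level of backslashes, and then LIKE treats "\" as its own escape character, which consumes another. It shows one plausible way to reproduce the reported symptom: a search string ending in a backslash, escaped only once, turns the closing "%" wildcard into a literal "%", so only titles ending in "%" match. Everything here is an assumption for illustration: escape_string is a hypothetical stand-in for db.escape_string that handles only backslashes and quotes; parse_sql_literal is a simplified model of MySQL string-literal parsing (including its rule that "\%" and "\_" keep their backslash) that ignores control escapes like "\n"; and the matching is case-sensitive and ignores collation.

```python
import re


def escape_string(value: str) -> str:
    # Hypothetical stand-in for db.escape_string: escapes only backslashes
    # and single quotes so the value can sit inside a '...' SQL literal.
    return value.replace("\\", "\\\\").replace("'", "\\'")


def like_escape(value: str) -> str:
    # Extra pass needed only for LIKE: double every backslash *before*
    # escape_string, so LIKE still receives "\\" (one literal backslash).
    return value.replace("\\", "\\\\")


def parse_sql_literal(body: str) -> str:
    # Simplified model of MySQL reading a quoted literal: "\x" -> "x", except
    # "\%" and "\_", which keep their backslash for use in LIKE patterns.
    # Control escapes such as "\n" are not modeled.
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            out.append("\\" + nxt if nxt in "%_" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def like_to_regex(pattern: str) -> "re.Pattern":
    # LIKE semantics: % = any run, _ = any single char, "\x" = literal x.
    parts, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch == "%":
            parts.append(".*")
            i += 1
        elif ch == "_":
            parts.append(".")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.DOTALL)


def like_literal_body(term: str, double_escape: bool) -> str:
    # Body of the '...' literal for a "contains" search: LIKE '%term%'
    value = like_escape(term) if double_escape else term
    return "%" + escape_string(value) + "%"


def matches(title: str, term: str, double_escape: bool) -> bool:
    pattern = parse_sql_literal(like_literal_body(term, double_escape))
    return like_to_regex(pattern).fullmatch(title) is not None


term = "foo\\"  # the search string foo\ (one trailing backslash)
print(matches("Bar foo%", term, double_escape=False))  # True: \% became a literal %
print(matches("foo\\bar", term, double_escape=False))  # False
print(matches("foo\\bar", term, double_escape=True))   # True
print(matches("Bar foo%", term, double_escape=True))   # False
```

For an "=" comparison only the literal-parsing pass applies, so a single escape_string call is enough there; the extra like_escape pass belongs only on the LIKE path, which is why it cannot live inside a generic escaping function.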

2022-06-02 update: The immediate problem with double-escaping of backslashes has been fixed. However, the collation that we use for ISFDB tables, latin1_swedish_ci (see https://collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html) considers the backslash character, "Ä", "Æ", "ä" and "æ" to be the same for search purposes. Similarly, "[" - "Å" - "å" are the same and "]" - "Ö" - "ö" are the same.

As near as I can tell, there isn't much we can do about this problem as long as we are using this collation. We will need to revisit the issue once we upgrade to Unicode.
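The toy model below illustrates why correct escaping alone cannot stop a backslash search from matching "Ä": under a case-insensitive collation, comparisons run on folded "weights" rather than on raw characters. It hard-codes only the three equivalence classes listed in the 2022-06-02 update and lowercases everything else; the real latin1_swedish_ci table covers many more characters. The helpers fold, collation_equal and collation_contains are hypothetical, not a reproduction of MySQL internals.

```python
# Toy model of the equivalences listed above; NOT the full latin1_swedish_ci table.
EQUIVALENCE_CLASSES = ("\\ÄÆäæ", "[Åå", "]Öö")

# Map every member of a class to the class's first character.
_FOLD = {ch: cls[0] for cls in EQUIVALENCE_CLASSES for ch in cls}


def fold(text: str) -> str:
    # Apply the class mapping first; otherwise just lowercase ("_ci" = case-insensitive).
    return "".join(_FOLD.get(ch, ch.lower()) for ch in text)


def collation_equal(a: str, b: str) -> bool:
    return fold(a) == fold(b)


def collation_contains(title: str, term: str) -> bool:
    # Rough analogue of: title LIKE '%term%'
    return fold(term) in fold(title)


print(collation_equal("\\", "Ä"))          # True
print(collation_contains("Änglar", "\\"))  # True: a backslash search finds "Ä"
print(collation_equal("[", "å"))           # True
print(collation_equal("\\", "Å"))          # False: different classes
```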

Discussion

  • Ahasuerus

    Ahasuerus - 2013-12-27
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1 +1,3 @@
     Title and publication searches that include the backslash \(\\\) character in the search string come back with results whose titles end with the percent \(%\) character.
    +
    +2013-12-26 update: It turns out that the MySQL LIKE operator (unlike the "=" operator) requires all backslashes to be escaped twice. db.escape_string can't do it because it doesn't know whether the string will be used with a "=" or a LIKE, so we need to do the double-escaping ourselves.
    
    • Group: --> v1.0 (example)
    • Priority: 5 --> 3
     
  • Ahasuerus

    Ahasuerus - 2022-06-02
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,3 +1,5 @@
    -Title and publication searches that include the backslash \(\\\) character in the search string come back with results whose titles end with the percent \(%\) character.
    +Title and publication searches that include the backslash (\) character in the search string come back with results whose titles end with the percent (%) character.
    
     2013-12-26 update: It turns out that the MySQL LIKE operator (unlike the "=" operator) requires all backslashes to be escaped twice. db.escape_string can't do it because it doesn't know whether the string will be used with a "=" or a LIKE, so we need to do the double-escaping ourselves.
    +
    +2022-06-02 update: The immediate problem with double-escaping of backslashes has been fixed. However, the collation that we use for ISFDB tables, latin1_swedish_ci (see https://collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html) considers the backslash character, "Ä", "Æ", "ä" and "æ" to be the same for search purposes. As near as I can tell, there isn't much we can do about it as long as we are using this collation. We will need to revisit the issue once we upgrade to Unicode.
    
    • status: open --> closed-fixed
    • assigned_to: Ahasuerus
     
  • Ahasuerus

    Ahasuerus - 2022-06-02

    Fixed in:

    biblio/adv_search_results.py
    biblio/se.py
    

    Installed in SVN 928 on 2022-06-02. Note the Description update to reflect the current state of this issue. Closing the Bug report.

     
  • Ahasuerus

    Ahasuerus - 2022-06-02
    • summary: Search on backslah fails --> Search on backslash characters fails
     
  • Ahasuerus

    Ahasuerus - 2022-06-03
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -2,4 +2,6 @@
    
     2013-12-26 update: It turns out that the MySQL LIKE operator (unlike the "=" operator) requires all backslashes to be escaped twice. db.escape_string can't do it because it doesn't know whether the string will be used with a "=" or a LIKE, so we need to do the double-escaping ourselves.
    
    -2022-06-02 update: The immediate problem with double-escaping of backslashes has been fixed. However, the collation that we use for ISFDB tables, latin1_swedish_ci (see https://collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html) considers the backslash character, "Ä", "Æ", "ä" and "æ" to be the same for search purposes. As near as I can tell, there isn't much we can do about it as long as we are using this collation. We will need to revisit the issue once we upgrade to Unicode.
    +2022-06-02 update: The immediate problem with double-escaping of backslashes has been fixed. However, the collation that we use for ISFDB tables, latin1_swedish_ci (see https://collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html) considers the backslash character, "Ä", "Æ", "ä" and "æ" to be the same for search purposes. Similarly, "[" - "Å" - "å" are the same and "]" - "Ö" - "ö" are the same.
    +
    +As near as I can tell, there isn't much we can do about this problem as long as we are using this collation. We will need to revisit the issue once we upgrade to Unicode.
    
     
  • Ahasuerus

    Ahasuerus - 2022-06-08

    Part 2 - Fixed a conflict with db.escape which was causing a Python error in some Advanced Searches. Implemented in biblio/adv_search_results.py, installed in SVN 930 on 2022-06-08. Keeping the Bug report closed.

     
