KDBX Non-Protected ProtectedString "Localized Names" Write handling code: Huh?

Rassilon
2013-01-14
2013-01-21
  • Rassilon
    2013-01-14

    I'm porting some of KeePass's .kdbx writing code (2.20.1 source) to the keepassdroid code base in Java, and the "Localized Names" section of the code that writes non-protected ProtectedStrings has me very confused.

    Maybe I'm just being dense, but the code doesn't make much sense to me, since the data is written out in UTF-8 (i.e., it's all Unicode, all of the time).

    Is the code trying to prevent SafeXmlString from clobbering the data?

    Additionally, shouldn't SafeXmlString let surrogate characters through to the XML layer? A .NET "char" is a UTF-16 code unit, so characters outside the Basic Multilingual Plane are stored as surrogate pairs, and the UTF-8 encoder maps each valid pair to a single code point in the U+10000 to U+10FFFF range, emitted as a four-byte sequence.
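    To make the arithmetic concrete: a surrogate pair (hi, lo) encodes the code point 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00), so the highest pair, U+DBFF U+DC00..U+DFFF, tops out at U+10FFFF. The C# sketch below is illustrative only; the class name SurrogateInfo is hypothetical (not from KeePass), and U+1F600 is just an arbitrary test character outside the BMP.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class SurrogateInfo
{
    // Combines a UTF-16 surrogate pair into the code point it encodes.
    // High surrogates are U+D800..U+DBFF and low surrogates U+DC00..U+DFFF,
    // so the result always lies in U+10000..U+10FFFF.
    public static int CombinePair(char hi, char lo)
    {
        if (!char.IsHighSurrogate(hi) || !char.IsLowSurrogate(lo))
            throw new ArgumentException("Not a valid surrogate pair.");
        return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    }

    // Encodes a string with Encoding.UTF8 and returns the bytes as
    // hyphen-separated hex text, e.g. "F0-9F-98-80".
    public static string Utf8Hex(string s)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(s));
    }
}

// Usage:
//   SurrogateInfo.CombinePair('\uD83D', '\uDE00');  // 0x1F600
//   SurrogateInfo.CombinePair('\uDBFF', '\uDFFF');  // 0x10FFFF
//   SurrogateInfo.Utf8Hex("\uD83D\uDE00");          // "F0-9F-98-80"
```

    The two UTF-16 code units come out as one four-byte UTF-8 sequence, which is why the pair has to reach the encoder intact rather than being filtered out character by character.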

    The code is especially odd because I don't see the transformation being reversed on the read side, either.

    Apologies if you've covered this before or I'm being extra dense.

    Thanks,
    Bill

     
  • Dominik Reichl
    2013-01-14

    You can safely ignore the localized name code. In standard KeePass builds, m_bLocalizedNames is false.

    In the early days, there was a plugin that enabled this functionality to work around a bug. However, on modern operating systems, it's not required anymore.

    Best regards
    Dominik

     
    • Rassilon
      2013-01-18

      Thanks! I'll ignore the localized name code then.

      What about my question about SafeXmlString? Shouldn't it let surrogate pairs through so that the UTF-8 encoder can encode them properly?

      Bill

       
  • Dominik Reichl
    2013-01-19

    Yes, I think that's a good idea. I've now added support for surrogates.

    As you seem to be a Unicode expert, it would be great if you could verify the new code. The intention is to pass valid surrogate pairs through and silently drop invalid ones (SafeXmlString must remove all invalid XML characters and must not throw).

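    // Requires: using System.Diagnostics; using System.Text;
    public static string SafeXmlString(string strText)
    {
        Debug.Assert(strText != null); // No throw
        if(string.IsNullOrEmpty(strText)) return strText;
    
        int nLength = strText.Length;
        StringBuilder sb = new StringBuilder(nLength);
    
        for(int i = 0; i < nLength; ++i)
        {
            char ch = strText[i];
    
            // Single code units allowed by XML 1.0: #x9, #xA, #xD,
            // [#x20-#xD7FF] and [#xE000-#xFFFD]. Other control characters,
            // U+FFFE and U+FFFF match no branch and are dropped.
            if(((ch >= '\u0020') && (ch <= '\uD7FF')) ||
                (ch == '\u0009') || (ch == '\u000A') || (ch == '\u000D') ||
                ((ch >= '\uE000') && (ch <= '\uFFFD')))
                sb.Append(ch);
            else if((ch >= '\uD800') && (ch <= '\uDBFF')) // High surrogate
            {
                // Keep the high surrogate only if a low surrogate follows;
                // together they encode a code point in [#x10000-#x10FFFF].
                if((i + 1) < nLength)
                {
                    char chLow = strText[i + 1];
                    if((chLow >= '\uDC00') && (chLow <= '\uDFFF')) // Low sur.
                    {
                        sb.Append(ch);
                        sb.Append(chLow);
                        ++i; // Low surrogate consumed
                    }
                    else { Debug.Assert(false); } // Low sur. invalid; high dropped
                }
                else { Debug.Assert(false); } // Low sur. missing; high dropped
            }
    
            // A low surrogate in ch here had no preceding high surrogate
            // (a paired one is consumed via ++i above), so it matched no
            // branch and is dropped.
            Debug.Assert((ch < '\uDC00') || (ch > '\uDFFF')); // Lonely low sur.
        }
    
        return sb.ToString();
    }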
    
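    One way to cross-check the function is to compare it against an independent filter written directly from the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The sketch below is hypothetical (XmlCharFilter is not part of KeePass) and relies on char.IsSurrogatePair(string, int), which returns false when the high surrogate is the last character of the string.

```csharp
using System;
using System.Text;

// Hypothetical reference filter for cross-checking SafeXmlString.
public static class XmlCharFilter
{
    // Keeps only characters allowed by the XML 1.0 Char production;
    // never throws, and silently drops lone surrogates.
    public static string Filter(string s)
    {
        if (string.IsNullOrEmpty(s)) return s;

        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; ++i)
        {
            char ch = s[i];
            if (char.IsSurrogatePair(s, i))
            {
                // High surrogate followed by low surrogate: one code point
                // in U+10000..U+10FFFF, so keep both code units.
                sb.Append(ch).Append(s[i + 1]);
                ++i;
            }
            else if (char.IsSurrogate(ch))
            {
                // Lone high or low surrogate: drop it.
            }
            else if (ch == '\t' || ch == '\n' || ch == '\r' ||
                     (ch >= '\u0020' && ch <= '\uFFFD'))
            {
                // Surrogates were handled above, so this covers exactly
                // [#x20-#xD7FF] and [#xE000-#xFFFD] plus tab, LF and CR.
                sb.Append(ch);
            }
            // Anything else (other C0 controls, U+FFFE, U+FFFF) is dropped.
        }
        return sb.ToString();
    }
}
```

    Feeding both functions the same edge cases (a lone high or low surrogate, a reversed pair, a high surrogate at the end, U+FFFE/U+FFFF) and comparing the outputs is a cheap way to confirm they agree; for example, "\uD83D\uD83D\uDE00" should come back from both as the single valid pair "\uD83D\uDE00".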

    Thanks and best regards,
    Dominik

     
  • Rassilon
    2013-01-21

    Looks good to me.