
"bugs" decoding Non-ASCII headers

Stefano
2009-03-22
2013-04-26
  • Stefano

    Stefano - 2009-03-22

    I made a couple of simple bug fixes:

    1)
    <<
       An encoded-word may not be more than 75 characters long, including
       charset, encoding, encoded-text, and delimiters.  If it is desirable
       to encode more text than will fit in an encoded-word of 75
       characters, multiple encoded-words (separated by CRLF SPACE) may be
       used.
    >> (see: http://www.faqs.org/rfcs/rfc1522.html)

    When a header contains encoded text that spans multiple lines, the input passed to ImapDecode.Decode(string input) contains a SPACE (but no CRLF) between the encoded-words, and that space is not supposed to appear in the decoded string.

    Therefore I added the following lines of code:

                if (matches.Count > 1)
                    ret = ret.Replace("?= =?", "?==?");

    There is still a slight chance of "false positives" in case of a mixed encoded/unencoded header (I hardly found any) which happens to contain the "?= =?" substring in the unencoded text, but it is probably a risk we can take.

    The modified function is:

            internal static string Decode(string input)
            {
                if (input == "" || input == null)
                    return "";

                Regex regex = new Regex(@"=\?(?<Encoding>[^\?]+)\?(?<Method>[^\?]+)\?(?<Text>[^\?]+)\?=");
                MatchCollection matches = regex.Matches(input);

                string ret = input;

                //added lines
                if (matches.Count > 1)
                    ret = ret.Replace("?= =?", "?==?");

                foreach (Match match in matches)
                {
                    string encoding = match.Groups["Encoding"].Value;
                    string method = match.Groups["Method"].Value;
                    string text = match.Groups["Text"].Value;
                    string decoded;
                    if (method == "B")
                    {
                        byte[] bytes = Convert.FromBase64String(text);
                        Encoding enc = Encoding.GetEncoding(encoding);
                        decoded = enc.GetString(bytes);
                    }
                    else
                        decoded = Decode(text, Encoding.GetEncoding(encoding));
                    ret = ret.Replace(match.Groups[0].Value, decoded);
                }
                return ret;
            }
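
    The "false positive" risk mentioned above could be narrowed by removing whitespace only when a complete encoded-word sits on both sides of it. The following is a minimal sketch of that idea, not the library's code: the class name EncodedWordWhitespace and the method CollapseGaps are hypothetical, and the pattern assumes encoded-words contain no whitespace or extra "?" characters.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    static class EncodedWordWhitespace
    {
        // Hypothetical helper, not part of the library.
        // Matches whitespace (space, tab, CRLF) that has a complete encoded-word
        // "=?charset?encoding?text?=" immediately before AND immediately after it.
        private static readonly Regex Gap = new Regex(
            @"(?<==\?[^?\s]+\?[^?\s]+\?[^?\s]*\?=)\s+(?==\?[^?\s]+\?[^?\s]+\?[^?\s]*\?=)");

        internal static string CollapseGaps(string input)
        {
            return string.IsNullOrEmpty(input) ? "" : Gap.Replace(input, "");
        }

        static void Main()
        {
            // Two adjacent encoded-words: the separating space is removed.
            Console.WriteLine(CollapseGaps("=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?="));
            // Unencoded text that merely contains "?= =?" is left untouched.
            Console.WriteLine(CollapseGaps("plain ?= =? text"));
        }
    }
    ```

    Because the lookbehind and lookahead do not consume the encoded-words themselves, a run of three or more encoded-words is collapsed in a single pass.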

    2)
    <<
           The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
           represented as "_" (underscore, ASCII 95.).  (This character may
           not pass through some internetwork mail gateways, but its use
           will greatly enhance readability of "Q" encoded data with mail
           readers that do not support this encoding.)  Note that the "_"
           always represents hexadecimal 20, even if the SPACE character
           occupies a different code position in the character set in use.
    >> (see: http://www.faqs.org/rfcs/rfc1522.html)

    In this case, I simply replaced the '_' character with a space in ImapDecode.Decode(string input, Encoding enc):

            internal static string Decode(string input, Encoding enc)
            {
                if (input == "" || input == null)
                    return "";
                string decoded;
                byte[] bytes;
               
                //added line
                input = input.Replace("_", " ");
                MatchCollection matches = Regex.Matches(input, @"\=(?<num>[0-9A-Fa-f]{2})");// Substring(input.IndexOf('=') + 1, 2);

                foreach (Match match in matches) //while (input.Contains("="))
                {
                    //string ttr = Regex.Match("input", @"=(?<num>[0-9A-Fa-f]{2})").Groups[num].Substring(input.IndexOf('=') + 1, 2);
                    //int i = int.Parse(ttr, System.Globalization.NumberStyles.HexNumber);
                    int i = int.Parse(match.Groups["num"].Value, System.Globalization.NumberStyles.HexNumber);
                    char str = (char)i;
                    input = input.Replace(match.Groups[0].Value, str.ToString());
                }
                bytes = System.Text.Encoding.Default.GetBytes(input);
                decoded = enc.GetString(bytes);
                return decoded;
            }

    Great library!

    Ciao
    Stefano

     
    • Michal Ziemski

      Michal Ziemski - 2009-04-02

      The code will still fail to decode non-ASCII characters.
      The problem is that
      char str = (char)i;
      bytes = System.Text.Encoding.Default.GetBytes(input);
      will not always give back the original value "i" in bytes: the (char) cast treats the value as a Unicode code point, while GetBytes() converts through the default encoding, so the two conversions are not inverses of each other.
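
      The mismatch can be reproduced in isolation. This is only a sketch: the helper CastThenGetBytes is hypothetical, and the encoding is passed in explicitly as a stand-in for Encoding.Default, whose actual value depends on the machine and runtime.

      ```csharp
      using System;
      using System.Text;

      static class RoundTripDemo
      {
          // Hypothetical helper that mirrors the two steps quoted above.
          // "defaultEncoding" stands in for Encoding.Default (assumption).
          internal static byte[] CastThenGetBytes(byte value, Encoding defaultEncoding)
          {
              char str = (char)value;   // byte value reinterpreted as a Unicode code point
              return defaultEncoding.GetBytes(str.ToString());
          }

          static void Main()
          {
              // 0xE9 is "é" in ISO-8859-1. With UTF-8 as the default, one byte becomes two.
              Console.WriteLine(BitConverter.ToString(
                  CastThenGetBytes(0xE9, Encoding.UTF8)));                       // C3-A9

              // Only when the default happens to be ISO-8859-1 does the byte survive.
              Console.WriteLine(BitConverter.ToString(
                  CastThenGetBytes(0xE9, Encoding.GetEncoding("ISO-8859-1"))));  // E9
          }
      }
      ```

      Collecting the decoded values in a byte[] and calling GetString/GetChars once with the header's own charset avoids the intermediate string altogether.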

      I would write the procedure as follows:

              internal static string Decode(string input, Encoding enc)
              {
                  if (string.IsNullOrEmpty(input)) return string.Empty;

                  char[] chars = input.ToCharArray();
                  byte[] bytes = new byte[chars.Length];

                  int j = 0;
                  for (int i = 0; i < chars.Length; i++, j++)
                  {
                      if (chars[i] == '=')
                      {
                          i++;
                          if (chars.Length >= i + 2 &&
                              byte.TryParse(new string(chars, i, 2), System.Globalization.NumberStyles.HexNumber, System.Globalization.CultureInfo.InvariantCulture, out bytes[j]))
                              i++;
                          else
                              j--;
                      }
                      else if (chars[i] == '_')
                          bytes[j] = (byte)' ';
                      else
                          bytes[j] = (byte)chars[i];
                  }
                  return new string(enc.GetChars(bytes, 0, j));
              }

       
      • Stefano

        Stefano - 2009-04-03

        Well done, thank you.

         
    • Keith Kikta

      Keith Kikta - 2009-04-24

      Thanks, the changes you guys made should be in the current version.

       
