#65 ReportPhonetics crashes with German umlauts

2.4
closed-fixed
Nils Meier
Report (12)
5
2006-02-27
2006-02-26
Carsten Müssig
No

I don't know why, but ReportPhonetics crashes with
German umlauts; other special characters may be
affected as well. I saved my GEDCOM data in different
encodings (ANSEL, ASCII, Unicode), but it didn't change
anything. Although the Unicode version definitely saves
the umlauts ö, ä, ü in their original form, the
internal representation in GenJ seems to be different
and may cause this bug.

Discussion

  • Nils Meier
    2006-02-27

    • status: open --> pending
     
  • Nils Meier
    2006-02-27

    Logged In: YES
    user_id=118458

    Carsten,

    can you please elaborate on "crashes" a little bit? I ran
    the report here and it works fine. What kind of behaviour do
    you see (any error-message, stacktrace, etc.)?

    Thanks
    Nils

     
  • Nils Meier
    2006-02-27

    • milestone: --> 2.4
     
    • status: pending --> open
     
  • Logged In: YES
    user_id=871965

    Hi Nils,

    sorry, my fault. I should have been more precise. Here is
    additional information:

    a)
    The report only crashes when the Soundex code is chosen.

    b)
    The string causing the crash is "1 NAME Kurt /Brèuggemann/".
    This example is ANSEL-encoded, but GenJ also crashes when it
    is saved in Unicode. The encoding seems to have no effect.

    c)
    The error message is as follows (please note: there is also
    a point d) further down!)
    java.lang.ArrayIndexOutOfBoundsException: 252
    at ReportPhonetics$Soundex.substituteAccents(Unknown Source)
    at ReportPhonetics$Soundex.encode(Unknown Source)
    at ReportPhonetics.encode(Unknown Source)
    at ReportPhonetics.printPhonetic(Unknown Source)
    at ReportPhonetics.start(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at genj.report.Report.start(Unknown Source)
    at genj.report.ReportView$ActionStart.execute(Unknown Source)
    at genj.util.swing.Action2$CallAsyncExecute.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

    d)
    I added some debug code to
    ReportPhonetics.Soundex.substituteAccents() and modified it
    as follows (the lines I added begin with !!):

    public String substituteAccents(String str) {
    !!  System.out.println(str);
        StringBuffer result = new StringBuffer(str.length() * 2);
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
    !!      System.out.println(c);
            String substitute = ACCENTS[c];
            if (substitute != null) result.append(substitute);
            else result.append(c);
        }
        return result.toString();
    }

    It came to my attention that the String GenJ is trying to
    process is "Br?ggemann::". For some reason, even with
    Unicode, 'ü' is replaced by a '?'.
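
A note on the numbers above: index 252 in the exception is the code point of 'ü' (U+00FC), so the character reaching ACCENTS[c] appears to be 'ü' itself, and the '?' in the debug output may simply be how the console rendered it. The exception is consistent with the ACCENTS table having fewer than 253 entries. The following is a minimal sketch of a bounds-checked lookup, not the fix that was actually applied. The class name, the table sizes, and the mapping ü -> "ue" are assumptions made for illustration.

```java
public class AccentLookup {

    // Same loop as substituteAccents() in the report, but the table is passed
    // in and the index is checked before the array access.
    static String substituteAccents(String str, String[] accents) {
        StringBuilder result = new StringBuilder(str.length() * 2);
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            // Characters beyond the end of the table are treated as "no mapping".
            String substitute = c < accents.length ? accents[c] : null;
            if (substitute != null) result.append(substitute);
            else result.append(c);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // \u00fc is 'ü'; the escape avoids depending on the source file encoding.
        String name = "Br\u00fcggemann";

        // Hypothetical table that is too small, as a failed initializer might leave it.
        String[] tooSmall = new String[128];

        // Hypothetical table covering Latin-1 with one assumed mapping.
        String[] latin1 = new String[256];
        latin1[0xFC] = "ue";

        System.out.println((int) name.charAt(2));             // 252
        System.out.println(substituteAccents(name, tooSmall)); // unchanged, no exception
        System.out.println(substituteAccents(name, latin1));   // Brueggemann
    }
}
```

With the guard in place an incomplete table no longer throws, although the name is then passed on without its accent substitution, so the table still has to be filled correctly for Soundex to give the intended result.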

    Conclusion / Ideas:

    a)
    The problem is caused by the internal String representation
    during GenJ's runtime.

    b)
    The ACCENTS array is filled with the values in
    ReportPhonetics.properties. Maybe a new version of this file
    was uploaded containing faulty data.

    c)
    Or is it because last names start with a backslash?
    In Unicode, backslashes are used to start an escape sequence.

     
  • Nils Meier
    2006-02-27

    • status: open --> closed-fixed
     
  • Nils Meier
    2006-02-27

    Logged In: YES
    user_id=118458

    thanks Carsten, it's b) - we switched to UTF-8 encoding of
    the report properties files and that broke the initializer
    that reads umlaut/accent mappings from it.

    Fixed
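
The ticket does not show how the initializer was fixed. As a general illustration of how a switch to UTF-8 can break such a reader: java.util.Properties.load(InputStream) decodes bytes as ISO-8859-1, so a UTF-8 encoded 'ü' (two bytes) arrives as two characters instead of one. The sketch below demonstrates this with an in-memory file. The key format "accent.ü" and the value "ue" are hypothetical and are not taken from ReportPhonetics.properties. Properties.load(Reader) requires Java 6 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {

    // Loads the bytes the way Properties.load(InputStream) does: as ISO-8859-1.
    static Properties loadAsLatin1(byte[] data) throws IOException {
        Properties p = new Properties();
        p.load(new ByteArrayInputStream(data));
        return p;
    }

    // Loads the same bytes through a Reader that decodes them as UTF-8.
    static Properties loadAsUtf8(byte[] data) throws IOException {
        Properties p = new Properties();
        p.load(new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
        return p;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical one-line properties file, stored as UTF-8 (\u00fc is 'ü').
        byte[] data = "accent.\u00fc=ue\n".getBytes(StandardCharsets.UTF_8);

        Properties latin1 = loadAsLatin1(data);
        Properties utf8 = loadAsUtf8(data);

        // Decoded as ISO-8859-1, the key contains the two characters U+00C3 U+00BC,
        // so a lookup by the single character 'ü' finds nothing.
        System.out.println(latin1.getProperty("accent.\u00fc"));       // null
        System.out.println(latin1.getProperty("accent.\u00c3\u00bc")); // ue

        // Decoded as UTF-8, the key is the single character 'ü' as intended.
        System.out.println(utf8.getProperty("accent.\u00fc"));         // ue
    }
}
```

An initializer that expects each key to end in exactly one accent character would see unexpected keys in the first case, which may be the kind of breakage described above.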