Why JOrtho is not suitable

2009-04-11
2012-10-07
  • Hello,

    I am localizing FreeMind into Slovenian and found this project while browsing. I saw that spell-check support based on JOrtho will be added in the future. Having spell-check support is great, but ...

    The JOrtho project uses Wiktionary word lists. That works well for English, which has no grammatical cases. But most European languages do use cases, which means that not only the base form of a word is valid: a number of inflected forms, some of which look quite different from the base form, are "legal" as well. A Wiktionary-based word list will therefore accept only the first case as correct.

    Example: in Slovenian, the word for bread is "kruh". But that is the first case only. Here are the legal forms for all cases and all plural forms: kruh, kruha, kruhu, kruhom. Wiktionary, however, contains only "kruh". In English it is always just "bread", whatever its position in the sentence.
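To make the problem concrete, here is a minimal sketch of an exact-match lookup against a list that holds only base forms. It is not JOrtho code; the class name `FlatWordListDemo` and the method `isAccepted` are made up for illustration, and the one-word dictionary is just the example from above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a flat word list with base forms, not JOrtho's real data structure.
class FlatWordListDemo {
    // Hypothetical dictionary that contains only the base form, as in the example above.
    static final Set<String> BASE_FORMS = new HashSet<>(Arrays.asList("kruh"));

    // A word is accepted only if it appears verbatim in the list.
    static boolean isAccepted(String word) {
        return BASE_FORMS.contains(word);
    }

    public static void main(String[] args) {
        List<String> forms = Arrays.asList("kruh", "kruha", "kruhu", "kruhom");
        for (String form : forms) {
            String verdict = isAccepted(form) ? "ok" : "flagged as misspelled";
            System.out.println(form + " -> " + verdict);
        }
    }
}
```

Only "kruh" passes; the other three forms are valid Slovenian but would be flagged.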

    I would rather see spell-checking based on the OpenOffice.org dictionaries, which support all word forms and can correctly spell-check almost any language, as long as the word lists are representative enough.
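Roughly speaking, such dictionaries store a stem together with rules that generate its other forms. The following is a heavily simplified sketch of that idea, assuming plain suffix concatenation; it is not real Hunspell .aff syntax or logic, the suffix list is hand-written from the "kruh" example, and the names `SuffixExpansionDemo` and `expand` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustration only: expands one stem with a fixed list of suffixes.
class SuffixExpansionDemo {
    // Returns the stem combined with every suffix; the empty suffix yields the base form.
    static SortedSet<String> expand(String stem, List<String> suffixes) {
        SortedSet<String> forms = new TreeSet<>();
        for (String suffix : suffixes) {
            forms.add(stem + suffix);
        }
        return forms;
    }

    public static void main(String[] args) {
        // Assumed suffixes, taken from the forms listed in the example above.
        List<String> suffixes = Arrays.asList("", "a", "u", "om");
        for (String form : expand("kruh", suffixes)) {
            System.out.println(form);
        }
    }
}
```

With one stem and four suffixes the checker would know all four forms, without each one having to be listed separately.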

    Or at least adapt JOrtho to use the aff files and logic of the OpenOffice.org/Hunspell project. Which leads me to the final question, which is also a solution ...

    Why don't you use http://dren.dk/hunspell.html?

    That way it will work for all western and probably eastern languages of this world!

    Best regards + you're welcome ;)
    Martin

     
    • Hello,

      I use JOrtho because it is written in Java.

      I found out that aspell (http://aspell.net/) dictionaries can easily be converted for use with JOrtho. This means that dictionaries for a great many languages are already available under a GNU license.

      http://aspell.net/man-html/Supported.html

      The procedure is based on this blog post:

      http://typethinker.blogspot.com/2008/02/fun-with-aspell-word-lists.html

      1. Install aspell with the dictionaries you want to convert
      2. Call it as follows:

      aspell -l <language> dump master | aspell -l <language> expand | tr ' ' '\n' > dict.txt

      where <language> is the ISO 639 language code (see man aspell for details)

      for example, for Dutch you could use the command

      aspell -l nl dump master | aspell -l nl expand | tr ' ' '\n' > dict.txt

      The generated file uses your system's default encoding, probably UTF-8.

      3. Use the following Java program to generate the JOrtho dictionary file from dict.txt,

      calling it with the arguments <language> <input file name> <input file encoding>

      for example

      java Text2Book nl dict.txt UTF-8

      /*
       * Copyright (C) 2009 Dimitry Polivaev
       *
       * This program is free software: you can redistribute it and/or modify
       * it under the terms of the GNU General Public License as published by
       * the Free Software Foundation, either version 2 of the License, or
       * (at your option) any later version.
       *
       * This program is distributed in the hope that it will be useful,
       * but WITHOUT ANY WARRANTY; without even the implied warranty of
       * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
       * GNU General Public License for more details.
       *
       * You should have received a copy of the GNU General Public License
       * along with this program. If not, see <http://www.gnu.org/licenses/>.
       */
      package com.inet.jorthodictionaries;

      import java.io.BufferedOutputStream;
      import java.io.BufferedReader;
      import java.io.File;
      import java.io.FileInputStream;
      import java.io.FileOutputStream;
      import java.io.InputStreamReader;
      import java.io.OutputStream;
      import java.io.PrintStream;
      import java.util.SortedSet;
      import java.util.TreeSet;
      import java.util.zip.Deflater;
      import java.util.zip.DeflaterOutputStream;

      /**
       * @author Dimitry Polivaev
       * Mar 15, 2009
       */
      public class Text2Book {
          private String[] words;

          /**
           * @param args language, input file name, input file encoding
           * @throws Exception
           */
          public static void main(String[] args) throws Exception {
              if (args.length != 3) {
                  System.err.println("usage: java Text2Book <language> <input file name> <input file encoding>");
                  return; // do not continue with missing arguments
              }
              File in = new File(args[1]);
              String encoding = args[2];
              final Text2Book text2Book = new Text2Book();
              text2Book.loadWords(in, encoding);
              text2Book.save(args[0]);
          }

          /** Reads the input file and collects its words, sorted and without duplicates. */
          private void loadWords(File in, String encoding) throws Exception {
              final FileInputStream fileInputStream = new FileInputStream(in);
              BufferedReader reader = new BufferedReader(new InputStreamReader(fileInputStream, encoding));
              SortedSet<String> wordList = new TreeSet<String>();
              try {
                  for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                      if (line.equals("")) {
                          continue;
                      }
                      // Split the line on '/' and add every part as a word.
                      final String[] parts = line.split("/");
                      for (String word : parts) {
                          wordList.add(word);
                      }
                  }
              } finally {
                  reader.close();
              }
              words = wordList.toArray(new String[wordList.size()]);
          }

          /** Writes the words, one per line, as a deflate-compressed UTF-8 file. */
          void save(String language) throws Exception {
              File dictFile = new File("dictionary_" + language + ".ortho");
              OutputStream dict = new FileOutputStream(dictFile);
              dict = new BufferedOutputStream(dict);
              Deflater deflater = new Deflater();
              deflater.setLevel(Deflater.BEST_COMPRESSION);
              dict = new DeflaterOutputStream(dict, deflater);
              dict = new BufferedOutputStream(dict);
              PrintStream dictPs = new PrintStream(dict, false, "UTF8");

              // Save as word list
              for (int i = 0; i < words.length; i++) {
                  dictPs.print(words[i] + '\n');
              }
              dictPs.close();
              System.out.println("Dictionary size on disk (bytes):" + dictFile.length());
          }
      }
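To check the generated file, it can be read back by reversing what save() above does: inflate the stream and read UTF-8 lines. This is only a sketch, assuming the file is exactly what Text2Book writes (deflate-compressed, one word per line); the class name `OrthoDump` and the method `readWords` are hypothetical and not part of JOrtho.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.InflaterInputStream;

// Illustration only: reads back a file written by Text2Book.save().
class OrthoDump {
    // Decompresses the file and returns its non-empty lines as words.
    static List<String> readWords(File dictFile) throws IOException {
        List<String> words = new ArrayList<>();
        try (InputStream in = new InflaterInputStream(new FileInputStream(dictFile));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OrthoDump <dictionary file>");
            return;
        }
        List<String> words = readWords(new File(args[0]));
        System.out.println("Number of words: " + words.size());
        // Print the first few words as a quick sanity check of the encoding.
        for (String word : words.subList(0, Math.min(5, words.size()))) {
            System.out.println(word);
        }
    }
}
```

For example, `java OrthoDump dictionary_nl.ortho` would print the word count and the first entries of the Dutch dictionary generated above.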

      Please write how it works for you.

      Dimitry Polivaev

       
    • Sorry, this is just too complicated for me and probably for other users as well. JOrtho requires word lists (Wiktionaries) of at least 50000 entries, and not many languages have Wiktionaries that large.

      The procedure for conversion you suggest is too complicated for ordinary users.

      Maybe this Java spellchecker would be better? I do not know what dictionaries it uses:
      http://jazzy.sourceforge.net/
      http://www.jroller.com/JamesGoodwill/entry/using_jazzy_the_java_open

      Regards, m.

       
      • We will offer JOrtho dictionaries for download once our web site goes online.

        Dimitry

         
      • Eric L.
        2009-06-09

        Hi Martin,

        I had already looked at Jazzy a while ago, and:
        1. it seems pretty much dead (last sign of life in 2005)
        2. the only available dictionary is English...

        So, there are not too many alternatives,
        Eric