Converting aspell dictionaries to jortho

2009-03-16
2013-05-08
  • Dimitry Polivaev

    I found out that aspell (http://aspell.net/) dictionaries can easily be converted for use with jortho. This means that dictionaries for a great many languages are already available under a GNU license.

    http://aspell.net/man-html/Supported.html

    This procedure is based on the following blog post:

    http://typethinker.blogspot.com/2008/02/fun-with-aspell-word-lists.html

    1. Install aspell together with the dictionaries you want to convert.
    2. Run it as follows:

    aspell -l <language> dump master | aspell -l <language> expand | tr ' ' '\n' > dict.txt

    where <language> is the ISO 639 language code (see man aspell for details).

    For example, for Dutch you could use the command

    aspell -l nl dump master | aspell -l nl expand | tr ' ' '\n' > dict.txt

    The generated file uses your system's default encoding, probably UTF-8.

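    3. Use the following Java program to generate the jortho dictionary file from dict.txt.

    Call it with the arguments <language> <input file name> <input file encoding>, for example

    java Text2Book nl dict.txt UTF-8

    Note that the class declares the package com.inet.jorthodictionaries. If you keep that line, compile the class into the matching directory and run it by its fully qualified name, com.inet.jorthodictionaries.Text2Book; otherwise remove the package line.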

    /*
    *  Copyright (C) 2009 Dimitry Polivaev
    *
    *  This program is free software: you can redistribute it and/or modify
    *  it under the terms of the GNU General Public License as published by
    *  the Free Software Foundation, either version 2 of the License, or
    *  (at your option) any later version.
    *
    *  This program is distributed in the hope that it will be useful,
    *  but WITHOUT ANY WARRANTY; without even the implied warranty of
    *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    *  GNU General Public License for more details.
    *
    *  You should have received a copy of the GNU General Public License
    *  along with this program.  If not, see <http://www.gnu.org/licenses/>.
    */
    package com.inet.jorthodictionaries;

    import java.io.BufferedOutputStream;
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.io.PrintStream;
    import java.util.SortedSet;
    import java.util.TreeSet;
    import java.util.zip.Deflater;
    import java.util.zip.DeflaterOutputStream;

    /**
    * @author Dimitry Polivaev
    * Mar 15, 2009
    */
    public class Text2Book {
        private String[] words;
        /**
         * @param args
         * @throws Exception
         */
        public static void main(String[] args) throws Exception {
            if (args.length != 3) {
                System.err.println("usage: java Text2Book <language> <input file name> <input file encoding>");
                // Stop here; continuing would fail on the missing arguments.
                System.exit(1);
            }
            File in = new File(args[1]);
            String encoding = args[2];
            final Text2Book text2Book = new Text2Book();
            text2Book.loadWords(in, encoding);
            text2Book.save(args[0]);
        }
        private void loadWords(File in, String encoding) throws Exception {
            // A sorted set drops duplicates and keeps the words in sorted order.
            SortedSet<String> wordList = new TreeSet<String>();
            // try-with-resources closes the reader (and the underlying file) when done.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(in), encoding))) {
                for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                    if (line.equals("")) {
                        continue;
                    }
                    // Split on '/' and keep every part as a separate entry.
                    for (String word : line.split("/")) {
                        wordList.add(word);
                    }
                }
            }
            words = wordList.toArray(new String[wordList.size()]);
        }
        void save(String language) throws Exception {
            File dictFile = new File("dictionary_" + language + ".ortho");
            Deflater deflater = new Deflater();
            deflater.setLevel(Deflater.BEST_COMPRESSION);
            OutputStream dict = new BufferedOutputStream(new FileOutputStream(dictFile));
            dict = new BufferedOutputStream(new DeflaterOutputStream(dict, deflater));
            PrintStream dictPs = new PrintStream(dict, false, "UTF-8");
            try {
                // Save as a word list, one word per line
                for (int i = 0; i < words.length; i++) {
                    dictPs.print(words[i] + '\n');
                }
            } finally {
                // Closing the PrintStream flushes and closes the whole stream chain.
                dictPs.close();
                deflater.end();
            }
            System.out.println("Dictionary size on disk (bytes): " + dictFile.length());
        }
    }
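
    To check the result, the generated file can be read back. The following is only a minimal sketch, assuming the file is exactly what Text2Book.save writes above: a deflate-compressed, UTF-8 encoded list with one word per line. The class name OrthoDump and the method readWords are made up for this illustration and are not part of jortho.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper (not part of jortho): reads back a file written by Text2Book. */
class OrthoDump {

    /** Returns at most maxWords words from a deflate-compressed UTF-8 word list. */
    static List<String> readWords(Path dictFile, int maxWords) throws IOException {
        List<String> result = new ArrayList<>();
        // InflaterInputStream is the counterpart of the DeflaterOutputStream used for writing.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new InflaterInputStream(Files.newInputStream(dictFile)),
                StandardCharsets.UTF_8))) {
            String line;
            while (result.size() < maxWords && (line = reader.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OrthoDump <dictionary file>");
            System.exit(1);
        }
        // Print the first 20 words as a quick sanity check.
        for (String word : readWords(Paths.get(args[0]), 20)) {
            System.out.println(word);
        }
    }
}
```

    For the Dutch example you would call it as java OrthoDump dictionary_nl.ortho. If the words come out garbled, the encoding passed to Text2Book probably did not match the encoding of dict.txt.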

    Dimitry Polivaev

     
    • i-net software

      i-net software - 2009-03-17

      Thanks for the great tip.

      Volker Berlin

       
