Menu

Tree [r2187] /
 History

HTTPS access


File Date Author Commit
 branches 2013-09-14 janschreiber [r278] Branching smaller version.
 tools 2021-02-25 janschreiber [r2003] toAdd.txt: 1140 new words
 Binnen-I.txt 2012-03-10 janschreiber [r161] + Binnen-I.txt
 Linux.txt 2021-01-25 janschreiber [r1963] Try to remove trailing whitespace
 Quellen und Methoden.txt 2016-09-01 janschreiber [r604] Hinzugefügt Quellen und Methoden.txt
 Riesig.1-1-0.dic 2014-04-19 janschreiber [r312] Minor cleanup.
 add-to-hunspell.txt 2017-01-22 janschreiber [r775] + 2,500 word forms
 austriazismen.txt 2017-10-31 janschreiber [r1118] toAdd.txt: now almost 1000 word forms
 autocomplete.txt 2021-09-24 janschreiber [r2183] 1320 new word forms
 blacklist.txt 2019-04-26 janschreiber [r1449] Remove unwanted word forms, thanks to Ivan Panc...
 german.dic 2021-10-01 janschreiber [r2187] + 1,600 word forms
 german_r261.7z 2013-09-05 janschreiber [r264]
 german_r271.7z 2013-09-11 janschreiber [r272] Last minor update before adding 200,000 word fo...
 helvetismen.txt 2021-07-08 janschreiber [r2132] 800 new word forms
 hunspell_false_negatives.txt 2015-02-21 janschreiber [r382] ~ 2,000 new word forms.
 hunspell_words.txt 2021-10-01 janschreiber [r2187] + 1,600 word forms
 philo.dic 2012-03-24 janschreiber [r175] A few minor additions.
 readme.txt 2021-05-01 janschreiber [r2078] toAdd.txt: 870 new words
 recent.txt 2021-10-01 janschreiber [r2187] + 1,600 word forms
 similar_words.txt 2020-05-25 janschreiber [r1730] similar word
 toAdd.txt 2021-10-01 janschreiber [r2187] + 1,600 word forms
 toRemove.txt 2020-04-08 janschreiber [r1684] + 400 word forms, remove duplicates
 uppercase_candidates.txt 2018-08-22 janschreiber [r1278] toAdd.txt: now more than 230 word forms
 variants.dic 2019-09-08 janschreiber [r1488] remove some invalid forms, thanks to Ivan Panch...
 words_to_add.txt 2021-10-01 janschreiber [r2187] + 1,600 word forms

Read Me

Free German Dictionary Readme
=============================

What is it?
¯¯¯¯¯¯¯¯¯¯¯
That's easy enough to answer: A list of German words as plain text file
with slightly more than two million entries (including inflected forms).
The format is one word per line, alphabetically ordered, ANSI-encoded 
(Latin-1), with Windows line endings (CR-LF).
It is mainly distributed as a 7-Zip archive (german.7z).
Other downloads available include binary Aspell dictionaries, for both 
German and Swiss spelling (aspell_dict.zip), and a version that works
with the free editor PSPad (PSPad.7z).


How was it made?
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
The short answer is: basically by running Hunspell on a huge corpus. For 
a somewhat longer version, please refer to "Quellen und Methoden.txt" 
(in German) for now. I'm in the process of writing a Medium 
article with more details on how I made this.


What is it good for?
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
That's another easy question. It can be used for all kinds of purposes,
including dictionary attacks on weak passwords, but also autocompletion, 
word games, and such. Its chief purpose, however, is to be used as the 
main dictionary for the free command-line based spell checker GNU Aspell. 
That's why most entries were, and all forthcoming entries will be, very 
carefully spell-checked.


GNU Aspell? Why would I want to use that instead of Hunspell?
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Most modern spell-checking software uses advanced algorithms that allow
for compound words, such as the (in)famous "Donaudampfschifffahrts-
gesellschaftskapitänsmützenabzeichen." While this is undeniably an
advantage over the list-based approach taken by Aspell, the downside
of it is that a large number of misspellings will not be detected unless
they are expressly blacklisted. "Vorgesetze," "Währungsfond," "National-
soziallisten," "Uhrlaubantrag" are cases in point. Most spell checkers do
not recognize those as errors, but Aspell does. On the minus side, there
will inevitably be plenty of false positives, even with a huge dictionary
like this one.


How to use the word list as input for Aspell
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Obviously, you'll need GNU Aspell as a prerequisite. Get it from
http://aspell.net

Most Windows users will want to download the binary Win32 port
and the precompiled German dictionaries by Björn Jacke, available from
http://aspell.net/win32/

Install both to a directory of your choice, in an MS Windows environment
preferably something like %PROGRAMFILES%\aspell\

After completing the installation process, navigate to your aspell\bin
directory in a command line and type:

	aspell --lang=de create master ./de-only.rws < path\to\german.dic

This will create the file de-only.rws, which you should move to the
aspell\dict folder, replacing the one that comes with Björn's German
language pack for Aspell.

If you want a more tolerant check, get the file
http://sourceforge.net/p/germandict/code/HEAD/tree/variants.dic
and merge it into the word list before compiling. Be warned that this
file contains some words that aren't correct according to Duden, though.

Rather than compiling the dictionary for yourself, you may want to 
download my precompiled Aspell dictionary files provided at
http://sourceforge.net/projects/germandict/files/aspell_dict_bin_windows.zip
(compiled with Aspell 0.50.3 on 32-bit Windows).
Just copy the contents of the archive to the dict subfolder of your
Aspell installation.


Looks like a lot of tedious work. Can I help?
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Yes. The Austrian and Swiss word lists are quite poor and need 
someone to look after them, for one thing. Also, you could help 
by proofreading various lists with tens of thousands of words. 
Contact me via the forum or janschreiber (at) users.sf.net.
And I never say no to a user dictionary sent via email.


Credits
¯¯¯¯¯¯¯
This project is a dwarf on the shoulders of giants. Thanks to the following 
people who made it possible:
- Klaus Reimann (for creating a huge word list)
- Ivan Panchenko and Werner Lemberg (for pointing out errors)
- Kevin Atkinson (GNU Aspell)
- László Németh (Hunspell)
- Björn Jacke (igerman98 dictionaries)
- Franz Michael Baumann (extended "DE-frami" Hunspell dictionary)
- Marcin Miłkowski and Daniel Naber (LanguageTool)
- Wolfgang Lezius (Morphy)
- Jan Fiala (PSPad)
- Don Ho (Notepad++)
- Jens Lorenz (old spell checker for Notepad++)
- Sergey Semushin (newer DSpellCheck plugin for Notepad++)
- Kim Haskell and Denis G. Sureau (Dictionary maker tools)
- Matthias Hüning (TextSTAT)
- Jimbo Wales and the Wikipedia community (Wikipedia)
- various OpenThesaurus.de contributors (old Joe and Synonymfresser in particular)
- canoo.net (online dictionary)
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.