Re: [ciphertool-devel] more ideas for standards
From: Wart <wa...@ko...> - 2004-03-03 17:19:47
On Wed, 2004-03-03 at 07:49, Alex Griffing wrote:
> Hi,
>
> >I think we should continue to use the above text, without the
> >legalese at the top of the file...
> >
>
> OK. I took off everything before the line 'Frankenstein, or the Modern
> Prometheus', and I removed everything after the line 'lost in darkness
> and distance.'
> The new MD5 is: 8306191547cfd7ca2d04a0d7604fc852
>
> The main thing I'm trying to do is provide some standards for comparing
> search and scoring techniques. The standards I'm proposing aren't even
> necessarily for inclusion and distribution with the end-user ciphertools
> package, but rather for researching better methods. Also, people from
> outside the project might be interested in comparing their techniques
> and this would make sure everyone is on the same page.

Setting up a fixed standard for comparing methods is a good idea, but I
think we do need to take it a step further and publish those standards
so that others can reproduce the results, either with ciphertool or
their own software. That's why I think we should make an add-on package
that includes only these standard wordlists and n-gram source texts.

> >The word list that I've been using for my personal solving is version 15
> >of the UKACD:
> >http://www.ori.org/~kenl/projects/wordlist/UKACD-readme.htm
> >I've made some slight modifications, adding a few new words and removing
> >a few uncommon others.
> >
> I've read a bit more about the SCOWL word lists:
> http://wordlist.sourceforge.net/scowl-readme
> UKACD is apparently included in the SCOWL 'size 80' list and higher.
> Also, according to this site:
> http://bryson.ltd.uk/wordlist.html
> UK Advanced Cryptics Dictionary has been superseded by the Edited
> English word list provided as part of TEA Crossword Helper
> <http://bryson.ltd.uk/tea.html> and Sympathy
> <http://bryson.ltd.uk/sympathy.html>.
>
> However, I think this new list isn't public.

I took a closer look at the SCOWL word lists. They look like they'd be
a more complete source than my modified ukacd list.

> >The nice thing about this word list is that it
> >includes plurals and all of the various verb conjugations.
> >
> These inflected forms are used in the SCOWL word lists as well.
>
> I suggest that we standardize on either a frozen version of your list or
> the SCOWL size 80 list. Ultimately though, I think it would be best to
> use a standard wordlist from one of the maintained public lists here:
> http://wordlist.sourceforge.net/
> Again, this would be the list for comparing techniques, not necessarily
> the most optimized for a certain method or ciphertext.

Sounds good then. I still think this wordlist should be included as an
extra download for ciphertool. If you can create a zip file with the
Frankenstein text, the SCOWL size 80 word list, and a README with any
courteous source acknowledgements, then we can put it up on the
ciphertool download page. I just added a "massadd.tcl" script to the
ciphertool CVS repository that can be used to convert a large wordlist
into a ciphertool-ready dictionary.

--Wart
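
For anyone trying to reproduce the trimmed text, the step Alex describes
(keep everything from the 'Frankenstein, or the Modern Prometheus' line
through the 'lost in darkness and distance.' line, then take the MD5)
might look roughly like the Python sketch below. It is not the procedure
that was actually used. The function names are hypothetical, the marker
strings are copied from the message, and a given copy of the text may wrap
or punctuate those lines differently, in which case the markers need
adjusting. The digest also depends on exact bytes (line endings, trailing
newline), so a mismatch with the MD5 quoted above does not by itself mean
the text differs in content.

```python
import hashlib

# Marker strings as quoted in the message; assumed to appear verbatim
# on a single line of the source file, which may not hold for every copy.
START_MARKER = b"Frankenstein, or the Modern Prometheus"
END_MARKER = b"lost in darkness and distance."


def trim_between_markers(data: bytes) -> bytes:
    """Return the lines from the first START_MARKER line through the
    last END_MARKER line, inclusive, with line endings left untouched."""
    lines = data.splitlines(keepends=True)

    # First line that contains the start marker.
    start = next((i for i, ln in enumerate(lines) if START_MARKER in ln), None)
    if start is None:
        raise ValueError("start marker not found")

    # Last line at or after `start` that contains the end marker.
    end = next(
        (i for i in range(len(lines) - 1, start - 1, -1) if END_MARKER in lines[i]),
        None,
    )
    if end is None:
        raise ValueError("end marker not found after start marker")

    return b"".join(lines[start:end + 1])


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()
```

A typical use would be to read the downloaded file in binary mode, pass
the bytes through trim_between_markers(), and compare md5_hex() of the
result against the digest posted to the list.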