ciphertool-devel Mailing List for Tcl ciphertool (Page 3)
Status: Beta
Brought to you by: wart
From: Wart <wa...@ko...> - 2004-03-24 22:00:54
On Mon, 2004-03-22 at 13:53, Alex Griffing wrote:
> > I think this is enough of an improvement to warrant a new release.
>
> I'm willing to help with the windows version if necessary.

That would be really helpful. The only thing I think we should add to the Windows installer is an informational pane that tells the user that they must install Tcl first, with a link to the ActiveTcl download page.

I'm also looking at providing a Debian ".deb" package for ciphertool. I installed Debian Linux on a spare hard drive to test this out.

> > Any takers on providing some of this ciphertext? To
> > make it easy, we could probably just use the same plaintext string
> > encrypted with different cipher algorithms.
>
> I've included a list of single-line messages that I've been using for
> monoalphabetic cryptanalysis testing. These are supposed to be short,
> not maliciously constructed, content-neutral, and in the public domain.
> Maybe they could be of use somewhere in ciphertool. Longer text items
> could be needed for ciphertext-only analysis of messages encoded by
> stronger methods. In any case I'm fine with the inclusion of ciphertext
> not holding up the ciphertool release.

Thanks. I'll pick a long one from your list and encrypt it using various techniques (aristocrat, bifid, vigenere) for the ciphertool samples. I'd rather include the ciphertext, just so that users have a starting point when they first use the GUI. The included ciphertext doesn't necessarily have to be the same as the "standard reference ciphertext" that we will want later on.

> I'm also interested in finding a longer list of public domain quotes or
> sayings like these. I couldn't find any in project gutenberg, and the
> other sources I found were non-public-domain collections. I created the
> short list here mostly from various proverbs.

I probably won't get to work on the release until this weekend. I've got some non-ciphertool work that has priority.

--Mike
From: Alex G. <za...@za...> - 2004-03-22 21:53:40
> I think this is enough of an improvement to warrant a new release.

I'm willing to help with the windows version if necessary.

> Any takers on providing some of this ciphertext? To
> make it easy, we could probably just use the same plaintext string
> encrypted with different cipher algorithms.

I've included a list of single-line messages that I've been using for monoalphabetic cryptanalysis testing. These are supposed to be short, not maliciously constructed, content-neutral, and in the public domain. Maybe they could be of use somewhere in ciphertool. Longer text items could be needed for ciphertext-only analysis of messages encoded by stronger methods. In any case I'm fine with the inclusion of ciphertext not holding up the ciphertool release.

I'm also interested in finding a longer list of public domain quotes or sayings like these. I couldn't find any in project gutenberg, and the other sources I found were non-public-domain collections. I created the short list here mostly from various proverbs.

- Alex

Always be a little kinder than necessary.
If you think education is expensive, try ignorance.
Time is a great teacher, but unfortunately it kills all its pupils.
It's tough to make predictions, especially about the future.
Failure is only the opportunity to begin again more intelligently.
Act as if it were impossible to fail.
It is better to be defeated on principle than to win on lies.
Genius is nothing but a great aptitude for patience.
Old age isn't so bad when you consider the alternative.
Any sufficiently advanced technology is indistinguishable from magic.
To be or not to be, that is the question.
I came, I saw, I conquered.
Now is the time for all good men to come to the aid of their country.
I think, therefore I am.
There are three kinds of lies: lies, damned lies, and statistics.
Give me a place to stand, and I will move the Earth.
Advice is least heeded when most needed.
After the game, the king and the pawn go into the same box.
Examine what is said, not him who speaks.
He who builds by the roadside has many surveyors.
From: Wart <wa...@ko...> - 2004-03-22 18:40:19
I think it's about time that we had another release of ciphertool. The new scoring code is pretty much complete now. The unit tests and my own autosolving experience have shown it to be pretty stable. I've also gone through and updated/added to much of the documentation. The old "stat digram", "stat trigram", and "stat tetragram" methods have also been removed in favor of the new scoring commands.

I think this is enough of an improvement to warrant a new release. I'd also suggest that we bump the minor version from 1.5 to 1.6, since this new scoring code is a pretty big change.

The only thing that still needs to be addressed before the new release is the addition of some sample ciphers, like Alex suggested in an earlier message. Any takers on providing some of this ciphertext? To make it easy, we could probably just use the same plaintext string encrypted with different cipher algorithms. We could provide the ciphertext in simple text files for now, and use the next release to change the file format to something more portable.

Any takers?

--Wart
From: Wart <wa...@ko...> - 2004-03-19 19:51:52
Ok, that sounds good to me. I'll leave the 4-gram log table and remove the rest. The 4-gram log table will serve as an example of how to name the scoring tables so that they can be loaded automatically by ciphertool.

--Wart

On Wed, 2004-03-17 at 19:44, Alex Griffing wrote:
> My suggestion would be to include the minimum number of ngram statistic
> files necessary for the solvers to work to some degree. If someone
> wants to experiment further, they can generate all the other data files
> from the source frankenstein text by running a script that will be
> included with ciphertool. This seems like a good compromise between
> download speed and ease of use, and it covers future ngram file types.
> To be honest though, I have a fast enough connection that as long as the
> download is under about 10MB, it doesn't really matter to me.
>
> - Alex
From: Alex G. <za...@za...> - 2004-03-18 03:44:58
My suggestion would be to include the minimum number of ngram statistic files necessary for the solvers to work to some degree. If someone wants to experiment further, they can generate all the other data files from the source frankenstein text by running a script that will be included with ciphertool. This seems like a good compromise between download speed and ease of use, and it covers future ngram file types.

To be honest though, I have a fast enough connection that as long as the download is under about 10MB, it doesn't really matter to me.

- Alex
From: Wart <wa...@ko...> - 2004-03-18 03:15:15
I just checked a set of 4- and 5-gram data files into CVS. There are a total of 4 files: 4-gram raw frequency counts, 4-gram log frequencies, 5-gram raw frequency counts, and 5-gram log frequencies. The 4 files will double the size of the ciphertool download package, making the source archive just over 1 MB. Is it worth it? Or should I just include the 4-gram frequency logs and no others?

--Wart
From: Wart <wa...@ko...> - 2004-03-08 04:33:15
The CVS repository now contains the new pluggable scoring system.
Here's a short summary of the new commands:
score create <type>
Creates a new scoring object. The type must be one of "digramlog",
"digramcount", "trigramlog", "trigramcount", "ngramlog", "ngramcount",
or "wordtree". The return value is the name of a new Tcl command that
will be used to store the scoring data.
$scoreObj add <element> <value>
Add values to the new scoring table using the new scoring command.
$scoreObj normalize
"Normalize" the data in the scoring table. This is not a real
mathematical normalization, but instead represents an operation that is
performed on the entire table. For example, the "normalize" subcommand
for the digramlog scoring table will take the natural logarithm of each
element in the scoring table. The "normalize" subcommand for the
2,3,n-gram count scoring tables does nothing.
$scoreObj value <element>
Get a value from the scoring table.
Sample usage:
set scoreObj [score create digramlog]
$scoreObj add er 100
$scoreObj add ab 20
$scoreObj normalize
set plaintextValue [$scoreObj value "mydoghasfleas"]
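For readers who want to see the add/normalize/value cycle without building the C extension, here is a rough pure-Tcl model of a digramlog table. It assumes a table is just a dict mapping digrams to numbers, that normalize takes the natural log of every entry (as described above), and that value sums the entries for each adjacent pair of characters, with unknown pairs contributing 0. The digramlog:: procs are hypothetical names, not part of ciphertool.

```tcl
# Hypothetical pure-Tcl model of a "digramlog" scoring table; ciphertool's
# real tables are implemented in C and may differ in detail.
# A table is a plain dict mapping two-character strings to numbers.
namespace eval digramlog {
    # Add (or overwrite) the raw value for one digram.
    proc add {tableVar element value} {
        upvar 1 $tableVar table
        dict set table $element $value
        return $element
    }

    # "Normalize": replace every stored count with its natural log.
    proc normalize {tableVar} {
        upvar 1 $tableVar table
        dict for {element count} $table {
            dict set table $element [expr {log($count)}]
        }
    }

    # Sum the table values of every adjacent character pair in the text.
    # Pairs missing from the table contribute nothing.
    proc value {table text} {
        set total 0.0
        set last [expr {[string length $text] - 2}]
        for {set i 0} {$i <= $last} {incr i} {
            set pair [string range $text $i [expr {$i + 1}]]
            if {[dict exists $table $pair]} {
                set total [expr {$total + [dict get $table $pair]}]
            }
        }
        return $total
    }
}

# Mirrors the sample usage above, scoring a string containing "ab" and "er".
set table [dict create]
digramlog::add table er 100
digramlog::add table ab 20
digramlog::normalize table
puts [digramlog::value $table "aber"]
```

Here "aber" contains the digrams ab, be, and er, so the result is ln(20) + ln(100), roughly 7.60.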
There are also 3 new Tcl procedures for loading and saving scoring
tables:
Scoredata::generate <scoreObj> <filename>
Generate scoring data from a text file.
Scoredata::saveData <scoreObj> <filename>
Save a scoring table to a file so that it can be loaded at a later
time.
Scoredata::loadData <scoreObj> <filename>
Load a scoring table that was previously saved. If no filename is
given then an attempt is made to locate an appropriate default scoring
table.
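A minimal sketch of what saving and loading a table could look like, assuming the table is held in a Tcl dict. The file layout (one element/value pair per line) and the saveTable/loadTable names are made up for illustration; they are not the format that Scoredata::saveData actually writes.

```tcl
# Hypothetical save/load helpers for a scoring table held in a Tcl dict.
# File format (an assumption, not Scoredata's): one "element value" Tcl
# list per line; blank lines and '#' comments are ignored on load.
proc saveTable {table filename} {
    set chan [open $filename w]
    dict for {element value} $table {
        puts $chan [list $element $value]
    }
    close $chan
}

proc loadTable {filename} {
    set table [dict create]
    set chan [open $filename r]
    while {[gets $chan line] >= 0} {
        set line [string trim $line]
        # Skip blank lines and comments.
        if {$line eq "" || [string index $line 0] eq "#"} {
            continue
        }
        lassign $line element value
        dict set table $element $value
    }
    close $chan
    return $table
}
```

Writing each pair with [list ...] keeps elements that contain spaces or braces intact when they are read back with lassign.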
One new program was also added to generate scoring tables from a text
file:
genscores -type <scoretype> -output <outfile> [-verbose] [-elemsize n]
[-nonormalize] [-validchars ...]
-type must be one of the known scoring types above.
-output is the name of the file where the table will be saved.
-elemsize must be used with the ngramlog and ngramcount types to
indicate the size of the ngrams.
-nonormalize skips the step of normalizing the data. By default all
tables are normalized before saving.
-validchars is the set of characters that are allowed in the scoring
table. By default this is a-z, but it can be any set of ascii
characters.
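The counting step that a tool like genscores has to perform can be sketched as follows. This is an illustration only: countNgrams is a hypothetical proc, it lowercases the input, and it drops every character outside the valid set before counting, so n-grams run across word breaks (consistent with a-z-only tables).

```tcl
# Hypothetical sketch of the counting step behind genscores: count every
# n-gram of length $elemsize after discarding characters that are not in
# $validchars (default a-z, matching the documented genscores default).
# Text is lowercased first, so $validchars is assumed to be lowercase.
proc countNgrams {text elemsize {validchars abcdefghijklmnopqrstuvwxyz}} {
    # Keep only valid characters; n-grams therefore span word breaks.
    set clean ""
    foreach ch [split [string tolower $text] ""] {
        if {[string first $ch $validchars] >= 0} {
            append clean $ch
        }
    }
    set counts [dict create]
    set last [expr {[string length $clean] - $elemsize}]
    for {set i 0} {$i <= $last} {incr i} {
        dict incr counts [string range $clean $i [expr {$i + $elemsize - 1}]]
    }
    return $counts
}
```

For example, "My dog, my dog!" with elemsize 2 reduces to "mydogmydog" and yields my, yd, do, and og twice each plus gm once. The raw counts are what a "normalize" step would then turn into logs.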
The default scoring tables were generated from the reference text
(frank14.txt) with these commands:
genscores -type digramlog -output digramlogData.tcl frank14.txt
genscores -type digramcount -output digramcountData.tcl frank14.txt
genscores -type trigramlog -output trigramlogData.tcl frank14.txt
genscores -type trigramcount -output trigramcountData.tcl frank14.txt
I'll add html documentation over the next few days, including tips on
writing your own scoring routine.
--Wart
From: Wart <wa...@ko...> - 2004-03-05 00:45:57
Hi Alex,

I'm forwarding to the ciphertool-devel list since this is relevant to ciphertool development.

On Thu, 2004-03-04 at 16:13, Alex Griffing wrote:
> Hi Mike,
>
> >I had a concern about the characters that
> >should be allowed in the n-grams. Currently any ascii value is allowed
> >for the digrams and general n-grams, but only a-z is allowed for the
> >trigrams. I only allowed a-z for the trigrams to preserve space. If I
> >allow any ascii character for the trigrams, the space requirements go up
> >from 140k (26^3*8) to 134M (256^3*8) for a single trigram table.
> >However, the general n-gram data structure is better suited for sparse
> >matrices, so if you need full ascii support for trigrams, just use a
> >general n-gram with an element size of 3. I don't use the general
> >n-gram data structure for digrams because the digram lookups (simple
> >array lookups) are much much faster than the general n-gram lookup.
>
> Ahh, sparse general n-grams. Woo hoo! I don't know what data structure
> you're using, but I've tried both a tree structure and a hash
> structure. The tree structure might save a little memory, but hash
> structure is so much faster that it's not even a contest. In fact, I've
> empirically found that a hash lookup is about the same time as a table
> lookup. I'm imagining the cpu pipeline calculating the next hash while
> it waits for the current lookup to come back from the memory fetch, if
> that makes sense. If not, ignore me :) Also, you can do blazing fast
> hashes with 4, 5, and 6grams by packing [a..z]s into a 32 bit word
> before hashing. (|a..z| == 26 <= 32 == 5 bits. 6(gram)*5bits == 30
> bits <= 32 == computer word). /rambling

You're right, a hash table would have been faster. I already use a tree structure for storing the entire dictionary in memory, so I figured I would just reuse it for the general n-grams. I've found that the tree takes up about 1/3 the space of storing each word in its entirety.
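Alex's packing idea, sketched in Tcl: each letter a..z becomes a 5-bit code, so a 6-gram fits in 30 bits. The proc name packNgram is hypothetical, and the length check stands in for a fixed 32-bit word because Tcl integers do not overflow. The two puts lines just recompute the dense-table sizes quoted above (26^3 and 256^3 entries at 8 bytes each).

```tcl
# Sketch of the packing trick: map a..z to 0..25 (5 bits each) and shift
# letters into one integer. Six letters need 30 bits, so any 4-, 5- or
# 6-gram fits in a 32-bit word and can serve as a hash key.
# Keys only make sense among n-grams of the same length ("a" and "aa"
# both pack to 0), which is fine for a table holding one n-gram size.
proc packNgram {gram} {
    if {[string length $gram] > 6} {
        error "packNgram: more than 6 letters will not fit in 32 bits"
    }
    set key 0
    foreach ch [split $gram ""] {
        set code [expr {[scan $ch %c] - [scan a %c]}]
        if {$code < 0 || $code > 25} {
            error "packNgram: '$ch' is not in a-z"
        }
        set key [expr {($key << 5) | $code}]
    }
    return $key
}

# Dense trigram table sizes from the message, at 8 bytes per entry:
puts [expr {26**3 * 8}]    ;# 140608 bytes (~140k) for a-z
puts [expr {256**3 * 8}]   ;# 134217728 bytes (~134M) for all 256 byte values
```

For instance, "ab" packs to 1, "ba" to 32, and "zzzzzz", the largest 6-gram, still stays below 2^30.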
Maybe if I get motivated I'll try out a hash table for the n-grams and compare the memory usage. But since I just got n-grams working with the tree structure, I'm going to leave it alone for now.

> >However, the utility procedures that I wrote for populating the scoring
> >tables restrict the contents to a-z. I've always used only a-z in my
> >n-gram tables because I only ever use the n-grams for solving ciphers
> >that don't expose the word divisions. I'll modify it so that you can
> >specify a list of allowed characters for the n-grams, but default it to
> >"a-z" if you don't specify a character list.
>
> That makes sense, since you use wordlists instead for dealing with word
> divisions. I'm fine with just using a-z since it's simpler. In fact,
> it would be even better for scoring comparisons.

I've used n-grams for a few aristocrats, and the ragbaby, and I haven't noticed that the lack of word divisions in the n-gram tables has had any effect on the outcome. This is something that would be interesting to test.

> >I didn't write it so that it would "wrap" the corpus because the total
> >contribution from the wrapped portion is minuscule compared to the
> >contribution of the rest of the text.
>
> Right. Do you take the 'wrapped' ngrams of a potential *decryption* (as
> opposed to corpus) into account when scoring? I've noticed that if I
> forget to do this that I end up with a decryption that either starts or
> ends with a bunch of junk letters, since ngrams would be penalized
> relatively less in those places.

Nope. I hadn't thought about this either. I've never encountered the extra junk letters at the start/end of a decryption. I suspect this is because the plaintext at the very start/end is still highly correlated with the plaintext in the middle of the cipher. I can see how this would happen with shorter ciphers, though.

I just checked in the last changes for general n-grams.
The few items left to fix before I'm done are:

1) Allow custom sets of allowed characters to be specified when building scoring tables from samples of plaintext.
2) Add a "saveData" command for saving an existing scoring table to disk so that it can be loaded again at a later time.
3) Modify the existing programs to use the new scoring system instead of the older "stat digram", "stat trigram" methods.
4) Generate new default 2,3,4-gram tables from the standard text.
5) Add documentation for the new commands and procedures.

These shouldn't take too long to fix, but they also shouldn't keep you from being able to use the new scoring system if you're going to be comparing scoring functions and corpuses (corpi?).

--Wart
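To illustrate the "wrapped" n-grams of a candidate decryption that Alex asks about in this exchange, here is a small hypothetical helper, ngramsOf, that lists the n-grams of a string with or without wrap-around. Without wrapping, the first and last letters take part in fewer n-grams than the letters in the middle; wrapping removes that imbalance.

```tcl
# Hypothetical helper: list the n-grams of a candidate decryption,
# optionally wrapping past the end so the last letters pair up with the
# first ones. With wrapping, every position starts exactly one n-gram, so
# letters at the start and end are scored as often as those in the middle.
proc ngramsOf {text n {wrap 0}} {
    set len [string length $text]
    if {$wrap} {
        # Append the first n-1 letters so n-grams can run off the end.
        append text [string range $text 0 [expr {$n - 2}]]
        set count $len
    } else {
        set count [expr {$len - $n + 1}]
    }
    set grams {}
    for {set i 0} {$i < $count} {incr i} {
        lappend grams [string range $text $i [expr {$i + $n - 1}]]
    }
    return $grams
}

puts [ngramsOf abcd 2]     ;# ab bc cd
puts [ngramsOf abcd 2 1]   ;# ab bc cd da
```

A scorer would then look each returned n-gram up in its table and sum the values, exactly as it does for the unwrapped list.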
From: Wart <wa...@ko...> - 2004-03-03 17:19:47
On Wed, 2004-03-03 at 07:49, Alex Griffing wrote:
> Hi,
>
> >I think we should continue to use the above text, without the
> >legalese at the top of the file...
>
> OK. I took off everything before the line 'Frankenstein, or the Modern
> Prometheus', and I removed everything after the line 'lost in darkness
> and distance.'
> The new MD5 is: 8306191547cfd7ca2d04a0d7604fc852
>
> The main thing I'm trying to do is provide some standards for comparing
> search and scoring techniques. The standards I'm proposing aren't even
> necessarily for inclusion and distribution with the end-user ciphertools
> package, but rather for researching better methods. Also, people from
> outside the project might be interested in comparing their techniques
> and this would make sure everyone is on the same page.

Setting up a fixed standard for comparing methods is a good idea, but I think we do need to take it a step further and publish those standards so that others can reproduce the results, either with ciphertool or their own software. That's why I think we should make an add-on package that includes only these standard wordlists and n-gram source texts.

> >The word list that I've been using for my personal solving is version 15
> >of the UKACD:
> >http://www.ori.org/~kenl/projects/wordlist/UKACD-readme.htm
> >I've made some slight modifications, adding a few new words and removing
> >a few uncommon others.
>
> I've read a bit more about the SCOWL word lists:
> http://wordlist.sourceforge.net/scowl-readme
> UKACD is apparently included in the SCOWL 'size 80' list and higher.
> Also, according to this site:
> http://bryson.ltd.uk/wordlist.html
> UK Advanced Cryptics Dictionary has been superseded by the Edited
> English word list provided as part of TEA Crossword Helper
> <http://bryson.ltd.uk/tea.html> and Sympathy
> <http://bryson.ltd.uk/sympathy.html>.
>
> However, I think this new list isn't public.

I took a closer look at the SCOWL word lists. They look like they'd be a more complete source than my modified UKACD list.

> >The nice thing about this word list is that it
> >includes plurals and all of the various verb conjugations.
>
> These inflected forms are used in the SCOWL word lists as well.
>
> I suggest that we standardize on either a frozen version of your list or
> the SCOWL size 80 list. Ultimately though, I think it would be best to
> use a standard wordlist from one of the maintained public lists here:
> http://wordlist.sourceforge.net/
> Again, this would be the list for comparing techniques, not necessarily
> the most optimized for a certain method or ciphertext.

Sounds good then. I still think this wordlist should be included as an extra download for ciphertool. If you can create a zip file with the Frankenstein text, the SCOWL size 80 word list, and a README with any courteous source acknowledgements, then we can put it up on the ciphertool download page.

I just added a "massadd.tcl" script to the ciphertool CVS repository that can be used to convert a large wordlist into a ciphertool-ready dictionary.

--Wart
From: Alex G. <za...@za...> - 2004-03-03 16:03:39
Hi,

>I think we should continue to use the above text, without the
>legalese at the top of the file...

OK. I took off everything before the line 'Frankenstein, or the Modern Prometheus', and I removed everything after the line 'lost in darkness and distance.'
The new MD5 is: 8306191547cfd7ca2d04a0d7604fc852

The main thing I'm trying to do is provide some standards for comparing search and scoring techniques. The standards I'm proposing aren't even necessarily for inclusion and distribution with the end-user ciphertools package, but rather for researching better methods. Also, people from outside the project might be interested in comparing their techniques and this would make sure everyone is on the same page.

>The word list that I've been using for my personal solving is version 15
>of the UKACD:
>http://www.ori.org/~kenl/projects/wordlist/UKACD-readme.htm
>I've made some slight modifications, adding a few new words and removing
>a few uncommon others.

I've read a bit more about the SCOWL word lists:
http://wordlist.sourceforge.net/scowl-readme
UKACD is apparently included in the SCOWL 'size 80' list and higher. Also, according to this site:
http://bryson.ltd.uk/wordlist.html
UK Advanced Cryptics Dictionary has been superseded by the Edited English word list provided as part of TEA Crossword Helper <http://bryson.ltd.uk/tea.html> and Sympathy <http://bryson.ltd.uk/sympathy.html>. However, I think this new list isn't public.

>The nice thing about this word list is that it
>includes plurals and all of the various verb conjugations.

These inflected forms are used in the SCOWL word lists as well.

I suggest that we standardize on either a frozen version of your list or the SCOWL size 80 list. Ultimately though, I think it would be best to use a standard wordlist from one of the maintained public lists here:
http://wordlist.sourceforge.net/
Again, this would be the list for comparing techniques, not necessarily the most optimized for a certain method or ciphertext.

-Alex
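The trimming Alex describes could be scripted roughly like this, assuming the markers are matched as plain substrings and that the first occurrence of the start marker is the title line rather than a mention in the Gutenberg header (worth checking against the actual file). trimToMarkers is a hypothetical name.

```tcl
# Keep everything from the first occurrence of the start line through the
# end of the end line; everything outside that span is dropped.
# Assumes the first match of $startMarker is the title line itself.
proc trimToMarkers {text startMarker endMarker} {
    set start [string first $startMarker $text]
    if {$start < 0} {
        error "start marker not found"
    }
    set end [string first $endMarker $text $start]
    if {$end < 0} {
        error "end marker not found"
    }
    set stop [expr {$end + [string length $endMarker] - 1}]
    return [string range $text $start $stop]
}
```

If you want the result to match the MD5 above, read the file after `fconfigure $chan -translation binary` so that Tcl does not rewrite line endings. Even then, details such as whether the final newline is kept change the checksum, so this sketch is not guaranteed to reproduce 8306191547cfd7ca2d04a0d7604fc852.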
From: Wart <wa...@ko...> - 2004-03-03 07:07:47
On Tue, 2004-03-02 at 21:04, Alex Griffing wrote:
> Hi,
>
> I suggest that we identify some standard sources of language statistics
> for the purposes of comparing search methods and scoring methods. Since
> ciphertools already uses Frankenstein for n-grams, I propose that we use
> it as a standard. Specifically,
> http://www.gutenberg.net/etext93/frank14.txt
> with MD5:
> dc18c8d4c9ef449796f85e138f38d6f5

This sounds fine to me. The current n-gram tables in ciphertool aren't based on the exact text in this file, but it's pretty close. Chapter headings and the legal notice were removed before generating the n-grams. I think we should continue to use the above text, without the legalese at the top of the file, and distribute the text as an additional add-on package to future releases of ciphertool so that others can reproduce the n-grams. Luckily, the text is in the public domain, so we can redistribute it freely.

> Presumably we will want to create another standard in the future based
> on more texts.

I agree. This is a good start though. My experience has been that the 2,3-gram frequencies from this text are good enough to solve most classical ciphers that aren't maliciously created.

> A few standard word lists would also be useful for comparing methods
> that act on words as the fundamental units.
> This page might be a useful start:
> http://wordlist.sourceforge.net/
> The SCOWL project (Spell Checker Oriented Word Lists) listed there looks
> particularly interesting.
>
> Any specific suggestions or other thoughts?

The word list that I've been using for my personal solving is version 15 of the UKACD:
http://www.ori.org/~kenl/projects/wordlist/UKACD-readme.htm
I've made some slight modifications, adding a few new words and removing a few uncommon others. The nice thing about this word list is that it includes plurals and all of the various verb conjugations. I've rearranged it so that it spans multiple files, each sorted by word length. This word list is also redistributable with the obligatory copyright acknowledgement. This should be included as part of the new add-on data package with the Frankenstein text above.

If there are other word lists that look interesting, we should simply merge them with the existing word list to form one large complete list. Or do you think there is a need to keep separate word lists for different uses?

Foreign-language word lists would be very useful in the future. I've got a few already for Czech, Danish, Esperanto, French, German, Interlingua, Italian, Latin, Norwegian, Spanish, and Swedish. They're not as complete as the UKACD English word list, but still very useful. I have to check the licensing on these word lists before we start redistributing them.

--Mike
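Splitting a big word list by word length, as Mike describes doing with the UKACD, might look like this. It's a sketch: groupByLength is a hypothetical proc, it lowercases every word and skips entries containing anything other than a-z (such as apostrophes or hyphens), and it is not the massadd.tcl script mentioned elsewhere in this thread.

```tcl
# Group a word list by word length: one sorted, de-duplicated list per
# length, e.g. as a first step toward writing one file per length.
# Lowercases words and skips anything that is not purely a-z (assumption).
proc groupByLength {words} {
    set groups [dict create]
    foreach word $words {
        set word [string tolower [string trim $word]]
        if {![regexp {^[a-z]+$} $word]} {
            continue
        }
        dict lappend groups [string length $word] $word
    }
    # Sort each group alphabetically and drop duplicates.
    dict for {len wordsOfLen} $groups {
        dict set groups $len [lsort -unique $wordsOfLen]
    }
    return $groups
}
```

Each key of the returned dict is a word length, so writing one file per length is a matter of iterating over the dict with dict for.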
From: Alex G. <xa...@us...> - 2004-03-03 05:18:12
Hi,

I suggest that we identify some standard sources of language statistics for the purposes of comparing search methods and scoring methods. Since ciphertools already uses Frankenstein for n-grams, I propose that we use it as a standard. Specifically,
http://www.gutenberg.net/etext93/frank14.txt
with MD5:
dc18c8d4c9ef449796f85e138f38d6f5

Presumably we will want to create another standard in the future based on more texts.

A few standard word lists would also be useful for comparing methods that act on words as the fundamental units. This page might be a useful start:
http://wordlist.sourceforge.net/
The SCOWL project (Spell Checker Oriented Word Lists) listed there looks particularly interesting.

Any specific suggestions or other thoughts?

- Alex
From: Wart <wa...@ko...> - 2004-03-02 21:53:19
Folks,
I've added a new score command to enable more flexible and
pluggable plaintext scoring methods. The 'score' command
uses the sum of digram log frequencies as a default scoring
method, but new scoring methods can be added and used
instead of the default. The current builtin scoring
methods are:
digramlog - Sum of logs of digram frequencies
digramcount - Sum of raw digram frequencies
wordtree - Sum of squares of word lengths
More scoring methods, such as 3,4-gram frequency tables and
n-gram frequency and wordtree tables for foreign languages,
will be added in the future.
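As a rough picture of the wordtree method, here is a pure-Tcl sketch that assumes the plaintext still has its word divisions and uses a dict as a stand-in for ciphertool's word tree. Following the usage notes in this message, 1- and 2-letter words score nothing; every other dictionary word adds the square of its length. wordtreeScore is a hypothetical name.

```tcl
# Minimal sketch of wordtree-style scoring, assuming the plaintext keeps
# its word divisions: each dictionary word of 3+ letters adds the square
# of its length; short words and unknown words add nothing.
# A plain dict (word -> anything) stands in for ciphertool's word tree.
proc wordtreeScore {dictionary text} {
    set total 0
    foreach word [split $text] {
        set len [string length $word]
        if {$len >= 3 && [dict exists $dictionary $word]} {
            incr total [expr {$len * $len}]
        }
    }
    return $total
}
```

With my, dog, has, and fleas in the dictionary, "my dog has fleas" scores 3^2 + 3^2 + 5^2 = 43, and with only "dog" it scores 9, the same totals the session transcript shows (there printed as floating-point values).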
In addition, you can create a new Tcl procedure to return
the scoring values. The procedure must accept 2 arguments:
"value <string>". The first argument is always "value".
The second argument is the plaintext string to score. This
example scoring method scores by returning the length of the
string.
proc myScoringMethod {command string} {
    # command is always "value" for now. Later we will
    # implement new subcommands.
    if {$command eq "value"} {
        return [string length $string]
    } else {
        error "Unknown subcommand $command"
    }
}
The new scoring system is not yet part of any of the existing
ciphertool programs. It's just for testing purposes right now.
Hopefully within the next week or two I'll have added more
scoring methods (3,4-grams in particular) and will modify
the existing ciphertool programs to use the new scoring system.
Usage:
# Use the default sum-of-digram-logs on a string of plaintext
% score value "my dog has fleas"
1302.0
# Create a new sum-of-digram-logs scoring table based on a
# custom frequency table. Note that the "normalize"
# sub-command is a misnomer. In this case, it merely
# computes the log of every value that was added by "score
# add". This allows you to enter the raw frequency counts
# and let the score command calculate the logs for you.
% score create digramlog
score1
% score1 add my 2
my
% score1 add do 2
do
% score1 normalize
% score1 value "my dog has fleas"
1.38629436112
# Create a new sum-of-digram-logs scoring table based on a
# custom frequency table. In this example the input digram
# values have already been converted to log values, so the
# normalize sub-command is not used.
% score create digramlog
score1
% score1 add my 0.693
my
% score1 add do 0.693
do
% score1 value "my dog has fleas"
1.386
# The normalize sub-command for the sum-of-frequency-counts
# scoring table does nothing. This table stores only the
# raw frequency counts.
% score create digramcount
score1
% score1 add my 2
my
% score add do 2
do
% score1 normalize
% score1 value "my dog has fleas"
4.0
# The "wordtree" scoring table calculates scores based on
# the square of the lengths of valid words in the plaintext.
# 1- and 2-letter words are ignored. Again, normalization
# is not needed here.
% score create wordtree
score1
% score1 add my
my
% score1 add dog
dog
% score1 add has
has
% score1 add fleas
fleas
% score1 value "my dog has fleas"
43.0
# Change the default scoring method to a custom "wordtree"
# table. Note that we use the "score" command to get the
# value here instead of calling the new "score1" command.
# The "score default score1" command associates "score1" as
# the default scoring method.
% score create wordtree
score1
% score1 add dog
dog
% score default score1
score1
% score value "my dog has fleas"
9.0
# Change the default scoring method to the new custom
# scoring method above.
% score default myScoringMethod
myScoringMethod
% score value "my dog has fleas"
16
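A plain proc can stand in for a scoring table because both answer the same "value <string>" call. Here is a hypothetical model of that dispatch; the scoreSketch namespace and its setDefault/value procs are made up for illustration, since the real score command lives inside ciphertool.

```tcl
# Hypothetical model of "score default" / "score value" dispatch: any
# command that accepts "value <string>" can be the default scorer, which
# is why a table command and a plain proc are interchangeable.
namespace eval scoreSketch {
    variable default ""

    proc setDefault {cmd} {
        variable default
        set default $cmd
        return $cmd
    }

    proc value {text} {
        variable default
        if {$default eq ""} {
            error "no default scoring method set"
        }
        # Expand the command prefix and call its "value" subcommand.
        return [{*}$default value $text]
    }
}

# The custom scoring procedure from this message.
proc myScoringMethod {command string} {
    if {$command eq "value"} {
        return [string length $string]
    }
    error "Unknown subcommand $command"
}

scoreSketch::setDefault myScoringMethod
puts [scoreSketch::value "my dog has fleas"]   ;# prints 16
```

Because the default is stored as a command prefix and expanded with {*}, it could also hold a command plus fixed leading arguments.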
--Wart
From: Alex G. <za...@za...> - 2004-03-02 21:52:42
That file format looks about like what I had in mind, and I have a suggestion for cleaning it up a bit as more cipher types are added. Each cipher type could have expected and optional attributes. So an aristocrat might have 'plaintext' and 'key' expected, 'k1-key' optional, and 'period' neither expected nor optional.

-Alex
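One way to express the expected/optional idea in Tcl, as a sketch only: the attribute table holds just the aristocrat entry from the message, checkRecord is a hypothetical proc, and treating type, ciphertext, author, title, and language as allowed for every cipher type is an assumption. It also assumes every record has a type key.

```tcl
# Per-type attribute rules; only the aristocrat entry comes from the
# message, other cipher types would need their own entries.
set cipherAttributes {
    aristocrat {expected {plaintext key} optional {k1-key}}
}

# Return a list of problems found in one cipher record (a dict).
proc checkRecord {attributes record} {
    set type [dict get $record type]
    if {![dict exists $attributes $type]} {
        return [list "unknown cipher type: $type"]
    }
    set expected [dict get $attributes $type expected]
    set optional [dict get $attributes $type optional]
    # Keys allowed for every type (an assumption for this sketch).
    set general {type ciphertext author title language}
    set problems {}
    foreach key $expected {
        if {![dict exists $record $key]} {
            lappend problems "missing expected attribute: $key"
        }
    }
    foreach key [dict keys $record] {
        if {$key ni $general && $key ni $expected && $key ni $optional} {
            lappend problems "attribute not allowed for $type: $key"
        }
    }
    return $problems
}
```

An aristocrat record carrying a period key would be reported, matching the rule that period is neither expected nor optional for that type.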
From: Wart <wa...@ko...> - 2004-02-28 06:17:30
That sounds like a good idea to me. ciphertool already has a format for
storing this kind of information. While the syntax looks very Tcl-ish,
it can be easily parsed in any language. Alternately, I wouldn't be
opposed to changing the ciphertool file format to make it more friendly
to other languages.
Here's an example of the current format:
type aristocrat
period 0
ciphertext {qeceoemechkdioenkroczkdngxgwnefezrnkyirocdqeceinrubewzdmnbgmmhiuepruinwrqvynwgjgwikezgxgwneinyfhejvgdwfi}
plaintext {pogonology has nothing whatever to do with using a pogo stick or walt kellys comic strip but refers however to study of beards}
key {abcdefghijklmnopqrstuvwxyz { kgaodeysfh ltnmpi cbrvuw}}
language english
keyword {abcdefghijklmnopqrstuvwxyz { kgaodeysfh ltnmpi cbrvuw}}
# Digram value: 0
# Trigram value: 0
# K1 key: kgaodeysfh ltnmpi cbrvuw
# abcdefghijklmnopqrstuvwxyz
# K2 key: dvufgjckr bmpoeq winyxz h
author g-man
title {Not what you might think.}
Each line contains a key/value pair separated by whitespace. Curly
braces can be used to enclose the value if it contains whitespace. A
single key/value pair must be all on the same line; they can't span
multiple lines. Lines beginning with '#' are comments that can be
ignored. A well-known set of keys is used by the various
reading/writing programs to maintain consistency:
type
period
plaintext
ciphertext
key
keyword
author
title
language
Other keys can be used, but these should be a standard set used in all
files.
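Because each line is meant to be a well-formed Tcl list, a reader for this format can stay very small. A sketch, assuming one key/value pair per line and that values containing whitespace are brace-quoted as in the example (parseCipherRecord is a hypothetical name):

```tcl
# Minimal reader for the key/value format: one pair per line, each line a
# well-formed Tcl list, '#' lines and blank lines ignored.
proc parseCipherRecord {text} {
    set record [dict create]
    foreach line [split $text "\n"] {
        set trimmed [string trim $line]
        # Skip blank lines and comments.
        if {$trimmed eq "" || [string index $trimmed 0] eq "#"} {
            continue
        }
        if {[llength $trimmed] != 2} {
            error "malformed line: $line"
        }
        lassign $trimmed key value
        dict set record $key $value
    }
    return $record
}
```

Later duplicate keys overwrite earlier ones in this sketch; a stricter reader might reject them instead.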
Feel free to make suggestions on how to improve it. I've got a few
ideas, but I'd like to hear from others first.
--Wart
On Fri, 2004-02-27 at 19:50, Alexander Griffing wrote:
> Hi all,
>
> I have a suggestion for a simple project that we might
> be able to successfully cooperate on. It wouldn't be
> tied to a specific programming language or operating
> system, and it would be re-usable in many different
> frameworks.
>
> The project would be to create a library of plaintext
> / ciphertext pairs, including the cipher type and key.
> This library would have several uses:
> *) As a way for verifying the accuracy of enciphering
> / deciphering implementations.
> *) As a benchmarking suite for comparing autosolving
> methods.
> *) As a source of preset cipher examples for users to
> solve while they're learning to use a program.
>
> Thoughts?
>
> - Alex
>
From: Alexander G. <upo...@ya...> - 2004-02-28 04:01:52
Hi all,

I have a suggestion for a simple project that we might be able to successfully cooperate on. It wouldn't be tied to a specific programming language or operating system, and it would be re-usable in many different frameworks.

The project would be to create a library of plaintext / ciphertext pairs, including the cipher type and key. This library would have several uses:
*) As a way for verifying the accuracy of enciphering / deciphering implementations.
*) As a benchmarking suite for comparing autosolving methods.
*) As a source of preset cipher examples for users to solve while they're learning to use a program.

Thoughts?

- Alex
From: Wart <wa...@ko...> - 2004-02-26 05:02:39
I can build the tcipher.exe shell with mingw on Windows, but it crashes immediately upon startup. What gives?

(Just in case you can't guess, this is a test message. The crash is real, though.)

--Wart