Re: [ciphertool-devel] next steps
From: Wart <wa...@ko...> - 2004-03-31 07:48:43
On Tue, 2004-03-30 at 20:43, Alex Griffing wrote:
> >
> >Major tasks:
> >
> >1) Add an "encode pt key" subcommand to all cipher types to make it easy
> >to encrypt plaintext using a given key. This will make it possible to
> >write scripts that encode the same plaintext or different plaintext
> >samples in a variety of ways to analyze the statistical characteristics
> >of the encryption methods.
> >
> >2) Implement a new more portable file format for cipher data. Some
> >discussion has already begun on this. We need to continue the
> >discussion and come to a conclusion on the final format.
> >
> These sound like projects that could be helpful for lots of people
> working on classical cryptology.

The first project was spawned out of laziness. I got annoyed by
hand-encoding the sample bifid cipher included with ciphertool. :)

> As I might have mentioned before, the
> cipher file format problem seems like an almost contrived example for
> what xml is for. I've never used xml though, and I don't know if this
> would be a sledgehammer vs. fly situation.

Having worked a lot with XML, I think it would be a little bit of
overkill for this situation. XML is much more suited for highly
structured and hierarchical data. Here we just have a simple list of
key/value pairs with no strong structure.

The big problem with using XML, though, is the amount of work required
to read/write it. XML parsers do exist for Tcl. But the API for using
XML parsers in general tends to be somewhat verbose. And the use of XML
would add another dependency on an external software package. If
possible, I like to keep the number of external dependencies to a
minimum.

The Java property file syntax might be better suited for this data. The
format for this file type is a set of key/value pairs separated by a
single '=' character. Keys can only contain alphanumeric characters,
including '.', '-', and '_', but not '='. Values can contain any
character.
A backslash '\' at the end of a line indicates that the value continues
on to the next line. For example:

type=aristocrat
ciphertext=ab cde fgh ijklm
plaintext=my dog \
has fleas
key=abcde fleas
title=Sample
author=wart

If we want to support multiple ciphers in a single file then we can
prefix each key name with a unique, but arbitrary, cipher id:

cipher1.type=aristocrat
cipher1.ciphertext=ab cde efghi ...
cipher2.type=vigenere
cipher2.period=6

> type
> period
> plaintext
> ciphertext
> key
> keyword
> author
> title
> language

I'll add to this:

for aristocrat types:
    k1key
    fixedkey
    k2key

for morse types (pollux, morbit, fractionated morse):
    morsetext

for baconian types:
    bacontext

for progressive types:
    progressionEncoding
    progressionIncrement

> might all fit in well as attributes. Some of these could be forbidden
> from use in certain cipher types (eg. period/aristocrat). Others could
> be mandatory (eg. period/vigenere). Author and title could, for
> example, always be optional regardless of cipher type.

Instead of being forbidden, couldn't they just be ignored by the
function that reads the file? For example, if an aristocrat is loaded,
then the period field would just get ignored.

> All of these
> conditions could be enforced by the xml DTD (document type definition)

xml dtds (and xml schema) introduce a little more complexity. If you
don't use a validating xml parser then the DTD and schema aren't used
and are pretty much useless. If you use a validating xml parser when you
read the files, then the DTD or schema must be made available to the
parser. This means they have to be made available in a well-known
location. This is all entirely possible, but I feel that it's a little
bit of overkill for ciphertool.

The property file syntax can also be validated, but the validation would
have to be part of the function that reads the file. I don't think
that's such a bad thing. I'm not sure that there's too much we need to
validate anyway.
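
To make the proposed format concrete, here is a minimal sketch of a reader for it. It is written in Python purely for illustration (the thread suggests ciphertool itself is Tcl-based), and the names `logical_lines`, `parse_properties`, `group_by_cipher`, and the `"default"` cipher id are hypothetical, not part of ciphertool. It assumes the rules described above: one `key=value` pair per line, a trailing backslash continues the value on the next line, and an optional `cipherid.` prefix groups keys by cipher. It also assumes that unprefixed field names contain no '.'.

```python
def logical_lines(text):
    """Join physical lines that end in a backslash into one logical line."""
    buf = None  # partial logical line carried over from a continuation
    for raw in text.splitlines():
        # Leading whitespace on a continuation line is dropped (assumption).
        piece = raw if buf is None else buf + raw.lstrip()
        if piece.endswith("\\"):
            buf = piece[:-1]  # strip the backslash and keep accumulating
        else:
            buf = None
            yield piece
    if buf is not None:  # file ended on a continuation line
        yield buf


def parse_properties(text):
    """Return a dict of key -> value from key=value lines."""
    props = {}
    for line in logical_lines(text):
        if not line.strip():
            continue  # skip blank lines
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"missing '=' in line: {line!r}")
        props[key.strip()] = value.strip()
    return props


def group_by_cipher(props, default_id="default"):
    """Group 'cipherid.field' keys into one dict per cipher id."""
    ciphers = {}
    for key, value in props.items():
        cipher_id, dot, field = key.partition(".")
        if not dot:  # unprefixed key: single-cipher file
            cipher_id, field = default_id, key
        ciphers.setdefault(cipher_id, {})[field] = value
    return ciphers


SAMPLE = """\
cipher1.type=aristocrat
cipher1.ciphertext=ab cde fgh ijklm
cipher1.plaintext=my dog \\
has fleas
cipher2.type=vigenere
cipher2.period=6
"""

ciphers = group_by_cipher(parse_properties(SAMPLE))
print(ciphers["cipher1"]["plaintext"])  # the continued value, joined
print(ciphers["cipher2"])
```

All values come back as strings, so a field such as `period` would still need converting by the caller. Any per-type checks, or ignoring of fields that do not apply to a cipher type, could be layered on top of the grouped dictionaries inside the same reading function.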
The only real required field that I see is the 'ciphertext' field. I
envision using the files to store everything from unidentified
ciphertext (type is 'unknown' or missing), to half-solved ciphers (type
is known, period is '0' or missing), to completed ciphers (keyword is
known). In some cases a field might be required but not known (the
period for an unsolved vigenere for example). On the other hand, having
stricter validation of

> Or it could all be in plain text files which might make more sense if
> not many people are using it.

I think the property file format would be a little better suited for our
purpose. I could probably still be convinced to make more fields
required or make more fields required only for certain cipher types
(period, for example). It's getting late, I'll sleep on it.

--Wart