III) The choice between an in-house encryption algorithm and a preexisting one:
1.0) A sketch of the in-house encryption algorithm:
The in-house algorithm would be an open one (as opposed to a restricted one): it would be published and open to analysis and testing by all, in order to make sure it is reliable.
As Kerckhoffs’s principle is often paraphrased: If the strength of your new cryptosystem relies on the fact that the attacker does not know the algorithm’s inner workings, you’re sunk. If you believe that keeping the algorithm’s insides secret improves the security of your cryptosystem more than letting the academic community analyze it, you’re wrong. And if you think that someone won’t disassemble your code and reverse-engineer your algorithm, you’re naïve.
Thus, leet's algorithm will be composed of at least two major components: the algorithm and the “captured live material”. The first component is self-explanatory, while the latter will be explained later. In addition to those two components there will also be a dynamic dictionary (that is, one customizable by the user), whose content will be a language vocabulary used in the last step of securing the information: a form of steganography, which will also be explained later.
The theoretical strength of this proposed cryptosystem relies on the fact that the key used for ciphering or deciphering a message would itself also be the algorithm, or at least most of the algorithm. This will be explained further later.
The “captured live material” referred to earlier will be this key. It will simply be live (as in newly) captured information (live scenery, i.e. video, audio, etc.) recorded by a device in order to make it digital (for example, a file that can be read as a sequence of hexadecimal values).
This “captured live material” can be noise or sound, a view (visual), or anything happening in the surroundings that can be recorded by a camera, a microphone or any other sensor able to produce complex enough data. The source of the captured data has to be something that cannot be reproduced, for example the sounds of an outdoor area, including people chatting, birds whistling, dogs barking, cars passing, etc.
The dictionary, on the other hand, will contain a list of words, as in a regular language dictionary but without the definitions. This dictionary will provide the vocabulary necessary for the steganography part, but let's leave this for later, since it is the last step of securing the information.
The content of the captured data (the “captured live material” referred to earlier) should make a formidable source of randomness, which cannot be known by anybody else unless: 1) they have access to this digitized data, or 2) they are present at the moment of the capture, with the same capture hardware and the same other parameters (distance, position, etc.), which is practically impossible.
There is a section of a page on the Gibson Research Corporation website (GRC.com) which describes and confirms this method of generating true random data. An excerpt says:
There are ways to generate absolutely random numbers, but computer algorithms cannot be used for that, since, by definition, no deterministic mathematical algorithm can generate a random result. Electrical and mechanical noise found in chaotic physical systems can be tapped and used as a source of true randomness, but this is much more than is needed for our purposes here. High quality algorithms are sufficient.
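One rough way to check whether a given capture is "complex enough" is to measure the empirical entropy of its bytes. The sketch below is an illustration only and not part of the proposal: the function name and the example file name are assumptions, and a high value shows that the bytes are evenly spread, not that they are unpredictable to an attacker.

```python
import math
from collections import Counter


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Return the empirical Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)          # how often each byte value occurs
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical usage; the file name is only an example.
# with open("myrecordedvideo.mp4", "rb") as f:
#     print(shannon_entropy_bits_per_byte(f.read()))
```

A result close to 8 bits per byte means every byte value appears about equally often; a result close to 0 means the data is highly repetitive.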
Now, granted that we have a digitized recording containing random, unique content, we need a way to make it usable for the encryption of a given message, because that recording will be the key (the secret part). The rest of the cryptosystem (algorithm, dictionary, etc.) would not compromise the encrypted message if it were found or intercepted separately from the key.
The real algorithm, in fact, will be the random content of the captured data. The other part of the algorithm, which more closely resembles regular existing encryption algorithms, will merely take instructions from the captured data; in other words, it will be told what to do with the clear message. We can thus consider that there will be two algorithms involved in the encryption of the information: one open and static, the other secret, since it is also the deciphering key. Therefore the captured data can be considered the key and an algorithm at the same time.
At this point we have barely scratched the surface, and the idea presented might not be clear enough yet. So, in order to better explain and further develop it, we are going to use diagrams and more examples.
1.1) The algorithm and the key:
It was said earlier in this document that the key (the captured live material) will also constitute an algorithm. We will thus, for differentiating purposes, refer to it as the “secret algorithm” or “the key” from now on. The other part of the algorithm, which doesn't necessarily have to be kept secret, will be referred to as the “open algorithm”.
1.2) How will this work?
For example, let's assume we have a video file recorded with a digital camera, whose content we are going to view in a hexadecimal editor:
$ shed myrecordedvideo.mp4
The output of which under shed (a Linux command line hexadecimal editor) will be:
offset    asc hex dec oct bin
00000000:     00  000 000 00000000
00000001:     00  000 000 00000000
00000002:     00  000 000 00000000
00000003:     1C  028 034 00011100
00000004: f   66  102 146 01100110
00000005: t   74  116 164 01110100
00000006: y   79  121 171 01111001
00000007: p   70  112 160 01110000
00000008: m   6D  109 155 01101101
00000009: p   70  112 160 01110000
00000010: 4   34  052 064 00110100
00000011: 2   32  050 062 00110010
00000012:     00  000 000 00000000
00000013:     00  000 000 00000000
00000014:     00  000 000 00000000
I have stopped at offset 14, because that is all we need for the example. These are, by the way, merely the first 15 of the 17964577 lines (one per byte) contained in this video file, which is 17.1 MiB large.
The hexadecimal values (the hex column) are the only part we are interested in for now.
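The per-byte listing above can be reproduced without shed. The following is a minimal sketch that only imitates the columns shown in this example (offset, ASCII, hex, decimal, octal, binary); the function name dump_rows is hypothetical, and the bytes are the 15 values from the listing rather than a real file.

```python
def dump_rows(data: bytes, limit: int = 15) -> list[str]:
    """Format the first `limit` bytes as rows: offset, ASCII, hex, decimal, octal, binary."""
    rows = []
    for offset, byte in enumerate(data[:limit]):
        # Show printable ASCII characters; leave the column blank otherwise.
        asc = chr(byte) if 0x20 <= byte <= 0x7E else " "
        rows.append(f"{offset:08d}: {asc} {byte:02X} {byte:03d} {byte:03o} {byte:08b}")
    return rows


# The 15 bytes shown in the example listing.
header = bytes.fromhex("00 00 00 1C 66 74 79 70 6D 70 34 32 00 00 00")
for row in dump_rows(header):
    print(row)
```

To run it on a real recording, the bytes could be read with open(path, "rb").read() instead of being hard-coded.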
The open algorithm will receive its instructions (when and how to modify, i.e. cipher, the content of the clear message) through a table of equivalence that maps each possible hexadecimal value read from the secret algorithm to an instruction.
The instructions (each equal to a hexadecimal value) will therefore decide which part of the clear message (word, punctuation mark, special character, phrase, etc.) is replaced by which other element, the replacement too being defined with the help of the hexadecimal values and their instruction counterparts.
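One possible shape for such a table of equivalence is sketched below. Everything in it is an assumption made for illustration: the operation names, the idea of splitting a byte into an operation (high four bits) and a target position (low four bits), and the function names. The only point it demonstrates is that each of the 256 possible values must map to some instruction.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    operation: str  # what to do with the clear message
    target: int     # which element of the clear message to act on


# Hypothetical operations, selected by the high four bits of a key byte.
OPERATIONS = ["KEEP", "REPLACE", "SWAP_WITH_NEXT", "INSERT_DECOY"]


def decode_byte(value: int) -> Instruction:
    """Map one key byte (0..255) to an instruction, so that every value has a meaning."""
    if not 0 <= value <= 0xFF:
        raise ValueError("key byte out of range")
    high, low = value >> 4, value & 0x0F
    return Instruction(OPERATIONS[high % len(OPERATIONS)], low)


def decode_key(key: bytes) -> list[Instruction]:
    """Translate the whole key into the list of instructions it encodes."""
    return [decode_byte(b) for b in key]


for instruction in decode_key(bytes.fromhex("00 1C 66")):
    print(instruction)
```

With this particular layout the byte 00 decodes to a "keep" operation and 1C to a "replace" operation, but any other assignment would serve equally well as long as both parties use the same open table.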
Let's, for instance, use the previous output of shed with an example of an open algorithm, and demonstrate how the combination of those two cryptosystem components (the secret algorithm and the open algorithm) is supposed to function.
Here is the series of hex values extracted from the rest of shed's output:
00 00 00 1C 66 74 79 70 6D 70 34 32 00 00 00
Let's assume this string of hexadecimal values is the key.
Let's also assume the clear message is a phrase constituted of a few words:
Clear message: The man will drink water.
Assuming that the open algorithm reads the hexadecimal string from left to right, the first value it is going to encounter is “00”, then “00”, then “00” again, then “1C”, and so on.
Let's say, for example, that 00 is the equivalent in the open algorithm of a “change nothing” instruction. The algorithm will then do nothing for the first three hexadecimal values, which are all 00, and then encounter 1C, which (again purely for the sake of example) will equal “replace the first noun (man in this case) in the clear message with such or such word from the dictionary mentioned earlier”. Of course, this is merely one possible instruction: the hex value could just as well equal an instruction that dictates replacing the first article (The in this case) in the clear message with a given word, article, verb, number, or anything within the range of readable information (for the sake of the last step of the encryption, which will be a form of steganography).
Of course, the equivalence between hexadecimal values and instructions would still have to be defined; the previous examples serve only to suggest the way the cryptosystem could work.
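The walk-through above can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary content is invented, the replacement word is chosen by the position of the byte in the key (a rule the text does not define), and any byte other than 00 or 1C is treated as "change nothing".

```python
KEY = bytes.fromhex("00 00 00 1C")                # first four key bytes from the example
DICTIONARY = ["wine", "river", "clock", "corpse"]  # hypothetical vocabulary


def apply_key(words, key, dictionary):
    """Walk the key from left to right and apply one instruction per byte."""
    result = list(words)
    for position, byte in enumerate(key):
        if byte == 0x1C:
            # Example instruction: replace "man" (index 1 in this message) with a
            # dictionary word. Picking it by key position is an assumption.
            result[1] = dictionary[position % len(dictionary)]
        # 0x00 and, in this sketch, every undefined value mean "change nothing".
    return result


clear = "The man will drink water".split()
print(" ".join(apply_key(clear, KEY, DICTIONARY)) + ".")
# prints: The corpse will drink water.
```

The three 00 bytes leave the message untouched; the 1C byte, at position 3 in the key, selects the fourth dictionary word.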
But we are not done yet. Now comes the steganography part, which will make sure that the encrypted message does not look like one, but like something of trivial appearance: for example a short story, an ebook or, in the case we will use here (which should also be relatively easy to apply), a sentence built with a word combination rule taken from the “Exquisite Corpse” game, invented by the Surrealists around 1925.
The rule consists of making a sentence following a fixed pattern, e.g. "The adjective noun adverb verb the adjective noun", where adjective, noun, adverb and verb are variables. It should give a result similar to that of the first game of Exquisite Corpse ever played: "The exquisite corpse will drink the new wine."
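The word combination rule can be sketched as a template whose variable slots are filled from the dictionary. In this sketch the vocabulary is invented, and choosing each word with the next key byte is an assumption about how the dictionary and the key might be combined.

```python
TEMPLATE = ["The", "adjective", "noun", "adverb", "verb", "the", "adjective", "noun"]

# Hypothetical vocabulary, grouped by the variable it can fill.
VOCABULARY = {
    "adjective": ["exquisite", "new", "silent", "green"],
    "noun": ["corpse", "wine", "river", "clock"],
    "adverb": ["slowly", "never", "gladly", "often"],
    "verb": ["drinks", "paints", "follows", "carries"],
}


def fill_template(key: bytes) -> str:
    """Fill each variable slot with a word picked by the next key byte."""
    key_bytes = iter(key)
    words = []
    for slot in TEMPLATE:
        if slot not in VOCABULARY:
            words.append(slot)  # fixed word of the pattern ("The", "the")
            continue
        byte = next(key_bytes, None)
        if byte is None:
            raise ValueError("key is too short for the template")
        choices = VOCABULARY[slot]
        words.append(choices[byte % len(choices)])
    return " ".join(words) + "."


print(fill_template(bytes.fromhex("00 00 00 1C 66 74")))
```

Every output has the same grammatical shape, so it reads as a harmless, if odd, sentence whatever the key bytes are.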
Alternatively, we could use a technology similar to “Automatic Summarization”, which would take the result of the first two levels of encryption (element swapping and order changing) and make it coherent, so that the produced encrypted message passes for a trivial one, as in the former example.