VeraCrypt / Forums / Technical Topics: Unicode passwords, and the "Pigeonhole Principle"

Sam Hopkin - 2017-10-21

Hello. I have three questions. These are probably silly questions and if I had a better understanding of encryption and binary I could answer them myself. Sorry.

1: It says in the User Guide that Unicode passwords are supported. Are all 130,000+ Unicode characters supported? Or is there some restricted character-set?

2: I would imagine that using a Unicode password greatly improves security. After all, it increases the number of possible characters from the ~100 of ASCII to over 130,000 (assuming all characters are supported). But is there some form of compression, truncation, or interpretation of the Unicode characters used that might make it less secure than it would at first seem?

For example, the binary value of the Unicode character "Ⱞ" is "10110000101110". However, the first 6 bits of that binary are identical to the last 6 bits of the "l" ASCII character (01101100), and the last 8 bits are identical to the binary for the "." ASCII character (00101110). Does VeraCrypt properly distinguish an "Ⱞ" as a different character from 6/8ths of an "l" and a "."?

I assume that binary doesn't work this way. Obviously if I type "Ⱞ" into a word processor, then save and reopen it, it would save that character correctly and not instead show me an error message (for the 6/8ths of the "l") and a ".". Does VeraCrypt operate in a similar way? Or is the Unicode altered in some way to accommodate being used in a password?
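
The overlapping bit patterns stop being ambiguous once a byte encoding is fixed. Here is a minimal sketch, assuming the password is converted to UTF-8 bytes before it is processed (that is an assumption about VeraCrypt, not something confirmed in this thread). It shows that "Ⱞ" and "l." become entirely different byte sequences:

```python
# Assumption: the password is turned into bytes with UTF-8 before hashing.
glagolitic = "\u2c2e"   # the character "Ⱞ" (code point U+2C2E)
ascii_pair = "l."       # the two ASCII characters from the question

glagolitic_bytes = glagolitic.encode("utf-8")
ascii_bytes = ascii_pair.encode("utf-8")

print(format(ord(glagolitic), "b"))  # the code point written in binary
print(glagolitic_bytes.hex())        # UTF-8 bytes of the single character
print(ascii_bytes.hex())             # UTF-8 bytes of "l."

# Every byte of a multi-byte UTF-8 sequence has its top bit set, while an
# ASCII byte is always below 0x80, so the two can never be confused.
no_ascii_overlap = all(b >= 0x80 for b in glagolitic_bytes)
print(no_ascii_overlap)
```

In UTF-8 the character is stored as three bytes, none of which is a valid ASCII byte, so a decoder (or a hash function working on the bytes) never sees part of an "l" followed by a ".".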

3: How is the "Pigeonhole Principle" avoided within Veracrypt? I am struggling to find any info in the guide that might help me answer this. As far as I am aware there are more possible passwords than there can be hashes produced from passwords. Does Veracrypt use any method to prevent a long, highly-complex password from sharing the same hash as the password "Cat"? It would be a shame for my 128 character randomized Unicode password to only be as secure as the weakest password that produces the same hash.

Thank you.

Last edit: Sam Hopkin 2017-10-21

Sam Hopkin - 2017-10-24

Would using cascaded encryption help prevent the pigeonhole principle?

Enigma2Illusion - 2017-10-24

I cannot answer your other questions, however I can answer the third question.

Does Veracrypt use any method to prevent a long, highly-complex password from sharing the same hash as the password "Cat"?

https://www.veracrypt.fr/en/Header%20Key%20Derivation.html

The salt prevents the same password from generating the same hash value.

The hash is not the same as the encryption key for a volume. The encryption key is generated by the Random Number Generator.

https://www.veracrypt.fr/en/Random%20Number%20Generator.html
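
To illustrate the point about the salt, here is a rough sketch using Python's standard `hashlib.pbkdf2_hmac`. It assumes PBKDF2 with HMAC-SHA-512 and a 512-bit salt, as described on the Header Key Derivation page. The function name `derive_header_key` and the iteration count are made up for the demo, and the real iteration count is far higher.

```python
import hashlib
import os

# Hypothetical, deliberately small iteration count so the demo runs quickly.
ITERATIONS = 1000

def derive_header_key(password: bytes, salt: bytes) -> bytes:
    """Derive a 64-byte key from a password and salt with PBKDF2-HMAC-SHA-512."""
    return hashlib.pbkdf2_hmac("sha512", password, salt, ITERATIONS, dklen=64)

password = "Cat".encode("utf-8")
salt_a = os.urandom(64)  # 512-bit random salt for "volume A"
salt_b = os.urandom(64)  # a different 512-bit salt for "volume B"

key_a = derive_header_key(password, salt_a)
key_b = derive_header_key(password, salt_b)

print(key_a != key_b)                                # same password, different salts
print(derive_header_key(password, salt_a) == key_a)  # same inputs, same key
```

The derived key is only used to unlock the header. The key that encrypts the data itself comes from the random number generator and is stored inside that header.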

- Sam Hopkin - 2017-10-24
  
  Thanks for the reply.
  
  Are passwords not hashed at all, then?
  
  I'm definitely not an expert at this stuff, but I imagine that there are only two ways the password itself could be used:
  
  A: The password is salted and hashed, and that hash produced is used for the encryption scheme calculation that then reveals the encryption key
  B: The password is salted but not hashed, and the total binary value of that password's characters is used for the encryption scheme calculation that then reveals the encryption key
  
  If A is the case, then assuming a 512-bit hash is used then there are 2^512 possible hashes. Since that number is a lot smaller than the total number of possible passwords which is 2^(32x64) (2 possible values per bit, up to 32 bits per character, and a total of 64 characters), then each password would have trillions of other passwords that produce the same exact hash. That would mean that someone trying to brute-force your encryption would not necessarily need to use your exact password. They would only need to use a password that just so happens to produce the same hash value... which could be a single dictionary word that could be cracked within minutes. Even if you salt the password before the hash, you are still restricted by the total number of different possible hashes. That salt would only protect against a rainbow-table; not a brute-force.
  
  Of course, if the password is hashed using a 2048-bit hash, or if it is hashed 4 times with different salts with a 512-bit hash, then there is no problem, as 2^2048 is equal to the total number of possible passwords (2^(32x64)), which would give you a 50% chance that your password shares no hash outcome with any other password.
  
  If B is the case, then there is no problem. Only your exact password can be used to decrypt the header, which could take millions of years by brute-forcing alone.
  
  Last edit: Sam Hopkin 2017-10-24
  
Enigma2Illusion - 2017-10-24

The password is hashed with salt.

https://www.veracrypt.fr/en/Header%20Key%20Derivation.html

The documentation explains how VeraCrypt uses the password to unlock the header which contains the encryption key.

https://www.veracrypt.fr/en/Encryption%20Scheme.html

- Sam Hopkin - 2017-10-24
  
  If the password is hashed with the salt and then the hash is used to decrypt the header, that would mean the pigeonhole principle is not avoided.
  
  With 2^512 possible hashes, and 2^2048 possible passwords, there are at least trillions of passwords that could be used to decrypt the same header.
  
  That means if someone were to brute-force your encryption, they would only need to try 2^512 different passwords before having a 50% chance of getting through, instead of the 2^2048 needed for a 50% chance if the password were not hashed.
  
  2^512 is not a small number. But it's scary to think that a 64-character Unicode password could be functionally identical to a dictionary word.
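
The pigeonhole argument can be tried out on a toy scale. The sketch below is only an illustration, not how VeraCrypt works: it truncates SHA-256 to 16 bits (the helper `tiny_hash` is hypothetical) and searches for some other input that matches the truncated hash of "Cat". A match turns up quickly because there are only 2^16 possible outputs, but the expected effort grows as 2^n with the output size n.

```python
import hashlib
from itertools import count

BITS = 16  # toy output size; real hash outputs are 160 to 512 bits

def tiny_hash(data: bytes, bits: int = BITS) -> int:
    """Return the top `bits` bits of SHA-256(data) as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

target = tiny_hash(b"Cat")

# Try inputs one by one until one lands in the same "pigeonhole" as b"Cat".
for tries in count(1):
    candidate = f"guess-{tries}".encode("utf-8")
    if tiny_hash(candidate) == target:
        break

print(candidate, tries)
print(f"a 512-bit output has about {2**512:.3e} possible values")
```

Collisions are guaranteed to exist, but an attacker testing dictionary words only benefits if one of those words happens to collide with the real password. For a well-behaved 512-bit output, that is about as likely as guessing the output outright.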
  
Enigma2Illusion - 2017-10-25

Your assumptions are incorrect. If the shortcomings you describe existed, the issue would have been flagged in the two audits performed on TrueCrypt, on which VeraCrypt is based, and in the one audit performed on VeraCrypt.

https://sourceforge.net/p/veracrypt/discussion/general/thread/9490dbcc/

Sam Hopkin wrote:

That means if someone were to brute-force your encryption, they would only need to try 2^512 different passwords before having a 50% chance of getting through.

Incorrect. You would need to try 2^512 for each password since the salt provides 2^512 combinations for each password.

https://www.veracrypt.fr/en/Header%20Key%20Derivation.html

512-bit salt is used, which means there are 2^512 keys for each password.

However, the salt is a known value since it is stored unencrypted in the header.

Therefore, the attacker would need to know your password, PIM, and/or keyfiles, and which hash you are using to successfully brute force attack the header.

You can read the developer's response regarding passwords at the links below.

https://sourceforge.net/p/veracrypt/discussion/general/thread/09696187/#491d

https://sourceforge.net/p/veracrypt/discussion/general/thread/8fe82679/#cf87

Last edit: Enigma2Illusion 2017-10-25

Sam Hopkin - 2017-10-25

then this issue would have been flagged in the two audits performed on TrueCrypt, which VeraCrypt is based on, and the one audit performed on VeraCrypt.

Possibly. But there is no documentation (at least that I can find and understand) that confirms the pigeonhole principle either way. I would have thought that the audits would have mentioned this if it was a problem, but there is also the possibility that people accept the pigeonhole principle as an inevitable part of cryptology, and why would they make an issue out of something that can't be avoided?

It also isn't a huge problem unless you are very unlucky. 2^512 is still a huge number... it's just not as big as 2^2048.

Incorrect. You would need to try 2^512 for each password since the salt provides 2^512 combinations for each password.

If the password is not hashed, then that is fine. The pigeonhole principle only applies if the password is hashed in order to decrypt the header. If the hash is used to decrypt the header, then the password doesn't matter at this point. You could try all possible 2^512 hashes until you get the one that is needed to decrypt the header. Salting doesn't increase the number of passwords or something; it only increases the number of possible hashes that could result from the same password.

Again, 2^512 is not a small number. It would still take an average of millions of years before someone got the right hash. But the possibility is there that they could get it first time.

https://sourceforge.net/p/veracrypt/discussion/general/thread/8fe82679/#cf87
Internally, VeraCrypt always uses 64 bytes passwords.
For example, if you use a 20 characters password with a 1KB key file, VeraCrypt will end up using a 64-bytes password derived from the password and the key files.

Wait so is the maximum password length 64 bytes, not 64 characters?

EDIT: I've tested the maximum password length the VeraCrypt client will accept when making a volume. The maximum password length is definitely 64 bytes. In fact, it's a little bit less when using Unicode or extended ASCII. For some reason every Unicode or extended ASCII character takes up 8 bits more than it should (e.g. § is an 8-bit character, but I can only fit 32 of them in the password field, meaning it's being treated as a 16-bit character). I guess that's got something to do with the Unicode version it uses, or some padding... or something like that.
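
The 32-character observation is what UTF-8 would produce. "§" fits in 8 bits in legacy single-byte encodings such as Latin-1, but in UTF-8 every character above U+007F takes at least two bytes. A small sketch, assuming the 64-byte limit is applied to the UTF-8 encoding of the password (an assumption that matches the test above); the helper `chars_that_fit` is hypothetical:

```python
MAX_PASSWORD_BYTES = 64  # limit reported in this thread

def chars_that_fit(ch: str, limit: int = MAX_PASSWORD_BYTES) -> int:
    """How many copies of `ch` fit in `limit` bytes when encoded as UTF-8."""
    return limit // len(ch.encode("utf-8"))

samples = {
    "a": "a",                   # ASCII letter
    "section sign": "\u00a7",   # the character "§"
    "glagolitic": "\u2c2e",     # the character "Ⱞ"
    "emoji": "\U0001F600",      # a character outside the Basic Multilingual Plane
}

for name, ch in samples.items():
    print(name, len(ch.encode("utf-8")), chars_that_fit(ch))
```

With these assumptions, 64 ASCII characters fit, but only 32 copies of "§", 21 of "Ⱞ", and 16 of a four-byte character.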

That means if passwords are a maximum of 64 bytes (512 bits), the pigeonhole principle is largely a non-issue. There are 2^512 possible passwords, and there are 2^512 possible hashes. The odds that a single password will produce the same hash as at least one other password are 50:50. Pretty negligible. Of course, unless some method is used to avoid the pigeonhole principle, there is still the possibility that the password shares the same hash output as a dictionary word, although the odds would be 200000:2^512 (200,000 dictionary words, with 2^512 possible hashes). That's roughly a 1 in 6.7 × 10^148 chance for a password to share its hash with a lower-case dictionary word. Pretty small. But some poor sod may be unlucky enough to be that guy.
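
The dictionary-word estimate can be checked with exact integer arithmetic. This is only a back-of-the-envelope sketch: it assumes hash outputs are uniformly distributed over 2^512 values and uses the 200,000-word figure from the post above.

```python
from fractions import Fraction

dictionary_words = 200_000   # figure used in the post above
hash_outputs = 2**512        # number of possible 512-bit outputs

# Upper bound on the chance that a given password's hash equals the hash
# of at least one dictionary word (assuming uniformly random outputs).
p_match = Fraction(dictionary_words, hash_outputs)

one_in = hash_outputs // dictionary_words
print(f"about 1 in {float(one_in):.2e}")
print(len(str(one_in)), "decimal digits")
```

The result is a number with 149 decimal digits, so the chance is far too small to matter in practice.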

Last edit: Sam Hopkin 2017-10-25

Enigma2Illusion - 2017-10-25

If I understand your posts correctly, you are trying to determine the number of collisions in a given hash aka "Pigeonhole Principle".

RIPEMD-160
SHA-2 (the SHA hashes used in VeraCrypt)
Whirlpool
Streebog

VeraCrypt also uses HMAC to strengthen the hashes.

https://www.veracrypt.fr/en/Hash%20Algorithms.html

My understanding is that, since the salt is added along with HMAC, the risk of two or more passwords resulting in the same hash value is greatly reduced, even if they are the same password.

Last edit: Enigma2Illusion 2017-10-25

- Sam Hopkin - 2017-10-25
  
  So does that mean that if two completely different passwords produce the same hash, it won't be accepted due to the HMAC?
  
  I've done a bit of Googling, and I'm really not too sure how HMACs work. But according to this
  https://www.quora.com/Hash-Functions-What-is-an-intuitive-explanation-of-the-HMAC-algorithm
  It seems as though the general idea is that the hash output is hashed again with the password.
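
As a rough picture of what HMAC does, here is a sketch of HMAC-SHA-256 written out by hand following RFC 2104 and checked against Python's `hmac` module. It is a generic illustration, not VeraCrypt's code. The helper name `hmac_sha256` and the sample inputs are made up. In PBKDF2 the password plays the role of the HMAC key, and the salt (plus a block counter) is the message.

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256 written out step by step (RFC 2104)."""
    block_size = 64  # SHA-256 processes input in 64-byte blocks
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(block_size, b"\x00")     # then padded to one full block
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner).digest()

key = b"correct horse"             # stands in for the password
message = b"salt-and-block-index"  # stands in for salt || counter

manual = hmac_sha256(key, message)
library = hmac.new(key, message, hashlib.sha256).digest()
print(manual == library)
```

So the key is mixed in twice, once in an inner hash and once in an outer hash. HMAC does not detect or reject collisions between different passwords. It only makes the output depend on the key in a way that is hard to predict without it.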
  
Enigma2Illusion - 2017-10-25

So does that mean that if two completely different passwords produce the same hash, it won't be accepted due to the HMAC?

There would be no way for the software to know that a hash collision has occurred between two or more different VeraCrypt volumes whose passwords, the same or different, somehow produce the same hash. The role of the hash function is to prevent or minimize collisions. However, adding the salt, which creates 2^512 combinations for a password, makes a hash collision less likely.

https://www.veracrypt.fr/en/Header%20Key%20Derivation.html

512-bit salt is used, which means there are 2^512 keys for each password. This significantly decreases vulnerability to 'off-line' dictionary/'rainbow table' attacks (pre-computing all the keys for a dictionary of passwords is very difficult when a salt is used). The salt consists of random values generated by the VeraCrypt random number generator during the volume creation process.

I am not clear on the purpose of HMAC as specified in PKCS #5 v2.0 as used in VeraCrypt.

Searching the VeraCrypt forums for HMAC yielded the following pertinent information.

https://sourceforge.net/p/veracrypt/discussion/technical/thread/ad0bcd60/#fc20

Follow-up to previous link:
https://sourceforge.net/p/veracrypt/discussion/technical/thread/ad0bcd60/#b98d

https://veracrypt.codeplex.com/discussions/577716

However, the above links were discussing changing the password length from 64 to 128 which was going to double the wait time to mount a volume for SHA-256, Whirlpool and RIPEMD-160.

https://veracrypt.codeplex.com/workitem/71

Last edit: Enigma2Illusion 2017-10-25

- Bob BIlly - 2017-10-27
  
  However, the above links were discussing changing the password length from 64 to 128 which was going to double the wait time to mount a volume for SHA-256, Whirlpool and RIPEMD-160.
  
  Couldn't the password length determine how many times the compression is done? Over 64 characters and it's done twice. Under 64 and it's done once.
  
  Last edit: Bob BIlly 2017-10-27
  
Enigma2Illusion - 2017-10-27

Bob Billy wrote:

Couldn't the password length determine how many times the compression is done? Over 64 characters and it's done twice. Under 64 and it's done once.

Please read the links I provided to better understand the extra calls to HMAC that are performed.

Mounir IDRASSI wrote:

Actually, the second call to the hash compression function I was referring to is linked to the HMAC implementation where we need to do an extra hash if the password is longer than the block size.

As I provided in my previous post, this would double the mount times for SHA-256, Whirlpool and RIPEMD-160.

Per the threads and the ticket that were opened, the developer, Mounir Idrassi, considered the idea of handling the passwords with the extra HMAC hash calls when the password was > 64 characters for SHA-256, Whirlpool and RIPEMD-160.
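
The "extra hash" for long passwords follows from how HMAC is defined: a key longer than the hash's block size is replaced by its hash before use. SHA-256, Whirlpool and RIPEMD-160 all have 64-byte blocks, while SHA-512 has a 128-byte block. A short sketch with Python's standard library, using made-up sample values and SHA-256 only:

```python
import hashlib
import hmac

message = b"example salt"      # hypothetical message
long_password = b"x" * 100     # longer than SHA-256's 64-byte block

# HMAC with the long key directly...
direct = hmac.new(long_password, message, hashlib.sha256).digest()

# ...gives the same result as HMAC keyed with the SHA-256 hash of that key.
prehashed_key = hashlib.sha256(long_password).digest()
prehashed = hmac.new(prehashed_key, message, hashlib.sha256).digest()
print(direct == prehashed)

# Block sizes in bytes; a password above this size triggers the extra hash.
for name in ("sha256", "sha512"):
    print(name, hashlib.new(name).block_size)
```

A password of up to 64 bytes avoids that extra step for the 64-byte-block hashes, which is why the 64 versus 128 limit mattered in the linked discussion.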

Last edit: Enigma2Illusion 2017-10-27
