[Passwordsafe-devel] Timestamps, character encodings in .dat files
From: Frank P. <fp...@fp...> - 2004-12-11 18:04:24
Hi,

In the archives, I found a discussion about the timestamp fields that are defined in the formatV2.txt document but are unused so far. The answer was roughly, "whoever implements them first, decides."

Also, Password Safe currently assumes that all strings are encoded according to the current locale, i.e., in ASCII, ISO-8859-1, or whatever codepage is in use. This can cause problems when a .dat file is moved between locales, or if a user switches back and forth between, say, Cyrillic and Latin.

I'd like to offer suggestions for both.

Timestamps are defined by formatV2.txt to be of type time_t. My suggestion is to clarify that, and to store timestamps as 32-bit, little-endian, unsigned integers representing the number of seconds since the "POSIX epoch" (midnight, January 1, 1970, GMT). That value can be computed and used intuitively on most systems, and has the following properties:

- On Windows and POSIX, this is equivalent to the time_t type, using time() and related library functions. On the Mac, the value has to be corrected by the constant 2082844800 (the number of seconds in 66 years, of which 17 are leap years) to account for the beginning of the Mac epoch in 1904.
- The value is timezone-independent. Applications can present the time according to the local timezone, if desired.
- Software implemented on little-endian processors, like the x86, can read the little-endian value naively. Elsewhere (PPC, SPARC), the software has to do a trivial byte swap.

Now for character encodings. My suggestion here is to encode all strings as UTF-8. UTF-8 is able to encode all of Unicode, it is easy to convert to and from, and it is "safe" for existing C code (i.e., no embedded nulls).

To maintain as much compatibility as possible with existing software, I suggest adding a boolean "UsesUtf8Strings" preference. This way, software could fall back to the existing assumption (all strings being encoded in the current codepage) if the preference is missing or false.
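To make the timestamp proposal above concrete, here is a minimal sketch in C of how the 32-bit little-endian value might be written and read. The function names (put_timestamp_le, get_timestamp_le, mac_to_posix) are hypothetical and not part of Password Safe or formatV2.txt; only the byte layout and the Mac epoch constant come from the proposal itself.

```c
#include <stdint.h>

/* Seconds between the classic Mac epoch (1904-01-01) and the POSIX epoch
 * (1970-01-01): 66 years, 17 of them leap years, as stated in the proposal. */
#define MAC_EPOCH_OFFSET 2082844800UL

/* Hypothetical helper: store a POSIX timestamp as 4 bytes,
 * least significant byte first (little endian). */
void put_timestamp_le(uint32_t secs, unsigned char out[4])
{
    out[0] = (unsigned char)(secs & 0xFF);
    out[1] = (unsigned char)((secs >> 8) & 0xFF);
    out[2] = (unsigned char)((secs >> 16) & 0xFF);
    out[3] = (unsigned char)((secs >> 24) & 0xFF);
}

/* Hypothetical helper: read the 4 little-endian bytes back into a value. */
uint32_t get_timestamp_le(const unsigned char in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

/* Hypothetical helper: convert seconds since the Mac epoch (1904)
 * to seconds since the POSIX epoch (1970). */
uint32_t mac_to_posix(uint32_t mac_secs)
{
    return mac_secs - (uint32_t)MAC_EPOCH_OFFSET;
}
```

Because the sketch assembles the value with shifts rather than copying memory, the same code works on little-endian and big-endian hosts, so no separate byte-swap path is needed. On Windows and POSIX, the value passed to put_timestamp_le would typically be the result of time(NULL) cast to uint32_t.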
Software can also expose this preference to the user, to allow backwards compatibility (with the existing cross-locale incompatibility). (Note that the preference would also have to apply to the string preferences, like DefUserName, which might be non-ASCII. The software can ensure that the UsesUtf8Strings preference is put first in the preferences string.)

The worst that could happen is that a .dat file is written using UTF-8 strings and then read by an old version of the software; in that case, non-ASCII characters would appear "garbled". To me, that seems like the change with the least impact.

What do you think? Maintainers, if you agree with the encodings suggestion, can you please reserve a number for the "UsesUtf8Strings" preference, and post it here?

Frank

--
Frank Pilhofer, fp...@fp...