On 11/24/06, Adam R. Maxwell <amaxwell@...> wrote:
> On Nov 24, 2006, at 10:26, Christiaan Hofman wrote:
> > On 11/24/06, Adam R. Maxwell <amaxwell@...> wrote:
> > On Nov 24, 2006, at 08:44, Christiaan Hofman wrote:
> > >
> > >
> > > On 11/24/06, Adam R. Maxwell <amaxwell@...> wrote:
> > >
> > > On Nov 24, 2006, at 05:47, Christiaan Hofman wrote:
> > >
> > > > I can only think of one solution: always hand UTF-8 to the parser
> > > > (or perhaps for some encodings). This has a problem though. First,
> > > > the parser should always parse the data instead of the file. This
> > > > means that the filepath for errors is wrong.
> > >
> > > And that's a useful feature I'd hate to lose. We could do this just
> > > for certain encodings, though.
> > >
> > > I wonder which. Do we really need the original filepath, can't we
> > > just use a temporary one?
> > Unicode and non-Western encodings are likely problem candidates. It's
> > possible to pass a temporary file to the parser, but it just means
> > more of a mess. If we go that route, let's do the conversion(s) in
> > the document, then pass BibTeXParser an NSString just like the other
> > parsers. We used to create a temporary file for paste/drag data, but
> > I removed that after discovering Omni's NSData-based file stream,
> > since it was slow and messy.
> > Can every string be converted to UTF-8?
> I believe anything that can be represented in Unicode can be saved as
> UTF-8. I don't think all characters have a Unicode representation,
> though (but don't quote me on either of those).
> > > > Also we then can't save groups in UTF-8 anymore, because we need to
> > > > convert the whole file content. This could give problems with
> > > > compatibility and when saving in ASCII.
> > >
> > > I don't see any way around that, other than saving the groups in a
> > > file wrapper plist or as xattrs.
> > >
> > > We can escape non-ascii characters.
> > That seems kind of fragile, at least from a compatibility standpoint.
> > Compatibility would be a problem, yes. It would be a one-time switch
> > though. An ASCII file could always be opened as UTF-8. I never liked
> > the mixed encoding for group files, it's fragile by itself. It makes
> > external editing of the file difficult.
> Shouldn't other editors just save the byte sequence back as-is?
No, other editors will probably save the string in some encoding, just as we
do with a @comment we don't recognize.
> Someone who manually edited the plist could obviously cause problems.
But I think it should be possible. It is one of the strong points of using a
human-readable format like BibTeX.
> The switch would mean that previous versions of BibDesk couldn't read
> groups saved in the latest version, which would be an issue for people
> who have to work across OS versions.
That is indeed an issue. It would only be a problem when there are non-ascii
characters in group names and such though (I think), so it is easily
> I really think we're just avoiding the inevitable switch away from
> > BibTeX as a file format, though. How much work would it be to add the
> > BibTeX exporter infrastructure that Mike originally talked about? We
> > can already serialize an array of BibItems.
> > I would prefer not to use a binary format in BD1, stay with bibtex
> > format.
> Well, I'm just concerned that we keep changing things to suck
> differently instead of just getting away from all these problems.
Changing to a custom format would really break the format. One advantage of
using BibTeX is that there are other ways to fix an encoding problem, as you
can edit the file in a plain text editor. With a custom format, when it's
broken there is nothing the user can do. Moreover, it is definitely
incompatible with older versions. I think quite a lot of users like BibDesk
because of the human-readable format.
> As far as BD2 is concerned, we have a roadmap, but there doesn't seem to
> be much gas for the car :). Personally, I don't have a strong desire
> to relive the years of development and polishing that have gone into BD.
Yeah, I also doubt whether there will ever be a BD2.
I'm still trying to see how we can support Japanese encodings while keeping
the groups as is, but I don't (yet) see how to do it. As the OP of the bug
report noted, Shift-JIS seems to be the standard (perhaps even required?)
encoding for running Japanese LaTeX, so I think it's important to support it.
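
To make the two ideas from this thread concrete (always handing UTF-8 to the
parser, and escaping non-ASCII characters in the group data), here is a rough
sketch. It is in Python purely for illustration, not Cocoa, and it is not how
BibDesk does anything today. The function names are hypothetical, and using
Python's "unicode_escape" codec as the escaping scheme is an assumption; any
reversible ASCII-only escaping would serve the same purpose.

```python
def transcode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode the file's bytes from its own encoding and re-encode as UTF-8.

    Hypothetical helper: the result is what would be handed to the parser
    instead of the file path. Raises UnicodeDecodeError if the bytes are
    not valid in source_encoding.
    """
    return raw.decode(source_encoding).encode("utf-8")


def escape_non_ascii(text: str) -> str:
    """Return an ASCII-only form of text that can be embedded in any
    ASCII-compatible file encoding.

    Assumption: Python's "unicode_escape" codec is the escaping scheme.
    It also escapes backslashes and control characters such as newlines.
    """
    return text.encode("unicode_escape").decode("ascii")


def unescape_non_ascii(escaped: str) -> str:
    """Reverse escape_non_ascii."""
    return escaped.encode("ascii").decode("unicode_escape")


if __name__ == "__main__":
    # A Shift-JIS encoded file body, converted before parsing.
    body = "日本語".encode("shift_jis")
    print(transcode_to_utf8(body, "shift_jis").decode("utf-8"))

    # A group name escaped to plain ASCII and restored again.
    group_name = "日本語 papers"
    escaped = escape_non_ascii(group_name)
    print(escaped)
    print(unescape_non_ascii(escaped))
```

With escaping, the group data would be plain ASCII regardless of the encoding
chosen for the rest of the file, which avoids the mixed encoding, at the cost
of the compatibility break with older versions mentioned above.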