
#106 Apertium-tagger can't flush error messages/woes of mixing cerr and wcerr.

Status: closed
Owner: nobody
Labels: None
Updated: 2016-06-12
Created: 2016-06-10
Private: No

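Apertium and lttoolbox mix wcerr and cerr pretty much everywhere; grepping shows the two are used about equally often. This is against the spec (the C++ standard says that mixing operations on corresponding narrow and wide streams follows the same rules as mixing them on a C FILE, whose byte or wide orientation is fixed by the first operation on it), but it works to some extent. In a toy program of my own I noticed that this sometimes mixes up the output, although I haven't actually seen anything like that in Apertium.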

The specific bug I've observed is in apertium-tagger. The error messages for a bad command-line argument are sent to cerr after wcerr has already been written to, which stops them from being flushed before the program terminates, so nothing is displayed. (I actually ran into this bug when I first tested apertium-tagger, but assumed the program just wasn't checking its arguments properly.) Adding << endl doesn't seem to help. Maybe it's possible to work around this? I think the solution that would cause the fewest headaches in the long term is to agree on one of the two and document "only use this one" somewhere (a coding standards page on the wiki?).

I've experimented with converting everything to wostreams (search and replace, then fixing the errors that came up) in this commit: https://github.com/frankier/apertium-core/commit/f418d24dfeccc51feb618e5991f86ca9e53c362b Having done so, though, I decided this might be the wrong approach. The TMX serialiser in Apertium writes to cout if no file is given, and I think this use case of writing octets to a pipe may be needed more generally. So maybe we could instead switch everything to cout and simply assume UTF-8 (then fix any additional bugs this causes on a case-by-case basis).

Input very welcome.

Discussion

  • Frankie Robertson

    Okay, so I've changed all cerr to wcerr in Apertium and lttoolbox and documented the new approach on the wiki (http://wiki.apertium.org/wiki/Code_style#String_encoding). This is fairly safe, since nothing mission-critical should be sent to stderr anyway. The one problem this may have introduced is that UTF-8 data may previously have been sent to cerr. That could now cause problems, since piping a char* or string to wcerr probably only works properly if the data is 7-bit clean (plain ASCII). This can be fixed on a case-by-case basis by UTF-8-decoding the string/char* before outputting it.

  • Frankie Robertson

    • status: open --> closed
