#204 C++ tags file signature bad character

open
None
5
2007-12-16
2007-11-29
Massimo Cora'
No

Hi,

Ctags outputs a bad character (hex value 0xC3) instead of a single space. I parse ctags' output with Anjuta, but it has problems recognizing characters other than normal ASCII ones such as a-zA-Z0-9.
The attached file contains the C++ header file and the ctags 5.7 output.

cmd line was:

ctags --fields=afmiKlnsStz --c++-kinds=+p Accessor.h

bad line [copy and paste from a `less` output] is:

SafeGetCharAt Accessor.h /^ char SafeGetCharAt(int position, char chDefault=' ') {$/;" kind:function line:43 language:C++ class:Accessor access:public signature:(int position, char chDefault=<C3>)

As you can see, the signature field contains a 0xC3 char which shouldn't be there.

thanks and regards,
Massimo

Discussion

  • Massimo Cora'
    2007-11-29

    C++ header and ctags output

     
  • Elliott Hughes
    2007-12-16

    • assigned_to: nobody --> dhiebert
     
  • Elliott Hughes
    2007-12-16

    Logged In: YES
    user_id=1127237
    Originator: NO

    i've seen this before; Ctags effectively does some "preprocessing" of its input, in particular turning all character and string literals into magic characters:

    STRING_SYMBOL = ('S' + 0x80),
    CHAR_SYMBOL = ('C' + 0x80)

    this has two bad side-effects. one is the one you're seeing, and the other is that there are now UTF-8 identifiers that appear to have character and/or string literals embedded within them.
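
To make the arithmetic concrete, here is a minimal sketch of the two placeholder values quoted above. Only the enum comes from the comment; the helper names `is_literal_placeholder` and `is_utf8_two_byte_lead` are hypothetical and are not part of ctags. It shows that `'C' + 0x80` is exactly the 0xC3 byte from the report, and that both placeholders fall in the range UTF-8 uses for the lead byte of a two-byte sequence (for example, "é" is encoded as 0xC3 0xA9), which is why the two side-effects arise.

```c
#include <stdbool.h>

/* Placeholder values as quoted in the comment above. */
enum {
    STRING_SYMBOL = ('S' + 0x80),  /* 0x53 + 0x80 = 0xD3 */
    CHAR_SYMBOL   = ('C' + 0x80)   /* 0x43 + 0x80 = 0xC3 */
};

/* Hypothetical helper: true if the byte equals one of the two placeholders. */
bool is_literal_placeholder(unsigned char byte)
{
    return byte == STRING_SYMBOL || byte == CHAR_SYMBOL;
}

/* Hypothetical helper: true if a UTF-8 decoder would read the byte as the
 * lead byte of a two-byte sequence (0xC2 through 0xDF). */
bool is_utf8_two_byte_lead(unsigned char byte)
{
    return byte >= 0xC2 && byte <= 0xDF;
}
```

Because both placeholders are also legal UTF-8 lead bytes, a single byte value cannot tell a consumer whether it is looking at a replaced literal or at the start of a real non-ASCII character.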

     
  • Massimo Cora'
    2007-12-19

    Logged In: YES
    user_id=815090
    Originator: YES

    Ok. So should I expect to see this bug (feature?) squashed in the next ctags release? Or should I find a workaround to handle those UTF-8 chars?

    thanks and regards,
    Massimo

     
  • Darren Hiebert
    2007-12-28

    Logged In: YES
    user_id=38016
    Originator: NO

    Elliott is correct about the internal parser. Prior to this problem, there was no reason for ctags to retain string values, so once parsed, they were replaced with a custom token for internal processing.

    It would seem to me that the best way to fix this would be to eliminate the default values from the signature. In this particular case, it would mean changing the signature to "signature:(int position, char chDefault)". Thoughts?
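
As an illustration of that proposal, here is a minimal sketch that drops default values from a signature string. `strip_default_values` is a hypothetical helper written for this example and is not ctags code. It assumes the signature is a parenthesized, comma-separated parameter list like the one in the report, and it does not try to handle commas inside template arguments.

```c
#include <stddef.h>

/* Hypothetical helper: copies sig into out, omitting each "=default" part
 * of a top-level parameter. Returns 0 on success, -1 if out is too small. */
int strip_default_values(const char *sig, char *out, size_t out_size)
{
    size_t n = 0;
    int depth = 0;     /* parenthesis nesting level */
    int skipping = 0;  /* nonzero while inside a default value */

    for (const char *p = sig; *p != '\0'; ++p) {
        char c = *p;

        if (skipping) {
            /* Track parentheses nested inside the default, e.g. "= f(1, 2)". */
            if (c == '(') { ++depth; continue; }
            if (c == ')' && depth > 1) { --depth; continue; }
            if (depth > 1 || (c != ',' && c != ')')) continue;
            skipping = 0;  /* reached the end of this parameter */
        } else if (c == '=' && depth == 1) {
            /* Drop whitespace already copied before the '='. */
            while (n > 0 && out[n - 1] == ' ') --n;
            skipping = 1;
            continue;
        }

        if (c == '(') ++depth;
        else if (c == ')') --depth;

        if (n + 1 >= out_size) return -1;
        out[n++] = c;
    }

    if (out_size == 0) return -1;
    out[n] = '\0';
    return 0;
}
```

Given the signature from the report, with the raw 0xC3 byte after the `=`, this yields `(int position, char chDefault)`, which matches the form suggested above.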

     
  • Massimo Cora'
    2008-01-03

    Logged In: YES
    user_id=815090
    Originator: YES

    Well, if storing default values in the tags file is a problem, it's ok for me to eliminate them. At least my program won't suffer from this removal.

     
  • Logged In: YES
    user_id=68699
    Originator: NO

    We worked around this problem by reading the ctags stream as binary data, without converting it to UTF-8, so this is no longer a major issue for us.
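
A byte-level workaround along those lines might look like the sketch below. It is not the commenter's actual code: `replace_magic_bytes` is a hypothetical helper, and the rule it applies is a heuristic assumed for this example. The line is treated as raw bytes, and a placeholder byte (0xC3 or 0xD3) is replaced only when it is not followed by a UTF-8 continuation byte, so genuine two-byte characters such as "é" (0xC3 0xA9) are left alone.

```c
#include <stddef.h>

/* Placeholder values as quoted earlier in the discussion. */
enum {
    STRING_SYMBOL = ('S' + 0x80),  /* 0xD3 */
    CHAR_SYMBOL   = ('C' + 0x80)   /* 0xC3 */
};

/* Hypothetical helper: scans a raw tag line and overwrites stray placeholder
 * bytes with `placeholder`. Returns the number of bytes replaced. */
size_t replace_magic_bytes(unsigned char *buf, size_t len,
                           unsigned char placeholder)
{
    size_t replaced = 0;

    for (size_t i = 0; i < len; ++i) {
        if (buf[i] != CHAR_SYMBOL && buf[i] != STRING_SYMBOL)
            continue;

        /* A real UTF-8 lead byte is followed by a byte of the form 10xxxxxx. */
        int has_continuation = (i + 1 < len) && ((buf[i + 1] & 0xC0) == 0x80);
        if (!has_continuation) {
            buf[i] = placeholder;
            ++replaced;
        }
    }
    return replaced;
}
```

After this pass the line contains no stray placeholder bytes, so a consumer can decode it as UTF-8 if it needs to. The heuristic can still misfire if a replaced literal happens to be followed directly by a continuation byte.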