#163 ctags Bus error while processing UTF-16 files



1) Create a folder on your machine and extract the attached tar/zip file into it.
2) The tar file contains two files, 1.java and 1.js, in two separate directories.
3) Run ctags on the parent directory.

ctags crashes on Mac OS X 10.5 with a Bus error.
The file 1.js is a text file in UTF-16 encoding.

The crash happens in read.c, inside the function readline, at line 510.

505 {
506 char* eol;
507 vStringSetLength (vLine);
508 /* canonicalize new line */
509 eol = vStringValue (vLine) + vStringLength (vLine) - 1;
510 if (*eol == '\r')
511 *eol = '\n';
512 else if (*(eol - 1) == '\r' && *eol == '\n')
513 {
514 *(eol - 1) = '\n';
515 *eol = '\0';
516 --vLine->length;
517 }
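
One way the pointer at line 510 could go bad, assuming the line buffer ends up with length zero (for example when the first byte read from a UTF-16 file is a NUL), is that eol then points one byte before the buffer. This is only a guess at the cause, not a confirmed diagnosis. A minimal sketch of a length-checked version of the same canonicalization, using a plain char buffer and a hypothetical helper canonicalize_newline rather than the vString API in ctags:

```c
#include <stddef.h>

/* Hypothetical helper, not part of ctags: normalizes a trailing "\r" or
 * "\r\n" to "\n" and returns the new length. The length checks keep every
 * access inside the buffer, even when len is 0 or 1. */
size_t canonicalize_newline(char *buf, size_t len)
{
    if (len == 0)
        return 0;                  /* empty line: buf[len - 1] would be out of bounds */
    if (buf[len - 1] == '\r') {
        buf[len - 1] = '\n';       /* a lone CR becomes LF */
    } else if (len >= 2 && buf[len - 2] == '\r' && buf[len - 1] == '\n') {
        buf[len - 2] = '\n';       /* CR LF collapses to a single LF */
        buf[len - 1] = '\0';
        --len;
    }
    return len;
}
```

The len >= 2 test matters as much as the len == 0 test: a one-byte line consisting of "\n" would otherwise read buf[-1] in the second branch.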

If you somehow fix this issue here in read.c, it crashes later in entry.c, inside the function writePatternEntry, at line 777.

768 static int writePatternEntry (const tagEntryInfo *const tag)
769 {
770 char *const line = readSourceLine (TagFile.vLine, tag->filePosition, NULL);
771 const int searchChar = Option.backward ? '?' : '/';
772 boolean newlineTerminated;
773 int length = 0;
775 if (tag->truncateLine)
776 truncateTagLine (line, tag->name, FALSE);
777 newlineTerminated = (boolean) (line [strlen (line) - 1] == '\n');
779 length += fprintf (TagFile.fp, "%c^", searchChar);
780 length += writeSourceLine (TagFile.fp, line);
781 length += fprintf (TagFile.fp, "%s%c", newlineTerminated ? "$":"", searchChar);
783 return length;
784 }
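
Line 777 indexes line [strlen (line) - 1], which is only valid when the string is non-empty. With UTF-16 data that assumption can fail: in big-endian UTF-16 the first byte of every ASCII character is 0x00, so strlen returns 0 and the index wraps around. Whether that is what happens with the attached 1.js is an assumption on my part. A sketch of a guarded check, using a hypothetical helper ends_with_newline that is not ctags code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not ctags code: true only if the C string is
 * non-empty and its last byte is '\n'. Checking len first avoids
 * evaluating line[(size_t)0 - 1] on an empty string. */
bool ends_with_newline(const char *line)
{
    size_t len = strlen(line);
    return len > 0 && line[len - 1] == '\n';
}
```

Called on a buffer holding the big-endian UTF-16 bytes 00 76 00 0A, this returns false, because the string looks empty to strlen even though the buffer contains a newline character.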

From what I can see, and given that ctags operates on char throughout its codebase, it seems ctags is not able to parse UTF-16 encoded files.
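
As a possible stopgap, a char-based tool could detect such files up front and skip them instead of parsing them. A minimal sketch, assuming the file starts with a byte-order mark (I have not checked whether 1.js does); has_utf16_bom is a hypothetical name, not an existing ctags function:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-check, not part of ctags: reports whether a buffer
 * starts with a UTF-16 byte-order mark (FF FE = little-endian,
 * FE FF = big-endian). UTF-16 files without a BOM are not detected,
 * and FF FE 00 00 (the UTF-32LE BOM) is not told apart from UTF-16LE. */
bool has_utf16_bom(const unsigned char *buf, size_t n)
{
    if (n < 2)
        return false;              /* too short to hold a BOM */
    return (buf[0] == 0xFF && buf[1] == 0xFE) ||
           (buf[0] == 0xFE && buf[1] == 0xFF);
}
```

The caller would read the first two bytes of each input file and pass them in; plain ASCII or UTF-8 source never starts with either pair, so ordinary files are unaffected.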

Is this a known bug? Is there a workaround for it?
However, it seems to work fine on 10.4.x systems.


  • Hardeep Parmar

    Hardeep Parmar - 2007-04-26

    tar of Bug files.

  • Darren Hiebert

    Darren Hiebert - 2007-06-30

    Logged In: YES
    Originator: NO

    I am confused by this problem. Even if Unicode encoding is ignored by ctags, it should still appear as a simple sequence of 8-bit bytes. Because Unicode characters (I believe) can only appear in strings and comments, ctags should ignore them. For ctags to crash at the lines you indicate, the pointer or array index must be bad. I do not understand how this can happen. Perhaps I need your help here, since I only have access to Mac OS X 10.4.10. Can you somehow provide more information on what the pointer and index calculations at these lines turn out to be (i.e. how they differ from what is expected)?

  • Darren Hiebert

    Darren Hiebert - 2007-06-30
    • assigned_to: nobody --> dhiebert
  • Hardeep Parmar

    Hardeep Parmar - 2007-06-30

    Logged In: YES
    Originator: YES

    Yes, in the normal course a source file (e.g. cpp, h) would have UTF-8 chars.
    In this case the file "1.js", which is a JavaScript file, is UTF-16 encoded.

