#163 ctags Bus error while processing unicode 16 files.

open
None
5
2007-06-30
2007-04-26
Hardeep Parmar
No

Steps...

1) Create a folder on your machine and extract the attached tar/zip file into it.
2) The tar file contains two files, 1.java and 1.js, in two separate directories.
3) Run ctags on the parent directory.

RESULT:
ctags crashes on Mac OS X 10.5 with a Bus error.
The file 1.js is a text file in UTF-16 encoding.

The crash happens in read.c, inside the function readline, at line 510.

505 {
506 char* eol;
507 vStringSetLength (vLine);
508 /* canonicalize new line */
509 eol = vStringValue (vLine) + vStringLength (vLine) - 1;
510 if (*eol == '\r')
511 *eol = '\n';
512 else if (*(eol - 1) == '\r' && *eol == '\n')
513 {
514 *(eol - 1) = '\n';
515 *eol = '\0';
516 --vLine->length;
517 }
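
One possible reading of the excerpt above, which is an assumption and not something confirmed in this report: if the line length comes out as 0, the expression on line 509 yields a pointer one byte before the start of the buffer, and line 510 then dereferences it. A minimal sketch of a guarded version of the same newline canonicalization follows. The function name canonicalize_newline and its plain buffer/length interface are hypothetical stand-ins, not the vString API used by ctags.

```c
#include <stddef.h>

/* Hypothetical stand-in for the canonicalization step in the excerpt.
 * buf holds len bytes followed by a terminating NUL.
 * Returns the (possibly shortened) length. */
size_t canonicalize_newline(char *buf, size_t len)
{
    char *eol;

    if (len == 0)
        return 0;                 /* empty line: buf + len - 1 would point before buf */

    eol = buf + len - 1;
    if (*eol == '\r') {
        *eol = '\n';              /* lone CR becomes LF */
    } else if (len >= 2 && *(eol - 1) == '\r' && *eol == '\n') {
        *(eol - 1) = '\n';        /* CR LF collapses to a single LF */
        *eol = '\0';
        --len;
    }
    return len;
}
```

The two length checks are the only difference from the excerpt: they keep both eol and eol - 1 inside the buffer.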

If you somehow fix this issue in read.c, it crashes further on in entry.c, inside the function writePatternEntry, at line 777.

768 static int writePatternEntry (const tagEntryInfo *const tag)
769 {
770 char *const line = readSourceLine (TagFile.vLine, tag->filePosition, NULL);
771 const int searchChar = Option.backward ? '?' : '/';
772 boolean newlineTerminated;
773 int length = 0;
774
775 if (tag->truncateLine)
776 truncateTagLine (line, tag->name, FALSE);
777 newlineTerminated = (boolean) (line [strlen (line) - 1] == '\n');
778
779 length += fprintf (TagFile.fp, "%c^", searchChar);
780 length += writeSourceLine (TagFile.fp, line);
781 length += fprintf (TagFile.fp, "%s%c", newlineTerminated ? "$":"", searchChar);
782
783 return length;
784 }
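
Line 777 indexes line [strlen (line) - 1]. If line happens to be an empty string, strlen returns 0 and the index wraps around to a very large unsigned value, which would explain a crash at exactly this statement. That the line is empty here is an assumption, not something verified in this report. A minimal sketch of a guarded check follows; the helper name is_newline_terminated is hypothetical and not part of ctags.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if line ends in '\n', 0 otherwise.
 * An empty string is handled explicitly, so line[len - 1] is only
 * evaluated when len is at least 1. */
int is_newline_terminated(const char *line)
{
    size_t len = strlen(line);
    return len > 0 && line[len - 1] == '\n';
}
```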

From the look of it, and having seen that ctags operates on char throughout its codebase, it seems ctags is not able to parse files in UTF-16 encoding.
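
To illustrate why char-based string handling and UTF-16 do not mix: in UTF-16 every ASCII character carries a zero byte, so strlen stops almost immediately. The sketch below encodes the text "var" plus a newline by hand in both byte orders. The sample bytes and the helper has_utf16_bom are made up for illustration; whether the attached 1.js starts with a byte order mark is not stated in this report.

```c
#include <stddef.h>
#include <string.h>

/* "var\n" in UTF-16 without a byte order mark, followed by a NUL pair.
 * Little-endian: strlen sees 'v' and then a zero byte, so it reports 1.
 * Big-endian: the very first byte is zero, so strlen reports 0. */
const char utf16le_line[] = { 'v', 0, 'a', 0, 'r', 0, '\n', 0, 0, 0 };
const char utf16be_line[] = { 0, 'v', 0, 'a', 0, 'r', 0, '\n', 0, 0 };

/* Hypothetical pre-check: returns 1 if the buffer starts with a UTF-16
 * byte order mark (FF FE for little-endian, FE FF for big-endian). */
int has_utf16_bom(const unsigned char *buf, size_t n)
{
    if (n < 2)
        return 0;
    return (buf[0] == 0xFF && buf[1] == 0xFE) ||
           (buf[0] == 0xFE && buf[1] == 0xFF);
}
```

A tool that only understands 8-bit text could use a check like this to skip or convert such files up front, although files without a byte order mark would still slip through.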

Is this a known bug? Is there a workaround for it?
However, it seems to work fine on 10.4.x systems.

Discussion

  • Hardeep Parmar
    2007-04-26

    tar of Bug files.

     
  • Darren Hiebert
    2007-06-30

    Logged In: YES
    user_id=38016
    Originator: NO

    I am confused by this problem. Even if Unicode encoding is ignored by ctags, the file should still appear as a simple sequence of 8-bit bytes. Because Unicode characters (I believe) can only appear in strings and comments, ctags should ignore them. For ctags to crash at the lines you indicate, the pointer or array index must be bad. I do not understand how this can happen. Perhaps I need your help here, since I only have access to Mac OS X 10.4.10. Can you somehow provide more information on what the pointer and index calculations at these lines turn out to be (i.e. how they differ from what is expected)?

     
  • Darren Hiebert
    2007-06-30

    • assigned_to: nobody --> dhiebert
     
  • Hardeep Parmar
    2007-06-30

    Logged In: YES
    user_id=584079
    Originator: YES

    Yes, in the normal course a source file (e.g. cpp, h) would have UTF-8 chars.
    In this case the file "1.js", which is a JavaScript file, is UTF-16 encoded.