#44 More features for -I (ignored identifiers)

open
None
5
2008-04-13
2008-04-06
Derek Peschel
No

The -I feature is not as general-purpose as GNU etags's regexps, but is still very useful if you want to parse a slightly altered version of C (with extra type names, or similar).

Unfortunately it doesn't seem to work with the code I'm trying to parse. I believe I need two things:

For cases like

void ___disable_user_interrupt ___PVOID
{
}

the ___PVOID must expand into () so that ctags knows this is a function definition. I'd like to be able to give -I ___PVOID=() as an option.

Perhaps ctags can already handle this type of macro, and I just haven't gotten it working yet? The ARGDECL4 example in the manual looks promising.

For cases like

___HIDDEN void extensible_string_cleanup
___P((extensible_string *str),
(str)
extensible_string *str;)
{
}

the ___P and parenthesized list afterward must also expand into (). I'd like to be able to give -I ___P+=() as an option but currently + and = can't be used together as far as I know.

In both these examples, the line with the function name should be added to the tags file, not any of the other lines scanned while matching parentheses, or the line with the brace.
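To make the two requested rewrites concrete, here is a minimal sketch of them as a text-level pass in Python. It is not how ctags processes -I today, and the helper names (`skip_balanced`, `expand`) are made up for illustration. It assumes the macros never appear inside strings or comments.

```python
import re

def skip_balanced(text, start):
    """Return the index just past the parenthesized group that opens at text[start]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced parentheses")

def expand(source):
    # Requested "-I ___PVOID=()": the bare identifier becomes an empty argument list.
    source = re.sub(r"\b___PVOID\b", "()", source)
    # Requested "-I ___P+=()": the identifier and its whole argument list become "()".
    out, pos = [], 0
    for m in re.finditer(r"\b___P\b\s*(?=\()", source):
        if m.start() < pos:
            continue  # already consumed as part of an earlier argument list
        out.append(source[pos:m.start()])
        out.append("()")
        pos = skip_balanced(source, m.end())
    out.append(source[pos:])
    return "".join(out)
```

After this pass both examples read as ordinary C function definitions, with the function name still on its original line.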

Thanks!

Discussion

  • Elliott Hughes
    2008-04-13

    Logged In: YES
    user_id=1127237
    Originator: NO

    yes, it's not obvious without looking at the source, but when the documentation says that -Ia=b replaces the identifier "a" with the identifier "b", it means exactly that: it only works on identifiers. '-I___PVOID=()', which is what i assume you tried, doesn't work the way you intend because the '()' is parsed as an identifier.

    i think the -I stuff would need rewriting to take place much earlier than it currently does.

    -*-

    while i'm here, i'll mention that there's something i'd like to do with -I that i can't currently. i'd like to be able to translate something like:

    MACRO_NAME(ClassName, MemberFunctionName) {...}

    to something like:

    void ClassName::MemberFunctionName() {...}

    at the moment, these files have lots of MACRO_NAME tags instead. (this, though, would be even more complicated and deserves a separate feature request, but seemed worth mentioning to anyone looking at this issue with an eye to improving -I!)
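The translation described in the comment above could be sketched as a regular-expression rewrite. This is only an illustration, not an existing ctags feature: `MACRO_NAME` is the placeholder from the comment, `rewrite_member_macro` is a hypothetical name, and the `void` return type is copied from the example rather than derived from the macro.

```python
import re

# Matches MACRO_NAME(ClassName, MemberFunctionName), capturing both arguments.
MACRO_CALL = re.compile(r"\bMACRO_NAME\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)")

def rewrite_member_macro(line):
    """Rewrite the macro call as a C++ member function header."""
    return MACRO_CALL.sub(r"void \1::\2()", line)
```

The rewritten line changes length, so a tags file built from it would point at the right line but not the right text. That is one reason this deserves its own request.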

     
  • Elliott Hughes
    2008-04-13

    • assigned_to: nobody --> dhiebert
     
  • Derek Peschel
    2008-05-01

    Logged In: YES
    user_id=1258800
    Originator: YES

    You're right, Elliott, I did '-I___PVOID=()'. I didn't completely expect
    it to work but it was worth trying. How hard the implementation actually
    is, if it would mess something else up, or if there are other common simple
    substitutions that should be supported, I don't know. But this request
    has been assigned which is a good sign!

    IMO your additional feature should be a separate request with a link back
    to this one, so that the two requests can be dealt with separately.

    For the record, just now I finally realized why the ARGDECL4 example works.
    It is contrived so that Exuberant can remove the word "ARGDECL4" without
    knowing that the following list was meant for the preprocessor. Then
    Exuberant's parser can treat the leftover list as the signature part of a
    function definition (or declaration, I'm not sure which, since the example
    doesn't follow the list with a left brace or semicolon).

    Darren -- The prototype macros I've seen usually take one argument in an
    extra set of parentheses. The 4 in "ARGDECL4" hints at a fixed number of
    arguments. But if the man page had a second copy of the line without the
    "ARGDECL4", or maybe a definition of the ARGDECL4 macro, then the principle
    of being able to read the argument list as normal C, and the difference
    between ARGDECL4 and the usual modern version, would be clearer. What do
    you think?

     
  • Derek Peschel
    2008-05-13

    Logged In: YES
    user_id=1258800
    Originator: YES

    After experimenting with etags, I understand the limitations of its
    regexp extension feature. Perhaps my experience will help someone
    design a better extension language (which -I is a form of) for Exuberant.

    The biggest limitation is that etags is running its built-in C parser
    at the same time it's scanning extension regexps. You can give a few
    command-line options to include or exclude certain entries from the tags
    file, but those options don't control the behavior of the parser.
    (By the time the entries are generated, the parser has already handled
    at least one line and maybe more.) I know very little about how the parser
    reacts to regexp-scanned text. Certainly there is no syntax for forcing
    the scan to work only in a given parser state, or putting the parser in a
    given state after the scan.

    Also, multiline regexps don't cause all lines to be skipped by the parser.
    My multiline regexp matching ___P macro bodies worked, but some of the
    lines matched a second regexp I wrote to detect global variables. (I had
    to detect them myself because the built-in global variable parser worked
    so badly with my files.) That made me give up on the whole idea of using
    extension regexps.

    Also, the extension language doesn't give you much control over which
    part of the source text to put into the tag entry. You can control the
    name stored with the entry. You might be able to control the source text
    by making the regexp match more or less of the original, but that would
    probably create extra entries or miss entries that should be found.

    I tried preprocessing the source. Then etags's parser sees normal C,
    no regexp matching needs to happen, and the tags file contains the right
    portions of the text (as determined by the parser, which is smart about
    where an entry should start and stop). Unfortunately the text is the
    preprocessed text and not the original. So the tags file is useless for
    looking things up in the original.

    But I thought I was on the right track. I ended up writing a specialized
    expander that only knows about certain macros. The expanded text doesn't
    have to compile. It has to not confuse etags's parser and it has to cause
    etags to generate appropriate entries. (For example, if a macro is used
    in function prototypes, the expanded text must work as part of a function
    prototype.)

    I needed to use a few more tricks. The expander preserves line breaks and
    keeps the same number of characters in each line (which the C preprocessor
    doesn't) so the line numbers and byte offsets in the tags file can be used
    with the original file. The only remaining problem is the text portions
    in the tags file. Luckily the expansion is reversible. I wrote an un-
    expander which works on the tags file, so that the final tags file
    contains text from the original source file.

    It's interesting that this complicated preprocessing method works better
    than the supposedly-simple regexp method.
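The expander and un-expander described above were not posted, so what follows is only a sketch of the idea under stated assumptions, not the actual tool. All function names are hypothetical. It assumes single-byte source text (so character counts equal byte offsets), macros that never appear in strings or comments, and etags entries of the form text, DEL (0x7f), optional name and SOH (0x01), then line,offset, where the text is a prefix of the source line. Because every line keeps its length, the un-expander here simply looks up the original line by number. The tool described above may have reversed the expansion differently.

```python
import re

MACRO = re.compile(r"\b___P(VOID)?\b")

def balanced_end(text, start):
    """Return the index just past the parenthesized group that opens at text[start]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced parentheses")

def pad_expand(source):
    """Turn ___PVOID and ___P(...) into "()" padded with spaces, keeping line lengths."""
    chars = list(source)
    pos = 0
    for m in MACRO.finditer(source):
        if m.start() < pos:
            continue  # inside an argument list that was already blanked
        end = m.end()
        if m.group(1) is None:
            # Plain ___P must be followed by an argument list; skip it otherwise.
            j = end
            while j < len(source) and source[j] in " \t":
                j += 1
            if j >= len(source) or source[j] != "(":
                continue
            end = balanced_end(source, j)
        for i in range(m.start(), end):
            if chars[i] != "\n":  # line breaks are preserved
                chars[i] = " "
        chars[m.start()] = "("
        chars[m.start() + 1] = ")"
        pos = end
    return "".join(chars)

def unexpand_entry(entry, original_lines):
    """Replace the expanded text in one etags entry with the original source text."""
    text, rest = entry.split("\x7f", 1)
    line_no = int(rest.split("\x01")[-1].split(",")[0])
    return original_lines[line_no - 1][:len(text)] + "\x7f" + rest
```

Running `pad_expand` on the first example from the request leaves `void ___disable_user_interrupt ()` followed by six spaces, so the line numbers and byte offsets etags records still match the original file.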