Exuberant Ctags / Feature Requests / #44 More features for -I (ignored identifiers)

Elliott Hughes - 2008-04-13

Logged In: YES
user_id=1127237
Originator: NO

yes, it's not obvious without looking at the source, but when the documentation says that -Ia=b replaces the identifier "a" with the identifier "b", it means exactly that: it only works on identifiers. '-I___PVOID=()', which is what i assume you tried, doesn't work the way you intend because the '()' is parsed as an identifier.

i think the -I stuff would need rewriting to take place much earlier than it currently does.

-*-

while i'm here, i'll mention that there's something i'd like to do with -I that i can't currently. i'd like to be able to translate something like:

MACRO_NAME(ClassName, MemberFunctionName) {...}

to something like:

void ClassName::MemberFunctionName() {...}

at the moment, these files have lots of MACRO_NAME tags instead. (this, though, would be even more complicated and deserves a separate feature request, but seemed worth mentioning to anyone looking at this issue with an eye to improving -I!)
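For anyone who wants to experiment before such a feature exists, the translation above can be approximated by a pre-pass that runs outside ctags. This is a minimal sketch, not something -I can do today: the function name `rewrite_macro_definitions` is hypothetical, and it assumes the macro always takes exactly two plain identifiers and that a `void` return type with an empty parameter list is acceptable, as in the example.

```python
import re

# Matches MACRO_NAME(ClassName, MemberFunctionName) with optional spaces.
# Assumption: both arguments are plain identifiers, never expressions.
MACRO_CALL = re.compile(r"\bMACRO_NAME\(\s*(\w+)\s*,\s*(\w+)\s*\)")


def rewrite_macro_definitions(source: str) -> str:
    """Rewrite each macro invocation as a C++ member function header."""
    return MACRO_CALL.sub(r"void \1::\2()", source)


if __name__ == "__main__":
    print(rewrite_macro_definitions(
        "MACRO_NAME(ClassName, MemberFunctionName) {...}"))
```

Note that the rewritten text differs from the original, so tags that locate definitions by search pattern would no longer match the unmodified source file.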


Elliott Hughes - 2008-04-13

assigned_to: nobody --> dhiebert

Derek Peschel - 2008-05-01

Logged In: YES
user_id=1258800
Originator: YES

You're right, Elliott, I did '-I___PVOID=()'. I didn't completely expect
it to work but it was worth trying. I don't know how hard the
implementation would be, whether it would mess something else up, or
whether there are other common simple substitutions that should be
supported. But this request has been assigned, which is a good sign!

IMO your additional feature should be a separate request with a link back
to this one, so that the two requests can be dealt with separately.

For the record, just now I finally realized why the ARGDECL4 example works.
It is contrived so that Exuberant can remove the word "ARGDECL4" without
knowing that the following list was meant for the preprocessor. Then
Exuberant's parser can treat the leftover list as the signature part of a
function definition (or declaration, I'm not sure which, since the example
doesn't follow the list with a left brace or semicolon).
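The effect described above can be modelled in a few lines. This is only a toy illustration of "delete the identifier and parse what is left", not how Exuberant implements -I; the helper name `ignore_identifier` is hypothetical, and the input line is modelled on the ARGDECL4 example rather than quoted from the man page.

```python
import re


def ignore_identifier(source: str, name: str) -> str:
    """Delete whole-word occurrences of `name`, leaving everything else."""
    return re.sub(r"\b" + re.escape(name) + r"\b", "", source)


if __name__ == "__main__":
    line = "int foo ARGDECL4(void *, ptr, long int, nbytes)"
    # What remains looks like an ordinary signature for a function "foo".
    print(ignore_identifier(line, "ARGDECL4"))
```

Once the macro name is gone, the parenthesized list directly follows `foo`, so a C parser can take `foo` as the function name without knowing the list was written for the preprocessor.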

Darren -- The prototype macros I've seen usually take one argument in an
extra set of parentheses. The 4 in "ARGDECL4" hints at a fixed number of
arguments. But if the man page had a second copy of the line without the
"ARGDECL4", or maybe a definition of the ARGDECL4 macro, then the principle
of being able to read the argument list as normal C, and the difference
between ARGDECL4 and the usual modern version, would be clearer. What do
you think?


Derek Peschel - 2008-05-13

Logged In: YES
user_id=1258800
Originator: YES

After experimenting with etags, I understand the limitations of its
regexp extension feature. Perhaps my experience will help someone
design a better extension language (which -I is a form of) for Exuberant.

The biggest limitation is that etags is running its built-in C parser
at the same time it's scanning extension regexps. You can give a few
command-line options to include or exclude certain entries from the tags
file, but those options don't control the behavior of the parser.
(By the time the entries are generated, the parser has already handled
at least one line and maybe more.) I know very little about how the parser
reacts to regexp-scanned text. Certainly there is no syntax for forcing
the scan to work only in a given parser state, or putting the parser in a
given state after the scan.

Also, multiline regexps don't cause all lines to be skipped by the parser.
My multiline regexp matching ___P macro bodies worked, but some of the
lines matched a second regexp I wrote to detect global variables. (I had
to detect them myself because the built-in global variable parser worked
so badly with my files.) That made me give up on the whole idea of using
extension regexps.

Also, the extension language doesn't give you much control over which
part of the source text to put into the tag entry. You can control the
name stored with the entry. You might be able to control the source text
by making the regexp match more or less of the original, but that would
probably create extra entries or miss entries that should be found.

I tried preprocessing the source. Then etags's parser sees normal C,
no regexp matching needs to happen, and the tags file contains the right
portions of the text (as determined by the parser, which is smart about
where an entry should start and stop). Unfortunately the text is the
preprocessed text and not the original. So the tags file is useless for
looking things up in the original.

But I thought I was on the right track. I ended up writing a specialized
expander that only knows about certain macros. The expanded text doesn't
have to compile. It has to not confuse etags's parser and it has to cause
etags to generate appropriate entries. (For example, if a macro is used
in function prototypes, the expanded text must work as part of a function
prototype.)

I needed to use a few more tricks. The expander preserves line breaks and
keeps the same number of characters in each line (which the C preprocessor
doesn't) so the line numbers and byte offsets in the tags file can be used
with the original file. The only remaining problem is the text portions
in the tags file. Luckily the expansion is reversible. I wrote an
un-expander which works on the tags file, so that the final tags file
contains text from the original source file.
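To make the idea concrete, here is a minimal sketch of a length-preserving expander and a matching un-expander. It is not the code described above. The macro name `PROTO`, both function names, and the blank-it-out strategy are assumptions chosen for illustration. The un-expander also assumes each etags entry has the form `text<0x7f>[name<0x01>]line,offset`, that the offset points at the start of the tagged line, and that the source is single-byte text so character and byte offsets coincide.

```python
def expand_same_length(text, macro="PROTO"):
    """Blank out `macro(` and its matching `)` without moving any character."""
    out = list(text)
    needle = macro + "("
    start = text.find(needle)
    while start != -1:
        open_pos = start + len(macro)          # index of the macro's "("
        prev = text[start - 1] if start > 0 else " "
        if not (prev.isalnum() or prev == "_"):  # skip names like MYPROTO
            depth = 0
            for pos in range(open_pos, len(text)):
                if text[pos] == "(":
                    depth += 1
                elif text[pos] == ")":
                    depth -= 1
                    if depth == 0:             # matching close found
                        for k in range(start, open_pos + 1):
                            out[k] = " "
                        out[pos] = " "
                        break
        start = text.find(needle, open_pos + 1)
    return "".join(out)


def unexpand_entry(entry, original):
    """Replace the text of one etags entry with the original source text."""
    text, rest = entry.split("\x7f", 1)
    offset = int(rest.rsplit(",", 1)[1])       # offset of the tagged line
    return original[offset:offset + len(text)] + "\x7f" + rest
```

Because the expander only overwrites characters with spaces and never touches newlines, every line number and offset computed on the expanded text is also valid for the original file, which is what lets the un-expander copy the original text back by position.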

It's interesting that this complicated preprocessing method works better
than the supposedly-simple regexp method.

