Johannes Zellner wrote:
> On Wed, Dec 06, 2000 at 11:14:19PM -0600, Darren Hiebert wrote:
> > - Added support for Posix regular expressions.
> I didn't manage to get this working. Would you mind giving an example
> (maybe also in the man page) ? -- I tried as very simple example
> ctags --regex='/.*Darren.*/a/e' QUOTES
> but got an empty tags file.
Well, this is one of the issues I wanted to discuss on the list.
Bear in mind that ctags attempts to determine the language of a file
by examining its extension (the part of the file name following the
last "."). Ctags has a built-in list of extensions and the languages
they are mapped to (customizable using the --langmap option). For
each file it examines, ctags checks whether the file is mapped
to a language and, if so, invokes the parser for the mapped language
on the file. If the extension of the file is not mapped to a
language, that file is ignored (this allows you to run "ctags *" or
"ctags -R" and ignore .o files, etc.), making ctags pretty automatic
in most cases.
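
A rough sketch of the lookup described above, in Python rather than ctags' own C. The map contents and the name language_for are made up for illustration, and os.path.splitext only stands in for whatever ctags does internally to find the extension:

```python
import os

# Hypothetical, abbreviated extension map; the real built-in list is larger
# and can be changed with --langmap.
LANGMAP = {".c": "C", ".h": "C", ".pl": "Perl", ".py": "Python"}

def language_for(path, langmap=LANGMAP):
    """Return the language mapped to the file's extension, or None if unmapped."""
    _, ext = os.path.splitext(path)
    return langmap.get(ext)

for name in ["main.c", "script.pl", "main.o", "QUOTES"]:
    lang = language_for(name)
    # An unmapped extension (including the empty one) means the file is skipped.
    print(name, "->", lang if lang else "ignored")
```

Under these assumptions, both main.o and QUOTES fall through the map and are skipped.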
In the case you cite, QUOTES has a null extension -- and a null
extension is not mapped to any language, causing the file to be
ignored. Obviously, this does not match what you expected.
When I was implementing regex in ctags, I first took the approach
that Gnu etags takes: every file parsed by etags is run through both
its parser and any regular expressions defined. It then occurred to me
that this approach limited the usefulness of regex, since a
particular regular expression may only make sense for a particular
kind of file. I then realized that a user could define a
set of regular expressions for each supported language, allowing you
to write a regular expression that applied only to Perl files, for
example (e.g. using "--perl-regex=/.../.../"). This led me to
the idea that I could allow the user to define arbitrary
languages, each with its own regular expressions, then map those
user-defined languages to file extensions. This means a user could
have support for any language for which they were willing to write
regular expressions.
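
To make the per-language idea concrete, here is a minimal Python sketch. It is not how ctags implements this; the tables, the patterns, the "mylang" language and the function find_tags are all hypothetical:

```python
import os
import re

# Hypothetical tables: extension -> language, and language -> its own patterns.
LANGMAP = {".pl": "perl", ".my": "mylang"}
LANG_REGEXES = {
    "perl": [re.compile(r"^sub\s+(\w+)")],
    "mylang": [re.compile(r"^define\s+(\w+)")],
}

def find_tags(path, lines):
    """Apply only the patterns of the language the file is mapped to."""
    lang = LANGMAP.get(os.path.splitext(path)[1])
    if lang is None:
        return []  # unmapped files are never run through any pattern
    tags = []
    for lineno, line in enumerate(lines, start=1):
        for pattern in LANG_REGEXES.get(lang, []):
            match = pattern.search(line)
            if match:
                tags.append((match.group(1), path, lineno))
    return tags

sample = ["sub foo {", "define bar"]
print(find_tags("a.pl", sample))   # only the Perl pattern is tried
print(find_tags("a.my", sample))   # only the user-defined language's pattern is tried
print(find_tags("a.o", sample))    # unmapped extension, nothing is tried
```

The point of the design is that a pattern written for one kind of file is never tried against another kind.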
Now this left me with the question as to whether it was appropriate
to support both language-specific regular expressions *and* regular
expressions which would be applied to every file parsed by ctags
(note that this says "parsed by ctags", not "passed to ctags",
because you could end up with a mess if ctags runs your object files
through regular expressions). The result is that both are in there
now. To support this, I added an internal pseudo-language, "regex",
not mapped to any file extensions, meaning that your example would
work if you had supplied either of the options
"--language-force=regex" or "--langmap=regex:." to ctags.
So I ask our readership for some ideas as to how to apply regular
expressions in a manner which is intuitive, but which retains the
power that I think one can gain from defining their own languages
with language-specific regular expressions.
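
For reference while weighing the options, here is a rough Python sketch of the current behavior with the "regex" pseudo-language. The names are hypothetical, and treating "--langmap=regex:." as a mapping for the empty extension is an assumption about what that option does:

```python
import os
import re

# General patterns (those given with --regex) belong to the pseudo-language "regex".
REGEXES = {"regex": [re.compile(r".*Darren.*")]}

def pick_language(path, langmap, forced=None):
    """A forced language wins; otherwise fall back to the extension map."""
    if forced is not None:
        return forced
    return langmap.get(os.path.splitext(path)[1])

def matching_lines(path, lines, langmap, forced=None):
    lang = pick_language(path, langmap, forced)
    if lang is None:
        return []  # file ignored, which is what produced the empty tags file
    patterns = REGEXES.get(lang, [])
    return [line for line in lines if any(p.search(line) for p in patterns)]

quotes = ["A quote by Darren Hiebert", "A quote by someone else"]
# No mapping for the empty extension: QUOTES is ignored.
print(matching_lines("QUOTES", quotes, langmap={}))
# Analogous to --language-force=regex.
print(matching_lines("QUOTES", quotes, langmap={}, forced="regex"))
# Analogous to --langmap=regex:. (empty extension mapped to "regex").
print(matching_lines("QUOTES", quotes, langmap={"": "regex"}))
```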
One approach is to apply general regular expressions specified
with --regex to every file passed to ctags, even if ctags would not
normally parse the file because its extension is not mapped to a language,
and even if it happens to be a 25MB object file. However, because
ctags has changed to using line-oriented I/O internally to support
regex, and because ctags implements all buffers with no fixed limits
(every buffer is growable, meaning there are no internal line length
limits), an object file may contain an arbitrarily
long line, resulting in ctags attempting to malloc an
impossibly long buffer.
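
The concern can be illustrated with a small Python sketch (not ctags code; longest_line and the sample data are made up). Line-oriented reading splits only on newline bytes, so binary data that happens to contain none comes back as a single "line" as long as the whole input:

```python
import io

def longest_line(stream):
    """Length of the longest newline-delimited line in a binary stream."""
    return max((len(line) for line in stream), default=0)

# Ordinary source text: every line is short.
source = io.BytesIO(b"int main(void)\n{\n    return 0;\n}\n")

# Stand-in for an object file: 900,000 bytes, none of which is a newline (0x0A).
blob = io.BytesIO(bytes(range(1, 10)) * 100_000)

print(longest_line(source))
print(longest_line(blob))
```

A reader with growable buffers has to hold that entire "line" in memory at once, which is where an oversized allocation would come from.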
Another approach is to apply general regular expressions only to
files whose extensions are already mapped to a language. Yet a third approach is to get
rid of the --regex option (since it raises the difficult issues
above in addition to the confusion you experienced) and just support
language-specific regular expressions. However, language-specific
regular expressions would require, at minimum, the following options
to actually be applied to a file:
ctags --langdef=mylang --langmap=mylang:.my --mylang-regex=/.../ *
Is this a bit too cumbersome?
Darren Hiebert <darren@...>