Thread: [Ctags] Ctags-4.1pre2 available
From: Darren H. <da...@hi...> - 2000-12-07 05:14:26
After too long a time, I have finally put together a prerelease of the upcoming 4.1 version of Exuberant Ctags. The highlights of this release are:

- Added support for Posix regular expressions.
- Added support for two new languages (ASP and PHP), bringing the total number of supported languages to 18.
- Added the ability to define user-defined languages using regular expressions.
- Extensively redesigned the code to simplify the addition of new parsers. The result: only one symbol needs to be added to one ctags include file to add a new custom parser.
- Bug fixes.

See the NEWS file in the release for a complete list of changes.

I hope you folks find these changes useful. You can find it at:

ftp://ctags.sourceforge.net/pub/ctags/ctags-4.1pre2.tar.gz

I will raise some questions soon on the list about certain choices I would like to make.

--
Darren Hiebert <da...@hi...>
http://darren.hiebert.com
From: Neil B. <nei...@rd...> - 2000-12-07 08:35:52
Darren Hiebert wrote:
> After too long a time, I have finally put together a prerelease of
> the upcoming 4.1 version of Exuberant Ctags.
>
> ftp://ctags.sourceforge.net/pub/ctags/ctags-4.1pre2.tar.gz

It's there, but gives a 'permission denied' error upon RETR!

--
=====================- http://www.racaldefence.com/ -====================
Neil Bird                    | If this .signature  |
work mailto:nei...@rd...     | looks pants, then   | $> cd /pub
personal mailto:ne...@fn...  | stop using Outlook! | $> more beer
From: Darren H. <da...@hi...> - 2000-12-07 16:25:24
Neil Bird wrote:
> > After too long a time, I have finally put together a prerelease of
> > the upcoming 4.1 version of Exuberant Ctags.
> >
> > ftp://ctags.sourceforge.net/pub/ctags/ctags-4.1pre2.tar.gz
>
> It's there, but gives a 'permission denied' error upon RETR!

Mea culpa. After uploading it, I forgot to change the permissions to make it world readable. I have fixed that now. Sorry, folks.

--
Darren Hiebert <da...@hi...>
http://darren.hiebert.com
From: Paul S. <pa...@to...> - 2000-12-07 16:43:34
Hello,

> It's there, but gives a 'permission denied' error upon RETR!

It seems to be working now; I just downloaded it.

Regards, Paul.
From: Johannes Z. <joh...@ze...> - 2000-12-09 21:52:31
On Wed, Dec 06, 2000 at 11:14:19PM -0600, Darren Hiebert wrote:
> You can find it at:
>
> ftp://ctags.sourceforge.net/pub/ctags/ctags-4.1pre2.tar.gz
>
> I will raise some questions soon on the list about certain choices
> I would like to make.

I had to comment out one line, at fortran.c:290,

    Ancestors.list [Ancestors.count].filePosition = 0;

to get it to compile. You can't do this assignment because fpos_t is not defined to be a primitive type. From `man fgetpos':

    [...]
    ... On some non-UNIX systems an fpos_t object may be a complex object
    and these routines may be the only way to portably reposition a
    text stream.
    [...]

This bites me even though I'm on Linux. The large file support has probably made fpos_t no longer a long. From <stdio.h>:

    #ifndef __USE_FILE_OFFSET64
    typedef _G_fpos_t fpos_t;
    #else
    typedef _G_fpos64_t fpos_t;
    #endif
    #ifdef __USE_LARGEFILE64
    typedef _G_fpos64_t fpos64_t;
    #endif

and from <_G_config.h>:

    typedef struct
    {
      __off_t __pos;
      __mbstate_t __state;
    } _G_fpos_t;
    typedef struct
    {
      __off64_t __pos;
      __mbstate_t __state;
    } _G_fpos64_t;

--
Johannes
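One portable way around the `filePosition = 0` problem described above is to treat fpos_t as opaque and track "no position recorded" in a separate flag. The following is only a sketch of that idea, not the fix that went into ctags: the `TagEntry` type, the `hasPosition` member, and the three helper names are assumptions made for illustration. Only the `filePosition` member name comes from the message.

```c
#include <stdio.h>

/* Hypothetical stand-in for the entry type in fortran.c. Only the
 * filePosition member is taken from the thread; the rest is assumed. */
typedef struct {
    fpos_t filePosition;  /* opaque: may be a struct, so never assign 0 */
    int    hasPosition;   /* 1 once filePosition holds a real position */
} TagEntry;

/* Mark the entry as having no recorded position without touching fpos_t. */
void clearPosition(TagEntry *entry)
{
    entry->hasPosition = 0;
}

/* Record the current position of fp. Returns 0 on success, -1 on failure. */
int recordPosition(TagEntry *entry, FILE *fp)
{
    if (fgetpos(fp, &entry->filePosition) != 0)
        return -1;
    entry->hasPosition = 1;
    return 0;
}

/* Seek fp back to the recorded position. Returns -1 if none was recorded. */
int restorePosition(const TagEntry *entry, FILE *fp)
{
    if (!entry->hasPosition)
        return -1;
    return fsetpos(fp, &entry->filePosition);
}
```

With this shape, the offending line would become a call such as `clearPosition(&Ancestors.list[Ancestors.count])`, and fpos_t values are only ever produced by fgetpos and consumed by fsetpos.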
From: Johannes Z. <joh...@ze...> - 2000-12-09 21:52:32
On Wed, Dec 06, 2000 at 11:14:19PM -0600, Darren Hiebert wrote:
> - Added support for Posix regular expressions.

I didn't manage to get this working. Would you mind giving an example (maybe also in the man page)? I tried, as a very simple example,

    ctags --regex='/.*Darren.*/a/e' QUOTES

but got an empty tags file.

--
Johannes
From: Darren H. <da...@hi...> - 2000-12-10 06:53:23
Johannes Zellner wrote:
> On Wed, Dec 06, 2000 at 11:14:19PM -0600, Darren Hiebert wrote:
> > - Added support for Posix regular expressions.
>
> I didn't manage to get this working. Would you mind giving an example
> (maybe also in the man page) ? -- I tried as very simple example
>
>     ctags --regex='/.*Darren.*/a/e' QUOTES
>
> but got an empty tags file.

Well, this is one of the issues I wanted to discuss on the list.

Bear in mind that ctags attempts to determine the language of a file by examining its extension (the part of the file name following the last "."). Ctags has a built-in list of extensions and the languages they are mapped to (customizable using the --langmap option). For each file it examines, ctags checks whether the file is mapped to a language and, if so, invokes the parser for the mapped language on the file. If the extension of the file is not mapped to a language, that file is ignored (this allows you to run "ctags *" or "ctags -R" and ignore .o files, etc.), making ctags pretty automatic in most cases.

In the case you cite, QUOTES has a null extension, and a null extension is not mapped to any language, causing the file to be ignored. Obviously, this does not match what you expected.

When I was implementing regex in ctags, I first took the approach that Gnu etags takes: every file parsed by etags is run through both its parser and any regular expressions defined. It then occurred to me that this approach limited the usefulness of regex, since a particular regular expression may only make sense for a particular kind of file. Then the idea occurred to me that a user could define a set of regular expressions for each supported language, allowing you to write a regular expression that applied only to Perl files, for example (e.g. using "--perl-regex=/.../.../").
This then led me to the idea that I could allow the user to define arbitrary languages, each with its own regular expressions, and then map those user-defined languages to file extensions. This means a user could have support for any language for which they were willing to write expressions.

Now this left me with the question as to whether it was appropriate to support both language-specific regular expressions *and* regular expressions which would be applied to every file parsed by ctags (note that this says "parsed by ctags", not "passed to ctags", because you could end up with a mess if ctags runs your object files through regular expressions). The result is that both are in there now. To support this, I added an internal pseudo-language, "regex", not mapped to any file extensions, meaning that your example would work if you had supplied either of the options "--language-force=regex" or "--langmap=regex:." to ctags.

So I ask our readership for some ideas as to how to apply regular expressions in a manner which is intuitive, but which retains the power that I think one can gain from defining their own languages with language-specific regular expressions.

One approach is to apply the general regular expressions specified with --regex to every file passed to ctags, even if ctags would not normally parse the file because its extension is not mapped to a language, and even if it happens to be a 25MB object file. However, because ctags has changed to using line-oriented I/O internally to support regex, and because ctags implements all buffers with no fixed limits (every buffer is growable, meaning there are no internal line length limits), it is possible that an object file may have an arbitrarily long line in it, resulting in ctags attempting to malloc an impossibly large buffer.

Another approach is to apply general regular expressions only to files whose extension is already mapped to a language.
Yet a third approach is to get rid of the --regex option (since it raises the difficult issues above in addition to the confusion you experienced) and just support language-specific regular expressions. However, language-specific regular expressions would require, at minimum, the following options to actually be applied to a file:

    ctags --langdef=mylang --langmap=mylang:.my --mylang-regex=/.../ *

Is this a bit too cumbersome?

--
Darren Hiebert <da...@hi...>
http://darren.hiebert.com
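The extension lookup described earlier in this message, and the reason QUOTES is skipped, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the ctags source: the four-entry table and the names `LangMapEntry`, `langMap`, `fileExtension`, and `languageForFile` are all made up for the example, and the real built-in map is larger and adjustable with --langmap.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative map only; not the actual ctags table. */
typedef struct {
    const char *extension;  /* without the leading "." */
    const char *language;
} LangMapEntry;

static const LangMapEntry langMap[] = {
    { "c",   "C"    },
    { "h",   "C"    },
    { "pl",  "Perl" },
    { "php", "PHP"  },
};

/* Return the part of the file name after the last ".", or "" if the
 * final path component contains no "." (a null extension). */
const char *fileExtension(const char *fileName)
{
    const char *base = strrchr(fileName, '/');
    const char *dot;

    base = (base != NULL) ? base + 1 : fileName;
    dot = strrchr(base, '.');
    return (dot != NULL) ? dot + 1 : "";
}

/* Return the mapped language name, or NULL if the file is to be ignored. */
const char *languageForFile(const char *fileName)
{
    const char *ext = fileExtension(fileName);
    size_t i;

    for (i = 0; i < sizeof langMap / sizeof langMap[0]; ++i)
        if (strcmp(ext, langMap[i].extension) == 0)
            return langMap[i].language;
    return NULL;
}
```

Under this sketch, `languageForFile("QUOTES")` returns NULL because the null extension matches no entry, which is the same reason "ctags *" silently skips .o files.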
From: Johannes Z. <joh...@ze...> - 2000-12-10 09:34:31
On Sun, Dec 10, 2000 at 12:52:35AM -0600, Darren Hiebert wrote:
[...]
> on the file. If the extension of the file is not mapped to a
> language, that file is ignored (this allows you to run "ctags *" or
> "ctags -R" and ignore .o files, etc.), making ctags pretty automatic
> in most cases.
>
> In the case you cite, QUOTES has a null extension -- and a null
> extension is not mapped to any language, causing the file to be
> ignored. Obviously, this is does not meet with what you expected.
[...]

I think ctags should behave differently depending on whether it is invoked with or w/o file arguments. I certainly expect to get QUOTES parsed if I explicitly give QUOTES as a file argument. I agree that QUOTES shouldn't be parsed if `ctags -R' is used. I wouldn't expect a file which is given as a command line argument to be ignored!

What about an option to specify which suffixes should be ignored, like `wildignore' in vim?

    --ignore=so,o,gz,bz2,tgz

Furthermore, ctags could ignore binary files for regex. I think it's not too hard to make a guess about whether a file is binary.

But as I've said: both a suffix-ignore switch and checking for binary files should only be applied if no file arguments are given. Giving files as command line arguments should overrule all implicit rules IMHO. It's up to the user, and if he invokes ctags with `libc.so' as a file argument, ctags /should/ parse libc.so.

--
Johannes
From: Darren H. <da...@hi...> - 2000-12-11 04:22:29
On Sun, 10 Dec 2000, Johannes Zellner wrote:
> I think ctags should behave different if invoked with or w/o file
> arguments. I certainly expect to get QUOTES parsed if I explicitely
> give QUOTES as file argument. I agree that QUOTES shouldn't be parsed
> if `ctags -R' is used. I wouldn't expect a file which is given as
> command line argument to be ignored!
>
> What about an option to specify which suffixes should be ignored ?
> -- like `wildignore' in vim ?
> --ignore=so,o,gz,bz2,tgz
> Furthermore ctags could ignore binary files for regex, I think it's
> not too hard to make a guess about if a file is binary.
>
> But as I've said: both a suffix-ignore switch and checking for binary
> files should only be applied if no file arguments are given. Giving
> files as command line arguments should overrule all implicit rules
> IMHO, it's up to the user and if he invokes ctags with `libc.so' as
> file argument, ctags /should/ parse libc.so.

Under normal circumstances, I would agree with you. However, remember that up until now (disregarding the experimental --regex option for the moment) parsing a file has had no meaning for ctags unless it can determine the language (i.e. context) of the file, because otherwise it has no way of determining what "parsing" the file means. It must be able to determine the language of the file so that it can know which parser to apply to the file. Therefore ctags has no choice but to ignore a file whose language it cannot determine.

Now, the only reason your argument has any validity at all is because of the experimental --regex option. Note that I am questioning whether this option makes sense, being an anomaly for ctags in that it is language independent, and because it is now possible for a user to specify language-dependent regular expressions. It is possible to apply regular expressions to all files considered, even if they are not a recognized language.
But is there really a practical reason to do this, for which language-dependent regular expressions would not be more appropriate? The only reason I can see at the moment is that it would take fewer options than defining a language and mapping it for an arbitrary set of files.

> Furthermore ctags could ignore binary files for regex, I think it's
> not too hard to make a guess about if a file is binary.

It does seem hard to me. If one assumes that non-binary code is constrained to the ASCII character set (i.e. 0-127), then it is as easy as you suggest. However, since much code now contains characters from European languages, we find that the entire range of byte values (0-255) is now possible in a non-binary file. Is there an algorithm you are familiar with that can determine this? And if so, how far into the file must one read before determining whether or not it is a binary file?

Have you ever mistakenly run grep over a set of files in the current directory using "grep regex *"? Often, the output will destroy the state of your xterm, leaving the only resolution to close it and open a new one. Grep cannot determine which files should be ignored. I doubt I could do better. One can try to use the magic numbers contained in the file, but this is heavily platform-dependent.

--
Darren Hiebert <da...@hi...>
http://darren.hiebert.com
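For reference, one commonly used guess that tolerates the full 128-255 range is to look for a NUL byte in a bounded prefix of the file. The sketch below shows only that heuristic; it is not something ctags is stated to do anywhere in this thread, the names `looksBinary`, `fileLooksBinary`, and `SNIFF_LIMIT` are invented for the example, and the 8192-byte cap is an arbitrary assumption. It will misclassify text stored in encodings that contain NUL bytes (UTF-16, for instance) and binary data that happens to contain none.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum { SNIFF_LIMIT = 8192 };  /* arbitrary prefix size, an assumption */

/* Heuristic: a NUL byte within the first SNIFF_LIMIT bytes suggests
 * binary data. Bytes in the range 128-255 are accepted as text. */
int looksBinary(const unsigned char *data, size_t length)
{
    size_t n = (length < SNIFF_LIMIT) ? length : SNIFF_LIMIT;
    return memchr(data, '\0', n) != NULL;
}

/* Apply the heuristic to the start of an already opened stream.
 * Reads at most SNIFF_LIMIT bytes, so the cost is bounded. */
int fileLooksBinary(FILE *fp)
{
    unsigned char buffer[SNIFF_LIMIT];
    size_t n = fread(buffer, 1, sizeof buffer, fp);
    return looksBinary(buffer, n);
}
```

The fixed prefix is one possible answer to the "how far into the file" question: the check costs a single bounded read, at the price of missing a file whose first NUL byte appears later.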
From: Neil B. <nei...@rd...> - 2000-12-12 09:41:30
Johannes Zellner wrote:
> I wouldn't expect a file which is given as command line argument to be
> ignored!

I can sympathise with your argument, but I think it's essentially best the way it is, as otherwise you would lose the power of 'ctags *'. Remember that ctags has /zero/ way of knowing whether the files specified *were* specified by hand, or globbed in by the shell. What if you had a script jumping through a list of dirs doing 'ctags *', and one had a file you wished to ignore?

While the fignore methodology does have its uses, in this case I can't see it being the best solution, since there's always going to be 'yet another forgotten suffix' to exclude, and the object with ctags is to 'pick *just* the correct files', as opposed to 'try to exclude superfluous files'. Subtly different.

> But as I've said: both a suffix-ignore switch and checking for binary
> files should only be applied if no file arguments are given. Giving
> files as command line arguments should overrule all implicit rules
> IMHO, it's up to the user and if he invokes ctags with `libc.so' as
> file argument, ctags /should/ parse libc.so.

Again, it's not that easy because of globbing. I suppose initially that maybe the best answer might be an '--all' option to force ctags to always look at all files (or maybe even include the --fignore= options, and have the presence of *that* mean "process all files you see, bar these [and objs?]").

I'll be checking out the release later (I've not had chance yet), so if I come up with anything, I'll post it.

--
=====================- http://www.thalesgroup.com/ -=====================
Neil Bird                    | If this .signature  |
work mailto:nei...@rd...     | looks pants, then   | $> cd /pub
personal mailto:ne...@fn...  | stop using Outlook! | $> more beer
From: Darren H. <da...@hi...> - 2000-12-13 04:45:50
On Tue, 12 Dec 2000, Neil Bird wrote:
> Johannes Zellner wrote:
> > But as I've said: both a suffix-ignore switch and checking for binary
> > files should only be applied if no file arguments are given. Giving
> > files as command line arguments should overrule all implicit rules
> > IMHO, it's up to the user and if he invokes ctags with `libc.so' as
> > file argument, ctags /should/ parse libc.so.
>
> Again, it's not that easy because of globbing. I suppose initially that
> maybe the best answer might be an '--all' option to force ctags to always
> look at all files...

Actually, ctags already has an option, --language-force, which allows the user to force ctags to parse every file considered as a particular language. This means that Johannes could get what he wants if he entered:

    ctags --regex=/.../ --language-force=regex <file-list>

in which case no files would be ignored. I believe this is adequate.

My question remains: does it make sense to have a --regex option, or is it adequate to define an extemporaneous language, such as "tmp", for this purpose with the language definition mechanism:

    ctags --langdef=tmp --tmp-regex=/.../ --language-force=tmp *

Perhaps even better, I could make the appearance of a --<LANG>-regex option automatically define the language <LANG>, thus making the --langdef option unnecessary. The problem with this is that if one misspells a language name, they will never know it:

    ctags --ruby-regex=/111/ --rudy-regex=/222/ --langmap=ruby:.rub *.rub

This could lead the user on a wild goose chase trying to find what is wrong with the /222/ regular expression, when there is actually nothing wrong with it. The real problem is that /222/ was defined for a language "rudy", which is not mapped to any extensions. Upon reflection, I see too many problems resulting from automatic definition. There are reasons that languages require explicit definition of variables.
I just have a problem seeing the value of a general --regex option that is applied to all files already parsed not only by a language-specific parser but also possibly by language-specific regular expressions. I believe that the --regex option opens up too many avenues for confusion and contradiction.

--
Darren Hiebert <da...@hi...>
http://darren.hiebert.com
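The case for explicit --langdef over automatic definition can be illustrated with a small check that rejects a --<LANG>-regex option naming a language nobody defined. This is a hedged sketch of the idea only: the function names `parseLangRegexOption` and `isDefinedLanguage` are hypothetical, and nothing here is taken from the ctags option parser.

```c
#include <stddef.h>
#include <string.h>

/* If option has the form "--<LANG>-regex=...", copy <LANG> into lang
 * and return 1. Return 0 for any other option, including "--regex=". */
int parseLangRegexOption(const char *option, char *lang, size_t langSize)
{
    static const char suffix[] = "-regex=";
    const char *start;
    const char *end;
    size_t length;

    if (strncmp(option, "--", 2) != 0)
        return 0;
    start = option + 2;
    end = strstr(start, suffix);
    if (end == NULL || end == start)
        return 0;
    length = (size_t)(end - start);
    if (length >= langSize)
        return 0;
    memcpy(lang, start, length);
    lang[length] = '\0';
    return 1;
}

/* Return 1 if lang was explicitly defined (e.g. via --langdef), else 0. */
int isDefinedLanguage(const char *lang,
                      const char *const *defined, size_t count)
{
    size_t i;

    for (i = 0; i < count; ++i)
        if (strcmp(lang, defined[i]) == 0)
            return 1;
    return 0;
}
```

With only "ruby" defined, "--rudy-regex=/222/" parses to the language "rudy", fails the `isDefinedLanguage` check, and can be reported as an unknown language immediately instead of being accepted as a new, unmapped one.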