Re: [Lxr-dev] VHDL support and other stuff
From: Malcolm B. <ma...@br...> - 2001-11-15 05:26:38
Hi Robin,

Robin Theander wrote:
>Malcolm Box wrote:
>
>>Logically the mappings should be per-language, and ideally Common.pm
>>would not depend directly on the installed languages, i.e. it would not
>>hold the list of mappings. The problem is that the ident script doesn't
>>know what language each of the returned identifiers is in to display the
>>correct string.
>>
>>Probably the best solution is to create another database table that maps
>>a numeric id to a string, and then have the language modules store the
>>id number where they now store a character. Then each language module
>>can contain the string <-> number mapping, and simply check on
>>initialisation that the strings are in the db, adding them if not.
>>
>
>Isn't it too much hassle to put the strings in the db? They can be in the language
>module together with all the other language-specific stuff.
>
>In my little head it goes like this:
>1) Each identifier has to know what language it is.
>

That's the rub - currently each identifier does not store this information. It seems like something that could be stored in the indexes table, and perhaps it would be worth it.

>2) Language types are possibly ints allocated and defined somewhere in each
>language module (or in generic.conf).
>3) Each language module defines its own type-to-string mapping (as ints instead
>of chars?).
>4) ident looks for the relevant language mapping in the relevant module to
>return the string.
>

This would potentially be pretty slow. Assume you have a common identifier such as "close" which might appear in many different languages. Each ident result returned would involve creating the correct language module and then asking it for the type -> string mapping. You could order the results by language to reduce the create/destroy count, but that is unlikely to be the order people actually want to see the results in.
Given I've got an LXR install running where one common identifier returns over 1000 declarations, I'm not sure I'd want to take that hit.

What I half-implemented last night does the following:
1) Expand indexes.type to an int field
2) Create a declarations table containing declid (int) and declaration (char(255))
3) Each language now maps the ctags type output (or whatever other type info it has) to an int (currently hardwired)
4) Index::index() now stores the type field as an int
5) Ident then joins declarations.declid to indexes.type to get the right string for display.

The only difficulty is initialising the mapping in (3). My current plan is to have each language hold its own type strings (e.g. "class", "function definition") and, on startup, build the string -> int mapping by searching declarations for each string: use the declid if found, else insert the string and use the new declid. For languages using ctags, the initialisation would also build the appropriate ctags char -> declid mapping. This should be pretty fast and is a one-time cost per language module, which is OK because the Lang modules are only used by genxref & source, both of which have much bigger overheads than that.

Note this doesn't require ident to know about Lang::* modules.

If we want to record the language the identifier was found in, it can go either in the files table or in indexes. Putting it in files implies a one-to-one mapping from files to languages, which, while currently true, may not always be (think webpages with scripting in multiple languages :-). Putting it in indexes is a little redundant at the moment, especially since indexes is one of the biggest tables.

>This could also make identifiers local to their language, but how should that
>be handled when
>1) Searching from scratch
>2) Displaying the identifier from a link from source (here we know the
>language).
>

Making identifiers carry language info is a good idea. It will help with the source -> ident -> source jump that happens so often, since we will be able to filter to identifiers from the same language (and possibly even order by whether the id is in the same file or directory, which would make navigation much faster). ident would then be extended to allow selection of a language when searching, defaulting to all languages as at present.

>I think ident should take an optional language identifier from the URL. This
>could be generated from source.
>

Indeed, that's how I see it working.

>Did I miss out on anything here? I haven't been in every dimly lit corner of the
>code..
>

Don't go there without a light, or the grues will get you...

>And something in the far dark of my mind says that the changes probably should
>be compatible with dbm support.
>

dbm support doesn't work at the moment anyway - there was a discussion here about dropping it entirely soon if no-one is prepared to work on it. The overhead of getting someone to set up an RDBMS is so low that I don't see it as a big issue, not to mention that dbm performance was why 0.3 sucked so much on big repositories.

>>I think it will be OK to add a C module to the distribution, provided it
>>comes with some reasonable way to build it. My guess (correct me if I'm
>>wrong) would be that the parser is pretty much vanilla C with no
>>platform dependencies, so it should be easy to build. I would
>>suggest creating a lib/LXR/Lang/VHDL subdir to keep the source and build
>>system in. Then those that want VHDL support can build it, and those
>>that don't can just comment out the config in lxr.conf that maps files
>>to VHDL (and in fact won't ever see a problem unless they have files
>>that look like VHDL).
>>
>
>I agree, but...
>The code skeleton has the following license (the files are from
>'93):
> * This file is intended not to be used for commercial purposes
> * without permission of the University of Twente and permission
> * of the University of Dortmund
>
>I'm going to contact the source to see if it can be GPL'ed or whatever.
>If you know any other GPL'ed VHDL parser that's just the yacc skeleton with
>grammar, I could hack that up instead.
>

I'd be very reluctant to let any non-free code into the main distribution. I know glimpse isn't free, but there are moves afoot to replace it (probably with Swish-E2) RSN. I don't know of any other VHDL parser out there, although perhaps the VHDL mode for emacs might have something useful?

>The parser code, btw, required quite a few hacks. It was in a very old lex
>dialect and gave some trouble with both flex and gcc. It would probably need
>some more cleaning to run on non-gcc platforms. Again, the largest problem is
>probably the license.
>

The licence is the number one problem. Cleaning up the code so it runs on non-gcc platforms would be good, but it's not essential since gcc is so widely available.

Cheers,
Malcolm
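For readers following the thread, here is a minimal sketch of the declarations-table scheme Malcolm outlines above (the five steps plus the string -> int initialisation), written in Python with sqlite3 rather than LXR's Perl/DBI. Only indexes.type, declarations.declid and declarations.declaration come from the post; the other columns (symname, file, line), the LangModule class, the declid_for and ident helpers, and the ctags kind letters are hypothetical stand-ins chosen for illustration.

```python
import sqlite3

# Hypothetical schema: only indexes.type, declarations.declid and
# declarations.declaration come from the post; other columns are assumed.
SCHEMA = """
CREATE TABLE declarations (
    declid      INTEGER PRIMARY KEY,   -- numeric id stored in indexes.type
    declaration CHAR(255) NOT NULL UNIQUE
);
CREATE TABLE indexes (
    symname TEXT NOT NULL,             -- assumed: identifier name
    file    TEXT NOT NULL,             -- assumed: file containing the declaration
    line    INTEGER NOT NULL,          -- assumed: line number
    type    INTEGER NOT NULL REFERENCES declarations(declid)
);
"""


def declid_for(conn, text):
    """Return the declid for `text`, inserting a new row if it is missing."""
    row = conn.execute(
        "SELECT declid FROM declarations WHERE declaration = ?", (text,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute("INSERT INTO declarations (declaration) VALUES (?)", (text,))
    return cur.lastrowid


class LangModule:
    """Hypothetical stand-in for a Lang::* module that gets types from ctags.

    `ctags_types` maps a ctags kind letter to a human-readable type string;
    the letters and strings used below are illustrative, not LXR's real table.
    """

    def __init__(self, conn, ctags_types):
        # One-time start-up cost: resolve each type string to a declid
        # (inserting it if needed) and keep a ctags char -> declid mapping.
        self.ctags_to_declid = {
            kind: declid_for(conn, text) for kind, text in ctags_types.items()
        }

    def index(self, conn, symname, file, line, ctags_kind):
        # Store the declaration type as an int rather than a character.
        conn.execute(
            "INSERT INTO indexes (symname, file, line, type) VALUES (?, ?, ?, ?)",
            (symname, file, line, self.ctags_to_declid[ctags_kind]),
        )


def ident(conn, symname):
    """Look up declarations of `symname` with a join; no Lang module needed."""
    return conn.execute(
        "SELECT i.file, i.line, d.declaration "
        "FROM indexes AS i JOIN declarations AS d ON d.declid = i.type "
        "WHERE i.symname = ? ORDER BY i.file, i.line",
        (symname,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    c_lang = LangModule(conn, {"f": "function definition", "s": "struct"})
    py_lang = LangModule(conn, {"f": "function definition", "c": "class"})
    c_lang.index(conn, "close", "src/io.c", 42, "f")
    py_lang.index(conn, "close", "lib/io.py", 7, "f")
    for row in ident(conn, "close"):
        print(row)
```

Because the type strings live in the shared declarations table, two languages that both emit "function definition" resolve to the same declid, and the ident lookup is a single join that never instantiates a language module, which is the property the post is after when one identifier has over 1000 declarations.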