Re: [Lxr-dev] VHDL support and other stuff


Hi Robin,

Robin Theander wrote:

>Malcolm Box wrote:
>
>>Logically the mappings should be per-language, and ideally Common.pm
>>would not depend directly on the installed languages - ie it would not
>>hold the list of mappings.  The problem is that the ident script doesn't
>>know what language each of the returned identifiers is in to display the
>>correct string.
>>
>>Probably the best solution is to create another database table that maps
>>a numeric id to a string, and then have the language modules store the
>>id number where they now store a character.  Then each language module
>>can contain the string <-> number mapping, and simply check on
>>initialisation that the strings are in the db, adding them if not.
>>
>
>Isn't it too much hassle to put the strings in the db? They can be in the language
>module together with all the other language-specific stuff.
>
>In my little head it goes like this:
>1) Each identifier has to know what language it is.
>
That's the rub - currently each identifier does not store this 
information.  It seems like something that could be stored in the 
indexes table, and perhaps it would be worth it.

>2) Language types are possibly ints allocated and defined somewhere in each
>language module (or in generic.conf).
>3) Each language module defines its own type to string mapping (as ints instead
>of chars?).
>4) ident looks for the relevant language mapping in the relevant module to
>return the string.
>
This would potentially be pretty slow - assume you have a common 
identifier such as "close" which might appear in many different 
languages.  Each ident result returned would involve creating the 
correct language module and then asking it for the type -> string 
mapping. You could order the results by language, to reduce the 
create/destroy count, but this is unlikely to be the order people 
actually want to see the results in. Given I've got an LXR install 
running where one common identifier returns over 1000 declarations, I'm 
not sure I'd want to take that hit.

What I half-implemented last night does the following:

1) Expand indexes.type to an int field
2) Create a declarations table containing declid (int) and declaration 
(char(255))
3) Each language now maps the ctags type output (or whatever other type 
info it has) to an int (currently hardwired)
4) Index::index() now stores the type field as an int
5) Ident then joins declarations.declid to indexes.type to get the right 
string for display.
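
To make steps 1, 2 and 5 concrete, here is a rough sketch. It is written in 
Python with sqlite3 purely so it can be run standalone; it is not the LXR 
code. Only indexes.type, declarations.declid and declarations.declaration 
come from the list above. The other columns (symname, filename, line), the 
sample rows and the function names are assumptions made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with the two tables from the list above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE declarations (
            declid      INTEGER PRIMARY KEY,
            declaration CHAR(255) NOT NULL
        );
        CREATE TABLE indexes (
            symname  TEXT    NOT NULL,  -- assumed column, for illustration
            filename TEXT    NOT NULL,  -- assumed column, for illustration
            line     INTEGER NOT NULL,  -- assumed column, for illustration
            type     INTEGER NOT NULL REFERENCES declarations(declid)
        );
    """)
    # Hypothetical sample data: two declaration strings, three index rows.
    conn.executemany(
        "INSERT INTO declarations (declid, declaration) VALUES (?, ?)",
        [(1, "function definition"), (2, "class")],
    )
    conn.executemany(
        "INSERT INTO indexes (symname, filename, line, type) VALUES (?, ?, ?, ?)",
        [
            ("close", "io.c", 42, 1),
            ("close", "Socket.java", 7, 1),
            ("Socket", "Socket.java", 3, 2),
        ],
    )
    return conn


def lookup(conn, name):
    """Step 5: join indexes.type to declarations.declid to get display strings."""
    return conn.execute(
        """
        SELECT i.filename, i.line, d.declaration
        FROM indexes AS i
        JOIN declarations AS d ON d.declid = i.type
        WHERE i.symname = ?
        ORDER BY i.filename, i.line
        """,
        (name,),
    ).fetchall()
```

The point of the join is that the lookup gets its display strings from the 
database in one query, without loading any language module.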

The only difficulty is initialising the mapping in (3).  My current plan 
is to have each language hold its own type strings (e.g. "class", 
"function definition") and on startup build the mapping string -> int by 
searching declarations for the string and use the declid if found, else 
insert the string and use the new declid.  For languages using ctags, 
the initialisation would also build the appropriate ctags char -> declid 
mapping.
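
A minimal sketch of that startup step, again in Python with sqlite3 only 
for illustration. The function names, the UNIQUE constraint and the example 
ctags kind letters are assumptions, not something that exists in the tree.

```python
import sqlite3


def make_db():
    """Create an empty declarations table (UNIQUE is an added assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE declarations ("
        " declid INTEGER PRIMARY KEY,"
        " declaration CHAR(255) UNIQUE NOT NULL)"
    )
    return conn


def get_or_create_declid(conn, declaration):
    """Return the declid for a type string, inserting the string if missing."""
    row = conn.execute(
        "SELECT declid FROM declarations WHERE declaration = ?",
        (declaration,),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO declarations (declaration) VALUES (?)",
        (declaration,),
    )
    return cur.lastrowid


def init_ctags_mapping(conn, ctags_kinds):
    """Build the ctags char -> declid mapping for one language module.

    ctags_kinds maps a ctags kind letter to the language's own type string,
    e.g. {"f": "function definition"} (hypothetical values).
    """
    return {
        kind: get_or_create_declid(conn, text)
        for kind, text in ctags_kinds.items()
    }
```

Two languages that use the same type string end up sharing one declid, and 
running the initialisation a second time adds no rows.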

This should be pretty fast and is a one-time cost for the language 
module, which is OK because the Lang modules are only used for genxref & 
source, both of which have much bigger overheads than that.  Note this 
doesn't require ident to know about Lang::* modules.

If we want to record the language the identifier was found in, it can 
either go in the files table, or in the indexes.  Putting it in files 
implies that there is a 1 to 1 mapping from files to languages, which, 
while currently true, may not always be (think webpages with scripting in 
multiple languages :-) ).  Putting it in indexes is a little redundant 
at the moment, especially since indexes is one of the biggest tables.

>This could also make identifiers local to their language, but how should that
>be handled when
>1) Searching from scratch
>2) Displaying the identifier from a link from source (here we know the
>language).
>
Making identifiers carry language info is a good idea - it will help for 
the source -> ident -> source jump that happens so often, since we will 
be able to filter to identifiers from the same language (and even 
possibly order by whether the id is in the same file or directory, which 
would make it much faster to navigate).  ident would then be extended to 
allow selection of a language when searching, defaulting to all 
languages as at present.
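
Roughly what I have in mind for the filtering and ordering, as a sketch in 
plain Python rather than anything from ident itself. The Hit record, the 
ident() signature and the three-level ranking (same file, then same 
directory, then everything else) are all assumptions for illustration.

```python
import posixpath
from typing import NamedTuple


class Hit(NamedTuple):
    """One identifier occurrence; the fields are assumed for this sketch."""
    name: str
    language: str
    path: str
    line: int


def ident(hits, name, language=None, from_path=None):
    """Return hits for name, optionally limited to one language.

    language=None searches all languages, as at present. If from_path is
    given (the source file the user jumped from), hits in that file sort
    first, then hits in the same directory, then the rest.
    """
    matches = [
        h for h in hits
        if h.name == name and (language is None or h.language == language)
    ]

    def rank(h):
        if from_path is not None:
            if h.path == from_path:
                return (0, h.path, h.line)
            if posixpath.dirname(h.path) == posixpath.dirname(from_path):
                return (1, h.path, h.line)
        return (2, h.path, h.line)

    return sorted(matches, key=rank)
```

With no language and no from_path this behaves like the current search, so 
existing links into ident would keep working unchanged.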

>I think ident should take an optional language identifier from the URL. This
>could be generated from source.
>
Indeed, that's how I see it working.

>Did I miss out on anything here? I haven't been in every dimly lit corner of the
>code..
>
Don't go there without a light, or the grues will get you...

>And something in the far dark of my mind says that the changes probably should
>be compatible with dbm support.
>
dbm support doesn't work at the moment anyway - there was a discussion 
here about dropping it totally soon if no-one is prepared to work on it.  
The overhead of getting someone to set up an RDBMS is so low that I 
don't see it as a big issue, not to mention the fact that dbm 
performance was why 0.3 sucked so much on big repositories.

>>I think it will be OK to add a C module to the distribution, provided it
>>comes with some reasonable way to build it.  My guess (correct me if I'm
>>wrong) would be that the parser is pretty much vanilla C with no
>>platform dependencies, so it should be easy to build.  I would
>>suggest creating a lib/LXR/Lang/VHDL subdir to keep the source and build
>>system in.  Then those that want VHDL support can build it, and those
>>that don't can just comment out the config in lxr.conf that maps files
>>to VHDL (and in fact won't ever see a problem unless they have files
>>that look like VHDL).
>>
>
>I agree, but... The code skeleton has the following license (the files are from
>'93):
> * This file is intended not to be used for commercial purposes
> * without permission of the University of Twente and permission
> * of the University of Dortmund
>
>I'm going to contact the source to see if it can be GPL'ed or whatever.
>If you know any other GPL'ed VHDL parser that's just the yacc skeleton with
>grammar, I could hack that up instead. 
>
I'd be very reluctant to let any non-free code into the main 
distribution.  I know glimpse isn't free, but there are moves afoot to 
replace it (probably with Swish-E2) RSN.  I don't know of any other VHDL 
parser out there, although perhaps VHDL mode from emacs might have 
something useful?

>The parser code, btw, required quite a few hacks. It was in a very old lex
>dialect and gave some trouble with both flex and gcc. It would probably need
>some more cleaning to run on non-gcc platforms. Again, the largest problem is
>probably the license.
>
The licence is the number one problem.  Cleaning up the code so it runs 
on non-gcc platforms would be good, but it's not essential since gcc is 
so widely available.

Cheers,

Malcolm