Thread: [Lxr-dev] VHDL support and other stuff
From: Robin T. <Rob...@te...> - 2001-11-01 11:20:07
|
Hi there, I'm hacking VHDL support into lxr and have come across a few things I'd like to comment on. I have it fully working now, but based on the 0.8 release (I'm firewalled, so CVS access is N/A). I'm relying on an external parser (a lex & yacc based skeleton I ripped from the net) because VHDL is a pain to parse. The main changes are kept in LXR::Lang::VHDL. A few other changes are necessary though.

1) The %type_names in Common.pm are hashed from chars. With VHDL I have about 20 new types and letter allocation is getting ugly. Is there another way of doing this? I could imagine using numbers and an array instead. The db overhead is minimal and the type is referenced in few places. I could produce a patch, but I'm not dealing with C or C++ files, so I cannot offer to fully test the (e)ctags implementation. I'm also thinking about making this language dependent, something like major/minor numbers in UNIX devices: major is the language and minor is the language-specific set.

2) find and search use $config->sourceroot to remove the leading path so it matches the source. However, glimpse has a nasty way of expanding symlinks, so the glimpse path and the sourceroot are not the same. I have something like this in mind (hand-edited diff against 0.8 ;-):

--- ../../lxrsrc/lxr/find Tue Oct 16 22:38:37 2001
+++ find Wed Oct 31 17:17:21 2001
@@ -58,11 +58,11 @@
 return;
 }
 print("<hr>\n");
+ $glimpseroot = $config->glimpseroot;
- $sourceroot = $config->sourceroot;
 while($file = <FILELISTING>) {
+ $file =~ s/^$glimpseroot//;
- $file =~ s/^$sourceroot//;
 if($file =~ /$searchtext/) {
 print(&fileref("$file", "find-file", "/$file"),"<br>\n");

The same applies for search... Is that acceptable to include?

3) Just wondering: how could we go about dealing with an external C based parser? Including it in the project would increase the noise (and hurt portability). Rewriting it in perl would be nice but quite a pain, because VHDL is context sensitive and generally stateful (and I like lex and yacc for doing this). Right now it might make sense to keep VHDL out of the releases and offer a language addon.

BTW, thanks for all this great work in lxr. Regards, Robin. -- ASIC Design Engineer Tellabs Denmark A/S Direct: +45 4473 2942 rob...@te... |
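The prefix-stripping fix Robin proposes can be sketched in isolation. This is a minimal illustration, not the actual LXR code: `strip_root` is a hypothetical helper name, and the resolved root would come from the proposed `glimpseroot` config value. Note that quoting the root with `\Q...\E` also guards against regex metacharacters in the path, which the raw `s/^$sourceroot//` in the diff does not.

```perl
use strict;
use warnings;

# Minimal sketch of the path fix: glimpse reports file names with symlinks
# expanded, so the prefix to strip must be the *resolved* source root (the
# proposed glimpseroot config value), not the configured sourceroot.
# In practice the resolved root could be computed once with
#   use Cwd qw(abs_path);  my $glimpseroot = abs_path($config->sourceroot);
sub strip_root {
    my ($path, $root) = @_;
    # \Q...\E quotes regex metacharacters that may appear in the root path
    $path =~ s/^\Q$root\E//;
    return $path;
}

print strip_root("/real/src/kernel/fork.c", "/real/src"), "\n";  # prints "/kernel/fork.c"
```

If the path does not start with the root, the substitution is simply a no-op, which matches the behaviour of the original code.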
From: Malcolm B. <ma...@br...> - 2001-11-14 12:55:47
|
Hi Robin,

Robin Theander wrote:
> I'm hacking VHDL support into lxr and have come across a few things I'd like to
> comment on. I have it fully working now, but based on the 0.8 release (I'm
> firewalled, so CVS access is N/A). I'm relying on an external parser (a lex & yacc
> based skeleton I ripped from the net) because VHDL is a pain to parse. The main
> changes are kept in LXR::Lang::VHDL.

Cool stuff! As an ex-VHDL hacker myself, I know what you mean about it being a pain to parse. It's great to see a totally new language becoming supported.

> A few other changes are necessary though.
>
> 1) The %type_names in Common.pm are hashed from chars. With VHDL I have about
> 20 new types and letter allocation is getting ugly. Is there another way of
> doing this? I could imagine using numbers and an array instead. The db
> overhead is minimal and the type is referenced in few places.
> I could produce a patch, but I'm not dealing with C or C++ files, so I cannot
> offer to fully test the (e)ctags implementation.
> I'm also thinking about making this language dependent, something like
> major/minor numbers in UNIX devices: major is the language and minor is the
> language-specific set.

I agree, the current scheme is broken and needs to be replaced. It doesn't even work properly with ctags, since the meaning of the letters ctags outputs is not constant across languages. See the bug at http://sourceforge.net/tracker/index.php?func=detail&aid=476695&group_id=27350&atid=390117 for an example of what's wrong.

Logically the mappings should be per-language, and ideally Common.pm would not depend directly on the installed languages - i.e. it would not hold the list of mappings. The problem is that the ident script doesn't know what language each of the returned identifiers is in, so it can't display the correct string.

Probably the best solution is to create another database table that maps a numeric id to a string, and then have the language modules store the id number where they now store a character. Then each language module can contain the string <-> number mapping, and simply check on initialisation that the strings are in the db, adding them if not.

> 2) find and search use $config->sourceroot to remove the leading path so
> it matches the source. However, glimpse has a nasty way of expanding symlinks,
> so the glimpse path and the sourceroot are not the same. I have something like
> this in mind (hand-edited diff against 0.8 ;-):
>
> --- ../../lxrsrc/lxr/find Tue Oct 16 22:38:37 2001
> +++ find Wed Oct 31 17:17:21 2001
> @@ -58,11 +58,11 @@
>  return;
>  }
>  print("<hr>\n");
> + $glimpseroot = $config->glimpseroot;
> - $sourceroot = $config->sourceroot;
>  while($file = <FILELISTING>) {
> + $file =~ s/^$glimpseroot//;
> - $file =~ s/^$sourceroot//;
>  if($file =~ /$searchtext/) {
>  print(&fileref("$file", "find-file", "/$file"),"<br>\n");
>
> The same applies for search... Is that acceptable to include?

Yes, that's fine to include I think. Is glimpse support working for you? I think it's actually broken against a recent version of glimpse.

> 3) Just wondering: how could we go about dealing with an external C based
> parser? Including it in the project would increase the noise (and hurt portability).
> Rewriting it in perl would be nice but quite a pain, because VHDL is context
> sensitive and generally stateful (and I like lex and yacc for doing this).
> Right now it might make sense to keep VHDL out of the releases and offer a
> language addon.

I think it will be OK to add a C module to the distribution, provided it comes with some reasonable way to build it. My guess (correct me if I'm wrong) would be that the parser is pretty much vanilla C with no platform dependencies, so it should be easy to make build.

I would suggest creating a lib/LXR/Lang/VHDL subdir to keep the source and build system in. Then those that want VHDL support can build it, and those that don't can just comment out the config in lxr.conf that maps files to VHDL (and in fact won't ever see a problem unless they have files that look like VHDL).

> BTW, thanks for all this great work in lxr.

Glad you like it. Thanks for all the work you've been doing adding to LXR - without contributions like yours this project wouldn't be half as advanced.

Cheers,

Malcolm |
From: Robin T. <Rob...@te...> - 2001-11-14 15:05:27
|
Hi Malcolm,

Malcolm Box wrote:
> I agree, the current scheme is broken and needs to be replaced. It
> doesn't even work properly with ctags, since the meaning of the letters
> ctags outputs is not constant across languages. See the bug at
> http://sourceforge.net/tracker/index.php?func=detail&aid=476695&group_id=27350&atid=390117
> for an example of what's wrong.

I'm not using LXR with C code, but I noticed the mess when reading the man page for ctags.

> Logically the mappings should be per-language, and ideally Common.pm
> would not depend directly on the installed languages - i.e. it would not
> hold the list of mappings. The problem is that the ident script doesn't
> know what language each of the returned identifiers is in to display the
> correct string.
>
> Probably the best solution is to create another database table that maps
> a numeric id to a string, and then have the language modules store the
> id number where they now store a character. Then each language module
> can contain the string <-> number mapping, and simply check on
> initialisation that the strings are in the db, adding them if not.

Isn't it too much hassle to put the strings in the db? They can live in the language module together with all the other language-specific stuff.

In my little head it goes like this:
1) Each identifier has to know what language it is.
2) Language types are possibly ints allocated and defined somewhere in each language module (or in generic.conf).
3) Each language module defines its own type-to-string mapping (as ints instead of chars?).
4) ident looks for the relevant language mapping in the relevant module to return the string.
5) The indexfile function in each module should insert the type number and the language number.

This could also make identifiers local to their language, but how should that be handled when
1) Searching from scratch
2) Displaying the identifier from a link from source (here we know the language).

I think ident should take an optional language identifier from the URL. This could be generated from source. Did I miss out on anything here? I haven't been in every dimly-lit corner of the code... And something in the far dark of my mind says that the changes should probably be compatible with dbm support.

> Yes, that's fine to include I think. Is glimpse support working for you?
> I think it's actually broken against a recent version of glimpse.

I found a glimpse RPM, version 4.12.5, from somewhere and it runs fine with the path fix.

> I think it will be OK to add a C module to the distribution, provided it
> comes with some reasonable way to build it. My guess (correct me if I'm
> wrong) would be that the parser is pretty much vanilla C with no
> platform dependencies, so it should be easy to make build. I would
> suggest creating a lib/LXR/Lang/VHDL subdir to keep the source and build
> system in. Then those that want VHDL support can build it, and those
> that don't can just comment out the config in lxr.conf that maps files
> to VHDL (and in fact won't ever see a problem unless they have files
> that look like VHDL).

I agree, but... The code skeleton has the following license (the files are from '93):

 * This file is intended not to be used for commercial purposes
 * without permission of the University of Twente and permission
 * of the University of Dortmund

I'm going to contact the source to see if it can be GPL'ed or whatever. If you know of any other GPL'ed VHDL parser that's just the yacc skeleton with grammar, I could hack that up instead.

The parser code, btw, required quite a few hacks. It was in a very old lex dialect and gave some trouble with both flex and gcc. It would probably need some more cleaning to run on non-gcc platforms. Again, the largest problem is probably the license.

Regards, Robin. -- ASIC Design Engineer Tellabs Denmark A/S |
From: Malcolm B. <ma...@br...> - 2001-11-15 05:26:38
|
Hi Robin,

Robin Theander wrote:
>Malcolm Box wrote:
>
>>Logically the mappings should be per-language, and ideally Common.pm
>>would not depend directly on the installed languages - i.e. it would not
>>hold the list of mappings. The problem is that the ident script doesn't
>>know what language each of the returned identifiers is in to display the
>>correct string.
>>
>>Probably the best solution is to create another database table that maps
>>a numeric id to a string, and then have the language modules store the
>>id number where they now store a character. Then each language module
>>can contain the string <-> number mapping, and simply check on
>>initialisation that the strings are in the db, adding them if not.
>
>Isn't it too much hassle to put the strings in the db? They can live in the
>language module together with all the other language-specific stuff.
>
>In my little head it goes like this:
>1) Each identifier has to know what language it is.

That's the rub - currently each identifier does not store this information. It seems like something that could be stored in the indexes table, and perhaps it would be worth it.

>2) Language types are possibly ints allocated and defined somewhere in each
>language module (or in generic.conf).
>3) Each language module defines its own type-to-string mapping (as ints instead
>of chars?).
>4) ident looks for the relevant language mapping in the relevant module to
>return the string.

This would potentially be pretty slow - assume you have a common identifier such as "close" which might appear in many different languages. Each ident result returned would involve creating the correct language module and then asking it for the type -> string mapping. You could order the results by language, to reduce the create/destroy count, but this is unlikely to be the order people actually want to see the results in. Given I've got an LXR install running where one common identifier returns over 1000 declarations, I'm not sure I'd want to take that hit.

What I half-implemented last night does the following:
1) Expand indexes.type to an int field
2) Create a declarations table containing declid (int) and declaration (char(255))
3) Each language now maps the ctags type output/whatever other type info it has to an int (currently hardwired)
4) Index::index() now stores the type field as an int
5) Ident then joins declarations.declid to indexes.type to get the right string for display.

The only difficulty is initialising the mapping in (3). My current plan is to have each language hold its own type strings (e.g. "class", "function definition") and on startup build the string -> int mapping by searching declarations for the string, using the declid if found, else inserting the string and using the new declid. For languages using ctags, the initialisation would also build the appropriate ctags char -> declid mapping. This should be pretty fast and is a one-time cost for the language module, which is OK because the Lang modules are only used for genxref & source, both of which have much bigger overheads than that. Note this doesn't require ident to know about Lang::* modules.

If we want to record the language the identifier was found in, it can either go in the files table or in the indexes. Putting it in files implies that there is a 1-to-1 mapping from files to languages, which while currently true may not always be (think webpages with scripting in multiple languages :-) ). Putting it in indexes is a little redundant at the moment, especially since indexes is one of the biggest tables.

>This could also make identifiers local to their language, but how should that
>be handled when
>1) Searching from scratch
>2) Displaying the identifier from a link from source (here we know the
>language).

Making identifiers carry language info is a good idea - it will help with the source -> ident -> source jump that happens so often, since we will be able to filter to identifiers from the same language (and possibly even order by whether the id is in the same file or directory, which would make it much faster to navigate). ident would then be extended to allow selection of a language when searching, defaulting to all languages as at present.

>I think ident should take an optional language identifier from the URL. This
>could be generated from source.

Indeed, that's how I see it working.

>Did I miss out on anything here? I haven't been in every dimly-lit corner of the
>code...

Don't go there without a light, or the grues will get you...

>And something in the far dark of my mind says that the changes should probably
>be compatible with dbm support.

dbm support doesn't work at the moment anyway - there was a discussion here about dropping it totally soon if no-one is prepared to work on it. The overhead of getting someone to set up an RDBMS is so low that I don't see it as a big issue, not to mention the fact that dbm performance was why 0.3 sucked so much on big repositories.

>>I think it will be OK to add a C module to the distribution, provided it
>>comes with some reasonable way to build it. My guess (correct me if I'm
>>wrong) would be that the parser is pretty much vanilla C with no
>>platform dependencies, so it should be easy to make build. I would
>>suggest creating a lib/LXR/Lang/VHDL subdir to keep the source and build
>>system in. Then those that want VHDL support can build it, and those
>>that don't can just comment out the config in lxr.conf that maps files
>>to VHDL (and in fact won't ever see a problem unless they have files
>>that look like VHDL).
>
>I agree, but... The code skeleton has the following license (the files are from
>'93):
> * This file is intended not to be used for commercial purposes
> * without permission of the University of Twente and permission
> * of the University of Dortmund
>
>I'm going to contact the source to see if it can be GPL'ed or whatever.
>If you know of any other GPL'ed VHDL parser that's just the yacc skeleton with
>grammar, I could hack that up instead.

I'd be very reluctant to let any non-free code into the main distribution. I know glimpse isn't free, but there are moves afoot to replace it (probably with Swish-E2) RSN. I don't know of any other VHDL parser out there, although perhaps VHDL mode from emacs might have something useful?

>The parser code, btw, required quite a few hacks. It was in a very old lex
>dialect and gave some trouble with both flex and gcc. It would probably need
>some more cleaning to run on non-gcc platforms. Again, the largest problem is
>probably the license.

The licence is the number one problem. Cleaning up the code so it runs on non-gcc platforms would be good, but it's not essential since gcc is so widely available.

Cheers,

Malcolm |
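The initialisation Malcolm describes in step (3) - search declarations for the string, use the declid if found, else insert it and use the new one - is a lookup-or-insert. Here is a sketch of that logic; an in-memory hash stands in for the proposed declarations table, and the helper name, example type strings, and ctags letters are all illustrative, not actual LXR code.

```perl
use strict;
use warnings;

# Stand-in for the proposed `declarations` db table (string -> declid).
my %declarations;
my $next_declid = 1;

# Lookup-or-insert: reuse the stored id if the string is already known,
# otherwise allocate a fresh id and record the mapping.
sub declid_for {
    my ($decl) = @_;
    return $declarations{$decl} if exists $declarations{$decl};
    return $declarations{$decl} = $next_declid++;
}

# A ctags-based language module would build its char -> declid table once
# at startup (letters and strings here are illustrative):
my %ctags_map;
for my $pair (['c', 'class'], ['f', 'function definition'], ['v', 'variable']) {
    $ctags_map{ $pair->[0] } = declid_for($pair->[1]);
}
```

Running `declid_for` twice on the same string returns the same id, which is exactly what makes the startup cost one-time per language module.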
From: Robin T. <Rob...@te...> - 2001-11-15 13:37:25
|
Hi Malcolm,

> What I half-implemented last night does the following:

I do not see it in CVS yet ;-) BTW, is there any way of getting the head revision out of the SF CVS as a tarball?

> 1) Expand indexes.type to an int field
> 2) Create a declarations table containing declid (int) and declaration
> (char(255))

If you use two ints (one for the language and one for the id), each language can have its own "namespace" (my original major/minor idea). The impact is small if using two small ints.

> 3) Each language now maps the ctags type output/whatever other type
> info it has to an int (currently hardwired)
> 4) Index::index() now stores the type field as an int
> 5) Ident then joins declarations.declid to indexes.type to get the right
> string for display.
>
> If we want to record the language the identifier was found in, it can
> either go in the files table or in the indexes. Putting it in files
> implies that there is a 1-to-1 mapping from files to languages, which
> while currently true may not always be (think webpages with scripting in
> multiple languages :-) ). Putting it in indexes is a little redundant
> at the moment, especially since indexes is one of the biggest tables.

Given the current bindings from filename to language module, it makes sense to put it into the files table. Your webpage example is pretty scary, but a special language module should take care of it. Unless the page is preprocessed in some hairy way, there's a parser going to read it at some time.

> The licence is the number one problem. Cleaning up the code so it runs
> on non-gcc platforms would be good, but it's not essential since gcc is
> so widely available.

I checked up on the Alliance toolkit. The parser is divided in two, behavioural and structural, and there's a lot of the language (like variables) not supported. It is a parser that is almost, but not quite, entirely unlike a VHDL parser...

The good news is that the VAUL parser is derived from the parser I also used, so all I have to do is recreate my special parser from the VAUL source and we're in GPL territory.

Robin. -- ASIC Design Engineer Tellabs Denmark A/S Direct: +45 4473 2942 rob...@te... |
From: Malcolm B. <ma...@br...> - 2001-11-17 15:34:34
|
Robin Theander wrote:
>
> Hi Malcolm,
>
> > What I half-implemented last night does the following:
>
> I do not see it in CVS yet ;-)
> BTW, is there any way of getting the head revision out of the SF CVS as a tarball?

That's cos it's only half implemented and thus not checked in :-) I don't know how to get the HEAD revision out of the tarball - I assumed that if you untarred the tarball as a CVS repository, you could use the normal CVS commands to retrieve the head, but I've never tried it.

> > 1) Expand indexes.type to an int field
> > 2) Create a declarations table containing declid (int) and declaration
> > (char(255))
>
> If you use two ints (one for the language and one for the id), each language can
> have its own "namespace" (my original major/minor idea). The impact is small
> if using two small ints.

Good point. I've changed my implementation to use two numbers, one for the language code and one for the string id within the language. This also stops two languages sharing the string for, say, "class", which is good if one then wants to change it.

> Given the current bindings from filename to language module, it makes sense to
> put it into the files table. Your webpage example is pretty scary, but a
> special language module should take care of it. Unless the page is
> preprocessed in some hairy way, there's a parser going to read it at some time.

I'm going to put it in indexes, since this is the logically correct place. In the webpage example, I would expect a special language module to take care of this, but it might then record the different languages found under multiple lang ids, possibly by delegating the work to different Lang::* modules, e.g. Webpage.pm -> split file into different languages -> pass to X.pm & Y.pm -> index.

> I checked up on the Alliance toolkit. The parser is divided in two,
> behavioural and structural, and there's a lot of the language (like variables)
> not supported. It is a parser that is almost, but not quite, entirely unlike a
> VHDL parser...

Not so good.

> The good news is that the VAUL parser is derived from the parser I also used,
> so all I have to do is recreate my special parser from the VAUL source and
> we're in GPL territory.

That is good news. Good luck with the extraction.

Malcolm |
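The two-number scheme agreed here - one id for the language and one for the string within that language - gives each language its own namespace. A sketch of what the resulting lookup looks like; the language ids, strings, and helper names below are made up for illustration, not the actual LXR schema.

```perl
use strict;
use warnings;

# Per-language type namespaces, keyed on (langid, declid) - Robin's
# major/minor idea.  A nested hash stands in for the proposed db table.
my %typemap;    # $typemap{langid}{declid} = description string

sub set_type {
    my ($langid, $declid, $desc) = @_;
    $typemap{$langid}{$declid} = $desc;
}

sub type_desc {
    my ($langid, $declid) = @_;
    return $typemap{$langid}{$declid};
}

# Two languages can use the same declid for different strings without
# colliding, and one can rename its string without affecting the other:
set_type(1, 1, 'class');     # e.g. langid 1 = C++ (illustrative)
set_type(2, 1, 'entity');    # e.g. langid 2 = VHDL (illustrative)
```

This is what makes the declid "small": it only has to be unique within one language, not across all of them.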
From: Malcolm B. <ma...@br...> - 2001-11-15 07:21:15
|
Hi,

I guess you've already looked at them, but a websearch did find some other parsers, including a Perl one at http://www.cpan.org/modules/by-module/Hardware/ though it sounds *slow*.

There's a VHDL compiler at Alliance, http://www-asim.lip6.fr/alliance/ which is GPL'ed and thus might have a grammar that you could rip out.

And there's VAUL from http://www.freehdl.seul.org/frontend.html which claims to be a flex/bison job.

Malcolm |
From: Robin T. <Rob...@te...> - 2001-11-15 08:07:55
|
Hi Malcolm,

Malcolm Box wrote:
> I guess you've already looked at them, but a websearch did find some
> other parsers, including a Perl one at
> http://www.cpan.org/modules/by-module/Hardware/ though it sounds *slow*.

Yup, the language definition is so huge and complex that a single small entity with a dummy arch takes 2+ minutes to parse.

> There's a VHDL compiler at Alliance, http://www-asim.lip6.fr/alliance/
> which is GPL'ed and thus might have a grammar that you could rip out.

Hmm, "my" parser was actually derived from an Alliance toolkit way back. I'll have a second look.

> And there's VAUL from http://www.freehdl.seul.org/frontend.html which
> claims to be a flex/bison job.

I looked at VAUL. It's a big thing, cobbled together from several projects. I expected the job of ripping out and cleaning the parser to be bigger than starting over. Then I found the current skeleton...

Thanks anyway. Robin. -- ASIC Design Engineer Tellabs Denmark A/S Direct: +45 4473 2942 rob...@te... |
From: Malcolm B. <ma...@br...> - 2001-11-18 03:38:20
|
Robin Theander wrote:
> I'm going to contact the source to see if it can be GPL'ed or whatever.
> If you know of any other GPL'ed VHDL parser that's just the yacc skeleton with
> grammar, I could hack that up instead.

A random thought that strayed across my neurones - what would be the effort to get your parser integrated into ctags? I think ctags has a reasonably well-defined extension system, and if it was in ctags then (a) all the LXR support would be in place and (b) all the other tools like emacs/vi etc. that use ctags would also benefit.

Malcolm |
From: Robin T. <Rob...@te...> - 2001-11-19 08:52:47
|
Hi Malcolm,

Malcolm Box wrote:
> A random thought that strayed across my neurones - what would be the
> effort to get your parser integrated into ctags? I think ctags has a
> reasonably well-defined extension system, and if it was in ctags then
> (a) all the LXR support would be in place and (b) all the other tools
> like emacs/vi etc. that use ctags would also benefit.

It also crossed my mind. I can think of one problem: the lex/yacc parser is not very "robust". If the grammar is wrong, the parser quits. I don't know what is required of a ctags parser; if it should continue parsing to grab as many idents as possible even in the face of syntax errors, my parser has a problem.

And second, I already have the new rewritten GPL'ed parser running. The lxr perl overhead is small (once the new db structure is implemented). Only Lang/VHDL.pm needs to be added beside the parser, which is 3 files (Makefile, vhdl.lex and vhdl.yacc). When I get bored I'll take a look at ctags.

Robin. -- ASIC Design Engineer Tellabs Denmark A/S Direct: +45 4473 2942 rob...@te... |
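For reference, a build for the three-file parser Robin describes (Makefile, vhdl.lex and vhdl.yacc) could look roughly like the sketch below. The `vhdlparse` binary name, the flags, and the rule structure are guesses, not the actual Makefile; only the two source file names come from the thread.

```makefile
# Hypothetical build sketch for the flex/yacc VHDL parser.
LEX  = flex
YACC = yacc
CC   = gcc

vhdlparse: vhdl.tab.o lex.yy.o
	$(CC) -o $@ vhdl.tab.o lex.yy.o -lfl

# yacc -b vhdl -d writes vhdl.tab.c and the token header vhdl.tab.h
vhdl.tab.c vhdl.tab.h: vhdl.yacc
	$(YACC) -b vhdl -d vhdl.yacc

# flex writes lex.yy.c; the scanner needs the token header from yacc
lex.yy.c: vhdl.lex vhdl.tab.h
	$(LEX) vhdl.lex

clean:
	rm -f vhdlparse *.o vhdl.tab.c vhdl.tab.h lex.yy.c
```

Keeping this under lib/LXR/Lang/VHDL, as Malcolm suggested earlier in the thread, lets those who want VHDL support build it without affecting anyone else.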