Re: [CEDET-devel] CEDET + JSON compilation database?
From: Alastair R. <ala...@gi...> - 2013-12-01 21:46:29
So I have got a preliminary implementation of this up on github; constructive criticism welcome:

https://github.com/randomphrase/ede-compdb

I will write up some proper documentation at some point, but in the meantime here's a quick start. With CEDET loaded, and with the ede-compdb directory in your load path, load the library with M-x load-library RET ede-compdb.

Currently you'll need to have a compile_commands.json file to load. The easiest way to get this is to use CMake (see www.cmake.org, also packaged for most Linux distros); the '-DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE' switch tells it to generate the compile_commands.json file. The 'test' directory contains an example C++ project which can be used for testing purposes. For example:

  cd .../ede-compdb/test
  cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE .

From here you create the EDE project simply by pointing the :file attribute at the compile_commands.json file. For example:

  (ede-add-project-to-global-list
   (ede-compdb-project "test" :file "compile_commands.json"))

If all goes well, the project should load and the include path should be set correctly for each source file. This file is watched and reloaded if a change in size or modification time is detected. There is an ERT test suite which checks that the above process works in an automated fashion, although it uses a temp directory as a build directory.

Current limitations/TODOs:

* If you use "Summarize includes current buffer" you will NOT see the system include path for the buffer. The reason is that the include path is set on the target, not on the project; however, the summarize function only prints out the system include path for the project, not the target. You can of course use "(ede-system-include-path ede-object)" to check the include path instead.

* There is a failing test case which I can't quite explain. The test is attempting to ensure that the preprocessor symbols are being set correctly.
It does this by looking for function tags whose name is set from a preprocessor symbol. So although the function is defined as "HELLO_FOO" in the source file, there is also an externally-defined macro which should redefine that symbol to "HelloFoo". For whatever reason semantic is still seeing the original definition, namely HELLO_FOO. I haven't gotten to the bottom of this.

* It only does very basic parsing of the GCC (or compatible) command-line options, and doesn't support any of the more esoteric GCC-specific ones such as "-imacros", "-idirafter", "-iprefix", etc.

* No support for detection of compiler-defined include paths (similar to semantic/bovine/gcc).

* No support for auto-loading of projects based on the presence of the compile_commands.json file.

* Currently uses the json module for loading the compilation database. We'll keep an eye on this to see what the performance is like, but it looks OK so far.

* The multiple build directory logic alluded to below is nonexistent. As David points out, this isn't strictly necessary right now (although it is a show stopper for me at $WORK, so I'll be working on it sooner rather than later). However, it *does* support having the build directory (ie the one with the compile_commands.json file) separate from the source directory.

* Somewhat ironically, "compile target" doesn't work yet, but should be simple to implement.

* Calculating the preprocessor directives for a header file is not supported yet. I guess the solution here would be to look up a source file which includes the header in the current project, and use that file's include path. For example, if you were trying to parse "foo.hpp", you might look up "foo.cpp" in the compilation database and use its entry to parse the .hpp file.

Despite all this, I'm happy with the progress so far. Further feedback definitely still appreciated.

Cheers,

On Nov 12, 2013, at 2:11 PM, David Engster <de...@ra...> wrote:

> Thank you for these detailed explanations. Forgive me if I cannot
> answer in a similarly detailed fashion, but time is sparse at the
> moment. Eric could probably say more about EDE than me, but it's
> pumpkin-throwing season. :-)
>
> Alastair Rankine writes:
>> This information is very useful for tools like CEDET, as it enables the tool to
>> unambiguously determine the include paths and preprocessor definitions for C
>> and C++ source files. This information is otherwise quite difficult to
>> determine automatically, and most current tools typically require it to be
>> provided redundantly (eg once in the build tool input file and again in an EDE
>> project).
>
> Yes, I agree such a database would be very useful. I'm not particularly
> fond of tools which wrap themselves around 'make' or similar and
> override CC etc. to get this information, since they often don't play
> well with complicated builds, stuff like distcc and similar.
>
> Therefore, if this compilation database is trying to establish a
> standard for this to get support from compilers, so that we don't need
> these hacks anymore, I'm all for it.
>
>> So how is the compilation database generated? I know of two ways, and there may
>> be others. For CMake-based projects using the GNU Make build tool, there is an
>> option (CMAKE_EXPORT_COMPILE_COMMANDS) which tells CMake to write out a
>> compile_commands.json file along with the generated Makefiles in the build
>> directory. This file contains the entire compilation database for the project.
>> For projects using the Ninja build tool (http://martine.github.io/ninja/), the
>> compilation database can be generated on-demand using the "-t compdb" command,
>> either for the entire project, or for a specific source file.
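[For reference, the compilation database under discussion is just a JSON array with one entry per translation unit; each entry records the working directory, the full compile command, and the source file. A minimal illustrative example — the paths and flags here are invented:

```json
[
  {
    "directory": "/home/user/build",
    "command": "g++ -I/home/user/project/include -DNDEBUG -c -o hello.o /home/user/project/hello.cpp",
    "file": "/home/user/project/hello.cpp"
  }
]
```
]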
>
> I just tried out the tool 'Bear', which does this "wrap around
> Make" hack, and it actually worked nicely with gcc:
>
> https://github.com/rizsotto/Bear
>
> As you might know from recent discussion, it would be problematic if
> this feature only worked with clang, so this is good news.
>
>> Firstly, I'm thinking that a new EDE project type would be needed. This is
>> because the include paths and preprocessor symbols would be sourced from the
>> compilation database on-demand, unlike any of the existing project types.
>
> I agree.
>
>> Also, the current semantic/bovine/clang support isn't really suitable for reuse
>> here. In fact it seems to be going in the opposite direction - using the include
>> paths and defines already in the EDE project to build a clang command line for
>> performing semantic completion.
>
> Yes, this is outside the scope of semantic-clang, which is only there
> to provide completions.
>
>> 1. Maintain a build directory per build configuration. The build directory is
>> required to access the compilation database, either as a file or by invoking
>> the build tool. A desirable feature would be a custom hook function which would
>> be invoked to locate the build directory for a given configuration. Also, it
>> should be possible for the initial build configuration to be determined by
>> searching possible build directory locations. For example, you should be able
>> to say "if there exists a build.dbg directory, use that as the build directory
>> for debug builds, and set the current configuration to debug". I think this
>> type of discovery is going to be necessary so that the compilation database is
>> available for parsing when a file is first loaded.
>
> My advice would be to start simple: create an EDE project type which
> accepts a filename which points to an existing compilation
> database. Things like automatic creation of the database and
> configuration detection can come later. Doing things automatically is
> great and all, but the complexity that comes with it is painful.
>
>> 2. Determine how to access the compilation database, and detect when it
>> changes. This would typically involve watching a file in the current build
>> directory which is updated whenever the compilation database is. For example,
>> it could be the compile_commands.json file, or the rules.ninja file, or
>> something else entirely. This should be customizable with a user option to
>> specify how this is done, and should also fall back to simple heuristics (eg
>> if the compile_commands.json file is there, then use it).
>
> Again: nice to have, can come later. :-)
>
>> I don't want to use a JSON library if it is just going to build an AST
>> in memory. Instead I'd rather parse this in a streaming fashion. The
>> compilation database entries can be stored in a hash table, keyed on
>> the source file name.
>
> Take a look at json.el, which ships with Emacs. I've never used it, but it
> seems to use a hash table for storage, so maybe it does what you need.
>
>> The compilation database can in theory support more than one
>> compilation command per file, but I think this is unlikely to be an
>> issue when we have a build directory per configuration. If needed in
>> future, we could defer to user hooks to disambiguate cases such as
>> this.
>
> I think configurations should be supported through an EDE
> 'configurations' slot, in accordance with how other EDE projects work.
>
>> 4. Implement the various methods required to provide the include paths and
>> preprocessor definitions, eg ede-system-include-path. These can probably be
>> parsed as needed from the compiler command line found in the hash table, and
>> possibly cached. Initially I think a simple regular expression-based match
>> would be sufficient to parse the compiler command line. These regular
>> expressions would be customizable for different compilers, for example.
>> However, given that the compilation database is an LLVM standard, I would
>> expect Clang to be the most commonly-used compiler.
>
> Clang follows gcc, so you can support both easily. But anyway, things
> like '-I', '-L' and '-D' are pretty much standard, at least on Unix, and
> those are the most important information we need.
>
>> 5. System include paths automatically added by the compiler (for standard
>> library headers, for example) can be discovered by querying the compiler
>> directly, as we do currently for clang and gcc. These can be stored in a
>> global-scope variable, as they won't vary per project.
>
> Yes, although getting this information from the compiler is so fast that
> it really doesn't matter if every project does it for itself.
>
>> If I am correct, it should be possible for this new project type to 'bootstrap'
>> itself without requiring significant manual configuration. Hopefully, given a
>> build directory and some customizable heuristics, we can discover all we need
>> to be able to do useful things with C and C++ source files.
>>
>> The above items are a minimal useful subset of functionality, and also a
>> suitable base onto which more features can be built.
>
> As I've written, I think the minimal useful subset is way smaller, more
> like
>
>   (ede-compdb-project "MY-PROJECT" :file "~/project/compilationdb.json")
>
> which covers all files under "~/project". If I load a file from there,
> the project would look into compilationdb.json and parse things like the
> used compiler, include paths and preprocessor symbols, so that Semantic
> can use them. At first, the user should take care of keeping this
> database up-to-date. If EDE parses the file on-demand, it doesn't even
> have to watch it.
>
> Cheers,
> David
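[To make the regular-expression idea from point 4 above concrete, here is a minimal sketch of pulling '-I' and '-D' flags out of a database entry's command line. The function name and regexps are my own invention for illustration, not part of ede-compdb; a real implementation would need proper shell tokenization and the customizable per-compiler patterns discussed above:

```elisp
;; Hypothetical sketch: extract -I include dirs and -D defines from a
;; compile command with regular expressions. Handles both "-Ifoo" and
;; "-I foo" forms; it is not a full shell parser (no quoting support).
(defun my/compdb-parse-flags (command)
  "Return a plist (:includes DIRS :defines DEFS) parsed from COMMAND."
  (let ((case-fold-search nil)  ; keep -I distinct from -i... options
        (start 0) includes defines)
    (while (string-match "-I ?\\([^ ]+\\)" command start)
      (push (match-string 1 command) includes)
      (setq start (match-end 0)))
    (setq start 0)
    (while (string-match "-D ?\\([^ ]+\\)" command start)
      (push (match-string 1 command) defines)
      (setq start (match-end 0)))
    (list :includes (nreverse includes) :defines (nreverse defines))))

;; (my/compdb-parse-flags "g++ -I/usr/include/foo -DNDEBUG -c foo.cpp")
;;   => (:includes ("/usr/include/foo") :defines ("NDEBUG"))
```

The results could then be returned from methods such as ede-system-include-path, and cached per target as suggested.]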