Re: [cedet-semantic] A few questions about the semantic API
From: Toby 'q. C. <tob...@dr...> - 2006-01-24 00:45:28
Hi,
Thanks for the prompt and detailed reply!
On Mon, Jan 23, 2006 at 12:46:13PM -0500, Eric M. Ludlam wrote:
> Your project idea sounds pretty cool.
>
> 1)
>
> The best place to optimize for speed in semantic at the moment is in
> 'semanticdb'. The larger a project gets, the slower lookups become
> because of the many scattered databases. (1 per directory.)
I see. There must also be some slow-down when there are a large number
of completion candidates (at least, there is on my computer). Or maybe
I'm hitting the database issue.
> Semanticdb uses tables and project databases via EIEIO which allows
> full subclassing. Thus, you can make a new database class (1 per
> directory at the moment) and a new database table (1 per file, many
> per database.) These implementations can store and search databases
> via virtual methods.
I hadn't thought about wrapping my data structures to make them look
like database tables. It might be the way to go, or it might not. One
downside is that my data structures are designed for looking up
completions[*], and will be slightly slower for just looking up a
single tag (O(log n) instead of O(1), assuming the current table
lookup is O(1)). Might not be a big deal though.
([*] they're a hybrid between a ternary search tree and a hash table,
which provides a decent speed/memory tradeoff).
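To make the speed tradeoff concrete, here is a minimal sketch (in Python, purely for illustration; my actual implementation is Elisp and the hybrid adds a hash-table layer, which I've omitted) of a plain ternary search tree. Exact lookup costs O(log n)-ish character comparisons, but in exchange you can enumerate all completions of a prefix, which a flat hash table can't do without scanning every key:

```python
# Illustrative ternary search tree (TST): exact lookup is slower than a
# hash table, but prefix completion is cheap. This is NOT the package's
# actual code, just a sketch of the data structure named above.

class TSTNode:
    __slots__ = ("char", "lo", "eq", "hi", "is_word")

    def __init__(self, char):
        self.char = char
        self.lo = self.eq = self.hi = None
        self.is_word = False

class TST:
    def __init__(self):
        self.root = None

    def insert(self, word):
        self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = TSTNode(ch)
        if ch < node.char:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.char:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_word = True
        return node

    def completions(self, prefix):
        """Return every stored word that starts with `prefix`."""
        node = self._find(self.root, prefix, 0)
        results = []
        if node is None:
            return results
        if node.is_word:
            results.append(prefix)
        self._collect(node.eq, prefix, results)
        return results

    def _find(self, node, prefix, i):
        if node is None:
            return None
        ch = prefix[i]
        if ch < node.char:
            return self._find(node.lo, prefix, i)
        if ch > node.char:
            return self._find(node.hi, prefix, i)
        if i + 1 == len(prefix):
            return node
        return self._find(node.eq, prefix, i + 1)

    def _collect(self, node, prefix, results):
        # In-order walk of a subtree, rebuilding each word from the prefix.
        if node is None:
            return
        self._collect(node.lo, prefix, results)
        if node.is_word:
            results.append(prefix + node.char)
        self._collect(node.eq, prefix + node.char, results)
        self._collect(node.hi, prefix, results)
```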
Another downside is that I imagine it would be more difficult to
divide up the tags into different "tables" so that the table you look
in only contains relevant completions (e.g. one table per local scope,
one table of methods and properties per class, etc.). I could be wrong
here since I haven't looked into the internals of semantic's tag
tables.
A final downside is that it'll be more work to implement, and I'd like
to start simple and improve things later. (This is a hobby project, so
I don't have infinite coding time :) Writing up my thesis will
probably cut into a lot of that time in the near future too...)
> I don't know if that is the target of your tool or not. At the
> moment it is even possible for some new global Db to coincide with the
> existing impl, and search throttles can choose between them.
I was thinking more of inserting an extra layer between completion
lookup and the tag tables/databases, rather than implementing
something at the database level, for the reasons above. This
interfaces better with the way my package works at the moment.
I'd use the existing lookup functions the first time you try to
complete a string, but cache the results in my data structures (one
per scope, class, etc., whatever's relevant for the completion being
looked up), so that subsequent lookup of the same kind (same scope, or
methods/properties from the same class, etc.) is much faster. I'll
need the analyser to decide which cache to look in, and how to filter
the results (e.g. by type).
So long as it's possible to incrementally update these caches,
subsequent completions will always be faster than looking things up
directly in the tables or databases. Obviously, this would speed up
searching across multiple databases for completions, since it would
only be done once. You pay a memory cost, of course, but if that
proves an issue I can worry later about flushing less frequently used
caches, or only caching completions that have been used at least
once. I already have code to dump the data structures to disc, so they
can be made persistent to avoid recreating them each time you visit a
file.
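Roughly, the layer I have in mind would behave like this sketch (Python for illustration; all the names are invented, and `slow_lookup` stands in for whatever semanticdb search the analyser would normally perform):

```python
# Hypothetical sketch of the proposed cache layer: the first completion
# request in a given context pays the full database-search cost once;
# later requests in the same context only filter the cached candidates.

class CompletionCache:
    def __init__(self, slow_lookup):
        self.slow_lookup = slow_lookup   # callable: context -> list of names
        self.caches = {}                 # one cache per scope/class/etc.

    def complete(self, context, prefix):
        if context not in self.caches:
            # First lookup in this context: hit the databases once.
            self.caches[context] = sorted(self.slow_lookup(context))
        # Subsequent lookups never touch the databases.
        return [c for c in self.caches[context] if c.startswith(prefix)]

    def invalidate(self, context):
        # Called on incremental reparse when tags in `context` change.
        self.caches.pop(context, None)
```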
But I'll definitely think about writing a db subclass as an
alternative approach.
The advantage of storing completions in these structures isn't only
speed, but the fact that it allows emacs to learn how often the
completions are used, and use that information to prioritise which
ones it offers - very useful when there are many. All that's already
in place.
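The learning part amounts to something like this (again a Python sketch with invented names, not the package's Elisp):

```python
# Hypothetical sketch of frequency-weighted completion ordering: each time
# the user accepts a completion its count goes up, and candidates are then
# offered most-frequently-used first (ties broken alphabetically).

from collections import Counter

class FrequencyRanker:
    def __init__(self):
        self.counts = Counter()

    def accept(self, completion):
        # Record that the user actually chose this completion.
        self.counts[completion] += 1

    def rank(self, candidates):
        # Negate the count so higher frequencies sort first.
        return sorted(candidates, key=lambda c: (-self.counts[c], c))
```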
That was the original goal of my package: to do predictive completion
for English text. Now that it works well for plain text, and also for
LaTeX and HTML (using regexps to parse the buffer), and I had a few
queries about using it for programming, I figured I'd try and get it
to work with semantic (I don't fancy trying to write a parser in terms
of regexps - in fact, I don't fancy trying to write a parser at all!
Thanks to the great work in semantic I don't have to).
> 2)
>
> It is possible to override the impl of the analyzer, but that is
> really for mode-specific activities.
Right. I definitely didn't want to re-implement the analyser. I want
to use semantic precisely because you've already implemented the
analyser and I don't have to ;)
> For your question on the hooks... I must confess I don't really
> remember. However, semantic-decorate-mode does use the hooks, and it
> may prove useful to see what it does. If you need more hooks with
> better detail, we can certainly add some.
I'll take a look. Thanks for the pointer.
> One thing semantic cannot yet do is identify that a tag has moved.
> Thus if function 'foo' is set with some property (read-only, new
> color, whatever) and it is moved to another location, it is considered
> a 'new' tag, and loses old properties.
If it identifies this as one tag deletion and one tag creation, it's
no big deal for my purposes.
> 3)
>
> There are existing overlays for semantic, 1 per specific tag. What
> they cover is language specific. In C you cover a variable, function,
> whatever. In a function, each argument is covered.
>
> Scopes are not covered.
I thought as much. Does that mean local variables defined inside {...}
don't have overlays (until you call the parser on that region)?
> The Java parser is the most advanced one in semantic and it parses the
> entire buffer. This is a goal I aspire to for C/C++ as well. How
> exactly to deal with sub-scopes has not been studied.
>
> I've also considered adding overlays to the { ... } part of functions
> and marking it 'code. Overlays there could help optimize incremental edits.
Right. For now, I'll assume scopes aren't covered and set up the
overlays I need myself. That way it'll work with any major mode. If
this changes in the future, I can always adapt things to use it. (I
might look at the java case separately later, and see if I can use
existing overlays there.)
> 4)
>
> Every language implements local arguments very differently, and there
> is no real consistency.
>
> However, you can get the arguments and local vars separately. The
> local args are also sorted in terms of closeness to point. (I don't
> recall which order.) That might prove useful.
I see. Looking at `semantic-get-local-variables/arguments', it would
be easy enough to get what I want using the same method used by the
default implementations of those: find the bounds of the current scope
using `semantic-up-context' and call the parser on that region. It just
means not looping up through all contexts. (In fact it'd be easy to
add an optional argument to the functions to allow this, but I don't
know enough about EIEIO to know if that would break things).
As for `semantic-analyze-possible-completions', I'll have to write an
alternative version whatever I do, if I'm going to change the way
completions are looked up. (Actually my package would just provide a
completely different completion selection mechanism, at least at
first, since it works a bit differently. The semantic one would still
be available. Integrating my completion user interface more fully into
semantic is something I'd prefer to leave for later.)
> It would be good to discuss 1 a bit more before delving into many
> details on other topics so I can better understand what you propose.
Absolutely. I've tried above to explain a bit better what I'm trying
to do.
> The existing analyzer was built to be tweakable in many ways that
> should prevent the need for a full-scale replacement.
As I said, that's the last thing I want to do :) I want to use
semantic precisely because it already implements a parser and
analyser. My package is closer to the user-interface side of things,
so I see it more as sitting on top of (next to?) semantic rather than
replacing any part of it.
Thanks a lot for the useful discussion,
Toby
--
PhD Student
Quantum Information Theory group
Max Planck Institute for Quantum Optics
Garching, Germany
email: to...@dr...
web: www.dr-qubit.org