Re[2]: [cedet-semantic] A few questions about the semantic API
From: Eric M. L. <er...@si...> - 2006-01-24 01:47:43
>>> "Toby 'qubit' Cubitt" <tob...@dr...> seems to think that:
>Hi,
>
>Thanks for the prompt and detailed reply!
>
>On Mon, Jan 23, 2006 at 12:46:13PM -0500, Eric M. Ludlam wrote:
>> Your project idea sounds pretty cool.
>>
>> 1)
>>
>> The best place to optimize for speed in semantic at the moment is in
>> 'semanticdb'. The larger a project gets, the slower lookups become
>> because of the many scattered databases. (1 per directory.)
>
>I see. There must also be some slow-down when there are a large number
>of completion candidates (at least, there is on my computer). Or maybe
>I'm hitting the database issue.
I haven't really timed it, but if you look in the analyzer and
completion engines, you will notice there are very few loops. It
basically looks for all the "stuff" you need to provide a good
context, then calls semanticdb to find out what all those symbols
mean.
The search routine then attempts to predict which tables to look
into. There are two types of searches: a brute search (search all the
tables) and a regular search, which looks only in tables referred to
via #includes or similar.
The regular search is controlled by a search throttle which explains
where to find tags, including in global system databases.
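As a concrete illustration, the two search styles look something like
this (function names as found in semanticdb-find.el; worth verifying
against your CEDET version):

```elisp
;; Regular search: consults only the tables the throttle allows,
;; e.g. those reachable through #includes from the current file.
(semanticdb-find-tags-by-name "my_symbol")

;; Brute search: walks every known table, regardless of includes.
(semanticdb-brute-deep-find-tags-by-name "my_symbol")
```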
>> Semanticdb uses tables and project databases via EIEIO which allows
>> full subclassing. Thus, you can make a new database class (1 per
>> directory at the moment) and a new database table (1 per file, many
>> per database.) These implementations can store and search databases
>> via virtual methods.
>
>I hadn't thought about wrapping my data structures to make them look
>like database tables. It might be the way to go, or it might not. One
>downside is that my data structures are designed for looking up
>completions[*], and will be slightly slower for just looking up a
>single tag (O(log n) instead of O(1), assuming the current table
>lookup is O(1)). Might not be a big deal though.
Existing basic searches through a single file are O(n), but they use
'assoc' when possible, which is pretty fast.
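Since a semantic tag is a list whose car is the tag name, the
exact-name fast path can lean directly on 'assoc'. A sketch of the
idea (not the actual semantic-find.el code):

```elisp
;; O(n) scan, but `assoc' is implemented in C, so it is fast in
;; practice for a single file's tag list.
(defun my-find-first-tag-by-name (name tags)
  "Return the first tag in TAGS whose name is NAME."
  (assoc name tags))
```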
>([*] they're a hybrid between a ternary search tree and a hash table,
>which provides a decent speed/memory tradeoff).
>
>Another downside is that I imagine it would be more difficult to
>divide up the tags into different "tables" so that the table you look
>in only contains relevant completions (e.g. one table per local scope,
>one table of methods and properties per class, etc.). I could be wrong
>here since I haven't looked into the internals of semantic's tag
>tables.
I already have routines that create a list of tables relevant per
file. If you had a program that compressed that list of tables into
one new table you knew how to search quickly, that could be easily
cached on a per-file basis. It would also likely be simple, and
provide a good speed boost for very large projects.
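That compression could be sketched roughly as below.
`semanticdb-find-translate-path' and `semanticdb-get-tags' are real
semanticdb entry points (check their signatures in your version);
my-build-search-structure stands in for the custom structure's
constructor:

```elisp
(defvar my-flattened-cache nil
  "Per-buffer cache of the compressed tag structure.")
(make-variable-buffer-local 'my-flattened-cache)

(defun my-get-flattened-structure ()
  "Return (building if needed) one searchable structure for this file."
  (or my-flattened-cache
      (setq my-flattened-cache
            (my-build-search-structure   ; hypothetical constructor
             ;; Collect the tags from every table relevant to this file.
             (apply #'append
                    (mapcar #'semanticdb-get-tags
                            (semanticdb-find-translate-path nil nil)))))))
```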
>A final downside is that it'll be more work to implement, and I'd like
>to start simple and improve things later. (This is a hobby project, so
>I don't have infinite coding time :) Writing up my thesis will
>probably cut into a lot of that time in the near future too...)
Look into semanticdb-skel.el for a skeleton for creating a new system
level database.
Look at semanticdb-el.el for a working example that uses the built-in
Emacs symbol tables to look up symbols in a global way.
It is very simple, and the minimum needed for implementation is very
small. You need the classes, a few basic maintenance things, and
implementations of:
semanticdb-find-tags-by-name-method
semanticdb-find-tags-by-name-regexp-method
semanticdb-find-tags-for-completion-method
with a range of other optional search methods you can provide. The
others are not as important as the above.
Lastly, you need to link the database into the system. The hard part
there is figuring out what kind of table you are going to have.
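A rough sketch of the minimal shape, modeled on the pattern in
semanticdb-el.el (the class name and the my-lookup-* helpers are
hypothetical placeholders for your own structure):

```elisp
(require 'semanticdb)

;; Hypothetical table class wrapping the custom completion structure.
(defclass my-semanticdb-table (semanticdb-abstract-table)
  ()
  "Database table backed by a custom completion data structure.")

(defmethod semanticdb-find-tags-by-name-method
  ((table my-semanticdb-table) name &optional tags)
  "Find all tags named NAME in TABLE, filtering TAGS if given."
  (if tags
      (call-next-method)               ; filter an explicit tag list
    (my-lookup-exact table name)))     ; assumed exact-match lookup

(defmethod semanticdb-find-tags-for-completion-method
  ((table my-semanticdb-table) prefix &optional tags)
  "Find all tags in TABLE whose names complete PREFIX."
  (if tags
      (call-next-method)
    (my-lookup-prefix table prefix)))  ; assumed prefix lookup
```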
>> I don't know if that is the target of your tool or not. At the
>> moment it is even possible for some new global Db to coincide with the
>> existing implementation, and search throttles can choose between them.
>
>I was thinking more of inserting an extra layer between completion
>lookup and the tag tables/databases, rather than implementing
>something at the database level, for the reasons above. This
>interfaces better with the way my package works at the moment.
>
>I'd use the existing lookup functions the first time you try to
>complete a string, but cache the results in my data structures (one
>per scope, class, etc., whatever's relevant for the completion being
>looked up), so that subsequent lookup of the same kind (same scope, or
>methods/properties from the same class, etc.) is much faster. I'll
>need the analyser to decide which cache to look in, and how to filter
>the results (e.g. by type).
>
>So long as it's possible to incrementally update these caches,
>subsequent completions will always be faster than looking things up
>directly in the tables or databases. Obviously, this would speed up
>searching across multiple databases for completions, since it would
>only be done once. You pay a memory cost, of course, but if that
>proves an issue I can worry later about flushing less frequently used
>caches, or only caching completions that have been used at least
>once. I already have code to dump the data structures to disc, so they
>can be made persistent to avoid recreating them each time you visit a
>file.
>
>But I'll definitely think about writing a db subclass as an
>alternative approach.
What you describe above is not incompatible with the database API.
See semanticdb-find-default-throttle. If you set this to the list:
(omniscience)
and then in your implementation do what you describe above in an
omniscient database, you will be done.
Alternately, you could replace the throttling functions via overriding
the method semanticdb-find-translate-path and do something similar
there.
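In elisp terms, the suggestion amounts to something like this
(assuming the omniscient database class is already registered with
semanticdb):

```elisp
;; Route every search through omniscient databases only, so the
;; custom backend answers all queries.
(setq semanticdb-find-default-throttle '(omniscience))
```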
Hmm. I can't get it out of my system that you should override the
semanticdb structure. There just isn't an obvious layer at the level
where you are looking.
>The advantage of storing completions in these structures isn't only
>speed, but the fact that it allows emacs to learn how often the
>completions are used, and use that information to prioritise which
>ones it offers - very useful when there are many. All that's already
>in place.
>
>That was the original goal of my package: to do predictive completion
>for English text. Now that it works well for plain text, and also for
>LaTeX and HTML (using regexps to parse the buffer), and I had a few
>queries about using it for programming, I figured I'd try and get it
>to work with semantic (I don't fancy trying to write a parser in terms
>of regexps - in fact, I don't fancy trying to write a parser at all!
>Thanks to the great work in semantic I don't have to).
Since you already have the code, making a systemdb sounds like a
15-minute project beyond whatever it takes to turn existing databases
into your magic structure.
[ ... ]
>> 3)
>>
>> There are existing overlays for semantic, 1 per specific tag. What
>> they cover is language specific. In C you cover a variable, function,
>> whatever. In a function, each argument is covered.
>>
>> Scopes are not covered.
>
>I thought as much. Does that mean local variables defined inside {...}
>don't have overlays (until you call the parser on that region)?
Those overlays are deleted after the tags are returned to keep things tidy.
[ ... ]
>> 4)
>>
>> Every language implements local arguments very differently, and there
>> is no real consistency.
>>
>> However, you can get the arguments and local vars separately. The
>> local args are also sorted in terms of closeness to point. (I don't
>> recall which order.) That might prove useful.
>
>I see. Looking at `semantic-get-local-variables/arguments', it would
>be easy enough to get what I want using the same method used by the
>default implementations of those: find the bounds of the current scope
>using `semantic-up-context' and call the parser on that region. It just
>means not looping up through all contexts. (In fact it'd be easy to
>add an optional argument to the functions to allow this, but I don't
>know enough about EIEIO to know if that would break things).
>
>As for `semantic-analyze-possible-completions', I'll have to write an
>alternative version whatever I do, if I'm going to change the way
>completions are looked up. (Actually my package would just provide a
>completely different completion selection mechanism, at least at
>first, since it works a bit differently. The semantic one would still
>be available. Integrating my completion user interface more fully into
>semantic is something I'd prefer to leave for later.)
The hard part of the above is decoding the prefix. In the case of a
field/method from an object, there may be at most 20 possible symbols
to choose from.
If there is no prefix to speak of, then you are after some symbol from
the global namespace. In this case, it is just a long list of calls
to `semantic-find-tags-by-name-regexp'. It's not even a lot of semanticdb
calls. Just searches through arg lists, local variables, scopes and
such. The last thing it does is call
semantic-analyze-find-tags-by-prefix which is pretty short and should
explain itself.
I'm actually surprised at what I find in
semantic-analyze-possible-completions. It should be calling
`semantic-find-tags-for-completion' instead. I'll have to
investigate. I might be able to speed it up that way.
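For comparison, the two calls mentioned above (both from
semantic-find.el; behavior as I understand it, worth double-checking):

```elisp
;; Prefix completion: a cheap string-prefix comparison on tag names.
(semantic-find-tags-for-completion "pre" (current-buffer))

;; Regexp route: matches a regexp against each tag name; typically
;; slower when all you need is plain prefix completion.
(semantic-find-tags-by-name-regexp "^pre" (current-buffer))
```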
Good Luck
Eric
--
Eric Ludlam: za...@gn..., er...@si...
Home: http://www.ludlam.net Siege: www.siege-engine.com
Emacs: http://cedet.sourceforge.net GNU: www.gnu.org