Re: [cedet-semantic] A few questions about the semantic API

On Mon, Jan 23, 2006 at 08:47:38PM -0500, Eric M. Ludlam wrote:
> >>> "Toby 'qubit' Cubitt" <tob...@dr...> seems to think that:
>
> >One downside is that my data structures are designed for
> >looking up completions, and will be slightly slower for just
> >looking up a single tag (O(log n) instead of O(1), assuming the
> >current table lookup is O(1)). Might not be a big deal though.
> 
> Existing basic searches through a single file are O(n), but does use
> 'assoc' when possible, which is pretty fast.

Well, O(log n) shouldn't be a problem then. Completion searches will
be fast, regular lookup will be reasonable, and regexp searches will
be much slower if I use my data structures as a tag database. But I
don't think regexp searches are used in time-critical
places. (Actually, as you mention below, it seems at the moment to be
used where completion lookup ought to be used.)
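
A rough illustration of the lookup costs being compared, written in
Python rather than Emacs Lisp. The sorted-list index and every name in
it are hypothetical; this is neither the data structure from my package
nor anything in semantic, just a minimal sketch of why prefix lookup
can be fast while regexp search cannot:

```python
import bisect
import re


class SortedTagIndex:
    """Hypothetical tag index: a sorted list of names searched by bisection."""

    def __init__(self, names):
        self.names = sorted(set(names))

    def lookup(self, name):
        # Exact lookup by binary search: O(log n).
        i = bisect.bisect_left(self.names, name)
        return i < len(self.names) and self.names[i] == name

    def completions(self, prefix):
        # Names sharing a prefix are contiguous in sorted order, so this is
        # O(log n) to find the first match plus one step per match.
        i = bisect.bisect_left(self.names, prefix)
        matches = []
        while i < len(self.names) and self.names[i].startswith(prefix):
            matches.append(self.names[i])
            i += 1
        return matches

    def regexp_search(self, pattern):
        # A general regexp cannot exploit the ordering, so every name is
        # tested: O(n).
        rx = re.compile(pattern)
        return [n for n in self.names if rx.search(n)]
```

With an index like this, only the regexp search has to touch every
name.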

> >Another downside is that I imagine it would be more difficult to
> >divide up the tags into different "tables" so that the table you look
> >in only contains relevant completions (e.g. one table per local scope,
> >one table of methods and properties per class, etc.). I could be wrong
> >here since I haven't looked into the internals of semantic's tag
> >tables.
> 
> I already have routines that create a list of tables relevant per
> file.

Is that the throttle-related stuff?

> If you had a program that compressed that list of tables into
> one new table you knew how to search quickly, that could be easily
> cached on a per-file basis.  It would also likely be simple, and
> provide a good speed boost for very large projects.

That sounds eminently possible. I'll have to think about how much to
combine into a single table. Combining everything would mean keeping it
synced with tags from many different files, whereas keeping a separate
table for each file would be the same as the current system, if I
understand things right, with faster completion lookup as the only
benefit.

There's still the issue of keeping things synchronised with the tags
actually in the buffer, as found by the parser. Recreating everything
each time the buffer's re-parsed is likely to kill any benefit, since
the data structures are of course slower to create than a simple
list. I assume this will mean using
`semantic-after-partial-cache-change-hook' and
`semantic-after-toplevel-cache-change-hook'.
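
The idea of updating rather than rebuilding can be sketched as follows,
in Python for brevity. Everything here is an assumption made for
illustration: the class, the `on_partial_change` callback and its
`removed`/`added` arguments are hypothetical and do not reflect what
the semantic hooks actually pass to their functions.

```python
import bisect


class IncrementalIndex:
    """Hypothetical sorted name index that is patched in place."""

    def __init__(self):
        self.names = []

    def add(self, name):
        # Insert at the sorted position unless the name is already present.
        i = bisect.bisect_left(self.names, name)
        if i == len(self.names) or self.names[i] != name:
            self.names.insert(i, name)

    def remove(self, name):
        # Delete the name if present; silently ignore unknown names.
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            del self.names[i]

    def on_partial_change(self, removed, added):
        # Hypothetical hook function: apply only the differences reported
        # by a partial reparse instead of rebuilding the whole index.
        for name in removed:
            self.remove(name)
        for name in added:
            self.add(name)
```

A full reparse would still need a rebuild, but the common case of a
small edit would then cost only a few insertions and deletions.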

> Look into semanticdb-skel.el for a skeleton for creating a new system
> level database.
> 
> Look at semanticdb-el.el for a working example that uses the built-in
> Emacs symbol tables to look up symbols in a global way.
> 
> It is very simple, and the minimum needed for implementation is very
> small.  You need the classes, a few basic maintenance things, and
> implementations of:
> 
> semanticdb-find-tags-by-name-method
> semanticdb-find-tags-by-name-regexp-method
> semanticdb-find-tags-for-completion-method
> 
> with a range of other optional search methods you can provide.  The
> others are not as important as the above.

You're right, once I understand semanticdb and eieio better, it
does look like less work than I imagined at first sight...

[snip] 
> What you describe above is not incompatible with the database API.
> See semanticdb-find-default-throttle.  If you set this to the list:
> 
> (omniscience)
> 
> and then in your implementation do what you describe above in an
> omniscient database, you will be done.

I think I understand better now. Thanks.

> Hmm.  I can't get out of my system that you wouldn't override the
> semanticdb structure.  There just isn't an obvious layer where you are
> looking.

The way my package works at the moment, it adds that layer itself
without needing it to exist in semantic. But hooking into semanticdb
is a better way.

> Since you already have the code, making a systemdb, sounds like a 15
> minute project past whatever it takes to turn existing databases into
> your magic structure.

I think I had different things foremost in my mind, coming from
writing completion for plain text and markup. I was worrying about how
to associate different tables with different sub-regions of the
buffer, for instance associating a different tag table with each local
scope, or creating one tag table for each class and looking in the
appropriate one whenever completing a method name. As I understand it,
this isn't how semantic currently works. It creates the tag table on
the fly whenever you, say, look up local variables.

I still think making these tag tables persistent is worthwhile given
the way my package works. It allows it to learn how often local
variable names or different class methods are used and prioritise
completions based on that information. Of course, that doesn't
preclude using semantic databases here. But my package will have to
deal with associating them with buffer regions, and working out which
one to look in when finding completions, since semantic doesn't
provide this at the moment (apart from maybe in the java
parser). That's fine, since I already have code to do this, seeing as
this is the more important kind of intellisensing for markup languages
(think LaTeX math-mode regions).
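
As a minimal sketch of what "associating tables with buffer regions"
means (in Python, with hypothetical names throughout; this is not the
code from my package and not a semantic API), the lookup amounts to
finding the innermost region containing point and using its table:

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int   # first buffer position covered (inclusive)
    end: int     # position just past the region (exclusive)
    table: str   # stand-in for the tag table attached to this region


def innermost_table(regions, pos, default):
    """Return the table of the smallest region containing pos, else default."""
    best = None
    for r in regions:
        if r.start <= pos < r.end:
            if best is None or (r.end - r.start) < (best.end - best.start):
                best = r
    return best.table if best is not None else default
```

For example, with a class body spanning positions 0 to 100 and a method
body spanning 20 to 40, position 30 resolves to the method's table,
position 50 to the class's table, and anything outside falls back to
the default (global) table.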

I wasn't worrying too much about tags from other files yet. But of
course, for programming projects spread across many files this is
important. This is where the semanticdb stuff is currently used, and
hooking into that makes perfect sense (and you're right that it won't
be difficult).

> >As for `semantic-analyze-possible-completions', I'll have to write an
> >alternative version whatever I do, if I'm going to change the way
> >completions are looked up. (Actually my package would just provide a
> >completely different completion selection mechanism, at least at
> >first, since it works a bit differently. The semantic one would still
> >be available. Integrating my completion user interface more fully into
> >semantic is something I'd prefer to leave for later.)
> 
> The hard part of the above is decoding the prefix.  In the case of a
> field/method from an object, there may be at most 20 possible symbols
> to choose from.

And if I've understood things right I can easily find out if it's a
field/method being completed using the analyzer.

> If there is no prefix to speak of, then you are after some symbol from
> the global namespace.  In this case, it is just a long list of calls
> to `semantic-find-tags-by-name-regexp'.

Which should be `semantic-find-tags-for-completion'?

> I'm actually surprised at what I find in
> semantic-analyze-possible-completions.  It should be calling
> `semantic-find-tags-for-completion' instead.  I'll have to
> investigate.  I might be able to speed it up that way.

I probably should have mentioned the reason I'm obsessed with
completion speed isn't just for the sake of it. It's because my
package continuously looks up possible completions as you type, and
can even provisionally insert the most likely one (based on the usage
frequency it's learnt as it goes along) into the buffer. Completion is
"switched on" all the time. Even a few tenths of a second delay gets
in the way of typing, hence the fancy data structures.
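
A toy version of the frequency-based ordering, in Python. The class and
method names are hypothetical, and the linear scan here is deliberately
naive; the sketch shows only the ranking, not the fast lookup:

```python
from collections import Counter


class FrequencyRanker:
    """Hypothetical completer that orders candidates by learnt usage."""

    def __init__(self, names):
        self.names = list(names)
        self.counts = Counter()

    def accept(self, name):
        # Record that the user chose this completion.
        self.counts[name] += 1

    def completions(self, prefix):
        # Most frequently accepted first; ties broken alphabetically.
        matches = [n for n in self.names if n.startswith(prefix)]
        return sorted(matches, key=lambda n: (-self.counts[n], n))

    def best(self, prefix):
        # The candidate that would be provisionally inserted, if any.
        ranked = self.completions(prefix)
        return ranked[0] if ranked else None
```

Each accepted completion bumps its count, so the candidate offered
first drifts towards whatever is actually used most.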

Anyway, thanks for all the info. I need to go away and look at the
semanticdb code, and the semantic completion code, and try to
start integrating my package in with it.

Toby
-- 
PhD Student
Quantum Information Theory group
Max Planck Institute for Quantum Optics
Garching, Germany

email: to...@dr...
web: www.dr-qubit.org