Re: [CEDET-devel] tags internal representation

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Eric,

[...]
> When we are done, I hope that the new data structure is relevant only
> insofar as it provides 2 features:
> 1) Proof to ourselves that we can safely change it
> 2) A starting point for finding the optimal data structure for tokens.
> 
> Of course, that doesn't mean we shouldn't try to make is as useful as
> possible now. ;)
> 
> To that end I captured some usage statistics:
[...]

Very interesting!

> As you can see, those functions that do something with the overlay
> are used more than any other part of the token.  I was guessing
> before, but now I am sure that we want to put the overlay in an easy
> to get to location.

That's a good reason!

> It has turned out that the top 3 fields are the ones in the currently
> proposed data structure, which is nice.
> 
> I had suggested using a vector before also.  Functions like
> semantic-token-overlay-cdr would then be quite difficult to implement.

OTOH, if we use a vector `semantic-token-overlay-cdr' would be no
more necessary.  As it seems it is not used a lot, it could be worth
doing the conversion by hand.

So we could use a tag internal representation like this:

[NAME TAG-CLASS ATTRIBUTES PROPERTIES OVERLAY]

In that case would it be still possible to directly use tags in
completion lists?

Or, if we decide to not use vectors:

(NAME TAG-CLASS ATTRIBUTES PROPERTIES OVERLAY)

`semantic-token-overlay-cdr' would be trivial to implement as it would
just become an equivalent of: (nthcdr 4 tag).

I did a quick benchmark on the two representations (using the nice
benchmark library now included in Emacs 21.3.50):

(setq mylist '(NAME TAG-CLASS ATTRIBUTES PROPERTIES OVERLAY)
      myvect [NAME TAG-CLASS ATTRIBUTES PROPERTIES OVERLAY])

(benchmark 1000000 '(aref myvect 0)) ;; access to tag name
 => "Elapsed time: 1.222000s (5.027000s in 16 GCs)"

(benchmark 1000000 '(aref myvect 4)) ;; access to tag overlay
 => "Elapsed time: 1.231000s (5.036000s in 16 GCs)"

(benchmark 1000000 '(car mylist)) ;; access to tag name
 => "Elapsed time: 0.921000s (5.088000s in 16 GCs)"

(benchmark 1000000 '(nth 4 mylist)) ;; access to tag overlay
 => "Elapsed time: 1.322000s (5.027000s in 16 GCs)"

Clearly, there isn't a big difference in performance!  Vectors are
better to access overlay (constant time to access any element), but
lists are far better to access the tag name.

There is no significant difference when accessing the TAG-CLASS:

(benchmark 1000000 '(aref myvect 1))
 => "Elapsed time: 1.232000s (5.078000s in 16 GCs)"

(benchmark 1000000 '(nth 1 mylist))
 => "Elapsed time: 1.251000s (5.088000s in 16 GCs)"

What is more significant, is tag creation time:

(benchmark 1000000
 '(vector "NAME" 'TAG-CLASS '(ATTRIBUTES) '(PROPERTIES) [0 0]))
 => "Elapsed time: 9.834000s (8.547000s in 22 GCs)"

(benchmark 1000000
 '(list "NAME" 'TAG-CLASS '(ATTRIBUTES) '(PROPERTIES) [0 0]))
 => "Elapsed time: 6.379000s (7.640000s in 24 GCs)"

I that case list creation is about 35% faster than vector creation!

In conclusion, we could use lists to represent tags ;-)

David