Thread: [Parseperl-discuss] Unique ID per PPI::Element
Brought to you by:
adamkennedy
From: Dan B. <mr....@gm...> - 2006-10-05 13:39:21
|
I'm currently going through the motions of going from a very development version of PPIx::Index to something useful in practice and I've hit a stage where I feel having a unique ID for each PPI::Element would be useful. My rationale behind this is that if someone has a PPI::Document they are iterating over and want to access the indexed information pertaining to a given node then they can just access that data using its unique id. This may very well be a X/Y Problem and I'm certainly open to suggestions but that's what my monkey brain has managed to scratch in the dirt. I haven't gotten around to implementing it yet but if I did it probably use some PPIx::XPath magic (cos I'm lazy like that) and look something like this: use Digest::MD5 'md5_hex'; sub PPI::Element::id { my $el = shift; return md5hex( PPIx::XPath->new($el)->xpath . $el->location->[0] ); } There are probably issues with that code, such as the fact that if you have recurring Elements on the same line you'd get the same ID (but that's fine for what I'm doing) and it probably needs the script path to be unique too, but I guess that's what you get for knocking out code off the cuff. So any thoughts, suggestions, criticisms on this ID idea or anything pertaining to indexing software would be very much appreciated. Thanks, Dan Brook aka broquaint |
From: Chris D. <ch...@cl...> - 2006-10-05 14:38:43
|
Hi Dan, I just looked at PPIx::Index. My first comment is a non-technical one. I recommend you change all mention of "script" to "file" or "program". Script is a very diminishing word that psychologically reduces the significance of the work. See: http://www.nntp.perl.org/group/perl.module-authors/4765 After that, I see this issue with your 0.0.1_1 code: you return hashes of variables, pkgs, etc., which precludes you using the same name twice in code. For example, consider this silly code: sub print_two_lines { my $fh = shift; if (my $line = <$fh>) { print $line; } if (my $line = <$fh>) { print $line; } } Note that $line is used twice lexically, but your hash-based API can only return the first of them. Furthermore, why not return the actual token instead of the line number? It would be more useful to return the following from lex_vars_in_subs: { print_two_lines => { '$line' => [ $element1, $element2, ], }, } From each element, you can simply $element->location()->[0] to get the line number. So, output code looks like: my $ppi_index = PPIx::Index->new( script => $fn ); my $lex = $ppi_index->lex_vars_in_subs; for my $sub_name (sort keys %{$lex}) { print "sub $sub_name\n"; for my $var_name (sort keys %{$lex->{$sub_name}}) { print " var $var_name ".join(",", map {$_->location()->[0]} @{$lex->{$sub_name}-> {$var_name}})."\n"; } } which yields: sub print_two_lines var $fh 2 var $line 3,6 For that matter, you could even have two subs with the same name per file. This can happen if you have multiple "package" declarations. Or, the name could be '' if it's an anonymous sub declaration. so probably you should go to an array of subs: { print_two_lines => [{ '$line' => [ $element1, $element2, ], }], } By returning the element itself instead of just the line number, I think you remove the need for any ID at all. Chris On Oct 5, 2006, at 8:39 AM, Dan Brook wrote: > I'm currently going through the motions of going from a very > development version of PPIx::Index to something useful in practice and > I've hit a stage where I feel having a unique ID for each PPI::Element > would be useful. My rationale behind this is that if someone has a > PPI::Document they are iterating over and want to access the indexed > information pertaining to a given node then they can just access that > data using its unique id. This may very well be a X/Y Problem and I'm > certainly open to suggestions but that's what my monkey brain has > managed to scratch in the dirt. > > I haven't gotten around to implementing it yet but if I did it > probably use some PPIx::XPath magic (cos I'm lazy like that) and look > something like this: > > use Digest::MD5 'md5_hex'; > sub PPI::Element::id { > my $el = shift; > return md5hex( PPIx::XPath->new($el)->xpath . $el->location->[0] ); > } > > There are probably issues with that code, such as the fact that if you > have recurring Elements on the same line you'd get the same ID (but > that's fine for what I'm doing) and it probably needs the script path > to be unique too, but I guess that's what you get for knocking out > code off the cuff. > > So any thoughts, suggestions, criticisms on this ID idea or anything > pertaining to indexing software would be very much appreciated. > > Thanks, > Dan Brook aka broquaint > > ---------------------------------------------------------------------- > --- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to > share your > opinions on IT & business topics through brief surveys -- and earn > cash > http://www.techsay.com/default.php? > page=join.php&p=sourceforge&CID=DEVDEV > _______________________________________________ > Parseperl-discuss mailing list > Par...@li... > https://lists.sourceforge.net/lists/listinfo/parseperl-discuss > -- Chris Dolan, Software Developer, Clotho Advanced Media Inc. 608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703 vCard: http://www.chrisdolan.net/ChrisDolan.vcf Clotho Advanced Media, Inc. - Creators of MediaLandscape Software (http://www.media-landscape.com/) and partners in the revolutionary Croquet project (http://www.opencroquet.org/) |
From: Dan B. <mr....@gm...> - 2006-10-05 15:04:21
|
On 10/5/06, Chris Dolan <ch...@cl...> wrote: > I just looked at PPIx::Index. My first comment is a non-technical > one. I recommend you change all mention of "script" to "file" or > "program". Script is a very diminishing word that psychologically > reduces the significance of the work. See: > http://www.nntp.perl.org/group/perl.module-authors/4765 Yep, sounds good. > After that, I see this issue with your 0.0.1_1 code: you return > hashes of variables, pkgs, etc., which precludes you using the same > name twice in code. For example, consider this silly code: I'm using hashes because the indexer is only indexing where things are declared, not where they're used. However, if I was to index where things are used, which I may do at some point, I'd definitely take that route. > Note that $line is used twice lexically, but your hash-based API can > only return the first of them. Furthermore, why not return the > actual token instead of the line number? The line number is used instead of the token itself because only the line number is the only relevant information to the indexer and it also makes the indexed data language agnostic. > For that matter, you could even have two subs with the same name per > file. This can happen if you have multiple "package" declarations. > Or, the name could be '' if it's an anonymous sub declaration. so > probably you should go to an array of subs: All subs are indexed by their fully qualified name, so there's no worry with multiple subs and anonymous subs are just bound to whatever variable they're assigned to. > By returning the element itself instead of just the line number, I > think you remove the need for any ID at all. I perhaps should've elaborated on the fact that I need an ID for backend usage too (i.e a database) so that it can be consistently referred to both in code and in the database. Again, I don't know if this is the best solution, but it's what I've managed to come up with. Thanks for your thoughts! Dan |