parseperl-discuss Mailing List for Parse::Perl (Page 2)
Brought to you by:
adamkennedy
From: Torsten S. <kaf...@gm...> - 2006-10-12 16:08:12
On Thu, 2006-10-12 at 12:53 +1000, Adam Kennedy wrote:
> If you are just looking for the word, then the top part isn't
> necessary.
>
> Just this will be enough.
>
>   $document->find_any( sub {
>       $_[1]->isa('PPI::Token::Word')
>       and
>       $_[1] eq 'Gtk2::Gdk::DisplayManager'
>   } );
>
> I'm not entirely sure why you have the PPI::Statement phrase there,
> but if you've been copying code from Perl::Critic that might be some
> sort of an optimisation hack?

Hmm, I don't know how it happened anymore, but for some reason my early
testing seemed to suggest that find_any doesn't "recurse into"
statements. Anyway, thanks to you, Dan, and Chris for the help. Works
perfectly.

--
Bye,
-Torsten
From: Dan B. <mr....@gm...> - 2006-10-12 09:24:49
On 10/11/06, Torsten Schoenfeld <kaf...@gm...> wrote:
> Aloha,
>
> I recently started using PPI to begin work on a module that finds out
> which version of Gtk2/gtk+ a given bunch of code requires. To do that,
> I need to know if, for example, the bareword
> "Gtk2::Gdk::DisplayManager" is used anywhere. Here's what I came up
> with:
>
>   $document->find_any(sub {
>       $_[1]->isa ("PPI::Statement") and
>       $_[1]->find_any(sub {
>           $_[1]->isa ("PPI::Token::Word") and
>           $_[1] eq "Gtk2::Gdk::DisplayManager"
>       })
>   });
>
> Is this a good way to express the requirements? Is it correct? Is
> there a better way?

A simpler (if not better) way would be to use PPIx::XPath like so:

  use PPIx::XPath;

  my @toks = PPIx::XPath->new($document)
      ->match('//PPI::Token::Word[.="Gtk2::Gdk::DisplayManager"]');

And that should return you the appropriate nodes.

HTH,
Dan Brook
From: Adam K. <ad...@ph...> - 2006-10-12 02:56:00
Torsten Schoenfeld wrote:
> Aloha,
>
> I recently started using PPI to begin work on a module that finds out
> which version of Gtk2/gtk+ a given bunch of code requires. To do that,
> I need to know if, for example, the bareword
> "Gtk2::Gdk::DisplayManager" is used anywhere. Here's what I came up
> with:
>
>   $document->find_any(sub {
>       $_[1]->isa ("PPI::Statement") and
>       $_[1]->find_any(sub {
>           $_[1]->isa ("PPI::Token::Word") and
>           $_[1] eq "Gtk2::Gdk::DisplayManager"
>       })
>   });
>
> Is this a good way to express the requirements? Is it correct? Is
> there a better way?

If you are just looking for the word, then the top part isn't
necessary. Just this will be enough.

  $document->find_any( sub {
      $_[1]->isa('PPI::Token::Word')
      and
      $_[1] eq 'Gtk2::Gdk::DisplayManager'
  } );

I'm not entirely sure why you have the PPI::Statement phrase there, but
if you've been copying code from Perl::Critic that might be some sort
of an optimisation hack?

Adam K
From: Chris D. <ch...@cl...> - 2006-10-12 01:18:49
Hi Torsten,

It's easier than that, I think:

  $document->find_any(sub {
      $_[1]->isa('PPI::Token::Word')
      and
      $_[1] eq 'Gtk2::Gdk::DisplayManager'
  });

Chris

On Oct 11, 2006, at 5:34 PM, Torsten Schoenfeld wrote:
> Aloha,
>
> I recently started using PPI to begin work on a module that finds out
> which version of Gtk2/gtk+ a given bunch of code requires. To do
> that, I need to know if, for example, the bareword
> "Gtk2::Gdk::DisplayManager" is used anywhere. Here's what I came up
> with:
>
>   $document->find_any(sub {
>       $_[1]->isa ("PPI::Statement") and
>       $_[1]->find_any(sub {
>           $_[1]->isa ("PPI::Token::Word") and
>           $_[1] eq "Gtk2::Gdk::DisplayManager"
>       })
>   });
>
> Is this a good way to express the requirements? Is it correct? Is
> there a better way?
>
> --
> Keep up the great work!
> -Torsten

--
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703
vCard: http://www.chrisdolan.net/ChrisDolan.vcf
Clotho Advanced Media, Inc. - Creators of MediaLandscape Software
(http://www.media-landscape.com/) and partners in the revolutionary
Croquet project (http://www.opencroquet.org/)
From: Torsten S. <kaf...@gm...> - 2006-10-11 22:35:00
Aloha,

I recently started using PPI to begin work on a module that finds out
which version of Gtk2/gtk+ a given bunch of code requires. To do that,
I need to know if, for example, the bareword "Gtk2::Gdk::DisplayManager"
is used anywhere. Here's what I came up with:

  $document->find_any(sub {
      $_[1]->isa ("PPI::Statement") and
      $_[1]->find_any(sub {
          $_[1]->isa ("PPI::Token::Word") and
          $_[1] eq "Gtk2::Gdk::DisplayManager"
      })
  });

Is this a good way to express the requirements? Is it correct? Is there
a better way?

--
Keep up the great work!
-Torsten
From: Chris D. <ch...@ch...> - 2006-10-09 00:38:50
Adam has posted his plans for PPI v1.200 on his use.perl.org journal:

  http://use.perl.org/~Alias/journal/31246

Good reading!

Chris

--
Chris Dolan, Software Developer, http://www.chrisdolan.net/
Public key: http://www.chrisdolan.net/public.key
vCard: http://www.chrisdolan.net/ChrisDolan.vcf
From: Chris D. <ch...@cl...> - 2006-10-07 03:44:52
I think that the PPI version number should be part of the PPI::Document
MD5 identifier.

On my machine, I have a mixture of PPI SVN trunk and PPI v1.118 from
CPAN. I test against both when I work on Perl::Critic, and use the
PPI::Cache to speed things up. However, due to the recent addition of
new PPI::Token::Number subclasses, I get errors when the cached version
is from trunk and I'm running with the older CPAN code. For example:

  Cannot restore overloading on HASH(0x1a6f574) (package
  PPI::Token::Number::Float) (even after a "require
  PPI::Token::Number::Float;") at blib/lib/Storable.pm (autosplit into
  blib/lib/auto/Storable/_retrieve.al) line 331, at
  /Users/chris/perl/lib/perl5/site_perl/PPI/Cache.pm line 238

If the $PPI::VERSION were mixed into the cache ID, then this would not
happen (except to developers working against SVN). This would lead to
wasted disk space (new copies in cache for each PPI version), but
that's a minor inconvenience. Sound good? It will be very easy to
implement.

Alternatively, the cache could store the version number of the last PPI
that accessed it and clear itself if the version numbers don't match.
However, if any part of the cache were not writable, this would cause
odd failures which would not occur in the above implementation.

Chris

--
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703
vCard: http://www.chrisdolan.net/ChrisDolan.vcf
Clotho Advanced Media, Inc. - Creators of MediaLandscape Software
(http://www.media-landscape.com/) and partners in the revolutionary
Croquet project (http://www.opencroquet.org/)
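The scheme Chris proposes (mixing the library version into the cache
key) is language-independent; here is a minimal sketch in Python rather
than Perl, with illustrative names that are not PPI's actual API:

```python
import hashlib

def cache_id(source: str, version: str) -> str:
    """Derive a cache key from the document source plus the library
    version, so entries written by one version are never read back by
    another (they simply become cache misses)."""
    digest = hashlib.md5()
    digest.update(version.encode("utf-8"))  # version salted into the key
    digest.update(source.encode("utf-8"))
    return digest.hexdigest()
```

A version upgrade then leaves stale entries unreferenced on disk (the
wasted space Chris mentions) instead of deserializing incompatible
structures.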
From: Chris D. <ch...@cl...> - 2006-10-05 19:55:56
On Sep 25, 2006, at 1:10 PM, Chris Dolan wrote:
> I discovered a problem with anonymous hashref parsing (a pair of
> failing tests added to SVN as rev 1103). The attached fix passes all
> PPI and Perl::Critic tests, but breaks the following common idiom:
>
>   my %foo = map({$_ => 1} @foo);
>
> I think the lexer needs a lookahead to detect the lack of a following
> comma to distinguish the following cases:
>
>   map({$_ => 1} @foo);
>   foo({$_ => 1}, @foo);
>
> That seems like a non-trivial change to the Lexer code, so I'm going
> to set it aside for now.
>
> Chris

I just added a failing test for the above cases. I'm finding the
Lexer.pm code to be impenetrable, so I'm going to continue to set this
issue aside.

Chris

--
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703
vCard: http://www.chrisdolan.net/ChrisDolan.vcf
Clotho Advanced Media, Inc. - Creators of MediaLandscape Software
(http://www.media-landscape.com/) and partners in the revolutionary
Croquet project (http://www.opencroquet.org/)
From: Dan B. <mr....@gm...> - 2006-10-05 15:04:21
On 10/5/06, Chris Dolan <ch...@cl...> wrote:
> I just looked at PPIx::Index. My first comment is a non-technical
> one. I recommend you change all mention of "script" to "file" or
> "program". Script is a very diminishing word that psychologically
> reduces the significance of the work. See:
> http://www.nntp.perl.org/group/perl.module-authors/4765

Yep, sounds good.

> After that, I see this issue with your 0.0.1_1 code: you return
> hashes of variables, pkgs, etc., which precludes you using the same
> name twice in code. For example, consider this silly code:

I'm using hashes because the indexer is only indexing where things are
declared, not where they're used. However, if I was to index where
things are used, which I may do at some point, I'd definitely take that
route.

> Note that $line is used twice lexically, but your hash-based API can
> only return the first of them. Furthermore, why not return the
> actual token instead of the line number?

The line number is used instead of the token itself because the line
number is the only relevant information to the indexer, and it also
makes the indexed data language agnostic.

> For that matter, you could even have two subs with the same name per
> file. This can happen if you have multiple "package" declarations.
> Or, the name could be '' if it's an anonymous sub declaration. So
> probably you should go to an array of subs:

All subs are indexed by their fully qualified name, so there's no worry
with multiple subs, and anonymous subs are just bound to whatever
variable they're assigned to.

> By returning the element itself instead of just the line number, I
> think you remove the need for any ID at all.

I perhaps should've elaborated on the fact that I need an ID for
backend usage too (i.e. a database) so that it can be consistently
referred to both in code and in the database. Again, I don't know if
this is the best solution, but it's what I've managed to come up with.

Thanks for your thoughts!

Dan
From: Chris D. <ch...@cl...> - 2006-10-05 14:38:43
Hi Dan,

I just looked at PPIx::Index. My first comment is a non-technical one.
I recommend you change all mention of "script" to "file" or "program".
Script is a very diminishing word that psychologically reduces the
significance of the work. See:

  http://www.nntp.perl.org/group/perl.module-authors/4765

After that, I see this issue with your 0.0.1_1 code: you return hashes
of variables, pkgs, etc., which precludes using the same name twice in
code. For example, consider this silly code:

  sub print_two_lines {
      my $fh = shift;
      if (my $line = <$fh>) {
          print $line;
      }
      if (my $line = <$fh>) {
          print $line;
      }
  }

Note that $line is used twice lexically, but your hash-based API can
only return the first of them. Furthermore, why not return the actual
token instead of the line number? It would be more useful to return the
following from lex_vars_in_subs:

  {
      print_two_lines => {
          '$line' => [ $element1, $element2 ],
      },
  }

From each element, you can simply $element->location()->[0] to get the
line number. So, output code looks like:

  my $ppi_index = PPIx::Index->new( script => $fn );
  my $lex = $ppi_index->lex_vars_in_subs;
  for my $sub_name (sort keys %{$lex}) {
      print "sub $sub_name\n";
      for my $var_name (sort keys %{$lex->{$sub_name}}) {
          print "  var $var_name " . join(",",
              map { $_->location()->[0] }
              @{$lex->{$sub_name}->{$var_name}}) . "\n";
      }
  }

which yields:

  sub print_two_lines
    var $fh 2
    var $line 3,6

For that matter, you could even have two subs with the same name per
file. This can happen if you have multiple "package" declarations. Or,
the name could be '' if it's an anonymous sub declaration. So probably
you should go to an array of subs:

  {
      print_two_lines => [{
          '$line' => [ $element1, $element2 ],
      }],
  }

By returning the element itself instead of just the line number, I
think you remove the need for any ID at all.

Chris

On Oct 5, 2006, at 8:39 AM, Dan Brook wrote:
> I'm currently going through the motions of going from a very
> development version of PPIx::Index to something useful in practice
> and I've hit a stage where I feel having a unique ID for each
> PPI::Element would be useful. My rationale behind this is that if
> someone has a PPI::Document they are iterating over and want to
> access the indexed information pertaining to a given node then they
> can just access that data using its unique id. This may very well be
> a X/Y Problem and I'm certainly open to suggestions but that's what
> my monkey brain has managed to scratch in the dirt.
>
> I haven't gotten around to implementing it yet but if I did it
> probably use some PPIx::XPath magic (cos I'm lazy like that) and look
> something like this:
>
>   use Digest::MD5 'md5_hex';
>
>   sub PPI::Element::id {
>       my $el = shift;
>       return md5_hex( PPIx::XPath->new($el)->xpath
>                       . $el->location->[0] );
>   }
>
> There are probably issues with that code, such as the fact that if
> you have recurring Elements on the same line you'd get the same ID
> (but that's fine for what I'm doing) and it probably needs the script
> path to be unique too, but I guess that's what you get for knocking
> out code off the cuff.
>
> So any thoughts, suggestions, criticisms on this ID idea or anything
> pertaining to indexing software would be very much appreciated.
>
> Thanks,
> Dan Brook aka broquaint

--
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703
vCard: http://www.chrisdolan.net/ChrisDolan.vcf
Clotho Advanced Media, Inc. - Creators of MediaLandscape Software
(http://www.media-landscape.com/) and partners in the revolutionary
Croquet project (http://www.opencroquet.org/)
From: Dan B. <mr....@gm...> - 2006-10-05 13:39:21
I'm currently going through the motions of going from a very
development version of PPIx::Index to something useful in practice, and
I've hit a stage where I feel having a unique ID for each PPI::Element
would be useful. My rationale behind this is that if someone has a
PPI::Document they are iterating over and wants to access the indexed
information pertaining to a given node, they can just access that data
using its unique ID. This may very well be an X/Y problem and I'm
certainly open to suggestions, but that's what my monkey brain has
managed to scratch in the dirt.

I haven't gotten around to implementing it yet, but if I did it would
probably use some PPIx::XPath magic (cos I'm lazy like that) and look
something like this:

  use Digest::MD5 'md5_hex';

  sub PPI::Element::id {
      my $el = shift;
      return md5_hex( PPIx::XPath->new($el)->xpath
                      . $el->location->[0] );
  }

There are probably issues with that code, such as the fact that if you
have recurring Elements on the same line you'd get the same ID (but
that's fine for what I'm doing), and it probably needs the script path
to be unique too, but I guess that's what you get for knocking out code
off the cuff.

So any thoughts, suggestions, criticisms on this ID idea or anything
pertaining to indexing software would be very much appreciated.

Thanks,
Dan Brook aka broquaint
From: Chris D. <ch...@ch...> - 2006-10-05 07:12:27
On Oct 5, 2006, at 1:14 AM, Adam Kennedy wrote:
> So I guess to start with, we need to take the input string, and
> detect
>
>   1. All Unix
>   2. All Win32
>   3. All Mac
>   4. Mixed Unix-dominant
>   5. Mixed Win32-dominant
>   6. Mixed Mac-dominant
>
> :/
>
> So... $doc->{newlines} = 'unix|win32|mac' and
> $doc->{newlines_mixed} = 1|0
>
> What do you think?

Sounds good. Instead of "unix/win/mac" I'd prefer "NL/CRNL/CR" as the
identifiers, to be platform agnostic.

Chris

--
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703
vCard: http://www.chrisdolan.net/ChrisDolan.vcf
Clotho Advanced Media, Inc. - Creators of MediaLandscape Software
(http://www.media-landscape.com/) and partners in the revolutionary
Croquet project (http://www.opencroquet.org/)
From: Adam K. <ad...@ph...> - 2006-10-05 06:16:10
Chris Dolan wrote:
> On Oct 5, 2006, at 12:40 AM, Adam Kennedy wrote:
>
>>>> That leaves us only with the case of mixed newlines. Personally,
>>>> outside of binary files I am not aware of ANY cases in which mixed
>>>> newlines in a text file is allowed, even in __DATA__ segments.
>>>
>>> Well, certainly my goal is to get rid of the mixed newlines! That's
>>> why I was writing a Perl::Critic policy against that. :-)
>>
>> Well, you know it's possible to detect mixed newlines from the raw
>> source right?
>>
>> Does critic have access to the original file/string?
>
> In theory almost always yes, in practice no. We accept PPI::Document
> instances, so that's one case where I can't ever get the raw source.
> We also accept STDIN, so I'd have to explicitly save the source. In
> the most common case, we build a PPI::Document::File. I *could* of
> course use the filename() to get back to source.

Oh right, I forgot about the STDIN stuff... so yeah, practically
speaking you don't have access. Suck.

>> Emacs et al doing binary items inside a text file is one example of
>> what I mean by handling it wrong.
>
> Hmm... Maybe I've just been using Emacs so long that its behavior
> feels intuitive to me? :-)

Probably :) I mean, it's a newline, and Perl doesn't have ANY binary
(ignoring unicode) content. The fact it isn't a newline for the current
platform (at least with respect to Perl) should be irrelevant.

> I think the correct behavior on insert is to convert to document
> newline by default, and possibly offer a way to preserve the mixed
> newlines if the generator requests it. The mechanism for that
> "request" can be a workaround (add_before() then set_newline()), or a
> flag to add_before(), or a sticky_newline flag on the node to be
> inserted.

I'm happy to go with "convert to the document default automatically"
and ignore any customization capability until someone actually asks for
it :)

>> I don't think anyone ever really WANTs to make mixed newlines. As I
>> said, I don't know of any legitimate real-world cases of this.
>
> ... so probably the generator case is just to adopt the newline of
> the surrounding document.
>
> There are plenty of real world cases of mixed newlines, just none of
> them intentional!

Right :)

So I guess to start with, we need to take the input string, and detect

  1. All Unix
  2. All Win32
  3. All Mac
  4. Mixed Unix-dominant
  5. Mixed Win32-dominant
  6. Mixed Mac-dominant

:/

So... $doc->{newlines} = 'unix|win32|mac' and
$doc->{newlines_mixed} = 1|0

What do you think?

Adam K
From: Chris D. <ch...@ch...> - 2006-10-05 06:06:09
On Oct 5, 2006, at 12:40 AM, Adam Kennedy wrote:
>>> That leaves us only with the case of mixed newlines. Personally,
>>> outside of binary files I am not aware of ANY cases in which mixed
>>> newlines in a text file is allowed, even in __DATA__ segments.
>>
>> Well, certainly my goal is to get rid of the mixed newlines! That's
>> why I was writing a Perl::Critic policy against that. :-)
>
> Well, you know it's possible to detect mixed newlines from the raw
> source right?
>
> Does critic have access to the original file/string?

In theory almost always yes, in practice no. We accept PPI::Document
instances, so that's one case where I can't ever get the raw source. We
also accept STDIN, so I'd have to explicitly save the source. In the
most common case, we build a PPI::Document::File. I *could* of course
use the filename() to get back to source.

The main reason I can't use raw source is that we have pseudo-pragmas
that turn Perl::Critic on/off over lexical scopes. In the bizarre case
of someone who wanted to have mixed newlines and used a "## no critic"
to make Perl::Critic ignore it, I couldn't catch that. But maybe it's a
lost cause at that point? Nonetheless, I find this discussion
interesting beyond Perl::Critic's immediate problem.

>>> In THAT case, perhaps we either localise, or we flatten to the
>>> first newline in the file.
>>>
>>> I'd be happy to implement that as a first step towards full native
>>> mixed newlines, as the functionality seems fairly containable.
>>>
>>> It also matches what some of the better editors do... localise
>>> internally, but remember and save out as the same input type.
>>
>> Not Emacs. It picks one newline to work on and treats the others
>> like binary. So, if you're in \n mode and there is a single \r\n in
>> the file, you see a "^M" character at the end of that line.
>
> Well, I mean for the non-mixed case, they deal with it internally.
>
> Emacs et al doing binary items inside a text file is one example of
> what I mean by handling it wrong.

Hmm... Maybe I've just been using Emacs so long that its behavior feels
intuitive to me? :-)

>>> But I'm honestly not aware of ANYTHING that handles mixed newlines
>>> properly. I've done tons of unix/win32/mac cross-over work and I've
>>> seen just about every screwed up case there is.
>>>
>>> Even Dreamweaver, which inspired PPI in the first place, doesn't
>>> handle fucked up broken newlines.
>>>
>>> So we'd have to invent the solution.
>>
>> You must be a step ahead of me or something. Doesn't "handle" simply
>> mean serialization and de-serialization for PPI? Maybe I've spent
>> too much time just in the Perl::Critic case?
>
> There's two levels here.
>
> One is the parser/serializer, the second is the API layer. If we
> provide add_before, for example, and it's mixed, what do we do?
>
> Say it's mixed unix/mac running on Win32, do we insert with the Win32
> newlines and now have a 3-platform mix?
>
> We need to define the "correct" behaviour for working with mixed
> newlines... and I don't know that anyone else has done this yet.

I think the correct behavior on insert is to convert to the document
newline by default, and possibly offer a way to preserve the mixed
newlines if the generator requests it. The mechanism for that "request"
can be a workaround (add_before() then set_newline()), or a flag to
add_before(), or a sticky_newline flag on the node to be inserted.

But as you say...

> I don't think anyone ever really WANTs to make mixed newlines. As I
> said, I don't know of any legitimate real-world cases of this.

... so probably the generator case is just to adopt the newline of the
surrounding document.

There are plenty of real world cases of mixed newlines, just none of
them intentional!

Chris

--
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703
vCard: http://www.chrisdolan.net/ChrisDolan.vcf
Clotho Advanced Media, Inc. - Creators of MediaLandscape Software
(http://www.media-landscape.com/) and partners in the revolutionary
Croquet project (http://www.opencroquet.org/)
From: Adam K. <ad...@ph...> - 2006-10-05 05:41:54
Chris Dolan wrote:
> On Oct 4, 2006, at 11:43 PM, Adam Kennedy wrote:
>
>> Unfortunately, I don't only work with real documents.
>
> Sorry, I was unclear. By "real documents" I meant anything that the
> Tokenizer can emit. I do not believe it is possible for the tokenizer
> to emit a PPI::Token::HereDoc where the _terminator_line is non-null
> where the line for the token lacks a newline. The only way that's
> possible is via generated content. The latter is what I meant by
> non-real.

Hmm... possibly if the terminator is the last line in the program.

  PPI::Document->new( \"<<FOO\ninside\nFOO" );

>> The issue wasn't so much making it easier on the programmer, as
>> making it sane. If I didn't localise the newlines, it would become a
>> major gotcha, and an enormous source of bugs.
>
> Hmm, enormous? You are obviously much more familiar with the gotchas
> than I am, but my patch wasn't that hard to write and works well with
> the existing test suite. Perhaps my confidence is unwarranted?

Let's imagine that we do native newlines. What is the first thing that
people are going to do when they do minor generation? Most likely,
they'll end up using native newlines. What I don't want to happen is
for everyone to make this mistake once (the gotcha) and then have to go
and do extra things just to get back to normal behaviour.

>> I see a couple of solutions.
>>
>> Firstly, I really want to keep things localised internally.
>>
>> So pre-scanning the document text (which we do anyway for unicode
>> checks, or at least we used to before the latin-1 improvements) to
>> pick up 100% unix/win32/mac, storing that newline type in a top
>> level document accessor, and then writing back out to the same type,
>> is probably ok.
>
> We'd have to change the code in add_element, remove, and replace to
> correct the newlines on entry for new tokens. That calls for a
> set_newline method on PPI::Element and PPI::Node (and
> PPI::Token::HereDoc).

Hmm... or we just do it automatically. When you add, scan upwards to
the document, find the right thing, and ... yeah, I get what you are
saying.

>> That leaves us only with the case of mixed newlines. Personally,
>> outside of binary files I am not aware of ANY cases in which mixed
>> newlines in a text file is allowed, even in __DATA__ segments.
>
> Well, certainly my goal is to get rid of the mixed newlines! That's
> why I was writing a Perl::Critic policy against that. :-)

Well, you know it's possible to detect mixed newlines from the raw
source right?

Does critic have access to the original file/string?

>> In THAT case, perhaps we either localise, or we flatten to the first
>> newline in the file.
>>
>> I'd be happy to implement that as a first step towards full native
>> mixed newlines, as the functionality seems fairly containable.
>>
>> It also matches what some of the better editors do... localise
>> internally, but remember and save out as the same input type.
>
> Not Emacs. It picks one newline to work on and treats the others like
> binary. So, if you're in \n mode and there is a single \r\n in the
> file, you see a "^M" character at the end of that line.

Well, I mean for the non-mixed case, they deal with it internally.

Emacs et al doing binary items inside a text file is one example of
what I mean by handling it wrong.

>> But I'm honestly not aware of ANYTHING that handles mixed newlines
>> properly. I've done tons of unix/win32/mac cross-over work and I've
>> seen just about every screwed up case there is.
>>
>> Even Dreamweaver, which inspired PPI in the first place, doesn't
>> handle fucked up broken newlines.
>>
>> So we'd have to invent the solution.
>
> You must be a step ahead of me or something. Doesn't "handle" simply
> mean serialization and de-serialization for PPI? Maybe I've spent too
> much time just in the Perl::Critic case?

There's two levels here.

One is the parser/serializer, the second is the API layer. If we
provide add_before, for example, and it's mixed, what do we do?

Say it's mixed unix/mac running on Win32, do we insert with the Win32
newlines and now have a 3-platform mix?

We need to define the "correct" behaviour for working with mixed
newlines... and I don't know that anyone else has done this yet.

>> And I'm still not (yet) convinced that native mixed newlines is the
>> answer... if only because how the hell do we guarantee round-trip
>> safety for them? If we do it, it needs to be 100%.
>
> I feel like I must be missing some crucial point. For the read-only
> case, isn't round-trip safety just ensuring that we spit out exactly
> what the Tokenizer took in? With the exception of the HereDoc stuff
> already mentioned, we've already achieved that with my patch, I
> think.
>
> That leaves just the generated code case to worry about. In that
> case, we decide on a dominant line ending, like Emacs does, and
> ensure that all added tokens inherit that line ending. If the
> generator *wants* to make mixed newlines, that's the only really hard
> case, and that can be worked around with set_newline.

I don't think anyone ever really WANTs to make mixed newlines. As I
said, I don't know of any legitimate real-world cases of this.

Adam K
From: Chris D. <ch...@ch...> - 2006-10-05 05:24:55
|
On Oct 4, 2006, at 11:43 PM, Adam Kennedy wrote: > Unfortunately, I don't only work with real documents. Sorry, I was unclear. By "real documents" I meant anything that the Tokenizer can emit. I do not believe it is possible for the tokenizer to emit a PPI::Token::HereDoc where the _terminator_line is non-null where the line for the token lacks a newline. The only way that's possible is via generated content. The latter is what I meant by non-real. > That's why exhaustive.t is in there, so throw line noise at you and > kick out in the ass when you get sloppy :) > > And we only run it in light mode, turned up to full it takes 6+ > hours to run. At one point it was throwing up error cases that only > triggered every 1.5+/-1 hours. Egads! >> * We could have the \n localization on by default and have a >> Tokenizer flag that disables it. That way, Perl::Critic could >> dig down to the real newlines, but other downstream modules could >> remain care-free. Perhaps that flag would happen in PPI::Document? > > It would be there almost certainly. > > The issue wasn't so much making it easier on the programmer, as > making it sane. If I didn't localise the newlines, it would become > a major gotcha, and an enormous source of bugs. Hmm, enormous? You are obviously much more familiar with the gotchas than I am, but my patch wasn't that hard to write and works well with the existing test suite. Perhaps my confidence is unwarranted? > I see a couple of solutions. > > Firstly, I really want to keep things localised internally. > > So pre-scanning the document text (which we do anyway for unicode > checks, or at least we used to before the latin-1 improvements) to > pick up 100% unix/win32/mac, storing that newline type in a top > level document accessor, and then writing back out to the same > type, is probably ok. We'd have to change the code in add_element, remove, and replace to correct the newlines on entry for new tokens. 
That calls for a set_newline method on PPI::Element and PPI::Node (and PPI::Token::HereDoc). > That leaves us only with the case of mixed newlines. Personally, > outside of binary files I am not away of ANY cases in which mixed > newlines in a text file is allowed, even in __DATA__ segments. Well, certainly my goal is to get rid of the mixed newlines! That's why I was writing a Perl::Critic policy against that. :-) > In THAT case, perhaps we either localise, or we flatten to the > first newline in the file. > > I'd be happy to implement that as a first step towards full native > mixed newlines, as the functionality seems fairly containable. > > It also matches what some of the better editors do... localise > internally, but remember and save out as the same input type. Not Emacs. It picks one newline to work on and treats the others like binary. So, if you're in \n mode and there is a single \r\n in the file, you see a "^M" character at the end of that line. > But I'm honestly not aware of ANYTHING that handles mixed newlines > properly. I've done tons of unix/win32/mac cross-over work and I've > seen just about every screwed up case there is. > > Even Dreamweaver, which inspired PPI in the first place, doesn't > handle fucked up broken newlines. > > So we'd have to invent the solution. You must be a step ahead of me or something. Doesn't "handle" simply mean serialization and de-serialization for PPI? Maybe I've spent too much time just in the Perl::Critic case? > And I'm still not (yet) convinced that native mixed newlines is the > answer... if only because how the hell do we guarentee round-trip > safety for them? If we do it, it needs to be 100%. I feel like I must be missing some crucial point. For the read-only case, isn't round-trip safety just ensuring that we spit out exactly what the Tokenizer took in? With the exception of the HereDoc stuff already mentioned, we've already achieved that with my patch, I think. 
That leaves just the generated code case to worry about. In that case, we decide on a dominant line ending, like Emacs does, and ensure that all added tokens inherit that line ending. If the generator *wants* to make mixed newlines, that's the only really hard case, and that can be worked around with set_newline. Chris -- Chris Dolan, Software Developer, Clotho Advanced Media Inc. 608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703 vCard: http://www.chrisdolan.net/ChrisDolan.vcf Clotho Advanced Media, Inc. - Creators of MediaLandscape Software (http://www.media-landscape.com/) and partners in the revolutionary Croquet project (http://www.opencroquet.org/) |
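The set_newline method proposed above could look roughly like this — a hedged sketch, not an actual PPI method; it assumes only PPI's documented tokens() and set_content() calls, and reuses the newline alternation discussed in this thread (CRLF before bare CR, so a CRLF pair is never split):

```perl
use strict;
use warnings;
use PPI;

# Hypothetical set_newline helper: rewrite every token's line endings
# to one chosen newline, covering unix (\012), win32 (\015\012), and
# mac (\015) conventions.
sub set_newline {
    my ($document, $newline) = @_;
    for my $token ( $document->tokens ) {
        my $content = $token->content;
        $content =~ s/\015{1,2}\012|\012|\015/$newline/g;
        $token->set_content($content);
    }
    return $document;
}
```

A real implementation would presumably also need the HereDoc special case mentioned earlier, since heredoc bodies and terminator lines are stored outside the token's visible content.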
From: Adam K. <ad...@ph...> - 2006-10-05 04:45:04
|
Chris Dolan wrote: > Adam, > > [this email got longer than expected... Skip to the last bullet for > the most interesting action item, IMO] > > I understand your nervousness. I hadn't thought of the downstream \n > effect -- I thought it was just to keep the PPI code simpler. As for > the regexp soup, I had thought of the same thing, but decided to > email you before doing additional tweaks in case you rejected the > concept. > > I do think this change is important, however. It is the one place > where PPI is lossy. As is, I have to worry about working on Windows > perl code (with \r\n) on my Mac (where \n is the default), which I > did not realize until today. While the \n localization is easier on > the downstream *programmer*, it's harder on the downstream *user*, > because the output of serialize() cannot be compared reliably to the > original source. > > In particular, the \n localization obscures the case of inconsistent > newlines in the source code. Or, if I had code that HAD to have \r\n > in the __DATA__ section (not sure why...) then PPI would break it on > my Mac. It just feels wrong, I guess. > > A few side notes: > > * Perl::Critic only relies on the localized \n in one place, so > that would be easy to fix. > > * The look ahead to the first line ending was only to be used in > HereDoc repair in PPI::Document::serialize(). Personally, I think > that attempt to repair is dubious anyway -- real documents should > never have that problem I think. Unfortunately, I don't only work with real documents. That's why exhaustive.t is in there, so throw line noise at you and kick out in the ass when you get sloppy :) And we only run it in light mode, turned up to full it takes 6+ hours to run. At one point it was throwing up error cases that only triggered every 1.5+/-1 hours. So, perhaps we could skip the > line_ending() function and chop code from serialize(). > > * We could have the \n localization on by default and have a > Tokenizer flag that disables it. 
That way, Perl::Critic could dig > down to the real newlines, but other downstream modules could remain > care-free. Perhaps that flag would happen in PPI::Document? It would be there almost certainly. The issue wasn't so much making it easier on the programmer, as making it sane. If I didn't localise the newlines, it would become a major gotcha, and an enormous source of bugs. I see a couple of solutions. Firstly, I really want to keep things localised internally. So pre-scanning the document text (which we do anyway for unicode checks, or at least we used to before the latin-1 improvements) to pick up 100% unix/win32/mac, storing that newline type in a top level document accessor, and then writing back out to the same type, is probably ok. That leaves us only with the case of mixed newlines. Personally, outside of binary files I am not aware of ANY cases in which mixed newlines in a text file are allowed, even in __DATA__ segments. In THAT case, perhaps we either localise, or we flatten to the first newline in the file. I'd be happy to implement that as a first step towards full native mixed newlines, as the functionality seems fairly containable. It also matches what some of the better editors do... localise internally, but remember and save out as the same input type. But I'm honestly not aware of ANYTHING that handles mixed newlines properly. I've done tons of unix/win32/mac cross-over work and I've seen just about every screwed up case there is. Even Dreamweaver, which inspired PPI in the first place, doesn't handle fucked up broken newlines. So we'd have to invent the solution. And I'm still not (yet) convinced that native mixed newlines is the answer... if only because how the hell do we guarantee round-trip safety for them? If we do it, it needs to be 100%. 
Adam K > Chris > > > On Oct 4, 2006, at 8:14 PM, Adam Kennedy wrote: > >> I'm quite nervous about this one, if only because of the problem >> that nobody downstream can use \n either (which is partly why I >> localised the newlines in the first place). >> >> The look ahead to first end-line isn't an option, it fails in too >> many cases. >> >> I might be able to be convinced though, but we need find a way to >> mitigate the regex soup effect. >> >> Maybe if we provided \015{1,2}\012|\012|\015 as an export (as $XPNL >> or something) that might make life a little simpler. >> >> Adam K >> >> Chris Dolan wrote: >>> Here's another slightly controversial patch that I don't feel >>> comfortable just committing without a nod from Adam: >>> This patch changes PPI to preserve line endings instead of >>> localizing them. The code passes all PPI and Perl::Critic tests. >>> I implemented this change to enable a new Perl::Critic policy, >>> CodeLayout::RequireConsistentNewlines, which is intended to help >>> with code signing problems. >>> There are still a few un-fixed methods that are not exercised by >>> the test suite, like PPI::Token::Pod::merge(). These need to be >>> fixed before this patch could be committed. Additionally, the >>> PPI.pm docs need to be updated to reflect this change. >>> In the process, I added two new API methods: >>> PPI::Node::line_ending() and PPI::Token::HereDoc::Terminator_line(). >>> This patch will collide a tiny bit with my "$_" patch, but it >>> should be easy to repair (I'm happy to do the legwork on that). >>> *** >>> This implementation simply changes all "\n" checks to "\015{1,2} >>> \012|\012|\015" checks. An alternative implementation would be >>> for the tokenizer to look ahead to the first line ending, store >>> that value, and use it throughout tokenization. That >>> implementation would be more like what text editors do, but less >>> like Perl (I think). 
>>> Chris >>> -- >>> Chris Dolan, Software Developer, http://www.chrisdolan.net/ >>> Public key: http://www.chrisdolan.net/public.key >>> vCard: http://www.chrisdolan.net/ChrisDolan.vcf >>> --------------------------------------------------------------------- >>> --- >>> --------------------------------------------------------------------- >>> ---- >>> Take Surveys. Earn Cash. Influence the Future of IT >>> Join SourceForge.net's Techsay panel and you'll get the chance to >>> share your >>> opinions on IT & business topics through brief surveys -- and earn >>> cash >>> http://www.techsay.com/default.php? >>> page=join.php&p=sourceforge&CID=DEVDEV >>> --------------------------------------------------------------------- >>> --- >>> _______________________________________________ >>> Parseperl-discuss mailing list >>> Par...@li... >>> https://lists.sourceforge.net/lists/listinfo/parseperl-discuss > |
From: Chris D. <ch...@ch...> - 2006-10-05 04:30:12
|
Yes, I understand the point of generated content. I thought about that. It occurred to me that the tokenizer could watch all (well, most) of the line endings and store the newline that has the plurality as the document newline. Then, Node.pm (or Normal.pm?) could have a set_newline method that traverses the tokens and changes the newline. Generated code could use that value too. As for the unit tests, I checked all the files in PPI, and none have \r anywhere except inc/Test/ClassAPI.pm. I've written a bunch of mixed "\r\n","\n" regression tests for Perl::Critic that could be repurposed for PPI. Chris On Oct 4, 2006, at 10:26 PM, Adam Kennedy wrote: > Also, remembering back to my original decision, there were also > some other factors. > > For example, parsing is one thing, but what happens when you start > generating content? What newline do you use then? > > We have to remember the newline scheme (if clean) or somehow make a > decision on what to use if messy. > > We can't actually insert using \n any more, we need to use some per- > document value... $doc->newline or something. > > It makes a lot of things messier than they should be. > > A lot of that mess isn't covered in the unit tests, because I > solved the whole problem by localising newlines, so the tests > didn't matter. > > Adam K > > Adam Kennedy wrote: >> I'm quite nervous about this one, if only because of the problem >> that nobody downstream can use \n either (which is partly why I >> localised the newlines in the first place). >> The look ahead to first end-line isn't an option, it fails in too >> many cases. >> I might be able to be convinced though, but we need find a way to >> mitigate the regex soup effect. >> Maybe if we provided \015{1,2}\012|\012|\015 as an export (as >> $XPNL or something) that might make life a little simpler. 
>> Adam K >> Chris Dolan wrote: >>> Here's another slightly controversial patch that I don't feel >>> comfortable just committing without a nod from Adam: >>> >>> This patch changes PPI to preserve line endings instead of >>> localizing them. The code passes all PPI and Perl::Critic >>> tests. I implemented this change to enable a new Perl::Critic >>> policy, CodeLayout::RequireConsistentNewlines, which is intended >>> to help with code signing problems. >>> >>> There are still a few un-fixed methods that are not exercised by >>> the test suite, like PPI::Token::Pod::merge(). These need to be >>> fixed before this patch could be committed. Additionally, the >>> PPI.pm docs need to be updated to reflect this change. >>> >>> In the process, I added two new API methods: >>> PPI::Node::line_ending() and PPI::Token::HereDoc::Terminator_line(). >>> >>> This patch will collide a tiny bit with my "$_" patch, but it >>> should be easy to repair (I'm happy to do the legwork on that). >>> >>> *** >>> >>> This implementation simply changes all "\n" checks to "\015{1,2} >>> \012|\012|\015" checks. An alternative implementation would be >>> for the tokenizer to look ahead to the first line ending, store >>> that value, and use it throughout tokenization. That >>> implementation would be more like what text editors do, but less >>> like Perl (I think). >>> >>> Chris >>> >>> -- >>> Chris Dolan, Software Developer, http://www.chrisdolan.net/ >>> Public key: http://www.chrisdolan.net/public.key >>> vCard: http://www.chrisdolan.net/ChrisDolan.vcf >>> >>> >>> >>> >>> -------------------------------------------------------------------- >>> ---- >>> >>> -------------------------------------------------------------------- >>> ----- >>> Take Surveys. Earn Cash. Influence the Future of IT >>> Join SourceForge.net's Techsay panel and you'll get the chance to >>> share your >>> opinions on IT & business topics through brief surveys -- and >>> earn cash >>> http://www.techsay.com/default.php? 
>>> page=join.php&p=sourceforge&CID=DEVDEV >>> >>> >>> -------------------------------------------------------------------- >>> ---- >>> >>> _______________________________________________ >>> Parseperl-discuss mailing list >>> Par...@li... >>> https://lists.sourceforge.net/lists/listinfo/parseperl-discuss >> --------------------------------------------------------------------- >> ---- >> Take Surveys. Earn Cash. Influence the Future of IT >> Join SourceForge.net's Techsay panel and you'll get the chance to >> share your >> opinions on IT & business topics through brief surveys -- and earn >> cash >> http://www.techsay.com/default.php? >> page=join.php&p=sourceforge&CID=DEVDEV >> _______________________________________________ >> Parseperl-discuss mailing list >> Par...@li... >> https://lists.sourceforge.net/lists/listinfo/parseperl-discuss > -- Chris Dolan, Software Developer, Clotho Advanced Media Inc. 608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703 vCard: http://www.chrisdolan.net/ChrisDolan.vcf Clotho Advanced Media, Inc. - Creators of MediaLandscape Software (http://www.media-landscape.com/) and partners in the revolutionary Croquet project (http://www.opencroquet.org/) |
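The "plurality" heuristic sketched in the email above could be as simple as counting each newline flavor in the raw source; this is a hypothetical helper, not PPI code:

```perl
use strict;
use warnings;

# Count each newline flavor and return the most common one as the
# document newline; default to unix (\012) for single-line input.
# CRLF must come first in the alternation so it is not counted as
# a separate CR plus LF.
sub dominant_newline {
    my ($source) = @_;
    my %count;
    $count{$1}++ while $source =~ /(\015\012|\012|\015)/g;
    my ($winner) = sort { $count{$b} <=> $count{$a} } keys %count;
    return defined $winner ? $winner : "\012";
}

print dominant_newline("a\015\012b\015\012c\012") eq "\015\012"
    ? "CRLF wins\n" : "unexpected\n";
```

A set_newline method could then apply the winning flavor to every token, which is also how generated content would pick up a sensible newline.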
From: Chris D. <ch...@ch...> - 2006-10-05 04:16:50
|
Adam, [this email got longer than expected... Skip to the last bullet for the most interesting action item, IMO] I understand your nervousness. I hadn't thought of the downstream \n effect -- I thought it was just to keep the PPI code simpler. As for the regexp soup, I had thought of the same thing, but decided to email you before doing additional tweaks in case you rejected the concept. I do think this change is important, however. It is the one place where PPI is lossy. As is, I have to worry about working on Windows perl code (with \r\n) on my Mac (where \n is the default), which I did not realize until today. While the \n localization is easier on the downstream *programmer*, it's harder on the downstream *user*, because the output of serialize() cannot be compared reliably to the original source. In particular, the \n localization obscures the case of inconsistent newlines in the source code. Or, if I had code that HAD to have \r\n in the __DATA__ section (not sure why...) then PPI would break it on my Mac. It just feels wrong, I guess. A few side notes: * Perl::Critic only relies on the localized \n in one place, so that would be easy to fix. * The look ahead to the first line ending was only to be used in HereDoc repair in PPI::Document::serialize(). Personally, I think that attempt to repair is dubious anyway -- real documents should never have that problem I think. So, perhaps we could skip the line_ending() function and chop code from serialize(). * We could have the \n localization on by default and have a Tokenizer flag that disables it. That way, Perl::Critic could dig down to the real newlines, but other downstream modules could remain care-free. Perhaps that flag would happen in PPI::Document? Chris On Oct 4, 2006, at 8:14 PM, Adam Kennedy wrote: > I'm quite nervous about this one, if only because of the problem > that nobody downstream can use \n either (which is partly why I > localised the newlines in the first place). 
> > The look ahead to first end-line isn't an option, it fails in too > many cases. > > I might be able to be convinced though, but we need find a way to > mitigate the regex soup effect. > > Maybe if we provided \015{1,2}\012|\012|\015 as an export (as $XPNL > or something) that might make life a little simpler. > > Adam K > > Chris Dolan wrote: >> Here's another slightly controversial patch that I don't feel >> comfortable just committing without a nod from Adam: >> This patch changes PPI to preserve line endings instead of >> localizing them. The code passes all PPI and Perl::Critic tests. >> I implemented this change to enable a new Perl::Critic policy, >> CodeLayout::RequireConsistentNewlines, which is intended to help >> with code signing problems. >> There are still a few un-fixed methods that are not exercised by >> the test suite, like PPI::Token::Pod::merge(). These need to be >> fixed before this patch could be committed. Additionally, the >> PPI.pm docs need to be updated to reflect this change. >> In the process, I added two new API methods: >> PPI::Node::line_ending() and PPI::Token::HereDoc::Terminator_line(). >> This patch will collide a tiny bit with my "$_" patch, but it >> should be easy to repair (I'm happy to do the legwork on that). >> *** >> This implementation simply changes all "\n" checks to "\015{1,2} >> \012|\012|\015" checks. An alternative implementation would be >> for the tokenizer to look ahead to the first line ending, store >> that value, and use it throughout tokenization. That >> implementation would be more like what text editors do, but less >> like Perl (I think). >> Chris >> -- >> Chris Dolan, Software Developer, http://www.chrisdolan.net/ >> Public key: http://www.chrisdolan.net/public.key >> vCard: http://www.chrisdolan.net/ChrisDolan.vcf >> --------------------------------------------------------------------- >> --- >> --------------------------------------------------------------------- >> ---- >> Take Surveys. 
Earn Cash. Influence the Future of IT >> Join SourceForge.net's Techsay panel and you'll get the chance to >> share your >> opinions on IT & business topics through brief surveys -- and earn >> cash >> http://www.techsay.com/default.php? >> page=join.php&p=sourceforge&CID=DEVDEV >> --------------------------------------------------------------------- >> --- >> _______________________________________________ >> Parseperl-discuss mailing list >> Par...@li... >> https://lists.sourceforge.net/lists/listinfo/parseperl-discuss > -- Chris Dolan, Software Developer, Clotho Advanced Media Inc. 608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703 vCard: http://www.chrisdolan.net/ChrisDolan.vcf Clotho Advanced Media, Inc. - Creators of MediaLandscape Software (http://www.media-landscape.com/) and partners in the revolutionary Croquet project (http://www.opencroquet.org/) |
From: Adam K. <ad...@ph...> - 2006-10-05 03:28:39
|
Also, remembering back to my original decision, there were also some other factors. For example, parsing is one thing, but what happens when you start generating content? What newline do you use then? We have to remember the newline scheme (if clean) or somehow make a decision on what to use if messy. We can't actually insert using \n any more, we need to use some per-document value... $doc->newline or something. It makes a lot of things messier than they should be. A lot of that mess isn't covered in the unit tests, because I solved the whole problem by localising newlines, so the tests didn't matter. Adam K Adam Kennedy wrote: > I'm quite nervous about this one, if only because of the problem that > nobody downstream can use \n either (which is partly why I localised the > newlines in the first place). > > The look ahead to first end-line isn't an option, it fails in too many > cases. > > I might be able to be convinced though, but we need find a way to > mitigate the regex soup effect. > > Maybe if we provided \015{1,2}\012|\012|\015 as an export (as $XPNL or > something) that might make life a little simpler. > > Adam K > > Chris Dolan wrote: >> Here's another slightly controversial patch that I don't feel >> comfortable just committing without a nod from Adam: >> >> This patch changes PPI to preserve line endings instead of localizing >> them. The code passes all PPI and Perl::Critic tests. I implemented >> this change to enable a new Perl::Critic policy, >> CodeLayout::RequireConsistentNewlines, which is intended to help with >> code signing problems. >> >> There are still a few un-fixed methods that are not exercised by the >> test suite, like PPI::Token::Pod::merge(). These need to be fixed >> before this patch could be committed. Additionally, the PPI.pm docs >> need to be updated to reflect this change. >> >> In the process, I added two new API methods: PPI::Node::line_ending() >> and PPI::Token::HereDoc::Terminator_line(). 
>> >> This patch will collide a tiny bit with my "$_" patch, but it should be >> easy to repair (I'm happy to do the legwork on that). >> >> *** >> >> This implementation simply changes all "\n" checks to >> "\015{1,2}\012|\012|\015" checks. An alternative implementation would >> be for the tokenizer to look ahead to the first line ending, store that >> value, and use it throughout tokenization. That implementation would be >> more like what text editors do, but less like Perl (I think). >> >> Chris >> >> -- >> Chris Dolan, Software Developer, http://www.chrisdolan.net/ >> Public key: http://www.chrisdolan.net/public.key >> vCard: http://www.chrisdolan.net/ChrisDolan.vcf >> >> >> >> >> ------------------------------------------------------------------------ >> >> ------------------------------------------------------------------------- >> Take Surveys. Earn Cash. Influence the Future of IT >> Join SourceForge.net's Techsay panel and you'll get the chance to share your >> opinions on IT & business topics through brief surveys -- and earn cash >> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Parseperl-discuss mailing list >> Par...@li... >> https://lists.sourceforge.net/lists/listinfo/parseperl-discuss > > ------------------------------------------------------------------------- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share your > opinions on IT & business topics through brief surveys -- and earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > _______________________________________________ > Parseperl-discuss mailing list > Par...@li... > https://lists.sourceforge.net/lists/listinfo/parseperl-discuss |
From: Adam K. <ad...@ph...> - 2006-10-05 01:16:02
|
I'm quite nervous about this one, if only because of the problem that nobody downstream can use \n either (which is partly why I localised the newlines in the first place). The look ahead to first end-line isn't an option, it fails in too many cases. I might be able to be convinced though, but we need to find a way to mitigate the regex soup effect. Maybe if we provided \015{1,2}\012|\012|\015 as an export (as $XPNL or something) that might make life a little simpler. Adam K Chris Dolan wrote: > Here's another slightly controversial patch that I don't feel > comfortable just committing without a nod from Adam: > > This patch changes PPI to preserve line endings instead of localizing > them. The code passes all PPI and Perl::Critic tests. I implemented > this change to enable a new Perl::Critic policy, > CodeLayout::RequireConsistentNewlines, which is intended to help with > code signing problems. > > There are still a few un-fixed methods that are not exercised by the > test suite, like PPI::Token::Pod::merge(). These need to be fixed > before this patch could be committed. Additionally, the PPI.pm docs > need to be updated to reflect this change. > > In the process, I added two new API methods: PPI::Node::line_ending() > and PPI::Token::HereDoc::Terminator_line(). 
> > Chris > > -- > Chris Dolan, Software Developer, http://www.chrisdolan.net/ > Public key: http://www.chrisdolan.net/public.key > vCard: http://www.chrisdolan.net/ChrisDolan.vcf > > > > > ------------------------------------------------------------------------ > > ------------------------------------------------------------------------- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share your > opinions on IT & business topics through brief surveys -- and earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > > > ------------------------------------------------------------------------ > > _______________________________________________ > Parseperl-discuss mailing list > Par...@li... > https://lists.sourceforge.net/lists/listinfo/parseperl-discuss |
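If the $XPNL export floated in the email above existed, downstream code could match any line ending without repeating the regex soup. A sketch under that assumption — $XPNL is hypothetical, not a real PPI export:

```perl
use strict;
use warnings;

# Hypothetical cross-platform newline pattern, per the suggestion
# above. Order matters: the CRLF/CRCRLF branch must come before
# bare LF or CR so multi-byte endings match whole.
our $XPNL = qr/\015{1,2}\012|\012|\015/;

# Each of the three conventional endings (plus CRCRLF, which some
# win32 I/O layers emit) matches the pattern in one piece.
for my $ending ("\012", "\015\012", "\015", "\015\015\012") {
    die "unmatched ending" unless $ending =~ /\A(?:$XPNL)\z/;
}
print "all endings match\n";
```

Exporting one compiled qr// would also keep the pattern in a single place if the alternation ever needed another case.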
From: Adam K. <ad...@ph...> - 2006-10-05 01:09:57
|
Feel free to correct my spelling :) I'm not aware of anybody using that method. Since your changes are fairly large, I'm delaying release until a few other nice things get added and doing a 1.200 release anyway I think, so it's fine. We'll do a couple of dev releases before the real one, to shake out any minor things. Adam K Chris Dolan wrote: > The word "separator" is often misspelled as "seperator" in PPI code. > I'd like to change this to the correct spelling, but am concerned > about breaking compatibility. It *looks* to me that this would be a > very minor incompatibility. The only external API method that would > be affected is PPI::Normal::Standard::remove_statement_seperator(). > Is it OK to change the name of that method? > > Below is a list of all places in the code where this spelling > correction would happen. > > Chris > > % egrep -rn seperator lib t | grep -v svn > lib/PPI/Normal/Standard.pm:37: remove_statement_seperator => 2, > lib/PPI/Normal/Standard.pm:90:sub remove_statement_seperator { > lib/PPI/Token/_QuoteEngine/Full.pm:25: 'q' => { operator > => 'q', braced => undef, seperator => undef, _sections => 1 }, > lib/PPI/Token/_QuoteEngine/Full.pm:26: 'qq' => { operator > => 'qq', braced => undef, seperator => undef, _sections => 1 }, > lib/PPI/Token/_QuoteEngine/Full.pm:27: 'qx' => { operator > => 'qx', braced => undef, seperator => undef, _sections => 1 }, > lib/PPI/Token/_QuoteEngine/Full.pm:28: 'qw' => { operator > => 'qw', braced => undef, seperator => undef, _sections => 1 }, > lib/PPI/Token/_QuoteEngine/Full.pm:29: 'qr' => { operator > => 'qr', braced => undef, seperator => undef, _sections => 1, > modifiers => 1 }, > lib/PPI/Token/_QuoteEngine/Full.pm:30: 'm' => { operator > => 'm', braced => undef, seperator => undef, _sections => 1, > modifiers => 1 }, > lib/PPI/Token/_QuoteEngine/Full.pm:31: 's' => { operator > => 's', braced => undef, seperator => undef, _sections => 2, > modifiers => 1 }, > lib/PPI/Token/_QuoteEngine/Full.pm:32: 'tr' => 
{ operator > => 'tr', braced => undef, seperator => undef, _sections => 2, > modifiers => 1 }, > lib/PPI/Token/_QuoteEngine/Full.pm:35: 'y' => { operator > => 'y', braced => undef, seperator => undef, _sections => 2, > modifiers => 1 }, > lib/PPI/Token/_QuoteEngine/Full.pm:37: '/' => { operator > => undef, braced => 0, seperator => '/', _sections => 1, > modifiers => 1 }, > lib/PPI/Token/_QuoteEngine/Full.pm:40: '<' => { operator > => undef, braced => 1, seperator => undef, _sections => 1, }, > lib/PPI/Token/_QuoteEngine/Full.pm:45: '?' => { operator > => undef, braced => 0, seperator => '?', _sections => 1, > modifieds => 1 }, > lib/PPI/Token/_QuoteEngine/Full.pm:110: # operator and the > opening seperator. > lib/PPI/Token/_QuoteEngine/Full.pm:123: # The character we > are now on is the seperator. Capture, > lib/PPI/Token/_QuoteEngine/Full.pm:134: $self-> > {seperator} = $sep; > lib/PPI/Token/_QuoteEngine/Full.pm:163: # Get the content up to the > next seperator > lib/PPI/Token/_QuoteEngine/Full.pm:164: my $string = $self- > >_scan_for_unescaped_character( $t, $self->{seperator} ); > lib/PPI/Token/_QuoteEngine/Full.pm:187: # Get the content up to the > end seperator > lib/PPI/Token/_QuoteEngine/Full.pm:188: $string = $self- > >_scan_for_unescaped_character( $t, $self->{seperator} ); > lib/PPI/Token/_QuoteEngine/Simple.pm:19: my $seperator = shift > or return undef; > lib/PPI/Token/_QuoteEngine/Simple.pm:21: # Create a new token > containing the seperator > lib/PPI/Token/_QuoteEngine/Simple.pm:25: my $self = > PPI::Token::new( $class, $seperator ) or return undef; > lib/PPI/Token/_QuoteEngine/Simple.pm:26: $self->{seperator} = > $seperator; > lib/PPI/Token/_QuoteEngine/Simple.pm:36: # Scan for the end > seperator > lib/PPI/Token/_QuoteEngine/Simple.pm:37: my $string = $self- > >_scan_for_unescaped_character( $t, $self->{seperator} ); > t/02_api.t:414:remove_statement_seperator=method > t/08_regression.t:90: seperator => undef, > t/08_regression.t:121: seperator => 
undef, > > |
From: Chris D. <ch...@ch...> - 2006-10-04 21:45:36
|
Here's another slightly controversial patch that I don't feel comfortable just committing without a nod from Adam: This patch changes PPI to preserve line endings instead of localizing them. The code passes all PPI and Perl::Critic tests. I implemented this change to enable a new Perl::Critic policy, CodeLayout::RequireConsistentNewlines, which is intended to help with code signing problems. There are still a few un-fixed methods that are not exercised by the test suite, like PPI::Token::Pod::merge(). These need to be fixed before this patch could be committed. Additionally, the PPI.pm docs need to be updated to reflect this change. In the process, I added two new API methods: PPI::Node::line_ending() and PPI::Token::HereDoc::Terminator_line(). This patch will collide a tiny bit with my "$_" patch, but it should be easy to repair (I'm happy to do the legwork on that). *** This implementation simply changes all "\n" checks to "\015{1,2}\012|\012|\015" checks. An alternative implementation would be for the tokenizer to look ahead to the first line ending, store that value, and use it throughout tokenization. That implementation would be more like what text editors do, but less like Perl (I think). Chris -- Chris Dolan, Software Developer, http://www.chrisdolan.net/ Public key: http://www.chrisdolan.net/public.key vCard: http://www.chrisdolan.net/ChrisDolan.vcf |
From: Chris D. <ch...@ch...> - 2006-10-04 18:49:05
|
PPI makes heavy use of unlocalized $_. This mangles any existing $_ in surrounding code, which is bad form. The attached patch changes all $_ assignments to lexical vars. A nice side effect is that it makes the code much more readable. The performance hit from this patch is about 1.0+/-0.5%, based on running "make test" with and without it three times each. If this degradation is real and not just normal variability, is this patch acceptable? Along the way, I fixed a bug in PPI::Tokenizer::get_token where $_ was used in multiple contexts and got overwritten incorrectly. The code worked by coincidence only. Chris -- Chris Dolan, Software Developer, http://www.chrisdolan.net/ Public key: http://www.chrisdolan.net/public.key vCard: http://www.chrisdolan.net/ChrisDolan.vcf |
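The bug class this patch targets can be shown in a few lines. These subs are illustrative only, not PPI's actual code:

```perl
use strict;
use warnings;

# A sub that assigns to the global $_ clobbers its caller's topic...
sub bad_trim  { $_ = shift; s/\s+\z//; return $_ }

# ...while a lexical variable has no such side effect.
sub good_trim { my $s = shift; $s =~ s/\s+\z//; return $s }

$_ = 'caller data';
good_trim("x \n");
print "after good_trim: $_\n";   # still 'caller data'
bad_trim("x \n");
print "after bad_trim:  $_\n";   # now 'x' -- the caller's $_ was mangled
```

Wrapping the assignment in local() would also protect the caller, but the lexical form (the approach the patch takes) additionally avoids the overwriting-in-multiple-contexts bug described for PPI::Tokenizer::get_token.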
From: Chris D. <ch...@cl...> - 2006-10-04 17:27:02
|
The word "separator" is often misspelled as "seperator" in PPI code. I'd like to change this to the correct spelling, but am concerned about breaking compatibility. It *looks* to me that this would be a very minor incompatibility. The only external API method that would be affected is PPI::Normal::Standard::remove_statement_seperator(). Is it OK to change the name of that method? Below is a list of all places in the code where this spelling correction would happen. Chris % egrep -rn seperator lib t | grep -v svn lib/PPI/Normal/Standard.pm:37: remove_statement_seperator => 2, lib/PPI/Normal/Standard.pm:90:sub remove_statement_seperator { lib/PPI/Token/_QuoteEngine/Full.pm:25: 'q' => { operator => 'q', braced => undef, seperator => undef, _sections => 1 }, lib/PPI/Token/_QuoteEngine/Full.pm:26: 'qq' => { operator => 'qq', braced => undef, seperator => undef, _sections => 1 }, lib/PPI/Token/_QuoteEngine/Full.pm:27: 'qx' => { operator => 'qx', braced => undef, seperator => undef, _sections => 1 }, lib/PPI/Token/_QuoteEngine/Full.pm:28: 'qw' => { operator => 'qw', braced => undef, seperator => undef, _sections => 1 }, lib/PPI/Token/_QuoteEngine/Full.pm:29: 'qr' => { operator => 'qr', braced => undef, seperator => undef, _sections => 1, modifiers => 1 }, lib/PPI/Token/_QuoteEngine/Full.pm:30: 'm' => { operator => 'm', braced => undef, seperator => undef, _sections => 1, modifiers => 1 }, lib/PPI/Token/_QuoteEngine/Full.pm:31: 's' => { operator => 's', braced => undef, seperator => undef, _sections => 2, modifiers => 1 }, lib/PPI/Token/_QuoteEngine/Full.pm:32: 'tr' => { operator => 'tr', braced => undef, seperator => undef, _sections => 2, modifiers => 1 }, lib/PPI/Token/_QuoteEngine/Full.pm:35: 'y' => { operator => 'y', braced => undef, seperator => undef, _sections => 2, modifiers => 1 }, lib/PPI/Token/_QuoteEngine/Full.pm:37: '/' => { operator => undef, braced => 0, seperator => '/', _sections => 1, modifiers => 1 }, lib/PPI/Token/_QuoteEngine/Full.pm:40: '<' => { 
operator => undef, braced => 1, seperator => undef, _sections => 1, }, lib/PPI/Token/_QuoteEngine/Full.pm:45: '?' => { operator => undef, braced => 0, seperator => '?', _sections => 1, modifieds => 1 }, lib/PPI/Token/_QuoteEngine/Full.pm:110: # operator and the opening seperator. lib/PPI/Token/_QuoteEngine/Full.pm:123: # The character we are now on is the seperator. Capture, lib/PPI/Token/_QuoteEngine/Full.pm:134: $self-> {seperator} = $sep; lib/PPI/Token/_QuoteEngine/Full.pm:163: # Get the content up to the next seperator lib/PPI/Token/_QuoteEngine/Full.pm:164: my $string = $self- >_scan_for_unescaped_character( $t, $self->{seperator} ); lib/PPI/Token/_QuoteEngine/Full.pm:187: # Get the content up to the end seperator lib/PPI/Token/_QuoteEngine/Full.pm:188: $string = $self- >_scan_for_unescaped_character( $t, $self->{seperator} ); lib/PPI/Token/_QuoteEngine/Simple.pm:19: my $seperator = shift or return undef; lib/PPI/Token/_QuoteEngine/Simple.pm:21: # Create a new token containing the seperator lib/PPI/Token/_QuoteEngine/Simple.pm:25: my $self = PPI::Token::new( $class, $seperator ) or return undef; lib/PPI/Token/_QuoteEngine/Simple.pm:26: $self->{seperator} = $seperator; lib/PPI/Token/_QuoteEngine/Simple.pm:36: # Scan for the end seperator lib/PPI/Token/_QuoteEngine/Simple.pm:37: my $string = $self- >_scan_for_unescaped_character( $t, $self->{seperator} ); t/02_api.t:414:remove_statement_seperator=method t/08_regression.t:90: seperator => undef, t/08_regression.t:121: seperator => undef, -- Chris Dolan, Software Developer, Clotho Advanced Media Inc. 608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703 vCard: http://www.chrisdolan.net/ChrisDolan.vcf Clotho Advanced Media, Inc. - Creators of MediaLandscape Software (http://www.media-landscape.com/) and partners in the revolutionary Croquet project (http://www.opencroquet.org/) |
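The mechanical part of the rename could be scripted. A hedged sketch, assuming a writable checkout with lib/ and t/ at the top level; it rewrites every occurrence, including the remove_statement_seperator method name and the t/02_api.t entry, which is exactly the compatibility break under discussion:

```perl
use strict;
use warnings;
use File::Find;

# Walk lib/ and t/, rewriting the misspelling in place in every
# .pm and .t file. Files without the misspelling are left untouched.
find(sub {
    return unless -f && /\.(?:pm|t)\z/;
    local $/;                        # slurp mode
    open my $in, '<', $_ or die "$File::Find::name: $!";
    my $code = <$in>;
    close $in;
    return unless $code =~ s/seperator/separator/g;
    open my $out, '>', $_ or die "$File::Find::name: $!";
    print {$out} $code;
    close $out;
    print "fixed $File::Find::name\n";
}, 'lib', 't');
```

Any external callers of the renamed method would still need a deprecation shim or a release note, as raised above.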