Re: [Parseperl-discuss] [PATCH] preserve newlines
Brought to you by:
adamkennedy
From: Adam K. <ad...@ph...> - 2006-10-05 04:45:04
|
Chris Dolan wrote: > Adam, > > [this email got longer than expected... Skip to the last bullet for > the most interesting action item, IMO] > > I understand your nervousness. I hadn't thought of the downstream \n > effect -- I thought it was just to keep the PPI code simpler. As for > the regexp soup, I had thought of the same thing, but decided to > email you before doing additional tweaks in case you rejected the > concept. > > I do think this change is important, however. It is the one place > where PPI is lossy. As is, I have to worry about working on Windows > perl code (with \r\n) on my Mac (where \n is the default), which I > did not realize until today. While the \n localization is easier on > the downstream *programmer*, it's harder on the downstream *user*, > because the output of serialize() cannot be compared reliably to the > original source. > > In particular, the \n localization obscures the case of inconsistent > newlines in the source code. Or, if I had code that HAD to have \r\n > in the __DATA__ section (not sure why...) then PPI would break it on > my Mac. It just feels wrong, I guess. > > A few side notes: > > * Perl::Critic only relies on the localized \n in one place, so > that would be easy to fix. > > * The look ahead to the first line ending was only to be used in > HereDoc repair in PPI::Document::serialize(). Personally, I think > that attempt to repair is dubious anyway -- real documents should > never have that problem I think. Unfortunately, I don't only work with real documents. That's why exhaustive.t is in there, so throw line noise at you and kick out in the ass when you get sloppy :) And we only run it in light mode, turned up to full it takes 6+ hours to run. At one point it was throwing up error cases that only triggered every 1.5+/-1 hours. So, perhaps we could skip the > line_ending() function and chop code from serialize(). > > * We could have the \n localization on by default and have a > Tokenizer flag that disables it. That way, Perl::Critic could dig > down to the real newlines, but other downstream modules could remain > care-free. Perhaps that flag would happen in PPI::Document? It would be there almost certainly. The issue wasn't so much making it easier on the programmer, as making it sane. If I didn't localise the newlines, it would become a major gotcha, and an enormous source of bugs. I see a couple of solutions. Firstly, I really want to keep things localised internally. So pre-scanning the document text (which we do anyway for unicode checks, or at least we used to before the latin-1 improvements) to pick up 100% unix/win32/mac, storing that newline type in a top level document accessor, and then writing back out to the same type, is probably ok. That leaves us only with the case of mixed newlines. Personally, outside of binary files I am not away of ANY cases in which mixed newlines in a text file is allowed, even in __DATA__ segments. In THAT case, perhaps we either localise, or we flatten to the first newline in the file. I'd be happy to implement that as a first step towards full native mixed newlines, as the functionality seems fairly containable. It also matches what some of the better editors do... localise internally, but remember and save out as the same input type. But I'm honestly not aware of ANYTHING that handles mixed newlines properly. I've done tons of unix/win32/mac cross-over work and I've seen just about every screwed up case there is. Even Dreamweaver, which inspired PPI in the first place, doesn't handle fucked up broken newlines. So we'd have to invent the solution. And I'm still not (yet) convinced that native mixed newlines is the answer... if only because how the hell do we guarentee round-trip safety for them? If we do it, it needs to be 100%. Adam K > Chris > > > On Oct 4, 2006, at 8:14 PM, Adam Kennedy wrote: > >> I'm quite nervous about this one, if only because of the problem >> that nobody downstream can use \n either (which is partly why I >> localised the newlines in the first place). >> >> The look ahead to first end-line isn't an option, it fails in too >> many cases. >> >> I might be able to be convinced though, but we need find a way to >> mitigate the regex soup effect. >> >> Maybe if we provided \015{1,2}\012|\012|\015 as an export (as $XPNL >> or something) that might make life a little simpler. >> >> Adam K >> >> Chris Dolan wrote: >>> Here's another slightly controversial patch that I don't feel >>> comfortable just committing without a nod from Adam: >>> This patch changes PPI to preserve line endings instead of >>> localizing them. The code passes all PPI and Perl::Critic tests. >>> I implemented this change to enable a new Perl::Critic policy, >>> CodeLayout::RequireConsistentNewlines, which is intended to help >>> with code signing problems. >>> There are still a few un-fixed methods that are not exercised by >>> the test suite, like PPI::Token::Pod::merge(). These need to be >>> fixed before this patch could be committed. Additionally, the >>> PPI.pm docs need to be updated to reflect this change. >>> In the process, I added two new API methods: >>> PPI::Node::line_ending() and PPI::Token::HereDoc::Terminator_line(). >>> This patch will collide a tiny bit with my "$_" patch, but it >>> should be easy to repair (I'm happy to do the legwork on that). >>> *** >>> This implementation simply changes all "\n" checks to "\015{1,2} >>> \012|\012|\015" checks. An alternative implementation would be >>> for the tokenizer to look ahead to the first line ending, store >>> that value, and use it throughout tokenization. That >>> implementation would be more like what text editors do, but less >>> like Perl (I think). >>> Chris >>> -- >>> Chris Dolan, Software Developer, http://www.chrisdolan.net/ >>> Public key: http://www.chrisdolan.net/public.key >>> vCard: http://www.chrisdolan.net/ChrisDolan.vcf >>> --------------------------------------------------------------------- >>> --- >>> --------------------------------------------------------------------- >>> ---- >>> Take Surveys. Earn Cash. Influence the Future of IT >>> Join SourceForge.net's Techsay panel and you'll get the chance to >>> share your >>> opinions on IT & business topics through brief surveys -- and earn >>> cash >>> http://www.techsay.com/default.php? >>> page=join.php&p=sourceforge&CID=DEVDEV >>> --------------------------------------------------------------------- >>> --- >>> _______________________________________________ >>> Parseperl-discuss mailing list >>> Par...@li... >>> https://lists.sourceforge.net/lists/listinfo/parseperl-discuss > |