Re: [Parseperl-discuss] [PATCH] preserve newlines
From: Chris D. <ch...@ch...> - 2006-10-05 05:24:55
On Oct 4, 2006, at 11:43 PM, Adam Kennedy wrote:

> Unfortunately, I don't only work with real documents.

Sorry, I was unclear. By "real documents" I meant anything that the
Tokenizer can emit. I do not believe it is possible for the tokenizer
to emit a PPI::Token::HereDoc in which the _terminator_line is non-null
but the line for the token lacks a newline. The only way that's
possible is via generated content. The latter is what I meant by
non-real.

> That's why exhaustive.t is in there, to throw line noise at you and
> kick you in the ass when you get sloppy :)
>
> And we only run it in light mode, turned up to full it takes 6+
> hours to run. At one point it was throwing up error cases that only
> triggered every 1.5+/-1 hours.

Egads!

>> * We could have the \n localization on by default and have a
>> Tokenizer flag that disables it. That way, Perl::Critic could
>> dig down to the real newlines, but other downstream modules could
>> remain care-free. Perhaps that flag would happen in PPI::Document?
>
> It would be there almost certainly.
>
> The issue wasn't so much making it easier on the programmer, as
> making it sane. If I didn't localise the newlines, it would become
> a major gotcha, and an enormous source of bugs.

Hmm, enormous? You are obviously much more familiar with the gotchas
than I am, but my patch wasn't that hard to write and works well with
the existing test suite. Perhaps my confidence is unwarranted?

> I see a couple of solutions.
>
> Firstly, I really want to keep things localised internally.
>
> So pre-scanning the document text (which we do anyway for unicode
> checks, or at least we used to before the latin-1 improvements) to
> pick up 100% unix/win32/mac, storing that newline type in a top
> level document accessor, and then writing back out to the same
> type, is probably ok.

We'd have to change the code in add_element, remove, and replace to
correct the newlines on entry for new tokens.
That calls for a set_newline method on PPI::Element and PPI::Node (and
PPI::Token::HereDoc).

> That leaves us only with the case of mixed newlines. Personally,
> outside of binary files I am not aware of ANY cases in which mixed
> newlines in a text file is allowed, even in __DATA__ segments.

Well, certainly my goal is to get rid of the mixed newlines! That's
why I was writing a Perl::Critic policy against that. :-)

> In THAT case, perhaps we either localise, or we flatten to the
> first newline in the file.
>
> I'd be happy to implement that as a first step towards full native
> mixed newlines, as the functionality seems fairly containable.
>
> It also matches what some of the better editors do... localise
> internally, but remember and save out as the same input type.

Not Emacs. It picks one newline to work on and treats the others like
binary. So, if you're in \n mode and there is a single \r\n in the
file, you see a "^M" character at the end of that line.

> But I'm honestly not aware of ANYTHING that handles mixed newlines
> properly. I've done tons of unix/win32/mac cross-over work and I've
> seen just about every screwed up case there is.
>
> Even Dreamweaver, which inspired PPI in the first place, doesn't
> handle fucked up broken newlines.
>
> So we'd have to invent the solution.

You must be a step ahead of me or something. Doesn't "handle" simply
mean serialization and de-serialization for PPI? Maybe I've spent too
much time just in the Perl::Critic case?

> And I'm still not (yet) convinced that native mixed newlines is the
> answer... if only because how the hell do we guarantee round-trip
> safety for them? If we do it, it needs to be 100%.

I feel like I must be missing some crucial point. For the read-only
case, isn't round-trip safety just ensuring that we spit out exactly
what the Tokenizer took in? With the exception of the HereDoc stuff
already mentioned, we've already achieved that with my patch, I think.
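
To make the two positions above concrete, here is a minimal sketch. It
is written in Python purely as an illustration and is not PPI code:
every function name below is hypothetical, and "flatten to the first
newline in the file" is modelled as a plain string substitution rather
than anything the Tokenizer actually does.

```python
import re

# One line including its terminator, or a final unterminated line.
_LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")
_NEWLINE = re.compile(r"\r\n|\r|\n")


def split_keep_newlines(text):
    """Split text into lines, each keeping its original terminator."""
    return _LINE.findall(text)


def first_newline(text):
    """Return the first line ending found in text, or None if there is none."""
    match = _NEWLINE.search(text)
    return match.group(0) if match else None


def localise(text):
    """Flatten every line ending to "\n" and remember the first one seen."""
    return _NEWLINE.sub("\n", text), first_newline(text)


def delocalise(text, newline):
    """Write localised text back out using the remembered line ending."""
    return text if newline in (None, "\n") else text.replace("\n", newline)


uniform = "my $x = 1;\r\nprint $x;\r\n"
mixed = "my $x = 1;\r\nprint $x;\n"

# Keeping each line's own terminator round-trips any input exactly.
assert "".join(split_keep_newlines(mixed)) == mixed

# Localise-and-remember round-trips a uniform file...
assert delocalise(*localise(uniform)) == uniform
# ...but rewrites a mixed file to use its first line ending throughout.
assert delocalise(*localise(mixed)) == "my $x = 1;\r\nprint $x;\r\n"
```

In this toy model, preserving terminators is lossless for read-only
use, while localising is lossless only when the input has a single
newline type.
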
That leaves just the generated code case to worry about. In that
case, we decide on a dominant line ending, like Emacs does, and ensure
that all added tokens inherit that line ending. If the generator
*wants* to make mixed newlines, that's the only really hard case, and
that can be worked around with set_newline.

Chris

--
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703

vCard: http://www.chrisdolan.net/ChrisDolan.vcf

Clotho Advanced Media, Inc. - Creators of MediaLandscape Software
(http://www.media-landscape.com/) and partners in the revolutionary
Croquet project (http://www.opencroquet.org/)