Thread: [Parseperl-discuss] [PATCH] preserve newlines
From: Chris D. <ch...@ch...> - 2006-10-04 21:45:36
Attachments:
newline.patch
Here's another slightly controversial patch that I don't feel comfortable just committing without a nod from Adam: This patch changes PPI to preserve line endings instead of localizing them. The code passes all PPI and Perl::Critic tests. I implemented this change to enable a new Perl::Critic policy, CodeLayout::RequireConsistentNewlines, which is intended to help with code signing problems. There are still a few un-fixed methods that are not exercised by the test suite, like PPI::Token::Pod::merge(). These need to be fixed before this patch could be committed. Additionally, the PPI.pm docs need to be updated to reflect this change. In the process, I added two new API methods: PPI::Node::line_ending () and PPI::Token::HereDoc::Terminator_line(). This patch will collide a tiny bit with my "$_" patch, but it should be easy to repair (I'm happy to do the legwork on that). *** This implementation simply changes all "\n" checks to "\015{1,2}\012| \012|\015" checks. An alternative implementation would be for the tokenizer to look ahead to the first line ending, store that value, and use it throughout tokenization. That implementation would be more like what text editors do, but less like Perl (I think). Chris -- Chris Dolan, Software Developer, http://www.chrisdolan.net/ Public key: http://www.chrisdolan.net/public.key vCard: http://www.chrisdolan.net/ChrisDolan.vcf |
From: Adam K. <ad...@ph...> - 2006-10-05 01:16:02
I'm quite nervous about this one, if only because of the problem that nobody downstream can use \n either (which is partly why I localised the newlines in the first place).

The look-ahead to the first line ending isn't an option; it fails in too many cases.

I might be able to be convinced, though, but we need to find a way to mitigate the regex-soup effect.

Maybe if we provided \015{1,2}\012|\012|\015 as an export (as $XPNL or something) that might make life a little simpler.

Adam K
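Adam's exportable-pattern idea could be as small as one shared compiled regex. A sketch, assuming his placeholder name $XPNL and a hypothetical PPI::Util home package (PPI did not ship this module):

```perl
package PPI::Util;    # hypothetical package name, for illustration only
use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK = qw($XPNL);

# One cross-platform newline pattern, exported so downstream code can
# write /$XPNL/ instead of repeating the raw alternation everywhere.
our $XPNL = qr/\015{1,2}\012|\012|\015/;

package main;
PPI::Util->import(qw($XPNL));
our $XPNL;

# $XPNL matches any one of the three newline styles:
die unless "a\015\012b" =~ /\A a $XPNL b \z/x;
die unless "a\012b"     =~ /\A a $XPNL b \z/x;
die unless "a\015b"     =~ /\A a $XPNL b \z/x;
```

Because the export is a compiled qr// rather than a string, it interpolates safely into larger patterns without re-escaping.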
From: Chris D. <ch...@ch...> - 2006-10-05 04:16:50
Adam,

[this email got longer than expected... Skip to the last bullet for the most interesting action item, IMO]

I understand your nervousness. I hadn't thought of the downstream \n effect -- I thought it was just to keep the PPI code simpler. As for the regexp soup, I had thought of the same thing, but decided to email you before doing additional tweaks in case you rejected the concept.

I do think this change is important, however. It is the one place where PPI is lossy. As is, I have to worry about working on Windows perl code (with \r\n) on my Mac (where \n is the default), which I did not realize until today. While the \n localization is easier on the downstream *programmer*, it's harder on the downstream *user*, because the output of serialize() cannot be compared reliably to the original source.

In particular, the \n localization obscures the case of inconsistent newlines in the source code. Or, if I had code that HAD to have \r\n in the __DATA__ section (not sure why...) then PPI would break it on my Mac. It just feels wrong, I guess.

A few side notes:

* Perl::Critic only relies on the localized \n in one place, so that would be easy to fix.

* The look-ahead to the first line ending was only to be used in HereDoc repair in PPI::Document::serialize(). Personally, I think that attempt to repair is dubious anyway -- real documents should never have that problem, I think. So perhaps we could skip the line_ending() function and chop code from serialize().

* We could have the \n localization on by default and have a Tokenizer flag that disables it. That way, Perl::Critic could dig down to the real newlines, but other downstream modules could remain care-free. Perhaps that flag would happen in PPI::Document?

Chris

--
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703
vCard: http://www.chrisdolan.net/ChrisDolan.vcf

Clotho Advanced Media, Inc. - Creators of MediaLandscape Software (http://www.media-landscape.com/) and partners in the revolutionary Croquet project (http://www.opencroquet.org/)
From: Adam K. <ad...@ph...> - 2006-10-05 04:45:04
Chris Dolan wrote:
> * The look-ahead to the first line ending was only to be used in
> HereDoc repair in PPI::Document::serialize(). Personally, I think
> that attempt to repair is dubious anyway -- real documents should
> never have that problem, I think.

Unfortunately, I don't only work with real documents.

That's why exhaustive.t is in there, to throw line noise at you and kick you in the ass when you get sloppy :)

And we only run it in light mode; turned up to full it takes 6+ hours to run. At one point it was throwing up error cases that only triggered every 1.5+/-1 hours.

> So perhaps we could skip the line_ending() function and chop code
> from serialize().
>
> * We could have the \n localization on by default and have a
> Tokenizer flag that disables it. That way, Perl::Critic could dig
> down to the real newlines, but other downstream modules could
> remain care-free. Perhaps that flag would happen in PPI::Document?

It would be there almost certainly.

The issue wasn't so much making it easier on the programmer, as making it sane. If I didn't localise the newlines, it would become a major gotcha, and an enormous source of bugs.

I see a couple of solutions.

Firstly, I really want to keep things localised internally.

So pre-scanning the document text (which we do anyway for unicode checks, or at least we used to before the latin-1 improvements) to pick up 100% unix/win32/mac, storing that newline type in a top-level document accessor, and then writing back out to the same type, is probably ok.

That leaves us only with the case of mixed newlines. Personally, outside of binary files I am not aware of ANY cases in which mixed newlines in a text file are allowed, even in __DATA__ segments.

In THAT case, perhaps we either localise, or we flatten to the first newline in the file.

I'd be happy to implement that as a first step towards full native mixed newlines, as the functionality seems fairly containable.

It also matches what some of the better editors do... localise internally, but remember and save out as the same input type.

But I'm honestly not aware of ANYTHING that handles mixed newlines properly. I've done tons of unix/win32/mac cross-over work and I've seen just about every screwed-up case there is.

Even Dreamweaver, which inspired PPI in the first place, doesn't handle fucked-up broken newlines.

So we'd have to invent the solution. And I'm still not (yet) convinced that native mixed newlines is the answer... if only because how the hell do we guarantee round-trip safety for them? If we do it, it needs to be 100%.

Adam K
From: Chris D. <ch...@ch...> - 2006-10-05 05:24:55
On Oct 4, 2006, at 11:43 PM, Adam Kennedy wrote:

> Unfortunately, I don't only work with real documents.

Sorry, I was unclear. By "real documents" I meant anything that the Tokenizer can emit. I do not believe it is possible for the tokenizer to emit a PPI::Token::HereDoc where the _terminator_line is non-null but the line for the token lacks a newline. The only way that's possible is via generated content. The latter is what I meant by non-real.

> That's why exhaustive.t is in there, to throw line noise at you and
> kick you in the ass when you get sloppy :)
>
> And we only run it in light mode; turned up to full it takes 6+
> hours to run. At one point it was throwing up error cases that only
> triggered every 1.5+/-1 hours.

Egads!

>> * We could have the \n localization on by default and have a
>> Tokenizer flag that disables it. That way, Perl::Critic could
>> dig down to the real newlines, but other downstream modules could
>> remain care-free. Perhaps that flag would happen in PPI::Document?
>
> It would be there almost certainly.
>
> The issue wasn't so much making it easier on the programmer, as
> making it sane. If I didn't localise the newlines, it would become
> a major gotcha, and an enormous source of bugs.

Hmm, enormous? You are obviously much more familiar with the gotchas than I am, but my patch wasn't that hard to write and works well with the existing test suite. Perhaps my confidence is unwarranted?

> I see a couple of solutions.
>
> Firstly, I really want to keep things localised internally.
>
> So pre-scanning the document text (which we do anyway for unicode
> checks, or at least we used to before the latin-1 improvements) to
> pick up 100% unix/win32/mac, storing that newline type in a
> top-level document accessor, and then writing back out to the same
> type, is probably ok.

We'd have to change the code in add_element, remove, and replace to correct the newlines on entry for new tokens. That calls for a set_newline method on PPI::Element and PPI::Node (and PPI::Token::HereDoc).

> That leaves us only with the case of mixed newlines. Personally,
> outside of binary files I am not aware of ANY cases in which mixed
> newlines in a text file are allowed, even in __DATA__ segments.

Well, certainly my goal is to get rid of the mixed newlines! That's why I was writing a Perl::Critic policy against that. :-)

> In THAT case, perhaps we either localise, or we flatten to the
> first newline in the file.
>
> I'd be happy to implement that as a first step towards full native
> mixed newlines, as the functionality seems fairly containable.
>
> It also matches what some of the better editors do... localise
> internally, but remember and save out as the same input type.

Not Emacs. It picks one newline to work on and treats the others like binary. So, if you're in \n mode and there is a single \r\n in the file, you see a "^M" character at the end of that line.

> But I'm honestly not aware of ANYTHING that handles mixed newlines
> properly. I've done tons of unix/win32/mac cross-over work and I've
> seen just about every screwed-up case there is.
>
> Even Dreamweaver, which inspired PPI in the first place, doesn't
> handle fucked-up broken newlines.
>
> So we'd have to invent the solution.

You must be a step ahead of me or something. Doesn't "handle" simply mean serialization and de-serialization for PPI? Maybe I've spent too much time just in the Perl::Critic case?

> And I'm still not (yet) convinced that native mixed newlines is the
> answer... if only because how the hell do we guarantee round-trip
> safety for them? If we do it, it needs to be 100%.

I feel like I must be missing some crucial point. For the read-only case, isn't round-trip safety just ensuring that we spit out exactly what the Tokenizer took in? With the exception of the HereDoc stuff already mentioned, we've already achieved that with my patch, I think.

That leaves just the generated-code case to worry about. In that case, we decide on a dominant line ending, like Emacs does, and ensure that all added tokens inherit that line ending. If the generator *wants* to make mixed newlines, that's the only really hard case, and that can be worked around with set_newline.

Chris

--
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703
vCard: http://www.chrisdolan.net/ChrisDolan.vcf

Clotho Advanced Media, Inc. - Creators of MediaLandscape Software (http://www.media-landscape.com/) and partners in the revolutionary Croquet project (http://www.opencroquet.org/)
From: Adam K. <ad...@ph...> - 2006-10-05 05:41:54
Chris Dolan wrote:
> Sorry, I was unclear. By "real documents" I meant anything that the
> Tokenizer can emit. I do not believe it is possible for the
> tokenizer to emit a PPI::Token::HereDoc where the _terminator_line
> is non-null but the line for the token lacks a newline. The only
> way that's possible is via generated content. The latter is what I
> meant by non-real.

Hmm... possibly if the terminator is the last line in the program.

PPI::Document->new( \"<<FOO\ninside\nFOO" );

> Hmm, enormous? You are obviously much more familiar with the
> gotchas than I am, but my patch wasn't that hard to write and works
> well with the existing test suite. Perhaps my confidence is
> unwarranted?

Let's imagine that we do native newlines. What is the first thing that people are going to do when they do minor generation? Most likely, they'll end up using native newlines. What I don't want to happen is for everyone to make this mistake once (the gotcha) and then have to go and do extra things just to get back to normal behaviour.

> We'd have to change the code in add_element, remove, and replace to
> correct the newlines on entry for new tokens. That calls for a
> set_newline method on PPI::Element and PPI::Node (and
> PPI::Token::HereDoc).

Hmm... or we just do it automatically. When you add, scan upwards to the document, find the right thing, and ... yeah, I get what you are saying.

> Well, certainly my goal is to get rid of the mixed newlines! That's
> why I was writing a Perl::Critic policy against that. :-)

Well, you know it's possible to detect mixed newlines from the raw source, right?

Does critic have access to the original file/string?

> Not Emacs. It picks one newline to work on and treats the others
> like binary. So, if you're in \n mode and there is a single \r\n in
> the file, you see a "^M" character at the end of that line.

Well, I mean for the non-mixed case, they deal with it internally.

Emacs et al doing binary items inside a text file is one example of what I mean by handling it wrong.

> You must be a step ahead of me or something. Doesn't "handle"
> simply mean serialization and de-serialization for PPI? Maybe I've
> spent too much time just in the Perl::Critic case?

There's two levels here.

One is the parser/serializer, the second is the API layer. If we provide add_before, for example, and it's mixed, what do we do?

Say it's mixed unix/mac running on Win32, do we insert with the Win32 newlines and now have a 3-platform mix?

We need to define the "correct" behaviour for working with mixed newlines... and I don't know that anyone else has done this yet.

> I feel like I must be missing some crucial point. For the read-only
> case, isn't round-trip safety just ensuring that we spit out
> exactly what the Tokenizer took in? With the exception of the
> HereDoc stuff already mentioned, we've already achieved that with
> my patch, I think.
>
> That leaves just the generated-code case to worry about. In that
> case, we decide on a dominant line ending, like Emacs does, and
> ensure that all added tokens inherit that line ending. If the
> generator *wants* to make mixed newlines, that's the only really
> hard case, and that can be worked around with set_newline.

I don't think anyone ever really WANTs to make mixed newlines. As I said, I don't know of any legitimate real-world cases of this.

Adam K
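The insert-time behaviour being debated here can be sketched in a few lines. Note that set_newline() and this helper are part of the proposal, not existing PPI API; the function name is purely illustrative:

```perl
use strict;
use warnings;

my $NEWLINE = qr/\015{1,2}\012|\012|\015/;

# Hypothetical behaviour for add_element/add_before under the proposed
# scheme: rewrite every line ending in the new content to the
# document's stored newline before it enters the tree, so a clean
# document stays clean no matter what platform generated the fragment.
sub normalize_newlines {
    my ( $content, $doc_newline ) = @_;
    $content =~ s/$NEWLINE/$doc_newline/g;
    return $content;
}

# A Win32-flavoured fragment inserted into a Unix-newline document:
my $fixed = normalize_newlines( "foo();\015\012bar();\015\012", "\012" );
die unless $fixed eq "foo();\012bar();\012";
```

This is the "convert to the document default automatically" option: the generator never sees the conversion, which avoids the gotcha Adam describes, at the cost of needing an explicit override for anyone who truly wants mixed output.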
From: Chris D. <ch...@ch...> - 2006-10-05 06:06:09
On Oct 5, 2006, at 12:40 AM, Adam Kennedy wrote:

> Well, you know it's possible to detect mixed newlines from the raw
> source, right?
>
> Does critic have access to the original file/string?

In theory almost always yes, in practice no. We accept PPI::Document instances, so that's one case where I can't ever get the raw source. We also accept STDIN, so I'd have to explicitly save the source. In the most common case, we build a PPI::Document::File. I *could* of course use the filename() to get back to the source.

The main reason I can't use raw source is that we have pseudo-pragmas that turn Perl::Critic on/off over lexical scopes. In the bizarre case of someone who wanted to have mixed newlines and used a "## no critic" to make Perl::Critic ignore it, I couldn't catch that. But maybe it's a lost cause at that point? Nonetheless, I find this discussion interesting beyond Perl::Critic's immediate problem.

> Well, I mean for the non-mixed case, they deal with it internally.
>
> Emacs et al doing binary items inside a text file is one example of
> what I mean by handling it wrong.

Hmm... Maybe I've just been using Emacs so long that its behavior feels intuitive to me? :-)

> There's two levels here.
>
> One is the parser/serializer, the second is the API layer. If we
> provide add_before, for example, and it's mixed, what do we do?
>
> Say it's mixed unix/mac running on Win32, do we insert with the
> Win32 newlines and now have a 3-platform mix?
>
> We need to define the "correct" behaviour for working with mixed
> newlines... and I don't know that anyone else has done this yet.

I think the correct behavior on insert is to convert to the document newline by default, and possibly offer a way to preserve the mixed newlines if the generator requests it. The mechanism for that "request" can be a workaround (add_before() then set_newline()), or a flag to add_before(), or a sticky_newline flag on the node to be inserted. But as you say...

> I don't think anyone ever really WANTs to make mixed newlines. As I
> said, I don't know of any legitimate real-world cases of this.

... so probably the generator case is just to adopt the newline of the surrounding document. There are plenty of real-world cases of mixed newlines, just none of them intentional!

Chris

--
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703
vCard: http://www.chrisdolan.net/ChrisDolan.vcf

Clotho Advanced Media, Inc. - Creators of MediaLandscape Software (http://www.media-landscape.com/) and partners in the revolutionary Croquet project (http://www.opencroquet.org/)
From: Adam K. <ad...@ph...> - 2006-10-05 06:16:10
Chris Dolan wrote:
>> Well, you know it's possible to detect mixed newlines from the raw
>> source, right?
>>
>> Does critic have access to the original file/string?
>
> In theory almost always yes, in practice no. We accept
> PPI::Document instances, so that's one case where I can't ever get
> the raw source. We also accept STDIN, so I'd have to explicitly
> save the source. In the most common case, we build a
> PPI::Document::File. I *could* of course use the filename() to get
> back to the source.

Oh right, I forgot about the STDIN stuff... so yeah, practically speaking you don't have access. Suck.

> Hmm... Maybe I've just been using Emacs so long that its behavior
> feels intuitive to me? :-)

Probably :)

I mean, it's a newline, and Perl doesn't have ANY binary (ignoring unicode) content. The fact it isn't a newline for the current platform (at least with respect to Perl) should be irrelevant.

> I think the correct behavior on insert is to convert to the
> document newline by default, and possibly offer a way to preserve
> the mixed newlines if the generator requests it. The mechanism for
> that "request" can be a workaround (add_before() then
> set_newline()), or a flag to add_before(), or a sticky_newline flag
> on the node to be inserted.

I'm happy to go with "convert to the document default automatically" and ignore any customization capability until someone actually asks for it :)

> There are plenty of real-world cases of mixed newlines, just none
> of them intentional!

Right :)

So I guess to start with, we need to take the input string and detect:

1. All Unix
2. All Win32
3. All Mac
4. Mixed Unix-dominant
5. Mixed Win32-dominant
6. Mixed Mac-dominant

:/

So... $doc->{newlines} = 'unix|win32|mac' and $doc->{newlines_mixed} = 1|0

What do you think?

Adam K
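The six-way classification above reduces to a frequency count over the three newline styles. A sketch of such a pre-scan; the 'unix'/'win32'/'mac' labels and the two document fields come from the message, while the function name and the frequency tie-breaking are assumptions:

```perl
use strict;
use warnings;

# Classify a document's newlines: return the dominant style plus a
# mixed flag, matching the proposed $doc->{newlines} /
# $doc->{newlines_mixed} pair. CRLF must be tried before its parts,
# or every Win32 newline would be miscounted as mac + unix.
sub scan_newlines {
    my ($source) = @_;
    my %style = ( "\015\012" => 'win32', "\012" => 'unix', "\015" => 'mac' );
    my %count;
    $count{ $style{$1} }++ while $source =~ /(\015\012|\012|\015)/g;
    my @seen = sort { $count{$b} <=> $count{$a} } keys %count;
    return ( undef, 0 ) unless @seen;          # no newlines at all
    return ( $seen[0], @seen > 1 ? 1 : 0 );    # (dominant, mixed?)
}

# Two Unix newlines plus one Win32 newline: Mixed Unix-dominant.
my ( $type, $mixed ) = scan_newlines("a\012b\012c\015\012");
die unless $type eq 'unix' and $mixed;
```

A clean document comes back with the mixed flag off, so serialization can write everything back out in the single detected style; the mixed case is where the "localise or flatten to the first newline" decision would kick in.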
From: Chris D. <ch...@ch...> - 2006-10-05 07:12:27
|
On Oct 5, 2006, at 1:14 AM, Adam Kennedy wrote: > So I guess to start with, we need to take the input string, and detect > > 1. All Unix > 2. All Win32 > 3. All Mac > 4. Mixed Unix-dominant > 5. Mixed Win32-dominant > 6. Mixed Mac-dominant > > :/ > > So... $doc->{newlines} = 'unix|win32|mac' and $doc->{newlines_mixed} = 1|0 > > What do you think? Sounds good. Instead of "unix/win/mac" I'd prefer "NL/CRNL/CR" as the identifiers to be platform agnostic. Chris -- Chris Dolan, Software Developer, Clotho Advanced Media Inc. 608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703 vCard: http://www.chrisdolan.net/ChrisDolan.vcf Clotho Advanced Media, Inc. - Creators of MediaLandscape Software (http://www.media-landscape.com/) and partners in the revolutionary Croquet project (http://www.opencroquet.org/) |
From: Adam K. <ad...@ph...> - 2006-10-05 03:28:39
|
Also, remembering back to my original decision, there were also some other factors. For example, parsing is one thing, but what happens when you start generating content? What newline do you use then? We have to remember the newline scheme (if clean) or somehow make a decision on what to use if messy. We can't actually insert using \n any more, we need to use some per-document value... $doc->newline or something. It makes a lot of things messier than they should be. A lot of that mess isn't covered in the unit tests, because I solved the whole problem by localising newlines, so the tests didn't matter. Adam K Adam Kennedy wrote: > I'm quite nervous about this one, if only because of the problem that > nobody downstream can use \n either (which is partly why I localised the > newlines in the first place). > > The look ahead to first end-line isn't an option, it fails in too many > cases. > > I might be able to be convinced though, but we need to find a way to > mitigate the regex soup effect. > > Maybe if we provided \015{1,2}\012|\012|\015 as an export (as $XPNL or > something) that might make life a little simpler. > > Adam K > > Chris Dolan wrote: >> Here's another slightly controversial patch that I don't feel >> comfortable just committing without a nod from Adam: >> >> This patch changes PPI to preserve line endings instead of localizing >> them. The code passes all PPI and Perl::Critic tests. I implemented >> this change to enable a new Perl::Critic policy, >> CodeLayout::RequireConsistentNewlines, which is intended to help with >> code signing problems. >> >> There are still a few un-fixed methods that are not exercised by the >> test suite, like PPI::Token::Pod::merge(). These need to be fixed >> before this patch could be committed. Additionally, the PPI.pm docs >> need to be updated to reflect this change. >> >> In the process, I added two new API methods: PPI::Node::line_ending() >> and PPI::Token::HereDoc::Terminator_line(). 
>> >> This patch will collide a tiny bit with my "$_" patch, but it should be >> easy to repair (I'm happy to do the legwork on that). >> >> *** >> >> This implementation simply changes all "\n" checks to >> "\015{1,2}\012|\012|\015" checks. An alternative implementation would >> be for the tokenizer to look ahead to the first line ending, store that >> value, and use it throughout tokenization. That implementation would be >> more like what text editors do, but less like Perl (I think). >> >> Chris >> >> -- >> Chris Dolan, Software Developer, http://www.chrisdolan.net/ >> Public key: http://www.chrisdolan.net/public.key >> vCard: http://www.chrisdolan.net/ChrisDolan.vcf >> >> >> >> >> ------------------------------------------------------------------------ >> >> ------------------------------------------------------------------------- >> Take Surveys. Earn Cash. Influence the Future of IT >> Join SourceForge.net's Techsay panel and you'll get the chance to share your >> opinions on IT & business topics through brief surveys -- and earn cash >> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Parseperl-discuss mailing list >> Par...@li... >> https://lists.sourceforge.net/lists/listinfo/parseperl-discuss |
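The $XPNL export Adam floats in the quoted reply could look roughly like this. Only the pattern itself comes from the thread; PPI::Util is an assumed package name for the sketch, and real PPI may organize its exports differently.

```perl
package PPI::Util;  # assumed module name for this sketch
use strict;
use warnings;
use Exporter 'import';

our @EXPORT_OK = qw($XPNL);

# Cross-platform newline: CRLF (or the odd CRCRLF), lone LF, or lone CR.
# Listing CRLF first keeps it from matching as two separate endings.
our $XPNL = qr/\015{1,2}\012|\012|\015/;

package main;

# Downstream code would normally import $XPNL; fully qualified here since
# both packages live in one file for the example.
my $code  = "my \$x = 1;\015\012print \$x;\012";
my @lines = split /$PPI::Util::XPNL/, $code;
print scalar(@lines), " lines\n"; # 2 lines
```

Keeping the pattern as a compiled qr// means callers interpolate one variable instead of retyping the regex soup, which is exactly the mitigation Adam asks for.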
From: Chris D. <ch...@ch...> - 2006-10-05 04:30:12
|
Yes, I understand the point of generated content. I thought about that. It occurred to me that the tokenizer could watch all (well, most) of the line endings and store the newline that has the plurality as the document newline. Then, Node.pm (or Normal.pm?) could have a set_newline method that traverses the tokens and changes the newline. Generated code could use that value too. As for the unit tests, I checked all the files in PPI, and none have \r anywhere except inc/Test/ClassAPI.pm. I've written a bunch of mixed "\r\n","\n" regression tests for Perl::Critic that could be repurposed for PPI. Chris On Oct 4, 2006, at 10:26 PM, Adam Kennedy wrote: > Also, remembering back to my original decision, there were also > some other factors. > > For example, parsing is one thing, but what happens when you start > generating content? What newline do you use then? > > We have to remember the newline scheme (if clean) or somehow make a > decision on what to use if messy. > > We can't actually insert using \n any more, we need to use some per-document value... $doc->newline or something. > > It makes a lot of things messier than they should be. > > A lot of that mess isn't covered in the unit tests, because I > solved the whole problem by localising newlines, so the tests > didn't matter. > > Adam K > > Adam Kennedy wrote: >> I'm quite nervous about this one, if only because of the problem >> that nobody downstream can use \n either (which is partly why I >> localised the newlines in the first place). >> The look ahead to first end-line isn't an option, it fails in too >> many cases. >> I might be able to be convinced though, but we need to find a way to >> mitigate the regex soup effect. >> Maybe if we provided \015{1,2}\012|\012|\015 as an export (as >> $XPNL or something) that might make life a little simpler. 
>> Adam K >> Chris Dolan wrote: >>> Here's another slightly controversial patch that I don't feel >>> comfortable just committing without a nod from Adam: >>> >>> This patch changes PPI to preserve line endings instead of >>> localizing them. The code passes all PPI and Perl::Critic >>> tests. I implemented this change to enable a new Perl::Critic >>> policy, CodeLayout::RequireConsistentNewlines, which is intended >>> to help with code signing problems. >>> >>> There are still a few un-fixed methods that are not exercised by >>> the test suite, like PPI::Token::Pod::merge(). These need to be >>> fixed before this patch could be committed. Additionally, the >>> PPI.pm docs need to be updated to reflect this change. >>> >>> In the process, I added two new API methods: >>> PPI::Node::line_ending() and PPI::Token::HereDoc::Terminator_line(). >>> >>> This patch will collide a tiny bit with my "$_" patch, but it >>> should be easy to repair (I'm happy to do the legwork on that). >>> >>> *** >>> >>> This implementation simply changes all "\n" checks to >>> "\015{1,2}\012|\012|\015" checks. An alternative implementation would be >>> for the tokenizer to look ahead to the first line ending, store >>> that value, and use it throughout tokenization. That >>> implementation would be more like what text editors do, but less >>> like Perl (I think). >>> >>> Chris >>> >>> -- >>> Chris Dolan, Software Developer, http://www.chrisdolan.net/ >>> Public key: http://www.chrisdolan.net/public.key >>> vCard: http://www.chrisdolan.net/ChrisDolan.vcf > -- Chris Dolan, Software Developer, Clotho Advanced Media Inc. 608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703 vCard: http://www.chrisdolan.net/ChrisDolan.vcf Clotho Advanced Media, Inc. - Creators of MediaLandscape Software (http://www.media-landscape.com/) and partners in the revolutionary Croquet project (http://www.opencroquet.org/) |
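Chris's set_newline idea in the message above might traverse tokens like the sketch below. Real PPI tokens are objects with content accessors; here they are plain strings in an array ref, so this is illustrative only and set_newline is an invented name from the discussion, not an existing PPI method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The cross-platform newline pattern discussed in the thread.
my $XPNL = qr/\015{1,2}\012|\012|\015/;

# Rewrite every line ending in every token to one chosen style.
# $tokens is an array ref of strings standing in for PPI tokens.
sub set_newline {
    my ($tokens, $newline) = @_;
    s/$XPNL/$newline/g for @$tokens;
    return $tokens;
}

my @tokens = ("print 'hi';\015\012", "\015", "print 'bye';\012");
set_newline(\@tokens, "\012");
my $joined = join '', @tokens;
my $crs    = () = $joined =~ /\015/g;
print "$crs CRs remain\n"; # 0 CRs remain
```

Because the conversion is a single substitution per token, normalizing a whole document stays linear in its size, and generated code can simply ask the document for its stored newline before inserting.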