From: Ingy d. N. <in...@in...> - 2011-10-28 19:47:23
|
Hi Gang,

I wanted to tell you about a bold initiative that I started this week. I call it YAML2.

The idea is that it is time to start up a new wave of YAML development, while not disturbing the YAML toolchains that are in common use. YAML is fairly usable and has no big fires to put out. Development of the language has been fairly dormant. The Open Source world is now so much bigger and richer in tools and social systems. The YAML world we set out to create 10 years ago has barely reached the vision we dreamed of. Today, YAML is the language of choice for simple config files, and dumping objects to readable text. We thought it would be so much more. Serial processing of infinite streams, realtime object messaging between multiple languages, YPath, YSchema (and the rest of the stuff we could do better than XML) come to mind. I'd like to see books on a YAML world that is deserving of books.

My idea is to start a new community driven process with all these big goals in mind, without disturbing the YAML user base in the process. When we started out 10 years ago we had 3 people on a mailing list trying to agree on a spec, and finally producing a tome that is very hard for mortals to read, let alone implement. We ended up with a handful of languages implementing things with very different APIs, and bugs from differing interpretations of edge cases. On the other hand, YAML is a real success! We got something huge off the ground. We have a well defined language, some decent implementations, a ton of experience and a great community. It's a perfect foundation for Round 2.

So what is YAML2 about?

* It's about producing a YAML 2.0 that is mostly a *simplification* of YAML 1.2.
* It's about NOT doing this "Spec First".
* It's about having a common test suite that defines the language.
* It's about having matching-API, full-stack implementations in all the languages that support JSON.
* It's about having a common YAML grammar that these implementations work from.
* It's about starting YPath, YSchema and YTransform implementations from the beginning.
* It's about doing everything on GitHub and Wiki so that everyone can drive the process, and so that nobody bottlenecks it.
* It's about writing a community book on YAML from the start.

Most importantly it's about us all doing this together, and having the results (when they are stable) replace the current YAML stuffs. I decided to get this started because I feel strongly that it's the right time, and I've never been afraid to JFDI. The idea came upon me a couple days ago, spurred by recent activity on YPath, an offline spec email, and recent completions of cross-language parsing and testing frameworks I've been working on (Pegex and TestML).

I put together some YAML2 basics, but didn't want to go any further without inviting everyone in the community to join in. I created a "yaml2" GitHub organization, and started forking YAML related repos into it. Everyone is welcome to join in. I started a #yaml2 irc channel on freenode, and got yaml2.org just in case we need it later. Most importantly I started a GitHub wiki on a repository called YAML2. This wiki is itself a git repo so you can clone, edit and push to it.

https://github.com/yaml2/YAML2/wiki

There's much more I have to say but I'd rather not do it on the mailing list. That's so YAML1 ;) So I'll put that all on the wiki. I think it's fine to discuss the meta idea of YAML2 here, but let's keep the specifics in wiki, tests and code.

Cheers, Ingy |
From: Oren Ben-K. <or...@be...> - 2011-10-28 21:16:44
|
Looks all good.

What parts of the current YAML spec are viewed as especially irksome (I'm talking about the content and not its presentation :-)? We already stripped away some troublesome "least common denominator" types and the Unicode line break handling. What's next? I think folded scalars were mentioned at some point (I'm ambivalent about that). Tags? Nobody really uses them, and we could clean up the stream definition and some nasty syntax edge cases if we re-worked them...

I added GoodParts and BadParts Wiki pages for people to add their un/favorites into...

Have fun,

Oren Ben-Kiki |
From: William S. <sp...@rh...> - 2011-10-28 23:02:39
|
PLEASE!!! This is the main reason we cannot use unaltered YAML:

SUPPORT FOR INVALID UTF-8 AND UTF-16

There needs to be a way to put an arbitrary byte sequence into a scalar without losing the ability to make valid byte sequences human-readable.

Currently YAML is limited to only putting byte sequences that are valid UTF-8 into scalars, unless some transformation is done that makes some (often all) Unicode unreadable in the YAML input. This has the counter-productive effect of *discouraging* use of Unicode on any backend that uses bytes where there is no guarantee that the backend limits the byte sequences to valid UTF-8. Examples are all byte-based file formats, most internet protocols, Unix filenames, and Windows resource identifiers.

My recommendation follows. However, any solution that allows an arbitrary byte stream to be produced, while allowing valid UTF-8 bytes to be represented by the correct Unicode character in the YAML source, would be acceptable:

1. The backslash escape of \xNN represents a "raw UTF-8 byte" with the given value. This is only different from current YAML for 0x80-0xFF. The sequence \u00NN must be used for actual Unicode code points in this range.

2. An API that requests YAML scalars as UTF-8 gets these bytes, inserted between the UTF-8 encoding of all other characters, as raw data.

3. An API that requests scalars in some other form, such as UTF-16, gets these bytes as unchanged code units. This makes \xNN work identically to current YAML/JSON when the UTF-16 API is used. It may also allow invalid forms of other encodings to be supported.

In addition, invalid UTF-16 must also be supported. Support of invalid UTF-16 is more common, due to its use on Windows, where even otherwise ignorant programmers have realized that they cannot work without supporting it. Technically the YAML spec does not allow invalid UTF-16, but my proposal here formalizes the actual support that is in most (all?) YAML and JSON implementations:

1. The backslash escape of \uNNNN for NNNN in the range 0xD800..0xDFFF represents a "raw UTF-16 code unit".

2. An API that requests UTF-16 or other 16-bit code units will get these codes unchanged.

3. An API that requests bytes will get three bytes for each of these; the three bytes match the encoding you get from UTF-8 if you extend it to these invalid code points. |
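To make the byte-oriented half of the proposal above concrete, here is a minimal Python sketch of how a loader might turn escaped scalar text into bytes. The function name `scalar_to_bytes` is hypothetical and not part of any YAML library; the sketch handles only `\xNN` and `\uNNNN` (it ignores `\\` and the other YAML escapes), and it assumes Python's `surrogatepass` error handler is an acceptable stand-in for "UTF-8 extended to surrogate code points".

```python
import re

# Matches \xNN (group 1) or \uNNNN (group 2); other escapes are ignored here.
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})|\\u([0-9A-Fa-f]{4})")


def scalar_to_bytes(text: str) -> bytes:
    """Byte view of a scalar under the proposed rules (illustrative only)."""
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        # Ordinary characters between escapes are encoded as normal UTF-8.
        out += text[pos:match.start()].encode("utf-8")
        if match.group(1) is not None:
            # \xNN: emitted as a single raw byte, even for 0x80-0xFF.
            out.append(int(match.group(1), 16))
        else:
            # \uNNNN: a code point; lone surrogates become three bytes.
            code_point = chr(int(match.group(2), 16))
            out += code_point.encode("utf-8", "surrogatepass")
        pos = match.end()
    out += text[pos:].encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    print(scalar_to_bytes(r"caf\u00e9\x80"))
    print(scalar_to_bytes(r"\ud800"))
```

Under these assumptions `\u00e9` yields the usual two-byte UTF-8 encoding while `\x80` yields one raw byte, which is the distinction the proposal relies on.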
From: Ingy d. N. <in...@in...> - 2011-10-29 02:43:28
|
William, I put the content of this email here: https://github.com/yaml2/YAML2/wiki/Unicode-Strictness

Oren, I started listing some bad parts. I am making a page for each bad point so that they can each have their own long discussion. |
From: Devin J. <jea...@gm...> - 2011-10-29 02:54:50
|
> There needs to be a way to put an arbitrary byte sequence into a scalar
> without losing the ability to make valid byte sequences human-readable.

Mixing byte data into what is ostensibly Unicode seems like a bad idea. Either have a particular part of the document be Unicode, or bytes, not both.

That said, if you mix Unicode and bytes in the same file, it ceases to be exactly readable in a standard text editor. So I don't like that either. Maybe two separate types of YAML file (text / binary/compressed)? More than one protocol has had a "binary" version made of it.

Devin |
From: Ingy d. N. <in...@in...> - 2011-10-29 05:03:59
|
Hi Devin,

I added your comments to https://github.com/yaml2/YAML2/wiki/Unicode-Strictness

This topic seems like it would be more poignant if people actually wrote some test cases about what they were talking about. It's so easy to be misunderstood on this type of issue. But with actual test files, it's much less so.

The best way to do this would be to simply create a repo on github and push some files up. Even if it's not completely running code, it would be helpful. Of course, running code in something like pyyaml would be even better.

I really don't want to have YAML2 discussions, without actual tests to make people's points. William, it would be great if you could post some files that elegantly show off your concerns. Otherwise it just feels like conjecture.

Ingy |
From: Peter M. <pet...@gm...> - 2011-10-29 05:53:33
|
William,

There's already a binary tag in YAML, which allows arbitrary binary data to be encoded as base64.

http://yaml.org/type/binary.html

Is there any reason why that's not good enough for your needs?

Best regards,
Peter

-- Email: pet...@gm... WWW: http://www.pkmurphy.com.au/ |
From: William S. <sp...@rh...> - 2011-10-31 18:14:00
|
That is no good because it defeats the "if the string is valid Unicode it is readable" requirement.

If you binary-encode all the filenames in your YAML file, great, you can name anything, but I thought the purpose of YAML was to be *READABLE*. |
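The readability objection can be illustrated with a short Python sketch using only the standard `base64` module. The helper names and the sample filename are made up for illustration; the `!!binary` prefix is simply prepended as text and no YAML library is involved.

```python
import base64


def as_binary_scalar(raw: bytes) -> str:
    """Render raw bytes the way a !!binary scalar would carry them."""
    return "!!binary " + base64.b64encode(raw).decode("ascii")


def from_binary_scalar(scalar: str) -> bytes:
    """Recover the raw bytes from the text produced above."""
    return base64.b64decode(scalar.split(" ", 1)[1])


if __name__ == "__main__":
    good = "résumé-2011.txt".encode("utf-8")   # valid UTF-8 filename
    bad = good + b"\x80"                       # same name plus one stray byte
    for raw in (good, bad):
        print("filename:", as_binary_scalar(raw))
```

Both names round-trip exactly, but neither is recognizable in the YAML text, including the one that was perfectly valid Unicode to begin with.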
From: William S. <sp...@rh...> - 2011-10-31 18:31:46
|
Ingy dot Net wrote:
> This topic seems like it would be more poignant if people actually wrote
> some test cases about what they were talking about. It's so easy to be
> misunderstood on this type of issue. But with actual test files, it's
> much less so.

It is impossible to make a stand-alone YAML test case, as it is currently impossible to write invalid UTF-8 byte sequences into a scalar.

An example program, on Linux, would do this:

1. Set your locale to a UTF-8 one, as is default on all modern systems.

2. Create GoodFile with non-ASCII UTF-8 characters in the filename.

3. Using low-level code, create BadFile with invalid UTF-8 in the filename (just add a single 0x80 byte to the end of GoodFile's name; it is even better if there is valid UTF-8 in the name as well). TCL is a good way to do this, or you can write C code.

4. Now imagine a YAML file that has a structure that has a "filename" member. Figure out a way to write a YAML description so that both of these files can be named, GoodFile uses the *CORRECT* Unicode characters in the YAML file (i.e. do NOT "double encode" as about 100 people have suggested), and BadFile at least displays the ASCII characters the same, so that an uneducated user can make a good guess as to which file shown in another list is named in the YAML file.

5. In case you think "oh the non-UTF-8 files are ILLEGAL and therefore I will pretend that magic makes them suddenly non-existent when I switch to UTF-8", let's imagine the YAML source is the input to this "magic" program and its job is to rename bad filenames to correct UTF-8. You must provide this program with a list of the files to fix. As this program does other useful things with valid UTF-8 (perhaps it also changes normalization forms), the input must also be readable when the files are valid UTF-8. |
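Steps 2 and 3 above can be sketched in Python rather than TCL or C. This is only an illustration: the filenames and helper names are invented, and it assumes a Linux filesystem that treats names as opaque bytes (other platforms may reject the second name). The `display` helper shows one possible rendering in which valid characters stay readable and the stray byte appears as `\x80`.

```python
import os
import tempfile


def make_test_files(directory):
    """Create GoodFile and BadFile in `directory`; return their raw byte names."""
    dir_bytes = os.fsencode(directory)
    good = "Gööd-Fïle".encode("utf-8")   # step 2: valid non-ASCII UTF-8 name
    bad = good + b"\x80"                 # step 3: append one invalid byte
    for name in (good, bad):
        # Using bytes paths bypasses any locale-based name encoding.
        with open(os.path.join(dir_bytes, name), "wb"):
            pass
    return sorted(os.listdir(dir_bytes))


def display(name: bytes) -> str:
    """Readable rendering: valid UTF-8 as characters, bad bytes as \\xNN."""
    return name.decode("utf-8", "backslashreplace")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for raw in make_test_files(tmp):
            print(display(raw))
```

The two names differ by exactly one byte, so any YAML representation that can name both while keeping the first one readable would satisfy step 4.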
From: Oren Ben-K. <or...@be...> - 2011-10-31 18:54:35
|
Ok, that's a fair use case. Please add it to the Wiki.

It is pretty horrible that the OS allows such shenanigans, but I guess that's a price we have to pay for building OS-es before Unicode existed.

Nothing prevents you from adding a \x80 at the end of the file name in the YAML file, using VI or Notepad or whatever. I bet that some YAML libraries will even load it into a non-valid UTF-8 "string" in memory, "illegal" though it may be.

But does this mean we need to mandate that all YAML implementations silently create invalid UTF-8 strings in memory? That seems a bit excessive...

If I had to deal with this use case, I'd use something like:

    filename: !badstr BadName\x80

Which is perfectly valid YAML. The application is fairly warned the scalar is an invalid string and is free to deal with it as it sees fit - up to and including loading it into a normal string object, if that works for it. At the same time, "innocent" YAML applications are not exposed to random exceptions raised by their too-strict string libraries.

Have fun,

Oren Ben-Kiki

On Mon, Oct 31, 2011 at 8:31 PM, William Spitzak <sp...@rh...> wrote:
> 1. Set your locale to a UTF-8 one, as is default on all modern systems
>
> 2. Create GoodFile with non-ASCII UTF-8 characters in the filename
>
> 3. Using low-level code, create BadFile with invalid UTF-8 in the
> filename...
>
> 4. Now imagine a YAML file that has a structure that has a "filename"
> member... |
From: William S. <sp...@rh...> - 2011-10-31 20:24:50
|
Oren Ben-Kiki wrote:
> Ok, that's a fair use case. Please add it to the Wiki.
>
> It is pretty horrible that the OS allows such shenanigans, but I guess
> that's a price we have to pay for building OS-es before Unicode existed.

I want to take SERIOUS objection to your idea that not enforcing Unicode standards in the operating system kernel is "shenanigans". Windows allows invalid UTF-16 in its filenames, yet you don't hear people saying this is somehow "not supporting Unicode". This is a DISGUSTING double standard and I consider it an insult to many people, including the recently-deceased Ritchie.

The FIRST operating system to correctly handle Unicode was Plan9, more than a quarter century ago, and it explicitly uses bytes in all the APIs, despite the fact that back then it was strongly believed that there would never be more than 2^16 Unicode code points. This is because it was written by people who were uninterested in scoring politically-correct points, and realized that even if ASCII had some "advantage" by being shorter, this actually was a benefit to all users of all languages by getting rid of "encodings" immediately. We still have "encodings" in Windows and Unix decades later, showing how correct they were.

> Nothing prevents you from adding a \x80 at the end of the file name in
> the YAML file, using VI or Notepad or whatever. I bet that some YAML
> libraries will even load it into a non-valid UTF-8 "string" in memory,
> "illegal" though it may be.

No, the YAML 1 parser will expand "\x80" to the equivalent of "\xC2\x80" in my proposed YAML 2, which is not distinguishable from other sequences that are allowed in valid UTF-8. Therefore this will not work.

> But does this mean we need to mandate that all YAML implementations
> silently create invalid UTF-8 strings in memory? That seems a bit
> excessive...

To work exactly as I defined, you must use an internal format that can store all valid Unicode code points, plus 128 invalid UTF-8 bytes and 2048 invalid UTF-16 words. For instance, UTF-32 where the codes for the surrogate halves are used for invalid UTF-16, and 128 codes greater than 0x10FFFF are used for invalid UTF-8.

You CANNOT use UTF-16 internally, because there is a back-compatibility requirement that two \uNNNN in a row that happen to form a valid high+low surrogate pair must turn into six UTF-8 bytes rather than four. This rules out using UTF-16 as an intermediate form, as there are not enough different code units to store this information and distinguish it from valid non-BMP characters.

The same argument may seem to rule out using UTF-8 internally, but I think it can be acceptable that the result of several \xNN that happen to be arranged as a valid UTF-8 character encoding is *undefined*. An implementation can return either a single Unicode character for this or N code units when UTF-16 is asked for. We could also require that the result be a single Unicode character, although that may *force* backends to use UTF-8 in any practical implementation, as you were worried about.

> If I had to deal with this use case, I'd use something like:
>
>     filename: !badstr BadName\x80

This requires use of something other than backslash, as the YAML parser interprets the backslash in a lossy way, so this data cannot be recovered (you may think this is false, but you are thinking about "double encoding", which restricts the strings to ISO-8859-1). '%' is often suggested, but then a real '%' has to be quoted, and this in fact makes the YAML definition much more complicated. Also, this will force a huge number of other (though potentially desirable) changes to YAML syntax to unify the URL and scalar rules.

My other complaint about any such suggestion is that any program outputting YAML will probably just put "!badstr" in front of EVERY instance of the strings, rather than doing a test, thus defeating your proposed purpose. The test will still be to see if the % sequence is in the string, which is no different from looking to see if a \x is in the string.
|
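The internal format William describes (every valid code point, the 2048 surrogate codes for invalid UTF-16 words, and 128 extra codes above 0x10FFFF for invalid UTF-8 bytes) can be sketched as a list of integers. This is only one possible reading of the proposal: the offset `RAW_BYTE_BASE`, the function names, and the use of a Python list are assumptions made for illustration.

```python
from typing import List

RAW_BYTE_BASE = 0x110000  # assumed: first code above the Unicode range
RAW_BYTE_COUNT = 128      # one code per invalid byte 0x80..0xFF


def raw_byte(value: int) -> int:
    """Internal code for a raw byte 0x80..0xFF (a \\xNN escape in the proposal)."""
    if not 0x80 <= value <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF need a raw-byte code")
    return RAW_BYTE_BASE + (value - 0x80)


def is_raw_byte(code: int) -> bool:
    return RAW_BYTE_BASE <= code < RAW_BYTE_BASE + RAW_BYTE_COUNT


def to_utf8(codes: List[int]) -> bytes:
    """Byte-oriented API: raw bytes pass through unchanged; surrogate codes
    get the 3-byte form UTF-8 would produce if it were extended to them."""
    out = bytearray()
    for code in codes:
        if is_raw_byte(code):
            out.append(code - RAW_BYTE_BASE + 0x80)
        else:
            # "surrogatepass" encodes U+D800..U+DFFF as three bytes each.
            out += chr(code).encode("utf-8", errors="surrogatepass")
    return bytes(out)


def to_utf16_units(codes: List[int]) -> List[int]:
    """16-bit API: raw bytes and surrogate codes come out as unchanged units."""
    units: List[int] = []
    for code in codes:
        if is_raw_byte(code):
            units.append(code - RAW_BYTE_BASE + 0x80)
        elif code > 0xFFFF:
            # Valid non-BMP character: split into a real surrogate pair.
            offset = code - 0x10000
            units += [0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)]
        else:
            units.append(code)
    return units
```

In this representation two escaped surrogates stored as separate codes, such as `[0xD83D, 0xDE00]`, produce six bytes from `to_utf8`, while the single code `0x1F600` produces four. That is the distinction the back-compatibility argument says a UTF-16 intermediate form cannot keep.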
From: Oren Ben-K. <or...@be...> - 2011-10-31 20:52:16
|
The YAML parser will only expand \x80 if it was inside double quotes. My example did not use double quotes:

    foo: !badstr BadName\x80

Which is different from:

    foo: !badstr "BadName\x80"

In the former case, the scalar is passed as-is (well, subject to plain scalar processing such as folding, which does not include expanding escape codes) to the application - specifically, by using whatever rules the badstr tag has set up. Just like the int tag knows how to convert the characters 1 0 into an integer with the decimal value 10, a badstr tag would know to convert the characters \ x 8 0 to a single byte with hexadecimal value 0x80. In both these cases, this isn't done by the YAML parser itself. Escape sequences inside double quotes are handled by the YAML parser itself, outside the application tag control.

It is interesting that Plan9 also does not enforce validity. FWIW, I think that Windows allowing invalid UTF-16 in file names is a travesty, but that's just me (I also think that Windows should have made the space character equivalent to underscore in file names, the same way that it makes capital and lower case letters equivalent, so I could easily type in /.../program_files/... in a command line argument. But I digress).

This is neither here nor there. As you correctly point out, most systems use an actual "must be valid str" internal representation for the data type "string"; they tend to throw exceptions if they see stuff like invalid high+low surrogate pairs and so on. !!str is meant to be loaded into exactly that - a simple UTF-16 or UTF-8 internal "string" representation, which is rather fussy about validity in many systems.

As you point out, for your use case you need a different internal representation that can handle all valid Unicode code points + some invalid ones. I suggested (a bit tongue-in-cheek) to tag this other, not-quite-a-normal-string internal representation "badstr". A better name would be "binstr" - a string that also contains "binary data" (arbitrary bytes).

Using an explicit binstr tag indicates to the YAML parser to "please load this into a String-like internal data type that also supports the additional invalid code points". This may or may not be the same internal data type that normal strings are loaded into - depending on how fussy your system's string data type is. Perhaps in Plan9, these might be the same internal data type, while in Java they might be different data types.

If we allowed all normal strings to contain arbitrary bytes, this means that we wouldn't be able to load them into the normal fussy-about-validity internal string type. We would be forced to load everything into the less-usual, not-what-you'd-expect less-fussy string-like data type; this isn't a defensible design choice for languages whose standard string type is fussy about validity - YAML strings would lose interoperability with the rest of the system.

On output, I assume that an application would strive to use !!str rather than !!binstr as much as possible. But that's really an application decision, and not the YAML library's choice. E.g., a hypothetical Windows-file-names dumping application would only resort to emitting !!binstr if the filename was, indeed, a string-with-weird-characters as opposed to a normal string. This would allow it to be safely loaded into (say) a Java application without causing an exception, while minimizing the pain to the human YAML reader.

Have fun,

Oren Ben-Kiki
|
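The output-side policy Oren describes, using the ordinary string tag when the data is valid and falling back to the binary-string tag only when it is not, might look like the sketch below. The `!!binstr` tag was only proposed in this thread; the function name `emit_filename` and the escaping rules (double every backslash, write each invalid byte as `\xNN`) are assumptions for illustration and are not part of any YAML library. The sketch also assumes the name contains nothing that would need quoting in a plain scalar.

```python
def emit_filename(raw: bytes) -> str:
    """Return one YAML line for a filename given as raw bytes.

    Valid UTF-8 is emitted as an ordinary string. Only names that fail
    to decode are emitted with the (proposed, hypothetical) !!binstr tag.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Double literal backslashes first so they cannot be confused
        # with the \xNN escapes added for the invalid bytes.
        escaped = raw.replace(b"\\", b"\\\\").decode(
            "utf-8", errors="backslashreplace"
        )
        return "filename: !!binstr " + escaped
    return "filename: !!str " + text
```

Because the validity test is a single attempted decode, an emitter has little reason to tag every string as binary, which is the concern William raises about programs that skip the test.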
From: William S. <sp...@rh...> - 2011-10-31 18:17:40
|
Devin Jeanpierre wrote:
>> There needs to be a way to put an arbitrary byte sequence into a scalar
>> without losing the ability to make valid byte sequences human-readable.
>
> Mixing byte data into what is ostensibly unicode seems like a bad
> idea. Either have a particular part of the document be unicode, or
> bytes, not both.

As somebody else pointed out, my suggestion involves putting various combinations of the '\' and 'x' and 'u' and the hex digits into the file. All of these are valid Unicode.

That said, it would also be convenient for parsers to at least have a switch by which they will accept all non-ASCII bytes in a scalar absolutely unchanged. This will make it possible to read bad YAML files produced by programs that assumed all the input was valid UTF-8 and wrote it literally to the output file. This is a common problem with hand-editing when the character set is wrong, so it should be supported.

However, I knew that would be controversial, so I did not propose it as well. I am willing to patch YAML into a non-standard variant to support this.
|
From: Oren Ben-K. <or...@be...> - 2011-10-29 07:43:38
|
Just to be clear - we are talking about allowing \xXX and \uXXXX with arbitrary XX values, regardless of whether the result is a valid Unicode code point; as opposed to allowing arbitrary unescaped bytes in the YAML stream itself (which would make it unreadable/uneditable).

If so, then Peter has a good point - what is wrong with Base64 encoding? That is, what is the use case where most of the data is valid Unicode, but it is sprinkled with occasional arbitrary binary data? I'll grant you that if you need such data, it is way more readable to use \x and/or \u escape sequences.

But what kind of data is this? Does such data get loaded into a normal string type in the application, or into some sort of a binary buffer type?

Note YAML makes it possible for you to use a tagged unquoted scalar containing any escape mechanism you want, e.g.:

    foo: !bar Baz \xXX \uUUUU #XXXX or whatever.

You may also come up with a way to use implicit tagging to avoid the need for an explicit tag. I'm still unclear what this would be used for...

Have fun,

Oren Ben-Kiki.

On Sat, Oct 29, 2011 at 7:53 AM, Peter Murphy <pet...@gm...> wrote:

> William,
>
> There's already a binary tag in YAML, which allows arbitrary binary data to be encoded as base64.
>
> http://yaml.org/type/binary.html
>
> Is there any reason why that's not good enough for your needs?
>
> Best regards,
> Peter
>
> On Sat, Oct 29, 2011 at 3:03 PM, Ingy dot Net <in...@in...> wrote:
> > Hi Devin,
> >
> > I added your comments to
> > https://github.com/yaml2/YAML2/wiki/Unicode-Strictness
> >
> > This topic seems like it would be more poignant if people actually wrote some test cases about what they were talking about. It's so easy to be misunderstood on this type of issue. But with actual test files, it's much less so.
> >
> > The best way to do this would be to simply create a repo on github and push some files up. Even if it's not completely running code, it would be helpful. Of course, running code in something like pyyaml would be even better.
> >
> > I really don't want to have YAML2 discussions, without actual tests to make people's points. William, it would be great if you could post some files that elegantly show off your concerns. Otherwise it just feels like conjecture.
> >
> > Ingy
> >
> > On Fri, Oct 28, 2011 at 7:54 PM, Devin Jeanpierre <jea...@gm...> wrote:
> > >
> > > > There needs to be a way to put an arbitrary byte sequence into a scalar without losing the ability to make valid byte sequences human-readable.
> > >
> > > Mixing byte data into what is ostensibly unicode seems like a bad idea. Either have a particular part of the document be unicode, or bytes, not both.
> > >
> > > That said, if you mix unicode and bytes in the same file, it ceases to be exactly readable in a standard text editor. So I don't like that either. Maybe two separate types of YAML file (text / binary/compressed)? More than one protocol has had a "binary" version made of it.
> > >
> > > Devin
> > >
> > > On Fri, Oct 28, 2011 at 6:49 PM, William Spitzak <sp...@rh...> wrote:
> > > > PLEASE!!! This is the main reason we cannot use unaltered YAML:
> > > >
> > > > SUPPORT FOR INVALID UTF-8 AND UTF-16
> > > >
> > > > There needs to be a way to put an arbitrary byte sequence into a scalar without losing the ability to make valid byte sequences human-readable.
> > > >
> > > > Currently YAML is limited to only putting byte sequences that are valid UTF-8 into scalars, unless some transformation is done that makes some (often all) Unicode unreadable in the YAML input. This has the counter-productive effect of *discouraging* use of Unicode on any backend that uses bytes where there is no guarantee that the backend limits the byte sequences to valid UTF-8. Examples are all byte-based file formats, most internet protocols, Unix filenames, and Windows resource identifiers.
> > > >
> > > > My recommendation is here. However any solution that allows an arbitrary byte stream to be produced, while allowing valid UTF-8 bytes to be represented by the correct Unicode character in the YAML source, would be acceptable:
> > > >
> > > > 1. The backslash escape of \xNN represents a "raw UTF-8 byte" with the given value. This is only different from current YAML for 0x80-0xFF. The sequence \u00NN must be used for actual Unicode code points in this range.
> > > >
> > > > 2. An API that requests YAML scalars as UTF-8 gets these bytes, inserted between the UTF-8 encoding of all other characters, as raw data.
> > > >
> > > > 3. An API that requests scalars in some other form, such as UTF-16, gets these bytes as unchanged code units. This makes \xNN work identically to current YAML/JSON when the UTF-16 API is used. It may also allow invalid forms of other encodings to be supported.
> > > >
> > > > In addition invalid UTF-16 must also be supported. Support of invalid UTF-16 is more common, due to its use on Windows and therefore the realization by otherwise ignorant programmers of the inability to work without supporting it. Technically the YAML spec does not allow invalid UTF-16, but my proposal here formalizes the actual support that is in most (all?) YAML and JSON implementations:
> > > >
> > > > 1. The backslash escape of \uNNNN for NNNN in the range 0xD800..0xDFFF represents a "raw UTF-16 code unit".
> > > >
> > > > 2. An API that requests UTF-16 or other 16-bit code units will get these codes unchanged.
> > > >
> > > > 3. An API that requests bytes will get three bytes for each of these; these three bytes match the encoding you get from UTF-8 if you extend it to these invalid code points.
> > > >
> > > > _______________________________________________
> > > > Yaml-core mailing list
> > > > Yam...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/yaml-core
>
> --
> Email: pet...@gm...
> WWW: http://www.pkmurphy.com.au/
|
From: Oren Ben-K. <or...@be...> - 2011-10-31 08:17:14
|
There is one notable disadvantage to using Base64 encoding: it makes it impossible for a human reading the file to tell what the value of the Nth byte is. Using \uXXXX notation, this is trivial. Of course, using \uXXXX notation is also inefficient and an abuse of the notation.

Is it "useful" (in the real world) to have an alternative presentation for !!bin blobs, such that the data is presented as XX XX XX XX XX bytes? Would this address the need that led to the request for arbitrary \xXX and \uXXXX? Something like:

    blob: !!bin <prefix-TBD> A0B3 C4 DA BC12

(Hexadecimal, two chars per byte, white space is ignored.) If this is the case, it is trivial to add it.

Have fun,

Oren Ben-Kiki
|
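The hexadecimal presentation Oren floats for !!bin blobs (two hex digits per byte, white space ignored) is simple to decode. The sketch below is an illustration only: the function names are made up, and the still-undecided `<prefix-TBD>` marker from the example is not handled.

```python
def parse_hex_blob(text: str) -> bytes:
    """Decode text such as 'A0B3 C4 DA BC12' into bytes.

    Two hex digits per byte; all white space between digits is ignored.
    Raises ValueError for non-hex characters or an odd digit count.
    """
    digits = "".join(text.split())  # drop every run of white space
    return bytes.fromhex(digits)


def format_hex_blob(data: bytes) -> str:
    """Inverse direction: one space-separated pair of hex digits per byte."""
    return " ".join(f"{byte:02X}" for byte in data)
```

With this layout a reader can find the Nth byte by counting digit pairs, which is the advantage over Base64 that the message points out. For the example scalar, `parse_hex_blob("A0B3 C4 DA BC12")` yields six bytes, and the third one is 0xC4.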
From: William S. <sp...@rh...> - 2011-10-31 21:14:43
|
I think this is an interesting idea, and I did not realize that backslash was allowed outside quotes in YAML.

The big problem is that by not allowing quotes, it also means the backslash sequence must be used for a lot of valid UTF-8 as well. Whitespace and quotes would have to be replaced with unreadable \xNN substitutes. All YAML delimiters would have to be replaced with unreadable substitutes unless the parser is altered so things like "\," are preserved. "\\" however would work, interestingly enough.

Interpreters will need to check if the input is quoted, and not all YAML APIs provide this. This check is needed; otherwise just quoting the scalar would quadruple the number of backslashes required, and YAML produces no error if this is not done.

It may also be necessary to implement all YAML escapes such as "\b".

Oren Ben-Kiki wrote:
> The YAML parser will only expand \x80 if it was inside double quotes. My
> example did not use double quotes:
>
>     foo: !badstr BadName\x80
>
> Which is different from:
>
>     foo: !badstr "BadName\x80"
|
From: Oren Ben-K. <or...@be...> - 2011-11-01 05:53:44
|
Yes, you'd need to write \\ to insert a literal \. In general, I assume you'd expand the same escape sequences as those that are automatically expanded by the YAML parser for double-quoted scalars.

YAML parsers already handle quoted and unquoted scalars in different ways - this is part of the YAML spec (double-quoted vs. plain vs. literal vs. folded scalars). All YAML parsers must do this.

API-wise, you'd need to set up support for the binstr tag. We could add it to the official type repository as !!binstr.

Have fun,

Oren Ben-Kiki

On Mon, Oct 31, 2011 at 11:14 PM, William Spitzak <sp...@rh...> wrote:

> I think this is an interesting idea, and did not realize that backslash was allowed outside quotes in YAML.
>
> The big problem is that by not allowing quotes it also means the backslash sequence must be used for a lot of valid UTF-8 as well. Whitespace and quotes would have to be replaced with unreadable \xNN substitutes. All YAML delimiters would have to be replaced with unreadable substitutes unless the parser is altered so things like "\," are preserved. "\\" however would work, interestingly enough.
>
> Interpreters will need to check if the input is quoted, not all YAML api's provide this. This is needed or just quoting would quadruple the number of backslashes and no error is produced by YAML if this is not done.
>
> It may also be necessary to implement all YAML escapes such as "\b".
>
> Oren Ben-Kiki wrote:
>
>> The YAML parser will only expand \x80 if it was inside double quotes. My
>> example did not use double quotes:
>>
>>     foo: !badstr BadName\x80
>>
>> Which is different from:
>>
>>     foo: !badstr "BadName\x80"
|
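What a handler for the proposed binstr tag would do with the text of a plain scalar can be sketched as follows. This is a hedged illustration, not an existing API: the function name `decode_binstr` is hypothetical, only the `\\` and `\xNN` escapes are implemented (the message above suggests a real handler would cover the full double-quoted escape set), and registering it with a particular YAML library is left out.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_binstr(text: str) -> bytes:
    """Convert the unexpanded text of a plain scalar into bytes.

    Ordinary characters are UTF-8 encoded, '\\\\' becomes one backslash,
    and '\\xNN' becomes the single byte 0xNN, valid UTF-8 or not.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
        elif text.startswith("\\\\", i):
            out.append(0x5C)  # literal backslash
            i += 2
        elif text.startswith("\\x", i):
            digits = text[i + 2:i + 4]
            if len(digits) != 2 or not set(digits) <= HEX_DIGITS:
                raise ValueError(f"bad \\x escape at offset {i}")
            out.append(int(digits, 16))
            i += 4
        else:
            raise ValueError(f"unsupported escape at offset {i}")
    return bytes(out)
```

Given the plain scalar text `BadName\x80`, this returns the seven ASCII bytes of `BadName` followed by a single 0x80 byte, matching the int-tag analogy Oren draws earlier in the thread: the tag handler, not the YAML parser, gives the characters their meaning.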