On Sun, Dec 29, 2002 at 12:11:46PM +0200, Oren Ben-Kiki wrote:
> Michael G Schwern wrote:
> > It seems odd that _ is not included in the word_char
> > definition, yet - is, especially given that the production
> > names themselves use underscores so heavily. Typo?
> No, there's a reason for it. 'word_char' is used in several productions:
> - Domain names
> - Language names (special case of domain names)
> - Directive names
> - Anchor name
> Domain names must not contain '_', hence the 'word_char' production is
> restrictive. However, the use of 'word_char' for directives and anchors
> is somewhat arbitrary.
Putting restrictions on YAML just to make the spec a little simpler seems,
yes, arbitrary is the right word. :) Probably better to separate out the
domain/language name set rather than have it drag the rest down.
> Technically, there's no reason why we couldn't use 'flow_non_space' instead
> (actually, for directives, 'flow_non_space - mapping_entry_separator').
That would work, too. It has the advantage of allowing Unicode characters
in directive and anchor names.
While we're on the subject, the domain name syntax is slightly more
restrictive than you've got in YAML. It must begin with a letter and cannot
end with a hyphen.
From RFC 1035
<domain> ::= <subdomain> | " "
<subdomain> ::= <label> | <subdomain> "." <label>
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"
<let-dig> ::= <letter> | <digit>
<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case
<digit> ::= any one of the ten digits 0 through 9
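To make the RFC rule concrete, here is a rough Python sketch of the <label> and <subdomain> productions as a regex. The names LABEL, SUBDOMAIN_RE and is_rfc1035_subdomain are made up for illustration, and the sketch ignores RFC 1035's length limits, so treat it as a reading of the grammar above rather than a complete validator.

```python
import re

# One RFC 1035 <label>: a letter, optionally followed by any run of
# letters, digits and hyphens that ends in a letter or digit.
LABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# <subdomain>: one or more labels joined by dots.
SUBDOMAIN_RE = re.compile(rf"{LABEL}(?:\.{LABEL})*")


def is_rfc1035_subdomain(name: str) -> bool:
    """Hypothetical helper: True if name matches the <subdomain> rule.

    Length limits from the RFC are deliberately not checked here.
    """
    return SUBDOMAIN_RE.fullmatch(name) is not None
```

With this, "yaml.org" passes, while "1yaml.org" (leading digit), "yaml-.org" (trailing hyphen) and "my_domain.org" (underscore) are all rejected.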
Taking all that into account, the altered productions might look something
like this:
directive_name ::= (flow_non_space - mapping_entry_separator)+
anchor ::= flow_non_space+
domain_char ::= ascii_letter | decimal_digit | '-'
domain_name ::= subdomain_name ( '.' subdomain_name )+
subdomain_name ::= ascii_letter ( domain_char*
( domain_char - '-' ) )?
domain_family ::= ( domain_name |
( '-' domain_day_month
( '-' domain_day_month )?
language_family ::= ( domain_char+ '/' trans_char* ) |
( prefix-of-above? prefix_separator suffix-of-above )
and word_char, no longer used, evaporates.
> > I ran into this while working on transferring the spec to
> > YAML. I found it useful to just take any HTML anchors and
> > just stick a * in front of the name to make it an alias but
> > ran into a problem with something as simple as *map_in_seq.
> Interesting. Note however that a fragment identifier may contain many
> other non-word characters; I don't see much point in singling out '_'.
_ is the classic way to encode a space when you can't use a space. It's just
the standard *this_is_a_variable_name style instead of *StudlyCaps, which is
why the omission is so glaring compared to other non-word characters.
Also, it comes so naturally because of Perl's \w regex character class, which
is the same as [A-Za-z0-9_].
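A small Python sketch of the mismatch, for what it's worth. It assumes word_char is ASCII letters, digits and '-', which is only what this thread implies and is not copied from the spec. The pattern names and the fits() helper are made up for illustration. Note that Python's \w is Unicode-aware by default, so re.ASCII is needed to get the classic [A-Za-z0-9_] set.

```python
import re

# Assumption: word_char is ASCII letters, digits and '-', as this thread
# implies. This is not taken verbatim from the spec.
WORD_CHAR_NAME = re.compile(r"[A-Za-z0-9-]+")

# The classic \w set, [A-Za-z0-9_]. re.ASCII stops Python from also
# matching Unicode word characters.
PERL_WORD_NAME = re.compile(r"\w+", re.ASCII)


def fits(pattern: re.Pattern, name: str) -> bool:
    """Hypothetical helper: True if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None
```

Under those assumptions, fits(WORD_CHAR_NAME, "map_in_seq") is false while fits(PERL_WORD_NAME, "map_in_seq") is true, and the reverse holds for "map-in-seq".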
Michael G. Schwern <schwern@...> http://www.pobox.com/~schwern/
Perl Quality Assurance <perl-qa@...> Kwalitee Is Job One
What is a classy place like this doing around a low-life like you?