4.1.1 Tokenization

Status: Beta

Brought to you by: matt597890

#1 4.1.1 Tokenization

Milestone: Partially_Completed

Status: open

Owner: nobody

Labels: Syntax (14)

Priority: 5

Updated: 2005-08-06

Created: 2005-08-06

Creator: Matt

Private: No

All levels of CSS -- level 1, level 2, and any future
levels -- use the same core syntax. This allows UAs to
parse (though not completely understand) style sheets
written in levels of CSS that didn't exist at the time the
UAs were created. Designers can use this feature to create
style sheets that work with older user agents, while also
exercising the possibilities of the latest levels of CSS.

At the lexical level, CSS style sheets consist of a
sequence of tokens. The list of tokens for CSS2 is as
follows. The definitions use Lex-style regular
expressions. Octal codes refer to ISO 10646
([ISO10646]). As in Lex, in case of multiple matches,
the longest match determines the token.

Token           Definition

--------------------------------------------------------------------------------

IDENT           {ident}
ATKEYWORD       @{ident}
STRING          {string}
HASH            #{name}
NUMBER          {num}
PERCENTAGE      {num}%
DIMENSION       {num}{ident}
URI             url\({w}{string}{w}\)
                |url\({w}([!#$%&*-~]|{nonascii}|{escape})*{w}\)
UNICODE-RANGE   U\+[0-9A-F?]{1,6}(-[0-9A-F]{1,6})?
CDO             <!--
CDC             -->
;               ;
{               \{
}               \}
(               \(
)               \)
[               \[
]               \]
S               [ \t\r\n\f]+
COMMENT         \/\*[^*]*\*+([^/][^*]*\*+)*\/
FUNCTION        {ident}\(
INCLUDES        ~=
DASHMATCH       |=
DELIM           any other character not matched by the above rules

The macros in curly braces ({}) above are defined as
follows:

Macro           Definition

--------------------------------------------------------------------------------

ident           {nmstart}{nmchar}*
name            {nmchar}+
nmstart         [a-zA-Z]|{nonascii}|{escape}
nonascii        [^\0-\177]
unicode         \\[0-9a-f]{1,6}[ \n\r\t\f]?
escape          {unicode}|\\[ -~\200-\4177777]
nmchar          [a-z0-9-]|{nonascii}|{escape}
num             [0-9]+|[0-9]*\.[0-9]+
string          {string1}|{string2}
string1         \"([\t !#$%&(-~]|\\{nl}|\'|{nonascii}|{escape})*\"
string2         \'([\t !#$%&(-~]|\\{nl}|\"|{nonascii}|{escape})*\'
nl              \n|\r\n|\r|\f
w               [ \t\r\n\f]*
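
The token and macro definitions above can be tried out with a small tokenizer. The following is a minimal Python sketch, not a reference implementation. The names `TOKENS`, `PUNCT` and `tokenize` are made up for illustration. It assumes case-insensitive matching, it removes the backslash from the plain-character classes of strings and URIs, and it uses `[^/*]` instead of `[^/]` in COMMENT. Both deviations keep the backtracking regex engine unambiguous. Lex's longest-match rule is emulated by trying every rule at the current position and keeping the longest result.

```python
import re

# Macros from the table above, written as Python regex fragments.
NONASCII = r"[^\x00-\x7f]"
UNICODE = r"\\[0-9a-f]{1,6}[ \n\r\t\f]?"
ESCAPE = "(?:" + UNICODE + r"|\\[ -~\x80-\U0010ffff])"
NMSTART = "(?:[a-zA-Z]|" + NONASCII + "|" + ESCAPE + ")"
NMCHAR = "(?:[a-z0-9-]|" + NONASCII + "|" + ESCAPE + ")"
IDENT = NMSTART + NMCHAR + "*"
NAME = NMCHAR + "+"
# Python alternation is ordered, not longest-match, so the decimal form goes first.
NUM = r"(?:[0-9]*\.[0-9]+|[0-9]+)"
NL = r"(?:\r\n|\n|\r|\f)"
W = r"[ \t\r\n\f]*"
# Deviation (assumption): backslash is left out of the plain-character class,
# so it can only be consumed through {escape} or a backslash-newline.
PLAIN = r"\t !#$%&(-\[\]-~"
STRING1 = '"(?:[' + PLAIN + r"]|\\" + NL + "|'|" + NONASCII + "|" + ESCAPE + ')*"'
STRING2 = "'(?:[" + PLAIN + r"]|\\" + NL + '|"|' + NONASCII + "|" + ESCAPE + ")*'"
STRING = "(?:" + STRING1 + "|" + STRING2 + ")"
URI = (r"url\(" + W + STRING + W + r"\)"
       + r"|url\(" + W + r"(?:[!#$%&*-\[\]-~]|" + NONASCII + "|" + ESCAPE + ")*"
       + W + r"\)")

# Token rules in table order; on a tie the earlier rule wins.
TOKENS = [
    ("IDENT", IDENT),
    ("ATKEYWORD", "@" + IDENT),
    ("STRING", STRING),
    ("HASH", "#" + NAME),
    ("NUMBER", NUM),
    ("PERCENTAGE", NUM + "%"),
    ("DIMENSION", NUM + IDENT),
    ("URI", URI),
    ("UNICODE-RANGE", r"U\+[0-9A-F?]{1,6}(?:-[0-9A-F]{1,6})?"),
    ("CDO", "<!--"),
    ("CDC", "-->"),
    ("S", r"[ \t\r\n\f]+"),
    # Deviation (assumption): [^/*] instead of [^/] to avoid ambiguous matches.
    ("COMMENT", r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"),
    ("FUNCTION", IDENT + r"\("),
    ("INCLUDES", "~="),
    ("DASHMATCH", r"\|="),
]
COMPILED = [(name, re.compile(rx, re.IGNORECASE)) for name, rx in TOKENS]
PUNCT = set(";{}()[]")  # single-character tokens named after themselves


def tokenize(css):
    """Split css into (type, text) pairs using the longest match at each position."""
    pos, out = 0, []
    while pos < len(css):
        best_type, best_text = None, ""
        for name, rx in COMPILED:
            m = rx.match(css, pos)
            if m and len(m.group()) > len(best_text):
                best_type, best_text = name, m.group()
        if best_type is None:
            # No rule matched: a punctuation token or DELIM, one character long.
            ch = css[pos]
            best_type, best_text = (ch if ch in PUNCT else "DELIM"), ch
        out.append((best_type, best_text))
        pos += len(best_text)
    return out
```

With this sketch, `tokenize("12px")` yields one DIMENSION token rather than a NUMBER followed by an IDENT, and `tokenize("url(a.png)")` yields one URI token rather than a FUNCTION, because the longer match wins in both cases.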

Below is the core syntax for CSS. The sections that
follow describe how to use it. Appendix D describes a
more restrictive grammar that is closer to the CSS level
2 language.

stylesheet  : [ CDO | CDC | S | statement ]*;
statement   : ruleset | at-rule;
at-rule     : ATKEYWORD S* any* [ block | ';' S* ];
block       : '{' S* [ any | block | ATKEYWORD S* | ';' ]* '}' S*;
ruleset     : selector? '{' S* declaration? [ ';' S* declaration? ]* '}' S*;
selector    : any+;
declaration : property ':' S* value;
property    : IDENT S*;
value       : [ any | block | ATKEYWORD S* ]+;
any         : [ IDENT | NUMBER | PERCENTAGE | DIMENSION | STRING
              | DELIM | URI | HASH | UNICODE-RANGE | INCLUDES
              | FUNCTION | DASHMATCH | '(' any* ')' | '[' any* ']' ] S*;
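
To illustrate how the `stylesheet` and `statement` productions carve a token stream into statements, here is a rough Python sketch. It assumes the input is already a list of hypothetical `(type, text)` pairs, with punctuation tokens named after their character. The function name `split_statements` is invented for this example. It only finds statement boundaries by counting braces; it does not validate the contents against the `any` production, and it makes no promise about malformed input.

```python
def split_statements(tokens):
    """Group (type, text) tokens into top-level ('at-rule' | 'ruleset', text) pairs."""
    statements, i, n = [], 0, len(tokens)
    while i < n:
        # stylesheet : [ CDO | CDC | S | statement ]*  (comments may appear anywhere)
        if tokens[i][0] in ("S", "CDO", "CDC", "COMMENT"):
            i += 1
            continue
        # statement : ruleset | at-rule; an at-rule always starts with ATKEYWORD.
        kind = "at-rule" if tokens[i][0] == "ATKEYWORD" else "ruleset"
        start, depth = i, 0
        while i < n:
            t = tokens[i][0]
            i += 1
            if t == "{":
                depth += 1
            elif t == "}":
                depth -= 1
                if depth <= 0:
                    break  # the outermost block has closed
            elif t == ";" and depth == 0 and kind == "at-rule":
                break  # an at-rule without a block ends at ';'
        # Trailing S* belongs to the statement that precedes it.
        while i < n and tokens[i][0] == "S":
            i += 1
        text = "".join(txt for _, txt in tokens[start:i]).strip()
        statements.append((kind, text))
    return statements


example = [
    ("ATKEYWORD", "@import"), ("S", " "), ("STRING", '"a.css"'), (";", ";"),
    ("S", "\n"),
    ("IDENT", "h1"), ("S", " "), ("{", "{"), ("IDENT", "color"),
    ("DELIM", ":"), ("IDENT", "red"), ("}", "}"),
]
result = split_statements(example)
```

Because the at-rule is recognised purely by its shape, the same loop would step over an at-rule whose keyword it has never seen, which is the kind of partial parsing the core syntax is meant to allow.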

COMMENT tokens do not occur in the grammar (to
keep it readable), but any number of these tokens may
appear anywhere between other tokens.
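
One simple way to honour this rule, sketched in Python under the assumption that tokens are hypothetical `(type, text)` pairs, is to discard COMMENT tokens before the grammar is applied. The helper name `drop_comments` is made up for illustration.

```python
def drop_comments(tokens):
    """Return the token list without COMMENT tokens, ready for the grammar."""
    return [(kind, text) for kind, text in tokens if kind != "COMMENT"]


with_comment = [("IDENT", "h1"), ("COMMENT", "/* title */"), ("{", "{"), ("}", "}")]
without_comment = drop_comments(with_comment)
```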

The token S in the grammar above stands for
whitespace. Only the characters "space" (Unicode code
32), "tab" (9), "line feed" (10), "carriage return" (13),
and "form feed" (12) can occur in whitespace. Other
space-like characters, such as "em-space" (8195)
and "ideographic space" (12288), are never part of
whitespace.
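
The distinction matters in practice because many regex engines use a broader notion of whitespace. As a hedged illustration in Python (the helper name `is_css_whitespace` is an assumption of this sketch), the S token can be compared with Python's Unicode-aware `\s`:

```python
import re

# The S token from the table: space, tab, carriage return, line feed, form feed.
CSS_WHITESPACE = re.compile(r"[ \t\r\n\f]+")


def is_css_whitespace(text):
    """True if text consists only of characters allowed in the S token."""
    return CSS_WHITESPACE.fullmatch(text) is not None


em_space = "\u2003"           # code 8195
ideographic_space = "\u3000"  # code 12288

css_ok = is_css_whitespace(" \t\r\n\f")
css_em = is_css_whitespace(em_space)
css_ideo = is_css_whitespace(ideographic_space)
# Python's \s on str patterns also accepts the Unicode space characters.
python_s = re.fullmatch(r"\s+", em_space + ideographic_space) is not None
```

Here `css_ok` is true while `css_em` and `css_ideo` are false, even though `python_s` is true, so a tokenizer built on `\s` would accept characters that are not part of CSS whitespace.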
