From: Maxwell C. <ma...@um...> - 2007-09-15 05:48:07
|
Yep, that is (most) of the basic information on the grammar. Lost count
of how many times I've read those pages :-).
Understandably, you seem to have gotten a little lost on what exactly the
CSS grammar is. It's really not a straightforward question. There are
really three different grammars we're discussing here.
1. The underlying CSS grammar. This describes a few very basic CSS
structures. It is pretty much only used for error handling, and will be
a small part of the real parser as kind of an underlying layer. You
can't actually get any useful information from parsing CSS this way.
This is the part you're referring to that ostensibly never changes
between different versions of CSS (in real life it does).
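Here is a concrete sense of what that underlying layer is for. CSS 2.1 (section 4.2, "Rules for handling parsing errors") says a malformed declaration is skipped by reading to the end of the declaration, while observing matching pairs of (), [], {}, "" and ''. Below is a rough C++ sketch of that skip. It makes some simplifying assumptions: it works on raw characters rather than tokens, it ignores comments, and the name skipMalformedDeclaration is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: starting at `pos` inside a declaration block, skip a
// malformed declaration. Returns the index just past the terminating ';',
// the index of the '}' closing the enclosing block (left unconsumed), or
// css.size() if the input runs out first.
// Simplification: character-level rather than token-level; no comments.
std::size_t skipMalformedDeclaration(const std::string& css, std::size_t pos) {
    assert(pos <= css.size());
    std::vector<char> closers;  // expected closing brackets, innermost last
    while (pos < css.size()) {
        char c = css[pos];
        if (c == '\\') {  // escape: skip the escaped character as well
            pos += 2;
            continue;
        }
        if (c == '"' || c == '\'') {  // skip a whole string, honouring escapes
            ++pos;
            while (pos < css.size() && css[pos] != c) {
                pos += (css[pos] == '\\') ? 2 : 1;
            }
            ++pos;  // past the closing quote (or past the end if unterminated)
            continue;
        }
        if (c == '(') closers.push_back(')');
        else if (c == '[') closers.push_back(']');
        else if (c == '{') closers.push_back('}');
        else if (!closers.empty() && c == closers.back()) closers.pop_back();
        else if (closers.empty() && c == ';') return pos + 1;  // end of declaration
        else if (closers.empty() && c == '}') return pos;      // end of block
        ++pos;
    }
    return css.size();
}
```

Given `color: rgb(1;2); margin: 0}` starting at 0, it returns the index just past the second `;`. The first `;` is inside the parentheses, so it does not end the declaration.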
2. The stated grammar for each version. The CSS 2.1 and CSS3 specs both
contain a grammar in pseudo-EBNF. With this you can parse well enough
to get information to construct most of a CSSOM tree if you follow their
recommendations on the CSSOM (more on this in my real message). Most of
the parsers I've really dug into roughly follow this, as does the parser
I wrote for UMM. However, this still isn't a complete enough
description of the grammar to get all of the information that Themis
needs. It contains only a general grammar for individual property
values, which isn't enough if we intend to actually use this CSS to
display a document.
Usually a parser is written for this grammar, and that is what gets
officially called the CSS parser. Unfortunately you then have to graft
code in somewhere else in your program that checks whether the parsed
expression for a property value is actually valid for that property and
extracts the information in some useful form. That code ends up doing
most of the work of a parser, and doing it badly.
(We already know this grammar is LL(1). In fact, CSS2.1 syntax is a
regular language, unless I missed something. The parser I've written so
far for Synura is a DFA, which is why I'm throwing it out to upgrade to
CSS3.)
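To make the "graft" step concrete, here is a hypothetical C++ sketch of the property-specific code that ends up sitting on top of a grammar-#2 parser. The GenericTerm type and the colorFromGenericTerm function are illustrative names only. They are not taken from Themis, Synura or libcroco. The point is that these checks simply re-parse the generic term list by hand.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical grammar-#2 output: every value is a generic term, which may
// be a number, a comma, an identifier, a hash, or a function whose
// arguments are themselves generic terms.
struct GenericTerm {
    enum class Kind { Number, Comma, Ident, Hash, Function } kind;
    std::string text;               // function name, identifier, etc.
    double number = 0;              // meaningful only for Kind::Number
    std::vector<GenericTerm> args;  // meaningful only for Kind::Function
};

struct Color { int red, green, blue; };

// The grafted-on code: check by hand that a generic term really is a valid
// rgb() colour and pull the numbers out. This is parsing work done twice.
std::optional<Color> colorFromGenericTerm(const GenericTerm& t) {
    using K = GenericTerm::Kind;
    if (t.kind != K::Function || t.text != "rgb") return std::nullopt;
    const auto& a = t.args;
    if (a.size() != 5) return std::nullopt;
    if (a[0].kind != K::Number || a[1].kind != K::Comma ||
        a[2].kind != K::Number || a[3].kind != K::Comma ||
        a[4].kind != K::Number) {
        return std::nullopt;
    }
    return Color{static_cast<int>(a[0].number), static_cast<int>(a[2].number),
                 static_cast<int>(a[4].number)};
}
```

Every other property needs its own function like this one. That is why this layer tends to grow into a second, ad hoc parser.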
3. The complete grammar. The language described by this grammar should
be exactly equal to the language of stylesheets which correctly follow
*every* part of the spec (or as close as you can get while keeping the
language context-free). It'll contain a rule like:
    color_value: color_keyword
               | HASH
               | 'rgb(' NUMBER COMMA NUMBER COMMA NUMBER ')'
The distinction between #2 and #3 is really important for the kind of
objects your parser spits out. If your parser doesn't go beyond #2, for
the color value 'rgb(255, 255, 255)', it'll spit out a tree of objects
like:
    Expression
      Term
        Function
          Expression
            Number
            Comma
            Number
            Comma
            Number
(see libcroco* for a library that outputs stuff like this. Synura was
originally intended to be a layer on top of libcroco that actually made
it usable)
If a parser/CSSOM package recognizes grammar #3 it can spit out
something like
    class Color
    {
    public:
        int red, green, blue;
    };
for the same input, which is clearly far more useful.
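For comparison, a parser built from grammar #3 can produce the Color object directly while it recognizes color_value. The following is a minimal recursive-descent sketch in C++, not any existing library's code. It assumes a cut-down version of the rule:

- integer components only (no percentages)
- lowercase input
- a three-entry keyword table
- 3- or 6-digit hex colors

ColorParser is a hypothetical name, and Color is written as a struct so its fields are public.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

struct Color { int red, green, blue; };

// Hypothetical recursive-descent parser for
//   color_value: color_keyword | HASH | 'rgb(' NUMBER COMMA NUMBER COMMA NUMBER ')'
// Simplified: integer components only, lowercase input, tiny keyword table.
class ColorParser {
public:
    explicit ColorParser(std::string src) : src_(std::move(src)) {}

    // Returns the color, or std::nullopt if the input isn't one color_value.
    std::optional<Color> parseColorValue() {
        skipSpace();
        std::optional<Color> c;
        if (src_.compare(pos_, 4, "rgb(") == 0) c = parseRgb();
        else if (peek() == '#') c = parseHash();
        else c = parseKeyword();
        skipSpace();
        if (!c || pos_ != src_.size()) return std::nullopt;  // reject trailing junk
        return c;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    char peek() const { return pos_ < src_.size() ? src_[pos_] : '\0'; }
    void skipSpace() {
        while (std::isspace(static_cast<unsigned char>(peek()))) ++pos_;
    }
    bool expect(char ch) {
        skipSpace();
        if (peek() != ch) return false;
        ++pos_;
        return true;
    }
    // NUMBER, clipped to 255 while reading (so it cannot overflow).
    std::optional<int> parseNumber() {
        skipSpace();
        bool any = false;
        int value = 0;
        while (std::isdigit(static_cast<unsigned char>(peek()))) {
            value = std::min(value * 10 + (src_[pos_] - '0'), 255);
            ++pos_;
            any = true;
        }
        if (!any) return std::nullopt;
        return value;
    }
    std::optional<Color> parseRgb() {
        pos_ += 4;  // consume "rgb("
        auto r = parseNumber();
        if (!r || !expect(',')) return std::nullopt;
        auto g = parseNumber();
        if (!g || !expect(',')) return std::nullopt;
        auto b = parseNumber();
        if (!b || !expect(')')) return std::nullopt;
        return Color{*r, *g, *b};
    }
    std::optional<Color> parseHash() {
        ++pos_;  // consume '#'
        std::size_t start = pos_;
        while (std::isxdigit(static_cast<unsigned char>(peek()))) ++pos_;
        std::string hex = src_.substr(start, pos_ - start);
        if (hex.size() == 3)  // #abc is shorthand for #aabbcc
            hex = {hex[0], hex[0], hex[1], hex[1], hex[2], hex[2]};
        if (hex.size() != 6) return std::nullopt;
        return Color{std::stoi(hex.substr(0, 2), nullptr, 16),
                     std::stoi(hex.substr(2, 2), nullptr, 16),
                     std::stoi(hex.substr(4, 2), nullptr, 16)};
    }
    std::optional<Color> parseKeyword() {
        std::size_t start = pos_;
        while (std::isalpha(static_cast<unsigned char>(peek()))) ++pos_;
        std::string word = src_.substr(start, pos_ - start);
        if (word == "black") return Color{0, 0, 0};
        if (word == "white") return Color{255, 255, 255};
        if (word == "red") return Color{255, 0, 0};
        return std::nullopt;
    }
};
```

Out-of-range components are clipped to 255 as they are read, so `rgb(300, 0, 0)` comes out as red. Validation and extraction happen in the same pass, with no generic tree in between.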
The grammar that I'm studying, and hoping to write a parser for, is #3.
In practice everybody uses a grammar and object model somewhere between
#2 and #3, but I'm hoping to get closer to #3 than most people.
The tasks required to complete an analysis are:
1. Write a CFG for #3, preferably in a form that can be read by
automated tools.
2. See if the grammar is LL(1), LALR(1), or whatever, switching between
left & right recursion and stuff as necessary. (And yes, the CSS spec
does mention these for the stated grammars, but it appears to be
non-normative and the stated grammars have all the above shortcomings).
I'd like to tackle task #1 by myself at first, but you're of course
welcome to try your hand at it too. As for the second task, I could
really benefit from some software that will automagically figure this
stuff out for me. SLK will do it for 'strong' LL(k), but I'd like a
more general tool in addition. ANTLR might do it for LL(k), but I can't
get it to run. If we can't find tools to do it, I'm in the process of
writing some quick-and-dirty scripts that might be able to figure it
out.**
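Here is roughly what such a quick-and-dirty script might do. A grammar with no empty (epsilon) productions is LL(1) exactly when the alternatives of each nonterminal have pairwise disjoint FIRST sets. Below is a hypothetical C++ sketch of that check. It is not how SLK or ANTLR work internally, and a grammar with epsilon rules would also need FOLLOW sets.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: a symbol is a nonterminal if it is a key in
// the Grammar map; anything else is a terminal (token).
using Symbol = std::string;
using Production = std::vector<Symbol>;  // assumed non-empty (no epsilon rules)
using Grammar = std::map<Symbol, std::vector<Production>>;
using FirstSets = std::map<Symbol, std::set<Symbol>>;

// With no epsilon rules, FIRST of a production is FIRST of its leftmost symbol.
std::set<Symbol> firstOfProduction(const Grammar& g, const FirstSets& first,
                                   const Production& p) {
    const Symbol& head = p.front();
    if (g.count(head) == 0) return {head};  // terminal
    auto it = first.find(head);
    return it == first.end() ? std::set<Symbol>{} : it->second;
}

// FIRST sets of all nonterminals, by fixed-point iteration.
FirstSets computeFirst(const Grammar& g) {
    FirstSets first;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& [lhs, prods] : g)
            for (const auto& p : prods)
                for (const auto& t : firstOfProduction(g, first, p))
                    if (first[lhs].insert(t).second) changed = true;
    }
    return first;
}

// One message per LL(1) conflict: two alternatives of the same nonterminal
// that can start with the same terminal.
std::vector<std::string> ll1Conflicts(const Grammar& g) {
    FirstSets first = computeFirst(g);
    std::vector<std::string> conflicts;
    for (const auto& [lhs, prods] : g) {
        std::map<Symbol, std::size_t> seen;  // terminal -> alternative index
        for (std::size_t i = 0; i < prods.size(); ++i) {
            for (const auto& t : firstOfProduction(g, first, prods[i])) {
                auto [it, inserted] = seen.emplace(t, i);
                if (!inserted)
                    conflicts.push_back(lhs + ": alternatives " +
                                        std::to_string(it->second) + " and " +
                                        std::to_string(i) + " both start with " + t);
            }
        }
    }
    return conflicts;
}
```

First, feed it the color_value rule, treating `rgb(` as one token and color_keyword as IDENT. It reports no conflicts. Next, feed it a left-recursive rule like `expr: expr COMMA term | term`. It reports one conflict, and that is where the left/right recursion switching from task 2 comes in.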
Again, once I'm done with this I can write to the listserv with a
complete write-up on all our options for getting a modern CSS parser.
It will have three components:
1. What needs to be done to update the current CSS code in Themis to
work with a more modern version of CSS and really form a solid
foundation for future work.
Themis has the most advanced parser I've seen in a long while, but is
unfortunately completely handwritten. After I've finished my research
some people more involved in the project than me should have a
discussion on whether it is worth the effort to maintain and update a
handwritten parser.
2. Our options for other CSS libraries and code Themis can take (I haven't
looked at UZI yet, but most of the stuff out there, outside of big
projects like Firefox and Prince, is pretty dismal).
3. What I will write if I decide to really let loose and create the
perfect CSS parser from scratch, maybe using a few pieces of code from
Themis and Synura's old parser.
* To be fair to libcroco, it handled the 'color' property just fine; it
was other properties and some other related shortcomings that made me
find that library limiting.
** The omission of flex/bison here is completely intentional; I spent a
few months going down that road. Of course these tools can
theoretically parse CSS grammar, but in practice the result becomes
unspeakably ugly due to the aforementioned quirks of CSS syntax. The
hand-written parsers suggests most people agree with me.
On Sat, 2007-09-15 at 02:23 +0000, Ar...@co... wrote:
> > I'm currently doing a more in-depth analysis of the CSS grammar
> > (currently trying to determine if the CSS3 grammar is LL(1)), so I can
> > better judge the available parsers. There is a number of really odd
> Not sure how much this helps:
>
> CSS 2.1 Revision 1 Parsing rules:
> http://www.w3.org/TR/2007/CR-CSS21-20070719/syndata.html#syntax
> All future versions of CSS will conform to these parsing rules (including CSS 3 -- so says the spec at least :) )
> This page has some additional notes about the grammar:
> http://www.w3.org/TR/2007/CR-CSS21-20070719/grammar.html
> It is LALR(1), but the page suggests not using it in the real world, since it doesn't conform perfectly to the parsing standards.
>
> For CSS3 selectors, the spec has an LL(1) sample here:
> http://www.w3.org/TR/css3-selectors/#w3cselgrammar
> Again it states: "but note that most UA's should not use it directly, since it doesn't express the parsing conventions"
>
> Also, here's something interesting.... An older working draft of CSS 2.1 has a sample using LL(1):
> http://www.w3.org/TR/2002/WD-CSS21-20020802/grammar.html
>
>
> From what I can make out, it needs to follow the parsing conventions for CSS and that part will never change on any version of CSS. However, beyond that I don't really know what's going on. Does the spec require the browser to use LALR(1) for CSS 2.1 or is that merely there as an example?
>
> If you could help explain a few things to me, maybe I could ask around for you.
>
> Thanks,
> Kevin
> -------------- Original message ----------------------
> From: Maxwell Collins <ma...@um...>
> > Yes, I've been following the mailing list.
> >
> > I'm currently doing a more in-depth analysis of the CSS grammar
> > (currently trying to determine if the CSS3 grammar is LL(1)), so I can
> > better judge the available parsers. There is a number of really odd
> > properties of the CSS grammar, and one of the most important properties
> > of a CSS parser is how it handles these. I'm going to completely
> > analyze all of the oddities of the CSS 2.1 grammar (which I'm familiar
> > with), then try to see if I can anticipate any similar issues that may
> > have arisen with some of the new CSS3 stuff.
> >
> > Once I've completed that, I'll send my thoughts on all of the available
> > CSS Parsers/CSSOM packages (including some beyond the three you listed).
> >
> > On Fri, 2007-09-14 at 06:32 +0000, Ar...@co... wrote:
> > > Hey,
> > > Are you still following the mailing list?
> > > I posted something about a person working on a project called UZI.... Have
> > you looked at any of the stuff he has to see if it might be something worth
> > considering for Themis?
> > >
> > > Anyways, what's your thoughts on CSS (if you've gotten a chance to look)?
> > Does the current CSS look like it's a good start, does it look like the
> > foundation is bad and that it would just be better to start with a new CSS
> > parser (like yours). I know you were interested in the particular area of CSS,
> > but the reason I'm contacting you is to see how UZI mixes into things. In
> > comparing the 3: Themis' CSS, your CSS, or UZI's CSS, which appears to have the
> > more solid foundation (regardless of whether it's even close to feature
> > complete)... and which is closer to feature complete?
> > > Of course, what involvement UZI might have in the Themis project is uncertain
> > at this point, but I didn't want there to be a conflict between what you were
> > interested in and that project (since that project also include CSS stuff).
> > >
> > > Kevin
> >
> >
> > _______________________________________________
> > Themis-dev mailing list
> > The...@li...
> > https://lists.sourceforge.net/lists/listinfo/themis-dev
>