From: Stefan S. <ste...@or...> - 2003-01-31 19:05:47
hi there,

libxml++ currently defines an abstract 'Parser' interface,
which is implemented once to provide a DOM parser, and once
for SAX.

I'm wondering how useful such an abstraction is. The implementation
is completely different, and even for users there isn't anything
the two have in common, i.e. the semantics of 'parse file "foo"' is
very different.

I've (locally) introduced a 'Document' class, which is created
by a 'create_document_from[file, stream, memory]' factory function.
Note that this is really a function, as there isn't any state involved.
Either you end up with a document, or you don't (an exception may be
thrown to indicate failure).

What do you think about such a change? Do we still need an abstract
Parser class?

Regards,
Stefan

From: Christophe de V. <cde...@al...> - 2003-02-01 01:02:34
On Friday 31 January 2003 20:06, Stefan Seefeld wrote:
> hi there,

hi

> libxml++ currently defines an abstract 'Parser' interface,
> which is implemented once to provide a DOM parser, and once
> for SAX.
>
> I'm wondering how useful such an abstraction is. The implementation
> is completely different, and even for users there isn't anything
> the two have in common, i.e. the semantics of 'parse file "foo"' is
> very different.

The main interest I see in keeping an abstraction for the parsers is
that they have common options. One thing I want to do with it is make
the parsers configurable through this abstract Parser class. The options
I'm referring to concern, for example, whether the document has to be
validated or not. I don't have the complete list in mind, but there are
other ones that would be interesting to have.

The difference between the two parsers is the output, but the input is
exactly the same: we can parse a file, a buffer, a stream, and the
abstraction defines only how input can be given to the Parser, not how
to retrieve any result.

> I've (locally) introduced a 'Document' class, which is created
> by a 'create_document_from[file, stream, memory]' factory function.
> Note that this is really a function, as there isn't any state involved.
> Either you end up with a document, or you don't (an exception may be
> thrown to indicate failure).

I don't think having a Document class is incompatible with a parser
abstraction. Your factory function is in fact a DomParser, which returns
a Document instead of a root_node (this is probably better, since it's
closer to libxml).

Moreover, I personally think that having a DomParser class makes it more
natural to have several parsers working at the same time.

One other argument I would have is that it is, in my opinion, closer to
libxml.

> What do you think about such a change? Do we still need an abstract
> Parser class?

I vote yes ;-)
Unless you prove me wrong, of course...

Cheers,
Christophe

From: Stefan S. <se...@sy...> - 2003-02-01 14:43:51
Christophe de Vienne wrote:

> The main interest I see in keeping an abstraction for the parsers is
> that they have common options. One thing I want to do with it is make
> the parsers configurable through this abstract Parser class. The options
> I'm referring to concern, for example, whether the document has to be
> validated or not. I don't have the complete list in mind, but there are
> other ones that would be interesting to have.
> The difference between the two parsers is the output, but the input is
> exactly the same: we can parse a file, a buffer, a stream, and the
> abstraction defines only how input can be given to the Parser, not how
> to retrieve any result.

ok, I see. Well, this option-setting mechanism isn't currently part of
the API. If it were, I probably wouldn't be asking the question.

> I don't think having a Document class is incompatible with a parser
> abstraction. Your factory function is in fact a DomParser, which returns
> a Document instead of a root_node (this is probably better, since it's
> closer to libxml).

indeed.

> Moreover, I personally think that having a DomParser class makes it more
> natural to have several parsers working at the same time.

I don't understand that. Again, generating a document object doesn't
involve any state that justifies a parser *object*. Even the parser
options you are talking about could well be put into the
'create_document' function arguments (with sensible default values).

Stefan

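A minimal sketch of the "options as function arguments with sensible
defaults" idea from the message above. It is only an illustration: the
names ParseOptions, Document and create_document_from_memory, and the two
option fields, are hypothetical stand-ins and not the libxml++ API, and no
real parsing is done.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical option bundle; the field names are illustrative only.
struct ParseOptions {
    bool validate = false;            // validate the document?
    bool preserve_whitespace = true;  // keep whitespace-only text?
};

// Hypothetical stand-in for the proposed Document class.
struct Document {
    std::string content;
    ParseOptions options_used;
};

// Stateless factory: either you get a document back or an exception is
// thrown. Callers who do not care about options rely on the defaults.
Document create_document_from_memory(const std::string& xml,
                                     const ParseOptions& opts = ParseOptions()) {
    if (xml.empty())
        throw std::runtime_error("cannot create a document from empty input");
    Document doc;
    doc.content = xml;        // a real implementation would parse here
    doc.options_used = opts;
    return doc;
}
```

With this shape, create_document_from_memory("<a/>") uses the defaults,
while a caller who wants validation fills in a ParseOptions value and
passes it as the second argument; nothing has to be remembered between
calls.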
From: Christophe de V. <cde...@al...> - 2003-02-01 15:24:48
On Saturday 1 February 2003 16:42, Stefan Seefeld wrote:
> Christophe de Vienne wrote:
> > Moreover, I personally think that having a DomParser class makes it
> > more natural to have several parsers working at the same time.
>
> I don't understand that. Again, generating a document object doesn't
> involve any state that justifies a parser *object*. Even the parser
> options you are talking about could well be put into the
> 'create_document' function arguments (with sensible default values).

There is one case, which we have not implemented yet, that justifies
having state: libxml offers an interface, xmlParseChunk, that allows
parsing a document in small pieces. With a Parser class it will be easy
and natural to have, for example, two parsers building two documents at
the same time without interfering. Even if it's a rare case, there is not
much to do to make it possible with a parser object, while a function
would force us to add state.

Regards,
Christophe

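To make the chunk-wise case concrete, here is a toy sketch of a push-style
parser object that keeps its own state between calls. It does not call
libxml's xmlParseChunk; the class ChunkParser, its methods and the Document
struct are hypothetical, and "parsing" is reduced to accumulating text so
that the example stays self-contained.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical result type standing in for a parsed document.
struct Document {
    std::string text;
};

// Hypothetical push-style parser: each instance remembers what it has
// been fed so far, which is exactly the state a plain function lacks.
class ChunkParser {
public:
    // Feed the next piece of input; may be called any number of times.
    void parse_chunk(const std::string& chunk) {
        if (finished_)
            throw std::logic_error("parser already finished");
        buffer_ += chunk;
    }

    // Signal the end of input and hand over the result.
    Document finish() {
        if (finished_)
            throw std::logic_error("parser already finished");
        finished_ = true;
        Document doc;
        doc.text = buffer_;
        return doc;
    }

private:
    std::string buffer_;     // per-instance state between calls
    bool finished_ = false;
};
```

Two ChunkParser instances can be fed alternately, chunk by chunk, and each
still produces its own document, because the state lives in the object
rather than in the caller or in a global.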
From: Stefan S. <se...@sy...> - 2003-02-01 18:36:01
Christophe de Vienne wrote:

> There is one case, which we have not implemented yet, that justifies
> having state: libxml offers an interface, xmlParseChunk, that allows
> parsing a document in small pieces. With a Parser class it will be easy
> and natural to have, for example, two parsers building two documents at
> the same time without interfering. Even if it's a rare case, there is not
> much to do to make it possible with a parser object, while a function
> would force us to add state.

I don't understand that statement. Yes, I agree, if we ever want to parse
a document chunk-wise, we need a stateful parser that can remember where
it left off. I fully agree that for this case we want a parser class.

Yet I maintain that there is nothing justifying a common base class for
SAX parsers and parsers that create a DOM document. There isn't any code
both would share, and there isn't any semantic commonality that could
possibly make someone want to use an abstract parser reference, not
knowing whether a call to 'parse' would mean 'generate a document' or
'call back on event X'.

And I don't see the relevance of wanting to parse two things at the same
time. What do you mean by 'at the same time'? In different threads?
Chunks belonging to different documents? If it's the second, I agree that
we need a parser class (see above). But if you aren't parsing in chunks,
there isn't any state to remember that couldn't be passed to the actual
'parse' call (such as whether or not to preserve white space etc.)

In a nutshell, yes, I agree that in case of chunks a dom parser would be
a solution. But even then there isn't anything this parser has in common
with the sax parser.

Regards,
Stefan

From: Christophe de V. <cde...@al...> - 2003-02-01 19:07:10
On Saturday 1 February 2003 20:34, Stefan Seefeld wrote:
> Yet I maintain that there is nothing justifying a common base class for
> SAX parsers and parsers that create a DOM document. There isn't any code
> both would share, and there isn't any semantic commonality that could
> possibly make someone want to use an abstract parser reference, not
> knowing whether a call to 'parse' would mean 'generate a document' or
> 'call back on event X'.

The way parse_stream, or a possible parse_chunk, is implemented could be
shared, since only the parser state is initialised differently (and only
partially, since some options are the same).
Even parse_file and parse_memory could share the same implementation in
both classes if we used lower-level libxml calls.

> And I don't see the relevance of wanting to parse two things at the same
> time. What do you mean by 'at the same time'? In different threads?
> Chunks belonging to different documents? If it's the second, I agree that
> we need a parser class (see above).

I was thinking of both cases, but more specifically of the second one.

> But if you aren't parsing in chunks,
> there isn't any state to remember that couldn't be passed to the actual
> 'parse' call (such as whether or not to preserve white space etc.)

Imagine I have several documents to parse with the same options. I could
instantiate a single parser that would become a factory of Documents,
parameterized once and for all, without having to pass these options to a
factory function each time (which would mean storing them somewhere).

Moreover, considering the underlying C layer, there may be options for
which we have not implemented any accessor. Having the parser state
initialised by the parser instance and its accessors before the real
parsing starts would leave the possibility of altering it through a
cobj() accessor in a derived class.

> In a nutshell, yes, I agree that in case of chunks a dom parser would be
> a solution. But even then there isn't anything this parser has in common
> with the sax parser.

The inputs are in common. In libxml, for example, xmlParseChunk is the
same for a sax or a dom parser: a Document will be produced if I don't
specify a saxHandler. Still in libxml, the dom parser is built on top of
the sax parser: the two concepts share more than it seems at first sight.

Having an abstract Parser class will, in my opinion, better reflect the
underlying lib.

Regards,
Christophe.

From: Stefan S. <se...@sy...> - 2003-02-01 19:29:08
Christophe de Vienne wrote:

> The way parse_stream, or a possible parse_chunk, is implemented could be
> shared, since only the parser state is initialised differently (and only
> partially, since some options are the same).
> Even parse_file and parse_memory could share the same implementation in
> both classes if we used lower-level libxml calls.

ok. Well, right now they don't share *any* code. And even if they used
those common function(s), it would be a single line (or two). So code
reuse can't be an issue here.

My main concern really is the semantic difference between DOM and SAX.
My point really is that you decide once whether you want to use an
event-driven approach, or whether you want a full in-memory
representation which you can walk through and manipulate. There isn't
anything those two things have in common. Yes, document construction can
be (and in libxml2 is) implemented on top of a SAX interface, but that's
an 'implementation detail'.

> Imagine I have several documents to parse with the same options. I could
> instantiate a single parser that would become a factory of Documents,
> parameterized once and for all, without having to pass these options to a
> factory function each time (which would mean storing them somewhere).

ok, I can see that (though I'm not sure that there is anything favoring
the use of the same options for all documents). I do think that in 90% of
all cases creation of a document is an atomic action, at least as far as
the user's code is concerned. In that light I'd at least provide a
factory function doing all this (possibly being implemented as

  Document *create_document_from_file(const std::string &filename,
                                      const options &o)
  {
    DomParser parser(o);
    Document *document = parser.parse(filename);
    return document;
  }

so we are all happy :-)

> Moreover, considering the underlying C layer, there may be options for
> which we have not implemented any accessor. Having the parser state
> initialised by the parser instance and its accessors before the real
> parsing starts would leave the possibility of altering it through a
> cobj() accessor in a derived class.

see my other mail: I don't think using 'cobj()' accessors should be
encouraged. It totally breaks encapsulation.

> The inputs are in common. In libxml, for example, xmlParseChunk is the
> same for a sax or a dom parser: a Document will be produced if I don't
> specify a saxHandler.

right, and I think that is very unfortunate. It's two functions lumped
into one, and its semantics is very different depending on the arguments
you pass.

> Still in libxml, the dom parser is built on top of the sax parser: the
> two concepts share more than it seems at first sight.

again, if you derived your dom parser privately from the sax parser,
i.e. if you used 'derived from' in the sense of 'implemented by', I would
(possibly) agree.

Regards,
Stefan

From: Jonathan W. <co...@co...> - 2003-02-01 20:17:53
On Sat, Feb 01, 2003 at 03:27:20PM -0500, Stefan Seefeld wrote:
> Christophe de Vienne wrote:
>
> > The way parse_stream, or a possible parse_chunk, is implemented could
> > be shared, since only the parser state is initialised differently (and
> > only partially, since some options are the same).
> > Even parse_file and parse_memory could share the same implementation
> > in both classes if we used lower-level libxml calls.
>
> ok. Well, right now they don't share *any* code. And even if they used
> those common function(s), it would be a single line (or two). So code
> reuse can't be an issue here.

In any case, public inheritance shouldn't be used for code reuse.

> > Still in libxml, the dom parser is built on top of the sax parser: the
> > two concepts share more than it seems at first sight.
>
> again, if you derived your dom parser privately from the sax parser,
> i.e. if you used 'derived from' in the sense of 'implemented by', I would
> (possibly) agree.

Yes. The C library has no access control, but the C++ wrapper should use
protected or private inheritance in order to share implementations.

jon

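The "implemented in terms of" relationship mentioned above can be sketched
with private inheritance. This is a toy under stated assumptions: the
classes below are hypothetical, the "events" are just space-separated
tokens rather than real SAX callbacks, and none of it is taken from
libxml++ itself.

```cpp
#include <string>
#include <vector>

// Hypothetical event-driven parser: calls back once per
// space-separated token (a stand-in for real SAX events).
class SaxParser {
public:
    virtual ~SaxParser() {}

    void parse_memory(const std::string& input) {
        std::string token;
        for (char c : input) {
            if (c == ' ') {
                if (!token.empty()) on_token(token);
                token.clear();
            } else {
                token += c;
            }
        }
        if (!token.empty()) on_token(token);
    }

protected:
    virtual void on_token(const std::string& token) = 0;
};

// Hypothetical document: just the list of tokens that were seen.
struct Document {
    std::vector<std::string> nodes;
};

// Private inheritance: DomParser reuses the event machinery, but its
// users cannot treat it as a SaxParser or reach the callback interface.
class DomParser : private SaxParser {
public:
    Document parse_memory(const std::string& input) {
        doc_ = Document();
        SaxParser::parse_memory(input);  // drive the events internally
        return doc_;
    }

private:
    void on_token(const std::string& token) override {
        doc_.nodes.push_back(token);
    }

    Document doc_;
};
```

A caller sees only DomParser::parse_memory returning a Document; the fact
that events were involved stays an implementation detail, and a DomParser
cannot be passed where a SaxParser reference is expected.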
From: Christophe de V. <cde...@al...> - 2003-02-01 20:59:50
On Saturday 1 February 2003 21:27, Stefan Seefeld wrote:
> [snip]
> In that light I'd at least provide a factory
> function doing all this (possibly being implemented as
>
> Document *create_document_from_file(const std::string &filename, const
> options &o) {
> DomParser parser(o);
> Document *document = parser.parse(filename);
> return document;
> }
>
> so we are all happy :-)

yay ;-)

> > Moreover, considering the underlying C layer, there may be options for
> > which we have not implemented any accessor. Having the parser state
> > initialised by the parser instance and its accessors before the real
> > parsing starts would leave the possibility of altering it through a
> > cobj() accessor in a derived class.
>
> see my other mail: I don't think using 'cobj()' accessors should be
> encouraged. It totally breaks encapsulation.

Encouraged, no; possible in certain cases, maybe.

> > The inputs are in common. In libxml, for example, xmlParseChunk is the
> > same for a sax or a dom parser: a Document will be produced if I don't
> > specify a saxHandler.
>
> right, and I think that is very unfortunate. It's two functions lumped
> into one, and its semantics is very different depending on the arguments
> you pass.
>
> > Still in libxml, the dom parser is built on top of the sax parser: the
> > two concepts share more than it seems at first sight.
>
> again, if you derived your dom parser privately from the sax parser,
> i.e. if you used 'derived from' in the sense of 'implemented by', I would
> (possibly) agree.

Well, after rereading this long thread, I have changed my mind a little
on the Parser abstraction, although I still have the feeling we should
keep it.
Even if the two parsers don't have the same semantics as far as what they
produce, their parse_xxx methods do have exactly the same semantics, so
why not have them in a common interface, even if polymorphism will not be
used in 99% of cases?

Keeping it would make it possible, for example, to write an adapter that
parses from a new type of source and works with both parsers.

Cheers,
Christophe

From: Stefan S. <se...@sy...> - 2003-02-01 21:15:29
Christophe de Vienne wrote:

>> again, if you derived your dom parser privately from the sax parser,
>> i.e. if you used 'derived from' in the sense of 'implemented by', I
>> would (possibly) agree.
>
> Well, after rereading this long thread, I have changed my mind a little
> on the Parser abstraction, although I still have the feeling we should
> keep it.
> Even if the two parsers don't have the same semantics as far as what
> they produce, their parse_xxx methods do have exactly the same
> semantics, so why not have them in a common interface, even if
> polymorphism will not be used in 99% of cases?

the dom parser's 'parse' method(s) returns a document, the sax parser's
'parse' method returns nothing. How is this semantically the same, or
equivalent?
(ok, I'm excluding the case where you have to assemble the document from
chunks here)

> Keeping it would make it possible, for example, to write an adapter that
> parses from a new type of source and works with both parsers.

I don't understand that. You are still talking about code reuse, right?

Stefan

From: Christophe de V. <cde...@al...> - 2003-02-01 21:54:27
On Saturday 1 February 2003 23:13, Stefan Seefeld wrote:
> Christophe de Vienne wrote:
> >> again, if you derived your dom parser privately from the sax parser,
> >> i.e. if you used 'derived from' in the sense of 'implemented by', I
> >> would (possibly) agree.
> >
> > Well, after rereading this long thread, I have changed my mind a
> > little on the Parser abstraction, although I still have the feeling we
> > should keep it. Even if the two parsers don't have the same semantics
> > as far as what they produce, their parse_xxx methods do have exactly
> > the same semantics, so why not have them in a common interface, even
> > if polymorphism will not be used in 99% of cases?
>
> the dom parser's 'parse' method(s) returns a document, the sax parser's
> 'parse' method returns nothing. How is this semantically the same, or
> equivalent?

I did not realise that in your example of create_document_from_file(...)
the parse method returned something.
For me the parse methods tell the parser to do the actual parsing, after
which we go and get the result. Returning the Document would be the
semantics of a createDocument method, which could coexist with parse.
The two parsers, even if they work very differently, have one thing in
common: they parse something. I find it logical to put the functions for
that action in a common interface.

> (ok, I'm excluding the case where you have to assemble the document from
> chunks here)

This means that parse_file should return a Document, while with
parse_chunk we would have to use an accessor to get it.

Is it really unbearable to write each time:

  DomParser p;
  p.parse(something);
  Document * doc = p.getDocument();

especially if we introduce the create_document_from_file(...) function?

> > Keeping it would make it possible, for example, to write an adapter
> > that parses from a new type of source and works with both parsers.
>
> I don't understand that. You are still talking about code reuse, right?

no. The remark from Jonathan Wakely about public inheritance made me
realize that this is not the real interest of having a common interface.
The situation I was thinking of would be to create a class that is able
to read a document from a special source, for example a network with a
weird protocol, and to which we could give the Parser that has to parse
it.

My conclusion for now is that we would lose something (small, but still)
by removing the common interface, while I don't see what we would gain.

I'd like to have some more points of view, since we'll have to make a
decision in the end.

Best regards,
Christophe

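For readers following the argument, this is roughly what the common
interface position amounts to: an abstract Parser that only says how input
is handed over, with the result fetched separately. Everything below is a
hypothetical sketch (the packet-based source, parse_from_packets and the
trivial parser bodies are invented for illustration) and not the actual
libxml++ classes.

```cpp
#include <string>
#include <vector>

// Hypothetical common interface: it describes only how input is given
// to a parser, not how a result is retrieved.
class Parser {
public:
    virtual ~Parser() {}
    virtual void parse_memory(const std::string& contents) = 0;
};

struct Document {
    std::string text;
};

// Builds a document; the result is fetched with a separate accessor.
class DomParser : public Parser {
public:
    void parse_memory(const std::string& contents) override {
        doc_.text = contents;  // a real parser would build a tree
    }
    const Document& getDocument() const { return doc_; }

private:
    Document doc_;
};

// Event-style parser; here it merely counts '<' characters.
class SaxParser : public Parser {
public:
    void parse_memory(const std::string& contents) override {
        for (char c : contents)
            if (c == '<') ++markup_starts_;
    }
    int markup_starts() const { return markup_starts_; }

private:
    int markup_starts_ = 0;
};

// Hypothetical adapter for a "special source": it reassembles packets
// and feeds whichever Parser it was given.
void parse_from_packets(const std::vector<std::string>& packets,
                        Parser& parser) {
    std::string joined;
    for (const std::string& p : packets) joined += p;
    parser.parse_memory(joined);
}
```

The adapter works unchanged with either parser, which is the benefit
claimed for the interface; the cost is the two-step "parse, then fetch the
result" usage on the DOM side.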
From: Stefan S. <se...@sy...> - 2003-02-01 22:14:01
Christophe de Vienne wrote:

>> the dom parser's 'parse' method(s) returns a document, the sax parser's
>> 'parse' method returns nothing. How is this semantically the same, or
>> equivalent?
>
> I did not realise that in your example of create_document_from_file(...)
> the parse method returned something.
> For me the parse methods tell the parser to do the actual parsing, after
> which we go and get the result. Returning the Document would be the
> semantics of a createDocument method, which could coexist with parse.
> The two parsers, even if they work very differently, have one thing in
> common: they parse something. I find it logical to put the functions for
> that action in a common interface.

hmm, from my point of view 'parsing something' is the 'how', while
'creating a document' is the 'what'. In other words, in the DOM way of
life I create a document from file/memory, and the fact that something is
parsed is an implementation detail. The SAX model then renders that
detail explicit, i.e. in the world of SAX the fact that something is
parsed is the actual semantics. That's precisely why it is natural to
implement DOM document creation on top of SAX.
In short, SAX and DOM don't operate on the same level; they describe two
very different worlds.

> Is it really unbearable to write each time:
>
>   DomParser p;
>   p.parse(something);
>   Document * doc = p.getDocument();
>
> especially if we introduce the create_document_from_file(...) function?

well, it's of course not unbearable, but it's unintuitive and inelegant.
Again, I'm excluding stateful (chunk-wise) parsing here. If parsing is an
atomic action, why should I need two methods to get a document from a
file?

> The situation I was thinking of would be to create a class that is able
> to read a document from a special source, for example a network with a
> weird protocol, and to which we could give the Parser that has to parse
> it.

well, for 'special sources' there is the 'std::streambuf' abstraction in
C++. Use that when you want to abstract away the physical nature of the
source you are reading from.

> My conclusion for now is that we would lose something (small, but still)
> by removing the common interface, while I don't see what we would gain.

We gain clarity and elegance. (did you ever get to appreciate the
elegance of simple mathematical formulae? If so, you'll surely understand
what I mean).

> I'd like to have some more points of view, since we'll have to make a
> decision in the end.

heh, you are right. Let's listen to the others :-)

Stefan

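A minimal sketch of the std::streambuf suggestion: a custom stream buffer
hides where the bytes come from, so any code written against std::istream
can read from it. The PacketBuf class and the read_all helper are
hypothetical names chosen for this illustration; only the std::streambuf
members used (underflow, setg, gptr) are standard.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "special source": data arrives as a list of packets.
class PacketBuf : public std::streambuf {
public:
    explicit PacketBuf(std::vector<std::string> packets)
        : packets_(std::move(packets)), next_(0) {}

protected:
    // Called when the get area is exhausted: expose the next non-empty
    // packet as the new get area, or report end of input.
    int_type underflow() override {
        while (next_ < packets_.size()) {
            std::string& p = packets_[next_++];
            if (p.empty()) continue;
            setg(&p[0], &p[0], &p[0] + p.size());
            return traits_type::to_int_type(*gptr());
        }
        return traits_type::eof();
    }

private:
    std::vector<std::string> packets_;
    std::size_t next_;
};

// Stand-in for a parser entry point that only knows std::istream.
std::string read_all(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

A parse_stream-style function taking a std::istream would work with such a
buffer without knowing anything about packets, which is why no
parser-specific adapter hierarchy is needed for this use case.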
From: Christophe de V. <cde...@al...> - 2003-02-01 23:59:59
On Sunday 2 February 2003 00:12, Stefan Seefeld wrote:
> Christophe de Vienne wrote:
> > The situation I was thinking of would be to create a class that is
> > able to read a document from a special source, for example a network
> > with a weird protocol, and to which we could give the Parser that has
> > to parse it.
>
> well, for 'special sources' there is the 'std::streambuf' abstraction in
> C++. Use that when you want to abstract away the physical nature of the
> source you are reading from.

right

> > I'd like to have some more points of view, since we'll have to make a
> > decision in the end.
>
> heh, you are right. Let's listen to the others :-)

yes

Christophe