Thread: [Sax-devel] Default behavior of no-args XMLReaderFactory.createXMLReader() method
From: Elliotte R. H. <el...@me...> - 2001-10-29 11:20:11
|
The current no-args XMLReaderFactory.createXMLReader() method fails if
the org.xml.sax.driver system property isn't set. Parser vendors are
supposed to replace this method with one of their own that returns a
known XMLReader class, but few do. I propose a change to the code for
this method in the default distribution that searches through a list of
known parser classes in order until it finds one. E.g., after failing to
find org.xml.sax.driver, something like the following code could be
executed:

    XMLReader parser = null;
    try { // Xerces
      parser = XMLReaderFactory.createXMLReader(
        "org.apache.xerces.parsers.SAXParser"
      );
    }
    catch (SAXException e1) {
      try { // Crimson
        parser = XMLReaderFactory.createXMLReader(
          "org.apache.crimson.parser.XMLReaderImpl"
        );
      }
      catch (SAXException e2) {
        try { // AElfred
          parser = XMLReaderFactory.createXMLReader(
            "gnu.xml.aelfred2.XmlReader"
          );
        }
        catch (SAXException e3) {
          try { // Oracle
            parser = XMLReaderFactory.createXMLReader(
              "oracle.xml.parser.v2.SAXParser"
            );
          }
          catch (SAXException e4) {
            try { // default
              parser = XMLReaderFactory.createXMLReader();
            }
            catch (SAXException e5) {
            }
          }
        }
      }
    }
    return parser;

The code's for illustrative purposes only. We should probably store the
list of parser class names in some form of array or list and iterate
through it. More parsers could be added as their names became known.

This would not change the API at all. However, it would make
XMLReaderFactory.createXMLReader() more likely to succeed. Furthermore,
it would have the nice fringe benefit of encouraging parser vendors to
replace this method like they should have done in the first place, since
if they failed to replace it then their competitor's parser might get
loaded instead of theirs!

-- 
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | el...@me...            | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|               Java I/O (O'Reilly & Associates, 1999)               |
|            http://www.ibiblio.org/javafaq/books/javaio/            |
|   http://www.amazon.com/exec/obidos/ISBN=1565924851/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+
From: David M. <da...@me...> - 2001-10-29 12:42:28
|
Elliotte Rusty Harold writes:

 > The current no-args XMLReaderFactory.createXMLReader() method fails if
 > the org.xml.sax.driver system property isn't set. Parser vendors are
 > supposed to replace this method with one of their own that returns a
 > known XMLReader class, but few do. I propose a change to the code for
 > this method in the default distribution that searches through a list of
 > known parser classes in order until it finds one. e.g. after failing to
 > find org.xml.sax.driver, something like the following code could be
 > executed:

I find that I already do something like this in application code most of
the time I write something SAX-based.

If we wanted to do this, it would be better to make it generic -- i.e.
add a function that takes an array of property names and returns a
parser from the first one that succeeds. That way, we'd avoid any
political problems about which one we put first on the list (why isn't
MSXML first?), and make it easier for application writers to customize
for their own environments.

All the best,

David

-- 
David Megginson
da...@me...
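A minimal Java sketch of the generic helper suggested above, assuming the "array of property names" is read as an array of parser class names. `ParserPicker` and `firstAvailable` are hypothetical names and are not part of SAX; the only SAX call used is `XMLReaderFactory.createXMLReader(String)`.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

/** Hypothetical helper; not part of the SAX distribution. */
@SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in recent JDKs
final class ParserPicker {

    private ParserPicker() {}

    /**
     * Tries each candidate class name in order and returns the first
     * XMLReader that can be instantiated. The caller supplies the list,
     * so the helper itself never ranks parsers.
     */
    static XMLReader firstAvailable(String[] classNames) throws SAXException {
        SAXException last = null;
        for (String name : classNames) {
            try {
                return XMLReaderFactory.createXMLReader(name);
            } catch (SAXException e) {
                last = e; // remember the failure and try the next candidate
            }
        }
        if (last != null) {
            throw last; // every candidate failed; report the last failure
        }
        throw new SAXException("No candidate parser class names were supplied");
    }
}
```

An application could call `ParserPicker.firstAvailable(new String[] {"org.apache.xerces.parsers.SAXParser", "gnu.xml.aelfred2.XmlReader"})` with whatever order suits its own environment.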
From: David B. <da...@pa...> - 2001-10-29 19:34:59
|
> The current no-args XMLReaderFactory.createXMLReader() method fails if
> the org.xml.sax.driver system property isn't set. Parser vendors are
> supposed to replace this method with one of their own that returns a
> known XMLReader class, but few do.

That'd be a bug in the parser distro, yes? And the text doesn't say
"replace"; there's some additional behavior now defined.

http://sax.sourceforge.net/apidoc/org/xml/sax/helpers/XMLReaderFactory.html#createXMLReader()

> I propose a change to the code for
> this method in the default distribution that searches through a list of
> known parser classes in order until it finds one

What's wrong about expecting parser distros to do their jobs?
I think that having the default distro "pick winners" is wrong.

- Dave
From: Elliotte R. H. <el...@me...> - 2001-10-30 00:18:21
|
>That'd be a bug in the parser distro, yes? And the text doesn't say "replace";
>there's some additional behavior now defined.
>
>http://sax.sourceforge.net/apidoc/org/xml/sax/helpers/XMLReaderFactory.html#createXMLReader()
>

It's not at all clear that this behavior is a bug. The documentation
says "should", not "must".

>What's wrong about expecting parser distros to do their jobs?
>I think that having the default distro "pick winners" is wrong.
>

Some parser vendors, notably Apache and possibly others, are
extremely uncomfortable modifying libraries they take from
elsewhere. Among other things, this means every time SAX is
revved, they have to rev Xerces. It is proving very difficult to
convince the Apache XML Project to change anything in SAX
even though SAX explicitly requests that parser vendors do so.

The default distro only picks a winner if a parser refuses to declare
itself the winner. This is a strong kick in the pants to parser vendors
to do what they're supposed to be doing. :-)

-- 
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | el...@me...            | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+
From: David B. <da...@pa...> - 2001-10-31 21:47:17
|
> >That'd be a bug in the parser distro, yes? And the text doesn't say "replace";
> >there's some additional behavior now defined.
> >
> > http://sax.sourceforge.net/apidoc/org/xml/sax/helpers/XMLReaderFactory.html#createXMLReader()
>
> It's not at all clear that this behavior is a bug.

If distros don't set up any of the configuration mechanisms,
it's hard to see that they've done a core part of their job.
Users call that sort of thing a "bug", not a "feature".

> The documentation says "should", not "must".

There's also the META-INF/services/org.xml.sax.driver
mechanism, which covers many of the other cases. Not all;
which is why the "should". I notice that the Apache project
just tweaked its JAXP code to depend on the META-INF
and omit the last-ditch default -- treating that as a "should",
not a "must" (arguably contrary to the JAXP spec, but it's
rather vague on this and other details).

> >What's wrong about expecting parser distros to do their jobs?
> >I think that having the default distro "pick winners" is wrong.
>
> Some parser vendors, notably Apache and possibly others, are
> extremely uncomfortable modifying libraries they take from
> elsewhere. Among other things, this means every time SAX is
> revved, they have to rev Xerces.

That's an argument against bundling software from elsewhere,
and not a response to that "picking winners" issue. But it's
a lost battle, since Xerces doesn't define SAX (or DOM, or
JAXP, or ...).

Though I certainly see how it extends to arguing against
compiled-in defaults -- for all that some bootstrapping
problems have no other solution.

This is a situation where a preprocessor would help Java be
a better "systems programming" platform. Setting such defaults
is normally a build parameter, but Java fights that kind of
engineering process.

> It is proving very difficult to
> convince the Apache XML Project to change anything in SAX
> even though SAX explicitly requests that parser vendors do so.

Heck, they've not yet merged the SAX2 r2pre2 release. And it
took pretty much forever to merge the SAX2 r2pre(1) release ...
they'd be better off even just using more current stuff, even without
picking some particular "winner".

- Dave
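For readers unfamiliar with the META-INF/services/org.xml.sax.driver mechanism mentioned above, here is a rough Java sketch of how such a resource could be read. `DriverServiceFile` and its methods are hypothetical names; skipping blank lines and `#` comments follows the general JAR services convention and is an assumption about how a given SAX distribution treats the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper illustrating the services-file lookup. */
final class DriverServiceFile {

    static final String RESOURCE = "META-INF/services/org.xml.sax.driver";

    private DriverServiceFile() {}

    /** Returns the first non-blank, non-comment line, or null if there is none. */
    static String readClassName(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash); // drop trailing comment
                }
                line = line.trim();
                if (!line.isEmpty()) {
                    return line;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks the resource up on the given class loader; null if it is absent. */
    static String lookup(ClassLoader loader) {
        InputStream in = loader.getResourceAsStream(RESOURCE);
        return in == null ? null : readClassName(in);
    }
}
```

A parser jar that ships this resource names its own XMLReader class in it, so nothing in the shared SAX code has to hardwire an implementation.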
From: Edwin G. <ed...@su...> - 2001-11-01 00:24:48
|
David Brownell wrote:
>
> > >That'd be a bug in the parser distro, yes? And the text doesn't say "replace";
> > >there's some additional behavior now defined.
> > >
> > > http://sax.sourceforge.net/apidoc/org/xml/sax/helpers/XMLReaderFactory.html#createXMLReader()
> >
> > It's not at all clear that this behavior is a bug.
>
> If distros don't set up any of the configuration mechanisms,
> it's hard to see that they've done a core part of their job.
> Users call that sort of thing a "bug", not a "feature".
>
> > The documentation says "should", not "must".

I'm not sure I totally understand this issue and I can't access the URL
right now b/c of network problems.

>
> There's also the META-INF/services/org.xml.sax.driver
> mechanism, which covers many of the other cases. Not all;
> which is why the "should". I notice that the Apache project
> just tweaked its JAXP code to depend on the META-INF
> and omit the last-ditch default -- treating that as a "should",
> not a "must" (arguably contrary to the JAXP spec, but it's
> rather vague on this and other details).

Regarding the fallback implementation in the common Apache JAXP
javax.xml.* code, it was removed so that we could avoid a hardwired
value in the javax.xml.* code which specifies a specific implementation.
This is b/c some people objected to specifying a particular value. It seemed
that relying on the META-INF/services mechanism would be enough.
However, unfortunately, it appears that this does not work right while
running as an applet in NS 4.7. I haven't investigated it fully b/c I
didn't have time. So we are back to having to hardwire a fallback
implementation and having multiple versions of code for each parser
implementation to handle the applet case, until this can be resolved.
(Any info here would help.) BTW, I agree the JAXP spec is vague in
places so I've been trying to implement something reasonable in those
cases.

I'm leaning towards the view that SAX code should not pick a parser in
XMLReaderFactory. This is similar to the way the JAXP javax.xml.parsers
code works. The most up-to-date source is kept in the xml-commons
module at Apache and it is implementation neutral. When the code gets
copied to a particular parser implementation, the fallback is chosen at
that time. So if I understand the problem correctly, this is what SAX
currently does, but no one bothers to reimplement XMLReaderFactory
to specify their own implementation by default.

One thing that could encourage implementors to specify a default is to
make it easy for them to do so. If I recall, the code in XMLReaderFactory
does not make it easy. The code in javax.xml.parsers.SAXParserFactory
has a place to put the fallback implementation class name.

> Heck, they've not yet merged the SAX2 r2pre2 release. And it
> took pretty much forever to merge the SAX2 r2pre(1) release ...
> they'd be better off even just using more current stuff, even without
> picking some particular "winner".

Yes, I agree the SAX code in Xerces needs to be synced up. I believe
there were some objections by some people to doing this under the false
impression that the new code would break existing apps. I think it will
take someone to just go and do it. I'll try and get to it, if no one
else does.

-Edwin
From: Elliotte R. H. <el...@me...> - 2001-11-01 17:39:08
|
Here's another proposal to work around the lack of build-time
configuration in Java: what if we deliberately broke the source code
for XMLReaderFactory.createXMLReader()? For instance, what if it
contained a line like the following:

    // parser lookup failed; fall back to a known class
    return new PUT YOUR OWN CONSTRUCTOR HERE!;

We would break everybody's build unless they did the right thing.

A slightly less drastic variation of this that did not break builds would be:

    public XMLReader createXMLReader() {

      /* commented out code to create a new XMLReader... */
      throw new UnsupportedOperationException(
       "This parser does not provide no-args XMLReader");

    }
From: Elliotte R. H. <el...@me...> - 2001-11-01 17:39:14
|
>David Brownell wrote:
>>
>> > >That'd be a bug in the parser distro, yes?

Even if it is a bug in the parser distro, and even if it is Java's
fault for not providing better build-time and runtime default support,
it might still behoove us to fix it if we can. My concern is not so
much for the parser vendors as for the users who get caught up in the
parsers' bugs.

The situation I'm in right now is that I'm writing a lot of example
code for a book about using SAX (and other things). I know from past
experience that a lot of readers are going to skip right over all my
detailed warnings about how you have to set the org.xml.sax.driver
system property, and consequently waste a lot of time when they run my
programs or copy a code fragment into their own program, and it just
doesn't work. Note further that even if they do have a parser like
AElfred that gets this right, just one buggy parser somewhere in the
classpath may override the 17 correct parsers they also have.

OK, so what can I do about it?

1. I can just use org.apache.xerces.parsers.SAXParser explicitly in all
   my examples, and tell people to use Xerces. But I don't want to tie
   my book too tightly to any one parser, and I'm sure other parser
   vendors feel the same way. :-)

2. I can implement the logic myself to select from several known
   parsers. If this were shipping code instead of tutorial examples,
   that's what I'd do. However, that's a minimum of 10 extra lines of
   code per program, which both wastes paper and, much more importantly,
   obscures the new and interesting material in each program with the
   same old parser selection code.

3. I can write my own factory class in the com.macfaq.xml package, talk
   about it in Chapter 5, and use it in the rest of the book.
   Unfortunately, past experience has taught me that examples need to be
   as self-contained as possible. In particular, using classes from
   outside the current chapter is guaranteed to confuse readers.
   Computer books are not read in order from page 1 to page 1000.

What I want to happen is for XMLReaderFactory.createXMLReader() to
just work, like it's supposed to. Unfortunately, because of bad design
decisions in Java itself and in many parsers, this seems unlikely to
happen. The only central point that seems like it might be plausibly
capable of working around these flaws is SAX. It's not an ideal system,
but until Java provides system-wide system properties or build-time
defaults, this feels to me like the right solution.
From: David B. <da...@pa...> - 2001-11-01 01:07:46
|
> > There's also the META-INF/services/org.xml.sax.driver
> > mechanism, which covers many of the other cases. Not all;
> > which is why the "should". I notice that the Apache project
> > just tweaked its JAXP code to depend on the META-INF
> > and omit the last-ditch default -- treating that as a "should",
> > not a "must" (arguably contrary to the JAXP spec, but it's
> > rather vague on this and other details).
>
> Regarding the fallback implementation in the common Apache JAXP
> javax.xml.* code, it was removed so that we could avoid a hardwired
> value in the javax.xml.* code which specifies a specific implementation.
> This is b/c some people objected to specifying a particular value.

That's the same logic as me saying that the reference SAX distro
shouldn't be in the business of "picking winners".

> It seemed
> that relying on the META-INF/services mechanism would be enough.
> However, unfortunately, it appears that this does not work right while
> running as an applet in NS 4.7. I haven't investigated it fully b/c I
> didn't have time. So we are back to having to hardwire a fallback
> implementation and having multiple versions of code for each parser
> implementation to handle the applet case, until this can be resolved.

Well, multiple versions of that particular file. As I pointed out,
Java really ought to support build-time configuration mechanisms
better than it does; that's a "good" use of the C preprocessor.

> I'm leaning towards the view that SAX code should not pick a parser in
> XMLReaderFactory. This is similar to the way the JAXP javax.xml.parsers
> code works. The most up-to-date source is kept in the xml-commons
> module at apache and it is implementation neutral. When the code gets
> copied to a particular parser implementation, the fallback is chosen at
> that time. So if I understand the problem correctly, this is what SAX
> currently does, but that no one bothers to reimplement XMLReaderFactory
> to specify their own implementation by default.

More or less on the mark. The GNUJAXP distro does (with AElfred2),
but I don't know that anyone else has picked up SAX2 r2pre2 yet.

> One thing that could encourage implementors to specify a default is to
> make it easy for them to do so. If I recall, the code XMLReaderFactory
> does not make it easy.

Go back and look again at line 140, in the middle of a short block of
code that's clearly marked as DISTRO-SPECIFIC. Replace

    // className = "com.example.sax.XmlReader";

with something that's appropriate and not commented out ... ;-)
It should be easy enough to create a patch that gets automagically
applied as part of importing updates.

> Yes, I agree the SAX code in Xerces needs to be synced up. I believe
> there were some objections by some people to doing this under the false
> impression that the new code would break existing apps.

If it's got a new bug that would break correct existing apps, tell me
what it is! The SAX2 r2pre2 release isn't intended to have those.

It's possible that some existing app was relying on an existing bug,
of course, but given what changed I think that's unlikely. And such
application bugs are normally not reasons to back out bugfixes.

> I think it will
> take someone to just go and do it. I'll try and get to it, if no one
> else does.

Excellent! Let me know if any problems turn up. (You might want
to grab the latest from the 'sax2r2' CVS branch; I think a couple
files had minor changes.)

- Dave
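The lookup order this thread keeps returning to (system property, then the META-INF/services entry, then a distro-specific default) can be sketched as a small pure function. This is only an illustration of the precedence as described in the discussion, not the actual XMLReaderFactory source; `DriverLookupOrder` and `resolve` are hypothetical names.

```java
/** Hypothetical illustration of the class-name precedence; not SAX source code. */
final class DriverLookupOrder {

    private DriverLookupOrder() {}

    /**
     * Picks the XMLReader class name to load.
     *
     * @param systemProperty value of the org.xml.sax.driver property, or null
     * @param servicesEntry  name read from META-INF/services/org.xml.sax.driver, or null
     * @param distroDefault  compiled-in default chosen by a distribution, or null
     * @return the first usable name, or null if no mechanism supplied one
     */
    static String resolve(String systemProperty, String servicesEntry,
                          String distroDefault) {
        if (systemProperty != null && !systemProperty.isEmpty()) {
            return systemProperty; // an explicit user choice always wins
        }
        if (servicesEntry != null && !servicesEntry.isEmpty()) {
            return servicesEntry;  // next, whatever a parser jar advertises
        }
        return distroDefault;      // finally the "last gasp" default, possibly null
    }
}
```

In the reference distribution the third argument would effectively be null; a parser distribution that uncomments the DISTRO-SPECIFIC line is, in these terms, supplying a non-null `distroDefault`.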
From: Edwin G. <ed...@su...> - 2001-11-01 03:48:25
|
David Brownell wrote:
>
> That's the same logic as me saying that the reference SAX distro
> shouldn't be in the business of "picking winners".

Yes, I am saying that I agree with you.

> Go back and look again at line 140, in the middle of a short block of
> code that's clearly marked as DISTRO-SPECIFIC. Replace
>
>     // className = "com.example.sax.XmlReader";
>
> with something that's appropriate and not commented out ... ;-)
> It should be easy enough to create a patch that gets automagically
> applied as part of importing updates.

OK, good. I haven't looked at the new version yet.

>
> > Yes, I agree the SAX code in Xerces needs to be synced up. I believe
> > there were some objections by some people to doing this under the false
> > impression that the new code would break existing apps.
>
> If it's got a new bug, that would break correct existing apps, tell me
> what it is! The SAX2 r2pre2 release isn't intended to have those.

I am agreeing with you that the latest version of SAX2 should be
included in Xerces. I was just mentioning other people's reservations
about doing so. They may have changed their minds since. I think w/
Apache projects in particular there needs to be someone to go and
actually do the work.

>
> It's possible that some existing app was relying on an existing bug,
> of course, but given what changed I think that's unlikely. And such
> application bugs are normally not reasons to back out bugfixes.
>
> > I think it will
> > take someone to just go and do it. I'll try and get to it, if no one
> > else does.
>
> Excellent! Let me know if any problems turn up. (You might want
> to grab the latest from the 'sax2r2' CVS branch; I think a couple
> files had minor changes.)

OK, I'm not sure whether they will want a specific release or not. I'll
see if I can get this done for the next release of parser software.

-Edwin
From: David B. <da...@pa...> - 2001-11-05 00:12:52
|
> OK, so what can I do about it?

What I did was repeat many times that for older distributions, folk
will need to set that system property themselves ... and do what I
could to make sure that newer distributions don't need to have that
problem. This has been one of my pet peeves about SAX bootstrapping,
from way early on.

Unfortunately it's been a rather longstanding problem ("fails if
'-Dorg.xml.sax.driver=...' isn't set"); it will be a long time
before the problem can be just a historical footnote.

> What I want to happen is for XMLReaderFactory.createXMLReader() to
> just work, like it's supposed to.

But from what I see, it already does (in SAX2 r2pre2).

That API behaves no differently from _any_ implementation-neutral
bootstrapping framework that's reached any kind of standard status.
There's always a point where the interface spec must say that
individual distributions "Do The Right Thing Here" ... and where
distro providers are either going to do it, or be the cause of the
ensuing problems.

- Dave
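For older distributions, one way an application can set the system property itself without overriding a user's `-Dorg.xml.sax.driver=...` choice is sketched below. `DriverProperty` and `ensure` are hypothetical names, and the fallback class name is whatever the application author picks.

```java
/** Hypothetical helper; sets org.xml.sax.driver only when it is not already set. */
final class DriverProperty {

    static final String KEY = "org.xml.sax.driver";

    private DriverProperty() {}

    /**
     * Leaves an existing setting untouched; otherwise installs the given
     * fallback class name. Returns the value that is in effect afterwards.
     */
    static String ensure(String fallbackClassName) {
        String current = System.getProperty(KEY);
        if (current == null || current.isEmpty()) {
            System.setProperty(KEY, fallbackClassName);
            return fallbackClassName;
        }
        return current;
    }
}
```

Calling `DriverProperty.ensure("org.apache.xerces.parsers.SAXParser")` before the first `XMLReaderFactory.createXMLReader()` call is one possible use; the named class still has to be on the classpath.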
From: David B. <da...@pa...> - 2001-11-05 00:12:52
|
> what if we deliberately broke the source code
> for XMLReaderFactory.createXMLReader()? For instance what if it
> contained a line like the following:
>
>     // parser lookup failed; fall back to a known class
>     return new PUT YOUR OWN CONSTRUCTOR HERE!;

That would make the reference distro become unusable, and hence
unmaintainable. Same as for the second variation (below).

The current distro (SAX2 r2pre) is easy to patch to add a
"last gasp" default. And it maintains backward compatibility
("-Dorg.xml.sax.driver" or "-Dorg.xml.sax.parser") while fully
supporting META-INF/services/org.xml.sax.driver resources
for a more "hands off" mechanism.

- Dave

> A slightly less drastic variation of this that did not break builds would be:
>
>    public XMLReader createXMLReader() {
>
>      /* commented out code to create a new XMLReader... */
>      throw new UnsupportedOperationException(
>       "This parser does not provide no-args XMLReader");
>
>    }
>
>