Thread: [Sax-devel] setEntityResolver(null)
Brought to you by:
dmegginson
From: Elliotte R. H. <el...@me...> - 2003-03-31 19:18:41
|
XOM user Nils Kilden-Pedersen has reported a bug that his parser is throwing a NullPointerException because XOM calls setEntityResolver(null) at one point inside the Builder class (which sits on top of a SAX parser). I'm still trying to determine exactly which SAX parser he's using (probably Xerces or Crimson), and I can probably work around this, but in the meantime he has raised an issue about the proper behavior of SAX. It has previously been my understanding that you can remove an EntityResolver, and the various other listeners set on a XMLReader by passing null to the appropriate setter method such as setEntityResolver. The SAX API documentation does not justify this, but it doesn't say it's prohibited either. See http://www.saxproject.org/apidoc/org/xml/sax/XMLReader.html#setEntityResolver(org.xml.sax.EntityResolver) However, my user pointed out to me that the version of the SAX API documentation included with J2EE does state that this method throws a NullPointerException. See http://java.sun.com/j2ee/sdk_1.3/techdocs/api/org/xml/sax/XMLReader.html#setEntityResolver(org.xml.sax.EntityResolver) I am requesting discussion and official resolution of whether or not its acceptable to pass null to setEntityResolver to remove any currently installed EntityResolver. Personally, since there's no other way to remove an EntityResolver without providing a new one, I think this is necessary. However, whatever the decision I would like to ask that the answer be explicitly placed in the official SAX docs. I think we probably also need to ask and answer this question for the other setter methods: setContentHandler, setDTDHandler, etc. and perhaps also for the extensions such as LexicalHandler and DeclHandler that are set via properties. -- +-----------------------+------------------------+-------------------+ | Elliotte Rusty Harold | el...@me... | Writer/Programmer | +-----------------------+------------------------+-------------------+ | Processing XML with Java (Addison-Wesley, 2002) | | http://www.cafeconleche.org/books/xmljava | | http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA | +----------------------------------+---------------------------------+ | Read Cafe au Lait for Java News: http://www.cafeaulait.org/ | | Read Cafe con Leche for XML News: http://www.cafeconleche.org/ | +----------------------------------+---------------------------------+ |
From: Miles S. <mi...@mi...> - 2003-04-01 10:39:08
|
Elliotte Rusty Harold wrote, > It has previously been my understanding that you can remove an > EntityResolver, and the various other listeners set on a XMLReader by > passing null to the appropriate setter method such as > setEntityResolver. The SAX API documentation does not justify this, > but it doesn't say it's prohibited either. See <snip/> It's always been my understanding that the argument to setEntityResolver should be non-null: it's strongly suggested by "Applications may register a *new* or *different* resolver" ... tho' I agree that this could just about be stretched to cover null as well. On the whole I think that the elaboration in the J2SE version of the SAX javadoc is reasonable. > I am requesting discussion and official resolution of whether or not > its acceptable to pass null to setEntityResolver to remove any > currently installed EntityResolver. Personally, since there's no > other way to remove an EntityResolver without providing a new one, I > think this is necessary. <snip/> Not so. It'd be quite possible to define NonResolvingEntityResolver and/or DefaultEntityResolver implementors of the EntityResolver{2} interface with the obvious semantics and have applications pass instances of those to setEntityResolver. I think the question would then be whether or not those implementations should be part of the standard SAX distribution. I don't like the idea of using null for this job for a couple of reasons. First, it breaks existing implementations, as you've observed. Second, it's not immediately clear whether it's supposed to mean "Never resolve external entities" or "Resolve external entities in the default way" ... I couldn't work out which behaviour you were expecting from the description in your post. Of course, this could be specified in the javadoc for setEntityResolver, but I think that even then the interface would be confusion prone. Given the security issues that the default entity resolution policy can have in some applications, I think any such confusion should be avoided if possible. Using instances of EntityResolver{2} implementors that have descriptive names along the lines of NonResolvingEntityResolver and DefaultEntityResolver helps here. It also provides a convenient and very visible place in the javadoc to discuss the semantics and their implications (ie. the class-level javadoc for each of the implementating classes rather than the method-level javadoc for setEntityResolver). Cheers, Miles |
From: Karl W. <ka...@wa...> - 2003-04-01 14:19:49
|
> > I am requesting discussion and official resolution of whether or not > > its acceptable to pass null to setEntityResolver to remove any > > currently installed EntityResolver. Personally, since there's no > > other way to remove an EntityResolver without providing a new one, I > > think this is necessary. > <snip/> > > Not so. > > It'd be quite possible to define NonResolvingEntityResolver and/or > DefaultEntityResolver implementors of the EntityResolver{2} interface > with the obvious semantics and have applications pass instances of > those to setEntityResolver. That would only make sense if it should never be null, but the default value is Null, so I have to agree with Elliott on that. All he wants is the ability to reset to the initial value and behaviour, if I understand him correctly. > I don't like the idea of using null for this job for a couple of > reasons. First, it breaks existing implementations, as you've observed. What percentage of them? Is there a consistent pattern? Most SAX implementation on top of Expat would allow Null. > Second, it's not immediately clear whether it's supposed to mean "Never > resolve external entities" or "Resolve external entities in the default > way" ... I couldn't work out which behaviour you were expecting from > the description in your post. It would make most sense to go back to the initial/default behaviour. For non-resolving one can then install a NonResolvingEntityResolver, unless that is already the default behaviour. > Of course, this could be specified in the javadoc for setEntityResolver, > but I think that even then the interface would be confusion prone. > Given the security issues that the default entity resolution policy can > have in some applications, I think any such confusion should be avoided > if possible. Using instances of EntityResolver{2} implementors that > have descriptive names along the lines of NonResolvingEntityResolver > and DefaultEntityResolver helps here. It also provides a convenient and > very visible place in the javadoc to discuss the semantics and their > implications (ie. the class-level javadoc for each of the > implementating classes rather than the method-level javadoc for > setEntityResolver). I think your comments are sensible, but would require more changes to the spec - i.e. disallowing Null for the initial value and pre-defining a NonResolvingEntityResolver and DefaultEntityResolver, at least for consistency reasons. Karl |
From: Jeff R. <jef...@de...> - 2003-04-01 17:05:20
|
Just for the record this has come up before-- oddly enough by Elliotte... : ) "Unregistering callback interfaces from XMLReader" Nov, 2001 http://sourceforge.net/mailarchive/message.php?msg_id=80822 That was when the doucmentation was wrong... now, the documentation is "Fixed". Originally it seemed as though David maintained the same view that Karl is today. setEntityResolver(null) seems natural to me as well... and it may just be that the documentation needs one more clarifying sentence. Cheers, Jeff Rafter Defined Systems http://www.defined.net XML Development and Developer Web Hosting |
From: Miles S. <mi...@mi...> - 2003-04-01 17:53:07
|
Jeff Rafter wrote, > That was when the doucmentation was wrong... now, the documentation > is "Fixed". Originally it seemed as though David maintained the same > view that Karl is today. setEntityResolver(null) seems natural to me > as well... and it may just be that the documentation needs one more > clarifying sentence. Umm ... so originally the spec _required_ that a null arg result in an NullPointerException being throw, and now you want to _require_ that a null arg result in default behaviour being restored? Even if this is the proposed behaviour is sensible (I have my doubts, but what the hell) this about turn is going to break things. Cheers, Miles |
From: Miles S. <mi...@mi...> - 2003-04-02 14:00:14
|
Karl Waclawek wrote, > That would only make sense if it should never be null, but the > default value is Null, so I have to agree with Elliott on that. All > he wants is the ability to reset to the initial value and behaviour, > if I understand him correctly. I don't think the default value _is_ null. The default is what you get when setEntityResolver isn't invoked at all. It's possible that internally an implementation might have a field which is null in this case, but that's an implementation detail. > > I don't like the idea of using null for this job for a couple of > > reasons. First, it breaks existing implementations, as you've > > observed. > > What percentage of them? Is there a consistent pattern? > Most SAX implementation on top of Expat would allow Null. Well, at some point the JAXP spec acquired a "throws NullPointerException" clause on setEntityResolver, and that's reflected in the the current J2SE Javadoc, and presumably in the current J2SE implementation. So that accounts for a significant percentage. To be perfectly honest I'm not sure if JAXP did something wrong here or if the original SAX2 spec specified the throws clause and it was removed retrospectively. Some of the comments in this thread suggest the latter, and that jibes with my (quite possibly mistaken) recollection that there was always a presumption against nulls even if it wasn't properly documented or consistently enforced. Whatever, JAXP 1.3 (aka JSR-206 [1]) is just about to kick off, so now would be a *very* good time to try and sort this out. > I think your comments are sensible, but would require more changes > to the spec - i.e. disallowing Null for the initial value and > pre-defining a NonResolvingEntityResolver and DefaultEntityResolver, > at least for consistency reasons. Actually, my proposal doesn't require any changes to the spec beyond the reinstatement of the throws clause. It'd be desirable, but absolutely _not_ necessary to provide implementations of the two resolvers as part of the SAX distribution, because they can both be defined by end users. Cheers, Miles [1] http://www.jcp.org/en/jsr/detail?id=206 |
From: Karl W. <ka...@wa...> - 2003-04-02 14:32:47
|
> Karl Waclawek wrote, > > That would only make sense if it should never be null, but the > > default value is Null, so I have to agree with Elliott on that. All > > he wants is the ability to reset to the initial value and behaviour, > > if I understand him correctly. > > I don't think the default value _is_ null. The default is what you get > when setEntityResolver isn't invoked at all. It's possible that > internally an implementation might have a field which is null in this > case, but that's an implementation detail. getEntityResolver returns Null, if setEntityRresolver was never invoked. > > I think your comments are sensible, but would require more changes > > to the spec - i.e. disallowing Null for the initial value and > > pre-defining a NonResolvingEntityResolver and DefaultEntityResolver, > > at least for consistency reasons. > > Actually, my proposal doesn't require any changes to the spec beyond the > reinstatement of the throws clause. It'd be desirable, but absolutely > _not_ necessary to provide implementations of the two resolvers as part > of the SAX distribution, because they can both be defined by end users. Well, assuming you want to return to the default behaviour you set the defaultEntityResolver. Now, calling getEntityResolver returns the defaultEntityResolver, while in the initial case it returned Null, for the same entity resolver. That looks inconsistent. Karl |
From: Miles S. <mi...@mi...> - 2003-04-02 14:57:51
|
Karl Waclawek wrote, > getEntityResolver returns Null, if setEntityRresolver was never > invoked. Right, and the spec says it returns, The current entity resolver, or null if none has been registered. which doesn't say that null _is_ the default entity resolver, only that a return value of null _indicates_ that the current entity resolver is the default. OK, I accept that this sounds like I'm nitpicking, and I agree that given the defn of getEntityResolver it'd be reasonable to have a null arg to setEntityResolver reset the resolver to the default. But this needs to be flagged up explicitly in the Javadoc for setEntityResolver. And we ought to say what the default policy actually _is_. DefaultHandler defines a policy reasonably clearly, but that doesn't tell us what the default behaviour is if setEntityResolver is never invoked ... it's _probably_ the same as the behaviour of DefaultHandler in almost all implementations, but there's nothing in the spec that requires this. Cheers, Miles |
From: Karl W. <ka...@wa...> - 2003-04-02 15:35:17
|
> Karl Waclawek wrote, > > getEntityResolver returns Null, if setEntityRresolver was never > > invoked. > > Right, and the spec says it returns, > > The current entity resolver, or null if none has been registered. > > which doesn't say that null _is_ the default entity resolver, only that > a return value of null _indicates_ that the current entity resolver is > the default. Yes, that is what I meant, actually. > OK, I accept that this sounds like I'm nitpicking, Yes. :-) > and I agree that > given the defn of getEntityResolver it'd be reasonable to have a null > arg to setEntityResolver reset the resolver to the default. > > But this needs to be flagged up explicitly in the Javadoc for > setEntityResolver. I agree. > And we ought to say what the default policy actually > _is_. DefaultHandler defines a policy reasonably clearly, but that > doesn't tell us what the default behaviour is if setEntityResolver is > never invoked ... it's _probably_ the same as the behaviour of > DefaultHandler in almost all implementations, but there's nothing in > the spec that requires this. Yes, strictly speaking, the behaviour when resoveEntity returns Null is not defined to be the default behaviour, although I think it is meant to be. Karl |
From: Miles S. <mi...@mi...> - 2003-04-02 16:34:58
|
Karl Waclawek wrote, > Miles Sabin wrote, > > And we ought to say what the default policy actually > > _is_. DefaultHandler defines a policy reasonably clearly, but that > > doesn't tell us what the default behaviour is if setEntityResolver > > is never invoked ... it's _probably_ the same as the behaviour of > > DefaultHandler in almost all implementations, but there's nothing > > in the spec that requires this. > > Yes, strictly speaking, the behaviour when resoveEntity returns Null > is not defined to be the default behaviour, although I think it is > meant to be. I think this is the most important issue. If the default policy is implementation defined as things stand, then the effect of specifying "a null arg resets the resolution policy to the default" on setEntityResolver amounts to saying "a null arg results in implementations defined behaviour". And I don't think that'll make anyone happy. Cheers, Miles |
From: Karl W. <ka...@wa...> - 2003-04-02 16:44:38
|
> > Yes, strictly speaking, the behaviour when resoveEntity returns Null > > is not defined to be the default behaviour, although I think it is > > meant to be. > > I think this is the most important issue. > > If the default policy is implementation defined as things stand, Doesn't it say that the SystemId has to be interpreted as an URL and must be fully resolved by the parser? > then > the effect of specifying "a null arg resets the resolution policy to > the default" on setEntityResolver amounts to saying "a null arg results > in implementations defined behaviour". And I don't think that'll make > anyone happy. But that means: you don't want implementation defined behaviour here, right? Karl |
From: Miles S. <mi...@mi...> - 2003-04-02 16:59:19
|
Karl Waclawek wrote, > > > Yes, strictly speaking, the behaviour when resoveEntity returns > > > Null is not defined to be the default behaviour, although I think > > > it is meant to be. > > > > I think this is the most important issue. > > > > If the default policy is implementation defined as things stand, > > Doesn't it say that the SystemId has to be interpreted as an URL > and must be fully resolved by the parser? Err ... that's the behaviour of the DefaultHandler if supplied to an XMLReader via setEntityResolver. What behaviour does the spec require if I do this, XMLReader reader = XMLReaderFactory.createXMLReader(); reader.setContentHandler(myHandler); reader.parse(someInputSource); Unless I'm missing something, the spec is completely silent on entity resolution policy in this (surely very common) case. Like I said, it's _probably_ true that _most_ implementations will use the same policy as the DefaultHandler, but that doesn't appear to be required by the spec. All it says is, If the application does not register an entity resolver, the XMLReader will perform its own default resolution. in the javadoc for setEntityResolver without say what that default is. > > then the effect of specifying "a null arg resets the resolution > > policy to the default" on setEntityResolver amounts to saying "a > > null arg results in implementations defined behaviour". And I don't > > think that'll make anyone happy. > > But that means: you don't want implementation defined behaviour here, > right? Yup. Cheers, Miles |
From: Karl W. <ka...@wa...> - 2003-04-02 18:26:19
|
> Karl Waclawek wrote, > > Doesn't it say that the SystemId has to be interpreted as an URL > > and must be fully resolved by the parser? > > Err ... that's the behaviour of the DefaultHandler if supplied to an > XMLReader via setEntityResolver. I am not reading the docs like that: resolveEntity for the DefaultHandler is documented like this: <quote> Always return null, so that the parser will use the system identifier provided in the XML document. This method implements the SAX default behaviour: application writers can override it in a subclass to do special translations such as catalog lookups or URI redirection. </quote> So, the default behaviour is to return Null - delegating back to the parser! And further, the docs for EntityResolver.resolveEntity state: <quote> However, SAX specifies how to interpret any InputSource returned by this method, and that if none is returned, then the system ID will be dereferenced as a URL. </quote> I would say that defines the default/built-in behaviour of the parser for any case of an external entity reference where it does not get an input source passed back from the application. > What behaviour does the spec require > if I do this, > > XMLReader reader = XMLReaderFactory.createXMLReader(); > reader.setContentHandler(myHandler); > reader.parse(someInputSource); > > Unless I'm missing something, the spec is completely silent on entity > resolution policy in this (surely very common) case. Well, I read the docs as requiring the above. Regards, Karl |
From: Miles S. <mi...@mi...> - 2003-04-03 01:27:36
|
Karl Waclawek wrote, > So, the default behaviour is to return Null - delegating back to the > parser! And further, the docs for EntityResolver.resolveEntity state: > > <quote> > However, SAX specifies how to interpret any InputSource returned by > this method, and that if none is returned, then the system ID will be > dereferenced as a URL. </quote> > > I would say that defines the default/built-in behaviour of the parser > for any case of an external entity reference where it does not get an > input source passed back from the application. OK, you're right, I'm wrong. That either means that I'm stupid or the docs need quite a bit of clarification (or both ;-). Cheers, Miles |
From: Karl W. <ka...@wa...> - 2003-04-03 02:48:31
|
> > OK, you're right, I'm wrong. > > That either means that I'm stupid or the docs need quite a bit of > clarification (or both ;-). Well, it may help you to know that originally I thought that the default behaviour was to not resolve the entity at all, and then this thread forced me to read the docs more carefully ... ;) Personally I believe that entity resolving is not a parser's job. From a programmer's view I think that separation of concerns is a good principle and should apply here too. In other words, there should not be any built-in entity resolving, but rather some separate default implementation, just the way you thought it worked. Karl |
From: Miles S. <mi...@mi...> - 2003-04-03 10:16:18
|
Karl Waclawek wrote, > Personally I believe that entity resolving is not a parser's job. > From a programmer's view I think that separation of concerns > is a good principle and should apply here too. In other words, > there should not be any built-in entity resolving, but rather some > separate default implementation, just the way you thought it worked. I would prefer this as well. But there are too many applications which rely on default entity resolution for it to be possible to declare now that the default policy is not to resolve. I think what we need is a clear and prominent statement in the Javadoc of the default policy, along with a warning along the lines of, This implies that by default an XMLReader will make outgoing network connections to resolve external entities. Consequently either you should ensure that input documents only contain trustworthy and reliable external entity URIs, or you should replace the default entity resolver with one which allows your application to have more precise control over entity resolution. maybe with a brief sketch of what "trustworthy" and "reliable" mean, and the possible consequences of attempting to resolve via an untrustworthy or unreliable URI. It probably needs to go in the class-level Javadoc for XMLReader. Cheers, Miles |