It would be very useful if Expat reported skipped
entities, like in the SAX2 specification.
I have identified the following situations for that:
A) External Entities are reported as skipped:
- if no external entity ref handler is set
- if the entity ref handler returns a special value
(e.g. we can define 2 as meaning: "skip this one")
B) Internal Entities are reported as skipped:
- SetDefaultHandler was called (which turns off
expansion of internal general entities)
C) Any entity reference is reported as skipped
- if no declaration is found & that is not an error
(otherwise return a well-formedness error)
Karl
Logged In: YES
user_id=290026
I propose the following signature for the handler:
enum XML_Skip_Reason {
XML_SKIP_UNDEFINED,
XML_SKIP_NOHANDLER,
XML_SKIP_REQUESTED
};
typedef void (*XML_SkippedEntityHandler)
(void *userData,
const XML_Char *entityName,
int is_parameter_entity,
const XML_Char *systemId,
const XML_Char *publicId,
enum XML_Skip_Reason skipReason);
where the values of skipReason have the following meanings:
- XML_SKIP_UNDEFINED: entity was skipped because no
declaration was found, and this was not an error
- XML_SKIP_NOHANDLER: entity was skipped because there was
no ExternalEntityRefHandler installed
- XML_SKIP_REQUESTED: the ExternalEntityRefHandler returned
a value of 2, which means the handler requested the
entity to be skipped
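To make the proposal concrete, a handler written against this signature
might dispatch on skipReason roughly as follows. This is only a sketch of
the interface as proposed in this comment, not of any released Expat API.
XML_Char is defined locally as char so the fragment stands alone, and
describe_skip and on_skipped_entity are hypothetical names.

```c
#include <stdio.h>

/* Assumption: stand-in for Expat's XML_Char so the sketch compiles alone. */
typedef char XML_Char;

/* Enum exactly as proposed in this comment. */
enum XML_Skip_Reason {
  XML_SKIP_UNDEFINED,
  XML_SKIP_NOHANDLER,
  XML_SKIP_REQUESTED
};

/* Hypothetical helper: map a skip reason to a short description. */
static const char *describe_skip(enum XML_Skip_Reason reason)
{
  switch (reason) {
  case XML_SKIP_UNDEFINED: return "no declaration found";
  case XML_SKIP_NOHANDLER: return "no ExternalEntityRefHandler installed";
  case XML_SKIP_REQUESTED: return "skip requested by ExternalEntityRefHandler";
  }
  return "unknown reason";
}

/* Hypothetical handler matching the proposed XML_SkippedEntityHandler type.
   userData is assumed to point at an int that counts skipped entities. */
static void on_skipped_entity(void *userData,
                              const XML_Char *entityName,
                              int is_parameter_entity,
                              const XML_Char *systemId,
                              const XML_Char *publicId,
                              enum XML_Skip_Reason skipReason)
{
  int *count = (int *)userData;
  ++*count;
  fprintf(stderr, "skipped %s entity '%s' (%s), systemId=%s, publicId=%s\n",
          is_parameter_entity ? "parameter" : "general",
          entityName,
          describe_skip(skipReason),
          systemId ? systemId : "(none)",
          publicId ? publicId : "(none)");
}
```

For XML_SKIP_UNDEFINED there is no declaration, so systemId and publicId
would presumably be NULL, which is why the sketch guards both before
printing them.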
I hope this makes sense. Comments welcome!
Karl
Logged In: YES
user_id=3066
This feature description and proposed callback interface
sound good to me. We might want to think about how such a
handler would interact with (or be combined with) a handler
so that defined general entities (including "standard" ones
like &lt; and friends) can be reported, for applications
that need to produce output with minimal changes. (This is
commonly needed if the output is going to land in front of a
human rather than another processing tool.)
Let's target this for 1.95.4. Assigning to Karl since he's
indicated specific interest. ;-)
Logged In: YES
user_id=290026
Thinking some more about it, I believe that the signature
I proposed is overkill, and we can get away with this:
typedef void (*XML_SkippedEntityHandler)
(void *userData,
const XML_Char *entityName,
int is_parameter_entity);
Reasons:
In the old proposal there were two cases when PublicId
and SystemId would have been reported:
1) The application decided to skip the entity and passed
a return value of 2 from the ExternalEntityRefHandler
2) No ExternalEntityRefHandler was installed
I think neither of them needs a skippedEntityHandler,
because:
For 1) It is not particularly useful if the application
code in the ExternalEntityRefHandler delegates the
skip-notification back to Expat. This can be done directly
from the handler at least as easily and efficiently, and
Expat itself does not need this information, since the
very fact of nothing being parsed is all that is important
to it.
For 2) If no ExternalEntityRefHandler is installed, then why
install a skippedEntityHandler? They would have
essentially the same signature, and in the end that would
mean the same as in 1) - telling Expat we want to skip the
entity. Again, that can already be easily achieved with the
existing API.
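To illustrate cases 1) and 2): an ExternalEntityRefHandler can decline to
parse the entity and do its own bookkeeping, without Expat calling back.
The following is a rough sketch only. The typedefs are local stand-ins
mirroring expat.h so the fragment compiles alone, SkipLog and
skip_external_entity are hypothetical names, and it assumes the
application passed its SkipLog to XML_SetExternalEntityRefHandlerArg, so
that the first argument is that pointer rather than the parser.

```c
#include <stdio.h>

/* Assumptions: local stand-ins mirroring expat.h declarations. */
typedef char XML_Char;
struct XML_ParserStruct;
typedef struct XML_ParserStruct *XML_Parser;

/* Hypothetical application state: which external entities were skipped. */
#define MAX_SKIPPED 16
typedef struct {
  int count;
  char systemIds[MAX_SKIPPED][128];
} SkipLog;

/* Hypothetical ExternalEntityRefHandler that skips every external entity
   and records the skip itself. The first argument is assumed to be the
   SkipLog registered via XML_SetExternalEntityRefHandlerArg. */
static int skip_external_entity(XML_Parser parser,
                                const XML_Char *context,
                                const XML_Char *base,
                                const XML_Char *systemId,
                                const XML_Char *publicId)
{
  SkipLog *log = (SkipLog *)parser;
  (void)context;
  (void)base;
  (void)publicId;

  if (log->count < MAX_SKIPPED) {
    /* snprintf truncates over-long identifiers and NUL-terminates. */
    snprintf(log->systemIds[log->count], sizeof log->systemIds[0],
             "%s", systemId ? systemId : "");
    log->count++;
  }
  return 1; /* nonzero: not an error, the main document keeps parsing */
}
```

Nothing is parsed for the entity, which is all Expat needs to know; the
application already has the system and public identifiers in hand.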
So, which events then remain that would require
a skippedEntityHandler? Only when entity refs are
encountered for which no declaration was read, *and*
when this is not an error.
Now, as far as Fred's suggestion of combining this
with some InternalEntityRefHandler is concerned:
In that case we should also report the entity value.
Would we not be mixing two different problems here?
Karl
Logged In: YES
user_id=290026
I forgot case B) from the initial request.
This would, of course, still be valid,
but would also not require more than
the simple callback interface I proposed.
Karl
Logged In: YES
user_id=290026
Have a look at patch # 559910, where the latest, simplified
proposal is implemented.
Karl
Logged In: YES
user_id=13222
The longer I think about it, the more I believe it
was a mistake to change the reporting of undeclared
entities along the lines described in bug 544679 without
also adding a skippedEntity handler.
(I already mentioned my objection in the discussion of
544679, but maybe I wasn't loud enough.)
Please consider adding the skippedEntity handler, as
described by Karl.
Without a skippedEntity handler, it isn't possible to
detect a mistyped internal entity if your document has an
external subset or external parameter entities, even if you
parse all external entities.
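As a rough sketch of what I mean, using the simplified three-argument
signature Karl proposed: the handler only records general entity
references that had no declaration, and the application decides after
parsing whether that is an error. XML_Char is a local stand-in for the
expat.h type, and EntityCheck and note_skipped_entity are hypothetical
names.

```c
#include <stdio.h>

/* Assumption: stand-in for Expat's XML_Char so the sketch compiles alone. */
typedef char XML_Char;

/* Hypothetical application state for spotting mistyped entity names. */
typedef struct {
  int undeclared_refs;  /* number of skipped general entity references */
  char first_name[64];  /* name of the first one, for the error message */
} EntityCheck;

/* Hypothetical handler with the simplified signature. */
static void note_skipped_entity(void *userData,
                                const XML_Char *entityName,
                                int is_parameter_entity)
{
  EntityCheck *check = (EntityCheck *)userData;

  if (is_parameter_entity)
    return; /* only general entity references are of interest here */

  if (check->undeclared_refs == 0) {
    /* snprintf truncates over-long names and NUL-terminates. */
    snprintf(check->first_name, sizeof check->first_name, "%s", entityName);
  }
  check->undeclared_refs++;
}
```

An application that has parsed all external entities could then treat
undeclared_refs > 0 as its own error and report first_name to the user.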
This may break existing applications (well, it breaks at
least one of mine), and should have been mentioned in the
announcement (even if the new behaviour is more correct,
according to the _letters_ of the XML rec.)
And I think it was a bad idea to fix 544679 without adding
a skippedEntity handler at the same time.
rolf
Logged In: YES
user_id=290026
Rolf,
stop twisting my arm - I checked the patch in. :-)
It may be necessary to make changes to it
when we add the InternalEntityRefHandler.
Karl
Logged In: YES
user_id=3066
Closed since this has already been checked in. If it needs
tweaking, that's either a bug report or a request for more
performance or whatever (a feature request). Since this
doesn't seem like a performance-relevant feature, I'm not
going to expect the latter.