RE: [patch 0/6] Support for malformed documents handling v4

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

You can find the updated patch in devel-jms-xpdf-error branch of pdfedit project. The only thing added is a wrapper around xpdf Parser.

michal, you can commit it this way.

jozo

> Date: Fri, 4 Apr 2008 15:22:57 +0200
> From: ms...@gm...
> To: mis...@ho...
> CC: pdf...@li...
> Subject: Re: [patch 0/6] Support for malformed documents handling v4
>
> On Thu, Apr 03, 2008 at 05:32:39PM +0000, Jozef Misutka wrote:
>>>> hi,first i think this is an important working patch. but i suggest
>>>> small improvements. i can imagine michal had really big time consuming
>>>> problems with xpdf patching so i propose small unintrusive changes
>>>> before my ack.
>>>> 1) wrap xpdf Parserwrapper would just delegate calls to
>>>> xpdf Parser and will do the error handling once and throw if
>>>> necessary.
>>
>> you did not understand... this is a very small example:
>>
>> namespace xpdf
>>{
>> struct ParserWrapper
>> {
>> ...
>> Parser* parser;
>> ...
>> void getObj(Object& obj)
>> {
>> if (0 != parser->getObj (obj))
>> throw;
>> }
>> ...
>> }
>>}
>>
>
> Of course I did understand but as I have mentione in previous email (see
> bellow) xpdf code uses Parser also for paths which are called from our
> code (e.g. Gfx). Doing it in two ways is not very good idea.
> And besides that. We don't want to throw from Parser::getObj anyway just
> because of simple reason - cleanup. There are many places when we do
> something like
>
> parser->getObj(obj1);
> ...
> parser->getObj(obj2);
> ...
> parser->getObj(obj3);
>
> // do something with obj1, obj2 and obj3
>
> and if there is some problem we have to do cleanup:
>
> parser->getObj(obj1);
> ...
> if(!parser->getObj(obj2))
> obj1.free();
> ...
> if(!parser->getObj(obj3))
> {
> obj1.free();
> obj2.free();
> }
>
> So even if exception is thrown, we need to deallocate them. Of course we
> could use same kind of smart pointers but then I don't see a point in
> this change because we would need more code than to check getObj calls.
>
>>
>>> I am not sure about this change, because it is rather intrusive. Xpdf
>>> code is deigned to ignore errors as much as possible and we would need
>>> to do deepreview whether it won't cause some regression. Parser is not
>>> a place to handle errors. It simply doesn't have enough information to
>>> qualify severity. XRef is the place and we already have wrapper for it
>>> (CXref which handles such errors in this patch).
>>
>>
>>>> + we do not have to worry about forgetting return values of
>>>> Parser->getObj
>>>> + this wrapper could check the objects in future
>>>
>>> How?
>>>>
>>>> + if we decide to reimplement parser (ehm :)) then no change to our
>>>> code would be necessary(+ this wrapper could be easier to debug)
>>>
>>> We have only few places in our code, where parser is used (CXref and
>>> some content stream stuff)
>>
>>>> + mainly just a type change needed in current kernel comparing to
>>>> checking values at each call to parser
>
>>> Don't forget about xpdf code (we should not use different parser in
>>> ourcode than xpdf because kernel code path can lead indirectly to
>>> usingxpdf parser anyway and we really don't want to have some surprises,
>>> do we?).
>
>>>> 2) reimplement void CDECL error(int pos, const char *msg, ...) {...}
>>>> if it is really an error - simply throw, otherwise it is not an error
>>>> and should be removed from xpdf (this is a very small change)
>
>>> This is not possible, because intention for this function is to
>>> *report* errors not to *handle* them. Code calling this function doesn't
>>> expectthat it throws. I have seen some messages without noticing any
>>> problem.
>
>>>> 3) CXref should overload XRef::setErrCode(int err)> and do something
>>>> reasonable?What? It simply set code to prevent from direct
>>>> manipulation witherrCode. CXref as our wrapper handles this value
>>>> sanely for us.
>>
>> does CXref check the error code?
>
> Of course it does. Check the patch.
>
>>>> ok, this is really crap (which is
>>>> totally normal in xpdf) but we could make a global Error object which
>>>> would accessed by both setErrCode() and error()
>>>
>>> Don't get it. Why (error is only simple wrapper for printf)?
>>
>> yep, and that is clearly not enough in my opinion.
>
> There are 600+ places where this function is used. This is way too much
> to be worth of chaning and checking whether it can produce some problem.
> error is simply designed to be printf wrapper and I don't see a point
> why to change it and risk regressions.
>
>> we should somehow distinguish between errors.
>
> This is really hard. IMO we can only do some compromise and focus on the
> most hot paths (Parser, XRef)
>
>>>> michal, you know this code far better so please state your opinion...
>
>>> Thanks for review and suggestions, but they are rather intrusive to
>>> thecode. We can discuss them further but my position to the xpdf code
>>> is*don't touch as much as possible* for basically two reasons:later
>>> merging with upstream which is necessary (we are not able to fix
>>> security problems) and that this code is really fragile and there are
>>> many places which expect some behavior which is not obvious.
>
> --
> Michal Hocko

_________________________________________________________________
Use video conversation to talk face-to-face with Windows Live Messenger.
http://www.windowslive.com/messenger/connect_your_way.html?ocid=TXT_TAGLM_WL_Refresh_messenger_video_042008