Thread: [SimpleParse] Graceful failure ?
From: H. T. H. <hi...@co...> - 2004-03-18 22:24:27
Hello,

Is there a way to get SimpleParse to exit gracefully instead of going into an infinite loop?

For example, if I have a grammar definition like this

    column_specs := '<s>',ws,('<c>',ws)+,'\n'

then the system goes into an infinite loop if the input contains '<S>' '<c>'.

How can I solve this problem?

Regards,
hth
From: Mike C. F. <mcf...@ro...> - 2004-03-18 23:08:09
H. T. Hind wrote:

>Hello,
>
>Is there a way to get Simpleparse to gracefully exit instead
>of going into an infinite loop ?
>
>For example if I have a grammar definition like this
>
>column_specs := '<s>',ws,('<c>',ws)+,'\n'
>
>then the system would go into an infinite loop if the input
>contains '<S>' '<c>'

I don't see how it could with the given definition. Infinite loops occur when you have a grammar like this:

    column_specs := '<s>',ws,('<c>'?,ws)+,'\n'

that is, where you have a group that can succeed while matching nothing at all (a null match) and you then add an outer repeating modifier. As defined, the engine allows for external parsers and the like which may, in fact, have a null-length match, so to catch this we would need to extend how mxTextTools communicates success/failure to allow for another state.

It's possible to catch it, yes, but it would require considerable work to modify the engine to support the feature.

Take care,
Mike
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/
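The null-match loop described in this message can be illustrated without SimpleParse. The following is a minimal plain-Python sketch, not the SimpleParse or mxTextTools implementation: all function names are hypothetical, and `ws` is assumed to mean zero or more spaces or tabs. It shows why repeating a group that succeeds on empty input never terminates, and what an explicit "no progress" check would look like.

```python
def match_literal(text, pos, literal):
    """Return the position after `literal` if it occurs at `pos`, else None."""
    return pos + len(literal) if text.startswith(literal, pos) else None


def match_ws(text, pos):
    """Assumed `ws`: zero or more spaces/tabs, so it can match nothing."""
    while pos < len(text) and text[pos] in " \t":
        pos += 1
    return pos


def nullable_group(text, pos):
    """('<c>'?, ws): always succeeds, possibly without consuming input."""
    after_c = match_literal(text, pos, "<c>")
    return match_ws(text, pos if after_c is None else after_c)


def strict_group(text, pos):
    """('<c>', ws): fails (None) unless a '<c>' is actually present."""
    after_c = match_literal(text, pos, "<c>")
    return None if after_c is None else match_ws(text, after_c)


def repeat(group, text, pos):
    """Repeat `group` until it fails; refuse to spin on a zero-length match."""
    while True:
        new_pos = group(text, pos)
        if new_pos is None:
            return pos
        if new_pos == pos:
            # Without this check the loop would never terminate.
            raise ValueError(f"null-length match repeated at offset {pos}")
        pos = new_pos
```

With `strict_group` the repetition stops as soon as no `'<c>'` is found. With `nullable_group` the group keeps "succeeding" at the same offset, which is the situation the guard turns into an error.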
From: H. T. H. <hi...@co...> - 2004-03-19 00:27:44
On Thu, Mar 18, 2004 at 06:08:01PM -0500, Mike C. Fletcher wrote:
> H. T. Hind wrote:
>
> >Hello,
> >
> >Is there a way to get Simpleparse to gracefully exit instead
> >of going into an infinite loop ?
> >
> >For example if I have a grammar definition like this
> >
> >column_specs := '<s>',ws,('<c>',ws)+,'\n'
> >
> >then the system would go into an infinite loop if the input
> >contains '<S>' '<c>'
>
> I don't see how it could with the given definition. Infinite loops
> occur when you have a grammar like this:
>
> column_specs := '<s>',ws,('<c>'?,ws)+,'\n'
>
> that is, where you have a group that can match entirely with a null
> group and then you add an outer repeating modifier. As defined, the
> engine allows for external parsers and the like which may, in fact, have
> a null-length match, so to catch this, would need to extend how
> mxTextTools communicates success/failure to allow for another state.

Is there another way to define the above that avoids the loop? We might not be able to anticipate all the data that we might encounter, hence it is essential to have the ability to fail with an error such that a human can intervene and fix the data.

> It's possible to catch it, yes, but it would require some considerable
> work to modify the engine to be able to support the feature.

Would you recommend then something like PLY?

Regards,
hth
From: Mike C. F. <mcf...@ro...> - 2004-03-20 11:01:58
H. T. Hind wrote:
...
>>column_specs := '<s>',ws,('<c>'?,ws)+,'\n'
...
>Is there another way to define the above that avoids the loop ?
>We might not be able to anticipate all the data that we might
>encounter, hence it is essential to have the ability to fail with
>an error such that a human can intervene and fix the data.

This doesn't have anything to do with the data; the grammar itself is saying "if you match nothing here, go ahead and try to match it again". That is, to say the *exact* same thing with an engine such as mxTextTools, you can't spell it without defining a recursion. However, you probably don't want to anyway:

    column_specs := '<s>',ws,('<c>',ws)*,'\n'

is probably what you are looking for. That is, to match the group within the brackets, you *must* find a '<c>', optionally followed by some whitespace. If you don't find that, the whole group is skipped, while repetition of the group is still allowed.

>>It's possible to catch it, yes, but it would require some considerable
>>work to modify the engine to be able to support the feature.
>
>Would you recommend then something like PLY ?

If you feel like it would meet your needs. You haven't yet outlined what you're actually trying to parse that would be a problem for SimpleParse (given a proper grammar). Other parser-generators may use Earley or similar parsing mechanisms, which are more general than SimpleParse, but your parsing task here doesn't seem to require such generality.

Anyway, good luck,
Mike
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/
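As a rough analogue of the corrected production, here is the same shape written with Python's standard `re` module instead of SimpleParse. This is a swapped-in technique for illustration only: the pattern assumes `ws` means zero or more spaces or tabs, and `parse_column_specs` is a hypothetical helper name. The point is that input such as `'<S>'` is rejected with an error rather than looping.

```python
import re

# Assumption: `ws` is zero or more spaces or tabs.
# Mirrors: column_specs := '<s>',ws,('<c>',ws)*,'\n'
COLUMN_SPECS = re.compile(r"<s>[ \t]*(?:<c>[ \t]*)*\n")


def parse_column_specs(line):
    """Return the matched text, or raise ValueError on invalid input."""
    match = COLUMN_SPECS.match(line)
    if match is None:
        # Invalid data fails loudly so a human can fix it.
        raise ValueError(f"not a column_specs line: {line!r}")
    return match.group(0)
```

Each pass through the repeated group must consume a literal `<c>`, so the repetition always makes progress or stops.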
From: H. T. H. <hi...@co...> - 2004-03-21 01:52:26
On Sat, Mar 20, 2004 at 06:01:48AM -0500, Mike C. Fletcher wrote:
> H. T. Hind wrote:
> ...
> >>column_specs := '<s>',ws,('<c>'?,ws)+,'\n'
> ...
> >Is there another way to define the above that avoids the loop ?
> >We might not be able to anticipate all the data that we might
> >encounter, hence it is essential to have the ability to fail with
> >an error such that a human can intervene and fix the data.

The data had '<S>', i.e. a capital S instead of a lower-case s, and that resulted in the app going into a loop.

> This doesn't have anything to do with the data, the grammar itself is
> saying "if you match nothing here, go ahead and try to match it again".
> That is; to say the *exact* same thing with an engine such as
> mxTextTools, you can't spell it without defining a recursion, however,
> you probably don't want to anyway:

My issue is that it may not be possible to define all the valid input that we'll get. In the case where the input is invalid, I'd want the application to fail instead of going into an infinite loop.

--
----------------------------------------------------------------------
Ask a question and you're a fool for three minutes; do not ask a question
and you're a fool for the rest of your life. (Chinese proverb)
From: Mike C. F. <mcf...@ro...> - 2004-03-21 20:58:28
H. T. Hind wrote:
...
>>>>column_specs := '<s>',ws,('<c>'?,ws)+,'\n'
...
>The data had '<S>' , i.e a capital S instead of a lower case s
>and that resulted in the app going into a loop.

But not from the section described, if it is spelled like this:

    column_specs := '<s>',ws,('<c>',ws)*,'\n'

That is, you can't get a loop *from this part of the grammar* because there's no null-matching construct. When writing a grammar, you write the grammar to include everything which is valid and reject that which is not valid. An '<S>' presented anywhere to the above production will simply cause a fail, not a loop.

>My issue is , that it may not be possible to define all the valid
>input that we'll get. In the case where the input is invalid, I'd
>want the application to fail instead of going into an infinite
>loop.

Understood. What I'm saying is that you don't need to define all possible inputs; you just need to avoid defining constructs that can successfully match a NULL string. You might also want to look at the "cut" or "errorOnFail" directive, which can raise SyntaxErrors if part of a construct fails (it doesn't help with null-matching constructs, however).

HTH,
Mike
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/
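The advice to avoid constructs that can match a NULL string can be checked mechanically before any data is parsed. The sketch below is an illustration only and not a SimpleParse feature: it uses a hypothetical tuple encoding of grammar nodes (`lit`, `opt`, `star`, `plus`, `seq`, `alt`), assumes `ws` is zero or more spaces, and flags every repetition whose body can succeed on empty input.

```python
def nullable(node):
    """True if `node` can succeed without consuming any input."""
    kind = node[0]
    if kind == "lit":
        return node[1] == ""
    if kind in ("opt", "star"):
        return True
    if kind == "plus":
        return nullable(node[1])
    if kind == "seq":
        return all(nullable(child) for child in node[1])
    if kind == "alt":
        return any(nullable(child) for child in node[1])
    raise ValueError(f"unknown node kind: {kind!r}")


def risky_repeats(node):
    """Yield every repeated node ('star'/'plus') whose body is nullable."""
    kind = node[0]
    if kind in ("star", "plus") and nullable(node[1]):
        yield node
    if kind in ("opt", "star", "plus"):
        yield from risky_repeats(node[1])
    elif kind in ("seq", "alt"):
        for child in node[1]:
            yield from risky_repeats(child)


WS = ("star", ("lit", " "))  # assumed definition of `ws`

# column_specs := '<s>',ws,('<c>'?,ws)+,'\n'
LOOPING = ("seq", [("lit", "<s>"), WS,
                   ("plus", ("seq", [("opt", ("lit", "<c>")), WS])),
                   ("lit", "\n")])

# column_specs := '<s>',ws,('<c>',ws)*,'\n'
FIXED = ("seq", [("lit", "<s>"), WS,
                 ("star", ("seq", [("lit", "<c>"), WS])),
                 ("lit", "\n")])
```

Running `list(risky_repeats(LOOPING))` reports the `('<c>'?,ws)+` group, while `FIXED` produces no findings because each repetition of its group must consume a literal `<c>`.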
From: H. T. H. <hi...@co...> - 2004-03-26 07:16:36
On Sun, Mar 21, 2004 at 03:58:16PM -0500, Mike C. Fletcher wrote:
> H. T. Hind wrote:
> ...
> >>>>column_specs := '<s>',ws,('<c>'?,ws)+,'\n'
> ...
> >The data had '<S>' , i.e a capital S instead of a lower case s
> >and that resulted in the app going into a loop.
>
> But not from the section described if spelled like this:
>
> column_specs := '<s>',ws,('<c>',ws)*,'\n'
>
> that is, you can't get a loop *from this part of the grammar* because
> there's no null-matching construct. When writing a grammar, you write
> the grammar to include everything which is valid and reject that which
> is not valid. An '<S>' presented anywhere to the above production will
> simply cause a fail, not a loop.

Thank you for the pointer. I looked at the grammar and indeed there was a null-matching construct.

> >My issue is , that it may not be possible to define all the valid
> >input that we'll get. In the case where the input is invalid, I'd
> >want the application to fail instead of going into an infinite
> >loop.
>
> Understood, what I'm saying is, you don't need to define all possible
> inputs, you just need to avoid defining constructs that can successfully
> match a NULL string. You might also want to look at the "cut" or
> "errorOnFail" directive, which can raise SyntaxErrors if part of a
> construct fails (doesn't help with null-matching constructs, however).

Avoiding the NULL string was it. Now it correctly fails.

Thanks,
HTH