Thread: [sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

sdcc-devel

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-07 17:02:37

Bugs item #1350513, was opened at 2005-11-07 09:02
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-07 17:09:11

Bugs item #1350513, was opened at 2005-11-07 09:02
Message generated for change (Comment added) made by nobody
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 09:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-07 17:28:48

Bugs item #1350513, was opened at 2005-11-07 09:02
Message generated for change (Comment added) made by nobody
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 09:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 09:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-08 12:55:21

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by maartenbrock
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
>Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

>Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-08 17:32:11

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by remarc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-08 17:55:37

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by remarc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-08 18:16:35

Bugs item #1350513, was opened at 2005-11-07 12:02
Message generated for change (Comment added) made by jimatjtan
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 13:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 12:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 12:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 07:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 12:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 12:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-08 18:26:21

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by remarc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-08 18:42:27

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by maartenbrock
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

>Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-08 18:51:08

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by remarc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-08 19:05:04

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by remarc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-08 23:19:33

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by maartenbrock
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

>Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-09 00:12:42

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by remarc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 01:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-09 12:46:59

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by remarc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 13:46

Message:
Logged In: YES 
user_id=1375403

Maarten,

I'm attaching the smallest app that demonstrates the issue
while  running on actual hardware. I have added comments at
the longjmp calls of the before and after values of SP in
longjmp that I observed running in a simulator. 

Please let me know if you need further explanation.

Regards,
Jim Cramer

P.S. I'm not able to attach anything on this web screen. So
I'll try and paste it below:

// Minimal working app demonstrating potential bug in SDCC
longjmp

// code adapted from
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D

#include <8052.h>
#include <setjmp.h>

typedef unsigned char byte;

int my_longjmp (unsigned char data *bp, int rv);
//#define LONGJMP longjmp         // <<== locks up
#define LONGJMP my_longjmp      // <<== no lock up

void task_0(void);
void timer2 (void) interrupt 5 _naked;

data jmp_buf reg_task_exit_env;
data jmp_buf kernel_env;
data jmp_buf task0_env;

volatile data byte rtos_tick;

sfr at 0xC9 T2MOD;

void main(void)
{
    byte wait_count;

    T2CON = 0x04;    // load T2 control register
    T2MOD = 0x00;

    TH2 = 0xFA;      // Set up for 1ms timer tick with
    RCAP2H = 0xFA;   // Fosc = 18.432Mhz and 12 clock period
per T2 cycle
    TL2 = 0x00;
    RCAP2L = 0x00;

    if (setjmp(reg_task_exit_env) == 0)
        task_0();

    rtos_tick = 0;   // reset flag

    ET2 = 1;         // enable T2
    TR2 = 1;         // start T2
    EA = 1;          // global interrupt enable

    P2_6 = 1;       // LED on port P2.6


    // ---------------
    // Scheduler loop
    // ---------------
    while (1)
    {
        if (setjmp(kernel_env) == 0)     // Invoke task
            LONGJMP(task0_env,1);        // SP before =
0x24, SP after = 0x26
           //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after
interrupt

        wait_count = 0;

        // spend some time in other tasks; for demo purposes
we just use a delay
        while (wait_count < 200)
        {
            while (rtos_tick == 0);
            wait_count++;
            rtos_tick = 0;
        }
    }
}

void task_0()
{
    bit flag = 0;

    if (setjmp(task0_env) == 0)
        LONGJMP(reg_task_exit_env,1);   // SP before = 0x26,
SP after = 0x24 : no issue

    // Perform task initialisation here


    //task0 loop
    while (1)
    {
        // Do task0 work: toggle LED on Port 2.6
        if (flag)
        {
            P2_6 = 0;
            flag = 0;
        }
        else
        {
            P2_6 = 1;
            flag = 1;
        }

        // Yield to scheduler
        if (setjmp(task0_env) == 0)
            LONGJMP(kernel_env,1);
    }
}

void timer2 (void) interrupt 5 _naked
{
    _asm
    clr TF2
    mov _rtos_tick,#1
    reti
    _endasm;
}

//--------------------------------------------

int my_longjmp (unsigned char data *bp, int rv)
{
    unsigned char lsp;

    lsp = *(bp+2);
    SP = lsp;
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;


    return rv;
}



----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 01:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-09 20:29:39

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by maartenbrock
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Maarten Brock (maartenbrock)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

>Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 21:29

Message:
Logged In: YES 
user_id=888171

Jim,

First thanks for the code example. It does however violate the 
spec. This is what it says in 7.13.2.1 #2:

The longjmp function restores the environment saved by the 
most recent invocation of
the setjmp macro in the same invocation of the program with 
the corresponding
jmp_buf argument. If there has been no such invocation, or if 
the function containing
the invocation of the setjmp macro has terminated execution 
(see note 208) in the interim, or if the
invocation of the setjmp macro was within the scope of an 
identifier with variably
modified type and execution has left that scope in the 
interim, the behavior is undefined.

And note 208 says:

For example, by executing a return statement or because 
another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set 
of nested calls.

And that longjmp is exactly what your code does.

I'll leave the bug report open to remind me to create a safe 
asm only version of setjmp/longjmp but expect to see no 
change in functionality.

Btw. if you had logged in when you opened the bug report 
you would have been able to upload files to it. Now the 
system treats you as just some bystander.

Good luck,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 13:46

Message:
Logged In: YES 
user_id=1375403

Maarten,

I'm attaching the smallest app that demonstrates the issue
while  running on actual hardware. I have added comments at
the longjmp calls of the before and after values of SP in
longjmp that I observed running in a simulator. 

Please let me know if you need further explanation.

Regards,
Jim Cramer

P.S. I'm not able to attach anything on this web screen. So
I'll try and paste it below:

// Minimal working app demonstrating potential bug in SDCC
longjmp

// code adapted from
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D

#include <8052.h>
#include <setjmp.h>

typedef unsigned char byte;

int my_longjmp (unsigned char data *bp, int rv);
//#define LONGJMP longjmp         // <<== locks up
#define LONGJMP my_longjmp      // <<== no lock up

void task_0(void);
void timer2 (void) interrupt 5 _naked;

data jmp_buf reg_task_exit_env;
data jmp_buf kernel_env;
data jmp_buf task0_env;

volatile data byte rtos_tick;

sfr at 0xC9 T2MOD;

void main(void)
{
    byte wait_count;

    T2CON = 0x04;    // load T2 control register
    T2MOD = 0x00;

    TH2 = 0xFA;      // Set up for 1ms timer tick with
    RCAP2H = 0xFA;   // Fosc = 18.432Mhz and 12 clock period
per T2 cycle
    TL2 = 0x00;
    RCAP2L = 0x00;

    if (setjmp(reg_task_exit_env) == 0)
        task_0();

    rtos_tick = 0;   // reset flag

    ET2 = 1;         // enable T2
    TR2 = 1;         // start T2
    EA = 1;          // global interrupt enable

    P2_6 = 1;       // LED on port P2.6


    // ---------------
    // Scheduler loop
    // ---------------
    while (1)
    {
        if (setjmp(kernel_env) == 0)     // Invoke task
            LONGJMP(task0_env,1);        // SP before =
0x24, SP after = 0x26
           //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after
interrupt

        wait_count = 0;

        // spend some time in other tasks; for demo purposes
we just use a delay
        while (wait_count < 200)
        {
            while (rtos_tick == 0);
            wait_count++;
            rtos_tick = 0;
        }
    }
}

void task_0()
{
    bit flag = 0;

    if (setjmp(task0_env) == 0)
        LONGJMP(reg_task_exit_env,1);   // SP before = 0x26,
SP after = 0x24 : no issue

    // Perform task initialisation here


    //task0 loop
    while (1)
    {
        // Do task0 work: toggle LED on Port 2.6
        if (flag)
        {
            P2_6 = 0;
            flag = 0;
        }
        else
        {
            P2_6 = 1;
            flag = 1;
        }

        // Yield to scheduler
        if (setjmp(task0_env) == 0)
            LONGJMP(kernel_env,1);
    }
}

void timer2 (void) interrupt 5 _naked
{
    _asm
    clr TF2
    mov _rtos_tick,#1
    reti
    _endasm;
}

//--------------------------------------------

int my_longjmp (unsigned char data *bp, int rv)
{
    unsigned char lsp;

    lsp = *(bp+2);
    SP = lsp;
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;


    return rv;
}



----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 01:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-09 21:03:24

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by remarc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Maarten Brock (maartenbrock)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 22:03

Message:
Logged In: YES 
user_id=1375403

Maarten,

We seem to disagree on the interpretation of the spec. In my
view "the function containing the invocation of the setjmp
macrp" has NOT terminating (in fact you can see both caller
and callee caught in their own endless loops, thus never
terminating). I have adapted to the code from an example
written for the Keil compiler (URL was in code sample).
However regardless of whether my usage (which amounts to two
coroutines calling each other) is within the spec or not,
this still leaves the issue of the unprotected critical
section in the longjmp() implementation.

I'll leave it at your discretion how to deal, if at all,
with this issue in SDCC. I'm happy with my solution and will
move on to the next topic. I'm into SDCC for about a week
now and this is the only real issue I came across (and the
documentation did say that setjmp/longjmp was not
supported). So overall I'm still impressed with the SDCC effort.

Thanks very much for taking the time to look into this and
keep up the good work.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 21:29

Message:
Logged In: YES 
user_id=888171

Jim,

First thanks for the code example. It does however violate the 
spec. This is what it says in 7.13.2.1 #2:

The longjmp function restores the environment saved by the 
most recent invocation of
the setjmp macro in the same invocation of the program with 
the corresponding
jmp_buf argument. If there has been no such invocation, or if 
the function containing
the invocation of the setjmp macro has terminated execution 
(see note 208) in the interim, or if the
invocation of the setjmp macro was within the scope of an 
identifier with variably
modified type and execution has left that scope in the 
interim, the behavior is undefined.

And note 208 says:

For example, by executing a return statement or because 
another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set 
of nested calls.

And that longjmp is exactly what your code does.

I'll leave the bug report open to remind me to create a safe 
asm only version of setjmp/longjmp but expect to see no 
change in functionality.

Btw. if you had logged in when you opened the bug report 
you would have been able to upload files to it. Now the 
system treats you as just some bystander.

Good luck,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 13:46

Message:
Logged In: YES 
user_id=1375403

Maarten,

I'm attaching the smallest app that demonstrates the issue
while  running on actual hardware. I have added comments at
the longjmp calls of the before and after values of SP in
longjmp that I observed running in a simulator. 

Please let me know if you need further explanation.

Regards,
Jim Cramer

P.S. I'm not able to attach anything on this web screen. So
I'll try and paste it below:

// Minimal working app demonstrating potential bug in SDCC
longjmp

// code adapted from
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D

#include <8052.h>
#include <setjmp.h>

typedef unsigned char byte;

int my_longjmp (unsigned char data *bp, int rv);
//#define LONGJMP longjmp         // <<== locks up
#define LONGJMP my_longjmp      // <<== no lock up

void task_0(void);
void timer2 (void) interrupt 5 _naked;

data jmp_buf reg_task_exit_env;
data jmp_buf kernel_env;
data jmp_buf task0_env;

volatile data byte rtos_tick;

sfr at 0xC9 T2MOD;

void main(void)
{
    byte wait_count;

    T2CON = 0x04;    // load T2 control register
    T2MOD = 0x00;

    TH2 = 0xFA;      // Set up for 1ms timer tick with
    RCAP2H = 0xFA;   // Fosc = 18.432Mhz and 12 clock period
per T2 cycle
    TL2 = 0x00;
    RCAP2L = 0x00;

    if (setjmp(reg_task_exit_env) == 0)
        task_0();

    rtos_tick = 0;   // reset flag

    ET2 = 1;         // enable T2
    TR2 = 1;         // start T2
    EA = 1;          // global interrupt enable

    P2_6 = 1;       // LED on port P2.6


    // ---------------
    // Scheduler loop
    // ---------------
    while (1)
    {
        if (setjmp(kernel_env) == 0)     // Invoke task
            LONGJMP(task0_env,1);        // SP before =
0x24, SP after = 0x26
           //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after
interrupt

        wait_count = 0;

        // spend some time in other tasks; for demo purposes
we just use a delay
        while (wait_count < 200)
        {
            while (rtos_tick == 0);
            wait_count++;
            rtos_tick = 0;
        }
    }
}

void task_0()
{
    bit flag = 0;

    if (setjmp(task0_env) == 0)
        LONGJMP(reg_task_exit_env,1);   // SP before = 0x26,
SP after = 0x24 : no issue

    // Perform task initialisation here


    //task0 loop
    while (1)
    {
        // Do task0 work: toggle LED on Port 2.6
        if (flag)
        {
            P2_6 = 0;
            flag = 0;
        }
        else
        {
            P2_6 = 1;
            flag = 1;
        }

        // Yield to scheduler
        if (setjmp(task0_env) == 0)
            LONGJMP(kernel_env,1);
    }
}

void timer2 (void) interrupt 5 _naked
{
    _asm
    clr TF2
    mov _rtos_tick,#1
    reti
    _endasm;
}

//--------------------------------------------

int my_longjmp (unsigned char data *bp, int rv)
{
    unsigned char lsp;

    lsp = *(bp+2);
    SP = lsp;
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;


    return rv;
}



----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 01:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-10 10:27:44

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by bernhardheld
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Maarten Brock (maartenbrock)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

>Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 11:27

Message:
Logged In: YES 
user_id=203539

Please allow me to throw in my two cents.

Jim, your problem is, that you don't care about the stack. 
You're jumping between two stack frames, the consequence 
is that you're punished by a plain stack overflow. I don't know 
exactly why the Keil example works. But one important 
difference might be, that task() is called via register_task(), 
which results in at least 2 extra bytes space on the stack.
Could you please try this:

  if (setjmp(reg_task_exit_env) == 0)
    {
      SP += 6; // allocate more stack space e.g. for interrupts
      task_0();
      SP -= 6;
    }

IMHO you should write a clean tast scheduler, and all tasks 
should have their own stack. Once I've done it for DOS using 
TurboC, it's not that difficult. Furthermore it's much easier to 
read than the setjmp/longjmp constructs. And you can call 
the scheduler from within a deeply nested function.

Finally: I can't find a bug in the old longjmp() function. It's just 
a problem of the correct stack frame (or enough space in all 
stack frames). Even with my_longjmp() all your problems will  
return, when you (or sdcc) start using the stack in task_0()!

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 22:03

Message:
Logged In: YES 
user_id=1375403

Maarten,

We seem to disagree on the interpretation of the spec. In my
view "the function containing the invocation of the setjmp
macrp" has NOT terminating (in fact you can see both caller
and callee caught in their own endless loops, thus never
terminating). I have adapted to the code from an example
written for the Keil compiler (URL was in code sample).
However regardless of whether my usage (which amounts to two
coroutines calling each other) is within the spec or not,
this still leaves the issue of the unprotected critical
section in the longjmp() implementation.

I'll leave it at your discretion how to deal, if at all,
with this issue in SDCC. I'm happy with my solution and will
move on to the next topic. I'm into SDCC for about a week
now and this is the only real issue I came across (and the
documentation did say that setjmp/longjmp was not
supported). So overall I'm still impressed with the SDCC effort.

Thanks very much for taking the time to look into this and
keep up the good work.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 21:29

Message:
Logged In: YES 
user_id=888171

Jim,

First thanks for the code example. It does however violate the 
spec. This is what it says in 7.13.2.1 #2:

The longjmp function restores the environment saved by the 
most recent invocation of
the setjmp macro in the same invocation of the program with 
the corresponding
jmp_buf argument. If there has been no such invocation, or if 
the function containing
the invocation of the setjmp macro has terminated execution 
(see note 208) in the interim, or if the
invocation of the setjmp macro was within the scope of an 
identifier with variably
modified type and execution has left that scope in the 
interim, the behavior is undefined.

And note 208 says:

For example, by executing a return statement or because 
another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set 
of nested calls.

And that longjmp is exactly what your code does.

I'll leave the bug report open to remind me to create a safe 
asm only version of setjmp/longjmp but expect to see no 
change in functionality.

Btw. if you had logged in when you opened the bug report 
you would have been able to upload files to it. Now the 
system treats you as just some bystander.

Good luck,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 13:46

Message:
Logged In: YES 
user_id=1375403

Maarten,

I'm attaching the smallest app that demonstrates the issue
while  running on actual hardware. I have added comments at
the longjmp calls of the before and after values of SP in
longjmp that I observed running in a simulator. 

Please let me know if you need further explanation.

Regards,
Jim Cramer

P.S. I'm not able to attach anything on this web screen. So
I'll try and paste it below:

// Minimal working app demonstrating potential bug in SDCC
longjmp

// code adapted from
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D

#include <8052.h>
#include <setjmp.h>

typedef unsigned char byte;

int my_longjmp (unsigned char data *bp, int rv);
//#define LONGJMP longjmp         // <<== locks up
#define LONGJMP my_longjmp      // <<== no lock up

void task_0(void);
void timer2 (void) interrupt 5 _naked;

data jmp_buf reg_task_exit_env;
data jmp_buf kernel_env;
data jmp_buf task0_env;

volatile data byte rtos_tick;

sfr at 0xC9 T2MOD;

void main(void)
{
    byte wait_count;

    T2CON = 0x04;    // load T2 control register
    T2MOD = 0x00;

    TH2 = 0xFA;      // Set up for 1ms timer tick with
    RCAP2H = 0xFA;   // Fosc = 18.432Mhz and 12 clock period
per T2 cycle
    TL2 = 0x00;
    RCAP2L = 0x00;

    if (setjmp(reg_task_exit_env) == 0)
        task_0();

    rtos_tick = 0;   // reset flag

    ET2 = 1;         // enable T2
    TR2 = 1;         // start T2
    EA = 1;          // global interrupt enable

    P2_6 = 1;       // LED on port P2.6


    // ---------------
    // Scheduler loop
    // ---------------
    while (1)
    {
        if (setjmp(kernel_env) == 0)     // Invoke task
            LONGJMP(task0_env,1);        // SP before =
0x24, SP after = 0x26
           //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after
interrupt

        wait_count = 0;

        // spend some time in other tasks; for demo purposes
we just use a delay
        while (wait_count < 200)
        {
            while (rtos_tick == 0);
            wait_count++;
            rtos_tick = 0;
        }
    }
}

void task_0()
{
    bit flag = 0;

    if (setjmp(task0_env) == 0)
        LONGJMP(reg_task_exit_env,1);   // SP before = 0x26,
SP after = 0x24 : no issue

    // Perform task initialisation here


    //task0 loop
    while (1)
    {
        // Do task0 work: toggle LED on Port 2.6
        if (flag)
        {
            P2_6 = 0;
            flag = 0;
        }
        else
        {
            P2_6 = 1;
            flag = 1;
        }

        // Yield to scheduler
        if (setjmp(task0_env) == 0)
            LONGJMP(kernel_env,1);
    }
}

void timer2 (void) interrupt 5 _naked
{
    _asm
    clr TF2
    mov _rtos_tick,#1
    reti
    _endasm;
}

//--------------------------------------------

int my_longjmp (unsigned char data *bp, int rv)
{
    unsigned char lsp;

    lsp = *(bp+2);
    SP = lsp;
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;


    return rv;
}



----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 01:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-10 12:15:47

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by remarc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Maarten Brock (maartenbrock)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 13:15

Message:
Logged In: YES 
user_id=1375403

Thanks for shedding your light on this, Bernard.

Initially I wasn't interested in the stack at all,  I just
wanted to take the Keil example (which uses the
setjmp/longjmp construct) and expand it for my own purposes.
It is when I started to get lock-ups that I ended up
examining the stack. 

I'm not observing a stack overflow (i.e. the stack doesn't
grow unbounded, it just alternates one level between
scheduler (caller) and task (callee)). What I am seeing is a
return address in the process of being copied from a jmpbuf
struct onto the stack being corrupted by an interrupt coming
in between. The interrupt pushes and pops a return address
on stack just prior before longjmp could update the SP to
made the stack state valid again.

Your suggestion of adding a bit of extra stack space (6
bytes in your proposal) works (I tried it!) because in this
particular case the timer interrupt only requires 2 bytes
for its return address. It is however just circumventing the
problem with the longjmp issue rather than tackling the root
cause.  If the ISR would save registers on the stack 6 bytes
may not be enough.

I think the issue with trying to give each task its own
stack on the 8051 is the very constraint RAM space available
for data and stack (up to 256 bytes). On the Microchip
midrange PICs (one of my other targets) it wouldn't work at
all because its stack is not accessible by code (as an
aside, no setjmp/longjmp either). 

There are limitations with the setjmp/longjmp approach. A
task can only return to the scheduler from the function that
is called directly by the scheduler. (There should not be
intervening stack frames between scheduler and task). With
that constraint there is no limitation with regards to
calling other functions from within a task. Sofar, I have
not found this to be an issue (if I'm not mistaken the
commercial Salvo RTOS has the same limitation).

With respect to the SDCC implementation of longjmp, what I'm
trying to get across is that it is vulnerable to stack
corruption due to interrupts in the cricitical section
indicated below:

 // start of critical section
lsp = *(bp+2);
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
// end of critical section

One way of protection is disabling interrupts during the
execution of this critical section as in my initial posting,
which tackles the root cause.

I'll repost the URL to the KEIL example of a simple
scheduler for the 8051 here for anyone interested:
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D
(scroll down to the end of the page and click on "Simple OS
for 8051 microcontroller written in the Keil Software C
compiler");

Thanks again, Bernard.

Regards,
Jim


----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 11:27

Message:
Logged In: YES 
user_id=203539

Please allow me to throw in my two cents.

Jim, your problem is, that you don't care about the stack. 
You're jumping between two stack frames, the consequence 
is that you're punished by a plain stack overflow. I don't know 
exactly why the Keil example works. But one important 
difference might be, that task() is called via register_task(), 
which results in at least 2 extra bytes space on the stack.
Could you please try this:

  if (setjmp(reg_task_exit_env) == 0)
    {
      SP += 6; // allocate more stack space e.g. for interrupts
      task_0();
      SP -= 6;
    }

IMHO you should write a clean tast scheduler, and all tasks 
should have their own stack. Once I've done it for DOS using 
TurboC, it's not that difficult. Furthermore it's much easier to 
read than the setjmp/longjmp constructs. And you can call 
the scheduler from within a deeply nested function.

Finally: I can't find a bug in the old longjmp() function. It's just 
a problem of the correct stack frame (or enough space in all 
stack frames). Even with my_longjmp() all your problems will  
return, when you (or sdcc) start using the stack in task_0()!

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 22:03

Message:
Logged In: YES 
user_id=1375403

Maarten,

We seem to disagree on the interpretation of the spec. In my
view "the function containing the invocation of the setjmp
macrp" has NOT terminating (in fact you can see both caller
and callee caught in their own endless loops, thus never
terminating). I have adapted to the code from an example
written for the Keil compiler (URL was in code sample).
However regardless of whether my usage (which amounts to two
coroutines calling each other) is within the spec or not,
this still leaves the issue of the unprotected critical
section in the longjmp() implementation.

I'll leave it at your discretion how to deal, if at all,
with this issue in SDCC. I'm happy with my solution and will
move on to the next topic. I'm into SDCC for about a week
now and this is the only real issue I came across (and the
documentation did say that setjmp/longjmp was not
supported). So overall I'm still impressed with the SDCC effort.

Thanks very much for taking the time to look into this and
keep up the good work.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 21:29

Message:
Logged In: YES 
user_id=888171

Jim,

First thanks for the code example. It does however violate the 
spec. This is what it says in 7.13.2.1 #2:

The longjmp function restores the environment saved by the 
most recent invocation of
the setjmp macro in the same invocation of the program with 
the corresponding
jmp_buf argument. If there has been no such invocation, or if 
the function containing
the invocation of the setjmp macro has terminated execution 
(see note 208) in the interim, or if the
invocation of the setjmp macro was within the scope of an 
identifier with variably
modified type and execution has left that scope in the 
interim, the behavior is undefined.

And note 208 says:

For example, by executing a return statement or because 
another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set 
of nested calls.

And that longjmp is exactly what your code does.

I'll leave the bug report open to remind me to create a safe 
asm only version of setjmp/longjmp but expect to see no 
change in functionality.

Btw. if you had logged in when you opened the bug report 
you would have been able to upload files to it. Now the 
system treats you as just some bystander.

Good luck,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 13:46

Message:
Logged In: YES 
user_id=1375403

Maarten,

I'm attaching the smallest app that demonstrates the issue
while  running on actual hardware. I have added comments at
the longjmp calls of the before and after values of SP in
longjmp that I observed running in a simulator. 

Please let me know if you need further explanation.

Regards,
Jim Cramer

P.S. I'm not able to attach anything on this web screen. So
I'll try and paste it below:

// Minimal working app demonstrating potential bug in SDCC
longjmp

// code adapted from
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D

#include <8052.h>
#include <setjmp.h>

typedef unsigned char byte;

int my_longjmp (unsigned char data *bp, int rv);
//#define LONGJMP longjmp         // <<== locks up
#define LONGJMP my_longjmp      // <<== no lock up

void task_0(void);
void timer2 (void) interrupt 5 _naked;

data jmp_buf reg_task_exit_env;
data jmp_buf kernel_env;
data jmp_buf task0_env;

volatile data byte rtos_tick;

sfr at 0xC9 T2MOD;

void main(void)
{
    byte wait_count;

    T2CON = 0x04;    // load T2 control register
    T2MOD = 0x00;

    TH2 = 0xFA;      // Set up for 1ms timer tick with
    RCAP2H = 0xFA;   // Fosc = 18.432Mhz and 12 clock period
per T2 cycle
    TL2 = 0x00;
    RCAP2L = 0x00;

    if (setjmp(reg_task_exit_env) == 0)
        task_0();

    rtos_tick = 0;   // reset flag

    ET2 = 1;         // enable T2
    TR2 = 1;         // start T2
    EA = 1;          // global interrupt enable

    P2_6 = 1;       // LED on port P2.6


    // ---------------
    // Scheduler loop
    // ---------------
    while (1)
    {
        if (setjmp(kernel_env) == 0)     // Invoke task
            LONGJMP(task0_env,1);        // SP before =
0x24, SP after = 0x26
           //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after
interrupt

        wait_count = 0;

        // spend some time in other tasks; for demo purposes
we just use a delay
        while (wait_count < 200)
        {
            while (rtos_tick == 0);
            wait_count++;
            rtos_tick = 0;
        }
    }
}

void task_0()
{
    bit flag = 0;

    if (setjmp(task0_env) == 0)
        LONGJMP(reg_task_exit_env,1);   // SP before = 0x26,
SP after = 0x24 : no issue

    // Perform task initialisation here


    //task0 loop
    while (1)
    {
        // Do task0 work: toggle LED on Port 2.6
        if (flag)
        {
            P2_6 = 0;
            flag = 0;
        }
        else
        {
            P2_6 = 1;
            flag = 1;
        }

        // Yield to scheduler
        if (setjmp(task0_env) == 0)
            LONGJMP(kernel_env,1);
    }
}

void timer2 (void) interrupt 5 _naked
{
    _asm
    clr TF2
    mov _rtos_tick,#1
    reti
    _endasm;
}

//--------------------------------------------

int my_longjmp (unsigned char data *bp, int rv)
{
    unsigned char lsp;

    lsp = *(bp+2);
    SP = lsp;
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;


    return rv;
}



----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 01:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-10 15:32:32

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by bernhardheld
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Maarten Brock (maartenbrock)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

>Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 16:32

Message:
Logged In: YES 
user_id=203539

> I'm not observing a stack overflow
Yes, you do! Please accept two facts:
a. with the given (undocumented) usage of setjmp/longjmp 
you end up with two consecutive stack frames.
b. the stack from main() overflows the stack from task_0(), if 
an interrupt occurs during longjmp(). Please note: the stack 
will overflow too, if main uses the stack in any other way.

Now you have 3 joices:
1. don't use setjmp/longjmp in this particular, undocumented 
way
2. increase the stack for main
3. disable interrupts in longjmp()

> Your suggestion of adding a bit of extra stack space (6
> bytes in your proposal) works (I tried it!)
Thanks, this shows that I'm right. Try it again with 2 bytes, 
then you've got the same stack consumption as in the Keil 
example (please re-read my last posting about this).

> There should not be
> intervening stack frames between scheduler and task.
You call setjmp() both from main() and task_0(). main() and 
task_0() have different stack frames, consequently your 
scheduler and your task have different stack frames (see fact 
a).
And now please stop dreaming. You can't use 
setjmp/longjmp that way and complain about the 
consequences.

> One way of protection is disabling interrupts during the
> execution of this critical section as in my initial posting,
> which tackles the root cause.
Well, I don't agree about "the root cause". But do it, if this is 
what you want. But be prepared for the next overflow at the 
next corner.

And to be specific: I still don't see a bug in sdcc's longjmp().

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 13:15

Message:
Logged In: YES 
user_id=1375403

Thanks for shedding your light on this, Bernard.

Initially I wasn't interested in the stack at all,  I just
wanted to take the Keil example (which uses the
setjmp/longjmp construct) and expand it for my own purposes.
It is when I started to get lock-ups that I ended up
examining the stack. 

I'm not observing a stack overflow (i.e. the stack doesn't
grow unbounded, it just alternates one level between
scheduler (caller) and task (callee)). What I am seeing is a
return address in the process of being copied from a jmpbuf
struct onto the stack being corrupted by an interrupt coming
in between. The interrupt pushes and pops a return address
on stack just prior before longjmp could update the SP to
made the stack state valid again.

Your suggestion of adding a bit of extra stack space (6
bytes in your proposal) works (I tried it!) because in this
particular case the timer interrupt only requires 2 bytes
for its return address. It is however just circumventing the
problem with the longjmp issue rather than tackling the root
cause.  If the ISR would save registers on the stack 6 bytes
may not be enough.

I think the issue with trying to give each task its own
stack on the 8051 is the very constraint RAM space available
for data and stack (up to 256 bytes). On the Microchip
midrange PICs (one of my other targets) it wouldn't work at
all because its stack is not accessible by code (as an
aside, no setjmp/longjmp either). 

There are limitations with the setjmp/longjmp approach. A
task can only return to the scheduler from the function that
is called directly by the scheduler. (There should not be
intervening stack frames between scheduler and task). With
that constraint there is no limitation with regards to
calling other functions from within a task. Sofar, I have
not found this to be an issue (if I'm not mistaken the
commercial Salvo RTOS has the same limitation).

With respect to the SDCC implementation of longjmp, what I'm
trying to get across is that it is vulnerable to stack
corruption due to interrupts in the cricitical section
indicated below:

 // start of critical section
lsp = *(bp+2);
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
// end of critical section

One way of protection is disabling interrupts during the
execution of this critical section as in my initial posting,
which tackles the root cause.

I'll repost the URL to the KEIL example of a simple
scheduler for the 8051 here for anyone interested:
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D
(scroll down to the end of the page and click on "Simple OS
for 8051 microcontroller written in the Keil Software C
compiler");

Thanks again, Bernard.

Regards,
Jim


----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 11:27

Message:
Logged In: YES 
user_id=203539

Please allow me to throw in my two cents.

Jim, your problem is, that you don't care about the stack. 
You're jumping between two stack frames, the consequence 
is that you're punished by a plain stack overflow. I don't know 
exactly why the Keil example works. But one important 
difference might be, that task() is called via register_task(), 
which results in at least 2 extra bytes space on the stack.
Could you please try this:

  if (setjmp(reg_task_exit_env) == 0)
    {
      SP += 6; // allocate more stack space e.g. for interrupts
      task_0();
      SP -= 6;
    }

IMHO you should write a clean tast scheduler, and all tasks 
should have their own stack. Once I've done it for DOS using 
TurboC, it's not that difficult. Furthermore it's much easier to 
read than the setjmp/longjmp constructs. And you can call 
the scheduler from within a deeply nested function.

Finally: I can't find a bug in the old longjmp() function. It's just 
a problem of the correct stack frame (or enough space in all 
stack frames). Even with my_longjmp() all your problems will  
return, when you (or sdcc) start using the stack in task_0()!

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 22:03

Message:
Logged In: YES 
user_id=1375403

Maarten,

We seem to disagree on the interpretation of the spec. In my
view "the function containing the invocation of the setjmp
macrp" has NOT terminating (in fact you can see both caller
and callee caught in their own endless loops, thus never
terminating). I have adapted to the code from an example
written for the Keil compiler (URL was in code sample).
However regardless of whether my usage (which amounts to two
coroutines calling each other) is within the spec or not,
this still leaves the issue of the unprotected critical
section in the longjmp() implementation.

I'll leave it at your discretion how to deal, if at all,
with this issue in SDCC. I'm happy with my solution and will
move on to the next topic. I'm into SDCC for about a week
now and this is the only real issue I came across (and the
documentation did say that setjmp/longjmp was not
supported). So overall I'm still impressed with the SDCC effort.

Thanks very much for taking the time to look into this and
keep up the good work.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 21:29

Message:
Logged In: YES 
user_id=888171

Jim,

First thanks for the code example. It does however violate the 
spec. This is what it says in 7.13.2.1 #2:

The longjmp function restores the environment saved by the 
most recent invocation of
the setjmp macro in the same invocation of the program with 
the corresponding
jmp_buf argument. If there has been no such invocation, or if 
the function containing
the invocation of the setjmp macro has terminated execution 
(see note 208) in the interim, or if the
invocation of the setjmp macro was within the scope of an 
identifier with variably
modified type and execution has left that scope in the 
interim, the behavior is undefined.

And note 208 says:

For example, by executing a return statement or because 
another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set 
of nested calls.

And that longjmp is exactly what your code does.

I'll leave the bug report open to remind me to create a safe 
asm only version of setjmp/longjmp but expect to see no 
change in functionality.

Btw. if you had logged in when you opened the bug report 
you would have been able to upload files to it. Now the 
system treats you as just some bystander.

Good luck,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 13:46

Message:
Logged In: YES 
user_id=1375403

Maarten,

I'm attaching the smallest app that demonstrates the issue
while  running on actual hardware. I have added comments at
the longjmp calls of the before and after values of SP in
longjmp that I observed running in a simulator. 

Please let me know if you need further explanation.

Regards,
Jim Cramer

P.S. I'm not able to attach anything on this web screen. So
I'll try and paste it below:

// Minimal working app demonstrating potential bug in SDCC
longjmp

// code adapted from
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D

#include <8052.h>
#include <setjmp.h>

typedef unsigned char byte;

int my_longjmp (unsigned char data *bp, int rv);
//#define LONGJMP longjmp         // <<== locks up
#define LONGJMP my_longjmp      // <<== no lock up

void task_0(void);
void timer2 (void) interrupt 5 _naked;

data jmp_buf reg_task_exit_env;
data jmp_buf kernel_env;
data jmp_buf task0_env;

volatile data byte rtos_tick;

sfr at 0xC9 T2MOD;

void main(void)
{
    byte wait_count;

    T2CON = 0x04;    // load T2 control register
    T2MOD = 0x00;

    TH2 = 0xFA;      // Set up for 1ms timer tick with
    RCAP2H = 0xFA;   // Fosc = 18.432Mhz and 12 clock period
per T2 cycle
    TL2 = 0x00;
    RCAP2L = 0x00;

    if (setjmp(reg_task_exit_env) == 0)
        task_0();

    rtos_tick = 0;   // reset flag

    ET2 = 1;         // enable T2
    TR2 = 1;         // start T2
    EA = 1;          // global interrupt enable

    P2_6 = 1;       // LED on port P2.6


    // ---------------
    // Scheduler loop
    // ---------------
    while (1)
    {
        if (setjmp(kernel_env) == 0)     // Invoke task
            LONGJMP(task0_env,1);        // SP before =
0x24, SP after = 0x26
           //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after
interrupt

        wait_count = 0;

        // spend some time in other tasks; for demo purposes
we just use a delay
        while (wait_count < 200)
        {
            while (rtos_tick == 0);
            wait_count++;
            rtos_tick = 0;
        }
    }
}

void task_0()
{
    bit flag = 0;

    if (setjmp(task0_env) == 0)
        LONGJMP(reg_task_exit_env,1);   // SP before = 0x26,
SP after = 0x24 : no issue

    // Perform task initialisation here


    //task0 loop
    while (1)
    {
        // Do task0 work: toggle LED on Port 2.6
        if (flag)
        {
            P2_6 = 0;
            flag = 0;
        }
        else
        {
            P2_6 = 1;
            flag = 1;
        }

        // Yield to scheduler
        if (setjmp(task0_env) == 0)
            LONGJMP(kernel_env,1);
    }
}

void timer2 (void) interrupt 5 _naked
{
    _asm
    clr TF2
    mov _rtos_tick,#1
    reti
    _endasm;
}

//--------------------------------------------

int my_longjmp (unsigned char data *bp, int rv)
{
    unsigned char lsp;

    lsp = *(bp+2);
    SP = lsp;
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;


    return rv;
}



----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 01:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-10 16:21:05

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by remarc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Maarten Brock (maartenbrock)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 17:21

Message:
Logged In: YES 
user_id=1375403

OK, OK...

I think I'm getting the message... You and Maarten are
saying that I can only longjmp back from caller to callee
and NOT vice versa. Is that what you are saying? Because in
that case longjmp adjusts SP towards a shrinking stack, so
corruption due to interrupts does not occur.

In that case, great, I'll accept my usage as non-standard,
continue with my "local solution" as working "step-out", 
withdraw my bug report, and thank you all for putting me
straight. I don't see why I should run the risk of further
overflows within the constraints I mentioned if I do and
this thread is still open I'll report it. 

One last thing that I kept reminding myself to ask but
forgot each time: why is longjmp/setjmp unsupported in SDCC?

BTW, I seem to be missing the point about the difference in
the Keil example and my usage, because I don't think I
changed this particular part of the application. Appreciate
if you could elaborate a bit on this. 

Finally, I really do appreciate all this feedback -- I'm
learning a lot and enjoying it.

Thanks,
Jim



----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 16:32

Message:
Logged In: YES 
user_id=203539

> I'm not observing a stack overflow
Yes, you do! Please accept two facts:
a. with the given (undocumented) usage of setjmp/longjmp 
you end up with two consecutive stack frames.
b. the stack from main() overflows the stack from task_0(), if 
an interrupt occurs during longjmp(). Please note: the stack 
will overflow too, if main uses the stack in any other way.

Now you have 3 joices:
1. don't use setjmp/longjmp in this particular, undocumented 
way
2. increase the stack for main
3. disable interrupts in longjmp()

> Your suggestion of adding a bit of extra stack space (6
> bytes in your proposal) works (I tried it!)
Thanks, this shows that I'm right. Try it again with 2 bytes, 
then you've got the same stack consumption as in the Keil 
example (please re-read my last posting about this).

> There should not be
> intervening stack frames between scheduler and task.
You call setjmp() both from main() and task_0(). main() and 
task_0() have different stack frames, consequently your 
scheduler and your task have different stack frames (see fact 
a).
And now please stop dreaming. You can't use 
setjmp/longjmp that way and complain about the 
consequences.

> One way of protection is disabling interrupts during the
> execution of this critical section as in my initial posting,
> which tackles the root cause.
Well, I don't agree about "the root cause". But do it, if this is 
what you want. But be prepared for the next overflow at the 
next corner.

And to be specific: I still don't see a bug in sdcc's longjmp().

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 13:15

Message:
Logged In: YES 
user_id=1375403

Thanks for shedding your light on this, Bernard.

Initially I wasn't interested in the stack at all,  I just
wanted to take the Keil example (which uses the
setjmp/longjmp construct) and expand it for my own purposes.
It is when I started to get lock-ups that I ended up
examining the stack. 

I'm not observing a stack overflow (i.e. the stack doesn't
grow unbounded, it just alternates one level between
scheduler (caller) and task (callee)). What I am seeing is a
return address in the process of being copied from a jmpbuf
struct onto the stack being corrupted by an interrupt coming
in between. The interrupt pushes and pops a return address
on stack just prior before longjmp could update the SP to
made the stack state valid again.

Your suggestion of adding a bit of extra stack space (6
bytes in your proposal) works (I tried it!) because in this
particular case the timer interrupt only requires 2 bytes
for its return address. It is however just circumventing the
problem with the longjmp issue rather than tackling the root
cause.  If the ISR would save registers on the stack 6 bytes
may not be enough.

I think the issue with trying to give each task its own
stack on the 8051 is the very constraint RAM space available
for data and stack (up to 256 bytes). On the Microchip
midrange PICs (one of my other targets) it wouldn't work at
all because its stack is not accessible by code (as an
aside, no setjmp/longjmp either). 

There are limitations with the setjmp/longjmp approach. A
task can only return to the scheduler from the function that
is called directly by the scheduler. (There should not be
intervening stack frames between scheduler and task). With
that constraint there is no limitation with regards to
calling other functions from within a task. Sofar, I have
not found this to be an issue (if I'm not mistaken the
commercial Salvo RTOS has the same limitation).

With respect to the SDCC implementation of longjmp, what I'm
trying to get across is that it is vulnerable to stack
corruption due to interrupts in the cricitical section
indicated below:

 // start of critical section
lsp = *(bp+2);
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
// end of critical section

One way of protection is disabling interrupts during the
execution of this critical section as in my initial posting,
which tackles the root cause.

I'll repost the URL to the KEIL example of a simple
scheduler for the 8051 here for anyone interested:
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D
(scroll down to the end of the page and click on "Simple OS
for 8051 microcontroller written in the Keil Software C
compiler");

Thanks again, Bernard.

Regards,
Jim


----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 11:27

Message:
Logged In: YES 
user_id=203539

Please allow me to throw in my two cents.

Jim, your problem is, that you don't care about the stack. 
You're jumping between two stack frames, the consequence 
is that you're punished by a plain stack overflow. I don't know 
exactly why the Keil example works. But one important 
difference might be, that task() is called via register_task(), 
which results in at least 2 extra bytes space on the stack.
Could you please try this:

  if (setjmp(reg_task_exit_env) == 0)
    {
      SP += 6; // allocate more stack space e.g. for interrupts
      task_0();
      SP -= 6;
    }

IMHO you should write a clean tast scheduler, and all tasks 
should have their own stack. Once I've done it for DOS using 
TurboC, it's not that difficult. Furthermore it's much easier to 
read than the setjmp/longjmp constructs. And you can call 
the scheduler from within a deeply nested function.

Finally: I can't find a bug in the old longjmp() function. It's just 
a problem of the correct stack frame (or enough space in all 
stack frames). Even with my_longjmp() all your problems will  
return, when you (or sdcc) start using the stack in task_0()!

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 22:03

Message:
Logged In: YES 
user_id=1375403

Maarten,

We seem to disagree on the interpretation of the spec. In my
view "the function containing the invocation of the setjmp
macrp" has NOT terminating (in fact you can see both caller
and callee caught in their own endless loops, thus never
terminating). I have adapted to the code from an example
written for the Keil compiler (URL was in code sample).
However regardless of whether my usage (which amounts to two
coroutines calling each other) is within the spec or not,
this still leaves the issue of the unprotected critical
section in the longjmp() implementation.

I'll leave it at your discretion how to deal, if at all,
with this issue in SDCC. I'm happy with my solution and will
move on to the next topic. I'm into SDCC for about a week
now and this is the only real issue I came across (and the
documentation did say that setjmp/longjmp was not
supported). So overall I'm still impressed with the SDCC effort.

Thanks very much for taking the time to look into this and
keep up the good work.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 21:29

Message:
Logged In: YES 
user_id=888171

Jim,

First thanks for the code example. It does however violate the 
spec. This is what it says in 7.13.2.1 #2:

The longjmp function restores the environment saved by the 
most recent invocation of
the setjmp macro in the same invocation of the program with 
the corresponding
jmp_buf argument. If there has been no such invocation, or if 
the function containing
the invocation of the setjmp macro has terminated execution 
(see note 208) in the interim, or if the
invocation of the setjmp macro was within the scope of an 
identifier with variably
modified type and execution has left that scope in the 
interim, the behavior is undefined.

And note 208 says:

For example, by executing a return statement or because 
another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set 
of nested calls.

And that longjmp is exactly what your code does.

I'll leave the bug report open to remind me to create a safe 
asm only version of setjmp/longjmp but expect to see no 
change in functionality.

Btw. if you had logged in when you opened the bug report 
you would have been able to upload files to it. Now the 
system treats you as just some bystander.

Good luck,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 13:46

Message:
Logged In: YES 
user_id=1375403

Maarten,

I'm attaching the smallest app that demonstrates the issue
while  running on actual hardware. I have added comments at
the longjmp calls of the before and after values of SP in
longjmp that I observed running in a simulator. 

Please let me know if you need further explanation.

Regards,
Jim Cramer

P.S. I'm not able to attach anything on this web screen. So
I'll try and paste it below:

// Minimal working app demonstrating potential bug in SDCC
longjmp

// code adapted from
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D

#include <8052.h>
#include <setjmp.h>

typedef unsigned char byte;

int my_longjmp (unsigned char data *bp, int rv);
//#define LONGJMP longjmp         // <<== locks up
#define LONGJMP my_longjmp      // <<== no lock up

void task_0(void);
void timer2 (void) interrupt 5 _naked;

data jmp_buf reg_task_exit_env;
data jmp_buf kernel_env;
data jmp_buf task0_env;

volatile data byte rtos_tick;

sfr at 0xC9 T2MOD;

void main(void)
{
    byte wait_count;

    T2CON = 0x04;    // load T2 control register
    T2MOD = 0x00;

    TH2 = 0xFA;      // Set up for 1ms timer tick with
    RCAP2H = 0xFA;   // Fosc = 18.432Mhz and 12 clock period
per T2 cycle
    TL2 = 0x00;
    RCAP2L = 0x00;

    if (setjmp(reg_task_exit_env) == 0)
        task_0();

    rtos_tick = 0;   // reset flag

    ET2 = 1;         // enable T2
    TR2 = 1;         // start T2
    EA = 1;          // global interrupt enable

    P2_6 = 1;       // LED on port P2.6


    // ---------------
    // Scheduler loop
    // ---------------
    while (1)
    {
        if (setjmp(kernel_env) == 0)     // Invoke task
            LONGJMP(task0_env,1);        // SP before =
0x24, SP after = 0x26
           //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after
interrupt

        wait_count = 0;

        // spend some time in other tasks; for demo purposes
we just use a delay
        while (wait_count < 200)
        {
            while (rtos_tick == 0);
            wait_count++;
            rtos_tick = 0;
        }
    }
}

void task_0()
{
    bit flag = 0;

    if (setjmp(task0_env) == 0)
        LONGJMP(reg_task_exit_env,1);   // SP before = 0x26,
SP after = 0x24 : no issue

    // Perform task initialisation here


    //task0 loop
    while (1)
    {
        // Do task0 work: toggle LED on Port 2.6
        if (flag)
        {
            P2_6 = 0;
            flag = 0;
        }
        else
        {
            P2_6 = 1;
            flag = 1;
        }

        // Yield to scheduler
        if (setjmp(task0_env) == 0)
            LONGJMP(kernel_env,1);
    }
}

void timer2 (void) interrupt 5 _naked
{
    _asm
    clr TF2
    mov _rtos_tick,#1
    reti
    _endasm;
}

//--------------------------------------------

int my_longjmp (unsigned char data *bp, int rv)
{
    unsigned char lsp;

    lsp = *(bp+2);
    SP = lsp;
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;


    return rv;
}



----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 01:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-10 16:49:01

Bugs item #1350513, was opened at 2005-11-07 12:02
Message generated for change (Comment added) made by jimatjtan
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Maarten Brock (maartenbrock)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-10 11:49

Message:
Logged In: YES 
user_id=175928

> Because in
> that case longjmp adjusts SP towards a shrinking stack, so
> corruption due to interrupts does not occur.

Yes, longjmp works by shrinking the stack back to where it
was when you called setjmp.  You can't go the other way
because the data there is no longer guaranteed valid.

> BTW, I seem to be missing the point about the difference in
> the Keil example and my usage

I see none.  I suspect they just got lucky that it works at all.
Notice the comment in their example code for task0:
"byte i,x; // note: local variable might be corrupted after
delay"

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 11:21

Message:
Logged In: YES 
user_id=1375403

OK, OK...

I think I'm getting the message... You and Maarten are
saying that I can only longjmp back from caller to callee
and NOT vice versa. Is that what you are saying? Because in
that case longjmp adjusts SP towards a shrinking stack, so
corruption due to interrupts does not occur.

In that case, great, I'll accept my usage as non-standard,
continue with my "local solution" as working "step-out", 
withdraw my bug report, and thank you all for putting me
straight. I don't see why I should run the risk of further
overflows within the constraints I mentioned if I do and
this thread is still open I'll report it. 

One last thing that I kept reminding myself to ask but
forgot each time: why is longjmp/setjmp unsupported in SDCC?

BTW, I seem to be missing the point about the difference in
the Keil example and my usage, because I don't think I
changed this particular part of the application. Appreciate
if you could elaborate a bit on this. 

Finally, I really do appreciate all this feedback -- I'm
learning a lot and enjoying it.

Thanks,
Jim



----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 10:32

Message:
Logged In: YES 
user_id=203539

> I'm not observing a stack overflow
Yes, you do! Please accept two facts:
a. with the given (undocumented) usage of setjmp/longjmp 
you end up with two consecutive stack frames.
b. the stack from main() overflows the stack from task_0(), if 
an interrupt occurs during longjmp(). Please note: the stack 
will overflow too, if main uses the stack in any other way.

Now you have 3 joices:
1. don't use setjmp/longjmp in this particular, undocumented 
way
2. increase the stack for main
3. disable interrupts in longjmp()

> Your suggestion of adding a bit of extra stack space (6
> bytes in your proposal) works (I tried it!)
Thanks, this shows that I'm right. Try it again with 2 bytes, 
then you've got the same stack consumption as in the Keil 
example (please re-read my last posting about this).

> There should not be
> intervening stack frames between scheduler and task.
You call setjmp() both from main() and task_0(). main() and 
task_0() have different stack frames, consequently your 
scheduler and your task have different stack frames (see fact 
a).
And now please stop dreaming. You can't use 
setjmp/longjmp that way and complain about the 
consequences.

> One way of protection is disabling interrupts during the
> execution of this critical section as in my initial posting,
> which tackles the root cause.
Well, I don't agree about "the root cause". But do it, if this is 
what you want. But be prepared for the next overflow at the 
next corner.

And to be specific: I still don't see a bug in sdcc's longjmp().

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 07:15

Message:
Logged In: YES 
user_id=1375403

Thanks for shedding your light on this, Bernard.

Initially I wasn't interested in the stack at all,  I just
wanted to take the Keil example (which uses the
setjmp/longjmp construct) and expand it for my own purposes.
It is when I started to get lock-ups that I ended up
examining the stack. 

I'm not observing a stack overflow (i.e. the stack doesn't
grow unbounded, it just alternates one level between
scheduler (caller) and task (callee)). What I am seeing is a
return address in the process of being copied from a jmpbuf
struct onto the stack being corrupted by an interrupt coming
in between. The interrupt pushes and pops a return address
on stack just prior before longjmp could update the SP to
made the stack state valid again.

Your suggestion of adding a bit of extra stack space (6
bytes in your proposal) works (I tried it!) because in this
particular case the timer interrupt only requires 2 bytes
for its return address. It is however just circumventing the
problem with the longjmp issue rather than tackling the root
cause.  If the ISR would save registers on the stack 6 bytes
may not be enough.

I think the issue with trying to give each task its own
stack on the 8051 is the very constraint RAM space available
for data and stack (up to 256 bytes). On the Microchip
midrange PICs (one of my other targets) it wouldn't work at
all because its stack is not accessible by code (as an
aside, no setjmp/longjmp either). 

There are limitations with the setjmp/longjmp approach. A
task can only return to the scheduler from the function that
is called directly by the scheduler. (There should not be
intervening stack frames between scheduler and task). With
that constraint there is no limitation with regards to
calling other functions from within a task. Sofar, I have
not found this to be an issue (if I'm not mistaken the
commercial Salvo RTOS has the same limitation).

With respect to the SDCC implementation of longjmp, what I'm
trying to get across is that it is vulnerable to stack
corruption due to interrupts in the cricitical section
indicated below:

 // start of critical section
lsp = *(bp+2);
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
// end of critical section

One way of protection is disabling interrupts during the
execution of this critical section as in my initial posting,
which tackles the root cause.

I'll repost the URL to the KEIL example of a simple
scheduler for the 8051 here for anyone interested:
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D
(scroll down to the end of the page and click on "Simple OS
for 8051 microcontroller written in the Keil Software C
compiler");

Thanks again, Bernard.

Regards,
Jim


----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 05:27

Message:
Logged In: YES 
user_id=203539

Please allow me to throw in my two cents.

Jim, your problem is, that you don't care about the stack. 
You're jumping between two stack frames, the consequence 
is that you're punished by a plain stack overflow. I don't know 
exactly why the Keil example works. But one important 
difference might be, that task() is called via register_task(), 
which results in at least 2 extra bytes space on the stack.
Could you please try this:

  if (setjmp(reg_task_exit_env) == 0)
    {
      SP += 6; // allocate more stack space e.g. for interrupts
      task_0();
      SP -= 6;
    }

IMHO you should write a clean tast scheduler, and all tasks 
should have their own stack. Once I've done it for DOS using 
TurboC, it's not that difficult. Furthermore it's much easier to 
read than the setjmp/longjmp constructs. And you can call 
the scheduler from within a deeply nested function.

Finally: I can't find a bug in the old longjmp() function. It's just 
a problem of the correct stack frame (or enough space in all 
stack frames). Even with my_longjmp() all your problems will  
return, when you (or sdcc) start using the stack in task_0()!

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 16:03

Message:
Logged In: YES 
user_id=1375403

Maarten,

We seem to disagree on the interpretation of the spec. In my
view "the function containing the invocation of the setjmp
macrp" has NOT terminating (in fact you can see both caller
and callee caught in their own endless loops, thus never
terminating). I have adapted to the code from an example
written for the Keil compiler (URL was in code sample).
However regardless of whether my usage (which amounts to two
coroutines calling each other) is within the spec or not,
this still leaves the issue of the unprotected critical
section in the longjmp() implementation.

I'll leave it at your discretion how to deal, if at all,
with this issue in SDCC. I'm happy with my solution and will
move on to the next topic. I'm into SDCC for about a week
now and this is the only real issue I came across (and the
documentation did say that setjmp/longjmp was not
supported). So overall I'm still impressed with the SDCC effort.

Thanks very much for taking the time to look into this and
keep up the good work.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 15:29

Message:
Logged In: YES 
user_id=888171

Jim,

First thanks for the code example. It does however violate the 
spec. This is what it says in 7.13.2.1 #2:

The longjmp function restores the environment saved by the 
most recent invocation of
the setjmp macro in the same invocation of the program with 
the corresponding
jmp_buf argument. If there has been no such invocation, or if 
the function containing
the invocation of the setjmp macro has terminated execution 
(see note 208) in the interim, or if the
invocation of the setjmp macro was within the scope of an 
identifier with variably
modified type and execution has left that scope in the 
interim, the behavior is undefined.

And note 208 says:

For example, by executing a return statement or because 
another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set 
of nested calls.

And that longjmp is exactly what your code does.

I'll leave the bug report open to remind me to create a safe 
asm only version of setjmp/longjmp but expect to see no 
change in functionality.

Btw. if you had logged in when you opened the bug report 
you would have been able to upload files to it. Now the 
system treats you as just some bystander.

Good luck,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 07:46

Message:
Logged In: YES 
user_id=1375403

Maarten,

I'm attaching the smallest app that demonstrates the issue
while  running on actual hardware. I have added comments at
the longjmp calls of the before and after values of SP in
longjmp that I observed running in a simulator. 

Please let me know if you need further explanation.

Regards,
Jim Cramer

P.S. I'm not able to attach anything on this web screen. So
I'll try and paste it below:

// Minimal working app demonstrating potential bug in SDCC
longjmp

// code adapted from
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D

#include <8052.h>
#include <setjmp.h>

typedef unsigned char byte;

int my_longjmp (unsigned char data *bp, int rv);
//#define LONGJMP longjmp         // <<== locks up
#define LONGJMP my_longjmp      // <<== no lock up

void task_0(void);
void timer2 (void) interrupt 5 _naked;

data jmp_buf reg_task_exit_env;
data jmp_buf kernel_env;
data jmp_buf task0_env;

volatile data byte rtos_tick;

sfr at 0xC9 T2MOD;

void main(void)
{
    byte wait_count;

    T2CON = 0x04;    // load T2 control register
    T2MOD = 0x00;

    TH2 = 0xFA;      // Set up for 1ms timer tick with
    RCAP2H = 0xFA;   // Fosc = 18.432Mhz and 12 clock period
per T2 cycle
    TL2 = 0x00;
    RCAP2L = 0x00;

    if (setjmp(reg_task_exit_env) == 0)
        task_0();

    rtos_tick = 0;   // reset flag

    ET2 = 1;         // enable T2
    TR2 = 1;         // start T2
    EA = 1;          // global interrupt enable

    P2_6 = 1;       // LED on port P2.6


    // ---------------
    // Scheduler loop
    // ---------------
    while (1)
    {
        if (setjmp(kernel_env) == 0)     // Invoke task
            LONGJMP(task0_env,1);        // SP before =
0x24, SP after = 0x26
           //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after
interrupt

        wait_count = 0;

        // spend some time in other tasks; for demo purposes
we just use a delay
        while (wait_count < 200)
        {
            while (rtos_tick == 0);
            wait_count++;
            rtos_tick = 0;
        }
    }
}

void task_0()
{
    bit flag = 0;

    if (setjmp(task0_env) == 0)
        LONGJMP(reg_task_exit_env,1);   // SP before = 0x26,
SP after = 0x24 : no issue

    // Perform task initialisation here


    //task0 loop
    while (1)
    {
        // Do task0 work: toggle LED on Port 2.6
        if (flag)
        {
            P2_6 = 0;
            flag = 0;
        }
        else
        {
            P2_6 = 1;
            flag = 1;
        }

        // Yield to scheduler
        if (setjmp(task0_env) == 0)
            LONGJMP(kernel_env,1);
    }
}

void timer2 (void) interrupt 5 _naked
{
    _asm
    clr TF2
    mov _rtos_tick,#1
    reti
    _endasm;
}

//--------------------------------------------

int my_longjmp (unsigned char data *bp, int rv)
{
    unsigned char lsp;

    lsp = *(bp+2);
    SP = lsp;
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;


    return rv;
}



----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 18:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 14:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 13:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 13:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 13:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 12:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 12:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 07:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 12:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 12:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-10 20:03:57

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by remarc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Maarten Brock (maartenbrock)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 21:03

Message:
Logged In: YES 
user_id=1375403

OK, thanks. All is clear now. Implementation of coroutines
with setjmp/longjmp is not supported by the standard. I
accept that my bug report was unfounded within the permitted
usage of setjmp/longjmp. Sorry that it took so long for me
to get this. (It still leaves the question why
setjmp/longjmp is unsupported in SDCC in the first place).

The fact that it works in my app is because there is no data
on the stack in danger of becoming invalid. The longjmp from
the scheduler (caller) to the task function (callee) grows
the stack by just two bytes for a return address because I'm
compiling with an option to *not* use the stack for local
vars and function arguments. It was however the timer
interrupt caused it to fail.

Thanks for reminding me of the comment in the Keil code. I
did see it, but didn't grasp the full significance of it.

Thanks all, this was a good learning experience.

Regards,
Jim

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-10 17:49

Message:
Logged In: YES 
user_id=175928

> Because in
> that case longjmp adjusts SP towards a shrinking stack, so
> corruption due to interrupts does not occur.

Yes, longjmp works by shrinking the stack back to where it
was when you called setjmp.  You can't go the other way
because the data there is no longer guaranteed valid.

> BTW, I seem to be missing the point about the difference in
> the Keil example and my usage

I see none.  I suspect they just got lucky that it works at all.
Notice the comment in their example code for task0:
"byte i,x; // note: local variable might be corrupted after
delay"

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 17:21

Message:
Logged In: YES 
user_id=1375403

OK, OK...

I think I'm getting the message... You and Maarten are
saying that I can only longjmp back from caller to callee
and NOT vice versa. Is that what you are saying? Because in
that case longjmp adjusts SP towards a shrinking stack, so
corruption due to interrupts does not occur.

In that case, great, I'll accept my usage as non-standard,
continue with my "local solution" as working "step-out", 
withdraw my bug report, and thank you all for putting me
straight. I don't see why I should run the risk of further
overflows within the constraints I mentioned if I do and
this thread is still open I'll report it. 

One last thing that I kept reminding myself to ask but
forgot each time: why is longjmp/setjmp unsupported in SDCC?

BTW, I seem to be missing the point about the difference in
the Keil example and my usage, because I don't think I
changed this particular part of the application. Appreciate
if you could elaborate a bit on this. 

Finally, I really do appreciate all this feedback -- I'm
learning a lot and enjoying it.

Thanks,
Jim



----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 16:32

Message:
Logged In: YES 
user_id=203539

> I'm not observing a stack overflow
Yes, you do! Please accept two facts:
a. with the given (undocumented) usage of setjmp/longjmp 
you end up with two consecutive stack frames.
b. the stack from main() overflows the stack from task_0(), if 
an interrupt occurs during longjmp(). Please note: the stack 
will overflow too, if main uses the stack in any other way.

Now you have 3 joices:
1. don't use setjmp/longjmp in this particular, undocumented 
way
2. increase the stack for main
3. disable interrupts in longjmp()

> Your suggestion of adding a bit of extra stack space (6
> bytes in your proposal) works (I tried it!)
Thanks, this shows that I'm right. Try it again with 2 bytes, 
then you've got the same stack consumption as in the Keil 
example (please re-read my last posting about this).

> There should not be
> intervening stack frames between scheduler and task.
You call setjmp() both from main() and task_0(). main() and 
task_0() have different stack frames, consequently your 
scheduler and your task have different stack frames (see fact 
a).
And now please stop dreaming. You can't use 
setjmp/longjmp that way and complain about the 
consequences.

> One way of protection is disabling interrupts during the
> execution of this critical section as in my initial posting,
> which tackles the root cause.
Well, I don't agree about "the root cause". But do it, if this is 
what you want. But be prepared for the next overflow at the 
next corner.

And to be specific: I still don't see a bug in sdcc's longjmp().

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 13:15

Message:
Logged In: YES 
user_id=1375403

Thanks for shedding your light on this, Bernard.

Initially I wasn't interested in the stack at all,  I just
wanted to take the Keil example (which uses the
setjmp/longjmp construct) and expand it for my own purposes.
It is when I started to get lock-ups that I ended up
examining the stack. 

I'm not observing a stack overflow (i.e. the stack doesn't
grow unbounded, it just alternates one level between
scheduler (caller) and task (callee)). What I am seeing is a
return address in the process of being copied from a jmpbuf
struct onto the stack being corrupted by an interrupt coming
in between. The interrupt pushes and pops a return address
on stack just prior before longjmp could update the SP to
made the stack state valid again.

Your suggestion of adding a bit of extra stack space (6
bytes in your proposal) works (I tried it!) because in this
particular case the timer interrupt only requires 2 bytes
for its return address. It is however just circumventing the
problem with the longjmp issue rather than tackling the root
cause.  If the ISR would save registers on the stack 6 bytes
may not be enough.

I think the issue with trying to give each task its own
stack on the 8051 is the very constraint RAM space available
for data and stack (up to 256 bytes). On the Microchip
midrange PICs (one of my other targets) it wouldn't work at
all because its stack is not accessible by code (as an
aside, no setjmp/longjmp either). 

There are limitations with the setjmp/longjmp approach. A
task can only return to the scheduler from the function that
is called directly by the scheduler. (There should not be
intervening stack frames between scheduler and task). With
that constraint there is no limitation with regards to
calling other functions from within a task. Sofar, I have
not found this to be an issue (if I'm not mistaken the
commercial Salvo RTOS has the same limitation).

With respect to the SDCC implementation of longjmp, what I'm
trying to get across is that it is vulnerable to stack
corruption due to interrupts in the cricitical section
indicated below:

 // start of critical section
lsp = *(bp+2);
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
// end of critical section

One way of protection is disabling interrupts during the
execution of this critical section as in my initial posting,
which tackles the root cause.

I'll repost the URL to the KEIL example of a simple
scheduler for the 8051 here for anyone interested:
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D
(scroll down to the end of the page and click on "Simple OS
for 8051 microcontroller written in the Keil Software C
compiler");

Thanks again, Bernard.

Regards,
Jim


----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 11:27

Message:
Logged In: YES 
user_id=203539

Please allow me to throw in my two cents.

Jim, your problem is, that you don't care about the stack. 
You're jumping between two stack frames, the consequence 
is that you're punished by a plain stack overflow. I don't know 
exactly why the Keil example works. But one important 
difference might be, that task() is called via register_task(), 
which results in at least 2 extra bytes space on the stack.
Could you please try this:

  if (setjmp(reg_task_exit_env) == 0)
    {
      SP += 6; // allocate more stack space e.g. for interrupts
      task_0();
      SP -= 6;
    }

IMHO you should write a clean tast scheduler, and all tasks 
should have their own stack. Once I've done it for DOS using 
TurboC, it's not that difficult. Furthermore it's much easier to 
read than the setjmp/longjmp constructs. And you can call 
the scheduler from within a deeply nested function.

Finally: I can't find a bug in the old longjmp() function. It's just 
a problem of the correct stack frame (or enough space in all 
stack frames). Even with my_longjmp() all your problems will  
return, when you (or sdcc) start using the stack in task_0()!

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 22:03

Message:
Logged In: YES 
user_id=1375403

Maarten,

We seem to disagree on the interpretation of the spec. In my
view "the function containing the invocation of the setjmp
macrp" has NOT terminating (in fact you can see both caller
and callee caught in their own endless loops, thus never
terminating). I have adapted to the code from an example
written for the Keil compiler (URL was in code sample).
However regardless of whether my usage (which amounts to two
coroutines calling each other) is within the spec or not,
this still leaves the issue of the unprotected critical
section in the longjmp() implementation.

I'll leave it at your discretion how to deal, if at all,
with this issue in SDCC. I'm happy with my solution and will
move on to the next topic. I'm into SDCC for about a week
now and this is the only real issue I came across (and the
documentation did say that setjmp/longjmp was not
supported). So overall I'm still impressed with the SDCC effort.

Thanks very much for taking the time to look into this and
keep up the good work.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 21:29

Message:
Logged In: YES 
user_id=888171

Jim,

First thanks for the code example. It does however violate the 
spec. This is what it says in 7.13.2.1 #2:

The longjmp function restores the environment saved by the 
most recent invocation of
the setjmp macro in the same invocation of the program with 
the corresponding
jmp_buf argument. If there has been no such invocation, or if 
the function containing
the invocation of the setjmp macro has terminated execution 
(see note 208) in the interim, or if the
invocation of the setjmp macro was within the scope of an 
identifier with variably
modified type and execution has left that scope in the 
interim, the behavior is undefined.

And note 208 says:

For example, by executing a return statement or because 
another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set 
of nested calls.

And that longjmp is exactly what your code does.

I'll leave the bug report open to remind me to create a safe 
asm only version of setjmp/longjmp but expect to see no 
change in functionality.

Btw. if you had logged in when you opened the bug report 
you would have been able to upload files to it. Now the 
system treats you as just some bystander.

Good luck,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 13:46

Message:
Logged In: YES 
user_id=1375403

Maarten,

I'm attaching the smallest app that demonstrates the issue
while  running on actual hardware. I have added comments at
the longjmp calls of the before and after values of SP in
longjmp that I observed running in a simulator. 

Please let me know if you need further explanation.

Regards,
Jim Cramer

P.S. I'm not able to attach anything on this web screen. So
I'll try and paste it below:

// Minimal working app demonstrating potential bug in SDCC
longjmp

// code adapted from
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D

#include <8052.h>
#include <setjmp.h>

typedef unsigned char byte;

int my_longjmp (unsigned char data *bp, int rv);
//#define LONGJMP longjmp         // <<== locks up
#define LONGJMP my_longjmp      // <<== no lock up

void task_0(void);
void timer2 (void) interrupt 5 _naked;

data jmp_buf reg_task_exit_env;
data jmp_buf kernel_env;
data jmp_buf task0_env;

volatile data byte rtos_tick;

sfr at 0xC9 T2MOD;

void main(void)
{
    byte wait_count;

    T2CON = 0x04;    // load T2 control register
    T2MOD = 0x00;

    TH2 = 0xFA;      // Set up for 1ms timer tick with
    RCAP2H = 0xFA;   // Fosc = 18.432Mhz and 12 clock period
per T2 cycle
    TL2 = 0x00;
    RCAP2L = 0x00;

    if (setjmp(reg_task_exit_env) == 0)
        task_0();

    rtos_tick = 0;   // reset flag

    ET2 = 1;         // enable T2
    TR2 = 1;         // start T2
    EA = 1;          // global interrupt enable

    P2_6 = 1;       // LED on port P2.6


    // ---------------
    // Scheduler loop
    // ---------------
    while (1)
    {
        if (setjmp(kernel_env) == 0)     // Invoke task
            LONGJMP(task0_env,1);        // SP before =
0x24, SP after = 0x26
           //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after
interrupt

        wait_count = 0;

        // spend some time in other tasks; for demo purposes
we just use a delay
        while (wait_count < 200)
        {
            while (rtos_tick == 0);
            wait_count++;
            rtos_tick = 0;
        }
    }
}

void task_0()
{
    bit flag = 0;

    if (setjmp(task0_env) == 0)
        LONGJMP(reg_task_exit_env,1);   // SP before = 0x26,
SP after = 0x24 : no issue

    // Perform task initialisation here


    //task0 loop
    while (1)
    {
        // Do task0 work: toggle LED on Port 2.6
        if (flag)
        {
            P2_6 = 0;
            flag = 0;
        }
        else
        {
            P2_6 = 1;
            flag = 1;
        }

        // Yield to scheduler
        if (setjmp(task0_env) == 0)
            LONGJMP(kernel_env,1);
    }
}

void timer2 (void) interrupt 5 _naked
{
    _asm
    clr TF2
    mov _rtos_tick,#1
    reti
    _endasm;
}

//--------------------------------------------

int my_longjmp (unsigned char data *bp, int rv)
{
    unsigned char lsp;

    lsp = *(bp+2);
    SP = lsp;
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;


    return rv;
}



----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 01:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-11 13:03:40

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by bernhardheld
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Maarten Brock (maartenbrock)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

>Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-11 14:03

Message:
Logged In: YES 
user_id=203539

> why is longjmp/setjmp unsupported in SDCC?
Because nobody ported and tested it for all ports and 
memory models. May be YOU want to step in?

> You and Maarten are saying that I can only longjmp back
> from caller to callee and NOT vice versa. Is that what you
> are saying?
Yes.

> BTW, I seem to be missing the point about the difference in
> the Keil example and my usage, because I don't think I
> changed this particular part of the application. Appreciate
> if you could elaborate a bit on this.
I can't test it, but what I see is that the Keil source uses this 
call chain:
    main -> register_task -> task -> setjmp
What you're doing is:
    main -> task -> setjmp
With the call of register_task() 2 bytes for the return address 
will be pushed on the stack. There might be even more 
pushes, if Keil has to save registers on the stack (I don't 
know what Keil is doing here). Therefore at the end of the call 
chain the distance between the stack frames of task and 
main is (at least) 2 bytes larger than in your example. Or in 
other words: using the Keil call chain there will be 2 more 
bytes stack space in the stack frame of main. 

You already know how you can adopt the behaviour of the 
Keil example:
  if (setjmp(reg_task_exit_env) == 0)
    {
      SP += 2; // allocate more stack space e.g. for interrupts
      task_0();
      SP -= 2;
    }

This "hack" should just help to explain, why the Keil example 
is working and your source isn't. I really do not recommend 
using this very limited scheduler.

> Thanks all, this was a good learning experience.
Yes indeed, problems like this are a great exercise.

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 21:03

Message:
Logged In: YES 
user_id=1375403

OK, thanks. All is clear now. Implementation of coroutines
with setjmp/longjmp is not supported by the standard. I
accept that my bug report was unfounded within the permitted
usage of setjmp/longjmp. Sorry that it took so long for me
to get this. (It still leaves the question why
setjmp/longjmp is unsupported in SDCC in the first place).

The fact that it works in my app is because there is no data
on the stack in danger of becoming invalid. The longjmp from
the scheduler (caller) to the task function (callee) grows
the stack by just two bytes for a return address because I'm
compiling with an option to *not* use the stack for local
vars and function arguments. It was however the timer
interrupt caused it to fail.

Thanks for reminding me of the comment in the Keil code. I
did see it, but didn't grasp the full significance of it.

Thanks all, this was a good learning experience.

Regards,
Jim

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-10 17:49

Message:
Logged In: YES 
user_id=175928

> Because in
> that case longjmp adjusts SP towards a shrinking stack, so
> corruption due to interrupts does not occur.

Yes, longjmp works by shrinking the stack back to where it
was when you called setjmp.  You can't go the other way
because the data there is no longer guaranteed valid.

> BTW, I seem to be missing the point about the difference in
> the Keil example and my usage

I see none.  I suspect they just got lucky that it works at all.
Notice the comment in their example code for task0:
"byte i,x; // note: local variable might be corrupted after
delay"

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 17:21

Message:
Logged In: YES 
user_id=1375403

OK, OK...

I think I'm getting the message... You and Maarten are
saying that I can only longjmp back from caller to callee
and NOT vice versa. Is that what you are saying? Because in
that case longjmp adjusts SP towards a shrinking stack, so
corruption due to interrupts does not occur.

In that case, great, I'll accept my usage as non-standard,
continue with my "local solution" as working "step-out", 
withdraw my bug report, and thank you all for putting me
straight. I don't see why I should run the risk of further
overflows within the constraints I mentioned if I do and
this thread is still open I'll report it. 

One last thing that I kept reminding myself to ask but
forgot each time: why is longjmp/setjmp unsupported in SDCC?

BTW, I seem to be missing the point about the difference in
the Keil example and my usage, because I don't think I
changed this particular part of the application. Appreciate
if you could elaborate a bit on this. 

Finally, I really do appreciate all this feedback -- I'm
learning a lot and enjoying it.

Thanks,
Jim



----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 16:32

Message:
Logged In: YES 
user_id=203539

> I'm not observing a stack overflow
Yes, you do! Please accept two facts:
a. with the given (undocumented) usage of setjmp/longjmp 
you end up with two consecutive stack frames.
b. the stack from main() overflows the stack from task_0(), if 
an interrupt occurs during longjmp(). Please note: the stack 
will overflow too, if main uses the stack in any other way.

Now you have 3 joices:
1. don't use setjmp/longjmp in this particular, undocumented 
way
2. increase the stack for main
3. disable interrupts in longjmp()

> Your suggestion of adding a bit of extra stack space (6
> bytes in your proposal) works (I tried it!)
Thanks, this shows that I'm right. Try it again with 2 bytes, 
then you've got the same stack consumption as in the Keil 
example (please re-read my last posting about this).

> There should not be
> intervening stack frames between scheduler and task.
You call setjmp() both from main() and task_0(). main() and 
task_0() have different stack frames, consequently your 
scheduler and your task have different stack frames (see fact 
a).
And now please stop dreaming. You can't use 
setjmp/longjmp that way and complain about the 
consequences.

> One way of protection is disabling interrupts during the
> execution of this critical section as in my initial posting,
> which tackles the root cause.
Well, I don't agree about "the root cause". But do it, if this is 
what you want. But be prepared for the next overflow at the 
next corner.

And to be specific: I still don't see a bug in sdcc's longjmp().

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 13:15

Message:
Logged In: YES 
user_id=1375403

Thanks for shedding your light on this, Bernard.

Initially I wasn't interested in the stack at all,  I just
wanted to take the Keil example (which uses the
setjmp/longjmp construct) and expand it for my own purposes.
It is when I started to get lock-ups that I ended up
examining the stack. 

I'm not observing a stack overflow (i.e. the stack doesn't
grow unbounded, it just alternates one level between
scheduler (caller) and task (callee)). What I am seeing is a
return address in the process of being copied from a jmpbuf
struct onto the stack being corrupted by an interrupt coming
in between. The interrupt pushes and pops a return address
on stack just prior before longjmp could update the SP to
made the stack state valid again.

Your suggestion of adding a bit of extra stack space (6
bytes in your proposal) works (I tried it!) because in this
particular case the timer interrupt only requires 2 bytes
for its return address. It is however just circumventing the
problem with the longjmp issue rather than tackling the root
cause.  If the ISR would save registers on the stack 6 bytes
may not be enough.

I think the issue with trying to give each task its own
stack on the 8051 is the very constraint RAM space available
for data and stack (up to 256 bytes). On the Microchip
midrange PICs (one of my other targets) it wouldn't work at
all because its stack is not accessible by code (as an
aside, no setjmp/longjmp either). 

There are limitations with the setjmp/longjmp approach. A
task can only return to the scheduler from the function that
is called directly by the scheduler. (There should not be
intervening stack frames between scheduler and task). With
that constraint there is no limitation with regards to
calling other functions from within a task. Sofar, I have
not found this to be an issue (if I'm not mistaken the
commercial Salvo RTOS has the same limitation).

With respect to the SDCC implementation of longjmp, what I'm
trying to get across is that it is vulnerable to stack
corruption due to interrupts in the cricitical section
indicated below:

 // start of critical section
lsp = *(bp+2);
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
// end of critical section

One way of protection is disabling interrupts during the
execution of this critical section as in my initial posting,
which tackles the root cause.

I'll repost the URL to the KEIL example of a simple
scheduler for the 8051 here for anyone interested:
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D
(scroll down to the end of the page and click on "Simple OS
for 8051 microcontroller written in the Keil Software C
compiler");

Thanks again, Bernard.

Regards,
Jim


----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 11:27

Message:
Logged In: YES 
user_id=203539

Please allow me to throw in my two cents.

Jim, your problem is, that you don't care about the stack. 
You're jumping between two stack frames, the consequence 
is that you're punished by a plain stack overflow. I don't know 
exactly why the Keil example works. But one important 
difference might be, that task() is called via register_task(), 
which results in at least 2 extra bytes space on the stack.
Could you please try this:

  if (setjmp(reg_task_exit_env) == 0)
    {
      SP += 6; // allocate more stack space e.g. for interrupts
      task_0();
      SP -= 6;
    }

IMHO you should write a clean tast scheduler, and all tasks 
should have their own stack. Once I've done it for DOS using 
TurboC, it's not that difficult. Furthermore it's much easier to 
read than the setjmp/longjmp constructs. And you can call 
the scheduler from within a deeply nested function.

Finally: I can't find a bug in the old longjmp() function. It's just 
a problem of the correct stack frame (or enough space in all 
stack frames). Even with my_longjmp() all your problems will  
return, when you (or sdcc) start using the stack in task_0()!

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 22:03

Message:
Logged In: YES 
user_id=1375403

Maarten,

We seem to disagree on the interpretation of the spec. In my
view "the function containing the invocation of the setjmp
macrp" has NOT terminating (in fact you can see both caller
and callee caught in their own endless loops, thus never
terminating). I have adapted to the code from an example
written for the Keil compiler (URL was in code sample).
However regardless of whether my usage (which amounts to two
coroutines calling each other) is within the spec or not,
this still leaves the issue of the unprotected critical
section in the longjmp() implementation.

I'll leave it at your discretion how to deal, if at all,
with this issue in SDCC. I'm happy with my solution and will
move on to the next topic. I'm into SDCC for about a week
now and this is the only real issue I came across (and the
documentation did say that setjmp/longjmp was not
supported). So overall I'm still impressed with the SDCC effort.

Thanks very much for taking the time to look into this and
keep up the good work.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 21:29

Message:
Logged In: YES 
user_id=888171

Jim,

First thanks for the code example. It does however violate the 
spec. This is what it says in 7.13.2.1 #2:

The longjmp function restores the environment saved by the 
most recent invocation of
the setjmp macro in the same invocation of the program with 
the corresponding
jmp_buf argument. If there has been no such invocation, or if 
the function containing
the invocation of the setjmp macro has terminated execution 
(see note 208) in the interim, or if the
invocation of the setjmp macro was within the scope of an 
identifier with variably
modified type and execution has left that scope in the 
interim, the behavior is undefined.

And note 208 says:

For example, by executing a return statement or because 
another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set 
of nested calls.

And that longjmp is exactly what your code does.

I'll leave the bug report open to remind me to create a safe 
asm only version of setjmp/longjmp but expect to see no 
change in functionality.

Btw. if you had logged in when you opened the bug report 
you would have been able to upload files to it. Now the 
system treats you as just some bystander.

Good luck,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 13:46

Message:
Logged In: YES 
user_id=1375403

Maarten,

I'm attaching the smallest app that demonstrates the issue
while  running on actual hardware. I have added comments at
the longjmp calls of the before and after values of SP in
longjmp that I observed running in a simulator. 

Please let me know if you need further explanation.

Regards,
Jim Cramer

P.S. I'm not able to attach anything on this web screen. So
I'll try and paste it below:

// Minimal working app demonstrating potential bug in SDCC
longjmp

// code adapted from
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D

#include <8052.h>
#include <setjmp.h>

typedef unsigned char byte;

int my_longjmp (unsigned char data *bp, int rv);
//#define LONGJMP longjmp         // <<== locks up
#define LONGJMP my_longjmp      // <<== no lock up

void task_0(void);
void timer2 (void) interrupt 5 _naked;

data jmp_buf reg_task_exit_env;
data jmp_buf kernel_env;
data jmp_buf task0_env;

volatile data byte rtos_tick;

sfr at 0xC9 T2MOD;

void main(void)
{
    byte wait_count;

    T2CON = 0x04;    // load T2 control register
    T2MOD = 0x00;

    TH2 = 0xFA;      // Set up for 1ms timer tick with
    RCAP2H = 0xFA;   // Fosc = 18.432Mhz and 12 clock period
per T2 cycle
    TL2 = 0x00;
    RCAP2L = 0x00;

    if (setjmp(reg_task_exit_env) == 0)
        task_0();

    rtos_tick = 0;   // reset flag

    ET2 = 1;         // enable T2
    TR2 = 1;         // start T2
    EA = 1;          // global interrupt enable

    P2_6 = 1;       // LED on port P2.6


    // ---------------
    // Scheduler loop
    // ---------------
    while (1)
    {
        if (setjmp(kernel_env) == 0)     // Invoke task
            LONGJMP(task0_env,1);        // SP before =
0x24, SP after = 0x26
           //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after
interrupt

        wait_count = 0;

        // spend some time in other tasks; for demo purposes
we just use a delay
        while (wait_count < 200)
        {
            while (rtos_tick == 0);
            wait_count++;
            rtos_tick = 0;
        }
    }
}

void task_0()
{
    bit flag = 0;

    if (setjmp(task0_env) == 0)
        LONGJMP(reg_task_exit_env,1);   // SP before = 0x26,
SP after = 0x24 : no issue

    // Perform task initialisation here


    //task0 loop
    while (1)
    {
        // Do task0 work: toggle LED on Port 2.6
        if (flag)
        {
            P2_6 = 0;
            flag = 0;
        }
        else
        {
            P2_6 = 1;
            flag = 1;
        }

        // Yield to scheduler
        if (setjmp(task0_env) == 0)
            LONGJMP(kernel_env,1);
    }
}

void timer2 (void) interrupt 5 _naked
{
    _asm
    clr TF2
    mov _rtos_tick,#1
    reti
    _endasm;
}

//--------------------------------------------

int my_longjmp (unsigned char data *bp, int rv)
{
    unsigned char lsp;

    lsp = *(bp+2);
    SP = lsp;
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;


    return rv;
}



----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 01:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-11-11 15:34:04

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by remarc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Maarten Brock (maartenbrock)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-11 16:33

Message:
Logged In: YES 
user_id=1375403

Bernard,

While this thread is no longer about a bug report, I hope
the forum will permit to give a final reaction.

I think I now fully understand your point about the
in-between call creating just enough extra space (2 bytes,
without allocation of  auto vars and args on the stack) to
allow an interrupt to occur without stack corruption. I did
have the in-between call in my application but ignorantly
left it out of the demo in an attempt to simplify. Despite
the extra call I did get lock-ups in my full app. I can now
attribute this to the fact that initially I used C code (non
_naked) in my interrupt routine (requiring more than 2
bytes), and later switch to _naked assembly (requiring just 2).

Without the protection of disabling interrupts in longjmp()
the Keil example will fail if interrupts are introduced that
start to use the stack liberally (saving registers). With my
local solution of disabling interrupts (or actually
reordering the code in longjmp() as another Jim suggested in
this thread), I'm not in danger of this. Again, restriction
to this is that there should be no longjmps to the scheduler
from deeper nested functions.

This solution will not win a beauty contest but for my
learning purposes (especially now that it is analysed to the
bone with your help) it will do, and perhaps I'll step up to
a more advanced scheduler in a next project.

You will appreciate that I do not (yet) feel qualified to
pick up the longjmp work in SDCC. However, I could not find
any comments in the manual or in the (MCS51) code why it
wasn't supported (not finished, not tested, buggy??).

Finally, the demo code I posted DOES work with my modified
version of longjmp().

Thanks,
Jim

----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-11 14:03

Message:
Logged In: YES 
user_id=203539

> why is longjmp/setjmp unsupported in SDCC?
Because nobody ported and tested it for all ports and 
memory models. May be YOU want to step in?

> You and Maarten are saying that I can only longjmp back
> from caller to callee and NOT vice versa. Is that what you
> are saying?
Yes.

> BTW, I seem to be missing the point about the difference in
> the Keil example and my usage, because I don't think I
> changed this particular part of the application. Appreciate
> if you could elaborate a bit on this.
I can't test it, but what I see is that the Keil source uses this 
call chain:
    main -> register_task -> task -> setjmp
What you're doing is:
    main -> task -> setjmp
With the call of register_task() 2 bytes for the return address 
will be pushed on the stack. There might be even more 
pushes, if Keil has to save registers on the stack (I don't 
know what Keil is doing here). Therefore at the end of the call 
chain the distance between the stack frames of task and 
main is (at least) 2 bytes larger than in your example. Or in 
other words: using the Keil call chain there will be 2 more 
bytes stack space in the stack frame of main. 

You already know how you can adopt the behaviour of the 
Keil example:
  if (setjmp(reg_task_exit_env) == 0)
    {
      SP += 2; // allocate more stack space e.g. for interrupts
      task_0();
      SP -= 2;
    }

This "hack" should just help to explain, why the Keil example 
is working and your source isn't. I really do not recommend 
using this very limited scheduler.

> Thanks all, this was a good learning experience.
Yes indeed, problems like this are a great exercise.

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 21:03

Message:
Logged In: YES 
user_id=1375403

OK, thanks. All is clear now. Implementation of coroutines
with setjmp/longjmp is not supported by the standard. I
accept that my bug report was unfounded within the permitted
usage of setjmp/longjmp. Sorry that it took so long for me
to get this. (It still leaves the question why
setjmp/longjmp is unsupported in SDCC in the first place).

The fact that it works in my app is because there is no data
on the stack in danger of becoming invalid. The longjmp from
the scheduler (caller) to the task function (callee) grows
the stack by just two bytes for a return address because I'm
compiling with an option to *not* use the stack for local
vars and function arguments. It was however the timer
interrupt caused it to fail.

Thanks for reminding me of the comment in the Keil code. I
did see it, but didn't grasp the full significance of it.

Thanks all, this was a good learning experience.

Regards,
Jim

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-10 17:49

Message:
Logged In: YES 
user_id=175928

> Because in
> that case longjmp adjusts SP towards a shrinking stack, so
> corruption due to interrupts does not occur.

Yes, longjmp works by shrinking the stack back to where it
was when you called setjmp.  You can't go the other way
because the data there is no longer guaranteed valid.

> BTW, I seem to be missing the point about the difference in
> the Keil example and my usage

I see none.  I suspect they just got lucky that it works at all.
Notice the comment in their example code for task0:
"byte i,x; // note: local variable might be corrupted after
delay"

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 17:21

Message:
Logged In: YES 
user_id=1375403

OK, OK...

I think I'm getting the message... You and Maarten are
saying that I can only longjmp back from caller to callee
and NOT vice versa. Is that what you are saying? Because in
that case longjmp adjusts SP towards a shrinking stack, so
corruption due to interrupts does not occur.

In that case, great, I'll accept my usage as non-standard,
continue with my "local solution" as working "step-out", 
withdraw my bug report, and thank you all for putting me
straight. I don't see why I should run the risk of further
overflows within the constraints I mentioned if I do and
this thread is still open I'll report it. 

One last thing that I kept reminding myself to ask but
forgot each time: why is longjmp/setjmp unsupported in SDCC?

BTW, I seem to be missing the point about the difference in
the Keil example and my usage, because I don't think I
changed this particular part of the application. Appreciate
if you could elaborate a bit on this. 

Finally, I really do appreciate all this feedback -- I'm
learning a lot and enjoying it.

Thanks,
Jim



----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 16:32

Message:
Logged In: YES 
user_id=203539

> I'm not observing a stack overflow
Yes, you do! Please accept two facts:
a. with the given (undocumented) usage of setjmp/longjmp 
you end up with two consecutive stack frames.
b. the stack from main() overflows the stack from task_0(), if 
an interrupt occurs during longjmp(). Please note: the stack 
will overflow too, if main uses the stack in any other way.

Now you have 3 joices:
1. don't use setjmp/longjmp in this particular, undocumented 
way
2. increase the stack for main
3. disable interrupts in longjmp()

> Your suggestion of adding a bit of extra stack space (6
> bytes in your proposal) works (I tried it!)
Thanks, this shows that I'm right. Try it again with 2 bytes, 
then you've got the same stack consumption as in the Keil 
example (please re-read my last posting about this).

> There should not be
> intervening stack frames between scheduler and task.
You call setjmp() both from main() and task_0(). main() and 
task_0() have different stack frames, consequently your 
scheduler and your task have different stack frames (see fact 
a).
And now please stop dreaming. You can't use 
setjmp/longjmp that way and complain about the 
consequences.

> One way of protection is disabling interrupts during the
> execution of this critical section as in my initial posting,
> which tackles the root cause.
Well, I don't agree about "the root cause". But do it, if this is 
what you want. But be prepared for the next overflow at the 
next corner.

And to be specific: I still don't see a bug in sdcc's longjmp().

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 13:15

Message:
Logged In: YES 
user_id=1375403

Thanks for shedding your light on this, Bernard.

Initially I wasn't interested in the stack at all,  I just
wanted to take the Keil example (which uses the
setjmp/longjmp construct) and expand it for my own purposes.
It is when I started to get lock-ups that I ended up
examining the stack. 

I'm not observing a stack overflow (i.e. the stack doesn't
grow unbounded, it just alternates one level between
scheduler (caller) and task (callee)). What I am seeing is a
return address in the process of being copied from a jmpbuf
struct onto the stack being corrupted by an interrupt coming
in between. The interrupt pushes and pops a return address
on stack just prior before longjmp could update the SP to
made the stack state valid again.

Your suggestion of adding a bit of extra stack space (6
bytes in your proposal) works (I tried it!) because in this
particular case the timer interrupt only requires 2 bytes
for its return address. It is however just circumventing the
problem with the longjmp issue rather than tackling the root
cause.  If the ISR would save registers on the stack 6 bytes
may not be enough.

I think the issue with trying to give each task its own
stack on the 8051 is the very constraint RAM space available
for data and stack (up to 256 bytes). On the Microchip
midrange PICs (one of my other targets) it wouldn't work at
all because its stack is not accessible by code (as an
aside, no setjmp/longjmp either). 

There are limitations with the setjmp/longjmp approach. A
task can only return to the scheduler from the function that
is called directly by the scheduler. (There should not be
intervening stack frames between scheduler and task). With
that constraint there is no limitation with regards to
calling other functions from within a task. Sofar, I have
not found this to be an issue (if I'm not mistaken the
commercial Salvo RTOS has the same limitation).

With respect to the SDCC implementation of longjmp, what I'm
trying to get across is that it is vulnerable to stack
corruption due to interrupts in the cricitical section
indicated below:

 // start of critical section
lsp = *(bp+2);
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
// end of critical section

One way of protection is disabling interrupts during the
execution of this critical section as in my initial posting,
which tackles the root cause.

I'll repost the URL to the KEIL example of a simple
scheduler for the 8051 here for anyone interested:
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D
(scroll down to the end of the page and click on "Simple OS
for 8051 microcontroller written in the Keil Software C
compiler");

Thanks again, Bernard.

Regards,
Jim


----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 11:27

Message:
Logged In: YES 
user_id=203539

Please allow me to throw in my two cents.

Jim, your problem is, that you don't care about the stack. 
You're jumping between two stack frames, the consequence 
is that you're punished by a plain stack overflow. I don't know 
exactly why the Keil example works. But one important 
difference might be, that task() is called via register_task(), 
which results in at least 2 extra bytes space on the stack.
Could you please try this:

  if (setjmp(reg_task_exit_env) == 0)
    {
      SP += 6; // allocate more stack space e.g. for interrupts
      task_0();
      SP -= 6;
    }

IMHO you should write a clean tast scheduler, and all tasks 
should have their own stack. Once I've done it for DOS using 
TurboC, it's not that difficult. Furthermore it's much easier to 
read than the setjmp/longjmp constructs. And you can call 
the scheduler from within a deeply nested function.

Finally: I can't find a bug in the old longjmp() function. It's just 
a problem of the correct stack frame (or enough space in all 
stack frames). Even with my_longjmp() all your problems will  
return, when you (or sdcc) start using the stack in task_0()!

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 22:03

Message:
Logged In: YES 
user_id=1375403

Maarten,

We seem to disagree on the interpretation of the spec. In my
view "the function containing the invocation of the setjmp
macrp" has NOT terminating (in fact you can see both caller
and callee caught in their own endless loops, thus never
terminating). I have adapted to the code from an example
written for the Keil compiler (URL was in code sample).
However regardless of whether my usage (which amounts to two
coroutines calling each other) is within the spec or not,
this still leaves the issue of the unprotected critical
section in the longjmp() implementation.

I'll leave it at your discretion how to deal, if at all,
with this issue in SDCC. I'm happy with my solution and will
move on to the next topic. I'm into SDCC for about a week
now and this is the only real issue I came across (and the
documentation did say that setjmp/longjmp was not
supported). So overall I'm still impressed with the SDCC effort.

Thanks very much for taking the time to look into this and
keep up the good work.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 21:29

Message:
Logged In: YES 
user_id=888171

Jim,

First thanks for the code example. It does however violate the 
spec. This is what it says in 7.13.2.1 #2:

The longjmp function restores the environment saved by the 
most recent invocation of
the setjmp macro in the same invocation of the program with 
the corresponding
jmp_buf argument. If there has been no such invocation, or if 
the function containing
the invocation of the setjmp macro has terminated execution 
(see note 208) in the interim, or if the
invocation of the setjmp macro was within the scope of an 
identifier with variably
modified type and execution has left that scope in the 
interim, the behavior is undefined.

And note 208 says:

For example, by executing a return statement or because 
another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set 
of nested calls.

And that longjmp is exactly what your code does.

I'll leave the bug report open to remind me to create a safe 
asm only version of setjmp/longjmp but expect to see no 
change in functionality.

Btw. if you had logged in when you opened the bug report 
you would have been able to upload files to it. Now the 
system treats you as just some bystander.

Good luck,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 13:46

Message:
Logged In: YES 
user_id=1375403

Maarten,

I'm attaching the smallest app that demonstrates the issue
while  running on actual hardware. I have added comments at
the longjmp calls of the before and after values of SP in
longjmp that I observed running in a simulator. 

Please let me know if you need further explanation.

Regards,
Jim Cramer

P.S. I'm not able to attach anything on this web screen. So
I'll try and paste it below:

// Minimal working app demonstrating potential bug in SDCC
longjmp

// code adapted from
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D

#include <8052.h>
#include <setjmp.h>

typedef unsigned char byte;

int my_longjmp (unsigned char data *bp, int rv);
//#define LONGJMP longjmp         // <<== locks up
#define LONGJMP my_longjmp      // <<== no lock up

void task_0(void);
void timer2 (void) interrupt 5 _naked;

data jmp_buf reg_task_exit_env;
data jmp_buf kernel_env;
data jmp_buf task0_env;

volatile data byte rtos_tick;

sfr at 0xC9 T2MOD;

void main(void)
{
    byte wait_count;

    T2CON = 0x04;    // load T2 control register
    T2MOD = 0x00;

    TH2 = 0xFA;      // Set up for 1ms timer tick with
    RCAP2H = 0xFA;   // Fosc = 18.432Mhz and 12 clock period
per T2 cycle
    TL2 = 0x00;
    RCAP2L = 0x00;

    if (setjmp(reg_task_exit_env) == 0)
        task_0();

    rtos_tick = 0;   // reset flag

    ET2 = 1;         // enable T2
    TR2 = 1;         // start T2
    EA = 1;          // global interrupt enable

    P2_6 = 1;       // LED on port P2.6


    // ---------------
    // Scheduler loop
    // ---------------
    while (1)
    {
        if (setjmp(kernel_env) == 0)     // Invoke task
            LONGJMP(task0_env,1);        // SP before =
0x24, SP after = 0x26
           //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after
interrupt

        wait_count = 0;

        // spend some time in other tasks; for demo purposes
we just use a delay
        while (wait_count < 200)
        {
            while (rtos_tick == 0);
            wait_count++;
            rtos_tick = 0;
        }
    }
}

void task_0()
{
    bit flag = 0;

    if (setjmp(task0_env) == 0)
        LONGJMP(reg_task_exit_env,1);   // SP before = 0x26,
SP after = 0x24 : no issue

    // Perform task initialisation here


    //task0 loop
    while (1)
    {
        // Do task0 work: toggle LED on Port 2.6
        if (flag)
        {
            P2_6 = 0;
            flag = 0;
        }
        else
        {
            P2_6 = 1;
            flag = 1;
        }

        // Yield to scheduler
        if (setjmp(task0_env) == 0)
            LONGJMP(kernel_env,1);
    }
}

void timer2 (void) interrupt 5 _naked
{
    _asm
    clr TF2
    mov _rtos_tick,#1
    reti
    _endasm;
}

//--------------------------------------------

int my_longjmp (unsigned char data *bp, int rv)
{
    unsigned char lsp;

    lsp = *(bp+2);
    SP = lsp;
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;


    return rv;
}



----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 01:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

[sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

From: SourceForge.net <no...@so...> - 2005-12-06 15:17:34

Bugs item #1350513, was opened at 2005-11-07 18:02
Message generated for change (Comment added) made by maartenbrock
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Run Time Library
>Group: fixed
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Maarten Brock (maartenbrock)
Summary: Possible bug in longjmp()

Initial Comment:
I believe I have found a bug in the implementation of
longjmp(). There appears to be critical section in that
code, during which an interrupt can clobber up the
stack frame under modification. I have adapted the code
by temporarily disabling interrupts. That took care of
the spurious lock-ups in my app. 

int my_longjmp (unsigned char *bp, int rv)
{
    unsigned char lsp;
    bit save_EA;

    save_EA = EA;
    EA = 0;

    // start of critical section
    lsp = *(bp+2);
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;
    SP = lsp;
    // end of critical section

    EA = save_EA;

    return rv;
}

Regards,
Jim Cramer
ji...@ji...

----------------------------------------------------------------------

>Comment By: Maarten Brock (maartenbrock)
Date: 2005-12-06 16:17

Message:
Logged In: YES 
user_id=888171

From SDCC 2.5.4 #1181 setjmp/longjmp are now supported for 
mcs51. They are tested in the regression tests too.

Other ports are not supported though. And illegal usage as 
mentioned in this thread is not supported either.

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-11 16:33

Message:
Logged In: YES 
user_id=1375403

Bernard,

While this thread is no longer about a bug report, I hope
the forum will permit to give a final reaction.

I think I now fully understand your point about the
in-between call creating just enough extra space (2 bytes,
without allocation of  auto vars and args on the stack) to
allow an interrupt to occur without stack corruption. I did
have the in-between call in my application but ignorantly
left it out of the demo in an attempt to simplify. Despite
the extra call I did get lock-ups in my full app. I can now
attribute this to the fact that initially I used C code (non
_naked) in my interrupt routine (requiring more than 2
bytes), and later switch to _naked assembly (requiring just 2).

Without the protection of disabling interrupts in longjmp()
the Keil example will fail if interrupts are introduced that
start to use the stack liberally (saving registers). With my
local solution of disabling interrupts (or actually
reordering the code in longjmp() as another Jim suggested in
this thread), I'm not in danger of this. Again, restriction
to this is that there should be no longjmps to the scheduler
from deeper nested functions.

This solution will not win a beauty contest but for my
learning purposes (especially now that it is analysed to the
bone with your help) it will do, and perhaps I'll step up to
a more advanced scheduler in a next project.

You will appreciate that I do not (yet) feel qualified to
pick up the longjmp work in SDCC. However, I could not find
any comments in the manual or in the (MCS51) code why it
wasn't supported (not finished, not tested, buggy??).

Finally, the demo code I posted DOES work with my modified
version of longjmp().

Thanks,
Jim

----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-11 14:03

Message:
Logged In: YES 
user_id=203539

> why is longjmp/setjmp unsupported in SDCC?
Because nobody ported and tested it for all ports and 
memory models. May be YOU want to step in?

> You and Maarten are saying that I can only longjmp back
> from caller to callee and NOT vice versa. Is that what you
> are saying?
Yes.

> BTW, I seem to be missing the point about the difference in
> the Keil example and my usage, because I don't think I
> changed this particular part of the application. Appreciate
> if you could elaborate a bit on this.
I can't test it, but what I see is that the Keil source uses this 
call chain:
    main -> register_task -> task -> setjmp
What you're doing is:
    main -> task -> setjmp
With the call of register_task() 2 bytes for the return address 
will be pushed on the stack. There might be even more 
pushes, if Keil has to save registers on the stack (I don't 
know what Keil is doing here). Therefore at the end of the call 
chain the distance between the stack frames of task and 
main is (at least) 2 bytes larger than in your example. Or in 
other words: using the Keil call chain there will be 2 more 
bytes stack space in the stack frame of main. 

You already know how you can adopt the behaviour of the 
Keil example:
  if (setjmp(reg_task_exit_env) == 0)
    {
      SP += 2; // allocate more stack space e.g. for interrupts
      task_0();
      SP -= 2;
    }

This "hack" should just help to explain, why the Keil example 
is working and your source isn't. I really do not recommend 
using this very limited scheduler.

> Thanks all, this was a good learning experience.
Yes indeed, problems like this are a great exercise.

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 21:03

Message:
Logged In: YES 
user_id=1375403

OK, thanks. All is clear now. Implementation of coroutines
with setjmp/longjmp is not supported by the standard. I
accept that my bug report was unfounded within the permitted
usage of setjmp/longjmp. Sorry that it took so long for me
to get this. (It still leaves the question why
setjmp/longjmp is unsupported in SDCC in the first place).

The fact that it works in my app is because there is no data
on the stack in danger of becoming invalid. The longjmp from
the scheduler (caller) to the task function (callee) grows
the stack by just two bytes for a return address because I'm
compiling with an option to *not* use the stack for local
vars and function arguments. It was however the timer
interrupt caused it to fail.

Thanks for reminding me of the comment in the Keil code. I
did see it, but didn't grasp the full significance of it.

Thanks all, this was a good learning experience.

Regards,
Jim

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-10 17:49

Message:
Logged In: YES 
user_id=175928

> Because in
> that case longjmp adjusts SP towards a shrinking stack, so
> corruption due to interrupts does not occur.

Yes, longjmp works by shrinking the stack back to where it
was when you called setjmp.  You can't go the other way
because the data there is no longer guaranteed valid.

> BTW, I seem to be missing the point about the difference in
> the Keil example and my usage

I see none.  I suspect they just got lucky that it works at all.
Notice the comment in their example code for task0:
"byte i,x; // note: local variable might be corrupted after
delay"

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 17:21

Message:
Logged In: YES 
user_id=1375403

OK, OK...

I think I'm getting the message... You and Maarten are
saying that I can only longjmp back from caller to callee
and NOT vice versa. Is that what you are saying? Because in
that case longjmp adjusts SP towards a shrinking stack, so
corruption due to interrupts does not occur.

In that case, great, I'll accept my usage as non-standard,
continue with my "local solution" as working "step-out", 
withdraw my bug report, and thank you all for putting me
straight. I don't see why I should run the risk of further
overflows within the constraints I mentioned if I do and
this thread is still open I'll report it. 

One last thing that I kept reminding myself to ask but
forgot each time: why is longjmp/setjmp unsupported in SDCC?

BTW, I seem to be missing the point about the difference in
the Keil example and my usage, because I don't think I
changed this particular part of the application. Appreciate
if you could elaborate a bit on this. 

Finally, I really do appreciate all this feedback -- I'm
learning a lot and enjoying it.

Thanks,
Jim



----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 16:32

Message:
Logged In: YES 
user_id=203539

> I'm not observing a stack overflow
Yes, you do! Please accept two facts:
a. with the given (undocumented) usage of setjmp/longjmp 
you end up with two consecutive stack frames.
b. the stack from main() overflows the stack from task_0(), if 
an interrupt occurs during longjmp(). Please note: the stack 
will overflow too, if main uses the stack in any other way.

Now you have 3 joices:
1. don't use setjmp/longjmp in this particular, undocumented 
way
2. increase the stack for main
3. disable interrupts in longjmp()

> Your suggestion of adding a bit of extra stack space (6
> bytes in your proposal) works (I tried it!)
Thanks, this shows that I'm right. Try it again with 2 bytes, 
then you've got the same stack consumption as in the Keil 
example (please re-read my last posting about this).

> There should not be
> intervening stack frames between scheduler and task.
You call setjmp() both from main() and task_0(). main() and 
task_0() have different stack frames, consequently your 
scheduler and your task have different stack frames (see fact 
a).
And now please stop dreaming. You can't use 
setjmp/longjmp that way and complain about the 
consequences.

> One way of protection is disabling interrupts during the
> execution of this critical section as in my initial posting,
> which tackles the root cause.
Well, I don't agree about "the root cause". But do it, if this is 
what you want. But be prepared for the next overflow at the 
next corner.

And to be specific: I still don't see a bug in sdcc's longjmp().

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-10 13:15

Message:
Logged In: YES 
user_id=1375403

Thanks for shedding your light on this, Bernard.

Initially I wasn't interested in the stack at all,  I just
wanted to take the Keil example (which uses the
setjmp/longjmp construct) and expand it for my own purposes.
It is when I started to get lock-ups that I ended up
examining the stack. 

I'm not observing a stack overflow (i.e. the stack doesn't
grow unbounded, it just alternates one level between
scheduler (caller) and task (callee)). What I am seeing is a
return address in the process of being copied from a jmpbuf
struct onto the stack being corrupted by an interrupt coming
in between. The interrupt pushes and pops a return address
on stack just prior before longjmp could update the SP to
made the stack state valid again.

Your suggestion of adding a bit of extra stack space (6
bytes in your proposal) works (I tried it!) because in this
particular case the timer interrupt only requires 2 bytes
for its return address. It is however just circumventing the
problem with the longjmp issue rather than tackling the root
cause.  If the ISR would save registers on the stack 6 bytes
may not be enough.

I think the issue with trying to give each task its own
stack on the 8051 is the very constraint RAM space available
for data and stack (up to 256 bytes). On the Microchip
midrange PICs (one of my other targets) it wouldn't work at
all because its stack is not accessible by code (as an
aside, no setjmp/longjmp either). 

There are limitations with the setjmp/longjmp approach. A
task can only return to the scheduler from the function that
is called directly by the scheduler. (There should not be
intervening stack frames between scheduler and task). With
that constraint there is no limitation with regards to
calling other functions from within a task. Sofar, I have
not found this to be an issue (if I'm not mistaken the
commercial Salvo RTOS has the same limitation).

With respect to the SDCC implementation of longjmp, what I'm
trying to get across is that it is vulnerable to stack
corruption due to interrupts in the cricitical section
indicated below:

 // start of critical section
lsp = *(bp+2);
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
// end of critical section

One way of protection is disabling interrupts during the
execution of this critical section as in my initial posting,
which tackles the root cause.

I'll repost the URL to the KEIL example of a simple
scheduler for the 8051 here for anyone interested:
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D
(scroll down to the end of the page and click on "Simple OS
for 8051 microcontroller written in the Keil Software C
compiler");

Thanks again, Bernard.

Regards,
Jim


----------------------------------------------------------------------

Comment By: Bernhard Held (bernhardheld)
Date: 2005-11-10 11:27

Message:
Logged In: YES 
user_id=203539

Please allow me to throw in my two cents.

Jim, your problem is, that you don't care about the stack. 
You're jumping between two stack frames, the consequence 
is that you're punished by a plain stack overflow. I don't know 
exactly why the Keil example works. But one important 
difference might be, that task() is called via register_task(), 
which results in at least 2 extra bytes space on the stack.
Could you please try this:

  if (setjmp(reg_task_exit_env) == 0)
    {
      SP += 6; // allocate more stack space e.g. for interrupts
      task_0();
      SP -= 6;
    }

IMHO you should write a clean tast scheduler, and all tasks 
should have their own stack. Once I've done it for DOS using 
TurboC, it's not that difficult. Furthermore it's much easier to 
read than the setjmp/longjmp constructs. And you can call 
the scheduler from within a deeply nested function.

Finally: I can't find a bug in the old longjmp() function. It's just 
a problem of the correct stack frame (or enough space in all 
stack frames). Even with my_longjmp() all your problems will  
return, when you (or sdcc) start using the stack in task_0()!

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 22:03

Message:
Logged In: YES 
user_id=1375403

Maarten,

We seem to disagree on the interpretation of the spec. In my
view "the function containing the invocation of the setjmp
macrp" has NOT terminating (in fact you can see both caller
and callee caught in their own endless loops, thus never
terminating). I have adapted to the code from an example
written for the Keil compiler (URL was in code sample).
However regardless of whether my usage (which amounts to two
coroutines calling each other) is within the spec or not,
this still leaves the issue of the unprotected critical
section in the longjmp() implementation.

I'll leave it at your discretion how to deal, if at all,
with this issue in SDCC. I'm happy with my solution and will
move on to the next topic. I'm into SDCC for about a week
now and this is the only real issue I came across (and the
documentation did say that setjmp/longjmp was not
supported). So overall I'm still impressed with the SDCC effort.

Thanks very much for taking the time to look into this and
keep up the good work.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 21:29

Message:
Logged In: YES 
user_id=888171

Jim,

First thanks for the code example. It does however violate the 
spec. This is what it says in 7.13.2.1 #2:

The longjmp function restores the environment saved by the 
most recent invocation of
the setjmp macro in the same invocation of the program with 
the corresponding
jmp_buf argument. If there has been no such invocation, or if 
the function containing
the invocation of the setjmp macro has terminated execution 
(see note 208) in the interim, or if the
invocation of the setjmp macro was within the scope of an 
identifier with variably
modified type and execution has left that scope in the 
interim, the behavior is undefined.

And note 208 says:

For example, by executing a return statement or because 
another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set 
of nested calls.

And that longjmp is exactly what your code does.

I'll leave the bug report open to remind me to create a safe 
asm only version of setjmp/longjmp but expect to see no 
change in functionality.

Btw. if you had logged in when you opened the bug report 
you would have been able to upload files to it. Now the 
system treats you as just some bystander.

Good luck,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 13:46

Message:
Logged In: YES 
user_id=1375403

Maarten,

I'm attaching the smallest app that demonstrates the issue
while  running on actual hardware. I have added comments at
the longjmp calls of the before and after values of SP in
longjmp that I observed running in a simulator. 

Please let me know if you need further explanation.

Regards,
Jim Cramer

P.S. I'm not able to attach anything on this web screen. So
I'll try and paste it below:

// Minimal working app demonstrating potential bug in SDCC
longjmp

// code adapted from
http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D

#include <8052.h>
#include <setjmp.h>

typedef unsigned char byte;

int my_longjmp (unsigned char data *bp, int rv);
//#define LONGJMP longjmp         // <<== locks up
#define LONGJMP my_longjmp      // <<== no lock up

void task_0(void);
void timer2 (void) interrupt 5 _naked;

data jmp_buf reg_task_exit_env;
data jmp_buf kernel_env;
data jmp_buf task0_env;

volatile data byte rtos_tick;

sfr at 0xC9 T2MOD;

void main(void)
{
    byte wait_count;

    T2CON = 0x04;    // load T2 control register
    T2MOD = 0x00;

    TH2 = 0xFA;      // Set up for 1ms timer tick with
    RCAP2H = 0xFA;   // Fosc = 18.432Mhz and 12 clock period
per T2 cycle
    TL2 = 0x00;
    RCAP2L = 0x00;

    if (setjmp(reg_task_exit_env) == 0)
        task_0();

    rtos_tick = 0;   // reset flag

    ET2 = 1;         // enable T2
    TR2 = 1;         // start T2
    EA = 1;          // global interrupt enable

    P2_6 = 1;       // LED on port P2.6


    // ---------------
    // Scheduler loop
    // ---------------
    while (1)
    {
        if (setjmp(kernel_env) == 0)     // Invoke task
            LONGJMP(task0_env,1);        // SP before =
0x24, SP after = 0x26
           //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after
interrupt

        wait_count = 0;

        // spend some time in other tasks; for demo purposes
we just use a delay
        while (wait_count < 200)
        {
            while (rtos_tick == 0);
            wait_count++;
            rtos_tick = 0;
        }
    }
}

void task_0()
{
    bit flag = 0;

    if (setjmp(task0_env) == 0)
        LONGJMP(reg_task_exit_env,1);   // SP before = 0x26,
SP after = 0x24 : no issue

    // Perform task initialisation here


    //task0 loop
    while (1)
    {
        // Do task0 work: toggle LED on Port 2.6
        if (flag)
        {
            P2_6 = 0;
            flag = 0;
        }
        else
        {
            P2_6 = 1;
            flag = 1;
        }

        // Yield to scheduler
        if (setjmp(task0_env) == 0)
            LONGJMP(kernel_env,1);
    }
}

void timer2 (void) interrupt 5 _naked
{
    _asm
    clr TF2
    mov _rtos_tick,#1
    reti
    _endasm;
}

//--------------------------------------------

int my_longjmp (unsigned char data *bp, int rv)
{
    unsigned char lsp;

    lsp = *(bp+2);
    SP = lsp;
    *((unsigned char data *) lsp) = *bp++;
    *((unsigned char data *) lsp - 1) = *bp;


    return rv;
}



----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-09 01:12

Message:
Logged In: YES 
user_id=1375403

Maarten,

I understand that you will require more evidence before
assuming a bug in the core library. Apologies if I spoke too
firmly. My excuse is that I'm a newbie here and with
"embedded" (old-hand on the desktop, though -- going back to
6502 / Apple II - shows my age).

In my view the bug in the code is not in the functionality
but in leaving a cricital section unprotected against
interrupts.

I agree with you that the reordering is dangerous or
downright wrong if any local vars or function arguments are
stack based. That's why you can't use --auto-stack. In fact,
even with --auto-stack off there may still be an issue as
there are some calls (and thus stack usage) in the generated
code to resolve the generic pointer passed as the first
argument. I'm still researching that one. For now, I have
specifically declared the first argument as a "data" pointer
instead of a generic pointer to avoid the generated calls.
In my view the re-ordering removes the critical section
altogether and thus the need to disable interrupts - hence a
better solution my initial approach.

I have looked up the section in the C99 standard you
referred to. As fas as I can tell I'm meeting all stated
requirements.

I will try and create a small app to demonstrate/explain the
issue in more detail and come back on it.

I'm just a hobbyist and my app is for my personal education.
 The design is not critical (except for my understanding).  

I very much appreciate the huge effort that must have gone
into developing SDCC and if I can help to further root out a
potential bug I'lll gladly do so.

Regards,
Jim Cramer 




----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-09 00:19

Message:
Logged In: YES 
user_id=888171

Jim,

If you frequent the 8052.com forum, you would have read 
many times: testing does not prove the absence of bugs, 
only the absence of known bugs. I still consider reordering 
even more dangerous than the original code.

In the original code I also still see no bug and I can only 
guess you're using it in an unspecified way. If you can please 
read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure 
that when you execute longjmp() you have not returned from 
the function that called setjmp().

I'm not saying all this to annoy you. It's my warning that your 
design might be wrong. If you think I misunderstand, then 
please try to create a sample application that shows the bug 
but is otherwise as small as possible and upload it here. 
With the same respect, I'm trying to work with you, not 
against you.

Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 20:04

Message:
Logged In: YES 
user_id=1375403

Maarten,

I already figured that you can't use --stack-auto (i.e. must
use non-re-entrant code) because of the stack manipulation.
Within that constraint Jim P's solution (as well as my less
elegant one) works. The proof is here sitting on my desk
flashing LEDs without lock-ups (where as previously it would
lock up in about a second).

Longjmp/setjmp is very valuable for co-operative schedulers
such as I'm using now. With respect, pointing to
documentation that claims that the functionality is not
supported does not help to improve this otherwise great SDCC
stuff. 

Jim P's solution is elegant, does not alter the logic of the
code (just some re-ordering) and is in my opinion a bug fix
that I would strongly urge to be adopted. If that still
leaves setjmp/longjmp unsupported, so be it. At least some
of us can take the risk and start using it ;-)

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:51

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 19:42

Message:
Logged In: YES 
user_id=888171

Jim,

4) You have a point there. Critical might save on the stack, 
but save_EA could also end up on the stack in --stack-auto 
(allthough not in this simple case). I guess the best solution 
is to use inline assembly.
5) If the interrupt occurs after the first write and before the 
update of SP, the ISR will use stack space (far) above lsp. 
Nothing gets overwritten.
If lsp points to a piece of stack above SP those values are no 
longer valid. Anything could have written there, not only ISR's 
but also normal program flow.

jimatjtan,

Reordering makes things worse. If --stack-auto is used, lsp, 
bp and rv can be allocated on stack.

All in all, there probably was a reason why someone put in 
the manual that setjmp/longjmp are not supported.

Greets,
Maarten

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 19:26

Message:
Logged In: YES 
user_id=1375403

Hi Jim P.,

That is an elegant and better solution than I proposed. I'll
adopt it in my app.

Thanks a lot.

Regards,
Jim Cramer

----------------------------------------------------------------------

Comment By: Jim Paris (jimatjtan)
Date: 2005-11-08 19:16

Message:
Logged In: YES 
user_id=175928

I think reordering it would fix the problem:

lsp = *(bp+2);
SP = lsp;
*((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;


----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:55

Message:
Logged In: YES 
user_id=1375403

Forgot to mention under 5: happens in case the values
written are in a location above the current SP, e.g:

xxx                         <--current SP
low-address-byte      
high-address-byte    <-- updated SP

Problems arises if interrupt comes in before SP is updated
and if ISR uses the stack.

Regards,
Jim

----------------------------------------------------------------------

Comment By: remarc (remarc)
Date: 2005-11-08 18:32

Message:
Logged In: YES 
user_id=1375403

1) Sorry Maarten, newbie here. I have now registered and
logged in.
2) by accident.
3) I have done so and it solved my problem.
4) Not sure about that because longjmp() pulls tricks on the
stack.
5) Let me explain: the code fragment from the original
longjmp() that I have labelled "critical section" here first
writes a return address (saved from a previous call to a
corresponding setjmp()) to the stack and then updates the
stack pointer:
 *((unsigned char data *) lsp) = *bp++;
*((unsigned char data *) lsp - 1) = *bp;
SP = lsp;
If an interrupt comes in after the first or second statement
AND before SP gets updated in the third statement
AND if the ISR starts saving values on the stack
THEN the two stack bytes just modified get clobbered by the ISR.

As a result longjmp() doesn't return to the address set by
setjmp but jumps into some random code location.

Am I making sense?

6) Default options, small memory mode, non-reentrant code

Regards,
Jim

----------------------------------------------------------------------

Comment By: Maarten Brock (maartenbrock)
Date: 2005-11-08 13:55

Message:
Logged In: YES 
user_id=888171

Jim,

1) It helps (YOU) if you log in before posting a bug. You get 
email notifications of changes to this bug report, you can 
change settings, you can upload files, you won't get 
spammed because your email address is in the open, ...
2) No need to double post the PS
3) If you want to override a function in the library you can just 
write it and include it in your project. The linker will search 
your project before it starts looking in the library.
4) Instead of accessing EA you can also use the keyword 
critical
5) I do not understand how an interrupt can clobber up the 
stack frame in this case.
6) What options to SDCC did you use?

Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:28

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-11-07 18:06

Message:
Logged In: NO 

P.S. This applies to the MCS51 library -- Jim


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599

Thread: [sdcc-devel] [ sdcc-Bugs-1350513 ] Possible bug in longjmp()

The Small Device C Compiler (SDCC), targeting 8-bit architectures

sdcc-devel