From: SourceForge.net <no...@so...> - 2005-11-07 17:02:37
|
Bugs item #1350513, was opened at 2005-11-07 09:02 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-07 17:09:11
|
Bugs item #1350513, was opened at 2005-11-07 09:02 Message generated for change (Comment added) made by nobody You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 09:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-07 17:28:48
|
Bugs item #1350513, was opened at 2005-11-07 09:02 Message generated for change (Comment added) made by nobody You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 09:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 09:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-08 12:55:21
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by maartenbrock You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. >Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- >Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-08 17:32:11
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by remarc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-08 17:55:37
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by remarc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-08 18:16:35
|
Bugs item #1350513, was opened at 2005-11-07 12:02 Message generated for change (Comment added) made by jimatjtan You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 13:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 12:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 12:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 07:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 12:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 12:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-08 18:26:21
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by remarc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-08 18:42:27
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by maartenbrock You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- >Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-08 18:51:08
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by remarc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-08 19:05:04
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by remarc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-08 23:19:33
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by maartenbrock You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- >Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-09 00:12:42
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by remarc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-09 12:46:59
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by remarc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 13:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-09 20:29:39
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by maartenbrock You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) >Assigned to: Maarten Brock (maartenbrock) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- >Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 21:29 Message: Logged In: YES user_id=888171 Jim, First thanks for the code example. It does however violate the spec. This is what it says in 7.13.2.1 #2: The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution (see note 208) in the interim, or if the invocation of the setjmp macro was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. And note 208 says: For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls. And that longjmp is exactly what your code does. I'll leave the bug report open to remind me to create a safe asm only version of setjmp/longjmp but expect to see no change in functionality. Btw. if you had logged in when you opened the bug report you would have been able to upload files to it. Now the system treats you as just some bystander. Good luck, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 13:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-09 21:03:24
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by remarc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Maarten Brock (maartenbrock) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 22:03 Message: Logged In: YES user_id=1375403 Maarten, We seem to disagree on the interpretation of the spec. In my view "the function containing the invocation of the setjmp macrp" has NOT terminating (in fact you can see both caller and callee caught in their own endless loops, thus never terminating). I have adapted to the code from an example written for the Keil compiler (URL was in code sample). However regardless of whether my usage (which amounts to two coroutines calling each other) is within the spec or not, this still leaves the issue of the unprotected critical section in the longjmp() implementation. I'll leave it at your discretion how to deal, if at all, with this issue in SDCC. I'm happy with my solution and will move on to the next topic. I'm into SDCC for about a week now and this is the only real issue I came across (and the documentation did say that setjmp/longjmp was not supported). So overall I'm still impressed with the SDCC effort. Thanks very much for taking the time to look into this and keep up the good work. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 21:29 Message: Logged In: YES user_id=888171 Jim, First thanks for the code example. It does however violate the spec. This is what it says in 7.13.2.1 #2: The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution (see note 208) in the interim, or if the invocation of the setjmp macro was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. And note 208 says: For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls. And that longjmp is exactly what your code does. I'll leave the bug report open to remind me to create a safe asm only version of setjmp/longjmp but expect to see no change in functionality. Btw. if you had logged in when you opened the bug report you would have been able to upload files to it. Now the system treats you as just some bystander. Good luck, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 13:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-10 10:27:44
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by bernhardheld You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Maarten Brock (maartenbrock) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- >Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 11:27 Message: Logged In: YES user_id=203539 Please allow me to throw in my two cents. Jim, your problem is, that you don't care about the stack. You're jumping between two stack frames, the consequence is that you're punished by a plain stack overflow. I don't know exactly why the Keil example works. But one important difference might be, that task() is called via register_task(), which results in at least 2 extra bytes space on the stack. Could you please try this: if (setjmp(reg_task_exit_env) == 0) { SP += 6; // allocate more stack space e.g. for interrupts task_0(); SP -= 6; } IMHO you should write a clean tast scheduler, and all tasks should have their own stack. Once I've done it for DOS using TurboC, it's not that difficult. Furthermore it's much easier to read than the setjmp/longjmp constructs. And you can call the scheduler from within a deeply nested function. Finally: I can't find a bug in the old longjmp() function. It's just a problem of the correct stack frame (or enough space in all stack frames). Even with my_longjmp() all your problems will return, when you (or sdcc) start using the stack in task_0()! ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 22:03 Message: Logged In: YES user_id=1375403 Maarten, We seem to disagree on the interpretation of the spec. In my view "the function containing the invocation of the setjmp macrp" has NOT terminating (in fact you can see both caller and callee caught in their own endless loops, thus never terminating). I have adapted to the code from an example written for the Keil compiler (URL was in code sample). However regardless of whether my usage (which amounts to two coroutines calling each other) is within the spec or not, this still leaves the issue of the unprotected critical section in the longjmp() implementation. I'll leave it at your discretion how to deal, if at all, with this issue in SDCC. I'm happy with my solution and will move on to the next topic. I'm into SDCC for about a week now and this is the only real issue I came across (and the documentation did say that setjmp/longjmp was not supported). So overall I'm still impressed with the SDCC effort. Thanks very much for taking the time to look into this and keep up the good work. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 21:29 Message: Logged In: YES user_id=888171 Jim, First thanks for the code example. It does however violate the spec. This is what it says in 7.13.2.1 #2: The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution (see note 208) in the interim, or if the invocation of the setjmp macro was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. And note 208 says: For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls. And that longjmp is exactly what your code does. I'll leave the bug report open to remind me to create a safe asm only version of setjmp/longjmp but expect to see no change in functionality. Btw. if you had logged in when you opened the bug report you would have been able to upload files to it. Now the system treats you as just some bystander. Good luck, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 13:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-10 12:15:47
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by remarc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Maarten Brock (maartenbrock) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 13:15 Message: Logged In: YES user_id=1375403 Thanks for shedding your light on this, Bernard. Initially I wasn't interested in the stack at all, I just wanted to take the Keil example (which uses the setjmp/longjmp construct) and expand it for my own purposes. It is when I started to get lock-ups that I ended up examining the stack. I'm not observing a stack overflow (i.e. the stack doesn't grow unbounded, it just alternates one level between scheduler (caller) and task (callee)). What I am seeing is a return address in the process of being copied from a jmpbuf struct onto the stack being corrupted by an interrupt coming in between. The interrupt pushes and pops a return address on stack just prior before longjmp could update the SP to made the stack state valid again. Your suggestion of adding a bit of extra stack space (6 bytes in your proposal) works (I tried it!) because in this particular case the timer interrupt only requires 2 bytes for its return address. It is however just circumventing the problem with the longjmp issue rather than tackling the root cause. If the ISR would save registers on the stack 6 bytes may not be enough. I think the issue with trying to give each task its own stack on the 8051 is the very constraint RAM space available for data and stack (up to 256 bytes). On the Microchip midrange PICs (one of my other targets) it wouldn't work at all because its stack is not accessible by code (as an aside, no setjmp/longjmp either). There are limitations with the setjmp/longjmp approach. A task can only return to the scheduler from the function that is called directly by the scheduler. (There should not be intervening stack frames between scheduler and task). With that constraint there is no limitation with regards to calling other functions from within a task. Sofar, I have not found this to be an issue (if I'm not mistaken the commercial Salvo RTOS has the same limitation). With respect to the SDCC implementation of longjmp, what I'm trying to get across is that it is vulnerable to stack corruption due to interrupts in the cricitical section indicated below: // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section One way of protection is disabling interrupts during the execution of this critical section as in my initial posting, which tackles the root cause. I'll repost the URL to the KEIL example of a simple scheduler for the 8051 here for anyone interested: http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D (scroll down to the end of the page and click on "Simple OS for 8051 microcontroller written in the Keil Software C compiler"); Thanks again, Bernard. Regards, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 11:27 Message: Logged In: YES user_id=203539 Please allow me to throw in my two cents. Jim, your problem is, that you don't care about the stack. You're jumping between two stack frames, the consequence is that you're punished by a plain stack overflow. I don't know exactly why the Keil example works. But one important difference might be, that task() is called via register_task(), which results in at least 2 extra bytes space on the stack. Could you please try this: if (setjmp(reg_task_exit_env) == 0) { SP += 6; // allocate more stack space e.g. for interrupts task_0(); SP -= 6; } IMHO you should write a clean tast scheduler, and all tasks should have their own stack. Once I've done it for DOS using TurboC, it's not that difficult. Furthermore it's much easier to read than the setjmp/longjmp constructs. And you can call the scheduler from within a deeply nested function. Finally: I can't find a bug in the old longjmp() function. It's just a problem of the correct stack frame (or enough space in all stack frames). Even with my_longjmp() all your problems will return, when you (or sdcc) start using the stack in task_0()! ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 22:03 Message: Logged In: YES user_id=1375403 Maarten, We seem to disagree on the interpretation of the spec. In my view "the function containing the invocation of the setjmp macrp" has NOT terminating (in fact you can see both caller and callee caught in their own endless loops, thus never terminating). I have adapted to the code from an example written for the Keil compiler (URL was in code sample). However regardless of whether my usage (which amounts to two coroutines calling each other) is within the spec or not, this still leaves the issue of the unprotected critical section in the longjmp() implementation. I'll leave it at your discretion how to deal, if at all, with this issue in SDCC. I'm happy with my solution and will move on to the next topic. I'm into SDCC for about a week now and this is the only real issue I came across (and the documentation did say that setjmp/longjmp was not supported). So overall I'm still impressed with the SDCC effort. Thanks very much for taking the time to look into this and keep up the good work. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 21:29 Message: Logged In: YES user_id=888171 Jim, First thanks for the code example. It does however violate the spec. This is what it says in 7.13.2.1 #2: The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution (see note 208) in the interim, or if the invocation of the setjmp macro was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. And note 208 says: For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls. And that longjmp is exactly what your code does. I'll leave the bug report open to remind me to create a safe asm only version of setjmp/longjmp but expect to see no change in functionality. Btw. if you had logged in when you opened the bug report you would have been able to upload files to it. Now the system treats you as just some bystander. Good luck, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 13:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-10 15:32:32
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by bernhardheld You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Maarten Brock (maartenbrock) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- >Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 16:32 Message: Logged In: YES user_id=203539 > I'm not observing a stack overflow Yes, you do! Please accept two facts: a. with the given (undocumented) usage of setjmp/longjmp you end up with two consecutive stack frames. b. the stack from main() overflows the stack from task_0(), if an interrupt occurs during longjmp(). Please note: the stack will overflow too, if main uses the stack in any other way. Now you have 3 joices: 1. don't use setjmp/longjmp in this particular, undocumented way 2. increase the stack for main 3. disable interrupts in longjmp() > Your suggestion of adding a bit of extra stack space (6 > bytes in your proposal) works (I tried it!) Thanks, this shows that I'm right. Try it again with 2 bytes, then you've got the same stack consumption as in the Keil example (please re-read my last posting about this). > There should not be > intervening stack frames between scheduler and task. You call setjmp() both from main() and task_0(). main() and task_0() have different stack frames, consequently your scheduler and your task have different stack frames (see fact a). And now please stop dreaming. You can't use setjmp/longjmp that way and complain about the consequences. > One way of protection is disabling interrupts during the > execution of this critical section as in my initial posting, > which tackles the root cause. Well, I don't agree about "the root cause". But do it, if this is what you want. But be prepared for the next overflow at the next corner. And to be specific: I still don't see a bug in sdcc's longjmp(). ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 13:15 Message: Logged In: YES user_id=1375403 Thanks for shedding your light on this, Bernard. Initially I wasn't interested in the stack at all, I just wanted to take the Keil example (which uses the setjmp/longjmp construct) and expand it for my own purposes. It is when I started to get lock-ups that I ended up examining the stack. I'm not observing a stack overflow (i.e. the stack doesn't grow unbounded, it just alternates one level between scheduler (caller) and task (callee)). What I am seeing is a return address in the process of being copied from a jmpbuf struct onto the stack being corrupted by an interrupt coming in between. The interrupt pushes and pops a return address on stack just prior before longjmp could update the SP to made the stack state valid again. Your suggestion of adding a bit of extra stack space (6 bytes in your proposal) works (I tried it!) because in this particular case the timer interrupt only requires 2 bytes for its return address. It is however just circumventing the problem with the longjmp issue rather than tackling the root cause. If the ISR would save registers on the stack 6 bytes may not be enough. I think the issue with trying to give each task its own stack on the 8051 is the very constraint RAM space available for data and stack (up to 256 bytes). On the Microchip midrange PICs (one of my other targets) it wouldn't work at all because its stack is not accessible by code (as an aside, no setjmp/longjmp either). There are limitations with the setjmp/longjmp approach. A task can only return to the scheduler from the function that is called directly by the scheduler. (There should not be intervening stack frames between scheduler and task). With that constraint there is no limitation with regards to calling other functions from within a task. Sofar, I have not found this to be an issue (if I'm not mistaken the commercial Salvo RTOS has the same limitation). With respect to the SDCC implementation of longjmp, what I'm trying to get across is that it is vulnerable to stack corruption due to interrupts in the cricitical section indicated below: // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section One way of protection is disabling interrupts during the execution of this critical section as in my initial posting, which tackles the root cause. I'll repost the URL to the KEIL example of a simple scheduler for the 8051 here for anyone interested: http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D (scroll down to the end of the page and click on "Simple OS for 8051 microcontroller written in the Keil Software C compiler"); Thanks again, Bernard. Regards, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 11:27 Message: Logged In: YES user_id=203539 Please allow me to throw in my two cents. Jim, your problem is, that you don't care about the stack. You're jumping between two stack frames, the consequence is that you're punished by a plain stack overflow. I don't know exactly why the Keil example works. But one important difference might be, that task() is called via register_task(), which results in at least 2 extra bytes space on the stack. Could you please try this: if (setjmp(reg_task_exit_env) == 0) { SP += 6; // allocate more stack space e.g. for interrupts task_0(); SP -= 6; } IMHO you should write a clean tast scheduler, and all tasks should have their own stack. Once I've done it for DOS using TurboC, it's not that difficult. Furthermore it's much easier to read than the setjmp/longjmp constructs. And you can call the scheduler from within a deeply nested function. Finally: I can't find a bug in the old longjmp() function. It's just a problem of the correct stack frame (or enough space in all stack frames). Even with my_longjmp() all your problems will return, when you (or sdcc) start using the stack in task_0()! ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 22:03 Message: Logged In: YES user_id=1375403 Maarten, We seem to disagree on the interpretation of the spec. In my view "the function containing the invocation of the setjmp macrp" has NOT terminating (in fact you can see both caller and callee caught in their own endless loops, thus never terminating). I have adapted to the code from an example written for the Keil compiler (URL was in code sample). However regardless of whether my usage (which amounts to two coroutines calling each other) is within the spec or not, this still leaves the issue of the unprotected critical section in the longjmp() implementation. I'll leave it at your discretion how to deal, if at all, with this issue in SDCC. I'm happy with my solution and will move on to the next topic. I'm into SDCC for about a week now and this is the only real issue I came across (and the documentation did say that setjmp/longjmp was not supported). So overall I'm still impressed with the SDCC effort. Thanks very much for taking the time to look into this and keep up the good work. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 21:29 Message: Logged In: YES user_id=888171 Jim, First thanks for the code example. It does however violate the spec. This is what it says in 7.13.2.1 #2: The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution (see note 208) in the interim, or if the invocation of the setjmp macro was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. And note 208 says: For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls. And that longjmp is exactly what your code does. I'll leave the bug report open to remind me to create a safe asm only version of setjmp/longjmp but expect to see no change in functionality. Btw. if you had logged in when you opened the bug report you would have been able to upload files to it. Now the system treats you as just some bystander. Good luck, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 13:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-10 16:21:05
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by remarc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Maarten Brock (maartenbrock) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 17:21 Message: Logged In: YES user_id=1375403 OK, OK... I think I'm getting the message... You and Maarten are saying that I can only longjmp back from caller to callee and NOT vice versa. Is that what you are saying? Because in that case longjmp adjusts SP towards a shrinking stack, so corruption due to interrupts does not occur. In that case, great, I'll accept my usage as non-standard, continue with my "local solution" as working "step-out", withdraw my bug report, and thank you all for putting me straight. I don't see why I should run the risk of further overflows within the constraints I mentioned if I do and this thread is still open I'll report it. One last thing that I kept reminding myself to ask but forgot each time: why is longjmp/setjmp unsupported in SDCC? BTW, I seem to be missing the point about the difference in the Keil example and my usage, because I don't think I changed this particular part of the application. Appreciate if you could elaborate a bit on this. Finally, I really do appreciate all this feedback -- I'm learning a lot and enjoying it. Thanks, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 16:32 Message: Logged In: YES user_id=203539 > I'm not observing a stack overflow Yes, you do! Please accept two facts: a. with the given (undocumented) usage of setjmp/longjmp you end up with two consecutive stack frames. b. the stack from main() overflows the stack from task_0(), if an interrupt occurs during longjmp(). Please note: the stack will overflow too, if main uses the stack in any other way. Now you have 3 joices: 1. don't use setjmp/longjmp in this particular, undocumented way 2. increase the stack for main 3. disable interrupts in longjmp() > Your suggestion of adding a bit of extra stack space (6 > bytes in your proposal) works (I tried it!) Thanks, this shows that I'm right. Try it again with 2 bytes, then you've got the same stack consumption as in the Keil example (please re-read my last posting about this). > There should not be > intervening stack frames between scheduler and task. You call setjmp() both from main() and task_0(). main() and task_0() have different stack frames, consequently your scheduler and your task have different stack frames (see fact a). And now please stop dreaming. You can't use setjmp/longjmp that way and complain about the consequences. > One way of protection is disabling interrupts during the > execution of this critical section as in my initial posting, > which tackles the root cause. Well, I don't agree about "the root cause". But do it, if this is what you want. But be prepared for the next overflow at the next corner. And to be specific: I still don't see a bug in sdcc's longjmp(). ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 13:15 Message: Logged In: YES user_id=1375403 Thanks for shedding your light on this, Bernard. Initially I wasn't interested in the stack at all, I just wanted to take the Keil example (which uses the setjmp/longjmp construct) and expand it for my own purposes. It is when I started to get lock-ups that I ended up examining the stack. I'm not observing a stack overflow (i.e. the stack doesn't grow unbounded, it just alternates one level between scheduler (caller) and task (callee)). What I am seeing is a return address in the process of being copied from a jmpbuf struct onto the stack being corrupted by an interrupt coming in between. The interrupt pushes and pops a return address on stack just prior before longjmp could update the SP to made the stack state valid again. Your suggestion of adding a bit of extra stack space (6 bytes in your proposal) works (I tried it!) because in this particular case the timer interrupt only requires 2 bytes for its return address. It is however just circumventing the problem with the longjmp issue rather than tackling the root cause. If the ISR would save registers on the stack 6 bytes may not be enough. I think the issue with trying to give each task its own stack on the 8051 is the very constraint RAM space available for data and stack (up to 256 bytes). On the Microchip midrange PICs (one of my other targets) it wouldn't work at all because its stack is not accessible by code (as an aside, no setjmp/longjmp either). There are limitations with the setjmp/longjmp approach. A task can only return to the scheduler from the function that is called directly by the scheduler. (There should not be intervening stack frames between scheduler and task). With that constraint there is no limitation with regards to calling other functions from within a task. Sofar, I have not found this to be an issue (if I'm not mistaken the commercial Salvo RTOS has the same limitation). With respect to the SDCC implementation of longjmp, what I'm trying to get across is that it is vulnerable to stack corruption due to interrupts in the cricitical section indicated below: // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section One way of protection is disabling interrupts during the execution of this critical section as in my initial posting, which tackles the root cause. I'll repost the URL to the KEIL example of a simple scheduler for the 8051 here for anyone interested: http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D (scroll down to the end of the page and click on "Simple OS for 8051 microcontroller written in the Keil Software C compiler"); Thanks again, Bernard. Regards, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 11:27 Message: Logged In: YES user_id=203539 Please allow me to throw in my two cents. Jim, your problem is, that you don't care about the stack. You're jumping between two stack frames, the consequence is that you're punished by a plain stack overflow. I don't know exactly why the Keil example works. But one important difference might be, that task() is called via register_task(), which results in at least 2 extra bytes space on the stack. Could you please try this: if (setjmp(reg_task_exit_env) == 0) { SP += 6; // allocate more stack space e.g. for interrupts task_0(); SP -= 6; } IMHO you should write a clean tast scheduler, and all tasks should have their own stack. Once I've done it for DOS using TurboC, it's not that difficult. Furthermore it's much easier to read than the setjmp/longjmp constructs. And you can call the scheduler from within a deeply nested function. Finally: I can't find a bug in the old longjmp() function. It's just a problem of the correct stack frame (or enough space in all stack frames). Even with my_longjmp() all your problems will return, when you (or sdcc) start using the stack in task_0()! ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 22:03 Message: Logged In: YES user_id=1375403 Maarten, We seem to disagree on the interpretation of the spec. In my view "the function containing the invocation of the setjmp macrp" has NOT terminating (in fact you can see both caller and callee caught in their own endless loops, thus never terminating). I have adapted to the code from an example written for the Keil compiler (URL was in code sample). However regardless of whether my usage (which amounts to two coroutines calling each other) is within the spec or not, this still leaves the issue of the unprotected critical section in the longjmp() implementation. I'll leave it at your discretion how to deal, if at all, with this issue in SDCC. I'm happy with my solution and will move on to the next topic. I'm into SDCC for about a week now and this is the only real issue I came across (and the documentation did say that setjmp/longjmp was not supported). So overall I'm still impressed with the SDCC effort. Thanks very much for taking the time to look into this and keep up the good work. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 21:29 Message: Logged In: YES user_id=888171 Jim, First thanks for the code example. It does however violate the spec. This is what it says in 7.13.2.1 #2: The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution (see note 208) in the interim, or if the invocation of the setjmp macro was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. And note 208 says: For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls. And that longjmp is exactly what your code does. I'll leave the bug report open to remind me to create a safe asm only version of setjmp/longjmp but expect to see no change in functionality. Btw. if you had logged in when you opened the bug report you would have been able to upload files to it. Now the system treats you as just some bystander. Good luck, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 13:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-10 16:49:01
|
Bugs item #1350513, was opened at 2005-11-07 12:02 Message generated for change (Comment added) made by jimatjtan You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Maarten Brock (maartenbrock) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-10 11:49 Message: Logged In: YES user_id=175928 > Because in > that case longjmp adjusts SP towards a shrinking stack, so > corruption due to interrupts does not occur. Yes, longjmp works by shrinking the stack back to where it was when you called setjmp. You can't go the other way because the data there is no longer guaranteed valid. > BTW, I seem to be missing the point about the difference in > the Keil example and my usage I see none. I suspect they just got lucky that it works at all. Notice the comment in their example code for task0: "byte i,x; // note: local variable might be corrupted after delay" ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 11:21 Message: Logged In: YES user_id=1375403 OK, OK... I think I'm getting the message... You and Maarten are saying that I can only longjmp back from caller to callee and NOT vice versa. Is that what you are saying? Because in that case longjmp adjusts SP towards a shrinking stack, so corruption due to interrupts does not occur. In that case, great, I'll accept my usage as non-standard, continue with my "local solution" as working "step-out", withdraw my bug report, and thank you all for putting me straight. I don't see why I should run the risk of further overflows within the constraints I mentioned if I do and this thread is still open I'll report it. One last thing that I kept reminding myself to ask but forgot each time: why is longjmp/setjmp unsupported in SDCC? BTW, I seem to be missing the point about the difference in the Keil example and my usage, because I don't think I changed this particular part of the application. Appreciate if you could elaborate a bit on this. Finally, I really do appreciate all this feedback -- I'm learning a lot and enjoying it. Thanks, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 10:32 Message: Logged In: YES user_id=203539 > I'm not observing a stack overflow Yes, you do! Please accept two facts: a. with the given (undocumented) usage of setjmp/longjmp you end up with two consecutive stack frames. b. the stack from main() overflows the stack from task_0(), if an interrupt occurs during longjmp(). Please note: the stack will overflow too, if main uses the stack in any other way. Now you have 3 joices: 1. don't use setjmp/longjmp in this particular, undocumented way 2. increase the stack for main 3. disable interrupts in longjmp() > Your suggestion of adding a bit of extra stack space (6 > bytes in your proposal) works (I tried it!) Thanks, this shows that I'm right. Try it again with 2 bytes, then you've got the same stack consumption as in the Keil example (please re-read my last posting about this). > There should not be > intervening stack frames between scheduler and task. You call setjmp() both from main() and task_0(). main() and task_0() have different stack frames, consequently your scheduler and your task have different stack frames (see fact a). And now please stop dreaming. You can't use setjmp/longjmp that way and complain about the consequences. > One way of protection is disabling interrupts during the > execution of this critical section as in my initial posting, > which tackles the root cause. Well, I don't agree about "the root cause". But do it, if this is what you want. But be prepared for the next overflow at the next corner. And to be specific: I still don't see a bug in sdcc's longjmp(). ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 07:15 Message: Logged In: YES user_id=1375403 Thanks for shedding your light on this, Bernard. Initially I wasn't interested in the stack at all, I just wanted to take the Keil example (which uses the setjmp/longjmp construct) and expand it for my own purposes. It is when I started to get lock-ups that I ended up examining the stack. I'm not observing a stack overflow (i.e. the stack doesn't grow unbounded, it just alternates one level between scheduler (caller) and task (callee)). What I am seeing is a return address in the process of being copied from a jmpbuf struct onto the stack being corrupted by an interrupt coming in between. The interrupt pushes and pops a return address on stack just prior before longjmp could update the SP to made the stack state valid again. Your suggestion of adding a bit of extra stack space (6 bytes in your proposal) works (I tried it!) because in this particular case the timer interrupt only requires 2 bytes for its return address. It is however just circumventing the problem with the longjmp issue rather than tackling the root cause. If the ISR would save registers on the stack 6 bytes may not be enough. I think the issue with trying to give each task its own stack on the 8051 is the very constraint RAM space available for data and stack (up to 256 bytes). On the Microchip midrange PICs (one of my other targets) it wouldn't work at all because its stack is not accessible by code (as an aside, no setjmp/longjmp either). There are limitations with the setjmp/longjmp approach. A task can only return to the scheduler from the function that is called directly by the scheduler. (There should not be intervening stack frames between scheduler and task). With that constraint there is no limitation with regards to calling other functions from within a task. Sofar, I have not found this to be an issue (if I'm not mistaken the commercial Salvo RTOS has the same limitation). With respect to the SDCC implementation of longjmp, what I'm trying to get across is that it is vulnerable to stack corruption due to interrupts in the cricitical section indicated below: // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section One way of protection is disabling interrupts during the execution of this critical section as in my initial posting, which tackles the root cause. I'll repost the URL to the KEIL example of a simple scheduler for the 8051 here for anyone interested: http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D (scroll down to the end of the page and click on "Simple OS for 8051 microcontroller written in the Keil Software C compiler"); Thanks again, Bernard. Regards, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 05:27 Message: Logged In: YES user_id=203539 Please allow me to throw in my two cents. Jim, your problem is, that you don't care about the stack. You're jumping between two stack frames, the consequence is that you're punished by a plain stack overflow. I don't know exactly why the Keil example works. But one important difference might be, that task() is called via register_task(), which results in at least 2 extra bytes space on the stack. Could you please try this: if (setjmp(reg_task_exit_env) == 0) { SP += 6; // allocate more stack space e.g. for interrupts task_0(); SP -= 6; } IMHO you should write a clean tast scheduler, and all tasks should have their own stack. Once I've done it for DOS using TurboC, it's not that difficult. Furthermore it's much easier to read than the setjmp/longjmp constructs. And you can call the scheduler from within a deeply nested function. Finally: I can't find a bug in the old longjmp() function. It's just a problem of the correct stack frame (or enough space in all stack frames). Even with my_longjmp() all your problems will return, when you (or sdcc) start using the stack in task_0()! ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 16:03 Message: Logged In: YES user_id=1375403 Maarten, We seem to disagree on the interpretation of the spec. In my view "the function containing the invocation of the setjmp macrp" has NOT terminating (in fact you can see both caller and callee caught in their own endless loops, thus never terminating). I have adapted to the code from an example written for the Keil compiler (URL was in code sample). However regardless of whether my usage (which amounts to two coroutines calling each other) is within the spec or not, this still leaves the issue of the unprotected critical section in the longjmp() implementation. I'll leave it at your discretion how to deal, if at all, with this issue in SDCC. I'm happy with my solution and will move on to the next topic. I'm into SDCC for about a week now and this is the only real issue I came across (and the documentation did say that setjmp/longjmp was not supported). So overall I'm still impressed with the SDCC effort. Thanks very much for taking the time to look into this and keep up the good work. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 15:29 Message: Logged In: YES user_id=888171 Jim, First thanks for the code example. It does however violate the spec. This is what it says in 7.13.2.1 #2: The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution (see note 208) in the interim, or if the invocation of the setjmp macro was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. And note 208 says: For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls. And that longjmp is exactly what your code does. I'll leave the bug report open to remind me to create a safe asm only version of setjmp/longjmp but expect to see no change in functionality. Btw. if you had logged in when you opened the bug report you would have been able to upload files to it. Now the system treats you as just some bystander. Good luck, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 07:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 18:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 14:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 13:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 13:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 13:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 12:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 12:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 07:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 12:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 12:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-10 20:03:57
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by remarc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Maarten Brock (maartenbrock) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 21:03 Message: Logged In: YES user_id=1375403 OK, thanks. All is clear now. Implementation of coroutines with setjmp/longjmp is not supported by the standard. I accept that my bug report was unfounded within the permitted usage of setjmp/longjmp. Sorry that it took so long for me to get this. (It still leaves the question why setjmp/longjmp is unsupported in SDCC in the first place). The fact that it works in my app is because there is no data on the stack in danger of becoming invalid. The longjmp from the scheduler (caller) to the task function (callee) grows the stack by just two bytes for a return address because I'm compiling with an option to *not* use the stack for local vars and function arguments. It was however the timer interrupt caused it to fail. Thanks for reminding me of the comment in the Keil code. I did see it, but didn't grasp the full significance of it. Thanks all, this was a good learning experience. Regards, Jim ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-10 17:49 Message: Logged In: YES user_id=175928 > Because in > that case longjmp adjusts SP towards a shrinking stack, so > corruption due to interrupts does not occur. Yes, longjmp works by shrinking the stack back to where it was when you called setjmp. You can't go the other way because the data there is no longer guaranteed valid. > BTW, I seem to be missing the point about the difference in > the Keil example and my usage I see none. I suspect they just got lucky that it works at all. Notice the comment in their example code for task0: "byte i,x; // note: local variable might be corrupted after delay" ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 17:21 Message: Logged In: YES user_id=1375403 OK, OK... I think I'm getting the message... You and Maarten are saying that I can only longjmp back from caller to callee and NOT vice versa. Is that what you are saying? Because in that case longjmp adjusts SP towards a shrinking stack, so corruption due to interrupts does not occur. In that case, great, I'll accept my usage as non-standard, continue with my "local solution" as working "step-out", withdraw my bug report, and thank you all for putting me straight. I don't see why I should run the risk of further overflows within the constraints I mentioned if I do and this thread is still open I'll report it. One last thing that I kept reminding myself to ask but forgot each time: why is longjmp/setjmp unsupported in SDCC? BTW, I seem to be missing the point about the difference in the Keil example and my usage, because I don't think I changed this particular part of the application. Appreciate if you could elaborate a bit on this. Finally, I really do appreciate all this feedback -- I'm learning a lot and enjoying it. Thanks, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 16:32 Message: Logged In: YES user_id=203539 > I'm not observing a stack overflow Yes, you do! Please accept two facts: a. with the given (undocumented) usage of setjmp/longjmp you end up with two consecutive stack frames. b. the stack from main() overflows the stack from task_0(), if an interrupt occurs during longjmp(). Please note: the stack will overflow too, if main uses the stack in any other way. Now you have 3 joices: 1. don't use setjmp/longjmp in this particular, undocumented way 2. increase the stack for main 3. disable interrupts in longjmp() > Your suggestion of adding a bit of extra stack space (6 > bytes in your proposal) works (I tried it!) Thanks, this shows that I'm right. Try it again with 2 bytes, then you've got the same stack consumption as in the Keil example (please re-read my last posting about this). > There should not be > intervening stack frames between scheduler and task. You call setjmp() both from main() and task_0(). main() and task_0() have different stack frames, consequently your scheduler and your task have different stack frames (see fact a). And now please stop dreaming. You can't use setjmp/longjmp that way and complain about the consequences. > One way of protection is disabling interrupts during the > execution of this critical section as in my initial posting, > which tackles the root cause. Well, I don't agree about "the root cause". But do it, if this is what you want. But be prepared for the next overflow at the next corner. And to be specific: I still don't see a bug in sdcc's longjmp(). ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 13:15 Message: Logged In: YES user_id=1375403 Thanks for shedding your light on this, Bernard. Initially I wasn't interested in the stack at all, I just wanted to take the Keil example (which uses the setjmp/longjmp construct) and expand it for my own purposes. It is when I started to get lock-ups that I ended up examining the stack. I'm not observing a stack overflow (i.e. the stack doesn't grow unbounded, it just alternates one level between scheduler (caller) and task (callee)). What I am seeing is a return address in the process of being copied from a jmpbuf struct onto the stack being corrupted by an interrupt coming in between. The interrupt pushes and pops a return address on stack just prior before longjmp could update the SP to made the stack state valid again. Your suggestion of adding a bit of extra stack space (6 bytes in your proposal) works (I tried it!) because in this particular case the timer interrupt only requires 2 bytes for its return address. It is however just circumventing the problem with the longjmp issue rather than tackling the root cause. If the ISR would save registers on the stack 6 bytes may not be enough. I think the issue with trying to give each task its own stack on the 8051 is the very constraint RAM space available for data and stack (up to 256 bytes). On the Microchip midrange PICs (one of my other targets) it wouldn't work at all because its stack is not accessible by code (as an aside, no setjmp/longjmp either). There are limitations with the setjmp/longjmp approach. A task can only return to the scheduler from the function that is called directly by the scheduler. (There should not be intervening stack frames between scheduler and task). With that constraint there is no limitation with regards to calling other functions from within a task. Sofar, I have not found this to be an issue (if I'm not mistaken the commercial Salvo RTOS has the same limitation). With respect to the SDCC implementation of longjmp, what I'm trying to get across is that it is vulnerable to stack corruption due to interrupts in the cricitical section indicated below: // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section One way of protection is disabling interrupts during the execution of this critical section as in my initial posting, which tackles the root cause. I'll repost the URL to the KEIL example of a simple scheduler for the 8051 here for anyone interested: http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D (scroll down to the end of the page and click on "Simple OS for 8051 microcontroller written in the Keil Software C compiler"); Thanks again, Bernard. Regards, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 11:27 Message: Logged In: YES user_id=203539 Please allow me to throw in my two cents. Jim, your problem is, that you don't care about the stack. You're jumping between two stack frames, the consequence is that you're punished by a plain stack overflow. I don't know exactly why the Keil example works. But one important difference might be, that task() is called via register_task(), which results in at least 2 extra bytes space on the stack. Could you please try this: if (setjmp(reg_task_exit_env) == 0) { SP += 6; // allocate more stack space e.g. for interrupts task_0(); SP -= 6; } IMHO you should write a clean tast scheduler, and all tasks should have their own stack. Once I've done it for DOS using TurboC, it's not that difficult. Furthermore it's much easier to read than the setjmp/longjmp constructs. And you can call the scheduler from within a deeply nested function. Finally: I can't find a bug in the old longjmp() function. It's just a problem of the correct stack frame (or enough space in all stack frames). Even with my_longjmp() all your problems will return, when you (or sdcc) start using the stack in task_0()! ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 22:03 Message: Logged In: YES user_id=1375403 Maarten, We seem to disagree on the interpretation of the spec. In my view "the function containing the invocation of the setjmp macrp" has NOT terminating (in fact you can see both caller and callee caught in their own endless loops, thus never terminating). I have adapted to the code from an example written for the Keil compiler (URL was in code sample). However regardless of whether my usage (which amounts to two coroutines calling each other) is within the spec or not, this still leaves the issue of the unprotected critical section in the longjmp() implementation. I'll leave it at your discretion how to deal, if at all, with this issue in SDCC. I'm happy with my solution and will move on to the next topic. I'm into SDCC for about a week now and this is the only real issue I came across (and the documentation did say that setjmp/longjmp was not supported). So overall I'm still impressed with the SDCC effort. Thanks very much for taking the time to look into this and keep up the good work. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 21:29 Message: Logged In: YES user_id=888171 Jim, First thanks for the code example. It does however violate the spec. This is what it says in 7.13.2.1 #2: The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution (see note 208) in the interim, or if the invocation of the setjmp macro was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. And note 208 says: For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls. And that longjmp is exactly what your code does. I'll leave the bug report open to remind me to create a safe asm only version of setjmp/longjmp but expect to see no change in functionality. Btw. if you had logged in when you opened the bug report you would have been able to upload files to it. Now the system treats you as just some bystander. Good luck, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 13:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-11 13:03:40
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by bernhardheld You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Maarten Brock (maartenbrock) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- >Comment By: Bernhard Held (bernhardheld) Date: 2005-11-11 14:03 Message: Logged In: YES user_id=203539 > why is longjmp/setjmp unsupported in SDCC? Because nobody ported and tested it for all ports and memory models. May be YOU want to step in? > You and Maarten are saying that I can only longjmp back > from caller to callee and NOT vice versa. Is that what you > are saying? Yes. > BTW, I seem to be missing the point about the difference in > the Keil example and my usage, because I don't think I > changed this particular part of the application. Appreciate > if you could elaborate a bit on this. I can't test it, but what I see is that the Keil source uses this call chain: main -> register_task -> task -> setjmp What you're doing is: main -> task -> setjmp With the call of register_task() 2 bytes for the return address will be pushed on the stack. There might be even more pushes, if Keil has to save registers on the stack (I don't know what Keil is doing here). Therefore at the end of the call chain the distance between the stack frames of task and main is (at least) 2 bytes larger than in your example. Or in other words: using the Keil call chain there will be 2 more bytes stack space in the stack frame of main. You already know how you can adopt the behaviour of the Keil example: if (setjmp(reg_task_exit_env) == 0) { SP += 2; // allocate more stack space e.g. for interrupts task_0(); SP -= 2; } This "hack" should just help to explain, why the Keil example is working and your source isn't. I really do not recommend using this very limited scheduler. > Thanks all, this was a good learning experience. Yes indeed, problems like this are a great exercise. ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 21:03 Message: Logged In: YES user_id=1375403 OK, thanks. All is clear now. Implementation of coroutines with setjmp/longjmp is not supported by the standard. I accept that my bug report was unfounded within the permitted usage of setjmp/longjmp. Sorry that it took so long for me to get this. (It still leaves the question why setjmp/longjmp is unsupported in SDCC in the first place). The fact that it works in my app is because there is no data on the stack in danger of becoming invalid. The longjmp from the scheduler (caller) to the task function (callee) grows the stack by just two bytes for a return address because I'm compiling with an option to *not* use the stack for local vars and function arguments. It was however the timer interrupt caused it to fail. Thanks for reminding me of the comment in the Keil code. I did see it, but didn't grasp the full significance of it. Thanks all, this was a good learning experience. Regards, Jim ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-10 17:49 Message: Logged In: YES user_id=175928 > Because in > that case longjmp adjusts SP towards a shrinking stack, so > corruption due to interrupts does not occur. Yes, longjmp works by shrinking the stack back to where it was when you called setjmp. You can't go the other way because the data there is no longer guaranteed valid. > BTW, I seem to be missing the point about the difference in > the Keil example and my usage I see none. I suspect they just got lucky that it works at all. Notice the comment in their example code for task0: "byte i,x; // note: local variable might be corrupted after delay" ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 17:21 Message: Logged In: YES user_id=1375403 OK, OK... I think I'm getting the message... You and Maarten are saying that I can only longjmp back from caller to callee and NOT vice versa. Is that what you are saying? Because in that case longjmp adjusts SP towards a shrinking stack, so corruption due to interrupts does not occur. In that case, great, I'll accept my usage as non-standard, continue with my "local solution" as working "step-out", withdraw my bug report, and thank you all for putting me straight. I don't see why I should run the risk of further overflows within the constraints I mentioned if I do and this thread is still open I'll report it. One last thing that I kept reminding myself to ask but forgot each time: why is longjmp/setjmp unsupported in SDCC? BTW, I seem to be missing the point about the difference in the Keil example and my usage, because I don't think I changed this particular part of the application. Appreciate if you could elaborate a bit on this. Finally, I really do appreciate all this feedback -- I'm learning a lot and enjoying it. Thanks, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 16:32 Message: Logged In: YES user_id=203539 > I'm not observing a stack overflow Yes, you do! Please accept two facts: a. with the given (undocumented) usage of setjmp/longjmp you end up with two consecutive stack frames. b. the stack from main() overflows the stack from task_0(), if an interrupt occurs during longjmp(). Please note: the stack will overflow too, if main uses the stack in any other way. Now you have 3 joices: 1. don't use setjmp/longjmp in this particular, undocumented way 2. increase the stack for main 3. disable interrupts in longjmp() > Your suggestion of adding a bit of extra stack space (6 > bytes in your proposal) works (I tried it!) Thanks, this shows that I'm right. Try it again with 2 bytes, then you've got the same stack consumption as in the Keil example (please re-read my last posting about this). > There should not be > intervening stack frames between scheduler and task. You call setjmp() both from main() and task_0(). main() and task_0() have different stack frames, consequently your scheduler and your task have different stack frames (see fact a). And now please stop dreaming. You can't use setjmp/longjmp that way and complain about the consequences. > One way of protection is disabling interrupts during the > execution of this critical section as in my initial posting, > which tackles the root cause. Well, I don't agree about "the root cause". But do it, if this is what you want. But be prepared for the next overflow at the next corner. And to be specific: I still don't see a bug in sdcc's longjmp(). ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 13:15 Message: Logged In: YES user_id=1375403 Thanks for shedding your light on this, Bernard. Initially I wasn't interested in the stack at all, I just wanted to take the Keil example (which uses the setjmp/longjmp construct) and expand it for my own purposes. It is when I started to get lock-ups that I ended up examining the stack. I'm not observing a stack overflow (i.e. the stack doesn't grow unbounded, it just alternates one level between scheduler (caller) and task (callee)). What I am seeing is a return address in the process of being copied from a jmpbuf struct onto the stack being corrupted by an interrupt coming in between. The interrupt pushes and pops a return address on stack just prior before longjmp could update the SP to made the stack state valid again. Your suggestion of adding a bit of extra stack space (6 bytes in your proposal) works (I tried it!) because in this particular case the timer interrupt only requires 2 bytes for its return address. It is however just circumventing the problem with the longjmp issue rather than tackling the root cause. If the ISR would save registers on the stack 6 bytes may not be enough. I think the issue with trying to give each task its own stack on the 8051 is the very constraint RAM space available for data and stack (up to 256 bytes). On the Microchip midrange PICs (one of my other targets) it wouldn't work at all because its stack is not accessible by code (as an aside, no setjmp/longjmp either). There are limitations with the setjmp/longjmp approach. A task can only return to the scheduler from the function that is called directly by the scheduler. (There should not be intervening stack frames between scheduler and task). With that constraint there is no limitation with regards to calling other functions from within a task. Sofar, I have not found this to be an issue (if I'm not mistaken the commercial Salvo RTOS has the same limitation). With respect to the SDCC implementation of longjmp, what I'm trying to get across is that it is vulnerable to stack corruption due to interrupts in the cricitical section indicated below: // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section One way of protection is disabling interrupts during the execution of this critical section as in my initial posting, which tackles the root cause. I'll repost the URL to the KEIL example of a simple scheduler for the 8051 here for anyone interested: http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D (scroll down to the end of the page and click on "Simple OS for 8051 microcontroller written in the Keil Software C compiler"); Thanks again, Bernard. Regards, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 11:27 Message: Logged In: YES user_id=203539 Please allow me to throw in my two cents. Jim, your problem is, that you don't care about the stack. You're jumping between two stack frames, the consequence is that you're punished by a plain stack overflow. I don't know exactly why the Keil example works. But one important difference might be, that task() is called via register_task(), which results in at least 2 extra bytes space on the stack. Could you please try this: if (setjmp(reg_task_exit_env) == 0) { SP += 6; // allocate more stack space e.g. for interrupts task_0(); SP -= 6; } IMHO you should write a clean tast scheduler, and all tasks should have their own stack. Once I've done it for DOS using TurboC, it's not that difficult. Furthermore it's much easier to read than the setjmp/longjmp constructs. And you can call the scheduler from within a deeply nested function. Finally: I can't find a bug in the old longjmp() function. It's just a problem of the correct stack frame (or enough space in all stack frames). Even with my_longjmp() all your problems will return, when you (or sdcc) start using the stack in task_0()! ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 22:03 Message: Logged In: YES user_id=1375403 Maarten, We seem to disagree on the interpretation of the spec. In my view "the function containing the invocation of the setjmp macrp" has NOT terminating (in fact you can see both caller and callee caught in their own endless loops, thus never terminating). I have adapted to the code from an example written for the Keil compiler (URL was in code sample). However regardless of whether my usage (which amounts to two coroutines calling each other) is within the spec or not, this still leaves the issue of the unprotected critical section in the longjmp() implementation. I'll leave it at your discretion how to deal, if at all, with this issue in SDCC. I'm happy with my solution and will move on to the next topic. I'm into SDCC for about a week now and this is the only real issue I came across (and the documentation did say that setjmp/longjmp was not supported). So overall I'm still impressed with the SDCC effort. Thanks very much for taking the time to look into this and keep up the good work. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 21:29 Message: Logged In: YES user_id=888171 Jim, First thanks for the code example. It does however violate the spec. This is what it says in 7.13.2.1 #2: The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution (see note 208) in the interim, or if the invocation of the setjmp macro was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. And note 208 says: For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls. And that longjmp is exactly what your code does. I'll leave the bug report open to remind me to create a safe asm only version of setjmp/longjmp but expect to see no change in functionality. Btw. if you had logged in when you opened the bug report you would have been able to upload files to it. Now the system treats you as just some bystander. Good luck, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 13:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-11-11 15:34:04
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by remarc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Maarten Brock (maartenbrock) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-11 16:33 Message: Logged In: YES user_id=1375403 Bernard, While this thread is no longer about a bug report, I hope the forum will permit to give a final reaction. I think I now fully understand your point about the in-between call creating just enough extra space (2 bytes, without allocation of auto vars and args on the stack) to allow an interrupt to occur without stack corruption. I did have the in-between call in my application but ignorantly left it out of the demo in an attempt to simplify. Despite the extra call I did get lock-ups in my full app. I can now attribute this to the fact that initially I used C code (non _naked) in my interrupt routine (requiring more than 2 bytes), and later switch to _naked assembly (requiring just 2). Without the protection of disabling interrupts in longjmp() the Keil example will fail if interrupts are introduced that start to use the stack liberally (saving registers). With my local solution of disabling interrupts (or actually reordering the code in longjmp() as another Jim suggested in this thread), I'm not in danger of this. Again, restriction to this is that there should be no longjmps to the scheduler from deeper nested functions. This solution will not win a beauty contest but for my learning purposes (especially now that it is analysed to the bone with your help) it will do, and perhaps I'll step up to a more advanced scheduler in a next project. You will appreciate that I do not (yet) feel qualified to pick up the longjmp work in SDCC. However, I could not find any comments in the manual or in the (MCS51) code why it wasn't supported (not finished, not tested, buggy??). Finally, the demo code I posted DOES work with my modified version of longjmp(). Thanks, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-11 14:03 Message: Logged In: YES user_id=203539 > why is longjmp/setjmp unsupported in SDCC? Because nobody ported and tested it for all ports and memory models. May be YOU want to step in? > You and Maarten are saying that I can only longjmp back > from caller to callee and NOT vice versa. Is that what you > are saying? Yes. > BTW, I seem to be missing the point about the difference in > the Keil example and my usage, because I don't think I > changed this particular part of the application. Appreciate > if you could elaborate a bit on this. I can't test it, but what I see is that the Keil source uses this call chain: main -> register_task -> task -> setjmp What you're doing is: main -> task -> setjmp With the call of register_task() 2 bytes for the return address will be pushed on the stack. There might be even more pushes, if Keil has to save registers on the stack (I don't know what Keil is doing here). Therefore at the end of the call chain the distance between the stack frames of task and main is (at least) 2 bytes larger than in your example. Or in other words: using the Keil call chain there will be 2 more bytes stack space in the stack frame of main. You already know how you can adopt the behaviour of the Keil example: if (setjmp(reg_task_exit_env) == 0) { SP += 2; // allocate more stack space e.g. for interrupts task_0(); SP -= 2; } This "hack" should just help to explain, why the Keil example is working and your source isn't. I really do not recommend using this very limited scheduler. > Thanks all, this was a good learning experience. Yes indeed, problems like this are a great exercise. ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 21:03 Message: Logged In: YES user_id=1375403 OK, thanks. All is clear now. Implementation of coroutines with setjmp/longjmp is not supported by the standard. I accept that my bug report was unfounded within the permitted usage of setjmp/longjmp. Sorry that it took so long for me to get this. (It still leaves the question why setjmp/longjmp is unsupported in SDCC in the first place). The fact that it works in my app is because there is no data on the stack in danger of becoming invalid. The longjmp from the scheduler (caller) to the task function (callee) grows the stack by just two bytes for a return address because I'm compiling with an option to *not* use the stack for local vars and function arguments. It was however the timer interrupt caused it to fail. Thanks for reminding me of the comment in the Keil code. I did see it, but didn't grasp the full significance of it. Thanks all, this was a good learning experience. Regards, Jim ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-10 17:49 Message: Logged In: YES user_id=175928 > Because in > that case longjmp adjusts SP towards a shrinking stack, so > corruption due to interrupts does not occur. Yes, longjmp works by shrinking the stack back to where it was when you called setjmp. You can't go the other way because the data there is no longer guaranteed valid. > BTW, I seem to be missing the point about the difference in > the Keil example and my usage I see none. I suspect they just got lucky that it works at all. Notice the comment in their example code for task0: "byte i,x; // note: local variable might be corrupted after delay" ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 17:21 Message: Logged In: YES user_id=1375403 OK, OK... I think I'm getting the message... You and Maarten are saying that I can only longjmp back from caller to callee and NOT vice versa. Is that what you are saying? Because in that case longjmp adjusts SP towards a shrinking stack, so corruption due to interrupts does not occur. In that case, great, I'll accept my usage as non-standard, continue with my "local solution" as working "step-out", withdraw my bug report, and thank you all for putting me straight. I don't see why I should run the risk of further overflows within the constraints I mentioned if I do and this thread is still open I'll report it. One last thing that I kept reminding myself to ask but forgot each time: why is longjmp/setjmp unsupported in SDCC? BTW, I seem to be missing the point about the difference in the Keil example and my usage, because I don't think I changed this particular part of the application. Appreciate if you could elaborate a bit on this. Finally, I really do appreciate all this feedback -- I'm learning a lot and enjoying it. Thanks, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 16:32 Message: Logged In: YES user_id=203539 > I'm not observing a stack overflow Yes, you do! Please accept two facts: a. with the given (undocumented) usage of setjmp/longjmp you end up with two consecutive stack frames. b. the stack from main() overflows the stack from task_0(), if an interrupt occurs during longjmp(). Please note: the stack will overflow too, if main uses the stack in any other way. Now you have 3 joices: 1. don't use setjmp/longjmp in this particular, undocumented way 2. increase the stack for main 3. disable interrupts in longjmp() > Your suggestion of adding a bit of extra stack space (6 > bytes in your proposal) works (I tried it!) Thanks, this shows that I'm right. Try it again with 2 bytes, then you've got the same stack consumption as in the Keil example (please re-read my last posting about this). > There should not be > intervening stack frames between scheduler and task. You call setjmp() both from main() and task_0(). main() and task_0() have different stack frames, consequently your scheduler and your task have different stack frames (see fact a). And now please stop dreaming. You can't use setjmp/longjmp that way and complain about the consequences. > One way of protection is disabling interrupts during the > execution of this critical section as in my initial posting, > which tackles the root cause. Well, I don't agree about "the root cause". But do it, if this is what you want. But be prepared for the next overflow at the next corner. And to be specific: I still don't see a bug in sdcc's longjmp(). ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 13:15 Message: Logged In: YES user_id=1375403 Thanks for shedding your light on this, Bernard. Initially I wasn't interested in the stack at all, I just wanted to take the Keil example (which uses the setjmp/longjmp construct) and expand it for my own purposes. It is when I started to get lock-ups that I ended up examining the stack. I'm not observing a stack overflow (i.e. the stack doesn't grow unbounded, it just alternates one level between scheduler (caller) and task (callee)). What I am seeing is a return address in the process of being copied from a jmpbuf struct onto the stack being corrupted by an interrupt coming in between. The interrupt pushes and pops a return address on stack just prior before longjmp could update the SP to made the stack state valid again. Your suggestion of adding a bit of extra stack space (6 bytes in your proposal) works (I tried it!) because in this particular case the timer interrupt only requires 2 bytes for its return address. It is however just circumventing the problem with the longjmp issue rather than tackling the root cause. If the ISR would save registers on the stack 6 bytes may not be enough. I think the issue with trying to give each task its own stack on the 8051 is the very constraint RAM space available for data and stack (up to 256 bytes). On the Microchip midrange PICs (one of my other targets) it wouldn't work at all because its stack is not accessible by code (as an aside, no setjmp/longjmp either). There are limitations with the setjmp/longjmp approach. A task can only return to the scheduler from the function that is called directly by the scheduler. (There should not be intervening stack frames between scheduler and task). With that constraint there is no limitation with regards to calling other functions from within a task. Sofar, I have not found this to be an issue (if I'm not mistaken the commercial Salvo RTOS has the same limitation). With respect to the SDCC implementation of longjmp, what I'm trying to get across is that it is vulnerable to stack corruption due to interrupts in the cricitical section indicated below: // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section One way of protection is disabling interrupts during the execution of this critical section as in my initial posting, which tackles the root cause. I'll repost the URL to the KEIL example of a simple scheduler for the 8051 here for anyone interested: http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D (scroll down to the end of the page and click on "Simple OS for 8051 microcontroller written in the Keil Software C compiler"); Thanks again, Bernard. Regards, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 11:27 Message: Logged In: YES user_id=203539 Please allow me to throw in my two cents. Jim, your problem is, that you don't care about the stack. You're jumping between two stack frames, the consequence is that you're punished by a plain stack overflow. I don't know exactly why the Keil example works. But one important difference might be, that task() is called via register_task(), which results in at least 2 extra bytes space on the stack. Could you please try this: if (setjmp(reg_task_exit_env) == 0) { SP += 6; // allocate more stack space e.g. for interrupts task_0(); SP -= 6; } IMHO you should write a clean tast scheduler, and all tasks should have their own stack. Once I've done it for DOS using TurboC, it's not that difficult. Furthermore it's much easier to read than the setjmp/longjmp constructs. And you can call the scheduler from within a deeply nested function. Finally: I can't find a bug in the old longjmp() function. It's just a problem of the correct stack frame (or enough space in all stack frames). Even with my_longjmp() all your problems will return, when you (or sdcc) start using the stack in task_0()! ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 22:03 Message: Logged In: YES user_id=1375403 Maarten, We seem to disagree on the interpretation of the spec. In my view "the function containing the invocation of the setjmp macrp" has NOT terminating (in fact you can see both caller and callee caught in their own endless loops, thus never terminating). I have adapted to the code from an example written for the Keil compiler (URL was in code sample). However regardless of whether my usage (which amounts to two coroutines calling each other) is within the spec or not, this still leaves the issue of the unprotected critical section in the longjmp() implementation. I'll leave it at your discretion how to deal, if at all, with this issue in SDCC. I'm happy with my solution and will move on to the next topic. I'm into SDCC for about a week now and this is the only real issue I came across (and the documentation did say that setjmp/longjmp was not supported). So overall I'm still impressed with the SDCC effort. Thanks very much for taking the time to look into this and keep up the good work. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 21:29 Message: Logged In: YES user_id=888171 Jim, First thanks for the code example. It does however violate the spec. This is what it says in 7.13.2.1 #2: The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution (see note 208) in the interim, or if the invocation of the setjmp macro was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. And note 208 says: For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls. And that longjmp is exactly what your code does. I'll leave the bug report open to remind me to create a safe asm only version of setjmp/longjmp but expect to see no change in functionality. Btw. if you had logged in when you opened the bug report you would have been able to upload files to it. Now the system treats you as just some bystander. Good luck, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 13:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |
From: SourceForge.net <no...@so...> - 2005-12-06 15:17:34
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by maartenbrock You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library >Group: fixed >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Maarten Brock (maartenbrock) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- >Comment By: Maarten Brock (maartenbrock) Date: 2005-12-06 16:17 Message: Logged In: YES user_id=888171 From SDCC 2.5.4 #1181 setjmp/longjmp are now supported for mcs51. They are tested in the regression tests too. Other ports are not supported though. And illegal usage as mentioned in this thread is not supported either. ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-11 16:33 Message: Logged In: YES user_id=1375403 Bernard, While this thread is no longer about a bug report, I hope the forum will permit to give a final reaction. I think I now fully understand your point about the in-between call creating just enough extra space (2 bytes, without allocation of auto vars and args on the stack) to allow an interrupt to occur without stack corruption. I did have the in-between call in my application but ignorantly left it out of the demo in an attempt to simplify. Despite the extra call I did get lock-ups in my full app. I can now attribute this to the fact that initially I used C code (non _naked) in my interrupt routine (requiring more than 2 bytes), and later switch to _naked assembly (requiring just 2). Without the protection of disabling interrupts in longjmp() the Keil example will fail if interrupts are introduced that start to use the stack liberally (saving registers). With my local solution of disabling interrupts (or actually reordering the code in longjmp() as another Jim suggested in this thread), I'm not in danger of this. Again, restriction to this is that there should be no longjmps to the scheduler from deeper nested functions. This solution will not win a beauty contest but for my learning purposes (especially now that it is analysed to the bone with your help) it will do, and perhaps I'll step up to a more advanced scheduler in a next project. You will appreciate that I do not (yet) feel qualified to pick up the longjmp work in SDCC. However, I could not find any comments in the manual or in the (MCS51) code why it wasn't supported (not finished, not tested, buggy??). Finally, the demo code I posted DOES work with my modified version of longjmp(). Thanks, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-11 14:03 Message: Logged In: YES user_id=203539 > why is longjmp/setjmp unsupported in SDCC? Because nobody ported and tested it for all ports and memory models. May be YOU want to step in? > You and Maarten are saying that I can only longjmp back > from caller to callee and NOT vice versa. Is that what you > are saying? Yes. > BTW, I seem to be missing the point about the difference in > the Keil example and my usage, because I don't think I > changed this particular part of the application. Appreciate > if you could elaborate a bit on this. I can't test it, but what I see is that the Keil source uses this call chain: main -> register_task -> task -> setjmp What you're doing is: main -> task -> setjmp With the call of register_task() 2 bytes for the return address will be pushed on the stack. There might be even more pushes, if Keil has to save registers on the stack (I don't know what Keil is doing here). Therefore at the end of the call chain the distance between the stack frames of task and main is (at least) 2 bytes larger than in your example. Or in other words: using the Keil call chain there will be 2 more bytes stack space in the stack frame of main. You already know how you can adopt the behaviour of the Keil example: if (setjmp(reg_task_exit_env) == 0) { SP += 2; // allocate more stack space e.g. for interrupts task_0(); SP -= 2; } This "hack" should just help to explain, why the Keil example is working and your source isn't. I really do not recommend using this very limited scheduler. > Thanks all, this was a good learning experience. Yes indeed, problems like this are a great exercise. ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 21:03 Message: Logged In: YES user_id=1375403 OK, thanks. All is clear now. Implementation of coroutines with setjmp/longjmp is not supported by the standard. I accept that my bug report was unfounded within the permitted usage of setjmp/longjmp. Sorry that it took so long for me to get this. (It still leaves the question why setjmp/longjmp is unsupported in SDCC in the first place). The fact that it works in my app is because there is no data on the stack in danger of becoming invalid. The longjmp from the scheduler (caller) to the task function (callee) grows the stack by just two bytes for a return address because I'm compiling with an option to *not* use the stack for local vars and function arguments. It was however the timer interrupt caused it to fail. Thanks for reminding me of the comment in the Keil code. I did see it, but didn't grasp the full significance of it. Thanks all, this was a good learning experience. Regards, Jim ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-10 17:49 Message: Logged In: YES user_id=175928 > Because in > that case longjmp adjusts SP towards a shrinking stack, so > corruption due to interrupts does not occur. Yes, longjmp works by shrinking the stack back to where it was when you called setjmp. You can't go the other way because the data there is no longer guaranteed valid. > BTW, I seem to be missing the point about the difference in > the Keil example and my usage I see none. I suspect they just got lucky that it works at all. Notice the comment in their example code for task0: "byte i,x; // note: local variable might be corrupted after delay" ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 17:21 Message: Logged In: YES user_id=1375403 OK, OK... I think I'm getting the message... You and Maarten are saying that I can only longjmp back from caller to callee and NOT vice versa. Is that what you are saying? Because in that case longjmp adjusts SP towards a shrinking stack, so corruption due to interrupts does not occur. In that case, great, I'll accept my usage as non-standard, continue with my "local solution" as working "step-out", withdraw my bug report, and thank you all for putting me straight. I don't see why I should run the risk of further overflows within the constraints I mentioned if I do and this thread is still open I'll report it. One last thing that I kept reminding myself to ask but forgot each time: why is longjmp/setjmp unsupported in SDCC? BTW, I seem to be missing the point about the difference in the Keil example and my usage, because I don't think I changed this particular part of the application. Appreciate if you could elaborate a bit on this. Finally, I really do appreciate all this feedback -- I'm learning a lot and enjoying it. Thanks, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 16:32 Message: Logged In: YES user_id=203539 > I'm not observing a stack overflow Yes, you do! Please accept two facts: a. with the given (undocumented) usage of setjmp/longjmp you end up with two consecutive stack frames. b. the stack from main() overflows the stack from task_0(), if an interrupt occurs during longjmp(). Please note: the stack will overflow too, if main uses the stack in any other way. Now you have 3 joices: 1. don't use setjmp/longjmp in this particular, undocumented way 2. increase the stack for main 3. disable interrupts in longjmp() > Your suggestion of adding a bit of extra stack space (6 > bytes in your proposal) works (I tried it!) Thanks, this shows that I'm right. Try it again with 2 bytes, then you've got the same stack consumption as in the Keil example (please re-read my last posting about this). > There should not be > intervening stack frames between scheduler and task. You call setjmp() both from main() and task_0(). main() and task_0() have different stack frames, consequently your scheduler and your task have different stack frames (see fact a). And now please stop dreaming. You can't use setjmp/longjmp that way and complain about the consequences. > One way of protection is disabling interrupts during the > execution of this critical section as in my initial posting, > which tackles the root cause. Well, I don't agree about "the root cause". But do it, if this is what you want. But be prepared for the next overflow at the next corner. And to be specific: I still don't see a bug in sdcc's longjmp(). ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-10 13:15 Message: Logged In: YES user_id=1375403 Thanks for shedding your light on this, Bernard. Initially I wasn't interested in the stack at all, I just wanted to take the Keil example (which uses the setjmp/longjmp construct) and expand it for my own purposes. It is when I started to get lock-ups that I ended up examining the stack. I'm not observing a stack overflow (i.e. the stack doesn't grow unbounded, it just alternates one level between scheduler (caller) and task (callee)). What I am seeing is a return address in the process of being copied from a jmpbuf struct onto the stack being corrupted by an interrupt coming in between. The interrupt pushes and pops a return address on stack just prior before longjmp could update the SP to made the stack state valid again. Your suggestion of adding a bit of extra stack space (6 bytes in your proposal) works (I tried it!) because in this particular case the timer interrupt only requires 2 bytes for its return address. It is however just circumventing the problem with the longjmp issue rather than tackling the root cause. If the ISR would save registers on the stack 6 bytes may not be enough. I think the issue with trying to give each task its own stack on the 8051 is the very constraint RAM space available for data and stack (up to 256 bytes). On the Microchip midrange PICs (one of my other targets) it wouldn't work at all because its stack is not accessible by code (as an aside, no setjmp/longjmp either). There are limitations with the setjmp/longjmp approach. A task can only return to the scheduler from the function that is called directly by the scheduler. (There should not be intervening stack frames between scheduler and task). With that constraint there is no limitation with regards to calling other functions from within a task. Sofar, I have not found this to be an issue (if I'm not mistaken the commercial Salvo RTOS has the same limitation). With respect to the SDCC implementation of longjmp, what I'm trying to get across is that it is vulnerable to stack corruption due to interrupts in the cricitical section indicated below: // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section One way of protection is disabling interrupts during the execution of this critical section as in my initial posting, which tackles the root cause. I'll repost the URL to the KEIL example of a simple scheduler for the 8051 here for anyone interested: http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D (scroll down to the end of the page and click on "Simple OS for 8051 microcontroller written in the Keil Software C compiler"); Thanks again, Bernard. Regards, Jim ---------------------------------------------------------------------- Comment By: Bernhard Held (bernhardheld) Date: 2005-11-10 11:27 Message: Logged In: YES user_id=203539 Please allow me to throw in my two cents. Jim, your problem is, that you don't care about the stack. You're jumping between two stack frames, the consequence is that you're punished by a plain stack overflow. I don't know exactly why the Keil example works. But one important difference might be, that task() is called via register_task(), which results in at least 2 extra bytes space on the stack. Could you please try this: if (setjmp(reg_task_exit_env) == 0) { SP += 6; // allocate more stack space e.g. for interrupts task_0(); SP -= 6; } IMHO you should write a clean tast scheduler, and all tasks should have their own stack. Once I've done it for DOS using TurboC, it's not that difficult. Furthermore it's much easier to read than the setjmp/longjmp constructs. And you can call the scheduler from within a deeply nested function. Finally: I can't find a bug in the old longjmp() function. It's just a problem of the correct stack frame (or enough space in all stack frames). Even with my_longjmp() all your problems will return, when you (or sdcc) start using the stack in task_0()! ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 22:03 Message: Logged In: YES user_id=1375403 Maarten, We seem to disagree on the interpretation of the spec. In my view "the function containing the invocation of the setjmp macrp" has NOT terminating (in fact you can see both caller and callee caught in their own endless loops, thus never terminating). I have adapted to the code from an example written for the Keil compiler (URL was in code sample). However regardless of whether my usage (which amounts to two coroutines calling each other) is within the spec or not, this still leaves the issue of the unprotected critical section in the longjmp() implementation. I'll leave it at your discretion how to deal, if at all, with this issue in SDCC. I'm happy with my solution and will move on to the next topic. I'm into SDCC for about a week now and this is the only real issue I came across (and the documentation did say that setjmp/longjmp was not supported). So overall I'm still impressed with the SDCC effort. Thanks very much for taking the time to look into this and keep up the good work. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 21:29 Message: Logged In: YES user_id=888171 Jim, First thanks for the code example. It does however violate the spec. This is what it says in 7.13.2.1 #2: The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution (see note 208) in the interim, or if the invocation of the setjmp macro was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. And note 208 says: For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls. And that longjmp is exactly what your code does. I'll leave the bug report open to remind me to create a safe asm only version of setjmp/longjmp but expect to see no change in functionality. Btw. if you had logged in when you opened the bug report you would have been able to upload files to it. Now the system treats you as just some bystander. Good luck, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 13:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |