From: SourceForge.net <no...@so...> - 2005-11-09 20:29:39
|
Bugs item #1350513, was opened at 2005-11-07 18:02 Message generated for change (Comment added) made by maartenbrock You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Run Time Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) >Assigned to: Maarten Brock (maartenbrock) Summary: Possible bug in longjmp() Initial Comment: I believe I have found a bug in the implementation of longjmp(). There appears to be critical section in that code, during which an interrupt can clobber up the stack frame under modification. I have adapted the code by temporarily disabling interrupts. That took care of the spurious lock-ups in my app. int my_longjmp (unsigned char *bp, int rv) { unsigned char lsp; bit save_EA; save_EA = EA; EA = 0; // start of critical section lsp = *(bp+2); *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; // end of critical section EA = save_EA; return rv; } Regards, Jim Cramer ji...@ji... ---------------------------------------------------------------------- >Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 21:29 Message: Logged In: YES user_id=888171 Jim, First thanks for the code example. It does however violate the spec. This is what it says in 7.13.2.1 #2: The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution (see note 208) in the interim, or if the invocation of the setjmp macro was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. And note 208 says: For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls. And that longjmp is exactly what your code does. I'll leave the bug report open to remind me to create a safe asm only version of setjmp/longjmp but expect to see no change in functionality. Btw. if you had logged in when you opened the bug report you would have been able to upload files to it. Now the system treats you as just some bystander. Good luck, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 13:46 Message: Logged In: YES user_id=1375403 Maarten, I'm attaching the smallest app that demonstrates the issue while running on actual hardware. I have added comments at the longjmp calls of the before and after values of SP in longjmp that I observed running in a simulator. Please let me know if you need further explanation. Regards, Jim Cramer P.S. I'm not able to attach anything on this web screen. So I'll try and paste it below: // Minimal working app demonstrating potential bug in SDCC longjmp // code adapted from http://www.phyton.com/htdocs/main.shtml?downloads.shtml~D #include <8052.h> #include <setjmp.h> typedef unsigned char byte; int my_longjmp (unsigned char data *bp, int rv); //#define LONGJMP longjmp // <<== locks up #define LONGJMP my_longjmp // <<== no lock up void task_0(void); void timer2 (void) interrupt 5 _naked; data jmp_buf reg_task_exit_env; data jmp_buf kernel_env; data jmp_buf task0_env; volatile data byte rtos_tick; sfr at 0xC9 T2MOD; void main(void) { byte wait_count; T2CON = 0x04; // load T2 control register T2MOD = 0x00; TH2 = 0xFA; // Set up for 1ms timer tick with RCAP2H = 0xFA; // Fosc = 18.432Mhz and 12 clock period per T2 cycle TL2 = 0x00; RCAP2L = 0x00; if (setjmp(reg_task_exit_env) == 0) task_0(); rtos_tick = 0; // reset flag ET2 = 1; // enable T2 TR2 = 1; // start T2 EA = 1; // global interrupt enable P2_6 = 1; // LED on port P2.6 // --------------- // Scheduler loop // --------------- while (1) { if (setjmp(kernel_env) == 0) // Invoke task LONGJMP(task0_env,1); // SP before = 0x24, SP after = 0x26 //^^^^^^^^^^^^^^^^^^^ <=== stack corruption after interrupt wait_count = 0; // spend some time in other tasks; for demo purposes we just use a delay while (wait_count < 200) { while (rtos_tick == 0); wait_count++; rtos_tick = 0; } } } void task_0() { bit flag = 0; if (setjmp(task0_env) == 0) LONGJMP(reg_task_exit_env,1); // SP before = 0x26, SP after = 0x24 : no issue // Perform task initialisation here //task0 loop while (1) { // Do task0 work: toggle LED on Port 2.6 if (flag) { P2_6 = 0; flag = 0; } else { P2_6 = 1; flag = 1; } // Yield to scheduler if (setjmp(task0_env) == 0) LONGJMP(kernel_env,1); } } void timer2 (void) interrupt 5 _naked { _asm clr TF2 mov _rtos_tick,#1 reti _endasm; } //-------------------------------------------- int my_longjmp (unsigned char data *bp, int rv) { unsigned char lsp; lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; return rv; } ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-09 01:12 Message: Logged In: YES user_id=1375403 Maarten, I understand that you will require more evidence before assuming a bug in the core library. Apologies if I spoke too firmly. My excuse is that I'm a newbie here and with "embedded" (old-hand on the desktop, though -- going back to 6502 / Apple II - shows my age). In my view the bug in the code is not in the functionality but in leaving a cricital section unprotected against interrupts. I agree with you that the reordering is dangerous or downright wrong if any local vars or function arguments are stack based. That's why you can't use --auto-stack. In fact, even with --auto-stack off there may still be an issue as there are some calls (and thus stack usage) in the generated code to resolve the generic pointer passed as the first argument. I'm still researching that one. For now, I have specifically declared the first argument as a "data" pointer instead of a generic pointer to avoid the generated calls. In my view the re-ordering removes the critical section altogether and thus the need to disable interrupts - hence a better solution my initial approach. I have looked up the section in the C99 standard you referred to. As fas as I can tell I'm meeting all stated requirements. I will try and create a small app to demonstrate/explain the issue in more detail and come back on it. I'm just a hobbyist and my app is for my personal education. The design is not critical (except for my understanding). I very much appreciate the huge effort that must have gone into developing SDCC and if I can help to further root out a potential bug I'lll gladly do so. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-09 00:19 Message: Logged In: YES user_id=888171 Jim, If you frequent the 8052.com forum, you would have read many times: testing does not prove the absence of bugs, only the absence of known bugs. I still consider reordering even more dangerous than the original code. In the original code I also still see no bug and I can only guess you're using it in an unspecified way. If you can please read 7.13.2.1 #2 of the ISO C99 spec. In short: make sure that when you execute longjmp() you have not returned from the function that called setjmp(). I'm not saying all this to annoy you. It's my warning that your design might be wrong. If you think I misunderstand, then please try to create a sample application that shows the bug but is otherwise as small as possible and upload it here. With the same respect, I'm trying to work with you, not against you. Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 20:04 Message: Logged In: YES user_id=1375403 Maarten, I already figured that you can't use --stack-auto (i.e. must use non-re-entrant code) because of the stack manipulation. Within that constraint Jim P's solution (as well as my less elegant one) works. The proof is here sitting on my desk flashing LEDs without lock-ups (where as previously it would lock up in about a second). Longjmp/setjmp is very valuable for co-operative schedulers such as I'm using now. With respect, pointing to documentation that claims that the functionality is not supported does not help to improve this otherwise great SDCC stuff. Jim P's solution is elegant, does not alter the logic of the code (just some re-ordering) and is in my opinion a bug fix that I would strongly urge to be adopted. If that still leaves setjmp/longjmp unsupported, so be it. At least some of us can take the risk and start using it ;-) Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:51 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 19:42 Message: Logged In: YES user_id=888171 Jim, 4) You have a point there. Critical might save on the stack, but save_EA could also end up on the stack in --stack-auto (allthough not in this simple case). I guess the best solution is to use inline assembly. 5) If the interrupt occurs after the first write and before the update of SP, the ISR will use stack space (far) above lsp. Nothing gets overwritten. If lsp points to a piece of stack above SP those values are no longer valid. Anything could have written there, not only ISR's but also normal program flow. jimatjtan, Reordering makes things worse. If --stack-auto is used, lsp, bp and rv can be allocated on stack. All in all, there probably was a reason why someone put in the manual that setjmp/longjmp are not supported. Greets, Maarten ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 19:26 Message: Logged In: YES user_id=1375403 Hi Jim P., That is an elegant and better solution than I proposed. I'll adopt it in my app. Thanks a lot. Regards, Jim Cramer ---------------------------------------------------------------------- Comment By: Jim Paris (jimatjtan) Date: 2005-11-08 19:16 Message: Logged In: YES user_id=175928 I think reordering it would fix the problem: lsp = *(bp+2); SP = lsp; *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:55 Message: Logged In: YES user_id=1375403 Forgot to mention under 5: happens in case the values written are in a location above the current SP, e.g: xxx <--current SP low-address-byte high-address-byte <-- updated SP Problems arises if interrupt comes in before SP is updated and if ISR uses the stack. Regards, Jim ---------------------------------------------------------------------- Comment By: remarc (remarc) Date: 2005-11-08 18:32 Message: Logged In: YES user_id=1375403 1) Sorry Maarten, newbie here. I have now registered and logged in. 2) by accident. 3) I have done so and it solved my problem. 4) Not sure about that because longjmp() pulls tricks on the stack. 5) Let me explain: the code fragment from the original longjmp() that I have labelled "critical section" here first writes a return address (saved from a previous call to a corresponding setjmp()) to the stack and then updates the stack pointer: *((unsigned char data *) lsp) = *bp++; *((unsigned char data *) lsp - 1) = *bp; SP = lsp; If an interrupt comes in after the first or second statement AND before SP gets updated in the third statement AND if the ISR starts saving values on the stack THEN the two stack bytes just modified get clobbered by the ISR. As a result longjmp() doesn't return to the address set by setjmp but jumps into some random code location. Am I making sense? 6) Default options, small memory mode, non-reentrant code Regards, Jim ---------------------------------------------------------------------- Comment By: Maarten Brock (maartenbrock) Date: 2005-11-08 13:55 Message: Logged In: YES user_id=888171 Jim, 1) It helps (YOU) if you log in before posting a bug. You get email notifications of changes to this bug report, you can change settings, you can upload files, you won't get spammed because your email address is in the open, ... 2) No need to double post the PS 3) If you want to override a function in the library you can just write it and include it in your project. The linker will search your project before it starts looking in the library. 4) Instead of accessing EA you can also use the keyword critical 5) I do not understand how an interrupt can clobber up the stack frame in this case. 6) What options to SDCC did you use? Maarten ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:28 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2005-11-07 18:06 Message: Logged In: NO P.S. This applies to the MCS51 library -- Jim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100599&aid=1350513&group_id=599 |