From: Paul S. <pa...@cl...> - 2010-11-01 04:44:42
Hi All,

I am using a Gumstix Overo with the Ubuntu 10.04 LTS armel distribution,
and it is mostly working well. I have come across a peculiar issue with
the native GCC compiler, which works fine for building the Linux kernel
and many other projects but has one problem.

If I do something like this:

    float f;
    unsigned char *p;

    f = *((float *)p);

the generated code causes an ILLEGAL INSTRUCTION signal to be delivered
to the process at runtime. Further debugging has shown that if the
pointer p is word aligned there is no issue, but if p is not word
aligned the ILLEGAL INSTRUCTION signal is raised (and, if it is not
caught, it terminates the program).

Does anyone know of a GCC compiler flag that fixes this issue? I did not
have this issue with the previous version of GCC on my Overo with the
Ubuntu 9.04 build; it seems to have cropped up with the upgrade. I had
noticed it in the past in code built with the GCC flags
-mfpu=neon -mfloat-abi=softfp -ffast-math, and removing these flags made
the problem go away. With the new compiler the problem persists with or
without these flags.

To assist anyone debugging this further, below is a small test.c example
that exhibits the issue.

    #include <stdio.h>
    #include <signal.h>

    #define R4(p) (*((float *)(p)))

    void sig_trap(int signo) {
        printf("Caught Signal - %d\n", signo);
        return;
    } /* sig_trap */

    void register_signal_handlers() {
        signal(SIGILL, sig_trap);
    }

    int main(int argc, char **argv) {
        int i;
        unsigned char buf[64];

        printf("Starting main\n");
        register_signal_handlers();

        for (i = 0; i > 64; i++)
            buf[i] = i;

        for (i = 0; i < 16; i++)
            printf("%d, value of i = %f\n", i, R4(&buf[i]));

        printf("Completed Test\n");
    }

Example output:

    root@n2:~# ./test
    Starting main
    0, value of i = 0.000000
    Caught Signal - 4
    1, value of i = 0.000117
    Caught Signal - 4
    2, value of i = 0.620234
    Caught Signal - 4
    3, value of i = 1.780059
    4, value of i = -0.398863
    Caught Signal - 4
    5, value of i = -1.699432
    Caught Signal - 4
    6, value of i = -1.962429
    Caught Signal - 4
    7, value of i = -1.995304
    8, value of i = 2.019390
    Caught Signal - 4
    9, value of i = 2.002424
    Caught Signal - 4
    10, value of i = 2.000303
    Caught Signal - 4
    11, value of i = 2.000038
    12, value of i = 2.029823
    Caught Signal - 4
    13, value of i = 2.003728
    Caught Signal - 4
    14, value of i = 2.000466
    Caught Signal - 4
    15, value of i = 2.000058
    Completed Test

Regards,

Paul
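Since the failure in the post above depends on whether the address is word aligned, it can help to test a pointer's alignment before casting it. The following is only a minimal sketch: the helper name is_aligned is hypothetical and does not come from the original post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): returns true when the
 * address p is a multiple of `alignment` bytes.
 * `alignment` must be a non-zero power of two, e.g. sizeof(float). */
static bool is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power of two, the low bits of an aligned address are all zero. */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

In the test program, a check such as is_aligned(&buf[i], sizeof(float)) could be printed next to each value to confirm which offsets are the unaligned ones.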
From: Ash C. <ash...@gm...> - 2010-11-01 08:04:12
Hi Paul,

I don't have the knowledge to answer your question directly, but I would
direct you to the Linaro GCC page:

https://launchpad.net/gcc-linaro

Linaro has done some work on recent releases of GCC (4.4 & 4.5) to
resolve toolchain bugs specifically related to ARM, with a particular
focus on the OMAP3|4 CPUs. I suspect they've discovered your bug already
and possibly have a fix, or might be interested in knowing about it if
not.

-Ash
From: Paul S. <pa...@cl...> - 2010-11-01 10:30:05
I have done a whole lot more experimenting to try to see what is going
on here. It appears that the issue goes away if you specify the
-msoft-float option. To get to the bottom of this I have had a look at
the assembly output generated from the same C code with different
options.

What appears to be happening is that if the code tries to dereference a
float that is not word aligned in memory, it causes an instruction error
at runtime when the code is compiled to use the hardware FPU. If it is
using the software FPU emulation, there is no such problem.
From: Dave H. <dhy...@gm...> - 2010-11-01 15:21:13
Hi Paul,

On Mon, Nov 1, 2010 at 3:29 AM, Paul Solomon <pa...@cl...> wrote:
> I have done a whole lot more experimenting to try and see what is going on
> here. It appears that the issue goes away if you specify the -msoft-float
> option. To try and get to the bottom of this I have had a look at the
> resultant assembly outputs from the same C code with different options.
>
> What appears to be happening is the if the code tried to dereference a float
> that is not word aligned in memory then it causes an instruction error at
> runtime if the code is compiled to use the hardware FPU. If it is using the
> software FPU emulation then there is no such problem.

The simple solution is to not use unaligned floats. Any code that relies
on unaligned floats is generally buggy. It simply is not guaranteed to
work on all architectures. Some architectures will cause similar
behaviour for just trying to access unaligned 32-bit data.

ARM also doesn't support reading unaligned 32-bit data, but because so
much code is bad, they added a handler to fix things up and make it look
like the right thing happened whenever the illegal operation takes
place. These fixups are extremely expensive in terms of how many cycles
they take to execute.

In ARM land, there is a proc entry that you can use to monitor these
accesses and even cause programs to abort in a similar fashion to what
you're seeing. See the file "unaligned-memory-access.txt" in the Linux
Documentation directory, and also see arm/mem_alignment.

--
Dave Hylands
Shuswap, BC, Canada
http://www.DaveHylands.com/
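For reference, the proc entry covered by arm/mem_alignment is, as far as I know, /proc/cpu/alignment on 32-bit ARM kernels built with alignment trap support. The sketch below just prints whatever that file contains. The function name dump_alignment_stats is hypothetical, and the path is passed in as a parameter because the file may not exist on a given kernel.

```c
#include <stdio.h>

/* Hypothetical helper: copy the text file at `path` to stdout.
 * Returns the number of lines read (assuming each line is shorter than
 * the 256-byte buffer), or -1 if the file cannot be opened, for example
 * on a kernel that does not provide the entry. */
static int dump_alignment_stats(const char *path)
{
    FILE *fp = fopen(path, "r");
    char line[256];
    int lines = 0;

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        fputs(line, stdout);
        lines++;
    }

    fclose(fp);
    return lines;
}
```

Calling dump_alignment_stats("/proc/cpu/alignment") before and after running a suspect program should show whether the kernel's fixup counters changed, assuming the entry is present.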
From: Paul S. <pa...@cl...> - 2010-11-02 03:51:31
Hi Dave,

Thanks for the input. After reading that, I tend to agree that the code
I am using should be fixed so that it does not access unaligned data. In
fact I have already done this with a bunch of memcpy calls to make it
run correctly.

The actual place where it is used is a network protocol reader: a binary
stream gets copied into a memory buffer, and the buffer then contains
floats, ints, doubles, etc. referring to different bits of data. Whether
there is an alignment issue depends on where in the buffer a given value
happens to fall.

So I would have thought that the compiler should be capable of knowing
that it is accessing data through a char pointer, and so treat it as an
unaligned transfer.

From reading the link you sent, it looks like you can get this behaviour
with structs if you add __attribute__((packed)) to them; however, I am
not sure how this would translate to an unsigned char pointer.

Anyhow, problem solved now (more or less). Time to move on.

Paul
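The memcpy fix described above might look roughly like the sketch below. This is not Paul's actual code. The helper name read_f32 and the offset parameter are made up for illustration, and the sketch assumes the bytes in the buffer are already in the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): read a float stored at byte
 * offset `off` in `buf`. The source address may be unaligned, because
 * memcpy copies bytes and has no alignment requirement.
 * Assumes the stored bytes are already in host byte order. */
static float read_f32(const unsigned char *buf, size_t off)
{
    float v;

    assert(buf != NULL);
    memcpy(&v, buf + off, sizeof v);  /* v itself is properly aligned */
    return v;
}
```

For a small fixed size like this, compilers are generally able to turn the memcpy into a few plain loads, so the cost is usually low compared with a trapped and fixed-up access.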
From: Dave H. <dhy...@gm...> - 2010-11-02 05:59:51
Hi Paul,

On Mon, Nov 1, 2010 at 8:51 PM, Paul Solomon <pa...@cl...> wrote:
> Hi Dave,
>
> Thanks for the input. After reading that I tend to agree that the code I
> am using should be fixed to not try and access unaligned data. In fact I
> have done this already with a bunch of memcpy's to make it run correctly.

Yeah - that sounds like the right way (read - most portable).

> The actual place where it is being used is in a network protocol reader
> where there is a binary stream which gets copied into a memory buffer and
> then it contains floats, ints, doubles, etc. referring to different bits
> of data. It depends on where in the buffer the stream happens to be as to
> whether there will be an alignment issue or not.

Yeah - so copying the data using memcpy from the unaligned buffer into
an aligned buffer is the best way of dealing with this type of data.

> So it would have thought that the compiler should be capable of knowing
> that it is accessing data from a char pointer and so treating it as an
> unaligned transfer etc..
>
> It looks like you can get this behaviour with structs from reading the
> link you sent if you add the __attribute__((packed)) to it, however I am
> not sure how this would translate to a unsigned char pointer.

When you start using stuff like __attribute__((packed)) you're basically
telling the compiler that you know what you're doing. It won't
necessarily create structures where the fields can be directly
dereferenced.

--
Dave Hylands
Shuswap, BC, Canada
http://www.DaveHylands.com/
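On the question of how __attribute__((packed)) might apply to an unsigned char pointer, one common GCC idiom is to wrap the value in a one-member packed struct and access the bytes through a pointer to that struct. The sketch below assumes GCC or a compatible compiler. The struct and function names are hypothetical, and the may_alias attribute is added here so the access is not treated as a strict-aliasing violation. memcpy remains the portable option.

```c
#include <assert.h>
#include <stddef.h>

/* GCC extension (assumed available): a packed struct has alignment 1, so
 * the compiler has to generate an access to `v` that tolerates any
 * address. may_alias lets this type overlay a plain byte buffer. */
struct __attribute__((packed, may_alias)) unaligned_f32 {
    float v;
};

/* Hypothetical helper: load a float from a possibly unaligned address. */
static float load_f32_packed(const unsigned char *p)
{
    assert(p != NULL);
    return ((const struct unaligned_f32 *)p)->v;
}

/* Hypothetical helper: store a float to a possibly unaligned address. */
static void store_f32_packed(unsigned char *p, float value)
{
    assert(p != NULL);
    ((struct unaligned_f32 *)p)->v = value;
}
```

Note that taking the address of the member (&s->v) and dereferencing it as a plain float * would lose the packed information again, which is the kind of trap Dave's last paragraph warns about.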