From: Matt L. <mat...@gm...> - 2008-02-20 20:45:56
Hi All,

I'm trying to enable SSE support for VNL on my machine, and I've encountered a bug. The vnl test named "test_alignment" generates several failures, followed by a segmentation fault. The failures seem sporadic and appear to be related to a bad choice of epsilon when comparing results. These failures appear in several dashboard builds. The more serious problem is the segfault, which I don't see on the dashboard. I've enabled SSE on my dashboard build, so it should show up tomorrow under build name Linux-2.6_gcc-4.1.3_-Wall at lems.brown.edu.

Here is an explanation of what causes the crash, based on my limited understanding of SSE. The crash occurs in
  test_alignment.cxx: line 24
called from within the nested loops when vector size = 4, matrix offset = 0, vector offset = 0, and result offset = 1. I've traced the problem to
  vnl_sse.h: line 555
which calls _mm_load_ps on the result vector, which is not 16-byte aligned because of the offset in the test.

I'm not sure why this isn't failing on the other dashboard builds. The SSE code looks correct as long as the vectors have a 16-byte aligned address. It handles the case where the data block does not evenly divide into 16-byte blocks, but does not handle the case where the starting address is not 16-byte aligned. Usually, you don't have to worry about this case because the SSE allocator function allocates arrays of aligned memory. However, as the test case indicates, it is possible to create a vnl_vector_ref that uses an arbitrary block of data.

Does this make sense, or is something else wrong? Am I the only one having trouble with SSE support, or is this problem more widespread? I don't have a sense of whether people are using it with no trouble or simply disabling it.

Thanks,
Matt
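To make the "tail handled, misaligned start not handled" distinction concrete, here is a minimal sketch of how a run of floats could be split around 16-byte boundaries. It is not the code in vnl_sse.h; `SimdSplit` and `split_for_simd` are hypothetical names, and the sketch assumes 4-byte floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of vnl: describes how a run of n floats
// starting at p can be split around 16-byte boundaries.
struct SimdSplit {
  std::size_t head;  // leading elements before the first 16-byte boundary
  std::size_t body;  // elements that can be read with aligned 4-float loads
  std::size_t tail;  // leftover elements after the last full block
};

inline SimdSplit split_for_simd(const float* p, std::size_t n)
{
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  // A float pointer that is not even float-aligned can never reach a
  // 16-byte boundary in whole elements, so that case is rejected outright.
  assert(addr % sizeof(float) == 0);
  const std::size_t misalign = addr % 16;  // bytes past the previous boundary
  std::size_t head = misalign ? (16 - misalign) / sizeof(float) : 0;
  if (head > n) head = n;                  // short runs are all "head"
  const std::size_t rest = n - head;
  SimdSplit s;
  s.head = head;
  s.body = rest - rest % 4;
  s.tail = rest % 4;
  return s;
}
```

If the result offset of 1 in the test means one float element, a 4-element run starting there would split into head 3, body 0, tail 1, so no aligned load would be legal for it at all.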
From: Ian S. <ian...@st...> - 2008-02-21 11:10:48
VNL_SSE was an experimental addition as part of a summer studentship here at Imorphics last year, and I think the student was optimistic about the code's robustness under the conditions we were likely to use it. I've been occasionally taking a look at the code, mostly writing test cases, and I think I know what I need to do to fix it, but it will take me some time.

The code needs to support 4-byte aligned doubles, because we use them all the time, but switching efficiently between the fastest 16-byte aligned code and the slower 4-byte aligned code is tricky.

Unless you want to work on the SSE code itself, I would leave VNL_SSE disabled at the moment.

Ian.
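One common way to switch between the two paths is to test the address once and pick the load instruction outside the loop. The sketch below is only an illustration under that assumption, not the planned vnl fix; `sum_floats` and the `SKETCH_HAS_SSE` macro are hypothetical, and it uses floats for brevity (the double case would use `_mm_load_pd` / `_mm_loadu_pd` from SSE2 instead). The intrinsics are guarded so the code falls back to a scalar loop on non-SSE builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  include <xmmintrin.h>
#  define SKETCH_HAS_SSE 1
#else
#  define SKETCH_HAS_SSE 0
#endif

// Hypothetical function, not taken from vnl_sse.h: sums n floats,
// choosing the aligned or unaligned load once, outside the loop.
inline float sum_floats(const float* p, std::size_t n)
{
  float total = 0.0f;
  std::size_t i = 0;
#if SKETCH_HAS_SSE
  __m128 acc = _mm_setzero_ps();
  if (reinterpret_cast<std::uintptr_t>(p) % 16 == 0) {
    for (; i + 4 <= n; i += 4)
      acc = _mm_add_ps(acc, _mm_load_ps(p + i));   // requires 16-byte alignment
  } else {
    for (; i + 4 <= n; i += 4)
      acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));  // accepts any address
  }
  float lanes[4];
  _mm_storeu_ps(lanes, acc);                       // spill the four partial sums
  total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
  for (; i < n; ++i) total += p[i];  // scalar tail (or everything without SSE)
  return total;
}
```

The branch costs one comparison per call, which may matter for very short vectors; that trade-off is presumably part of what makes the switch tricky in practice.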
From: Matt L. <mat...@gm...> - 2008-02-21 13:07:59
Thanks Ian,

I think I understand the situation better now. I'll disable SSE for now. However, I'm wondering if there is a deeper issue here caused by vnl_vector_ref. Certainly, if you stick with vnl_vector and vnl_vector_fixed you can guarantee the alignment of memory, because these classes allocate their own memory. However, with vnl_vector_ref I can create a vector using any block of memory. Isn't it the case that I could create a vnl_vector_ref with memory that isn't even 4-byte aligned? This case seems unlikely, but maybe there should at least be an assert statement to catch it.

--Matt
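The kind of assert suggested here could look roughly like the following. This is a sketch only: `is_aligned` and `checked_ref` are hypothetical names and do not reflect the real vnl_vector_ref interface. The constructor checks the minimum alignment for the element type, and a separate query reports whether the block is also suitable for 16-byte aligned SSE loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment)
{
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical stand-in for a reference-style vector over caller-owned
// memory; this is not the vnl_vector_ref API.
template <class T>
class checked_ref {
 public:
  checked_ref(std::size_t n, T* data) : n_(n), data_(data)
  {
    // Catches memory that is not even aligned for a single T.
    assert(is_aligned(data, alignof(T)));
  }
  std::size_t size() const { return n_; }
  T& operator[](std::size_t i) { return data_[i]; }
  // Whether aligned 16-byte SSE loads would be legal on this block.
  bool simd_ready() const { return is_aligned(data_, 16); }
 private:
  std::size_t n_;
  T* data_;
};
```

Keeping the two checks separate means a correctly aligned but not 16-byte aligned block is still accepted and can be routed to a slower path rather than rejected.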
From: Ian S. <ian...@st...> - 2008-02-21 14:29:20
Matt Leotta wrote:
> Isn't it the case that I could create a vnl_vector_ref with memory that isn't even 4-byte aligned? It seems that this case is unlikely, but maybe there should be at least an assert statement to catch this.

It isn't a problem with vnl_vector_ref so much as a problem with C++/C. I think it is one of those unwritten rules: don't ever allocate or use a POD unless it is aligned to sizeof(POD). The compiler makes it pretty hard to break unless you do a C-style or reinterpret cast of a void *, at which point the standard says you are into implementation-defined or undefined territory.

The only possible reason I can think of for not putting an assert in is that there could be platforms out there that don't require any alignment.

Ian.
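For code that really does have to read a double from an arbitrary byte address, the usual way to stay out of that territory is to copy the bytes rather than cast the pointer. A minimal sketch, with `load_double` and `store_double` as hypothetical helper names:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helpers: read or write a double at an arbitrary byte
// address without ever forming a misaligned double*.
inline double load_double(const unsigned char* bytes)
{
  double value;
  std::memcpy(&value, bytes, sizeof value);  // byte-wise copy, no alignment needed
  return value;
}

inline void store_double(unsigned char* bytes, double value)
{
  std::memcpy(bytes, &value, sizeof value);
}
```

This does not help a vector class that hands out `double*` into caller memory, but it shows that the misaligned case only arises once a cast has been used.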
From: Amitha P. <ami...@us...> - 2008-02-21 16:00:33
Ian Scott wrote:
> The only possible reason I can think of for not putting an assert in, is that there could be platforms out there that don't require any alignment.

One of those platforms would appear to be the Intel 32-bit platform:

    #include <iostream>
    int main()
    {
      char a[16];
      double* d = (double*)(&a[1]);  // deliberately misaligned address
      *d = 5;
      std::cout << "d=" << *d << "\n";
    }

However, as you also wrote, I don't think one can get an un-reinterpreted pointer to an unaligned double. If there is a test for vnl_vector_ref that interprets a non-aligned memory location as a double, we should take that test out, because it is definitely not portable C++.

I think a sufficient solution is an assert in the vnl_sse code, and a caveat lector message in the vector ref documentation that reinterpreting double refs to unaligned memory is the user's problem.

Amitha.
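For contrast with the snippet above, a portable way to put a double inside a raw byte buffer is to ask the library for the first suitably aligned position instead of picking an offset by hand. This is a sketch under the assumption of a C++11 compiler; `make_aligned_double` is a hypothetical helper built on `std::align` and placement new.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical helper: constructs a double at the first suitably aligned
// position inside a raw byte buffer, or returns nullptr if it does not fit.
inline double* make_aligned_double(unsigned char* buffer, std::size_t bytes, double value)
{
  void* p = buffer;
  std::size_t space = bytes;
  // std::align advances p to the next alignof(double) boundary, or
  // returns nullptr when sizeof(double) bytes no longer fit in `space`.
  if (!std::align(alignof(double), sizeof(double), p, space))
    return nullptr;
  return new (p) double(value);  // placement new starts the object's lifetime
}
```

Called with `a + 1` and the 15 remaining bytes of a 16-byte buffer, this skips forward to an aligned address rather than writing at the odd one.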