From: Karl R. <ru...@iu...> - 2016-07-23 08:15:48
Hi,

> yes. this seems to be the case. if i force out-of-order CSR into
> in-order CSR everything seems to work. Can't see the documentation
> explicitly mentioning this if this is the case indeed.
>
> Karl, can you please confirm only in-order CSRs are supported? Thanks!

out-of-order CSR works for SpMVs, but not for sparse matrix-matrix multiplies. Parallel algorithms usually work better for in-order data layouts. The performance penalty for out-of-order data is almost always too high to justify any extra kernels for out-of-order data.

Best regards,
Karli

PS: Yes, the documentation should be more explicit about this.
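The "force out-of-order CSR into in-order CSR" step mentioned above can be sketched roughly as follows. This is a minimal host-side illustration, not code from ViennaCL or Mahout: the function name `sort_csr_rows` and the choice of `unsigned int` indices with `double` values are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of ViennaCL): reorders every CSR row in place
// so that its column indices are ascending, moving the values along with them.
// row_ptr has rows + 1 entries; col_idx and values have nnz entries each.
void sort_csr_rows(const std::vector<unsigned int>& row_ptr,
                   std::vector<unsigned int>& col_idx,
                   std::vector<double>& values)
{
  assert(col_idx.size() == values.size());
  assert(row_ptr.empty() || row_ptr.back() == col_idx.size());

  std::vector<std::size_t> perm;
  std::vector<unsigned int> tmp_cols;
  std::vector<double> tmp_vals;

  for (std::size_t row = 0; row + 1 < row_ptr.size(); ++row) {
    const std::size_t begin = row_ptr[row];
    const std::size_t end = row_ptr[row + 1];
    assert(begin <= end);
    const std::size_t n = end - begin;

    // Sort a permutation of the row's local positions by column index.
    perm.resize(n);
    std::iota(perm.begin(), perm.end(), std::size_t(0));
    std::sort(perm.begin(), perm.end(),
              [&col_idx, begin](std::size_t a, std::size_t b) {
                return col_idx[begin + a] < col_idx[begin + b];
              });

    // Apply the permutation to both arrays via temporaries.
    tmp_cols.resize(n);
    tmp_vals.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
      tmp_cols[i] = col_idx[begin + perm[i]];
      tmp_vals[i] = values[begin + perm[i]];
    }
    std::copy(tmp_cols.begin(), tmp_cols.end(), col_idx.begin() + begin);
    std::copy(tmp_vals.begin(), tmp_vals.end(), values.begin() + begin);
  }
}
```

Applied to the 5 x 10 sample quoted later in this thread (row pointers 0, 1, 4, 6, 9, 12), only the second row changes: its column indices 0, 8, 7 become 0, 7, 8.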
From: Karl R. <ru...@iu...> - 2016-07-23 08:10:50
Hi,

> PS
> (4) column indices admit out-of-order placements of elements within each
> row.

Column indices *have* to be in ascending order for sparse matrix-matrix multiplication.

Best regards,
Karli

> Thank you.
> -Dmitriy
>
> On Fri, Jul 22, 2016 at 12:56 PM, Dmitriy Lyubimov <dl...@gm...> wrote:
>
> > I think I still am getting seg faults on attempt to multiply
> > matrices even without conversion back (larger arguments, 3k x 1k)
> >
> > I re-wrote another alternative transformation procedure and see
> > nothing wrong with it. Both Andrew's code and mine fail with the
> > same symptoms.
> >
> > Karl, can we verify assumptions about the format:
> >
> > (1) the compressed_matrix.set method expects host memory pointers.
> > (2) the format is compressed row storage (CSR). Documentation never
> > says explicitly that, and actually seems to have errors in size of
> > elements and jumper arrays (it says jumper array has to be cols+1
> > long whereas in CSR it should actually be rows + 1 long, right? )
> > (3) the element sizes of jumper and column indices arrays are 32 bit
> > and are in little endian order (at least for the OpenMP backend).
> >
> > Right now I can't even get OpenMP sparse multiplication to work
> > although CSR format is not rocket science at all. Don't see a
> > problem anywhere. Tried to read Vienna's code to confirm the
> > assumptions above, but this seems to be pretty elusive for the time
> > being.
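The layout requirements discussed above (row-jumper array of length rows + 1, column indices ascending within each row) can be checked on the host before handing buffers to the library. The following is only a sketch: `is_inorder_csr` is a hypothetical helper, and treating duplicate column indices within a row as invalid is an assumption of this example rather than something stated in the thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pre-flight check (not a ViennaCL function). Returns true if
// the arrays describe a rows x cols CSR matrix whose column indices are
// strictly ascending within every row.
bool is_inorder_csr(std::size_t rows, std::size_t cols,
                    const std::vector<unsigned int>& row_ptr,
                    const std::vector<unsigned int>& col_idx)
{
  // The jumper array must have rows + 1 entries, not cols + 1.
  if (row_ptr.size() != rows + 1) return false;
  if (row_ptr.front() != 0) return false;
  if (row_ptr.back() != col_idx.size()) return false;

  for (std::size_t row = 0; row < rows; ++row) {
    const std::size_t begin = row_ptr[row];
    const std::size_t end = row_ptr[row + 1];
    if (begin > end || end > col_idx.size()) return false;

    for (std::size_t k = begin; k < end; ++k) {
      if (col_idx[k] >= cols) return false;                         // out of range
      if (k > begin && col_idx[k - 1] >= col_idx[k]) return false;  // not ascending
    }
  }
  return true;
}
```

A check like this is cheap compared with a sparse matrix-matrix product and turns a segfault into a reportable error on the caller's side.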
From: Karl R. <ru...@iu...> - 2016-07-23 08:10:05
Hi Dmitriy,

> Karl, can we verify assumptions about the format:
>
> (1) the compressed_matrix.set method expects host memory pointers.

yes

> (2) the format is compressed row storage (CSR). Documentation never says
> explicitly that, and actually seems to have errors in size of elements
> and jumper arrays (it says jumper array has to be cols+1 long whereas in
> CSR it should actually be rows + 1 long, right? )

yes

> (3) the element sizes of jumper and column indices arrays are 32 bit and
> are in little endian order (at least for the OpenMP backend).

elements are in whatever order your machine supports.

Best regards,
Karli

> Right now I can't even get OpenMP sparse multiplication to work although
> CSR format is not rocket science at all. Don't see a problem anywhere.
> Tried to read Vienna's code to confirm the assumptions above, but this
> seems to be pretty elusive for the time being.
>
> On Fri, Jul 22, 2016 at 10:26 AM, Andrew Palumbo <ap...@ou...> wrote:
>
> > Yep, that's it. Oh wow, well that's just embarrassing 😊.
> >
> > Thanks very much for your time, Karl, much appreciated.
> >
> > Andy
> >
> > ------------------------------------------------------------------------
> > *From:* Karl Rupp <ru...@iu...>
> > *Sent:* Friday, July 22, 2016 12:39:20 PM
> > *To:* Andrew Palumbo; viennacl-devel
> > *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix
> >
> > Hi,
> >
> > your second and third arguments to memory_read() are incorrect:
> > The second argument is the offset from the beginning, the third argument
> > is the number of bytes to be read. Shifting the zero to the second
> > position fixes the snippet (plus correcting the loop bounds when
> > printing at the end) :-)
> >
> > Best regards,
> > Karli
> >
> > On 07/22/2016 08:51 AM, Andrew Palumbo wrote:
> > > a couple of small mistakes in the previous c++ file:
> > >
> > > The memory_read(..) call should be:
> > >
> > > // read data back into our product buffers
> > > viennacl::backend::memory_read(handle1, product_size_row * 4, 0, product_row_ptr, false);
> > > viennacl::backend::memory_read(handle2, product_NNz * 4, 0, product_col_ptr, false);
> > > viennacl::backend::memory_read(handle, product_NNz * 8, 0, product_values_ptr, false);
> > >
> > > (read product_NNz * x bytes instead of product_size_row * x)
> > >
> > > I've attached the corrected file.
> > >
> > > Thanks
> > >
> > > Andy
> > >
> > > ------------------------------------------------------------------------
> > > *From:* Andrew Palumbo <ap...@ou...>
> > > *Sent:* Thursday, July 21, 2016 11:03:59 PM
> > > *To:* Karl Rupp; viennacl-devel
> > > *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix
> > >
> > > Hello,
> > >
> > > I've mocked up a sample of the compressed_matrix multiplication that
> > > I've been working with javacpp on in C++. I am seeing the same type of
> > > memory errors when I try to read the data out of product, and into the
> > > output buffers as I was with javacpp. By printing the matrix to stdout
> > > as in the compressed_matrix example we can see that there are values
> > > there, and they seem reasonable, but when I use
> > > backend::memory_read(...) to retrieve the buffers, I'm getting values
> > > consistent with a memory error, and similar to what I was seeing in the
> > > javacpp code. Maybe I am not using the handles correctly? Admittedly
> > > my C++ is more than rusty, but I believe I am referencing the buffers
> > > correctly in the output.
> > >
> > > Below is the output of the attached file: sparse.cpp
> > >
> > > Thanks very much,
> > >
> > > Andy
> > >
> > > ViennaCL: compressed_matrix of size (10, 10) with 24 nonzeros:
> > >   (1, 2) 0.329908
> > >   (1, 3) 0.0110522
> > >   (1, 4) 0.336839
> > >   (2, 5) 0.0150778
> > >   (2, 7) 0.0143518
> > >   (3, 3) 0.217256
> > >   (3, 6) 0.346854
> > >   (3, 9) 0.45353
> > >   (4, 3) 0.407954
> > >   (4, 6) 0.651308
> > >   (5, 2) 0.676061
> > >   (5, 3) 0.0226486
> > >   (5, 4) 0.690264
> > >   (6, 5) 0.0998838
> > >   (6, 7) 0.0950744
> > >   (7, 2) 0.346173
> > >   (7, 3) 0.0115971
> > >   (7, 4) 0.353446
> > >   (7, 9) 0.684458
> > >   (8, 5) 0.0448123
> > >   (8, 7) 0.0426546
> > >   (8, 9) 0.82782
> > >   (9, 5) 0.295356
> > >   (9, 7) 0.281134
> > >
> > > row jumpers: [-36207072,32642,-39708721,32642,6390336,0,2012467744,32767,2012467968,32767,4203729,]
> > > col ptrs: [0,0,-39655605,32642,-36207072,32642,6390336,0,10,0,-39672717,32642,2012466352,32767,-32892691,32642,1,0,6390336,0,2012466344,32767,60002304,2059362829,]
> > > elements: [0.289516,0.304161,0.795779,0.334456,0.935264,0.585813,0.871237,0.811508,0.828558,0.0271863,6.92683e-310,6.92683e-310,1.061e-313,1.061e-313,6.36599e-314,4.24399e-314,6.36599e-314,6.92683e-310,4.24399e-314,1.2732e-313,2.122e-313,6.95324e-310,0.406537,0.0495716,0.370862,]
> > >
> > > and similarly for multiplication of 2 1x1 matrices:
> > >
> > > Result:
> > >
> > > ViennaCL: compressed_matrix of size (1, 1) with 1 nonzeros:
> > >   (0, 0) 0.117699
> > >
> > > row jumpers: [-717571424,32767,]
> > > col ptrs: [6386240,]
> > > elements: [0.289516,6.9479e-310,]
> > >
> > > ------------------------------------------------------------------------
> > > *From:* Andrew Palumbo <ap...@ou...>
> > > *Sent:* Wednesday, July 20, 2016 5:40:31 PM
> > > *To:* Karl Rupp; viennacl-devel
> > > *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix
> > >
> > > Oops, sorry about not cc'ing all.
> > >
> > > I do not get correct data back for a (Random.nextDouble() populated) 1 x
> > > 1 Matrix.
> > >
> > > A:
> > >
> > > Row Pointer: [0, 1 ]
> > > Col Pointer: [0 ]
> > > element Pointer: [0.6465821602909256 ]
> > >
> > > B:
> > >
> > > Row Pointer: [0, 1 ]
> > > Col Pointer: [0 ]
> > > element Pointer: [0.9513577109193919 ]
> > >
> > > C = A %*% B
> > >
> > > Row Pointer: [469762248, 32632]
> > > Col Pointer: [469762248 ]
> > > element Pointer: [6.9245198744523E-310 ]
> > >
> > > ouch.
> > >
> > > It looks like I'm not copying the Buffers correctly at all. I may be
> > > using the javacpp buffers incorrectly here, or I have possibly wrapped
> > > the viennacl::backend::memory_handle class incorrectly, so I'm using a
> > > pointer to the wrong memory from e.g. viennacl::compressed_matrix::handle.
> > >
> > > I mentioned before that the multiplication completed on small (< ~300 x
> > > 300) matrices because if I try to multiply two larger sparse matrices,
> > > the JVM crashes with a SIGSEGV.
> > >
> > > Since this code is all wrapped with javacpp, I don't really have a small
> > > sample that I can show you (not going to dump a whole bunch of code on
> > > you).
> > >
> > > I'll keep trying to figure it out. Pretty sure the problem is on my end
> > > here. I really mainly wanted to ask you if I was using the correct
> > > methods at this point, or if there was anything very obvious that I
> > > was doing wrong.
> > >
> > > Thanks a lot for your help!
> > >
> > > Andy
> > >
> > > ------------------------------------------------------------------------
> > > *From:* Karl Rupp <ru...@iu...>
> > > *Sent:* Wednesday, July 20, 2016 5:00:36 PM
> > > *To:* Andrew Palumbo; viennacl-devel
> > > *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix
> > >
> > > Hi,
> > >
> > > please keep viennacl-devel in CC:
> > >
> > > Just to clarify: Do you get incorrect values for a 1-by-1 matrix as
> > > indicated in your sample data? In your previous email you mentioned that
> > > results are fine for small matrices...
> > >
> > > I'm afraid I can only guess at the source of the error with the
> > > information provided. Any chance that you can provide a standalone code
> > > to reproduce the problem with reasonable effort?
> > >
> > > Best regards,
> > > Karli
> > >
> > > On 07/20/2016 10:16 PM, Andrew Palumbo wrote:
> > > > Thanks so much for your quick answer!
> > > >
> > > > I actually am sorry to say that I made a mistake when writing the last
> > > > email, I copied the wrong signature from the VCL documentation, and then
> > > > the mistake propagated through the rest of the e-mail.
> > > >
> > > > I am actually using viennacl::backend::memory_read().
> > > >
> > > > E.g., for the row_jumpers and column_idx I use:
> > > >
> > > > @Name("backend::memory_read")
> > > > public static native void memoryReadInt(@Const @ByRef MemHandle src_buffer,
> > > >                                         int bytes_to_read,
> > > >                                         int offset,
> > > >                                         IntPointer ptr,
> > > >                                         boolean async);
> > > >
> > > > and for the Values:
> > > >
> > > > @Name("backend::memory_read")
> > > > public static native void memoryReadDouble(@Const @ByRef MemHandle src_buffer,
> > > >                                            int bytes_to_read,
> > > >                                            int offset,
> > > >                                            DoublePointer ptr,
> > > >                                            boolean async);
> > > >
> > > > And then call:
> > > >
> > > > memoryReadInt(row_ptr_handle, (m + 1) * 4, 0, row_ptr, false)
> > > > memoryReadInt(col_idx_handle, NNz * 4, 0, col_idx, false)
> > > > memoryReadDouble(element_handle, NNz * 8, 0, values, false)
> > > >
> > > > and after converting them to java.nio.Buffers, am getting results like:
> > > >
> > > > rowBuff.get(1): 0 colBuff(1): 402653448 valBuff(1): 6.91730177312166E-310
> > > >
> > > > Have also tried reading into BytePointers similarly with the same type
> > > > of results. I know that the use of Javacpp obfuscates what the problem
> > > > may be. But I believe the memory is properly allocated.
> > > >
> > > > Sorry for the mistake.
> > > >
> > > > Thanks,
> > > >
> > > > Andy
> > > >
> > > > ------------------------------------------------------------------------
> > > > *From:* Karl Rupp <ru...@iu...>
> > > > *Sent:* Wednesday, July 20, 2016 3:50:07 PM
> > > > *To:* Andrew Palumbo; Vie...@li...
> > > > *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix
> > > >
> > > > Hi Andy,
> > > >
> > > > instead of viennacl::backend::memory_copy(), you want to use
> > > > viennacl::backend::memory_read(), which directly transfers the data into
> > > > your buffer(s).
> > > >
> > > > If you *know* that your handles are in host memory, you can even grab
> > > > the values directly via
> > > > viennacl::linalg::host_based::detail::extract_raw_pointer<T>();
> > > > defined in viennacl/linalg/host_based/common.hpp, around line 40.
> > > >
> > > > Please let me know if you still get errors after using that.
> > > >
> > > > Best regards,
> > > > Karli
> > > >
> > > > On 07/20/2016 09:05 PM, Andrew Palumbo wrote:
> > > > > Hello,
> > > > >
> > > > > I'm having some difficulties with compressed_matrix multiplication.
> > > > >
> > > > > Essentially I am copying three buffers, the CSR conversion of an Apache
> > > > > Mahout SparseMatrix, into two compressed_matrices performing matrix
> > > > > multiplication. I am doing this in Scala and Java using javacpp.
> > > > >
> > > > > For example, I have a 5 x 10 matrix of ~20% non-zero values which in CSR
> > > > > format looks like this:
> > > > >
> > > > > NNz: 12
> > > > >
> > > > > Row Pointer: [0, 1, 4, 6, 9, 12, ]
> > > > >
> > > > > Col Pointer: [9, 0, 8, 7, 2, 9, 0, 8, 9, 0, 3, 5, ]
> > > > >
> > > > > element Pointer: [0.4065367203992265, 0.04957158909682802,
> > > > > 0.5205586068847993, 0.3708618354358446, 0.6963900565931678,
> > > > > 0.8330915529787706, 0.32839112750638844, 0.7856168903297948,
> > > > > 0.4265801782090245, 0.14733066454561583, 0.9501663495824946,
> > > > > 0.9710498974366047, ]
> > > > >
> > > > > Multiplied by a similarly sparse 10 x 5 compressed_matrix
> > > > >
> > > > > I use a CompressedMatrix wrapper which essentially wraps the
> > > > >
> > > > > viennacl::compressed_matrix (vcl_size_t rows, vcl_size_t cols,
> > > > > vcl_size_t nonzeros=0, viennacl::context ctx=viennacl::context())
> > > > >
> > > > > constructor as well as the
> > > > >
> > > > > compressed_matrix (matrix_expression< const compressed_matrix,
> > > > > const compressed_matrix, op_prod > const &proxy).
> > > > >
> > > > > I have a helper function, toVclCompressedMatrix(..), which essentially
> > > > > does the CSR conversion from a Mahout src matrix, calls the constructor
> > > > > and uses viennacl::compressed_matrix::set(...) to set the buffers:
> > > > >
> > > > > val ompA = toVclCompressedMatrix(src = mxA, ompCtx)
> > > > > val ompB = toVclCompressedMatrix(src = mxB, ompCtx)
> > > > >
> > > > > and then create a new viennacl::compressed_matrix from the
> > > > > viennacl::linalg::prod of the 2 matrices i.e.:
> > > > >
> > > > > val ompC = new CompressedMatrix(prod(ompA, ompB))
> > > > >
> > > > > The context in the above case is either the Host or OpenMP (I know that
> > > > > there is some special casting of the row_jumpers and col_idxs that needs
> > > > > to be done in the OpenCL version)
> > > > >
> > > > > The matrix multiplication completes without error on small matrices,
> > > > > e.g. < 300 x 300, but seems to overwrite the resulting buffers on
> > > > > larger matrices.
> > > > >
> > > > > My real problem, though, is getting the memory back out of the
> > > > > resulting `ompC` compressed_matrix so that I can write it back to a
> > > > > Mahout SparseMatrix.
> > > > >
> > > > > Currently I am using:
> > > > >
> > > > > void viennacl::backend::memory_copy (mem_handle const & src_buffer,
> > > > >                                      mem_handle & dst_buffer,
> > > > >                                      vcl_size_t src_offset,
> > > > >                                      vcl_size_t dst_offset,
> > > > >                                      vcl_size_t bytes_to_copy
> > > > >                                     )
> > > > >
> > > > > on ompC.handle1, ompC.handle2 and ompC.handle source handles
> > > > >
> > > > > to copy into pre-allocated row_jumper, col_index and element buffers
> > > > > (of size ompC.size1() + 1, ompC.nnz and ompC.nnz, respectively).
> > > > >
> > > > > I am getting nonsensical values back that one would expect from memory
> > > > > errors. The matrix geometry of the result, ompC.size1() and
> > > > > ompC.size2(), is correct and ompC.nnz is a reasonable value.
> > > > >
> > > > > It is possible that I have mis-allocated some of the memory on my side,
> > > > > but I am pretty sure that most of the Buffers are allocated correctly
> > > > > (usually JavaCPP does a pretty good job of this).
> > > > >
> > > > > I guess, long story short, my question is: am I using the correct method
> > > > > of copying the memory out of a compressed_matrix? Is there something
> > > > > glaringly incorrect that I am doing here? Should I be using
> > > > > viennacl::backend::memory_copy or is there a different method that I
> > > > > should be using?
> > > > >
> > > > > Thanks very much,
> > > > >
> > > > > Andy
> > > > >
> > > > > _______________________________________________
> > > > > ViennaCL-devel mailing list
> > > > > Vie...@li...
> > > > > https://lists.sourceforge.net/lists/listinfo/viennacl-devel
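The argument-order correction quoted in the message above (offset first, then the number of bytes) and the byte counts for the three CSR arrays can be illustrated with a small self-contained sketch. Note that `read_bytes`, `CsrHost` and `read_csr` are hypothetical names invented for this example and are not ViennaCL functions; the sketch also assumes 32-bit unsigned indices and double-precision values in plain host memory, as discussed in the thread.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a raw buffer read (NOT ViennaCL's API). It only
// mirrors the argument order described in the thread: source, offset in
// bytes from the beginning, number of bytes to read, destination.
void read_bytes(const void* src, std::size_t src_offset,
                std::size_t bytes_to_read, void* dst)
{
  if (bytes_to_read == 0) return;  // nothing to copy, avoid touching pointers
  std::memcpy(dst, static_cast<const unsigned char*>(src) + src_offset,
              bytes_to_read);
}

// Host-side copy of a CSR matrix (illustrative type).
struct CsrHost {
  std::vector<unsigned int> row_ptr;  // rows + 1 entries
  std::vector<unsigned int> col_idx;  // nnz entries
  std::vector<double> values;         // nnz entries
};

// Reads the three CSR arrays out of raw buffers. Byte counts are derived
// from sizeof rather than hard-coded 4 and 8.
CsrHost read_csr(const void* row_buf, const void* col_buf, const void* val_buf,
                 std::size_t rows, std::size_t nnz)
{
  CsrHost out;
  out.row_ptr.resize(rows + 1);
  out.col_idx.resize(nnz);
  out.values.resize(nnz);

  read_bytes(row_buf, 0, (rows + 1) * sizeof(unsigned int), out.row_ptr.data());
  read_bytes(col_buf, 0, nnz * sizeof(unsigned int), out.col_idx.data());
  read_bytes(val_buf, 0, nnz * sizeof(double), out.values.data());
  return out;
}
```

Swapping the offset and size arguments in a call of this shape reads zero bytes from a large offset, which would be consistent with the uninitialized-looking output posted earlier in the thread.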
From: Dmitriy L. <dl...@gm...> - 2016-07-22 21:44:01
yes. this seems to be the case. if i force out-of-order CSR into in-order CSR everything seems to work. Can't see the documentation explicitly mentioning this if this is the case indeed.

Karl, can you please confirm only in-order CSRs are supported? Thanks!

-Dmitriy

On Fri, Jul 22, 2016 at 12:57 PM, Dmitriy Lyubimov <dl...@gm...> wrote:

> PS
> (4) column indices admit out-of-order placements of elements within each
> row.
>
> Thank you.
> -Dmitriy
>
> On Fri, Jul 22, 2016 at 12:56 PM, Dmitriy Lyubimov <dl...@gm...> wrote:
>
> > I think I still am getting seg faults on attempt to multiply matrices
> > even without conversion back (larger arguments, 3k x 1k)
> >
> > I re-wrote another alternative transformation procedure and see nothing
> > wrong with it. Both Andrew's code and mine fail with the same symptoms.
> >
> > Karl, can we verify assumptions about the format:
> >
> > (1) the compressed_matrix.set method expects host memory pointers.
> > (2) the format is compressed row storage (CSR). Documentation never says
> > explicitly that, and actually seems to have errors in size of elements and
> > jumper arrays (it says jumper array has to be cols+1 long whereas in CSR it
> > should actually be rows + 1 long, right? )
> > (3) the element sizes of jumper and column indices arrays are 32 bit and
> > are in little endian order (at least for the OpenMP backend).
> >
> > Right now I can't even get OpenMP sparse multiplication to work although
> > CSR format is not rocket science at all. Don't see a problem anywhere.
> > Tried to read Vienna's code to confirm the assumptions above, but this
> > seems to be pretty elusive for the time being.
> >
> > On Fri, Jul 22, 2016 at 10:26 AM, Andrew Palumbo <ap...@ou...> wrote:
> >
> > > Yep, that's it. Oh wow, well that's just embarrassing 😊.
> > >
> > > Thanks very much for your time, Karl, much appreciated.
From: Dmitriy L. <dl...@gm...> - 2016-07-22 21:23:13
|
On Fri, Jul 22, 2016 at 12:57 PM, Dmitriy Lyubimov <dl...@gm...> wrote:

> (2) the format is compressed row storage (CSR). Documentation never says
> explicitly that,

Correction: the reference documentation of the set() method does not mention it. The manual does say that compressed_matrix in general operates on CSR. |
From: Dmitriy L. <dl...@gm...> - 2016-07-22 19:57:48
|
PS
(4) column indices admit out-of-order placements of elements within each row.

Thank you.
-Dmitriy

On Fri, Jul 22, 2016 at 12:56 PM, Dmitriy Lyubimov <dl...@gm...> wrote:

> I think I am still getting seg faults when trying to multiply matrices,
> even without conversion back (larger arguments, 3k x 1k).
>
> I wrote an alternative transformation procedure and see nothing
> wrong with it. Both Andrew's code and mine fail with the same symptoms.
>
> Karl, can we verify these assumptions about the format:
>
> (1) the compressed_matrix.set method expects host memory pointers.
> (2) the format is compressed row storage (CSR). The documentation never says
> that explicitly, and actually seems to have errors in the sizes of the element
> and jumper arrays (it says the jumper array has to be cols+1 long, whereas in
> CSR it should actually be rows + 1 long, right?)
> (3) the element sizes of the jumper and column index arrays are 32 bit and
> are in little-endian order (at least for the OpenMP backend).
>
> Right now I can't even get OpenMP sparse multiplication to work, although the
> CSR format is not rocket science at all. I don't see a problem anywhere. I tried
> to read ViennaCL's code to confirm the assumptions above, but this seems to be
> pretty elusive for the time being.
>
> On Fri, Jul 22, 2016 at 10:26 AM, Andrew Palumbo <ap...@ou...> wrote:
>
>> Yep thats it. Oh wow- well thats just embarrassing [image: 😊].
>>
>> Thanks very much for your time, Karl- much appreciated.
>>
>> Andy
>> ------------------------------
>> *From:* Karl Rupp <ru...@iu...>
>> *Sent:* Friday, July 22, 2016 12:39:20 PM
>> *To:* Andrew Palumbo; viennacl-devel
>> *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix
>>
>> Hi,
>>
>> your second and third arguments to memory_read() are incorrect:
>> The second argument is the offset from the beginning, the third argument
>> is the number of bytes to be read.
Shifting the zero to the second >> position fixes the snippet (plus correcting the loop bounds when >> printing at the end) :-) >> >> Best regards, >> Karli >> >> >> >> On 07/22/2016 08:51 AM, Andrew Palumbo wrote: >> > a couple of small mistakes in the previous c++ file: >> > >> > >> > The memory_read(..) call should be: >> > >> > >> > // read data back into our product buffers >> > viennacl::backend::memory_read(handle1, product_size_row * 4, 0, >> > product_row_ptr, false); >> > viennacl::backend::memory_read(handle2, product_NNz * 4, 0, >> > product_col_ptr, false); >> > viennacl::backend::memory_read(handle, product_NNz * 8, 0, >> > product_values_ptr, false); >> > >> > >> > (read product_NNz * x bytes instead of product_size_row * x) >> > >> > >> > I've attached the corrected file. >> > >> > >> > Thanks >> > >> > >> > Andy >> > >> > ------------------------------------------------------------------------ >> > *From:* Andrew Palumbo <ap...@ou...> >> > *Sent:* Thursday, July 21, 2016 11:03:59 PM >> > *To:* Karl Rupp; viennacl-devel >> > *Subject:* Re: [ViennaCL-devel] Copying Values out of a >> compressed_matrix >> > >> > Hello, >> > >> > >> > I've mocked up a sample of the compressed_matrix multiplication that >> > I've been working with javacpp on in C++. I am seeing the same type of >> > memory errors when I try to read the data out of product, and into the >> > output buffers as I was with javacpp. By printing the matrix to stdout >> > as in the compressed_matrix example we can see that there are values >> > there, and they seem reasonable, but when i use >> > backend::memory_read(...) to retrive the buffers, I'm getting values >> > consistent with a memory error, and similar to what i was seeing in the >> > javacpp code. Maybe I am not using the handles correctly? Admittedly >> > my C++ is more than rusty, but I believe I am referencing the buffers >> > correctly in the output. 
>> > >> > >> > Below is the output of the attached file: sparse.cpp >> > >> > >> > Thanks very much, >> > >> > >> > Andy >> > >> > >> > >> > ViennaCL: compressed_matrix of size (10, 10) with 24 nonzeros: >> > (1, 2) 0.329908 >> > (1, 3) 0.0110522 >> > (1, 4) 0.336839 >> > (2, 5) 0.0150778 >> > (2, 7) 0.0143518 >> > (3, 3) 0.217256 >> > (3, 6) 0.346854 >> > (3, 9) 0.45353 >> > (4, 3) 0.407954 >> > (4, 6) 0.651308 >> > (5, 2) 0.676061 >> > (5, 3) 0.0226486 >> > (5, 4) 0.690264 >> > (6, 5) 0.0998838 >> > (6, 7) 0.0950744 >> > (7, 2) 0.346173 >> > (7, 3) 0.0115971 >> > (7, 4) 0.353446 >> > (7, 9) 0.684458 >> > (8, 5) 0.0448123 >> > (8, 7) 0.0426546 >> > (8, 9) 0.82782 >> > (9, 5) 0.295356 >> > (9, 7) 0.281134 >> > >> > row jumpers: [ >> > -36207072,32642,-39708721,32642,6390336,0,2012467744,32767,2012467968 >> ,32767,4203729,] >> > col ptrs: [ >> > >> 0,0,-39655605,32642,-36207072,32642,6390336,0,10,0,-39672717,32642,2012466352,32767,-32892691,32642,1,0,6390336,0,2012466344,32767,60002304,2059362829,] >> > elements: [ >> > >> 0.289516,0.304161,0.795779,0.334456,0.935264,0.585813,0.871237,0.811508,0.828558,0.0271863,6.92683e-310,6.92683e-310,1.061e-313,1.061e-313,6.36599e-314,4.24399e-314,6.36599e-314,6.92683e-310,4.24399e-314,1.2732e-313,2.122e-313,6.95324e-310,0.406537,0.0495716,0.370862,] >> > >> > >> > and similarly for multiplication of 2 1x1 matrices: >> > >> > Result: >> > >> > ViennaCL: compressed_matrix of size (1, 1) with 1 nonzeros: >> > (0, 0) 0.117699 >> > >> > row jumpers: [ >> > -717571424,32767,] >> > col ptrs: [ >> > 6386240,] >> > elements: [ >> > 0.289516,6.9479e-310,] >> > >> > >> > >> > >> > ------------------------------------------------------------------------ >> > *From:* Andrew Palumbo <ap...@ou...> >> > *Sent:* Wednesday, July 20, 2016 5:40:31 PM >> > *To:* Karl Rupp; viennacl-devel >> > *Subject:* Re: [ViennaCL-devel] Copying Values out of a >> compressed_matrix >> > >> > Oops, sorry about not cc'ing all. 
>> > >> > >> > I do not get correct data back for a (Random.nextDouble() populated) 1 x >> > 1 Matrix. >> > >> > >> > A: >> > >> > Row Pointer: [0, 1 ] >> > >> > Col Pointer: [0 ] >> > element Pointer: [0.6465821602909256 ] >> > >> > >> > B: >> > >> > >> > Row Pointer: [0, 1 ] >> > Col Pointer: [0 ] >> > element Pointer: [0.9513577109193919 ] >> > >> > >> > C = A %*% B >> > >> > Row Pointer: [469762248, 32632] >> > Col Pointer: [469762248 ] >> > element Pointer: [6.9245198744523E-310 ] >> > >> > >> > ouch. >> > >> > >> > It looks like I'm not copying the Buffers correctly at all. I'm may be >> > using the javacpp buffers incorrectly here, or I have possibly wrapped >> > the viennacl::backend::memory_handle class incorrectly, so I'm using a >> > pointer to the wrong memory from eg. >> viennacl::compressed_matrix::handle. >> > >> > >> > I mentioned before that the multiplication completed in on small <~300 x >> > 300 matrices because if I try to multiply two larger sparse matrices, an >> > err the JVM crashes with a SIGSEGV. >> > >> > >> > Since this code is all wrapped with javacpp, I don't really have a small >> > sample that I can show you (not going to dump a whole bunch of code on >> > you). >> > >> > >> > I'll keep trying to figure it out. Pretty sure the problem is on my end >> > here �� I really mainly wanted to ask you if I was using the correct >> > methods at this point, or if there was anything very obviously that I >> > was doing wrong. >> > >> > >> > Thanks a lot for your help! 
>> > >> > >> > Andy >> > >> > >> > >> > >> > >> > >> > ------------------------------------------------------------------------ >> > *From:* Karl Rupp <ru...@iu...> >> > *Sent:* Wednesday, July 20, 2016 5:00:36 PM >> > *To:* Andrew Palumbo; viennacl-devel >> > *Subject:* Re: [ViennaCL-devel] Copying Values out of a >> compressed_matrix >> > Hi, >> > >> > please keep viennacl-devel in CC: >> > >> > Just to clarify: Do you get incorrect values for a 1-by-1 matrix as >> > indicated in your sample data? In your previous email you mentioned that >> > results are fine for small matrices... >> > >> > I'm afraid I can only guess at the source of the error with the >> > informations provided. Any chance that you can provide a standalone code >> > to reproduce the problem with reasonable effort? >> > >> > Best regards, >> > Karli >> > >> > >> > >> > On 07/20/2016 10:16 PM, Andrew Palumbo wrote: >> >> Thanks so much for your quick answer! >> >> >> >> >> >> I actually am sorry to say that I made a mistake when writing the last >> >> email, I copied the wrong signature from the VCL documentation, and >> then >> >> the mistake propagated through the rest of the e-mail. >> >> >> >> >> >> I am actually using viennacl::backend::memory_read(). 
>> >> >> >> >> >> Eg, for the row_jumpers and column_idx I read use: >> >> >> >> @Name("backend::memory_read") >> >> public static native void memoryReadInt(@Const @ByRef MemHandle >> src_buffer, >> >> int bytes_to_read, >> >> int offset, >> >> IntPointer ptr, >> >> boolean async); >> >> >> >> and for the Values: >> >> >> >> >> >> @Name("backend::memory_read") >> >> public static native void memoryReadDouble(@Const @ByRef MemHandle >> src_buffer, >> >> int bytes_to_read, >> >> int offset, >> >> DoublePointer ptr, >> >> boolean async); >> >> >> >> And then call: >> >> >> >> >> >> memoryReadInt(row_ptr_handle, (m +1) *4,0, row_ptr,false) >> >> memoryReadInt(col_idx_handle, NNz *4,0,col_idx,false) >> >> memoryReadDouble(element_handle, NNz *8,0, values,false) >> >> >> >> >> >> and after convetring them to java.nio.Buffers, am getting results like: >> >> >> >> >> >> rowBuff.get(1): 0 colBuff(1): 402653448 valBuff(1): >> 6.91730177312166E-310 >> >> >> >> >> >> Have also tried reading into BytePointers similarly with the same type >> >> of results. I know that the use of Javacpp obfuscates what the problem >> >> may be. But I believe the Memorry is properly allocated. >> >> >> >> >> >> >> >> Sorry for the mistake. >> >> >> >> >> >> Thanks, >> >> >> >> >> >> Andy >> >> >> >> >> >> >> ------------------------------------------------------------------------ >> >> *From:* Karl Rupp <ru...@iu...> >> >> *Sent:* Wednesday, July 20, 2016 3:50:07 PM >> >> *To:* Andrew Palumbo; Vie...@li... >> >> *Subject:* Re: [ViennaCL-devel] Copying Values out of a >> compressed_matrix >> >> Hi Andy, >> >> >> >> instead of viennacl::backend::memory_copy(), you want to use >> >> viennacl::backend::memory_read(), which directly transfers the data >> into >> >> your buffer(s). 
>> >> >> >> If you *know* that your handles are in host memory, you can even grab >> >> the values directly via >> >> viennacl::linalg::host_based::detail::extract_raw_pointer<T>(); >> >> defined in viennacl/linalg/host_based/common.hpp, around line 40. >> >> >> >> Please let me know if you still get errors after using that. >> >> >> >> Best regards, >> >> Karli >> >> >> >> >> >> >> >> >> >> On 07/20/2016 09:05 PM, Andrew Palumbo wrote: >> >>> Hello, >> >>> >> >>> >> >>> I'm Having some difficulties with compressed_matrix multiplication. >> >>> >> >>> >> >>> Essentially I am copying three buffers, the CSR conversion of an >> Apache >> >>> Mahout SparseMatrix, into two compressed_matrices performing matrix >> >>> multiplication. I am doing this in scala and Java using javacpp. >> >>> >> >>> >> >>> For example, I have a 5 x 10 matrix of ~20% non-zero values which in >> CSR >> >>> format looks like this: >> >>> >> >>> >> >>> NNz: 12 >> >>> >> >>> Row Pointer: [0, 1, 4, 6, 9, 12, ] >> >>> >> >>> Col Pointer: [9, 0, 8, 7, 2, 9, 0, 8, 9, 0, 3, 5, ] >> >>> >> >>> element Pointer: [0.4065367203992265, 0.04957158909682802, >> >>> 0.5205586068847993, 0.3708618354358446, 0.6963900565931678, >> >>> 0.8330915529787706, 0.32839112750638844, 0.7856168903297948, >> >>> 0.4265801782090245, 0.14733066454561583, 0.9501663495824946, >> >>> 0.9710498974366047, ] >> >>> >> >>> Multiplied by a similarly Sparse 10 x 5 compressed_matrix >> >>> >> >>> I use a CompressedMatrix wrapper which essentially wraps the >> >>> >> >>> viennacl:: compressed_matrix (vcl_size_t rows, vcl_size_t cols, >> >>> vcl_size_t nonzeros=0, viennacl::context ctx=viennacl::context()) >> >>> >> >>> constructor as well as the >> >>> >> >>> compressed_matrix (matrix_expression< const compressed_matrix, >> >>> const compressed_matrix, op_prod > const &proxy). >> >>> >> >>> I have a helper function, /toVclCompressedMatrix/(..) 
which >> essentially >> >>> does the CSR conversion from a Mahout src matrix, calls the >> constructor >> >>> and uses viennacl::compressed_matrix::set(...) to set the buffers: >> >>> >> >>> val ompA =toVclCompressedMatrix(src = mxA, ompCtx) >> >>> val ompB =toVclCompressedMatrix(src = mxB, ompCtx) >> >>> >> >>> >> >>> and then create a new viennacl::compressed_matrix from the >> >>> viennacl::linalg::prod of the 2 matrices i.e.: >> >>> >> >>> val ompC =new CompressedMatrix(prod(ompA, ompB)) >> >>> >> >>> The context in the above case is either the Host or OpenMP (I know >> that >> >>> there is some special casting of the row_jumpers and col_idxs that >> needs >> >>> to be done in the OpenCL version) >> >>> >> >>> The Matrix multiplication completes without error on small Matrices >> eg. >> >>> < 300 x 300 >> >>> but seems to overwrite the resulting buffers on larger Matrices. >> >>> >> >>> My real problem, though is getting the memory back out of the >> >>> resulting`ompC` compresed_matrix so that i can write it back to a >> mahout >> >>> SparseMatrix. >> >>> >> >>> currently I am using: >> >>> >> >>> void viennacl::backend::memory_copy (mem_handle const & src_buffer, >> >>> mem_handle & dst_buffer, >> >>> vcl_size_t src_offset, >> >>> vcl_size_t dst_offset, >> >>> vcl_size_t bytes_to_copy >> >>> ) >> >>> >> >>> on ompC.handel1,ompC.handel2 and ompC.handel source handels >> >>> >> >>> to copy into pre-allocated row_jumper, col_index and element buffers >> >>> (of size ompC.size1() + 1, ompC.nnz and ompC.nnz, respectivly). >> >>> >> >>> I am getting nonsensical values back that one would expect from memory >> >>> errors. eg: >> >>> >> >>> the Matrix geometry of the result: ompC.size1(), and omp.size2() are >> >>> correct and ompC.nnz is a reasonable value. 
>> >>>
>> >>> It is possible that I have mis-allocated some of the memory on my side,
>> >>> but I am pretty sure that most of the Buffers are allocated correctly
>> >>> (usually JavaCPP does a pretty good job of this).
>> >>>
>> >>> I guess, long story short, my question is am i using the correct method
>> >>> of copying the memory out of a compressed_matrix? is there something
>> >>> glaringly incorrect that i am doing here? Should I be using
>> >>> viennacl::backend::memory_copy or is there a different method that i
>> >>> should be using?
>> >>>
>> >>> Thanks very much,
>> >>>
>> >>> Andy
|
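Regarding point (4): Karl's reply in this thread says that column indices have to be in ascending order within each row for sparse matrix-matrix multiplication. A minimal host-side sketch of forcing out-of-order CSR into in-order CSR, before the arrays are handed to compressed_matrix::set(), could look like the following. It uses only the C++ standard library; the function name sort_csr_rows and the use of std::uint32_t for the indices are assumptions made for illustration and are not part of the ViennaCL API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not ViennaCL API): sorts the column indices of every
// CSR row into ascending order and permutes the values in the same way.
// row_ptr has rows + 1 entries; col_idx and values have nnz entries each.
void sort_csr_rows(const std::vector<std::uint32_t>& row_ptr,
                   std::vector<std::uint32_t>& col_idx,
                   std::vector<double>& values)
{
  assert(col_idx.size() == values.size());
  assert(!row_ptr.empty() && row_ptr.back() == col_idx.size());

  std::vector<std::size_t> perm;        // sorting permutation of one row
  std::vector<std::uint32_t> cols_tmp;  // sorted column indices of one row
  std::vector<double> vals_tmp;         // values in the matching order

  for (std::size_t row = 0; row + 1 < row_ptr.size(); ++row) {
    const std::size_t begin = row_ptr[row];
    const std::size_t len = row_ptr[row + 1] - begin;

    // Build the permutation that orders this row by column index.
    perm.resize(len);
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::sort(perm.begin(), perm.end(), [&](std::size_t a, std::size_t b) {
      return col_idx[begin + a] < col_idx[begin + b];
    });

    // Apply the permutation to both the indices and the values.
    cols_tmp.resize(len);
    vals_tmp.resize(len);
    for (std::size_t k = 0; k < len; ++k) {
      cols_tmp[k] = col_idx[begin + perm[k]];
      vals_tmp[k] = values[begin + perm[k]];
    }
    const auto offset = static_cast<std::ptrdiff_t>(begin);
    std::copy(cols_tmp.begin(), cols_tmp.end(), col_idx.begin() + offset);
    std::copy(vals_tmp.begin(), vals_tmp.end(), values.begin() + offset);
  }
}
```

Applied to the 5 x 10 example quoted in this thread (row pointers [0, 1, 4, 6, 9, 12]), only the second row changes: its column indices 0, 8, 7 become 0, 7, 8 and the two corresponding values swap places.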
From: Dmitriy L. <dl...@gm...> - 2016-07-22 19:56:24
|
I think I am still getting seg faults when trying to multiply matrices, even without conversion back (larger arguments, 3k x 1k).

I wrote an alternative transformation procedure and see nothing wrong with it. Both Andrew's code and mine fail with the same symptoms.

Karl, can we verify these assumptions about the format:

(1) the compressed_matrix.set method expects host memory pointers.
(2) the format is compressed row storage (CSR). The documentation never says that explicitly, and actually seems to have errors in the sizes of the element and jumper arrays (it says the jumper array has to be cols+1 long, whereas in CSR it should actually be rows + 1 long, right?)
(3) the element sizes of the jumper and column index arrays are 32 bit and are in little-endian order (at least for the OpenMP backend).

Right now I can't even get OpenMP sparse multiplication to work, although the CSR format is not rocket science at all. I don't see a problem anywhere. I tried to read ViennaCL's code to confirm the assumptions above, but this seems to be pretty elusive for the time being.

On Fri, Jul 22, 2016 at 10:26 AM, Andrew Palumbo <ap...@ou...> wrote:

> Yep thats it. Oh wow- well thats just embarrassing [image: 😊].
>
> Thanks very much for your time, Karl- much appreciated.
>
> Andy
> ------------------------------
> *From:* Karl Rupp <ru...@iu...>
> *Sent:* Friday, July 22, 2016 12:39:20 PM
> *To:* Andrew Palumbo; viennacl-devel
> *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix
>
> Hi,
>
> your second and third arguments to memory_read() are incorrect:
> The second argument is the offset from the beginning, the third argument
> is the number of bytes to be read. Shifting the zero to the second
> position fixes the snippet (plus correcting the loop bounds when
> printing at the end) :-)
>
> Best regards,
> Karli
>
> On 07/22/2016 08:51 AM, Andrew Palumbo wrote:
> > a couple of small mistakes in the previous c++ file:
> >
> > The memory_read(..)
call should be: > > > > > > // read data back into our product buffers > > viennacl::backend::memory_read(handle1, product_size_row * 4, 0, > > product_row_ptr, false); > > viennacl::backend::memory_read(handle2, product_NNz * 4, 0, > > product_col_ptr, false); > > viennacl::backend::memory_read(handle, product_NNz * 8, 0, > > product_values_ptr, false); > > > > > > (read product_NNz * x bytes instead of product_size_row * x) > > > > > > I've attached the corrected file. > > > > > > Thanks > > > > > > Andy > > > > ------------------------------------------------------------------------ > > *From:* Andrew Palumbo <ap...@ou...> > > *Sent:* Thursday, July 21, 2016 11:03:59 PM > > *To:* Karl Rupp; viennacl-devel > > *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix > > > > Hello, > > > > > > I've mocked up a sample of the compressed_matrix multiplication that > > I've been working with javacpp on in C++. I am seeing the same type of > > memory errors when I try to read the data out of product, and into the > > output buffers as I was with javacpp. By printing the matrix to stdout > > as in the compressed_matrix example we can see that there are values > > there, and they seem reasonable, but when i use > > backend::memory_read(...) to retrive the buffers, I'm getting values > > consistent with a memory error, and similar to what i was seeing in the > > javacpp code. Maybe I am not using the handles correctly? Admittedly > > my C++ is more than rusty, but I believe I am referencing the buffers > > correctly in the output. 
> > > > > > Below is the output of the attached file: sparse.cpp > > > > > > Thanks very much, > > > > > > Andy > > > > > > > > ViennaCL: compressed_matrix of size (10, 10) with 24 nonzeros: > > (1, 2) 0.329908 > > (1, 3) 0.0110522 > > (1, 4) 0.336839 > > (2, 5) 0.0150778 > > (2, 7) 0.0143518 > > (3, 3) 0.217256 > > (3, 6) 0.346854 > > (3, 9) 0.45353 > > (4, 3) 0.407954 > > (4, 6) 0.651308 > > (5, 2) 0.676061 > > (5, 3) 0.0226486 > > (5, 4) 0.690264 > > (6, 5) 0.0998838 > > (6, 7) 0.0950744 > > (7, 2) 0.346173 > > (7, 3) 0.0115971 > > (7, 4) 0.353446 > > (7, 9) 0.684458 > > (8, 5) 0.0448123 > > (8, 7) 0.0426546 > > (8, 9) 0.82782 > > (9, 5) 0.295356 > > (9, 7) 0.281134 > > > > row jumpers: [ > > -36207072,32642,-39708721,32642,6390336,0,2012467744,32767,2012467968 > ,32767,4203729,] > > col ptrs: [ > > > 0,0,-39655605,32642,-36207072,32642,6390336,0,10,0,-39672717,32642,2012466352,32767,-32892691,32642,1,0,6390336,0,2012466344,32767,60002304,2059362829,] > > elements: [ > > > 0.289516,0.304161,0.795779,0.334456,0.935264,0.585813,0.871237,0.811508,0.828558,0.0271863,6.92683e-310,6.92683e-310,1.061e-313,1.061e-313,6.36599e-314,4.24399e-314,6.36599e-314,6.92683e-310,4.24399e-314,1.2732e-313,2.122e-313,6.95324e-310,0.406537,0.0495716,0.370862,] > > > > > > and similarly for multiplication of 2 1x1 matrices: > > > > Result: > > > > ViennaCL: compressed_matrix of size (1, 1) with 1 nonzeros: > > (0, 0) 0.117699 > > > > row jumpers: [ > > -717571424,32767,] > > col ptrs: [ > > 6386240,] > > elements: [ > > 0.289516,6.9479e-310,] > > > > > > > > > > ------------------------------------------------------------------------ > > *From:* Andrew Palumbo <ap...@ou...> > > *Sent:* Wednesday, July 20, 2016 5:40:31 PM > > *To:* Karl Rupp; viennacl-devel > > *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix > > > > Oops, sorry about not cc'ing all. > > > > > > I do not get correct data back for a (Random.nextDouble() populated) 1 x > > 1 Matrix. 
> > > > > > A: > > > > Row Pointer: [0, 1 ] > > > > Col Pointer: [0 ] > > element Pointer: [0.6465821602909256 ] > > > > > > B: > > > > > > Row Pointer: [0, 1 ] > > Col Pointer: [0 ] > > element Pointer: [0.9513577109193919 ] > > > > > > C = A %*% B > > > > Row Pointer: [469762248, 32632] > > Col Pointer: [469762248 ] > > element Pointer: [6.9245198744523E-310 ] > > > > > > ouch. > > > > > > It looks like I'm not copying the Buffers correctly at all. I'm may be > > using the javacpp buffers incorrectly here, or I have possibly wrapped > > the viennacl::backend::memory_handle class incorrectly, so I'm using a > > pointer to the wrong memory from eg. viennacl::compressed_matrix::handle. > > > > > > I mentioned before that the multiplication completed in on small <~300 x > > 300 matrices because if I try to multiply two larger sparse matrices, an > > err the JVM crashes with a SIGSEGV. > > > > > > Since this code is all wrapped with javacpp, I don't really have a small > > sample that I can show you (not going to dump a whole bunch of code on > > you). > > > > > > I'll keep trying to figure it out. Pretty sure the problem is on my end > > here �� I really mainly wanted to ask you if I was using the correct > > methods at this point, or if there was anything very obviously that I > > was doing wrong. > > > > > > Thanks a lot for your help! > > > > > > Andy > > > > > > > > > > > > > > ------------------------------------------------------------------------ > > *From:* Karl Rupp <ru...@iu...> > > *Sent:* Wednesday, July 20, 2016 5:00:36 PM > > *To:* Andrew Palumbo; viennacl-devel > > *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix > > Hi, > > > > please keep viennacl-devel in CC: > > > > Just to clarify: Do you get incorrect values for a 1-by-1 matrix as > > indicated in your sample data? In your previous email you mentioned that > > results are fine for small matrices... 
> > > > I'm afraid I can only guess at the source of the error with the > > informations provided. Any chance that you can provide a standalone code > > to reproduce the problem with reasonable effort? > > > > Best regards, > > Karli > > > > > > > > On 07/20/2016 10:16 PM, Andrew Palumbo wrote: > >> Thanks so much for your quick answer! > >> > >> > >> I actually am sorry to say that I made a mistake when writing the last > >> email, I copied the wrong signature from the VCL documentation, and then > >> the mistake propagated through the rest of the e-mail. > >> > >> > >> I am actually using viennacl::backend::memory_read(). > >> > >> > >> Eg, for the row_jumpers and column_idx I read use: > >> > >> @Name("backend::memory_read") > >> public static native void memoryReadInt(@Const @ByRef MemHandle > src_buffer, > >> int bytes_to_read, > >> int offset, > >> IntPointer ptr, > >> boolean async); > >> > >> and for the Values: > >> > >> > >> @Name("backend::memory_read") > >> public static native void memoryReadDouble(@Const @ByRef MemHandle > src_buffer, > >> int bytes_to_read, > >> int offset, > >> DoublePointer ptr, > >> boolean async); > >> > >> And then call: > >> > >> > >> memoryReadInt(row_ptr_handle, (m +1) *4,0, row_ptr,false) > >> memoryReadInt(col_idx_handle, NNz *4,0,col_idx,false) > >> memoryReadDouble(element_handle, NNz *8,0, values,false) > >> > >> > >> and after convetring them to java.nio.Buffers, am getting results like: > >> > >> > >> rowBuff.get(1): 0 colBuff(1): 402653448 valBuff(1): > 6.91730177312166E-310 > >> > >> > >> Have also tried reading into BytePointers similarly with the same type > >> of results. I know that the use of Javacpp obfuscates what the problem > >> may be. But I believe the Memorry is properly allocated. > >> > >> > >> > >> Sorry for the mistake. 
> >> > >> > >> Thanks, > >> > >> > >> Andy > >> > >> > >> ------------------------------------------------------------------------ > >> *From:* Karl Rupp <ru...@iu...> > >> *Sent:* Wednesday, July 20, 2016 3:50:07 PM > >> *To:* Andrew Palumbo; Vie...@li... > >> *Subject:* Re: [ViennaCL-devel] Copying Values out of a > compressed_matrix > >> Hi Andy, > >> > >> instead of viennacl::backend::memory_copy(), you want to use > >> viennacl::backend::memory_read(), which directly transfers the data into > >> your buffer(s). > >> > >> If you *know* that your handles are in host memory, you can even grab > >> the values directly via > >> viennacl::linalg::host_based::detail::extract_raw_pointer<T>(); > >> defined in viennacl/linalg/host_based/common.hpp, around line 40. > >> > >> Please let me know if you still get errors after using that. > >> > >> Best regards, > >> Karli > >> > >> > >> > >> > >> On 07/20/2016 09:05 PM, Andrew Palumbo wrote: > >>> Hello, > >>> > >>> > >>> I'm Having some difficulties with compressed_matrix multiplication. > >>> > >>> > >>> Essentially I am copying three buffers, the CSR conversion of an > Apache > >>> Mahout SparseMatrix, into two compressed_matrices performing matrix > >>> multiplication. I am doing this in scala and Java using javacpp. 
> >>> > >>> > >>> For example, I have a 5 x 10 matrix of ~20% non-zero values which in > CSR > >>> format looks like this: > >>> > >>> > >>> NNz: 12 > >>> > >>> Row Pointer: [0, 1, 4, 6, 9, 12, ] > >>> > >>> Col Pointer: [9, 0, 8, 7, 2, 9, 0, 8, 9, 0, 3, 5, ] > >>> > >>> element Pointer: [0.4065367203992265, 0.04957158909682802, > >>> 0.5205586068847993, 0.3708618354358446, 0.6963900565931678, > >>> 0.8330915529787706, 0.32839112750638844, 0.7856168903297948, > >>> 0.4265801782090245, 0.14733066454561583, 0.9501663495824946, > >>> 0.9710498974366047, ] > >>> > >>> Multiplied by a similarly Sparse 10 x 5 compressed_matrix > >>> > >>> I use a CompressedMatrix wrapper which essentially wraps the > >>> > >>> viennacl:: compressed_matrix (vcl_size_t rows, vcl_size_t cols, > >>> vcl_size_t nonzeros=0, viennacl::context ctx=viennacl::context()) > >>> > >>> constructor as well as the > >>> > >>> compressed_matrix (matrix_expression< const compressed_matrix, > >>> const compressed_matrix, op_prod > const &proxy). > >>> > >>> I have a helper function, /toVclCompressedMatrix/(..) which essentially > >>> does the CSR conversion from a Mahout src matrix, calls the constructor > >>> and uses viennacl::compressed_matrix::set(...) to set the buffers: > >>> > >>> val ompA =toVclCompressedMatrix(src = mxA, ompCtx) > >>> val ompB =toVclCompressedMatrix(src = mxB, ompCtx) > >>> > >>> > >>> and then create a new viennacl::compressed_matrix from the > >>> viennacl::linalg::prod of the 2 matrices i.e.: > >>> > >>> val ompC =new CompressedMatrix(prod(ompA, ompB)) > >>> > >>> The context in the above case is either the Host or OpenMP (I know that > >>> there is some special casting of the row_jumpers and col_idxs that > needs > >>> to be done in the OpenCL version) > >>> > >>> The Matrix multiplication completes without error on small Matrices eg. > >>> < 300 x 300 > >>> but seems to overwrite the resulting buffers on larger Matrices. 
> >>> > >>> My real problem, though is getting the memory back out of the > >>> resulting`ompC` compresed_matrix so that i can write it back to a > mahout > >>> SparseMatrix. > >>> > >>> currently I am using: > >>> > >>> void viennacl::backend::memory_copy (mem_handle const & src_buffer, > >>> mem_handle & dst_buffer, > >>> vcl_size_t src_offset, > >>> vcl_size_t dst_offset, > >>> vcl_size_t bytes_to_copy > >>> ) > >>> > >>> on ompC.handel1,ompC.handel2 and ompC.handel source handels > >>> > >>> to copy into pre-allocated row_jumper, col_index and element buffers > >>> (of size ompC.size1() + 1, ompC.nnz and ompC.nnz, respectivly). > >>> > >>> I am getting nonsensical values back that one would expect from memory > >>> errors. eg: > >>> > >>> the Matrix geometry of the result: ompC.size1(), and omp.size2() are > >>> correct and ompC.nnz is a reasonable value. > >>> > >>> It is possible that I have mis-allocated some of the memory on my side, > >>> but I am pretty sure that most of the Buffers are allocated correctly > >>> (usually JavaCPP does a pretty good job of this). > >>> > >>> > >>> I guess, long story short, my question is am i using the correct method > >>> of copying the memory out of a compressed_matrix? is there something > >>> glaringly incorrect that i am doing here? Should I be using > >>> viennacl::backend::memory_copy or is there a different method that i > >>> should be using? > >>> > >>> > >>> Thanks very much, > >>> > >>> Andy > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > ------------------------------------------------------------------------------ > >>> What NetFlow Analyzer can do for you? Monitors network bandwidth and > traffic > >>> patterns at an interface-level. Reveals which users, apps, and > protocols are > >>> consuming the most bandwidth. Provides multi-vendor support for > NetFlow, > >>> J-Flow, sFlow and other flows. 
> _______________________________________________
> ViennaCL-devel mailing list
> Vie...@li...
> https://lists.sourceforge.net/lists/listinfo/viennacl-devel
|
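Since column indices have to be ascending within each row for sparse matrix-matrix products, one option is to sort every CSR row on the host before handing the buffers to compressed_matrix::set(). The following is only a minimal sketch under assumptions: `sort_csr_rows` and `csr_rows_sorted` are hypothetical helpers (not part of ViennaCL), and the index type is assumed to be a 32-bit unsigned int as used on the host/OpenMP backend in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a ViennaCL function): reorders the column indices
// and values of every CSR row so that the column indices are ascending.
void sort_csr_rows(const std::vector<unsigned int>& row_ptr,
                   std::vector<unsigned int>& col_idx,
                   std::vector<double>& values)
{
  for (std::size_t row = 0; row + 1 < row_ptr.size(); ++row)
  {
    const std::size_t begin = row_ptr[row];
    const std::size_t end   = row_ptr[row + 1];

    // Permutation that orders this row's entries by column index.
    std::vector<std::size_t> perm(end - begin);
    std::iota(perm.begin(), perm.end(), begin);
    std::sort(perm.begin(), perm.end(),
              [&](std::size_t a, std::size_t b) { return col_idx[a] < col_idx[b]; });

    // Gather the row in sorted order, then write it back in place.
    std::vector<unsigned int> cols;
    std::vector<double> vals;
    for (std::size_t p : perm)
    {
      cols.push_back(col_idx[p]);
      vals.push_back(values[p]);
    }
    std::copy(cols.begin(), cols.end(), col_idx.begin() + begin);
    std::copy(vals.begin(), vals.end(), values.begin() + begin);
  }
}

// Hypothetical check: true if every row's column indices are strictly ascending.
bool csr_rows_sorted(const std::vector<unsigned int>& row_ptr,
                     const std::vector<unsigned int>& col_idx)
{
  for (std::size_t row = 0; row + 1 < row_ptr.size(); ++row)
    for (std::size_t k = row_ptr[row] + 1; k < row_ptr[row + 1]; ++k)
      if (col_idx[k - 1] >= col_idx[k])
        return false;
  return true;
}
```

For the 5 x 10 example from the original post (row pointer [0, 1, 4, 6, 9, 12], column indices [9, 0, 8, 7, 2, 9, 0, 8, 9, 0, 3, 5]), the second row holds columns 0, 8, 7, so `csr_rows_sorted` reports false until `sort_csr_rows` has been applied.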
From: Andrew P. <ap...@ou...> - 2016-07-22 17:26:32
|
Yep, that's it. Oh wow, well that's just embarrassing [😊]. Thanks very much for your time, Karl, much appreciated.

Andy
|
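Once the three buffers come back correctly, writing the product into another sparse container amounts to walking the row jumpers. The following is only a sketch of that walk, assuming 32-bit unsigned indices and double values as in this thread; `Entry` and `csr_to_entries` are hypothetical names and not part of ViennaCL, Mahout or JavaCPP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical triplet type for one stored matrix entry.
struct Entry
{
  std::size_t  row;
  unsigned int col;
  double       value;
};

// Hypothetical helper: expands CSR buffers (as read back from handle1, handle2
// and handle) into a flat list of (row, column, value) entries.
std::vector<Entry> csr_to_entries(const std::vector<unsigned int>& row_jumpers,
                                  const std::vector<unsigned int>& col_indices,
                                  const std::vector<double>& elements)
{
  std::vector<Entry> entries;
  entries.reserve(elements.size());
  // row_jumpers has rows + 1 entries; row r owns the half-open index range
  // [row_jumpers[r], row_jumpers[r + 1]) in col_indices and elements.
  for (std::size_t row = 0; row + 1 < row_jumpers.size(); ++row)
    for (std::size_t k = row_jumpers[row]; k < row_jumpers[row + 1]; ++k)
      entries.push_back(Entry{row, col_indices[k], elements[k]});
  return entries;
}
```

Each resulting entry can then be handed to whatever setter the target sparse matrix type offers.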
From: Karl R. <ru...@iu...> - 2016-07-22 16:39:32
|
Hi,

your second and third arguments to memory_read() are incorrect: The second argument is the offset from the beginning, the third argument is the number of bytes to be read. Shifting the zero to the second position fixes the snippet (plus correcting the loop bounds when printing at the end) :-)

Best regards,
Karli

On 07/22/2016 08:51 AM, Andrew Palumbo wrote:
> a couple of small mistakes in the previous c++ file:
>
> The memory_read(..) call should be:
>
> // read data back into our product buffers
> viennacl::backend::memory_read(handle1, product_size_row * 4, 0, product_row_ptr, false);
> viennacl::backend::memory_read(handle2, product_NNz * 4, 0, product_col_ptr, false);
> viennacl::backend::memory_read(handle, product_NNz * 8, 0, product_values_ptr, false);
>
> (read product_NNz * x bytes instead of product_size_row * x)
>
> I've attached the corrected file.
>
> Thanks
>
> Andy
|
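The effect of swapping offset and byte count can be illustrated without ViennaCL. The sketch below uses a hypothetical stand-in, `read_bytes`, that only mirrors the argument order described above (source, offset in bytes, number of bytes, destination); it is not the library function, and the sentinel value -1.0 is just an assumption to make untouched memory visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a backend read (NOT the ViennaCL function):
// copies bytes_to_read bytes, starting src_offset bytes into src, to dst.
void read_bytes(const void* src, std::size_t src_offset,
                std::size_t bytes_to_read, void* dst)
{
  if (bytes_to_read > 0)
    std::memcpy(dst, static_cast<const char*>(src) + src_offset, bytes_to_read);
}

// Correct order: offset 0, then nnz * sizeof(double) bytes.
std::vector<double> read_elements(const std::vector<double>& source, std::size_t nnz)
{
  std::vector<double> host(nnz, -1.0);  // -1.0 marks "never written"
  read_bytes(source.data(), 0, nnz * sizeof(double), host.data());
  return host;
}

// Swapped order, as in the quoted snippet: the byte count becomes 0,
// so nothing is copied and the destination keeps its previous contents.
std::vector<double> read_elements_swapped(const std::vector<double>& source, std::size_t nnz)
{
  std::vector<double> host(nnz, -1.0);
  read_bytes(source.data(), nnz * sizeof(double), 0, host.data());
  return host;
}
```

With the swapped order the destination is never written, which would be consistent with the uninitialized-looking values reported for the product buffers. Following the description above, the corrected ViennaCL calls would presumably take the form `viennacl::backend::memory_read(handle1, 0, product_size_row * 4, product_row_ptr, false);` and likewise for handle2 and handle.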
From: Andrew P. <ap...@ou...> - 2016-07-22 06:51:18
|
a couple of small mistakes in the previous C++ file:

The memory_read(..) call should be:

// read data back into our product buffers
viennacl::backend::memory_read(handle1, product_size_row * 4, 0, product_row_ptr, false);
viennacl::backend::memory_read(handle2, product_NNz * 4, 0, product_col_ptr, false);
viennacl::backend::memory_read(handle, product_NNz * 8, 0, product_values_ptr, false);

(read product_NNz * x bytes instead of product_size_row * x)

I've attached the corrected file.

Thanks

Andy

________________________________
From: Andrew Palumbo <ap...@ou...>
Sent: Thursday, July 21, 2016 11:03:59 PM
To: Karl Rupp; viennacl-devel
Subject: Re: [ViennaCL-devel] Copying Values out of a compressed_matrix

Hello,

I've mocked up a sample in C++ of the compressed_matrix multiplication that I've been working on with javacpp. I am seeing the same type of memory errors when I try to read the data out of the product and into the output buffers as I was with javacpp. By printing the matrix to stdout as in the compressed_matrix example we can see that there are values there, and they seem reasonable, but when I use backend::memory_read(...) to retrieve the buffers, I'm getting values consistent with a memory error, and similar to what I was seeing in the javacpp code. Maybe I am not using the handles correctly? Admittedly my C++ is more than rusty, but I believe I am referencing the buffers correctly in the output.

Below is the output of the attached file: sparse.cpp

Thanks very much,

Andy

ViennaCL: compressed_matrix of size (10, 10) with 24 nonzeros:
(1, 2) 0.329908
(1, 3) 0.0110522
(1, 4) 0.336839
(2, 5) 0.0150778
(2, 7) 0.0143518
(3, 3) 0.217256
(3, 6) 0.346854
(3, 9) 0.45353
(4, 3) 0.407954
(4, 6) 0.651308
(5, 2) 0.676061
(5, 3) 0.0226486
(5, 4) 0.690264
(6, 5) 0.0998838
(6, 7) 0.0950744
(7, 2) 0.346173
(7, 3) 0.0115971
(7, 4) 0.353446
(7, 9) 0.684458
(8, 5) 0.0448123
(8, 7) 0.0426546
(8, 9) 0.82782
(9, 5) 0.295356
(9, 7) 0.281134

row jumpers: [
-36207072,32642,-39708721,32642,6390336,0,2012467744,32767,2012467968,32767,4203729,]
col ptrs: [
0,0,-39655605,32642,-36207072,32642,6390336,0,10,0,-39672717,32642,2012466352,32767,-32892691,32642,1,0,6390336,0,2012466344,32767,60002304,2059362829,]
elements: [
0.289516,0.304161,0.795779,0.334456,0.935264,0.585813,0.871237,0.811508,0.828558,0.0271863,6.92683e-310,6.92683e-310,1.061e-313,1.061e-313,6.36599e-314,4.24399e-314,6.36599e-314,6.92683e-310,4.24399e-314,1.2732e-313,2.122e-313,6.95324e-310,0.406537,0.0495716,0.370862,]

and similarly for multiplication of 2 1x1 matrices:

Result:

ViennaCL: compressed_matrix of size (1, 1) with 1 nonzeros:
(0, 0) 0.117699

row jumpers: [
-717571424,32767,]
col ptrs: [
6386240,]
elements: [
0.289516,6.9479e-310,]

________________________________
From: Andrew Palumbo <ap...@ou...>
Sent: Wednesday, July 20, 2016 5:40:31 PM
To: Karl Rupp; viennacl-devel
Subject: Re: [ViennaCL-devel] Copying Values out of a compressed_matrix

Oops, sorry about not cc'ing all.

I do not get correct data back for a (Random.nextDouble() populated) 1 x 1 Matrix.

A:
Row Pointer: [0, 1 ]
Col Pointer: [0 ]
element Pointer: [0.6465821602909256 ]

B:
Row Pointer: [0, 1 ]
Col Pointer: [0 ]
element Pointer: [0.9513577109193919 ]

C = A %*% B
Row Pointer: [469762248, 32632]
Col Pointer: [469762248 ]
element Pointer: [6.9245198744523E-310 ]

ouch.

It looks like I'm not copying the Buffers correctly at all.

I may be using the javacpp buffers incorrectly here, or I have possibly wrapped the viennacl::backend::memory_handle class incorrectly, so I'm using a pointer to the wrong memory from e.g. viennacl::compressed_matrix::handle.

I mentioned before that the multiplication completed on small (< ~300 x 300) matrices because if I try to multiply two larger sparse matrices, the JVM crashes with a SIGSEGV.

Since this code is all wrapped with javacpp, I don't really have a small sample that I can show you (not going to dump a whole bunch of code on you).

I'll keep trying to figure it out. Pretty sure the problem is on my end here. I really mainly wanted to ask you if I was using the correct methods at this point, or if there was anything very obvious that I was doing wrong.

Thanks a lot for your help!

Andy

________________________________
From: Karl Rupp <ru...@iu...>
Sent: Wednesday, July 20, 2016 5:00:36 PM
To: Andrew Palumbo; viennacl-devel
Subject: Re: [ViennaCL-devel] Copying Values out of a compressed_matrix

Hi,

please keep viennacl-devel in CC:

Just to clarify: Do you get incorrect values for a 1-by-1 matrix as indicated in your sample data? In your previous email you mentioned that results are fine for small matrices...

I'm afraid I can only guess at the source of the error with the information provided. Any chance that you can provide standalone code to reproduce the problem with reasonable effort?

Best regards,
Karli

On 07/20/2016 10:16 PM, Andrew Palumbo wrote:
> Thanks so much for your quick answer!
>
> I actually am sorry to say that I made a mistake when writing the last
> email, I copied the wrong signature from the VCL documentation, and then
> the mistake propagated through the rest of the e-mail.
>
> I am actually using viennacl::backend::memory_read().
>
> E.g., for the row_jumpers and column_idx I use:
>
> @Name("backend::memory_read")
> public static native void memoryReadInt(@Const @ByRef MemHandle src_buffer,
>                                         int bytes_to_read,
>                                         int offset,
>                                         IntPointer ptr,
>                                         boolean async);
>
> and for the values:
>
> @Name("backend::memory_read")
> public static native void memoryReadDouble(@Const @ByRef MemHandle src_buffer,
>                                            int bytes_to_read,
>                                            int offset,
>                                            DoublePointer ptr,
>                                            boolean async);
>
> And then call:
>
> memoryReadInt(row_ptr_handle, (m +1) *4,0, row_ptr,false)
> memoryReadInt(col_idx_handle, NNz *4,0,col_idx,false)
> memoryReadDouble(element_handle, NNz *8,0, values,false)
>
> and after converting them to java.nio.Buffers, am getting results like:
>
> rowBuff.get(1): 0 colBuff(1): 402653448 valBuff(1): 6.91730177312166E-310
>
> I have also tried reading into BytePointers similarly, with the same type
> of results. I know that the use of javacpp obfuscates what the problem
> may be, but I believe the memory is properly allocated.
>
> Sorry for the mistake.
>
> Thanks,
>
> Andy
>
> ------------------------------------------------------------------------
> *From:* Karl Rupp <ru...@iu...>
> *Sent:* Wednesday, July 20, 2016 3:50:07 PM
> *To:* Andrew Palumbo; Vie...@li...
> *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix
> Hi Andy,
>
> instead of viennacl::backend::memory_copy(), you want to use
> viennacl::backend::memory_read(), which directly transfers the data into
> your buffer(s).
>
> If you *know* that your handles are in host memory, you can even grab
> the values directly via
> viennacl::linalg::host_based::detail::extract_raw_pointer<T>();
> defined in viennacl/linalg/host_based/common.hpp, around line 40.
>
> Please let me know if you still get errors after using that.
>
> Best regards,
> Karli
>
> On 07/20/2016 09:05 PM, Andrew Palumbo wrote:
>> Hello,
>>
>> I'm having some difficulties with compressed_matrix multiplication.
>>
>> Essentially I am copying three buffers, the CSR conversion of an Apache
>> Mahout SparseMatrix, into two compressed_matrices and performing matrix
>> multiplication. I am doing this in Scala and Java using javacpp.
>>
>> For example, I have a 5 x 10 matrix of ~20% non-zero values which in CSR
>> format looks like this:
>>
>> NNz: 12
>>
>> Row Pointer: [0, 1, 4, 6, 9, 12, ]
>>
>> Col Pointer: [9, 0, 8, 7, 2, 9, 0, 8, 9, 0, 3, 5, ]
>>
>> element Pointer: [0.4065367203992265, 0.04957158909682802,
>> 0.5205586068847993, 0.3708618354358446, 0.6963900565931678,
>> 0.8330915529787706, 0.32839112750638844, 0.7856168903297948,
>> 0.4265801782090245, 0.14733066454561583, 0.9501663495824946,
>> 0.9710498974366047, ]
>>
>> Multiplied by a similarly sparse 10 x 5 compressed_matrix
>>
>> I use a CompressedMatrix wrapper which essentially wraps the
>>
>> viennacl::compressed_matrix (vcl_size_t rows, vcl_size_t cols,
>> vcl_size_t nonzeros=0, viennacl::context ctx=viennacl::context())
>>
>> constructor as well as the
>>
>> compressed_matrix (matrix_expression< const compressed_matrix,
>> const compressed_matrix, op_prod > const &proxy).
>>
>> I have a helper function, /toVclCompressedMatrix/(..) which essentially
>> does the CSR conversion from a Mahout src matrix, calls the constructor
>> and uses viennacl::compressed_matrix::set(...) to set the buffers:
>>
>> val ompA =toVclCompressedMatrix(src = mxA, ompCtx)
>> val ompB =toVclCompressedMatrix(src = mxB, ompCtx)
>>
>> and then create a new viennacl::compressed_matrix from the
>> viennacl::linalg::prod of the 2 matrices, i.e.:
>>
>> val ompC =new CompressedMatrix(prod(ompA, ompB))
>>
>> The context in the above case is either the Host or OpenMP (I know that
>> there is some special casting of the row_jumpers and col_idxs that needs
>> to be done in the OpenCL version)
>>
>> The matrix multiplication completes without error on small matrices, e.g.
>> < 300 x 300,
>> but seems to overwrite the resulting buffers on larger matrices.
>>
>> My real problem, though, is getting the memory back out of the
>> resulting `ompC` compressed_matrix so that I can write it back to a Mahout
>> SparseMatrix.
>>
>> Currently I am using:
>>
>> void viennacl::backend::memory_copy (mem_handle const & src_buffer,
>> mem_handle & dst_buffer,
>> vcl_size_t src_offset,
>> vcl_size_t dst_offset,
>> vcl_size_t bytes_to_copy
>> )
>>
>> on the ompC.handle1, ompC.handle2 and ompC.handle source handles
>>
>> to copy into pre-allocated row_jumper, col_index and element buffers
>> (of size ompC.size1() + 1, ompC.nnz and ompC.nnz, respectively).
>>
>> I am getting nonsensical values back that one would expect from memory
>> errors, e.g.:
>>
>> the matrix geometry of the result: ompC.size1() and ompC.size2() are
>> correct and ompC.nnz is a reasonable value.
>>
>> It is possible that I have mis-allocated some of the memory on my side,
>> but I am pretty sure that most of the buffers are allocated correctly
>> (usually JavaCPP does a pretty good job of this).
>>
>> I guess, long story short, my question is: am I using the correct method
>> of copying the memory out of a compressed_matrix? Is there something
>> glaringly incorrect that I am doing here? Should I be using
>> viennacl::backend::memory_copy or is there a different method that I
>> should be using?
>>
>> Thanks very much,
>>
>> Andy
>>
>> ------------------------------------------------------------------------------
>> What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
>> patterns at an interface-level. Reveals which users, apps, and protocols are
>> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
>> J-Flow, sFlow and other flows.
Make informed decisions using capacity planning >> reports.http://sdm.link/zohodev2dev >> >> >> >> _______________________________________________ >> ViennaCL-devel mailing list >> Vie...@li... >>https://lists.sourceforge.net/lists/listinfo/viennacl-devel >> > |
From: Andrew P. <ap...@ou...> - 2016-07-22 03:04:11
|
Hello,

I've mocked up in C++ a sample of the compressed_matrix multiplication that I've been working on with javacpp. I am seeing the same type of memory errors when I try to read the data out of the product and into the output buffers as I was with javacpp. By printing the matrix to stdout as in the compressed_matrix example, we can see that there are values there, and they seem reasonable, but when I use backend::memory_read(...) to retrieve the buffers, I'm getting values consistent with a memory error, similar to what I was seeing in the javacpp code. Maybe I am not using the handles correctly? Admittedly my C++ is more than rusty, but I believe I am referencing the buffers correctly in the output.

Below is the output of the attached file, sparse.cpp.

Thanks very much,

Andy

ViennaCL: compressed_matrix of size (10, 10) with 24 nonzeros:
  (1, 2) 0.329908
  (1, 3) 0.0110522
  (1, 4) 0.336839
  (2, 5) 0.0150778
  (2, 7) 0.0143518
  (3, 3) 0.217256
  (3, 6) 0.346854
  (3, 9) 0.45353
  (4, 3) 0.407954
  (4, 6) 0.651308
  (5, 2) 0.676061
  (5, 3) 0.0226486
  (5, 4) 0.690264
  (6, 5) 0.0998838
  (6, 7) 0.0950744
  (7, 2) 0.346173
  (7, 3) 0.0115971
  (7, 4) 0.353446
  (7, 9) 0.684458
  (8, 5) 0.0448123
  (8, 7) 0.0426546
  (8, 9) 0.82782
  (9, 5) 0.295356
  (9, 7) 0.281134

row jumpers: [ -36207072,32642,-39708721,32642,6390336,0,2012467744,32767,2012467968,32767,4203729,]
col ptrs: [ 0,0,-39655605,32642,-36207072,32642,6390336,0,10,0,-39672717,32642,2012466352,32767,-32892691,32642,1,0,6390336,0,2012466344,32767,60002304,2059362829,]
elements: [ 0.289516,0.304161,0.795779,0.334456,0.935264,0.585813,0.871237,0.811508,0.828558,0.0271863,6.92683e-310,6.92683e-310,1.061e-313,1.061e-313,6.36599e-314,4.24399e-314,6.36599e-314,6.92683e-310,4.24399e-314,1.2732e-313,2.122e-313,6.95324e-310,0.406537,0.0495716,0.370862,]

and similarly for the multiplication of two 1x1 matrices:

Result:
ViennaCL: compressed_matrix of size (1, 1) with 1 nonzeros:
  (0, 0) 0.117699

row jumpers: [ -717571424,32767,]
col ptrs: [ 6386240,]
elements: [ 0.289516,6.9479e-310,]
|
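For readers following the 2016-07-22 message above: the "(row, col) value" listing and the three raw buffers are two views of the same CSR data. The sketch below shows how the listing can be derived from the buffers. It is a minimal illustration only; csr_to_text is a hypothetical helper, not a ViennaCL function, and the 32-bit unsigned index type is an assumption taken from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of ViennaCL): walks the three CSR buffers and
// renders one "(row, col) value" line per stored entry.
std::string csr_to_text(const std::vector<unsigned int>& row_jumpers,
                        const std::vector<unsigned int>& col_indices,
                        const std::vector<double>& elements)
{
  // Every stored value needs exactly one column index.
  assert(col_indices.size() == elements.size());

  std::ostringstream out;
  // row_jumpers has rows + 1 entries; row r owns the half-open range
  // [row_jumpers[r], row_jumpers[r + 1]) of col_indices and elements.
  for (std::size_t row = 0; row + 1 < row_jumpers.size(); ++row)
    for (unsigned int k = row_jumpers[row]; k < row_jumpers[row + 1]; ++k)
      out << "(" << row << ", " << col_indices[k] << ") " << elements[k] << "\n";
  return out.str();
}
```

If the buffers read back from the product were intact, feeding them to a routine like this would be expected to reproduce the printed listing; the dump above clearly would not.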
From: Andrew P. <ap...@ou...> - 2016-07-20 21:40:49
|
Oops, sorry about not cc'ing all.

I do not get correct data back for a (Random.nextDouble() populated) 1 x 1 matrix.

A:
  Row Pointer: [0, 1 ]
  Col Pointer: [0 ]
  element Pointer: [0.6465821602909256 ]

B:
  Row Pointer: [0, 1 ]
  Col Pointer: [0 ]
  element Pointer: [0.9513577109193919 ]

C = A %*% B
  Row Pointer: [469762248, 32632]
  Col Pointer: [469762248 ]
  element Pointer: [6.9245198744523E-310 ]

Ouch. It looks like I'm not copying the buffers correctly at all.

I may be using the javacpp buffers incorrectly here, or I may have wrapped the viennacl::backend::memory_handle class incorrectly, so that I'm using a pointer to the wrong memory from e.g. viennacl::compressed_matrix::handle.

I mentioned before that the multiplication completed on small (< ~300 x 300) matrices because, if I try to multiply two larger sparse matrices, the JVM crashes with a SIGSEGV.

Since this code is all wrapped with javacpp, I don't really have a small sample that I can show you (not going to dump a whole bunch of code on you). I'll keep trying to figure it out. Pretty sure the problem is on my end here. I really mainly wanted to ask you if I was using the correct methods at this point, or if there was anything very obvious that I was doing wrong.

Thanks a lot for your help!

Andy
|
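The buffers reported for C in the 2016-07-20 message above can be recognised as corrupt without knowing the expected product, because they violate the structural rules of CSR. The sketch below checks those rules. Both functions are hypothetical helpers written for this note, not ViennaCL API, and they use signed 32-bit integers only because that is how the values arrive on the Java side. The second function reflects a later reply in this thread (2016-07-23), which states that column indices have to be in ascending order within each row for sparse matrix-matrix multiplication.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if the index buffers could describe a
// rows x cols CSR matrix (sizes, bounds and monotonic row jumpers).
bool is_plausible_csr(std::size_t rows, std::size_t cols,
                      const std::vector<int>& row_jumpers,
                      const std::vector<int>& col_indices)
{
  if (row_jumpers.size() != rows + 1) return false;   // rows + 1 entries
  if (row_jumpers.front() != 0) return false;         // starts at 0
  if (row_jumpers.back() < 0 ||
      static_cast<std::size_t>(row_jumpers.back()) != col_indices.size())
    return false;                                     // ends at nnz
  for (std::size_t r = 0; r < rows; ++r)
    if (row_jumpers[r] > row_jumpers[r + 1]) return false;
  for (int c : col_indices)
    if (c < 0 || static_cast<std::size_t>(c) >= cols) return false;
  return true;
}

// Hypothetical helper: true if column indices are strictly ascending
// inside every row. Precondition: the buffers pass is_plausible_csr.
bool columns_ascending_per_row(const std::vector<int>& row_jumpers,
                               const std::vector<int>& col_indices)
{
  assert(row_jumpers.empty() ||
         static_cast<std::size_t>(row_jumpers.back()) <= col_indices.size());
  for (std::size_t r = 0; r + 1 < row_jumpers.size(); ++r)
    for (int k = row_jumpers[r] + 1; k < row_jumpers[r + 1]; ++k)
      if (col_indices[k - 1] >= col_indices[k]) return false;
  return true;
}
```

Applied to the data in this thread, the inputs A and B pass the first check and the read-back C does not. The 5 x 10 example from the original post passes the first check but fails the second, since its second row lists columns 0, 8, 7.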
From: Karl R. <ru...@iu...> - 2016-07-20 21:00:47
|
Hi,

please keep viennacl-devel in CC:

Just to clarify: Do you get incorrect values for a 1-by-1 matrix as indicated in your sample data? In your previous email you mentioned that results are fine for small matrices...

I'm afraid I can only guess at the source of the error with the information provided. Any chance that you can provide standalone code to reproduce the problem with reasonable effort?

Best regards,
Karli

On 07/20/2016 10:16 PM, Andrew Palumbo wrote:
> Thanks so much for your quick answer!
>
> I am sorry to say that I made a mistake when writing the last email: I
> copied the wrong signature from the VCL documentation, and then the
> mistake propagated through the rest of the e-mail.
>
> I am actually using viennacl::backend::memory_read().
>
> E.g., for the row_jumpers and column_idx I use:
>
>   @Name("backend::memory_read")
>   public static native void memoryReadInt(@Const @ByRef MemHandle src_buffer,
>                                           int bytes_to_read,
>                                           int offset,
>                                           IntPointer ptr,
>                                           boolean async);
>
> and for the values:
>
>   @Name("backend::memory_read")
>   public static native void memoryReadDouble(@Const @ByRef MemHandle src_buffer,
>                                              int bytes_to_read,
>                                              int offset,
>                                              DoublePointer ptr,
>                                              boolean async);
>
> And then call:
>
>   memoryReadInt(row_ptr_handle, (m + 1) * 4, 0, row_ptr, false)
>   memoryReadInt(col_idx_handle, NNz * 4, 0, col_idx, false)
>   memoryReadDouble(element_handle, NNz * 8, 0, values, false)
>
> and after converting them to java.nio.Buffers, I am getting results like:
>
>   rowBuff.get(1): 0 colBuff(1): 402653448 valBuff(1): 6.91730177312166E-310
>
> I have also tried reading into BytePointers similarly, with the same type
> of results. I know that the use of javacpp obfuscates what the problem
> may be, but I believe the memory is properly allocated.
>
> Sorry for the mistake.
>
> Thanks,
>
> Andy
|
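One detail in the binding quoted in the message above may be worth checking: it passes the byte count before the offset. As far as I recall, viennacl::backend::memory_read takes the source offset first and the byte count second, but that ordering is an assumption here and should be verified against viennacl/backend/memory.hpp. The sketch below uses a stand-in function, not the ViennaCL call, to show what swapping the two arguments would do under that assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for a backend read, NOT the ViennaCL function: copies
// bytes_to_read bytes, starting at src_offset, from src into dst.
// The parameter order (offset first, then byte count) is an assumption.
void read_bytes(const std::vector<unsigned char>& src,
                std::size_t src_offset, std::size_t bytes_to_read, void* dst)
{
  assert(src_offset <= src.size() && bytes_to_read <= src.size() - src_offset);
  if (bytes_to_read > 0)
    std::memcpy(dst, src.data() + src_offset, bytes_to_read);
}

// Hypothetical helper: packs 32-bit indices into a byte buffer, standing in
// for the memory behind a host-side handle.
std::vector<unsigned char> pack(const std::vector<unsigned int>& values)
{
  std::vector<unsigned char> bytes(values.size() * sizeof(unsigned int));
  if (!values.empty())
    std::memcpy(bytes.data(), values.data(), bytes.size());
  return bytes;
}
```

With this stand-in, read_bytes(handle, 0, (m + 1) * 4, dst) fills dst, whereas read_bytes(handle, (m + 1) * 4, 0, dst) copies nothing and leaves dst holding whatever it contained before. If the real function uses the same order, a call such as memoryReadInt(row_ptr_handle, (m + 1) * 4, 0, row_ptr, false) would request zero bytes, which would be consistent with the uninitialised-looking values reported in this thread.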
From: Karl R. <ru...@iu...> - 2016-07-20 19:50:17
|
Hi Andy,

instead of viennacl::backend::memory_copy(), you want to use viennacl::backend::memory_read(), which directly transfers the data into your buffer(s).

If you *know* that your handles are in host memory, you can even grab the values directly via
  viennacl::linalg::host_based::detail::extract_raw_pointer<T>();
defined in viennacl/linalg/host_based/common.hpp, around line 40.

Please let me know if you still get errors after using that.

Best regards,
Karli
|
From: Andrew P. <ap...@ou...> - 2016-07-20 19:05:46
|
Hello,

I'm having some difficulties with compressed_matrix multiplication.

Essentially, I am copying three buffers, the CSR conversion of an Apache Mahout SparseMatrix, into two compressed_matrices and then performing matrix multiplication. I am doing this in Scala and Java using JavaCPP.

For example, I have a 5 x 10 matrix of ~20% non-zero values which in CSR format looks like this:

NNz: 12
Row Pointer: [0, 1, 4, 6, 9, 12, ]
Col Pointer: [9, 0, 8, 7, 2, 9, 0, 8, 9, 0, 3, 5, ]
element Pointer: [0.4065367203992265, 0.04957158909682802, 0.5205586068847993, 0.3708618354358446, 0.6963900565931678, 0.8330915529787706, 0.32839112750638844, 0.7856168903297948, 0.4265801782090245, 0.14733066454561583, 0.9501663495824946, 0.9710498974366047, ]

This is multiplied by a similarly sparse 10 x 5 compressed_matrix.

I use a CompressedMatrix wrapper which essentially wraps the

viennacl::compressed_matrix(vcl_size_t rows, vcl_size_t cols, vcl_size_t nonzeros=0, viennacl::context ctx=viennacl::context())

constructor as well as the

compressed_matrix(matrix_expression<const compressed_matrix, const compressed_matrix, op_prod> const &proxy)

constructor. I have a helper function, toVclCompressedMatrix(..), which essentially does the CSR conversion from a Mahout src matrix, calls the constructor and uses viennacl::compressed_matrix::set(...) to set the buffers:

val ompA = toVclCompressedMatrix(src = mxA, ompCtx)
val ompB = toVclCompressedMatrix(src = mxB, ompCtx)

and then create a new viennacl::compressed_matrix from the viennacl::linalg::prod of the 2 matrices, i.e.:

val ompC = new CompressedMatrix(prod(ompA, ompB))

The context in the above case is either the Host or OpenMP (I know that there is some special casting of the row_jumpers and col_idxs that needs to be done in the OpenCL version).

The matrix multiplication completes without error on small matrices, e.g. < 300 x 300, but seems to overwrite the resulting buffers on larger matrices.

My real problem, though, is getting the memory back out of the resulting `ompC` compressed_matrix so that I can write it back to a Mahout SparseMatrix.

Currently I am using:

void viennacl::backend::memory_copy(mem_handle const & src_buffer, mem_handle & dst_buffer, vcl_size_t src_offset, vcl_size_t dst_offset, vcl_size_t bytes_to_copy)

on the ompC.handle1, ompC.handle2 and ompC.handle source handles to copy into pre-allocated row_jumper, col_index and element buffers (of size ompC.size1() + 1, ompC.nnz and ompC.nnz, respectively).

I am getting nonsensical values back, of the kind one would expect from memory errors. The matrix geometry of the result, ompC.size1() and ompC.size2(), is correct, and ompC.nnz is a reasonable value.

It is possible that I have mis-allocated some of the memory on my side, but I am pretty sure that most of the buffers are allocated correctly (usually JavaCPP does a pretty good job of this).

I guess, long story short, my question is: am I using the correct method of copying the memory out of a compressed_matrix? Is there something glaringly incorrect that I am doing here? Should I be using viennacl::backend::memory_copy, or is there a different method that I should be using?

Thanks very much,
Andy |
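Karl's later reply in this thread (2016-07-23) states that column indices have to be in ascending order within each row for sparse matrix-matrix multiplication, and the example above stores row 1 as columns 0, 8, 7. The following is a minimal host-side sketch of how such rows could be put in order before the buffers are handed to set(). It assumes plain std::vector CSR arrays with 32-bit unsigned indices; csr_rows_sorted and sort_csr_rows are hypothetical helper names, not ViennaCL functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns true if the column indices of every row are strictly ascending.
bool csr_rows_sorted(const std::vector<unsigned int>& row_ptr,
                     const std::vector<unsigned int>& col_idx) {
  for (std::size_t r = 0; r + 1 < row_ptr.size(); ++r)
    for (unsigned int k = row_ptr[r] + 1; k < row_ptr[r + 1]; ++k)
      if (col_idx[k - 1] >= col_idx[k]) return false;
  return true;
}

// Sorts the column indices (and the matching values) of each row in place.
void sort_csr_rows(const std::vector<unsigned int>& row_ptr,
                   std::vector<unsigned int>& col_idx,
                   std::vector<double>& values) {
  for (std::size_t r = 0; r + 1 < row_ptr.size(); ++r) {
    const unsigned int begin = row_ptr[r];
    const unsigned int end = row_ptr[r + 1];
    // Positions of this row's entries, ordered by their column index.
    std::vector<unsigned int> perm(end - begin);
    std::iota(perm.begin(), perm.end(), begin);
    std::sort(perm.begin(), perm.end(),
              [&](unsigned int a, unsigned int b) { return col_idx[a] < col_idx[b]; });
    // Gather into temporaries, then write the row back in sorted order.
    std::vector<unsigned int> cols(perm.size());
    std::vector<double> vals(perm.size());
    for (std::size_t i = 0; i < perm.size(); ++i) {
      cols[i] = col_idx[perm[i]];
      vals[i] = values[perm[i]];
    }
    std::copy(cols.begin(), cols.end(), col_idx.begin() + begin);
    std::copy(vals.begin(), vals.end(), values.begin() + begin);
  }
}
```

Running csr_rows_sorted on the row and column arrays listed above is a cheap way to check whether a conversion routine produces in-order rows at all.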
From: Karl R. <ru...@iu...> - 2016-07-19 06:53:36
|
Hi Sumit,

> I'd like to draw your attention to this:
> http://www.iue.tuwien.ac.at/cse/index.php/gsoc/2011/ideas-2011/104-viennacl-mpi-layer-for-linear-algebra-with-large-matrices-new.html
>
> Has this ever been incorporated into ViennaCL?

No, no student ever worked on this. However, ViennaCL's functionality is available in an MPI setting via PETSc [1]. This is much better than offering MPI functionality in ViennaCL directly.

Best regards,
Karli

[1] http://www.mcs.anl.gov/petsc/ |
From: Sumit K. <dos...@ya...> - 2016-07-19 06:46:11
|
Karl,

I'd like to draw your attention to this:
http://www.iue.tuwien.ac.at/cse/index.php/gsoc/2011/ideas-2011/104-viennacl-mpi-layer-for-linear-algebra-with-large-matrices-new.html

Has this ever been incorporated into ViennaCL?

Thanks and Regards
Sumit |
From: Karl R. <ru...@iu...> - 2016-07-18 17:46:53
|
Hi,

> > Is DGEMM your performance-critical operation? Are there any other performance-critical operations?
>
> For now we are only looking at (especially sparse) blas3 and decompositions. Basically, your normal R base functionality for in-memory sparse algebra.

Sparse factorizations (LU, QR, etc.) are very hard to parallelize for many-core architectures (GPUs in particular).

> One more question I had: do you guys handle low-resource cases, like transfer optimization for blockwise multiplication in case operands do not fit (out-of-core algorithms)?

Out-of-core has gone out of fashion. The reason is that the differences in memory speed have become so large that falling back to a slower memory type almost never pays off.

> Did you look at GPU+CPU combined balanced algorithms (as I guess MAGMA did for some)?

Yes, a couple of algorithms in ViennaCL use GPUs for the main work (i.e. GEMM) and CPUs for the sequential parts of the algorithm.

Best regards,
Karli |
From: Dmitriy L. <dl...@gm...> - 2016-07-18 17:42:20
|
Thank you, Karl! This is very helpful!

> Is DGEMM your performance-critical operation? Are there any other performance-critical operations?

For now we are only looking at (especially sparse) blas3 and decompositions. Basically, your normal R base functionality for in-memory sparse algebra.

One more question I had: do you guys handle low-resource cases, like transfer optimization for blockwise multiplication in case operands do not fit (out-of-core algorithms)?

Did you look at GPU+CPU combined balanced algorithms (as I guess MAGMA did for some)? |
From: Karl R. <ru...@iu...> - 2016-07-18 11:42:05
|
> > Why do you expect to beat OpenBLAS? Their kernels are really well optimized, and for large dense matrix-matrix you are always FLOP-limited.
>
> I don't expect, I experiment. I don't know why, but current results are such that stock Ubuntu blas takes about 88 seconds for the dense 10k multiplication test (with R, which is set up to use it; perhaps it also takes a long time to convert to blas, but nevertheless it pins the cpu at 100%). If I compile Vienna with -march=haswell and -ffast-math then I get about 35 seconds. What's perplexing is that the same test with bidmat's MatD matrices takes less than 10 seconds on my computer -- and they don't even saturate my cpu 100%. Something is fishy about bidmat. I don't have a super-beefy cpu, only a 6-core/12-thread haswell-e. I know that even mkl takes in the area of 16 seconds on 24 threads on xeons, so 88 seconds for openblas on my platform looks plausible. 10 or even 8 seconds (BidMat + supposedly MKL) does not -- something is fishy there.

It shouldn't be too hard to directly verify correctness of the results :-)

> > Multiplication of 10k-by-10k matrices amounts to 200 GFLOP of compute in double precision. A Haswell-E machine provides that within a few seconds, depending on the number of cores (2.4 GHz * 4 doubles with AVX * 2 for FMA = 19.2 GFLOP/sec per core. MKL achieves about 15 GFLOP/sec per core).
>
> So this sounds like a validation of BidMat's results. Interesting. Why is R+openblas so slow then? What is the expected throughput for ViennaCL + OpenMP compared to MKL rates?

I don't know the internals of R+OpenBLAS. Maybe there is extensive debugging going on, or OpenBLAS is only used with a single thread. ViennaCL+OpenMP vs. MKL is hard to answer in general. It all depends a lot on compiler flags, the underlying CPU, etc.

> How much improvement do you observe/expect from the new pull request; is there any hope of getting closer to MKL dense dgemm?

The student reported about 50 percent of MKL on a laptop CPU. More important, though, is that the new code provides a good infrastructure for further improvements for different architectures, e.g. ARM-based CPUs.

> The primary reason against blas/mkl is that they are yet another platform which, most importantly, we cannot redistribute, being Apache 2 licensed. So we'd have to ask people to install a particular commercial product, but if ViennaCL would cover our sparse algorithm needs, we'd rather just have it all in one package (or at least leverage hardware/software support in steps). We are very limited in resources; that's the reason we are trying to get working with ViennaCL:
>
> -- it has sparse algorithms
> -- it supports host/OpenCL/cuda without need for new apis/conversions
> -- it does not require installation of any shared libraries beyond what javacpp already does for us automagically. So we basically can drop a jar with javacpp in it into a spark application and have it running on ViennaCL. Even netlib (blas) or the netlib-java api does not make it quite as easy (which btw we cannot redistribute either because of their licenses).

Ah, makes sense!

> This is hard to beat, especially if ViennaCL becomes well-rounded in performance in most areas of interest; we don't need to depend on a particular flavor of libblas.so being present (or any libblas.so for that matter).

Is DGEMM your performance-critical operation? Are there any other performance-critical operations?

> One more question: is it possible to copy one matrix into an OpenCL device while solving another? Thank you!

Yes, that is possible using async_copy(). I recommend copying before the solver is started. You can also achieve a similar effect through a second OpenCL command queue. (Needless to say, you should first profile in order to find out whether it is worth the effort.)

Best regards,
Karli |
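The back-of-the-envelope figures quoted in this exchange can be reproduced in a few lines. This is only a sketch: it assumes the conventional 2 * n^3 operation count for a dense n-by-n product and the per-core peak quoted above (2.4 GHz * 4 doubles * 2 for FMA); the function names are made up for illustration. Under that count a 10k-by-10k product comes to about 2000 GFLOP rather than 200, which would put a 6-core machine running at peak in the region of 17 seconds, so the "few seconds" estimate is worth re-checking against this.

```cpp
#include <cassert>
#include <cmath>

// Floating-point operations of a dense n-by-n matrix product,
// assuming one multiply and one add per inner-loop step (2 * n^3).
double gemm_flops(double n) { return 2.0 * n * n * n; }

// Peak double-precision rate of one core in FLOP/s:
// clock (Hz) * SIMD lanes (doubles) * 2 for fused multiply-add.
double peak_flops_per_core(double clock_hz, double simd_doubles) {
  return clock_hz * simd_doubles * 2.0;
}

// Lower bound on the run time in seconds if every core ran at peak.
double min_seconds(double flops, double flops_per_core, int cores) {
  return flops / (flops_per_core * cores);
}

// Size in bytes of an n-by-n matrix of doubles.
double matrix_bytes(double n) { return n * n * sizeof(double); }
```

For n = 10000, matrix_bytes gives 800 MB per matrix, which matches the bandwidth estimate elsewhere in the thread; dividing that by a memory bandwidth of 10 GB/sec gives the 80 ms copy time behind the "not much more than 100 ms" remark.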
From: Dmitriy L. <dl...@gm...> - 2016-07-15 23:02:21
|
On Thu, Jul 14, 2016 at 11:21 AM, Dmitriy Lyubimov <dl...@gm...> wrote:

> One more question: is it possible to copy one matrix into an openCL device while solving another?
> thank you!

Also known in CUDA as the "concurrent copy and kernel execution" capability? Thank you. |
From: Dmitriy L. <dl...@gm...> - 2016-07-14 18:25:08
|
Sorry, this should read:

> I don't expect, i experiment. I don't know why, current results are such that stock ubuntu OPENBLAS takes about 88 seconds for dense 10k multiplication test |
From: Dmitriy L. <dl...@gm...> - 2016-07-14 18:21:36
|
Karl, thank you for your reply!

On Thu, Jul 14, 2016 at 1:45 AM, Karl Rupp <ru...@iu...> wrote:

> Hi again,
>
> 15 seconds of copying for a 10k-by-10k matrix looks way too much. 10k-by-10k is 800 MB of data for double precision, so this should not take much more than 100 ms on a low-range laptop (10 GB/sec memory bandwidth). Even with multiple matrices and copies you should stay in the 1 second regime.

You are right. I don't see a significant variation whether I use fast_copy or the constructor. The time at this point is mostly consumed by moving Mahout data structures into RM or CCS format, and it is really a POC now, so we are working to get it faster. But Java is really slow, especially when working with native buffers naively -- we will have to improve that. For the record, the 15 seconds probably include loading all necessary classes, and serializing two 10k x 10k matrices in and one 10k x k matrix out, including all these Scala-side conversions.

> Why do you expect to beat OpenBLAS? Their kernels are really well optimized, and for large dense matrix-matrix you are always FLOP-limited.

I don't expect, I experiment. I don't know why, but current results are such that stock Ubuntu blas takes about 88 seconds for the dense 10k multiplication test (with R, which is set up to use it; perhaps it also takes a long time to convert to blas, but nevertheless it pins the cpu at 100%). If I compile Vienna with -march=haswell and -ffast-math then I get about 35 seconds. What's perplexing is that the same test with bidmat's MatD matrices takes less than 10 seconds on my computer -- and they don't even saturate my cpu 100%. Something is fishy about bidmat. I don't have a super-beefy cpu, only a 6-core/12-thread haswell-e. I know that even mkl takes in the area of 16 seconds on 24 threads on xeons, so 88 seconds for openblas on my platform looks plausible. 10 or even 8 seconds (BidMat + supposedly MKL) does not -- something is fishy there.

> > On the other hand, bidmat (which allegedly uses mkl) does the same test, double precision, in under 10 seconds. I can't fathom how, but it does. I have a haswell-E platform.
>
> Multiplication of 10k-by-10k matrices amounts to 200 GFLOP of compute in double precision. A Haswell-E machine provides that within a few seconds, depending on the number of cores (2.4 GHz * 4 doubles with AVX * 2 for FMA = 19.2 GFLOP/sec per core. MKL achieves about 15 GFLOP/sec per core).

So this sounds like a validation of BidMat's results. Interesting. Why is R+openblas so slow then? What is the expected throughput for ViennaCL + OpenMP compared to MKL rates?

How much improvement do you observe/expect from the new pull request; is there any hope of getting closer to MKL dense dgemm?

The primary reason against blas/mkl is that they are yet another platform which, most importantly, we cannot redistribute, being Apache 2 licensed. So we'd have to ask people to install a particular commercial product, but if ViennaCL would cover our sparse algorithm needs, we'd rather just have it all in one package (or at least leverage hardware/software support in steps). We are very limited in resources; that's the reason we are trying to get working with ViennaCL:

-- it has sparse algorithms
-- it supports host/OpenCL/cuda without need for new apis/conversions
-- it does not require installation of any shared libraries beyond what javacpp already does for us automagically. So we basically can drop a jar with javacpp in it into a spark application and have it running on ViennaCL. Even netlib (blas) or the netlib-java api does not make it quite as easy (which btw we cannot redistribute either because of their licenses).

This is hard to beat, especially if ViennaCL becomes well-rounded in performance in most areas of interest; we don't need to depend on a particular flavor of libblas.so being present (or any libblas.so for that matter).

One more question: is it possible to copy one matrix into an OpenCL device while solving another? Thank you! |
From: Karl R. <ru...@iu...> - 2016-07-14 08:45:23
|
Hi again,

> So fast_copy still copies the memory and has copying overhead, even with MAIN_MEMORY context?

Yes. It's a copy() operation, so it just does what the name suggests.

> Is there a way to do shallow copying (i.e. just pointer initialization) to the matrix data buffer? Isn't it what some constructors of matrix or matrix_base do?

Yes, you can pass your pointer via the constructors, e.g.
https://github.com/viennacl/viennacl-dev/blob/master/viennacl/matrix.hpp#L721

> What i am getting at, it looks like i am getting a significant overhead for just copying -- actually, it seems i am getting double overhead -- once when i prepare padding and all as required by the internal_size?(), and then i pass it into the fast_copy() which apparently does copying again, even if we are using host memory matrices.

If you want to 'wrap' your data in a ViennaCL matrix, pass the pointer to the constructors. If you want to quickly copy your data over to memory managed by a ViennaCL matrix, use copy() or fast_copy(). From your description it looks like you are now looking for the constructor calls, but from your earlier email I thought that you were looking for a fast_copy().

> all in all, by my estimates this copying back and forth (which, granted, is not greatly optimized on our side) takes ~15..17 seconds out of 60 seconds total when multiplying 10k x 10k dense arguments via ViennaCL. I also optimize to -march=haswell and use -ffast-math, without those i seem to fall too far behind what R + openblas can do in this test. Then, my processing time swells up to 2 minutes without optimizing for non-compliant arithmetics.

15 seconds of copying for a 10k-by-10k matrix looks way too much. 10k-by-10k is 800 MB of data for double precision, so this should not take much more than 100 ms on a low-range laptop (10 GB/sec memory bandwidth). Even with multiple matrices and copies you should stay in the 1 second regime.

> If i can wrap the buffer and avoid copying for MAIN_MEMORY context, i'd be shaving off another 10% or so of the execution time. Which would make me happier, as i probably would be able to beat openblas given custom cpu architecture flags.

Why do you expect to beat OpenBLAS? Their kernels are really well optimized, and for large dense matrix-matrix you are always FLOP-limited.

> On the other hand, bidmat (which allegedly uses mkl) does the same test, double precision, in under 10 seconds. I can't fathom how, but it does. I have a haswell-E platform.

Multiplication of 10k-by-10k matrices amounts to 200 GFLOP of compute in double precision. A Haswell-E machine provides that within a few seconds, depending on the number of cores (2.4 GHz * 4 doubles with AVX * 2 for FMA = 19.2 GFLOP/sec per core. MKL achieves about 15 GFLOP/sec per core). ViennaCL's host-backend is not strong on dense matrix-matrix multiplies (even though we've got some improvements in a pull request), so for this particular operation you will get better performance from MKL, OpenBLAS, or libflame.

Best regards,
Karli

> On Tue, Jul 12, 2016 at 9:27 AM, Karl Rupp <ru...@iu...> wrote:
>
> > Hi,
> >
> > > One question: you mentioned padding for the `matrix` type. When i initialize the `matrix` instance, i only specify dimensions. how do I know padding values?
> >
> > if you want to provide your own padded dimensions, consider using matrix_base directly. If you want to query the padded dimensions, use internal_size1() and internal_size2() for the internal number of rows and columns.
> >
> > http://viennacl.sourceforge.net/doc/manual-types.html#manual-types-matrix
> >
> > Best regards,
> > Karli
> >
> > > On Tue, Jul 12, 2016 at 5:53 AM, Karl Rupp <ru...@iu...> wrote:
> > >
> > > > Hi Dmitriy,
> > > >
> > > > On 07/12/2016 07:17 AM, Dmitriy Lyubimov wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am trying to create some elementary wrappers for VCL in javacpp.
> > > > >
> > > > > Everything goes fine, except i really would rather not use those "cpu" types (std::map, std::vector) and rather initialize matrices directly by feeding row-major or CCS formats.
> > > > >
> > > > > I see that matrix () constructor accepts this form of initialization; but it really states that it does "wrapping" for the device memory.
> > > >
> > > > Yes, the constructors either create their own memory buffer (zero-initialized) or wrap an existing buffer. These are the only reasonable options.
> > > >
> > > > > Now, i can create a host matrix() using host memory and row-major packing. This works ok it seems.
> > > > >
> > > > > However, these are still host instances. Can i copy host instances to instances on opencl context?
> > > >
> > > > Did you look at viennacl::copy() or viennacl::fast_copy()?
> > > >
> > > > > That might be one way bypassing unnecessary (in my case) complexities of working with std::vector and std::map classes from java side.
> > > > >
> > > > > But it looks like there's no copy() variation that would accept a matrix-on-host and matrix-on-opencl arguments (or rather, it of course declares those to be ambiguous since two methods fit).
> > > >
> > > > If you want to copy your OpenCL data into a viennacl::matrix, you may wrap the memory handle (obtained with .elements()) into a vector and copy that. If you have plain host data, use viennacl::fast_copy() and mind the data layout (padding of rows/columns!)
> > > >
> > > > > For compressed_matrix, there seems to be a set() method, but i guess this also requires CCS arrays in the device memory if I use it. Same question, is there a way to send-and-wrap CCS arrays to an opencl device instance of compressed matrix without using std::map?
> > > >
> > > > Currently you have to use .set() if you want to bypass viennacl::copy() and std::map.
> > > >
> > > > I acknowledge that the C++ type system is a pain when interfacing from other languages. We will make this much more convenient in ViennaCL 2.0. The existing interface in ViennaCL 1.x is too hard to fix without breaking lots of user code, so we won't invest time in that (contributions welcome, though :-) )
> > > >
> > > > Best regards,
> > > > Karli |
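The remark about minding the padding when using fast_copy() can be made concrete. The following is a minimal sketch, assuming a row-major layout in which every row is stored with internal_cols entries and the buffer holds internal_rows such rows (the values one would query via internal_size1() and internal_size2()). pad_row_major is a hypothetical helper, not part of ViennaCL, and the padded sizes a real matrix reports may differ from any value picked by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copies a dense rows-by-cols row-major array into a zero-filled,
// padded row-major buffer with internal_rows-by-internal_cols entries.
std::vector<double> pad_row_major(const std::vector<double>& src,
                                  std::size_t rows, std::size_t cols,
                                  std::size_t internal_rows,
                                  std::size_t internal_cols) {
  assert(src.size() == rows * cols);
  assert(internal_rows >= rows && internal_cols >= cols);
  std::vector<double> dst(internal_rows * internal_cols, 0.0);
  for (std::size_t i = 0; i < rows; ++i) {
    // Row i starts at i * cols in the source, at i * internal_cols in the target.
    std::copy(src.begin() + i * cols, src.begin() + (i + 1) * cols,
              dst.begin() + i * internal_cols);
  }
  return dst;
}
```

This is the extra pass over the data that the "double overhead" complaint above refers to; wrapping an already padded buffer through the constructors avoids the second copy.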
From: Karl R. <ru...@iu...> - 2016-07-14 08:28:53
|
Hi Dmitriy,

> To get a little bit more background, we are thinking of enabling Apache mahout algebra to auto probe for hardware and use ViennaCL-supported in-memory computations there (not to mention additional solvers are just great, some of our basic in-memory java-only algebra is just very slow).
>
> We were thinking, and were having a question, perhaps if we wanted to support all of the possible backend options, it looks like we would have to build 3 different jars: one that loads -l openCL, one that loads cuda, and one that is just all in host memory backend.

You might be fine with just 2: host-only and host+OpenCL. You can use OpenCL on NVIDIA GPUs without any performance drop compared to CUDA. host+OpenCL is also much more build-friendly than anything CUDA-related.

> The reason we don't seem to be able to create just one jar maven module to support all backends in one maven artifact is because once we load the library, it will probably at some point try to load both opencl and cuda, whereas in reality we expect environments that at runtime may have only one (or even none) of those apis configured and supported.
>
> So what we were thinking, perhaps we need 3 separate artifacts eventually, one for host+opencl, one for host+cuda, and if all of that works, then we could also have just host-only module (since cpu benchmarks seem to be quite good too, so if we get this for free, then there's no reason not to use it).
>
> on the downside of this 3 differently compiled modules, it would seem we won't be able to load any two of these modules at the same time since the symbols are likely to clash. Hard to see how much that might be a problem.
>
> Or, is it possible that we are completely misunderstanding this and it is possible to compile a single ViennaCL-enabled .so so that it will load libOpenCL.so if it is available only, and cuda if it is available only etc. etc. (I don't think so, right now loading .so crashes if there is no libOpenCL.so in the system but we try to create opencl context).
>
> Or perhaps it is just possible to probe for support of opencl and cuda and just avoid creating context with non-supported devices by our logic and then we can support all options in one module? That option would be terrific to have. I really don't like having to compile and publish 3 different maven artifacts for each case too much.

What you are asking for is a dynamic load of backends at runtime, just like you can load all kinds of plugins in your web browser dynamically. Currently such a dynamic backend enabling is not supported in ViennaCL (most scientific software does not support that), so you have to work with two or three separate builds of ViennaCL and load the correct one at runtime to avoid symbol clashes.

We intend to provide a dynamic backend detection at runtime with ViennaCL 2.0. Although an earlier poll in this mailing list indicated that most users are fine with the status quo, over the last months I've come to the conclusion that such a dynamic backend detection mechanism is necessary to bring ViennaCL to the next level. Applications such as the one you are working on cannot (or do not want to) afford messy recompilations or managing multiple builds of the different libraries they rely on. This is a topic I touched on in some of my recent talks, so it's something I'm really serious about :-)

Best regards,
Karli |