From: Brad C. <bra...@hp...> - 2020-12-17 01:15:42

Hello Chapel Community —

This is a final reminder for 2020 that we're in the process of retiring these SourceForge-based Chapel mailing lists in favor of our new Discourse site: https://chapel.discourse.group/

If you are interested in keeping in touch with the Chapel community, please be sure to register there, as these mailing lists will be going away very soon. Since my last message:

* We've made the Discourse site publicly readable, such that you don't need to register to browse its contents (though you still do to post).
* We've posted some instructions on how to use Discourse like a mailing list, which can be used to recreate the experience of these mailing lists for those who aren't attracted to web-based forums: https://chapel.discourse.group/t/welcome-to-the-chapel-programming-language-discourse-page/8
* We've added the ability to sign into the site using your GitHub credentials to avoid having to create a new account from scratch.

Best wishes for the end of 2020 and the start of the new year,
-Brad
From: Brad C. <bra...@hp...> - 2020-10-16 02:07:00

Hi Chapel Community —

This is a second reminder that if you feel like it's been a bit too quiet here recently, remember that we're in the process of replacing the Chapel mailing lists with our new Discourse site: https://chapel.discourse.group/

In particular, steer yourself towards the "Announcements" category (https://chapel.discourse.group/c/announcements) to catch up on recent news like:

* Highlights of today's Chapel 1.23.0 release
* Chapel being named a 2020 Bossie Award winner
* How to watch my keynote on "Compiling Chapel" from PACT'20 last week

We hope to see you there!
-Brad
From: Albrecht, B. <ben...@hp...> - 2020-09-28 13:34:26

Hi Suyash,

I suggest starting on the contributing page: https://chapel-lang.org/contributing.html

It should walk you through the entire process, from getting started with Chapel to identifying contributions you can make to the project.

Thanks,
Ben

From: Suyash Patil <suy...@gm...>
Date: Saturday, September 26, 2020 at 12:35 PM
To: "cha...@li..." <cha...@li...>
Subject: [Chapel-developers] Regarding Contributions

Hi, I am Suyash Patil, a second year undergraduate student, currently pursuing Engineering. I am very eager to start my open source journey with Chapel. I know C, C++ and Git. I want to know what projects are available and what is required to contribute to them.

Regards,
Suyash Patil
From: Suyash P. <suy...@gm...> - 2020-09-26 16:34:33

Hi, I am Suyash Patil, a second year undergraduate student, currently pursuing Engineering. I am very eager to start my open source journey with Chapel. I know C, C++ and Git. I want to know what projects are available and what is required to contribute to them.

Regards,
Suyash Patil
From: Brad C. <bra...@hp...> - 2020-09-24 21:55:38

Dear Chapel mailing list subscribers —

After years of talking about it but failing to act, we've finally started the process of retiring Chapel's SourceForge-hosted mailing lists (the ones where you're receiving this message) in favor of a more modern, ad-free way of supporting discussions within the community via email or the web. Specifically, we've launched a Chapel Discourse site.

If you're not familiar with Discourse, it's a web-based technology for discussions that can be used both from a browser or in more of a mailing list mode. Discourse supports 'topics' sorted into 'categories', where you can think of:

* topic = an email thread or a discussion thread on a web forum
* category = like a mailing list or a tag/folder on a web forum

We invite and encourage everyone subscribed here to register and to join us for further discussion about Chapel at: https://chapel.discourse.group/

Once you've registered, I suggest:

* Taking a look at the 'Categories' tab, which is a good way to get an overview of the site, particularly if you're coming from a mailing list mindset: https://chapel.discourse.group/categories Each top-level category should have a pinned "about this category" post that's intended to describe what it's for and how you can post to it via email, once registered.
* Deciding which categories you want to follow or mute.
* Taking a moment to introduce yourself to the community: https://chapel.discourse.group/t/introduce-yourself/
* Sending us your questions and feedback in the "site feedback" category: https://chapel.discourse.group/c/site-feedback

At some point this fall, we will be disabling the SourceForge mailing lists, but for a time, we'll keep both forums going while people work on converting over.

Looking forward to further Discourse with you,
-Brad (on behalf of the Chapel team at HPE)
From: Rohit S. <roh...@gm...> - 2020-09-10 02:06:16

Hi Engin,

Thanks for replying so quickly! I will visit the Gitter channel.

Thanks,
Rohit.

On Wed, Sep 9, 2020 at 9:37 AM Kayraklioglu, Engin <en...@hp...> wrote:
> Hi Rohit,
>
> Thanks for your interest. Note that some of the items in the project idea list have been taken up by this year's GSoC students:
>
> https://summerofcode.withgoogle.com/organizations/4605282207924224/#projects
>
> For contributing to Chapel, start by reading https://chapel-lang.org/contributing.html
>
> The best way for community interaction is through our Gitter channel: https://gitter.im/chapel-lang/chapel
>
> Engin
>
> On 9/9/20, 9:32 AM, "Rohit Shinde" <roh...@gm...> wrote:
>
> > Hello everyone,
> >
> > I have been learning Chapel out of interest for a while now. I was interested in learning a parallel language. While looking for things to play around with in Chapel, I came across the GSoC page where ideas for projects were listed. I was interested in a couple of them, and I was wondering if it would be a good idea to pick one of them up. The ones I am interested in are:
> >
> > 1. Making an iterator library for Chapel. I am quite familiar with Python's iterators and I think I could build something equivalent for Chapel.
> > 2. Developing modules for Chapel's standard library.
> > 3. String performance improvements.
> > 4. Web libraries.
> > 5. Implementing a parser. I have some experience in this area. Not a whole lot. But it would be fun to hack on it until I get something working.
> >
> > Please let me know what you think. I love Chapel a lot and I would like to contribute to the community.
> >
> > Thanks,
> > Rohit.
From: Kayraklioglu, E. <en...@hp...> - 2020-09-09 16:38:01

Hi Rohit,

Thanks for your interest. Note that some of the items in the project idea list have been taken up by this year's GSoC students:

https://summerofcode.withgoogle.com/organizations/4605282207924224/#projects

For contributing to Chapel, start by reading https://chapel-lang.org/contributing.html

The best way for community interaction is through our Gitter channel: https://gitter.im/chapel-lang/chapel

Engin

On 9/9/20, 9:32 AM, "Rohit Shinde" <roh...@gm...> wrote:

> Hello everyone,
>
> I have been learning Chapel out of interest for a while now. I was interested in learning a parallel language. While looking for things to play around with in Chapel, I came across the GSoC page where ideas for projects were listed. I was interested in a couple of them, and I was wondering if it would be a good idea to pick one of them up. The ones I am interested in are:
>
> 1. Making an iterator library for Chapel. I am quite familiar with Python's iterators and I think I could build something equivalent for Chapel.
> 2. Developing modules for Chapel's standard library.
> 3. String performance improvements.
> 4. Web libraries.
> 5. Implementing a parser. I have some experience in this area. Not a whole lot. But it would be fun to hack on it until I get something working.
>
> Please let me know what you think. I love Chapel a lot and I would like to contribute to the community.
>
> Thanks,
> Rohit.
From: Rohit S. <roh...@gm...> - 2020-09-09 16:31:58

Hello everyone,

I have been learning Chapel out of interest for a while now. I was interested in learning a parallel language. While looking for things to play around with in Chapel, I came across the GSoC page where ideas for projects were listed. I was interested in a couple of them, and I was wondering if it would be a good idea to pick one of them up. The ones I am interested in are:

1. Making an iterator library for Chapel. I am quite familiar with Python's iterators and I think I could build something equivalent for Chapel.
2. Developing modules for Chapel's standard library.
3. String performance improvements.
4. Web libraries.
5. Implementing a parser. I have some experience in this area. Not a whole lot. But it would be fun to hack on it until I get something working.

Please let me know what you think. I love Chapel a lot and I would like to contribute to the community.

Thanks,
Rohit.
From: Brad C. <bra...@hp...> - 2020-09-08 22:20:42

Hi Damian —

Sorry for the belated response. I believe that what you've written here should be fine performance-wise; specifically, that no temporary will be introduced to capture the RHS '[(r, c) in slab] ...' expression.

-Brad

On Tue, 1 Sep 2020, Damian McGuckin wrote:

> On another point in the same code, I try and grab several adjacent rows from the original matrix 'v' and transpose them, and put them into what I call a slab. It is not a tile like you see in the Chapel DGEMM.
>
>     var vslab : [cslice, common] R;
>     // either
>     [(r, c) in vslab.domain] vslab[r, c] = v[c, r];
>     // or
>     [j in cslice] vslab[j, common] = v[j, common];
>
> where R is a general real(?w).
>
> Technically vslab is 'const', so I stabbed in the dark and tried
>
>     const slab : domain(2) = (cslice, common);
>     const vslab = [(r, c) in slab] v[c, r];
>
> It seems to run in the same elapsed time, is genuinely 'const', and looks cleaner.
>
> Does it create any unnecessary data, i.e. does it create a temporary on the right before assigning to vslab, or does it only allocate the cslice*common real(?w) numbers?
>
> Thanks - Damian
>
> Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
> Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
> Views & opinions here are mine and not those of any past or present employer
From: Ferguson, M. P. P. (C. Developer) <mic...@hp...> - 2020-09-08 13:23:01

Hi Damian -

A tiny bit of follow-up is inline below:

> That expands my understanding greatly. Your excellent explanation belongs in the documentation somewhere.

If you can think of a good place to add it, feel free to make a PR pasting it into the documentation somewhere that makes sense to you, and we can, in the PR review process, get it to something that renders reasonably well and makes sense to me as well.

> > In your example, a task calling `putStage` could, on a non-x86 system, end up storing the value of `stage[i]` in a per-core cache somewhere (let's say, L1 cache, or a write reorder buffer), and put off committing it to memory indefinitely. As a result, the parallel program would have a load imbalance problem since the current value of this variable isn't being communicated to the other tasks.
>
> Understood. If I update a column of a matrix in one core in some parallel task T, I would still hope that if subsequently, i.e. in a serial sense and after the task T has finished, I try and read that same column of the matrix in another core in another parallel task T', I see only the updated column.
>
> As I said, I assume that a program spawning 2 tasks, say T1 and T2, updating columns 1 and 2 respectively of a matrix, can be assured on completion of both T1 and T2 that any subsequent tasks will see only the values written by T1 and T2 into the matrix.

Yes. The MCM chapter of the spec says this:

> Chapel's fork-join constructs introduce additional order dependencies. Operations within a task cannot behave as though they started before the task started. Similarly, all operations in a task must appear to be completed to a parent task when the parent task joins with that task.

https://chapel-lang.org/docs/language/spec/memory-consistency-model.html

I view your questions above as a rephrasing of / corollary to the same idea.

Best,
-michael
From: Damian M. <da...@es...> - 2020-09-06 06:42:29

Michael,

On Fri, 4 Sep 2020, Ferguson, Michael Paul Pratt (Chapel Developer) wrote:

> Here I am interpreting "locking" as a software lock that protects a critical section and causes other tasks trying to enter the critical section to wait. (Like `pthread_mutex_lock`.)

That was my understanding, although I thought it was simpler than a pthread_mutex_lock.

> Atomics don't use locks on any normal configurations of Chapel. Atomics are generally pretty fast and are directly supported by the processor (or possibly the network). They use special CPU instructions to ensure atomicity. One potential source of confusion is that on x86 these might involve the `lock` prefix; but AFAIK that instruction prefix has this name for historical reasons and might more reasonably be named something else like `atomic`.

That expands my understanding greatly. Your excellent explanation belongs in the documentation somewhere.

> You can use a relaxed atomic to get pretty much the same effect as a ``volatile`` in your program and I would expect it to have similarly low overhead.

Ditto. But I need to learn more about all the various types of atomic. And clear documentation on that seems very thin on the ground. Well, it was the last time I looked for C++ content.

> If you do some research on volatile, you might learn that it's not able to control the way the CPU itself optimizes loads/stores (rather, it only prevents the compiler from doing so). While that might be OK on x86, it would not function on other platforms like ARM, for example. (x86 uses TSO - Total Store Order - which is a stronger constraint than other platforms.)

OK. Stopping the compiler doing naughty things is a good start, as I have found when trying to interrogate the Floating Point Control/Status Register.

> In your example, a task calling `putStage` could, on a non-x86 system, end up storing the value of `stage[i]` in a per-core cache somewhere (let's say, L1 cache, or a write reorder buffer), and put off committing it to memory indefinitely. As a result, the parallel program would have a load imbalance problem since the current value of this variable isn't being communicated to the other tasks.

Understood. If I update a column of a matrix in one core in some parallel task T, I would still hope that if subsequently, i.e. in a serial sense and after the task T has finished, I try and read that same column of the matrix in another core in another parallel task T', I see only the updated column.

> Or, worse than that, the write in `putStage` might store something in memory that is neither the previous value nor the new value but some mix of the two. (For example, maybe only the low byte is set to the new value on a platform only able to write 1 byte at a time to memory.) It is hard to see how the algorithm could function correctly in this situation.

Agreed. See my earlier comment.

> These cases are part of the reason that processors support atomics - they allow different tasks/threads/cores to communicate in a reasonable manner.

As I said, I assume that a program spawning 2 tasks, say T1 and T2, updating columns 1 and 2 respectively of a matrix, can be assured on completion of both T1 and T2 that any subsequent tasks will see only the values written by T1 and T2 into the matrix.

> But you don't have to just take it from me

I believe you. Your knowledge is good enough for me.

> The atomic section of the spec is here:
>
> https://chapel-lang.org/docs/master/language/spec/task-parallelism-and-synchronization.html#atomic-variables

I have read that numerous times and found it less than useful.

> However the information you probably need is here:

You are very correct.

> https://chapel-lang.org/docs/language/spec/memory-consistency-model.html#relaxed-atomic-operations

I had seen/read this - many times. Every time I try to read it, my head hurts!

> However the specification does not currently describe acquire, release, or acqRel orderings. I will add an open issue note so that it is clearer that this is missing from the spec and not just something one isn't finding.

Great.

> I will update the link to refer to the right section (which BTW we could not do before with the PDF spec).

Thanks.

> PR #16341 will make the documentation improvements I mentioned here.

Perfect.

Thanks for the insight - Damian

P.S. If you want to know how useful your advice is, read on. For my problem, I will now use atomics. I believe relaxed ordering will do the job, based on my (very limited) understanding of the details of memory consistency. This is for a Jacobi SVD.

My algorithm, now using atomics, with a matrix of order 36, i.e. a 36x36 array of real(w)'s, on my 6-core machine, would have 35 tasks created by a forall over 1..N-1. On average, almost 6 tasks going concurrently all the time. Each such task would, on average, process N/2 columns of the matrix one after the other.

You can alternatively reorganize the algorithm into M (sequential) steps. Each one of those M steps, say the I'th, would involve multiple Jacobi rotations on a full column pair within the matrix. Each such column pair operation can be run in parallel, as each I'th step has only operations which are independent of each other. No need for atomics. There are 70 such groups for the 36x36 matrix, requiring a total of 630 tasks, where there are on average between 1 and 18 tasks per group. Obviously you can merge some of the column pairs to reduce the parallelization load. But the code to do this seriously obscures the logic of the underlying algorithm and hence the program's readability, and makes the code parallel-centric, which goes against all of our programming KPIs.

There are several other reordering algorithms, but I do not understand them well enough to program them in Chapel, nor can I find a remotely readable parallel implementation of them in any reference in any language. Apart from the readability concerns, it is the nominal 600+ tasks for the algorithm which avoids atomics, as opposed to 36 tasks for one which needs atomics. I will run with the code which uses 'atomics'. There are also only 6 extra LOC (lines of code within the algorithm) over the serial version, and the algorithm details are unchanged. The discrepancy just gets worse for even larger matrices.

Cache misses do occur in both approaches because they need to process a column-major matrix by columns. Unavoidable, sadly.

For reasonably sized matrices, the Jacobi algorithm is arguably now the algorithm of preference for SVD. It is far more accurate than a 1970s Golub+Reinsch (Householder reduction-based) SVD such as that which I have provided as a reference point for the GSoC project. On the down side, it has more floating point computations than the Golub+Reinsch approach. But on the plus side, from a parallelization perspective, the Jacobi algorithm (with atomics as I describe) will, I think, have a much reduced parallelism overhead compared to my Golub+Reinsch code.
From: Ferguson, M. P. P. (C. Developer) <mic...@hp...> - 2020-09-04 13:21:45

Hi Damian -

(Just responding to the 2nd part of your mail.) See reply inline below.

> So, to avoid treading on the heels of its predecessor, Task#I+1 is continually reading/polling stage[I], which it must read from memory to ensure it knows what Task#I is doing. Again no locking is required. This is your garden variety C volatile variable. How do I handle it in Chapel? An atomic variable seems like massive overkill, as I assume locking is involved with an atomic. I can access stage[??] through a pair of custom (tiny) external C routines
>
>     void putStage(long *stage, long i, long k) { stage[i] = k; }
>
> and
>
>     long getStage(long *stage, long i) { return stage[i]; }
>
> But that seems like I am sticking my head in the sand and avoiding the underlying problem. And it probably cripples any optimization being done in the code which is calling these routines. I still contend that Chapel needs a 'vol[atile]' declaration concept or something like it. The identifier so declared is much like a 'var[iable]', but it must always be accessed through, i.e. written-to/read-from, memory. But I do not know enough of Chapel's big picture, so I could be talking through my hat! I wish I had known enough of Chapel to be a productive participant in the conversation in 2012 when volatile got removed from Chapel 1.15.0. Then again, maybe I am talking about peek/poke for atomics, which appears to have lesser overhead than other types of memory ordering. But again, I have no idea what the overhead for these really is, but it all seems way too high for what I want.

> An atomic variable seems like massive overkill, as I assume locking is involved with an atomic.

Here I am interpreting "locking" as a software lock that protects a critical section and causes other tasks trying to enter the critical section to wait. (Like `pthread_mutex_lock`.)

Atomics don't use locks on any normal configurations of Chapel. Atomics are generally pretty fast and are directly supported by the processor (or possibly the network). They use special CPU instructions to ensure atomicity. One potential source of confusion is that on x86 these might involve the `lock` prefix; but AFAIK that instruction prefix has this name for historical reasons and might more reasonably be named something else like `atomic`.

You can use a relaxed atomic to get pretty much the same effect as a ``volatile`` in your program, and I would expect it to have similarly low overhead.

If you do some research on volatile, you might learn that it's not able to control the way the CPU itself optimizes loads/stores (rather, it only prevents the compiler from doing so). While that might be OK on x86, it would not function on other platforms like ARM, for example. (x86 uses TSO - Total Store Order - which is a stronger constraint than other platforms.)

In your example, a task calling `putStage` could, on a non-x86 system, end up storing the value of `stage[i]` in a per-core cache somewhere (let's say, L1 cache, or a write reorder buffer), and put off committing it to memory indefinitely. As a result, the parallel program would have a load imbalance problem, since the current value of this variable isn't being communicated to the other tasks.

Or, worse than that, the write in `putStage` might store something in memory that is neither the previous value nor the new value but some mix of the two. (For example, maybe only the low byte is set to the new value on a platform only able to write 1 byte at a time to memory.) It is hard to see how the algorithm could function correctly in this situation.

These cases are part of the reason that processors support atomics - they allow different tasks/threads/cores to communicate in a reasonable manner.

But you don't have to just take it from me - the Linux kernel developers also frown upon using volatile; see https://www.kernel.org/doc/html/latest/process/volatile-considered-harmful.html

> By the way, the online Chapel documentation on atomics does not appear to explain (or link to an explanation of) memory ordering types. In an older PDF document, it refers to C11, which then refers to the C++ definition and other documentation.

The atomic section of the spec is here:

https://chapel-lang.org/docs/master/language/spec/task-parallelism-and-synchronization.html#atomic-variables

However, the information you probably need is here:

https://chapel-lang.org/docs/language/spec/memory-consistency-model.html#relaxed-atomic-operations

However, the specification does not currently describe acquire, release, or acqRel orderings. I will add an open issue note so that it is clearer that this is missing from the spec and not just something one isn't finding.

> If you click on the Chapel Language Specification you land on a page which has no reference whatsoever about atomics. If I actually had any solid grasp of the subject, I would offer to rewrite it, but I do not, which is why I am reading about it in the first place. I find that by the time I have gone through all the links to links, I have long forgotten what my precise original Chapel problem was. Just my 2c. Might be worth putting onto the to-do list.

I will update the link to refer to the right section (which BTW we could not do before with the PDF spec). PR #16341 will make the documentation improvements I mentioned here.

Best,
-michael
From: Damian M. <da...@es...> - 2020-09-04 05:52:33
|
Hi Ben, Michael, I am guessing that the stuff of most interest to yourself is in the latter half of this email but the rest is background. Hopefully I am not rambling too much. On Wed, 2 Sep 2020, Albrecht, Ben wrote: > For example, say task 0 processes column pairs (1..N, 2) serially. After > completing (2,2), task 0 create a new parallel task (task 1) to process > the pairs (2..N, 3). Not quite. Sorry for my poor explanation. Let's assume that Task #0 is the controlling process. It starts: * Task #1 fto processes column PAIRS (row#1, columns#1..N) serially After Task#1 has completed (1,1), it starts * Task #2 with qualifications: - Task#2 is given responsibility for (row#2, columns#1..N) but - initially you want Task#2 to only process (2,1), i.e. column#1 -- because Task#1 is by now busy processing column#2 and we -- do not want Task#2 and come along and mess with that work - Task#2 must check with Task#1 before proceeding to the next column, i.e. -- it can only process column (2,2) if the parent has finished with -- (1,2) it, and it can only process (2,3) if the parent has finished -- with (1,3) and so on until it gets to (2, N). After Task#2 has completed (2,1) it starts * Task #2 with qualifications: - Task#2 is given responsibility for (row#2, columns#1..N) but ... same stuff as above ... To give each spawned task (that is currently processing row I) the job of spawning Task#I+1 for row I+1 seems less than optimal. But maybe I am missing something. Also, Task#I+1 continually needs to check with Task#I whether it can step to the next column, i.e. to ensure that its creator (i.e. Task#I) has completed its own processing of the next column that Task I+1 now wants to process. So, I (maybe wrongly) let Task#0 handle all of the spawning, which it hopefully does in such a way as to be optimal for the underlying system (or locales of systems) using a 'forall'. Additionally, the overhead I saw in my last use of a 'cobegin' was so high that I now stay away from it. 
It was awful. I have yet to experiment seriously with 'coforall'. I very rarely see the need for a begin. > In the task-parallel representation, you are creating the tasks as > needed, rather than spinning them all up at once and having them wait > for work. Even if I start up the tasks as needed, each new task still needs to communicate with its creater to know if it is safe to proceed to the next column. So I cannot see what your approach buys but I am always happy to learn. I think that I did not clearly explain that each task needs to keep monitoring what its creator is doing. > However, this may require some effort to reach desired performance. For > example, a naive implementation of this representation would create N > tasks, rather than creating a number tasks appropriate for your hardware > like the forall approach does. I like the fact that forall creates tasks appropriate to the hardware. > See https://chapel-lang.org/docs/master/primers/taskParallel.html and > https://chapel-lang.org/docs/master/users-guide/index.html#task-parallelism > for more background on task parallelism in Chapel Thanks for that although I found that this did not explain much. But that is probably because I am coming from such a beginners-level base. Also, somebody needs to add something about the overhead in each approach (which is way beyond my knowledge currently). This next two paragraphs needs some insight/input from others as they are related to some past discussions. In my algorithm, Task#I must keep track of where it is up, so let's define an array of int's, stage[0..N] : int where stage[I] reflects the current column that task#I has completed. Each element of that array, i.e. stage[I] is updated by only Task#I so there is no need for the overhead that an atomic variable would involve. Task#I+1 must read stage[I] as it steps through the columns to make sure it does not try to update column 'K' before Task#I has finished with it. 
Task#I+1 should run slightly behind Task#I because it starts later. In practice. a a thread started after another thread may not always run behind a thread started before it. So, to avoid treading on the heals of its predecessor, Task#I+1 is continually reading/polling stage[I] which it must read it from memory to ensure it knows what Task#I is doing. Again no locking is required. This is your garden variety C volatile variable. How do I handle it in Chapel? An atomic variable seems like massive overhill as I assume locking is involved with an atomic. I can access stage[??] through a pair of custom (tiny) external C routines void putStage(long *stage, long i, long k) { stage[i] = k; } and long getStage(long *stage, long i) { return stage[i]; } But that seems like I am sticking my head in the sand and avoiding the underlying problem. And it probably cripples any optimization being done in the code which is calling these routines. I still contend that Chapel needs a 'vol[atile]' declaration concept or something like it. The identifier so declared is much like a 'var[iable]' but it must always be accessed through, i.e. written-to/read-from, memory. But I do not know enough of Chapel's big picture so I could be talking through my hat! I wish I had known enough of Chapel to be a productive participant in the conversation in 2012 when volatile got removed from Chapel 1.15.0. Then again, maybe I am talking about peek/poke for atomics which appears to have lesser overhead than other types of memory ordering. But again, I have no idea what the overhead for these really is but it all seems way too high for what I want. By the way, the online Chapel documentation on atomics does not appear to explain (or link to an explanation of) memory ordering types. In an older PDF document, it refers to C11 which then refers to the C++ definition and other documentation. 
Even in the current primers

    https://chapel-lang.org/docs/primers/atomics.html?highlight=atomic

it says

    For more information on Chapel's atomics, see the Chapel Language Specification.

If you click on "Chapel Language Specification" you land on a page which has no reference whatsoever to atomics. If I actually had any solid grasp of the subject, I would offer to rewrite it, but I do not, which is why I am reading about it in the first place. I find that by the time I have gone through all the links to links, I have long forgotten what my precise original Chapel problem was.

Just my 2c. Might be worth putting onto the to-do list.

Thanks - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer |
From: Albrecht, B. <ben...@hp...> - 2020-09-02 18:31:56
|
(resending to the mailing list)

On 9/2/20, 2:28 PM, "Albrecht, Ben" <ben...@hp...> wrote:

Hi Damian,

I don't have much to add on improving this algorithm implementation. At a high level, the load imbalance and data dependence across parallel iterations makes me wonder if a lower-level recursive task-parallel implementation would be more suitable for this algorithm.

For example, say task 0 processes column pairs (1..N, 2) serially. After completing (2,2), task 0 creates a new parallel task (task 1) to process the pairs (2..N, 3). After task 1 processes (3,3), it creates a new parallel task to process (3..N, 4), and so on.

In the task-parallel representation, you are creating the tasks as needed, rather than spinning them all up at once and having them wait for work.

However, this may require some effort to reach desired performance. For example, a naive implementation of this representation would create N tasks, rather than creating a number of tasks appropriate for your hardware like the forall approach does.

See https://chapel-lang.org/docs/master/primers/taskParallel.html and https://chapel-lang.org/docs/master/users-guide/index.html#task-parallelism for more background on task parallelism in Chapel.

Hope that helps.

Thanks,
Ben

On 8/29/20, 12:10 AM, "Damian McGuckin" <da...@es...> wrote:

Hi,

There exists a need to process the N columns of an array 'U' as

    for j in 1..N do
    {
        // treat column j as the anchor column
        for k in j+1..N do   // Stage 'j' processing
        {
            // mess with operations on column 'j' and 'k'
        }
    }

The inner loops can be treated as independent of each other subject to a constraint, i.e. there is a need to guarantee (or somehow enforce) that when stage 'j' wants to process column 'k', it knows (or can check) that processing of that same column 'k' by the prior stage(s), i.e. 'j-1', is completed (or can twiddle its thumbs waiting for that to happen before proceeding). That last statement is obviously recursive, although it does not need to be programmed as such.
One can program this (maybe poorly) as

    var stage : [0..N] atomic int;

    stage[0].write(N); // define that the 'ghost' precursor stage
                       // has completed processing of all columns
    forall j in 1..N-1 do
    {
        // define that this stage has completed NO columns
        stage[j].write(0);

        // this routine must update stage[j] with 'k' when
        // it has finished its own processing of column 'k'
        processStage(j, U, ......);
    }

and then update stage[j] within processStage(j, U, ......) to reflect the column 'last processed', say 'k', in that stage. This then allows the next stage, processStage(j+1, U, ......), to inspect that variable, i.e. stage[j], during its own operation to ensure that it does not attempt to use any column 'i' where 'i > k'.

This approach involves waiting, which is a big no-no, and demands that any distributed implementation update what amounts to a variable in the primary locale. This is less than ideal, although the overhead is yet to be quantified.

There is an upside. Because one would probably process columns in blocks, say of 4 or 8 (or at a pinch 16), the apparent need to test the atomic variable every column drops by that same blocking factor. So, while the waiting is not so critical (even if it is detrimental to the algorithm's readability), the atomic variables in the primary locale are still a worry.

Are there better ways to attack this problem?

And yes, if looking at parallel Jacobi SVD sweeps, there are algorithms that try to parallelize that logic very differently. But they did not have the benefit of Chapel at their disposal at the time they were being developed. And besides, they are quite complex and do even naughtier things to the readability of the algorithm. Avoiding them would really, really, be a desirable thing.

Thanks - Damian

_______________________________________________
Chapel-developers mailing list
Cha...@li...
https://lists.sourceforge.net/lists/listinfo/chapel-developers |
From: Ferguson, M. P. P. (C. Developer)
<mic...@hp...> - 2020-09-02 15:45:27
|
Hi Damian -

We've removed the PDF spec since we wanted to more easily link between spec sections and other documentation. The spec is here:

https://chapel-lang.org/docs/language/spec/index.html

Right now there is one webpage per chapter, but we could consider making an all-in-one-page view of the spec and/or a pre-rendered PDF version if that was useful. Printing one of the chapters to PDF creates something relatively manageable on my system.

-michael

> Where does this live these days please? I find perusing a PDF file more
> useful sometimes than messing around within a browser, especially when my
> mouse hand is busy holding a glass or a cup of refreshment.
>
> Regards - Damian
>
> Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
> Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
> Views & opinions here are mine and not those of any past or present employer

_______________________________________________
Chapel-developers mailing list
Cha...@li...
https://lists.sourceforge.net/lists/listinfo/chapel-developers |
From: Damian M. <da...@es...> - 2020-09-02 03:53:27
|
Where does this live these days please? I find perusing a PDF file more useful sometimes than messing around within a browser, especially when my mouse hand is busy holding a glass or a cup of refreshment. Regards - Damian Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037 Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here Views & opinions here are mine and not those of any past or present employer |
From: Damian M. <da...@es...> - 2020-09-01 08:24:52
|
On Mon, 31 Aug 2020, Brad Chamberlain wrote:

> What link did you use for the .zip? I just used the "Source code (zip)"
> link from:
>
> https://github.com/chapel-lang/chapel/releases/tag/1.22.1
>
> and it seemed to unpack fine for me.

The problem is not the 1.22.1 release. That does not have Rahul's stuff in it.

I went to the master repository and clicked 'Code' and it gave me chapel-master.zip. While it listed the table of contents cleanly, it could not extract things because it had problems with file names which were too long.

I had to update unzip from 6.0.1 on my system. I now have 6.0.5 and it can extract chapel-master.zip cleanly.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer |
From: Damian M. <da...@es...> - 2020-09-01 07:12:16
|
Brad,

As often happens, you did not solve my problem but gave me enough extra insight to allow me to solve my own problems.

On Mon, 31 Aug 2020, Brad Chamberlain wrote:

> Original:
>
>> forall r in rows do
>> {
>>   const ref ur = u[r, ..];
>>
>>   for j in cslice do
>>   {
>>     t[r, j] = vmDot(common, ur, vslab[j, ..]);
>>   }
>> }
>
> Forall expr:
>
>> forall r in rows do
>> {
>>   const ref ur = u[r, ..];
>>   const x = [j in cslice] vmDot(common, ur, vslab[j, ..]);
>>
>>   t[r, cslice] = x;
>> }
>
> A way to check would be to write the initialization of 'x' as:
>
>> const x = for j in cslice do vmDot(common, ur, vslab[j, ..]);
>
> If this returned the lost 5%, I think that's the answer.

Sadly no. And I should correct myself, it is 6%. Both your suggestion and my attempt labelled "Forall expr" above take about 8.5 seconds for my GEMM of 4000*4000 on my old 6-core E5-1660. On the other hand, the attempt labelled "Original", which has no intermediate copy, takes 8 seconds, or say 7.96, or 8.04, or ...

The problem turns out to be the temporary, which should have been obvious to me. Avoiding the temporary altogether with

    t[r, cslice] = for j in cslice do vmDot(common, ur, vslab[j, ..]);

(which I will call the Single Statement approach) recovers that 6%. Interestingly, the Original approach

    for j in cslice do
    {
        t[r, j] = vmDot(common, ur, vslab[j, ..]);
    }

takes the same time as the Single Statement approach, so I will stick with it.

Thanks for the insight - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer |
From: Damian M. <da...@es...> - 2020-09-01 06:37:42
|
On another point in the same code, I try to grab several adjacent rows from the original matrix 'v', transpose them, and put them into what I call a slab. It is not a tile like you see in the Chapel DGEMM.

    var vslab : [cslice, common] R;

    // either
    [(r, c) in vslab.domain] vslab[r, c] = v[c, r];
    // or
    [j in cslice] vslab[j, common] = v[j, common];

where R is a general real(?w). Technically vslab is 'const', so I stabbed in the dark and tried

    const slab : domain(2) = {cslice, common};
    const vslab = [(r, c) in slab] v[c, r];

It seems to run in the same elapsed time, is genuinely 'const', and looks cleaner. Does it create any unnecessary data, i.e. does it create a temporary on the right before assigning to vslab, or does it do it only in cslice*common real(?w) numbers?

Thanks - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer |
From: Brad C. <bra...@hp...> - 2020-09-01 02:01:00
|
Hi Damian —

What link did you use for the .zip? I just used the "Source code (zip)" link from:

https://github.com/chapel-lang/chapel/releases/tag/1.22.1

and it seemed to unpack fine for me.

-Brad

On Mon, 31 Aug 2020, Damian McGuckin wrote:

> On Mon, 31 Aug 2020, Damian McGuckin wrote:
>
>> On Sat, 29 Aug 2020, Damian McGuckin wrote:
>>
>>> Can you remind me how to grab a copy of master?
>>
>> Don't worry. I dragged the information from the deep reaches of my brain.
>
> I downloaded the '.zip' file from GitHub.
>
> Unzipped the file but something is wrong with it. It is corrupt, as it tried
> to create files with names which are actually Chapel programs.
>
> Has anybody seen this?
>
> Tried it twice.
>
> Regards - Damian
>
> Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
> Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
> Views & opinions here are mine and not those of any past or present employer |
From: Brad C. <bra...@hp...> - 2020-09-01 01:56:27
|
Hi Damian —

Taking your three versions:

Original:

> forall r in rows do
> {
>   const ref ur = u[r, ..];
>
>   for j in cslice do
>   {
>     t[r, j] = vmDot(common, ur, vslab[j, ..]);
>   }
> }

Forall expr:

> forall r in rows do
> {
>   const ref ur = u[r, ..];
>   const x = [j in cslice] vmDot(common, ur, vslab[j, ..]);
>
>   t[r, cslice] = x;
> }

Succinct:

> forall r in rows do
> {
>   const x = [j in cslice] vmDot(common, u[r, ..], vslab[j, ..]);
>
>   t[r, cslice] = x;
> }
>
> slows down seriously, about 25+%.

I believe the difference between the final two is a simple case of Chapel not doing loop hoisting optimizations for non-trivial expressions. Specifically, you and I can see that `u[r, ..]` is independent of the value of 'j' so could be evaluated once and re-used for all iterations of the 'j' loop, but the Chapel compiler isn't mature enough to do this yet. So your "Forall expr" version gets an improvement by manually hoisting the evaluation of that expression out of the loop.

The delta between the original and forall expression versions is less obvious, but I would guess that it could be due to the use of nested parallelism (though we'd hope that the impact would be more minimal than 5%, at least for loops with large trip counts). Specifically, by default, '[j in cslice]' will be executed in parallel, but it'll first check to see whether there's already a task per core, and if so, will serialize the loop. Maybe this execution-time check is adding the 5% overhead? A way to check would be to write the initialization of 'x' as:

> const x = for j in cslice do vmDot(common, ur, vslab[j, ..]);

If this returned the lost 5%, I think that's the answer.

-Brad |
From: Damian M. <da...@es...> - 2020-09-01 01:12:15
|
On Mon, 31 Aug 2020, Brad Chamberlain wrote:

> That error message suggests to me that you're not compiling with
> version 1.22.0. Could you run `chpl --version` in that workspace to
> verify?

Oops. Senior's moment. I changed our system last week to have the Chapel compiler as part of everybody's environment. And made a typo. Isn't it nice when your mistakes are there for the whole world to see!

Thanks - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer |
From: Brad C. <bra...@hp...> - 2020-08-31 20:11:11
|
Hi Damian —

That error message suggests to me that you're not compiling with version 1.22.0. Could you run `chpl --version` in that workspace to verify?

-Brad

On Mon, 31 Aug 2020, Damian McGuckin wrote:

> I compiled it with 1.22.0 and I get:
>
> $CHPL_HOME/modules/internal/DefaultRectangular.chpl:582: In function 'dsiDim':
> $CHPL_HOME/modules/internal/DefaultRectangular.chpl:583: error: tuple index 0 is out of bounds
> $CHPL_HOME/modules/internal/DefaultRectangular.chpl:583: note: tuple elements start at index 1
> $CHPL_HOME/modules/internal/ChapelArray.chpl:1392: Function 'dsiDim' instantiated as: dsiDim(this: borrowed domain(2,int(64),false), param d = 0)
>
> Sure, the Matrix is indexed from 0 but I can see nothing in the code that
> should stop it compiling with 1.22.0. Besides, 1.22.0 is the release with
> tuples which begin with 0.
>
> Regards - Damian
>
> Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
> Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
> Views & opinions here are mine and not those of any past or present employer |
From: Rahul G. <u61...@an...> - 2020-08-31 19:29:00
|
Hi Damian,

I just did a fresh install from the zip file I downloaded from GitHub and it built without any issues.

Regards,
Rahul

--
Rahul Ghangas
Advanced Computing (R&D) (Honours)
The Australian National University
Ph- +61 0435040074
Email - rah...@an... , rah...@gm...

> On Sep 1, 2020, at 5:01 AM, Damian McGuckin <da...@es...> wrote:
>
> On Mon, 31 Aug 2020, Damian McGuckin wrote:
>
>> On Sat, 29 Aug 2020, Damian McGuckin wrote:
>>
>>> Can you remind me how to grab a copy of master?
>>
>> Don't worry. I dragged the information from the deep reaches of my brain.
>
> I downloaded the '.zip' file from GitHub.
>
> Unzipped the file but something is wrong with it. It is corrupt, as it tried to create files with names which are actually Chapel programs.
>
> Has anybody seen this?
>
> Tried it twice.
>
> Regards - Damian
>
> Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
> Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
> Views & opinions here are mine and not those of any past or present employer |
From: Damian M. <da...@es...> - 2020-08-31 19:01:33
|
On Mon, 31 Aug 2020, Damian McGuckin wrote:

> On Sat, 29 Aug 2020, Damian McGuckin wrote:
>
>> Can you remind me how to grab a copy of master?
>
> Don't worry. I dragged the information from the deep reaches of my brain.

I downloaded the '.zip' file from GitHub.

Unzipped the file but something is wrong with it. It is corrupt, as it tried to create files with names which are actually Chapel programs.

Has anybody seen this?

Tried it twice.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer |