Thread: RE: [SSI-devel] Re: get shared memory and user space talking
From: Walker, B. J <bru...@hp...> - 2003-10-21 21:25:29
|
Frank, Stevie: This looks like it started as a private exchange and you didn't include Stevie's original message, so responding is a little difficult.

Anyway, I have had some communication with Asmita (of the geek girls - love the name). The migshm extension to openMosix provides two things. The first is a way to migrate when you have a shared memory segment attached. OpenSSI already has that, and in fact does so in a more coherent manner than is possible with migshm (see the comment below about the use of semaphores). The second feature migshm has is that it will re-join clones that have both migrated to the same node (so they share their process address space again). What I was hoping they had was a way to actually coherently have the clones on different nodes. I'm pretty sure that is not provided.

I believe we could provide the memory re-join capability pretty trivially in the current code base (I'll check with John Byrne on this). The concern we had about migrating clones was making sure they all migrated at the same time and to the same node. I don't know if openMosix does anything to guarantee that. If you don't guarantee it, the process data space can become incoherent and the processes are in trouble. John was telling me that the current base code now has a primitive to rendezvous all the threads, so migrating the thread group together is more feasible. We are certainly interested in migrating multi-threaded processes (clones), so stay tuned.

Bruce

*******************************************************************************************
Some notes from Asmita about migshm:

--- It's a bit different. At least one node (the owner node of the shared memory segment) has the latest synced entire copy. All other nodes either have the latest copy or don't have the copy at all, and a process accessing a shared memory page there would result in a page fault, which we route to the owner node so that it gets the latest copy.
So even then we have a single point of failure for the shared memory segment.

--- Well, openMosix creates a new memory map for a migrated process on the new node. We make sure that when two processes attached to the same shared memory section (or two clones) get migrated to the same node, they share the same physical pages of the shm segment (or the memory map in the case of clones) on the new node, just as they would had they not been migrated at all.

About the coherency, migShm assumes that processes use semaphores to synchronize accesses to the shared memory segment. At the time of release of the semaphore, we sync up the changes to the owner node's copy (we send only the dirty pages) and invalidate the PTEs on the rest of the nodes. So when a process running on a node which is neither the owner node nor that of the last writer accesses the page, it page faults. We route this page fault to the owner node and get the latest page from there.

You can get more details about this in http://www.mcaserta.com/maask/Migshm_Report.pdf.

Regards,
Asmita
************************************************************************************
> From: Frank Mayhar [mailto:fr...@ex...]
> Sent: Monday, October 20, 2003 7:51 PM
> To: stevie mckibbin
> Cc: ssi...@li...
> Subject: [SSI-devel] Re: get shared memory and user space talking
>
> Hi, I'm not ignoring you, I've just been a bit busy with other things
> (mostly trying to find a job).
>
> The place you want to start is the SysV Shared Memory IPC handling. The
> underlying implementation is exactly what you want. Basically, if my
> (limited) understanding is correct, there's a vnode (at least one) for
> the shared memory segment. This vnode is handled by CFS; there's sort
> of a file system that describes that kind of shared memory, or at least
> CFS is stacked on top of the SHM implementation.
> Look at the file openssi/kernel/cluster/ssi/cfs/cfs_ipcshm.c for some
> clues. I've CC'd this to the devel list, so Dave can explain more fully
> if he wants...
>
> The real question I have is, what is backing a process's virtual address
> space? The executable itself is backed by the disk image of the program;
> that's static and isn't really interesting. The anonymous pages are what
> have to be shared, stuff like the stack, data pages and malloc'ed memory.
> To do this the way SHM does it, we would need to back anonymous pages
> with another file system. Kind of like distributed swap was in UnixWare,
> I guess, although I never really dug into that.
>
> Alternatively, one could expose the token interfaces within CFS so that
> you could use them directly. I suspect you would get into some serious
> wheel-reinvention doing that, though.
>
> I would be interested to see any feedback you get from the "geek girls."
> --
> Frank Mayhar fr...@ex... http://www.exit.com/
> Exit Consulting http://www.gpsclock.com/
> http://www.exit.com/blog/frank/
>
> -------------------------------------------------------
> This SF.net email is sponsored by OSDN developer relations
> Here's your chance to show off your extensive product knowledge
> We want to know what you know. Tell us and you have a chance to win $100
> http://www.zoomerang.com/survey.zgi?HRPT1X3RYQNC5V4MLNSV3E54
> _______________________________________________
> ssic-linux-devel mailing list
> ssi...@li...
> https://lists.sourceforge.net/lists/listinfo/ssic-linux-devel
|
From: Walker, B. J <bru...@hp...> - 2003-10-27 07:42:26
|
Thanks for clearing that up. Do you have any application examples that are using the migshm capability to allow clones to run on different nodes?

Bruce

> -----Original Message-----
> From: MAASK Group [mailto:maa...@ya...]
> Sent: Sunday, October 26, 2003 9:30 PM
> To: Walker, Bruce J
> Cc: ste...@ho...; ssi...@li...
> Subject: RE: [SSI-devel] Re: get shared memory and user space talking
>
> Hi,
> Stevie, Sorry for this delayed reply ... was on leave for a few days.
>
> --- "Walker, Bruce J" <bru...@hp...> wrote:
>
> > The second feature migshm has is that it will re-join clones that have
> > both migrated to the same node (so they share their process address
> > space again). What I was hoping they had was a way to actually
> > coherently have the clones on different nodes. I'm pretty sure that is
> > not provided.
> --- That is wrong, Bruce. This is in fact provided in migShm, provided
> that multi-threaded programs also use semaphores for synchronization.
>
> This constraint (for shared memory processes as well as clones) comes
> into the picture because we need some event for handling consistency.
> Any calls in the kernel can be used for this. We decided upon semaphores
> as that came to our mind first when we talk of shared memory. As
> accesses to shared memory (or a cloned memory map) cannot be detected,
> we do not know when to sync up all the copies ... so we had to put this
> constraint.
>
> > I believe we could provide the memory re-join capability pretty
> > trivially in the current code base (I'll check with John Byrne on
> > this). The concern we had about migrating clones was making sure they
> > all migrated at the same time and to the same node. I don't know if
> > openMosix does anything to guarantee that.
> --- Once again, this is not needed, as migShm can have two clones
> running on different nodes with consistency handled at the time of
> release of the semaphore.
>
> Regards,
> Asmita
>
> __________________________________
> Do you Yahoo!?
> Exclusive Video Premiere - Britney Spears
> http://launch.yahoo.com/promos/britneyspears/
|
From: MAASK G. <maa...@ya...> - 2003-10-27 11:15:56
|
Unfortunately, no. We could not find any applications which use only clone() (and not pthreads). We had written a small multi-threaded program which would calculate prime numbers in a given huge range using clone(), dividing the range into smaller ones and each thread calculating primes in the smaller range given to it. This app migrated well and gave good performance too.

Regards,
Asmita

--- "Walker, Bruce J" <bru...@hp...> wrote:
> Thanks for clearing that up. Do you have any application examples that
> are using the migshm capability to allow clones to run on different
> nodes?
>
> Bruce
>
> > -----Original Message-----
> > From: MAASK Group [mailto:maa...@ya...]
> > Sent: Sunday, October 26, 2003 9:30 PM
> > To: Walker, Bruce J
> > Cc: ste...@ho...; ssi...@li...
> > Subject: RE: [SSI-devel] Re: get shared memory and user space talking
> >
> > Hi,
> > Stevie, Sorry for this delayed reply ... was on leave for a few days.
> >
> > --- "Walker, Bruce J" <bru...@hp...> wrote:
> >
> > > The second feature migshm has is that it will re-join clones that
> > > have both migrated to the same node (so they share their process
> > > address space again). What I was hoping they had was a way to
> > > actually coherently have the clones on different nodes. I'm pretty
> > > sure that is not provided.
> > --- That is wrong, Bruce. This is in fact provided in migShm,
> > provided that multi-threaded programs also use semaphores for
> > synchronization.
> >
> > This constraint (for shared memory processes as well as clones)
> > comes into the picture because we need some event for handling
> > consistency. Any calls in the kernel can be used for this. We
> > decided upon semaphores as that came to our mind first when we talk
> > of shared memory. As accesses to shared memory (or a cloned memory
> > map) cannot be detected, we do not know when to sync up all the
> > copies ... so we had to put this constraint.
> >
> > > I believe we could provide the memory re-join capability pretty
> > > trivially in the current code base (I'll check with John Byrne on
> > > this). The concern we had about migrating clones was making sure
> > > they all migrated at the same time and to the same node. I don't
> > > know if openMosix does anything to guarantee that.
> > --- Once again, this is not needed, as migShm can have two clones
> > running on different nodes with consistency handled at the time of
> > release of the semaphore.
> >
> > Regards,
> > Asmita
|
From: Chirag K. <chi...@hp...> - 2003-10-27 11:35:55
|
On Mon, Oct 27, 2003 at 03:07:19AM -0800, MAASK Group wrote:
| which uses only clone() (and not pthreads). We had
| written a small multi-threaded program which would
| calculate prime numbers in a given huge range using
| clone(), dividing the range into smaller ones and each
| thread calculating primes in the smaller range given
| to it.
| This app migrated well and gave good performance too.
<snip>

Could I get a look at the program?

Regards,
--
Chirag Kantharia, Hewlett-Packard India Software Operations
Bangalore, India.
|
From: MAASK G. <maa...@ya...> - 2003-10-27 11:44:06
Attachments:
parallel_prime.tgz
|
More test programs can be found in the migShm test suite on the migShm website.

Regards,
Asmita

--- Chirag Kantharia <chi...@hp...> wrote:
> On Mon, Oct 27, 2003 at 03:07:19AM -0800, MAASK Group wrote:
> | which uses only clone() (and not pthreads). We had
> | written a small multi-threaded program which would
> | calculate prime numbers in a given huge range using
> | clone(), dividing the range into smaller ones and each
> | thread calculating primes in the smaller range given
> | to it.
> | This app migrated well and gave good performance too.
> <snip>
>
> Could I get a look at the program?
>
> Regards,
> --
> Chirag Kantharia, Hewlett-Packard India Software Operations
> Bangalore, India.
|
From: Walker, B. J <bru...@hp...> - 2003-10-27 07:53:16
|
If we used the token mechanism that the cluster filesystem CFS uses, the idea would be that, on a per-page basis, any number of nodes can cache a page if they all need it r/o, or a single node can cache it r/w. In that way one node can make many changes to the page while it is cached. The trick is working the mm to cache based on tokens and to relate the two memory images on the two nodes.

Bruce

> -----Original Message-----
> From: MAASK Group [mailto:maa...@ya...]
> Sent: Sunday, October 26, 2003 11:34 PM
> To: fr...@ex...
> Cc: Walker, Bruce J; ste...@ho...; ssi...@li...
> Subject: Re: [SSI-devel] Re: get shared memory and user space talking
>
> > In a kernel implementation (which is the direction I strongly lean),
> > you could modify the memory manager to mark the shared pages so that
> > an access (read or write) would cause a fault; the fault handler
> > would use the coherence mechanism to make sure everything is
> > consistent.
> --- Won't that be too much of an overhead? Reads may be ignored ... but
> for writes, for each byte access, the whole page will be flushed
> (migShm has page-level granularity). This would result in network
> traffic per access.
>
> > This is a lot more dynamic and eliminates the need to modify the
> > application.
> --- Agreed :)
>
> Regards,
> Asmita
|
From: MAASK G. <maa...@ya...> - 2003-10-27 11:15:01
|
Ah, okay. Sorry for my ignorance, I don't know how things are done in CFS.

Regards,
Asmita

--- "Walker, Bruce J" <bru...@hp...> wrote:
> If we used the token mechanism that the cluster filesystem CFS uses,
> the idea would be that, on a per-page basis, any number of nodes can
> cache a page if they all need it r/o, or a single node can cache it
> r/w. In that way one node can make many changes to the page while it
> is cached. The trick is working the mm to cache based on tokens and to
> relate the two memory images on the two nodes.
>
> Bruce
>
> > -----Original Message-----
> > From: MAASK Group [mailto:maa...@ya...]
> > Sent: Sunday, October 26, 2003 11:34 PM
> > To: fr...@ex...
> > Cc: Walker, Bruce J; ste...@ho...; ssi...@li...
> > Subject: Re: [SSI-devel] Re: get shared memory and user space talking
> >
> > > In a kernel implementation (which is the direction I strongly
> > > lean), you could modify the memory manager to mark the shared
> > > pages so that an access (read or write) would cause a fault; the
> > > fault handler would use the coherence mechanism to make sure
> > > everything is consistent.
> > --- Won't that be too much of an overhead? Reads may be ignored ...
> > but for writes, for each byte access, the whole page will be flushed
> > (migShm has page-level granularity). This would result in network
> > traffic per access.
> >
> > > This is a lot more dynamic and eliminates the need to modify the
> > > application.
> > --- Agreed :)
> >
> > Regards,
> > Asmita
|
From: Aneesh K. KV <ane...@di...> - 2003-10-28 14:48:35
|
Walker, Bruce J (HP) wrote:
> If we used the token mechanism that the cluster filesystem CFS uses,
> the idea would be that, on a per-page basis, any number of nodes can
> cache a page if they all need it r/o, or a single node can cache it
> r/w. In that way one node can make many changes to the page while it
> is cached. The trick is working the mm to cache based on tokens and to
> relate the two memory images on the two nodes.
>
> Bruce

The last time I went through the code, I remember the current implementation of tokens being file-based (that is, the entire file will be unmapped in case another node accesses the same mapping). Am I missing something? Did David already make it page-wise?

-aneesh
|
From: Walker, B. J <bru...@hp...> - 2003-10-28 15:11:52
|
The code for page-level tokens is there but not yet leveraged, so you are correct that so far it is file-level. Hopefully David will get a chance to correct that.

Bruce

> -----Original Message-----
> From: Kumar, Aneesh (Digital GlobalSoft)
> Sent: Tuesday, October 28, 2003 6:41 AM
> To: Walker, Bruce J
> Cc: MAASK Group; fr...@ex...; ste...@ho...; ssi...@li...
> Subject: Re: [SSI-devel] Re: get shared memory and user space talking
>
> Walker, Bruce J (HP) wrote:
>
> > If we used the token mechanism that the cluster filesystem CFS uses,
> > the idea would be that, on a per-page basis, any number of nodes can
> > cache a page if they all need it r/o, or a single node can cache it
> > r/w. In that way one node can make many changes to the page while it
> > is cached. The trick is working the mm to cache based on tokens and
> > to relate the two memory images on the two nodes.
> >
> > Bruce
>
> The last time I went through the code, I remember the current
> implementation of tokens being file-based (that is, the entire file
> will be unmapped in case another node accesses the same mapping). Am I
> missing something? Did David already make it page-wise?
>
> -aneesh
|
From: <sc...@ya...> - 2003-10-24 02:07:21
|
Is there any documentation available for the implementation of any of the OpenSSI features? In particular, I am interested in the CFS implementation and the shared memory implementation.

regards
stevie mckibbin

________________________________________________________________________
Want to chat instantly with your online friends? Get the FREE Yahoo!
Messenger http://mail.messenger.yahoo.co.uk
|
From: Mario C. <ma...@te...> - 2003-10-24 03:46:42
|
Hi, Stevie,

Check this out: http://h30097.www3.hp.com/cluster/cfs_wp_1002.pdf

Hope it helps... I just found it the other day while researching TruCluster Software...

Regards,
Mario Carvalho

-----Original Message-----
From: ssi...@li... [mailto:ssi...@li...] On behalf of stevie mckibbin
Sent: Thursday, October 23, 2003 15:07
To: Walker, Bruce J
Cc: ssi...@li...
Subject: [SSI-devel] implementation documentation

Is there any documentation available for the implementation of any of the OpenSSI features? In particular, I am interested in the CFS implementation and the shared memory implementation.

regards
stevie mckibbin
|
From: MAASK G. <maa...@ya...> - 2003-10-27 05:33:53
|
Hi,
Stevie, Sorry for this delayed reply ... was on leave for a few days.

--- "Walker, Bruce J" <bru...@hp...> wrote:
> The second feature migshm has is that it will re-join clones that have
> both migrated to the same node (so they share their process address
> space again). What I was hoping they had was a way to actually
> coherently have the clones on different nodes. I'm pretty sure that is
> not provided.
--- That is wrong, Bruce. This is in fact provided in migShm, provided that multi-threaded programs also use semaphores for synchronization.

This constraint (for shared memory processes as well as clones) comes into the picture because we need some event for handling consistency. Any calls in the kernel can be used for this. We decided upon semaphores as that came to our mind first when we talk of shared memory. As accesses to shared memory (or a cloned memory map) cannot be detected, we do not know when to sync up all the copies ... so we had to put this constraint.

> I believe we could provide the memory re-join capability pretty
> trivially in the current code base (I'll check with John Byrne on
> this). The concern we had about migrating clones was making sure they
> all migrated at the same time and to the same node. I don't know if
> openMosix does anything to guarantee that.
--- Once again, this is not needed, as migShm can have two clones running on different nodes with consistency handled at the time of release of the semaphore.

Regards,
Asmita
|
From: Frank M. <fr...@ex...> - 2003-10-27 06:02:15
|
MAASK Group wrote:
> This constraint (for shared memory processes as well as clones) comes
> into the picture because we need some event for handling consistency.
> Any calls in the kernel can be used for this. We decided upon
> semaphores as that came to our mind first when we talk of shared
> memory. As accesses to shared memory (or a cloned memory map) cannot
> be detected, we do not know when to sync up all the copies ... so we
> had to put this constraint.

Please pardon my ignorance, but why is this a hard constraint? More specifically, why can't you detect shared memory accesses? Is this entirely a user-space implementation?

In a kernel implementation (which is the direction I strongly lean), you could modify the memory manager to mark the shared pages so that an access (read or write) would cause a fault; the fault handler would use the coherence mechanism to make sure everything is consistent. This is a lot more dynamic and eliminates the need to modify the application. I believe (although I haven't checked the code lately) that this is how the SysV Shared Memory implementation works in OpenSSI.

--
Frank Mayhar fr...@ex... http://www.exit.com/
Exit Consulting http://www.gpsclock.com/
http://www.exit.com/blog/frank/
|
From: MAASK G. <maa...@ya...> - 2003-10-27 07:34:08
|
> In a kernel implementation (which is the direction I strongly lean),
> you could modify the memory manager to mark the shared pages so that
> an access (read or write) would cause a fault; the fault handler would
> use the coherence mechanism to make sure everything is consistent.
--- Won't that be too much of an overhead? Reads may be ignored ... but for writes, for each byte access, the whole page will be flushed (migShm has page-level granularity). This would result in network traffic per access.

> This is a lot more dynamic and eliminates the need to modify the
> application.
--- Agreed :)

Regards,
Asmita
|