omuscd-general Mailing List for omuscd
Status: Beta
Brought to you by:
g_remlin
You can subscribe to this list here.
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2007 | | | (3) | | (2) | (3) | (7) | | | (2) | | |
| 2009 | | | | | | | (9) | | | | | |
|
From: g_remlin <g_r...@ro...> - 2009-07-31 12:59:55
|
omuscd is certainly doing its bit. OK, so it is either a network problem or, more likely, the openMosix kernel itself. I suggest you post to "Linuxpmi", who have taken over openMosix, and see if you can get help there. Don't forget to mention which kernel version you are using, where you obtained the source from, and a description of your network setup. Also quote the IP addresses and PIDs in full (including the contents of each PID's "where", "debug" and "stay" files on both nodes); the more detailed information you give, the more likely someone will help you. Include any error messages exactly as they appear on your screen. "I got an error something like" is not helpful; a developer cannot search the source for an error "something like" it! Generally, developers will not spend their time asking questions to extract technical information from you, and will ignore posts such as "my process won't migrate, why?", as there could be a million reasons. In short, help them to help you. |
|
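A minimal sketch of collecting the details requested above into one report per node. It assumes the /proc/PID/om/where, debug and stay files mentioned in this thread; om_report and the PROC_ROOT override are hypothetical names introduced only for this illustration (PROC_ROOT lets the function be tried on a machine without an openMosix kernel).

```shell
#!/bin/sh
# Print a plain-text diagnostic report for one process.
# PROC_ROOT is a hypothetical override; it defaults to /proc.
om_report() (
    pid=$1
    root=${PROC_ROOT:-/proc}
    echo "kernel: $(uname -r)"
    echo "pid: $pid"
    for f in where debug stay; do
        path="$root/$pid/om/$f"
        if [ -r "$path" ]; then
            # Quote the file contents verbatim, indented under its name.
            echo "$f:"
            sed 's/^/    /' "$path"
        else
            echo "$f: <not readable: $path>"
        fi
    done
)

# Usage, on each node:
#   om_report 1234 > "report-$(hostname).txt"
```

Running it on both nodes and pasting both reports unedited into the post gives a developer the exact values rather than a paraphrase.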
From: Anna P. <and...@gm...> - 2009-07-31 11:03:04
|
I changed the switch to a hub. The process tries to migrate from node1 to node2, but the problem is the same. When I check /proc/PID/om/where I see, first: migration; later: IP_node2 (correct); and finally: 0.0.0.0. It looks like only one packet migrated and then the process was gone. -- Ania |
|
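The sequence of values described above (migration, then the target IP, then 0.0.0.0) is easier to report with timestamps. The following is a sketch only: watch_where and PROC_ROOT are hypothetical names for this example, and it simply polls the /proc/PID/om/where file named in the message.

```shell
#!/bin/sh
# Sample /proc/PID/om/where COUNT times, INTERVAL seconds apart,
# printing a timestamped line only when the value changes.
# PROC_ROOT is a hypothetical override; it defaults to /proc.
watch_where() (
    pid=$1
    count=${2:-10}
    interval=${3:-1}
    file="${PROC_ROOT:-/proc}/$pid/om/where"
    last="__unset__"
    i=0
    while [ "$i" -lt "$count" ]; do
        # The file disappears when the process exits.
        if [ ! -r "$file" ]; then
            echo "process $pid gone"
            break
        fi
        cur=$(cat "$file")
        if [ "$cur" != "$last" ]; then
            echo "$(date +%H:%M:%S) $cur"
            last=$cur
        fi
        i=$((i + 1))
        sleep "$interval"
    done
)

# Usage: watch_where 1234 60 1
```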
From: g_remlin <g_r...@ro...> - 2009-07-30 19:31:35
|
It looks like the network layer timed out. Make sure you do not have any firewalls running. I presume you are using a managed switch. If you can, I suggest you change the switch for a standard hub and try again. |
|
From: Anna P. <and...@gm...> - 2009-07-30 13:44:18
|
OK, I changed some properties in the switch configuration, and the process is no longer on 0.0.0.0. Now, cat /proc/PID/om/where tells me: migration. But when I wait for a moment, I see: [oM] Migration of process [PID] failed with error -110, along with other information containing function names. For example: task_local_send: failed __task_move_to_node+0x10a/0x12f .... etc... -- Ania |
|
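Kernel code conventionally reports failures as negative errno values, so the -110 above can be looked up. The sketch below hard-codes a few network-related values as defined in the generic Linux errno table used on x86; other architectures may number them differently, and the function names are hypothetical. The message format is taken from the line quoted above.

```shell
#!/bin/sh
# Map a few Linux errno numbers (x86 / asm-generic numbering) to names.
errno_name() {
    case ${1#-} in
        101) echo "ENETUNREACH (network is unreachable)" ;;
        110) echo "ETIMEDOUT (connection timed out)" ;;
        111) echo "ECONNREFUSED (connection refused)" ;;
        113) echo "EHOSTUNREACH (no route to host)" ;;
        *)   echo "unknown errno ${1#-}" ;;
    esac
}

# Pull the numeric code out of a "failed with error N" line and decode it.
# Returns non-zero if the line contains no such code.
decode_migration_error() {
    code=$(printf '%s\n' "$1" |
        sed -n 's/.*failed with error \(-\{0,1\}[0-9][0-9]*\).*/\1/p')
    [ -n "$code" ] && errno_name "$code"
}

# Usage:
#   decode_migration_error '[oM] Migration of process [1234] failed with error -110'
```

A timeout points at the network path between the nodes (firewall, switch configuration) rather than at the process being unsuitable for migration.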
From: Anna P. <and...@gm...> - 2009-07-30 12:37:00
|
I'm sure that the PID of the process which tries to migrate is correct. It's the PID of "timewaster". /proc/PID/om/stay is empty. /proc/PID/om/debug says: debug: dflags: 0x00000101 I don't know yet what it means. -- Ania |
|
From: g_remlin <g_r...@ro...> - 2009-07-29 23:45:46
|
> I see that the process isn't on node which one node try to migrate
> (send?) to another.
> For example:
> I see that node 192.168.1.21 send signal migration process PID to node
> 192.168.1.22
> but on node 192.168.1.22 I didn't see this process.
>
> When I try to look where he (the process which try to migrate) is:
> (from node 192.168.1.21)
> cd /proc/PID/om/
> cat where
> the response is :
>
> 0.0.0.0 (instead 192.168.1.22)

OK, because omuscd tries to migrate a process, it looks like it is working properly. Confirm that omuscd is trying to migrate the "correct" process. This is important: not all processes are migratable. Check that the PID shown by omuscd matches one of the "timewaster" PIDs. If it does, the problem is with the openMosix kernel itself, not omuscd.

You then need to look at /proc/$$/om/stay and /proc/$$/om/debug (where $$ stands for the PID) of the process that is supposed to migrate. These will give you a clue as to what the openMosix kernel is doing. You may have to look at the openMosix source to see what they mean, as I do not think they are documented (or post them to this list).
|
|
From: Anna P. <and...@gm...> - 2009-07-29 21:21:23
|
Thank you for the response.

I see that the process is not on the node it was supposed to migrate to. For example: I see that node 192.168.1.21 signals migration of process PID to node 192.168.1.22, but on node 192.168.1.22 I don't see this process.

When I try to look where it (the process which tries to migrate) is, from node 192.168.1.21:

cd /proc/PID/om/
cat where

the response is:

0.0.0.0 (instead of 192.168.1.22)

P.S. Sorry for my English (it is not good).

2009/7/28 g_remlin <g_r...@ro...>:
> You have not given enough information to determine anything useful, how
> do you know the process did not migrate ?

-- Ania
|
|
From: g_remlin <g_r...@ro...> - 2009-07-27 22:29:00
|
You have not given enough information to determine anything useful. How do you know the process did not migrate? |
|
From: Anna P. <and...@gm...> - 2009-07-27 13:13:23
|
Hello, I've got a problem. I have two computers in my cluster, linked through a switch. This is a small diskless cluster (booted over the network). On each computer omuscd is active (in peer mode), and each node sees the other. On one node I ran the test program "timewaster" (4-5 times). omuscd wrote: "Signaled migration ... PID.... to IP" (or something like this), but the process doesn't migrate. Why? Can somebody help me? -- Ania |
|
From: g4sra <g4...@ya...> - 2007-10-02 12:54:36
|
> Can anyone tell me why my process did not migrate?
> Log level 255 gives me this: Rejecting PID mypidnumber, reason to stay "Monkey"

The key phrase in this log message is "reason to stay": the openMosix kernel itself has deemed this process unsuitable for migration. omuscd does not concern itself with "why" a process is unsuitable (it only cares about suitable processes). omuscd has been helpful in telling you that the process in this specific instance is rejected because it is a "Monkey"; you will need to refer to the openMosix kernel information (I suggest their email archives or wiki, or post to ope...@li... :>) to find out what a "Monkey" is (or hope an openMosix developer reads your post here).
|
|
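When a log at level 255 is long, the rejection lines can be filtered out mechanically. This sketch assumes the log lines look exactly like the one quoted above (Rejecting PID N, reason to stay "X"); the real omuscd output may carry extra prefixes, which the pattern tolerates, and the function names are hypothetical.

```shell
#!/bin/sh
# Read omuscd log text on stdin and print "PID reason" for every
# rejection line; all other lines are dropped.
rejections() {
    sed -n 's/.*Rejecting PID \([0-9][0-9]*\), reason to stay "\([^"]*\)".*/\1 \2/p'
}

# Count how often each reason (first word only) appears.
reason_counts() {
    rejections | awk '{ n[$2]++ } END { for (r in n) print r, n[r] }' | sort
}

# Usage, assuming the log was saved to a file named omuscd.log:
#   reason_counts < omuscd.log
```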
From: g4sra <g4...@ya...> - 2007-07-09 19:02:03
|
You can use the ps command (as root) to view all processes (including migrated processes) running on a node. The command you illustrate is for running omuscd automatically at system boot from your /etc/inittab, not for testing; remove it until you are happy with the way your cluster is functioning. Whilst testing, just run omuscd in a normal console (type "omuscd" as root at a command prompt on all your computers); the output from omuscd will indicate whether it tells the kernel to migrate a process. The lines mean that you have two computers (running omuscd) in your cluster, with the IPs 192.168.1.180 and 192.168.1.182. You executed monitor.py on 192.168.1.182. Neither computer is doing any work, so create a high load on one of the nodes (use the "timewaster" test program; run "make" whilst in the examples directory to compile it). Don't bother with the Python scripts until you know what you are doing. |
|
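Creating the load described above can be sketched as a small helper that starts several copies of a program in the background and prints their PIDs. start_load is a hypothetical name for this illustration; the location of the compiled timewaster binary (./timewaster in the examples directory) is an assumption based on the advice above.

```shell
#!/bin/sh
# Start N background copies of a command and print one PID per line.
# Output of the children is discarded so the caller is not blocked.
start_load() (
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 &
        echo $!
        i=$((i + 1))
    done
)

# Usage, as a normal user, from the examples directory after "make":
#   pids=$(start_load 4 ./timewaster)
#   for p in $pids; do echo "$p: $(cat /proc/$p/om/where)"; done
```

Keeping the PIDs makes it straightforward to compare them with the PID that omuscd reports when it signals a migration.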
From: yong l. <yle...@ya...> - 2007-07-09 17:45:33
|
Hi, is there a way to know whether jobs are being
migrated or not? Is there a way to check the output
messages to figure this out? I ran the omuscd program
as instructed, but I do not know what I am doing. My
output for the following command is blank so far:
/usr/bin/openvt -f -w -c 1 -- /usr/local/bin/omuscd -n -l63
Also, I ran monitor.py in the examples directory,
and my output is the following. Would someone explain
to me what these lines mean? If you have helpful
scripts, would you share them with me or tell me where/what
to run in the examples directory? I am not familiar
with these scripting languages. Many thanks.
iteration 172   life  cost  loadavg  freemem  weight
----------------------------------------------------------
192.168.1.180:    44    13  0.00000  1632600     501
192.168.1.182:    44     1  0.00000  1500980       0
----------------------------------------------------------
iteration 173   life  cost  loadavg  freemem  weight
----------------------------------------------------------
192.168.1.180:    44    13  0.00000  1632600     501
192.168.1.182:    44     1  0.00000  1500980       0
----------------------------------------------------------
iteration 174   life  cost  loadavg  freemem  weight
----------------------------------------------------------
192.168.1.180:    44    13  0.00000  1632600     501
192.168.1.182:    44     1  0.00000  1500980       0
----------------------------------------------------------
iteration 175   life  cost  loadavg  freemem  weight
----------------------------------------------------------
192.168.1.180:    44    13  0.00000  1632600     501
192.168.1.182:    44     1  0.00000  1500980       0
|
|
From: Florian D. <fd...@e8...> - 2007-07-04 06:27:36
|
Did you activate the debug option in the HPC submenu of make menuconfig?

yong lee wrote:
> Hi, when I ran the omuscd tool, I got the following
> error. I had built and boot using the openmosix kernel
> 2.6.17 version.
>
> The error is -"This does not appear to be an openMosix
> kernel! The required openMosix kernel debug option is
> not present. Please compile an openMosix debug enabled
> kernel!"
>
> I did 'make all' and 'make install' to build and
> install the kernel. Please tell me how do I turn on
> the debugging mode when I build the kernel.
>
> Thanks
> Yong
|
|
From: g4sra <g4...@ya...> - 2007-07-04 00:09:31
|
When you configure the kernel HPC options (make xconfig, or make
menuconfig), make sure that "debug" is selected.
<*> HPC Communication daemon
[*] Enable OpenMosix clustering
[*] Enable OpenMosix to be more verbose
[*] Add some message when migrating
[*] Enable OpenMosix debug
[ ] Add lots of message and print step when migrating
<*> Add debug files on debugfs
< > control filesystem for openMosix
|
|
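Whether the running kernel was actually built with these options can be checked against its saved configuration. The sketch below deliberately does not name any configuration symbol, since none is given in this thread; it only assumes that the relevant symbols contain "openmosix" or "hpc" in their names, and om_config_options is a hypothetical helper. The config file location (for example /boot/config-$(uname -r) or the .config in the kernel source tree) depends on the distribution.

```shell
#!/bin/sh
# List every line of a kernel config file that mentions openMosix or HPC,
# whether the option is enabled ("=y") or disabled ("is not set").
# Assumption: the option names contain one of those two words.
om_config_options() {
    grep -i -E 'openmosix|hpc' "${1:-.config}"
}

# Usage:
#   om_config_options "/boot/config-$(uname -r)" | grep -i debug
```

If the debug line shows "is not set", or no line appears at all, the kernel needs to be reconfigured and rebuilt before omuscd will accept it.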
From: yong l. <yle...@ya...> - 2007-07-03 23:00:04
|
Hi, when I ran the omuscd tool, I got the following
error. I had built and booted the openMosix kernel,
version 2.6.17.
The error is: "This does not appear to be an openMosix
kernel! The required openMosix kernel debug option is
not present. Please compile an openMosix debug enabled
kernel!"
I did 'make all' and 'make install' to build and
install the kernel. Please tell me how to turn on
the debugging mode when I build the kernel.
Thanks
Yong
|
|
From: g4sra <g4...@ya...> - 2007-07-03 22:50:03
|
For testing, log into a terminal as root, and execute "omuscd" on every node in your cluster. To see if it is working, watch the log output on the terminal display. There is a Fedora/RedHat style startup script in the examples directory. Alternatively, once you are happy it is working properly, put an entry in /etc/inittab as covered in the FAQ. |
|
From: yong l. <yle...@ya...> - 2007-07-03 21:41:02
|
Hi,
I am new to openMosix and this tool. After reading
through the FAQ.src and other documentation, I am
still confused about how to check whether this tool is
working or not. Would someone please shed some light?
If you have a startup and shutdown script for the
omuscd tool, would you mind sharing it with me, or telling
me how to start it as a daemon on my Debian
system? Also, can you tell me how to verify whether
the jobs are migrating or not?
Many Thanks
Yong
|
|
From: g4sra <g4...@ya...> - 2007-06-18 01:00:26
|
Wes Wagner wrote:
> Thanks, I was having a problem with an interface driver, however, I am
> still confused as to how I am supposed to get all my nodes listening
> to multicast address 239.192.0.1:1334 since
> it is quasi-hard coded. I can't assign the nodes addresses to the
> subnet, and the nodes consider the address unreachable (and for
> seemingly good cause... there is no way to route to it legally).
>
> Sincerely,
> Wes Wagner

Hi Wes, after reading your emails all together, I fear you may have changed the hard-coded IP in omuscd. Doing so will definitely break it, and multicast will fail. It is the IP range itself that has special meaning to the kernel network layer. The way to route it legally is to create a default route, OR tell omuscd which network device to use (it will then tell the kernel). The omuscd daemon is virtually plug'n'pray; you need change nothing in the source.
|
|
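The point about the IP range can be illustrated with a small check: IPv4 multicast addresses are those in 224.0.0.0/4, i.e. with a first octet from 224 to 239, and 239.192.0.1 falls in the administratively scoped part of that range. Such an address is a group that nodes join, not an address assigned to a node, which is why it cannot be replaced by a host IP. is_multicast is a hypothetical helper for this sketch, not part of omuscd.

```shell
#!/bin/sh
# Succeed if the dotted-quad IPv4 address lies in 224.0.0.0/4.
# Only the first octet needs to be inspected.
is_multicast() {
    first=${1%%.*}
    [ "$first" -ge 224 ] 2>/dev/null && [ "$first" -le 239 ]
}

# Usage:
#   is_multicast 239.192.0.1 && echo "multicast group address"
#   is_multicast 192.168.1.21 || echo "ordinary unicast address"
```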
From: Wes W. <wes...@gm...> - 2007-06-17 02:38:14
|
Thanks, I was having a problem with an interface driver, however, I am still confused as to how I am supposed to get all my nodes listening to multicast address 239.192.0.1:1334 since it is quasi-hard coded. I can't assign the nodes addresses to the subnet, and the nodes consider the address unreachable (and for seemingly good cause... there is no way to route to it legally).

Sincerely,
Wes Wagner

On 6/16/07, g4sra <g4...@ya...> wrote:
> Wes Wagner wrote:
> > I have a 2.6.17 openmosix AMD64 kernel and a AMD64 build of omuscd.
> >
> > My issue is that it attempts to multicast to the default address of
> > 239.192.0.1:1334 - but that is unreachable.
> > My two nodes I am trying to connect up are connected by a layer 2 only
> > gigabit switch.
> >
> > What am I missing from an installation standpoint to get these systems
> > speaking with each other?
> >
> > Thanks -
> >
> > -Wes Wagner
>
> Hi Wes, I don't quite follow your email.
>
> If you mean "omuscd" is reporting that the multicast network
> (239.192.0.1) is unreachable -
> It can't figure out which network device interface (eth0, eth1, etc) it
> needs to use, this can happen if you do not have a default route
> configured. You can resolve this issue either by creating a default
> route ("route add default eth0", or whatever) or by specifying the
> network device interface for omuscd to use as an argument. If you choose
> the latter, remember that specifying any argument will turn off the
> debug defaults (and omuscd will detach from the console), probably not
> what you want, you must specify all options you want.
> eg "omuscd -n -p -l127 -i eth0"
>
> I will amend the FAQ to include this information - thanks for raising my
> awareness to it's omission.
> If this does not solve your problem, please raise it again.
>
> N.B the advantages of porting omuscd to 64bit (due to the nature of the
> beast) are insignificant, I am currently prioritizing development of
> openMosix itself (on i386) over omuscd, which is well ahead of the game
> in functionality.
|
|
From: g4sra <g4...@ya...> - 2007-06-16 21:45:50
|
Wes Wagner wrote:
> I have a 2.6.17 openmosix AMD64 kernel and a AMD64 build of omuscd.
>
> My issue is that it attempts to multicast to the default address of
> 239.192.0.1:1334 - but that is unreachable.
> My two nodes I am trying to connect up are connected by a layer 2 only
> gigabit switch.
>
> What am I missing from an installation standpoint to get these systems
> speaking with each other?
>
> Thanks -
>
> -Wes Wagner

Hi Wes, I don't quite follow your email.

If you mean that omuscd is reporting that the multicast network (239.192.0.1) is unreachable: it can't figure out which network device interface (eth0, eth1, etc.) it needs to use. This can happen if you do not have a default route configured. You can resolve this either by creating a default route ("route add default eth0", or whatever) or by specifying the network device interface for omuscd to use as an argument. If you choose the latter, remember that specifying any argument turns off the debug defaults (and omuscd will detach from the console), which is probably not what you want; you must specify all the options you want, e.g. "omuscd -n -p -l127 -i eth0".

I will amend the FAQ to include this information. Thanks for raising my awareness of its omission. If this does not solve your problem, please raise it again.

N.B. The advantages of porting omuscd to 64-bit (due to the nature of the beast) are insignificant. I am currently prioritizing development of openMosix itself (on i386) over omuscd, which is well ahead of the game in functionality.
|
|
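Whether a node has a default route can be checked before starting omuscd. This is a sketch that reads the kernel routing table from /proc/net/route, where a default route has both destination and mask equal to 00000000; the function names are hypothetical, and the omuscd options in the advice text are the ones quoted in the reply above.

```shell
#!/bin/sh
# Print the interface of every default route found in a routing table
# in /proc/net/route format (column 2 = destination, column 8 = mask).
default_route_ifaces() {
    awk 'NR > 1 && $2 == "00000000" && $8 == "00000000" { print $1 }' \
        "${1:-/proc/net/route}"
}

# Say which of the two fixes from the reply above applies.
route_advice() {
    ifaces=$(default_route_ifaces "$@")
    if [ -n "$ifaces" ]; then
        echo "default route present on:" $ifaces
    else
        echo "no default route: add one, or start omuscd with -i <interface>"
    fi
}

# Usage: route_advice
```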
From: g4sra <g4...@ya...> - 2007-05-23 19:38:41
|
Hi Fernando,
From your report, I deduce that omuscd has decided to migrate a process
to reduce the load, but cannot identify a process suitable for
migration. Run a couple of instances of the timewaster program
(distributed with omuscd source) and confirm it migrates (it should)
proving there is nothing wrong with your cluster (or omuscd). All being
well, examine the process that you are running that fails to migrate.
Check that the process itself is suitable for migration on openMosix by
checking it has no "reason to stay" flags set ("cat /proc/$$/om/stay",
where $$ is the PID of your program). You will need to refer to the
openMosix project to identify what the different flags (such as "system"
or "monkey") mean. If there is no reason for the process to stay, check
that the program is running with a User ID & Group ID set to 500 or
greater, omuscd will not select a process for migration with UIDs or
GIDs lower than this, so you cannot migrate a process run as root (UID
0). Log in and run your program as a normal user. Hope this helps, if
not, set omuscd's log level to 255, run your process, and then email me
(g_remlin at users.sourceforge.net) the log. When you fix it, please
post how you corrected the problem to the list so others (and myself)
may benefit from your experiences.
|
|
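The UID/GID condition described above can be checked from /proc/PID/status, whose Uid: and Gid: lines list the real ID first. This is a sketch under assumptions: which of the real, effective or saved IDs omuscd actually compares is not stated in the message, so the real IDs are used here; uid_gid_ok and PROC_ROOT are hypothetical names.

```shell
#!/bin/sh
# Succeed if the process runs with real UID and GID both >= MIN
# (default 500, the threshold given in the message above).
# PROC_ROOT is a hypothetical override; it defaults to /proc.
uid_gid_ok() (
    pid=$1
    min=${2:-500}
    status="${PROC_ROOT:-/proc}/$pid/status"
    uid=$(awk '/^Uid:/ { print $2 }' "$status")
    gid=$(awk '/^Gid:/ { print $2 }' "$status")
    echo "pid $pid uid=$uid gid=$gid"
    [ "$uid" -ge "$min" ] && [ "$gid" -ge "$min" ]
)

# Usage:
#   uid_gid_ok 1234 || echo "run the program as a normal user instead"
#   cat /proc/1234/om/stay    # then check for "reason to stay" flags
```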
From: Fernando S. <fsp...@gm...> - 2007-05-23 17:48:43
|
Hello folks, I'm running omuscd on 9 Gentoo boxes with the 2.6.17-om kernel. Whether I run a controller and drones, or all boxes as peers, I always get a message like: Peer overloading, no candidate process. The communication between them is OK, but they don't migrate processes. Any suggestions or additional information? Thanks, Fernando |
|
From: C R. <c.r...@or...> - 2007-03-14 18:46:46
|
Hi Matt, from the information you have given, I cannot determine any reason why omuscd should segfault. I suggest you run omuscd in a text console with debug level logging enabled and then post the log output for me to see. The omuscd daemon is specifically aimed at running on a minimal system, so there should be no problems there. |
|
From: C R. <c.r...@or...> - 2007-03-14 18:36:14
|
Hi Apollo, I suggest you direct all future email queries to omu...@li... as specified in the documentation.

> I changed the hardcode ip address to my masternode 190.168.0.1...

Why? Please read (at least the very first paragraph of) the /usr/local/share/doc/omuscd-0b2/README file distributed with the source. Changing the IP broke omuscd. I suggest you reinstall the source as distributed and run "make install". Make sure that you distribute the same binary version of "/usr/local/bin/omuscd" to all your cluster nodes.

> the default port for the omuscd of 3490 is taken up by other service

Please email me stating which service is using this port; it may be preferable to change the default port used by omuscd. Read the manpage ("man omuscd"), look at the EXAMPLES section, and also refer to /usr/local/share/doc/omuscd-0b2/FAQ distributed with the source; both specifically cover controller/drone clusters and changing the default ports used. Passing any argument to the omuscd daemon switches off the development defaults, so you will probably want something like the following (see the man page).

When starting omuscd from a console on the controller node:

omuscd -n -t42000 -l127 -c

On each of the drone nodes:

omuscd -n -t42000 -l63 -d

Alternatively, if running as a daemon at startup, edit the /etc/sysconfig/omuscd and /etc/services files on every node, as specified in the documentation.

Keep me informed of your progress :>).
|
|
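To answer the question of which service claims a port, the local services file can be searched. This sketch only reports what /etc/services lists for a port number; it does not show which program is actually listening. port_owner is a hypothetical helper, and the port numbers in the usage lines are the ones mentioned in the message above.

```shell
#!/bin/sh
# Print "name port/proto" for every non-comment entry in a services
# file (default /etc/services) registered for the given port number.
port_owner() {
    awk -v port="$1" '$1 !~ /^#/ && $2 ~ ("^" port "/") { print $1, $2 }' \
        "${2:-/etc/services}"
}

# Usage:
#   port_owner 3490     # what is registered on the default port
#   port_owner 42000    # empty output: no entry for the replacement port
```

An empty result for the replacement port on every node is a reasonable sign that the -t42000 setting shown above will not collide with a registered service.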
From: Matt <bri...@th...> - 2007-03-10 07:41:48
|
I built omuscd on my Ubuntu system, not using an openMosix kernel. This was for a Linux livecd/pxeboot image which is just the bare bones; however, omuscd throws a Segmentation Fault shortly after it seems to start. I was wondering if there is anything vital that I am missing to get omuscd working on a very minimal system. |