set up nuke render farm

astan
2012-07-19
2012-12-11
  • astan

    astan - 2012-07-19

    Hi,
    Thanks for developing such simple and powerful render farm software. I had it set up really fast for a very small farm, but now I'm having trouble getting the Nuke renderers going:
    The server, client and slave are all running Windows 7 64-bit, but on different physical machines. All 3 machines mount a network drive (using the same drive letter), so all 3 machines are almost identical in setup.
    I think I've set up the slave, server and client correctly, because when I send a job from the client, I can see the slave (and server) trying to process it. I'm trying to send a Nuke job to the farm using the 'Afanasy->submit job' menu item, making sure everything is on the network drive and nothing refers to local paths or UNC paths. But when I submit, the slave gets the job and then fails immediately (and retries a few times). I'm not using the Nuke plugins yet because I want to see if this works first.
    Now I'm trying to debug this error (or find out whether I've set up something incorrectly), but the log doesn't say anything except 'avoiding host'. Is there some log I can check to see whether the slave is set up correctly to run Nuke? Like an environment variable or something?
    Anyway, thanks again

     
  • Timur Hairulin

    Timur Hairulin - 2012-07-19

    Hi.
    Look at the task process output; it may contain a useful error message.
    The task log only holds messages produced by the server: that the task was started, finished, or failed with an error.
    So check the output, not only the log.

     
  • astan

    astan - 2012-07-19

    Thanks for pointing that out. I realized I hadn't used Keeper to set up the server, which was what was causing the error.
    Thanks

     
  • astan

    astan - 2012-07-20

    Okay, might've been too eager on that one.
    I'm still having problems rendering nuke.
    Before doing anything, I modified the cgru-windows/afanasy/config_default.xml file so that the servername tags point to both the IP (IPv4) address of the server and the server name. I also modified cgru-windows/software_setup/setup_nuke.cmd so that the Nuke directory points to C:\Program Files\Nuke6.3v8
    I made these changes on the server, client and slave machines.
    This is how I set up the server (windows 7) first:
    1. run cgru-windows/start.cmd
    2. in the Keeper toolbar, go to AFANASY->browse and then run cgru-windows/afanasy/bin/afserver.exe

    This is how I set up the slave (windows 7) second:
    1. run cgru-windows/start.cmd
    2. in the Keeper toolbar, go to AFANASY->browse and then run cgru-windows/start/AFANASY/render.cmd

    This is how I set up the client (windows 7) third:
    1. run cgru-windows/start.cmd
    2. in the Keeper toolbar, go to AFANASY->start watch
    Once that is up and running, I submit a job from the Keeper toolbar with AFANASY->Submit Job.
    The Nuke job that I'm testing with is a simple one-frame resize, and it works fine if I render it locally on the slave without Afanasy (all the mapped drives work fine). When I try to submit a job, the server console says something like:

    Fri 20 Jul 11:57.11: User registered: #2:99 astan j0/0 r0/-1 Fri 20 Jul 11:57.11 astan-pc T - 835 bytes.
    Fri 20 Jul 11:57.11: Job registered: "test_render.nk": astan@astan-pc - 6291 bytes.
    Fri 20 Jul 11:57.13: AfContainer::generateList: No node matches "server-pc" founded.
    AFERROR: af::msgsend: connect failure for msgType 'TMonitorJobsAdd': 10.10.30.13:51001: No error

    But the slave actually says something like this:

    Started PID=1016:  astan: test_render.nk-1
    Finished PID=1016: Exit Code=1

    Now I'm just wondering if I've set up the configuration incorrectly. I'm not using any database (but I thought Afanasy could work without one).
    Did I set up something incorrectly? Have I missed something?
    Thanks again for any help

     
  • Timur Hairulin

    Timur Hairulin - 2012-07-20

    It seems that afserver can't connect to the address 10.10.30.13:51001. Is your afwatch running there? Maybe a firewall is blocking the connection.

    That error:
    "Fri 20 Jul 11:57.13: AfContainer::generateList: No node matches "server-pc" founded."
    is not critical at all. It comes from Keeper trying to get information about the local render. It is better to launch a slave on the server too, so you can monitor its resources. You can set its priority to 0 (zero) if you do not run tasks on it.
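
    One way to check whether that address is reachable from the server machine is a plain TCP connection test. The following is only a minimal sketch, not part of Afanasy: the function name can_connect is made up for illustration, and the host and port are simply copied from the AFERROR line quoted above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it is not an Afanasy function.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    # Address copied from the AFERROR line in the server console output.
    host, port = "10.10.30.13", 51001
    if can_connect(host, port):
        print(f"{host}:{port} is reachable")
    else:
        print(f"{host}:{port} is NOT reachable (nothing listening, or blocked)")
```

    Run it on the server machine while afwatch is running on the client. A failure here would be consistent with a firewall on the client blocking incoming connections, but it does not prove it.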

     
  • Timur Hairulin

    Timur Hairulin - 2012-07-20

    Finished PID=1016: Exit Code=1
    - means that the task finished with an error. See the task process output for details.
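
    To see what a failing command prints, it can also be run by hand on the slave with its output captured next to the exit code. This is a rough sketch under assumptions: run_and_report is a hypothetical helper, and the command below is a stand-in that just prints a message and exits with code 1. Replace it with the actual command line of the failing task.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit_code, combined stdout and stderr text).

    Hypothetical helper for illustration; it is not an Afanasy function.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so nothing is lost
        text=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Stand-in for the real task command: prints a message, exits with code 1.
    cmd = [sys.executable, "-c",
           "import sys; print('example error message'); sys.exit(1)"]
    code, output = run_and_report(cmd)
    print(f"Exit Code={code}")
    print(output)
```

    A non-zero exit code only says that the process failed; the reason is usually in the captured text.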

     
  • astan

    astan - 2012-07-22

    That's weird. If the ports are blocked, wouldn't the slave (and server) see no activity from the client? The afwatch is the client program, right? So yes, that's the client.

     
  • Timur Hairulin

    Timur Hairulin - 2012-07-23

    The Afanasy clients (afwatch, afrender) listen on a port too, and send their address to the server when they register. If the client port is blocked, afserver can't send information to the client, so afwatch will not receive changes.
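
    The "client listens, server connects back" pattern can be illustrated on a single machine over loopback. This is a simplified sketch and not Afanasy's actual protocol: all function names are hypothetical, the registration step is reduced to handing the client's address to the server function, and a closed listener stands in for a blocked port.

```python
import socket


def start_client_listener():
    """Client side: listen on a free loopback port and return (socket, address)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    sock.listen(1)
    sock.settimeout(5.0)         # do not wait forever in accept()
    return sock, sock.getsockname()


def server_push(address, payload: bytes) -> bool:
    """Server side: connect back to the registered address and send an update."""
    try:
        with socket.create_connection(address, timeout=2.0) as conn:
            conn.sendall(payload)
        return True
    except OSError:
        return False


def receive_one(listener) -> bytes:
    """Client side: accept one connection and read everything that was sent."""
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    listener, address = start_client_listener()
    # "Registering" here just means the server learns the client's address.
    print("push delivered:", server_push(address, b"job changed"))
    print("client received:", receive_one(listener))

    # Closing the listener stands in for a blocked port: the push now fails.
    # (A real firewall often causes a timeout rather than an immediate refusal.)
    listener.close()
    print("push after close:", server_push(address, b"job changed"))
```

    This also shows why jobs can still be submitted when the client port is blocked: submitting is a connection from client to server, while updates are connections from server to client.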

     
