Obtaining subjobs out of a job collection

2014-01-27
2014-04-30
  • Manuel Rodríguez-Pascual

    Hi all,

    I am trying to obtain the jobs composing a job collection, but I keep finding errors. Maybe you can help me locate them.

    This is my source code. I have included 3 tests trying to see how GEJobDescription works, but none of them does.

    _log.info("Obtaining subjobs composing found job");
    //subjobs composing found jobs
    java.util.Vector<java.lang.String[]> subJobs = job.getSubJobs();
    //print info, just for fun (and test)
    _log.debug("List of subjobs composing found job");
    for (String[] subJob : subJobs) {
        _log.info("----- SUBJOB----");
        String jobContent = "";
        for (String param : subJob)
            jobContent += " | " + param;
        _log.info(jobContent);
        _log.info("Looking for a job with ID: " + subJob[0]);
        GEJobDescription firstTry = GEJobDescription.findJobDescriptionByJobId(subJob[0]);
        _log.info("firstTry Job ID= " + firstTry.getjobId());
        _log.info("firstTry files: " + firstTry.getInputFiles());
        GEJobDescription secondTry = new GEJobDescription();
        secondTry.findJobDescriptionByJobId(subJob[0]);
        _log.info("secondTry Job ID= " + secondTry.getjobId());
        _log.info("secondTry input files: " + secondTry.getInputFiles());
        GEJobDescription thirdTryAux = new GEJobDescription();
        GEJobDescription thirdTry = thirdTryAux.findJobDescriptionByJobId(subJob[0]);
        _log.info("thirdTry Job ID= " + thirdTry.getjobId());
        _log.info("thirdTry input files: " + thirdTry.getInputFiles());
    }

    where job is an instance of ActiveInteractions.

    My ActiveJobInteraction table contains

    | id | common_name | tcp_address | timestamp | grid_interaction | grid_id | robot_certificate | proxy_id | virtual_organization | fqan | user_description | status | grid_ce | latitude | longitude | timestamp_endjob | email | e_token_server | id_job_collection |
    | 54 | test | 127:0:0:1 | 2014-01-27 15:18:12 | 12 | [wms://gridrb.fe.infn.it:7443/glite_wms_wmproxy_server]-[https://gridrb.fe.infn.it:9000/xTGDtubuv4AtOme4kkhQmA] | /C=IT/O=INFN/OU=Robot/L=Catania/CN=Robot: Catania Science Gateway - Roberto Barbera | 332576f78a4fe70a52048043e90cd11f | gridit | gridit | multi-infrastructure job description_1 | SUBMITTED | | 0 | 0 | NULL | | etokenserver.ct.infn.it:8082 | 53 |

    And JobDescription:

    | id | jobId | executable | arguments | output | error | queue | file_transfer | total_cpu | SPDM_variation | number_of_processes | JDL_requirements | output_path | input_files | output_files | proxy_renewal | resubmit_count |
    | 54 | [wms://gridrb.fe.infn.it:7443/glite_wms_wmproxy_server]-[https://gridrb.fe.infn.it:9000/xTGDtubuv4AtOme4kkhQmA] | phyml | -i jmodeltest4765588979883639522.phy -d nt -n 1 -b 0 --run_id HKY+I+G -m 010010 -f m -v e -c 4 -a e -s BEST --no_memory_check -o tlr | myOutput-1.txt | myError-1.txt | NULL | NULL | NULL | NULL | NULL | NULL | /tmp/jobOutput/ | /tmp/jModelTest2_executions_2014-01-27-15:17:12/phyml,/tmp/jModelTest2_executions_2014-01-27-15:17:12/jmodeltest4765588979883639522.phy,/opt/liferay/glassfish-3.1.2/domains/domain1/autodeploy/jModelTest2Parallel-portlet//WEB-INF/job/pilot_script.sh | jmodeltest4765588979883639522.phy_phyml_stats_HKY+I+G.txt,jmodeltest4765588979883639522.phy_phyml_tree_HKY+I+G.txt | Y | 10 |

    So I know the content has been correctly stored.

    Then, if we look at the execution logs:

    [#|2014-01-27T15:54:13.829+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=104;_ThreadName=Thread-2;|15:54:13,829 INFO [http-thread-pool-8080(2)][jModelTest2Parallel_portlet:108] ----- SUBJOB----
    |#]
    [#|2014-01-27T15:54:13.830+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=104;_ThreadName=Thread-2;|15:54:13,829 INFO [http-thread-pool-8080(2)][jModelTest2Parallel_portlet:108] | 54 | liferay.com | jModelTest2Parallel-portlet | multi-infrastructure job description_1 | 2014-01-27 15:18:12.0 | SUBMITTED | 53
    |#]

    This means that the DB has been accessed.

    [#|2014-01-27T15:54:13.830+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=104;_ThreadName=Thread-2;|15:54:13,830 INFO [http-thread-pool-8080(2)][jModelTest2Parallel_portlet:108] Looking for a job with ID: 54
    |#]

    This is the JobID that appears in both JobDescription and ActiveJobInteraction, as you can see in the database tables above.

    [#|2014-01-27T15:54:13.831+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=104;_ThreadName=Thread-2;|15:54:13,830 INFO [http-thread-pool-8080(2)][jModelTest2Parallel_portlet:108] firstTry Job ID= null
    |#]
    [#|2014-01-27T15:54:13.831+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=104;_ThreadName=Thread-2;|15:54:13,831 INFO [http-thread-pool-8080(2)][jModelTest2Parallel_portlet:108] firstTry files:

    But nothing is returned. The same happens with the other ways of accessing the method, which I have added just in case the API works in a different way than I am expecting.

    |#]
    [#|2014-01-27T15:54:13.832+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=104;_ThreadName=Thread-2;|15:54:13,831 INFO [http-thread-pool-8080(2)][jModelTest2Parallel_portlet:108] secondTry Job ID= null
    |#]
    [#|2014-01-27T15:54:13.832+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=104;_ThreadName=Thread-2;|15:54:13,832 INFO [http-thread-pool-8080(2)][jModelTest2Parallel_portlet:108] secondTry input files:
    |#]
    [#|2014-01-27T15:54:13.833+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=104;_ThreadName=Thread-2;|15:54:13,832 INFO [http-thread-pool-8080(2)][jModelTest2Parallel_portlet:108] thirdTry Job ID= null
    |#]
    [#|2014-01-27T15:54:13.833+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=104;_ThreadName=Thread-2;|15:54:13,833 INFO [http-thread-pool-8080(2)][jModelTest2Parallel_portlet:108] thirdTry input files:
    |#]

    But as you can see, none of the approaches I tried is able to access the database and create the desired object.

    Am I doing something wrong? Should I employ any other method?

     
    • Mario

      Mario - 2014-03-18

      Hello Manuel,

      First of all, I'm sorry for the delay in responding; we have been quite busy in recent weeks. I will now try to answer your question.

      To access the userstracking database and retrieve information about submitted jobs, you can use the UsersTrackingDBInterface (javadoc is here). In this class you can find all the methods needed to interact with the DB. In particular, you can use the following two methods:

      # Vector<ActiveInteractions> getActiveInteractionsByName(String commonName)
      # Vector<ActiveInteractions> getDoneInteractionsByName(String commonName)
      

      These methods allow you to retrieve information respectively for:
      1. active interactions: jobs/job collections for which the output hasn't been downloaded
      2. done interactions: jobs/job collections for which the output has been downloaded

      I have attached to this reply a Java code snippet that you can use to get information about the subjobs that belong to a job collection.

      I hope this is helpful to solve your issue.

      Cheers.
      Mario

       

      Last edit: Mario 2014-03-18
  • Manuel Rodríguez-Pascual

    Hi Mario,

    Thanks for your response. I had been working on some other stuff, so I did not have time to look at it before, sorry.

    Unfortunately, that does not completely resolve my doubt.

    With your suggestion, it is possible to access the ActiveJobInteraction table content. That is what I did in the first lines of my code:

    java.util.Vector subJobs = job.getSubJobs();
    for (String param: subJob)
    jobContent += " | " + param;

    Now, for my needs, the next steps are to:
    a) identify those subjobs
    b) obtain all their information, like input files.

    This has to be done by accessing the JobDescription DB table.

    Looking at the database content (which I posted before), two possibilities arise from my point of view:
    a) accessing with ID=subjob[0], in this case "54"
    b) accessing with jobID = "[wms://gridrb.fe.infn.it:7443/glite_wms_wmproxy_server]-[https://gridrb.fe.infn.it:9000/xTGDtubuv4AtOme4kkhQmA]". I don't know how to obtain this from the subjob object.

    So, starting from the information provided by DBInterface.getDoneInteractionsByName().getSubJobs(), how can I create a subjob object that encapsulates the content of JobDescription? If that is not possible, how can I access that information from my portlet?

    Thanks again for your help. Best regards,

    Manuel

     

    Last edit: Manuel Rodríguez-Pascual 2014-04-04
  • Mario

    Mario - 2014-04-08

    Hello Manuel,

    To achieve your purpose: once you get the Vector<ActiveInteractions>, you should iterate over its elements and, for those that are collections, retrieve the list of GEActiveGridInteraction. These objects carry more detailed information than the ActiveInteractions ones; in particular, they have the jobId attribute that links the records in the ActiveGridInteractions table with the ones in the JobDescription table.

    To do this you can use:

    # List<GEActiveGridInteraction>  GEActiveGridInteraction.findActiveJobForJobCollection(int idJobCollection)
    

    where idJobCollection is the ActiveJobCollections identifier (the foreign key in ActiveGridInteractions). This method returns the information for the jobs belonging to the specified collection. Then, iterating over the objects in that list, for those that aren't in DONE status you can use:

    # GEJobDescription GEJobDescription.findJobDescriptionByJobId(String jobId)
    

    where jobId is the job unique identifier, you can get an object containing the full job description: executable, arguments, inputFiles, etc...

    Please note that you can retrieve the JobDescription only for those jobs that aren't in DONE status, because when a job has completed successfully its description is automatically deleted from the JobDescription table. There is also a transient state while the job is being submitted: it appears in SUBMITTED state in the ActiveGridInteractions table but has not yet received its jobId. Once the job is really submitted and gets its unique identifier, you can use this id (jobId) to retrieve the description from the DB.
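A minimal sketch of the lookup rules described above, using in-memory stand-ins rather than the real GridEngine classes. The `Interaction` type, the `findDescriptions` helper and the `descriptionsByJobId` map are hypothetical names introduced only for illustration, as are rows 55 and 56. The sketch assumes descriptions are keyed by the grid jobId, which would explain why a lookup with the row id "54" found nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins only: none of these types are GridEngine classes.
class SubjobDescriptionLookup {

    // Hypothetical view of one sub-job row of a collection.
    static final class Interaction {
        final int dbId;       // database row id (what subJob[0] contained)
        final String jobId;   // grid job identifier; null while still being submitted
        final String status;  // e.g. "SUBMITTED" or "DONE"

        Interaction(int dbId, String jobId, String status) {
            this.dbId = dbId;
            this.jobId = jobId;
            this.status = status;
        }
    }

    // Returns the descriptions that can still be looked up: the sub-job must
    // not be DONE and must already have been assigned a jobId.
    static List<String> findDescriptions(List<Interaction> subJobs,
                                         Map<String, String> descriptionsByJobId) {
        List<String> found = new ArrayList<>();
        for (Interaction subJob : subJobs) {
            if ("DONE".equals(subJob.status)) continue; // description already deleted
            if (subJob.jobId == null) continue;         // submission not finished yet
            String description = descriptionsByJobId.get(subJob.jobId);
            if (description != null) found.add(description);
        }
        return found;
    }

    public static void main(String[] args) {
        String gridJobId = "[wms://gridrb.fe.infn.it:7443/glite_wms_wmproxy_server]-"
                + "[https://gridrb.fe.infn.it:9000/xTGDtubuv4AtOme4kkhQmA]";

        // Stand-in for the JobDescription table, keyed by jobId (assumption).
        Map<String, String> descriptionsByJobId = new HashMap<>();
        descriptionsByJobId.put(gridJobId, "phyml");

        List<Interaction> subJobs = new ArrayList<>();
        subJobs.add(new Interaction(54, gridJobId, "SUBMITTED"));
        subJobs.add(new Interaction(55, "hypothetical-done-job", "DONE")); // made-up row
        subJobs.add(new Interaction(56, null, "SUBMITTED"));               // made-up row

        // Only row 54 qualifies.
        System.out.println(findDescriptions(subJobs, descriptionsByJobId));
        // Looking up by the row id instead of the jobId finds nothing.
        System.out.println(descriptionsByJobId.get("54"));
    }
}
```

With the real API, the map lookup would presumably be replaced by GEJobDescription.findJobDescriptionByJobId(jobId), and the list by the result of findActiveJobForJobCollection(idJobCollection).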

    In the attached code snippet you'll find an example of how to retrieve the subjobs' descriptions.

    I hope this post is more helpful than the previous one.
    Feel free to contact me for further clarifications.

    Best regards.

    Mario

     
  • Manuel Rodríguez-Pascual

    Thanks very much for your help Mario. I understand it perfectly now.

    Then, now I don't really know how to proceed.

    As I have talked with Diego on some occasions, what I am doing is a portlet that:

    a) submits a number of tasks
    b) when they are all finished, it processes the partial results.

    To do so, I need both the input and output files.

    But with the information you have given me, I understand that there is no way of knowing where these partial results are. Is that true? Or how should I do it?

     
  • Mario

    Mario - 2014-04-16

    Hi Manuel,

    Maybe in the previous posts I missed the crucial point of your request. However, I talked with Diego about your problem and I will try to summarize some steps that you could follow to find out where the partial results are stored.
    By default, the GridEngine collects the subjob output files of a collection in an archive stored in the specified output path. When the collection has completed successfully you can "untar" this archive and access the output files.

    In particular you should:

    1. Retrieve the list of done collections;
    2. For each collection in the list, you can:
      • build the collection output path (collectionOutputDir), as follows: {outputPath}/{collectionIdentifier}_{collectionDbId}, where:
        • outputPath = jobCollection.getOutputPath();
        • collectionIdentifier = JSagaJobSubmission.removeNotAllowedCharacter(jobCollection.getDescription());
        • collectionDbId = jobCollection.getId();
      • build the collection output archive path. It has the following form: {collectionOutputDir}.tgz
      • untar the collection archive. This process produces a directory containing a set of subfolders with the output files of each subjob;
      • get the list of the ActiveGridInteractions belonging to this collection, by using the method findActiveJobForJobCollection(int idJobCollection);
      • finally, iterating over each element in the list, you could build the path to reach the directory containing the subjob output files, by using the database id of each ActiveGridInteractions. This path has the following form:
        {collectionOutputDir}/{collectionIdentifier}_{subjobOrdinal}_{ActiveGridInteractionsDbId}, where:
        • collectionOutputDir = see before;
        • collectionIdentifier = see before;
        • subjobOrdinal = index of the iteration (0 <= subjobOrdinal < n° of subjobs);
        • ActiveGridInteractionsDbId = ActiveGridInteractions database id (activeGridInteraction.getId());
          (obviously you know the output filename because you specified it when you created the description). Inside this folder you should find the subjob output files.
    3. Repeat the previous process.
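The path-building steps above can be sketched with plain string handling. This is only an illustration under stated assumptions: the class and method names are hypothetical, the collection identifier is assumed to be already sanitized (the real code would call JSagaJobSubmission.removeNotAllowedCharacter), and the name components are assumed to be joined with underscores. The sample values are the ones that appear elsewhere in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; it does not use any GridEngine types.
class CollectionOutputPaths {

    // {outputPath}/{collectionIdentifier}_{collectionDbId}
    static String collectionOutputDir(String outputPath, String collectionIdentifier,
                                      int collectionDbId) {
        // Tolerate an output path with or without a trailing slash.
        String base = outputPath.endsWith("/") ? outputPath : outputPath + "/";
        return base + collectionIdentifier + "_" + collectionDbId;
    }

    // {collectionOutputDir}.tgz
    static String archivePath(String collectionOutputDir) {
        return collectionOutputDir + ".tgz";
    }

    // Arguments for extracting the archive into the output path with tar.
    // The command is only built here, not executed.
    static List<String> untarCommand(String outputPath, String archivePath) {
        return Arrays.asList("tar", "-C", outputPath, "-zxvf", archivePath);
    }

    // {collectionOutputDir}/{collectionIdentifier}_{subjobOrdinal}_{dbId}
    static String subjobOutputDir(String collectionOutputDir, String collectionIdentifier,
                                  int subjobOrdinal, int interactionDbId) {
        return collectionOutputDir + "/" + collectionIdentifier
                + "_" + subjobOrdinal + "_" + interactionDbId;
    }

    public static void main(String[] args) {
        String outputPath = "/tmp/jobOutput/";
        String identifier = "JMT_Short_Test"; // assumed already sanitized
        String dir = collectionOutputDir(outputPath, identifier, 58);
        String archive = archivePath(dir);

        System.out.println(dir);
        System.out.println(archive);
        System.out.println(String.join(" ", untarCommand(outputPath, archive)));
        System.out.println(subjobOutputDir(dir, identifier, 1, 299));
    }
}
```

Note that the sub-job ordinal has to be the one that was used when the folder was created. Nothing in this sketch guarantees that the position of an element in the returned list matches that ordinal.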

    To clarify the above process see the attached code snippet.

    Best regards.

    Mario.

     
  • Manuel Rodríguez-Pascual

    Hi Mario,

    You are completely right. I should have described the bigger picture of my problem. As I have been talking with different people from your team, each of you only knows a small part of what I am trying to do, and that is not a very efficient way to ask for help. Thanks for making the effort to talk to Diego and understand it :)

    Anyway, your response is great, very complete and easy to understand. I will start with the development this very same afternoon.

    Thanks again for your help,

    Manuel

     
  • Manuel Rodríguez-Pascual

    I think I am close to the solution, but it is still not working.

    So, I am creating the collection that I submit to the Grid with:

    ArrayList<GEJobDescription> tasksToExecute = new ArrayList<GEJobDescription>();

    and then adding elements with:

    GEJobDescription miJobSubmission = new GEJobDescription();
    ...
    tasksToExecute.add(element);

    Now, to get the collection after the execution, I follow your instructions:

    Vector<JobCollection> collections = JobCollection.getDoneJobCollections("test");
    ...(find the one with the desired ID)
    _log.info("Collection Common name: "+ jobCollection.getCommonName() );
    _log.info("Collection Description: "+ jobCollection.getDescription() );
    _log.info("Collection ID: "+ jobCollection.getId() );
    _log.info("Collection output path: "+ jobCollection.getOutputPath() );
    ..
    String collectionIdentifier = JSagaJobSubmission.removeNotAllowedCharacter(jobCollection.getDescription());
    String collectionOutputDir = jobCollection.getOutputPath() + collectionIdentifier + "_" + jobCollection.getId();

    This works OK and the output is correct.

    Then I unzip the results in

    tarPath = collectionOutputDir + ".tgz";

    and it works OK.

    But when accessing the tmp folders with

    List<GEActiveGridInteraction> activeGridInteractions = GEActiveGridInteraction.findActiveJobForJobCollection(jobCollection.getId());
    for (int i = 0; i < activeGridInteractions.size(); i++) {
        GEActiveGridInteraction activeGridInteraction = activeGridInteractions.get(i);
        String subjobOutputDir = collectionOutputDir + "/" + collectionIdentifier + "_" + i + "_" + activeGridInteraction.getId();
        System.out.println("\t|_" + subjobOutputDir);
    }

    it fails.

    Looking at the log,

    Collection Common name: test
    Collection Description: JMT_Short_Test
    Collection ID: 58
    Collection output path: /tmp/jobOutput/
    ..
    name1=JMT_Short_Test
    name2=JMT_Short_Test
    name3=JMT_Short_Test
    ..
    tar -C /tmp/jobOutput/ -zxvf /tmp/jobOutput/JMT_Short_Test_58.tgz
    ..
    /tmp/jobOutput/JMT_Short_Test_58/JMT_Short_Test_0_299 <- does not exist

    and it is right, the folder does not exist. The content of the tgz file is:

    JMT_Short_Test_0_301
    JMT_Short_Test_1_299
    JMT_Short_Test_2_300

    As you can see the IDs are OK but in a different order or something like that. It is expecting "0_299" but the existing one is "1_299"

    Looking at the database, table ActiveGridInteractions (this is a bit simplified)

    | 299 | test | 12 | JMT_Short_Test_1 | DONE | 58 |
    | 300 | test | 12 | JMT_Short_Test_2 | DONE | 58 |
    | 301 | test | 12 | JMT_Short_Test_0 | DONE | 58 |

    the content is correct and it corresponds to the existing folder organization

    So, any suggestion?

    Thanks for your help,

    Manuel

     

    Last edit: Manuel Rodríguez-Pascual 2014-04-29
  • Diego Scardaci

    Diego Scardaci - 2014-04-30

    Hi Manuel,
    change the following line:
    String subjobOutputDir = collectionOutputDir + "/" + collectionIdentifier + "_" + i + "_" + activeGridInteraction.getId();

    to

    String subjobOutputDir = collectionOutputDir + "/" + activeGridInteraction.getUserDescription() + "_" + activeGridInteraction.getId();

    Let me know if it works; otherwise, please send me the newly generated path.

    Cheers,
    Diego
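As a quick check of the suggested change, here is a small sketch that applies it to the three rows quoted earlier in the thread. The `Row` type is a hypothetical stand-in for GEActiveGridInteraction, and it assumes that the fourth column of the simplified table is the user description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in; not a GridEngine class.
class SubjobDirByDescription {

    static final class Row {
        final int id;                  // ActiveGridInteractions database id
        final String userDescription;  // assumed to be e.g. "JMT_Short_Test_1"

        Row(int id, String userDescription) {
            this.id = id;
            this.userDescription = userDescription;
        }
    }

    // The user description already carries the sub-job ordinal, so the loop
    // index is no longer needed to build the folder name.
    static String subjobOutputDir(String collectionOutputDir, Row row) {
        return collectionOutputDir + "/" + row.userDescription + "_" + row.id;
    }

    public static void main(String[] args) {
        String collectionOutputDir = "/tmp/jobOutput/JMT_Short_Test_58";
        List<Row> rows = Arrays.asList(
                new Row(299, "JMT_Short_Test_1"),
                new Row(300, "JMT_Short_Test_2"),
                new Row(301, "JMT_Short_Test_0"));

        for (Row row : rows) {
            System.out.println(subjobOutputDir(collectionOutputDir, row));
        }
    }
}
```

The folder names produced this way (JMT_Short_Test_1_299, JMT_Short_Test_2_300, JMT_Short_Test_0_301) are the ones listed as the content of the tgz file, regardless of the order in which the rows are returned.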

     
