Could you benefit from the genotype format converter FCgene?

2012-10-09
2013-05-24
  • Nab Raj Roshyara

     
  • O. Sotolongo

    O. Sotolongo - 2013-05-24

    I'm preparing the imputation of 5 DBs and want to compare IMPUTE2 and MaCH results. Since my DBs are in PLINK format, FCgene is very useful to me.

    One thing I found very useful is the script generated by the IMPUTE conversion. However, the Perl script launches the imputation processes serially, at least on my system. As an example, I took chromosome 22 of one of my DBs and got this script:

    #!/usr/bin/perl -w
    my @lowerBound=(15544478, 20644310, 25744142, 30843974, 
    35943806, 41043638, 46143470);
    my @upperBound= (20644309, 25744141, 30843973, 35943805, 
    41043637, 46143469, 51243297);
    my $nChunks = 7;
    my @segs= (0 .. $nChunks-1);
    foreach $i (@segs)
    {
            my $actChunk=$i+1;
             system (
                    "./impute2 ". 
                    "-m genetic_map.txt ". 
                    "-g  mydb_2imp2_chr22.gens ".
                    "-strand_g  mydb_2imp2_chr22.strand.txt ". 
                    "-int  $lowerBound[$i]  $upperBound[$i] ". 
                    "-Ne 11418  -call_thresh 0.9 -pgs ". 
                    "-o mydb_2imp2_chr22.impute2_chunk_$actChunk"
            );
            print("\nImputation of chunk $actChunk  is finished.\n\n");
    }
    

    I needed to change it so that the chunks run in parallel:

    #!/usr/bin/perl -w
    use strict;
    use Parallel::ForkManager;
    # Change the number of CPUs to use below
    my $max_processes = 4;
    my @lowerBound = (15544478, 20644310, 25744142,
                      30843974, 35943806, 41043638, 46143470);
    my @upperBound = (20644309, 25744141, 30843973,
                      35943805, 41043637, 46143469, 51243297);
    my $nChunks = 7;
    my @segs = (0 .. $nChunks-1);
    my $pm = Parallel::ForkManager->new($max_processes);
    foreach my $i (@segs) {
        my $actChunk = $i+1;
        # Fork a child for this chunk; the parent moves on to the next chunk
        $pm->start and next;
        system (
            "impute2 ".
            "-m genetic_map.txt ".
            "-g  mydb_2imp2_chr22.gens ".
            "-strand_g  mydb_2imp2_chr22.strand.txt ".
            "-int  $lowerBound[$i]  $upperBound[$i] ".
            "-Ne 20000 -call_thresh 0.9 -pgs ".
            "-o mydb_2imp2_chr22.impute2_chunk_$actChunk"
        );
        print("\nImputation of chunk $actChunk  is finished.\n\n");
        # End the child process
        $pm->finish;
    }
    # Wait for all chunks to complete before the script exits
    $pm->wait_all_children;
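
    One caveat with both scripts: the "finished" message is printed even if impute2 fails or cannot be started, because the result of system() is never checked. Below is a minimal sketch of how the status could be checked, using only core Perl. The helper names describe_status and run_chunk are hypothetical (they are not part of the FCgene output), and the two calls at the end use the running Perl interpreter as a stand-in for impute2 so the sketch runs anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Translate the value that system() leaves in $? into a readable message.
# Hypothetical helper, not part of the FCgene-generated script.
sub describe_status {
    my ($status) = @_;
    return "could not be started" if $status == -1;
    return sprintf("was killed by signal %d", $status & 127) if $status & 127;
    return sprintf("exited with code %d", $status >> 8);
}

# Run one command and report whether it succeeded.
# Returns 1 on success and 0 on failure.
sub run_chunk {
    my ($chunk, @cmd) = @_;
    system(@cmd);
    my $status = $?;
    if ($status == 0) {
        print "Imputation of chunk $chunk is finished.\n";
        return 1;
    }
    warn "Chunk $chunk: command " . describe_status($status) . "\n";
    return 0;
}

# Stand-in commands: $^X is the path of the running Perl interpreter.
run_chunk(1, $^X, '-e', 'exit 0');   # succeeds
run_chunk(2, $^X, '-e', 'exit 3');   # fails with exit code 3
```

    In the parallel script, the system(...) call and the print could be replaced by a call to run_chunk with the impute2 command line, and the child could pass the result to the parent with something like $pm->finish($ok ? 0 : 1).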
    

    I know this can be done in several ways (GNU parallel, bash jobs), but I prefer to do it the Perl way.

    Maybe this kind of information could be useful for end users.

     
