TO INSTALL:
LINUX: run make in the folder that contains the source files
WINDOWS: open the sln file in Visual C++ and build it, or compile the sources with the compiler of your choice
Example:
Eloper -i1 Mesoplasma_f.fa -i2 Mesoplasma_r.fa -o1 Mesoplasma_f_eloped.fa -o2 Mesoplasma_r_eloped.fa -md10 -ms20
The program supports the following command line parameters:
-ax, where x is either
n (meaning the program must only thicken and save the paired reads. This is the default) or
y (meaning the program must thicken the paired reads, split them into single ones, run the
assembly again and save the results)
-cln, where n specifies the maximum length of the overlapping area of two sequences that is checked
before the sequences can be considered matching. The parameter is ignored unless -fy is found
in the command line
-fx, where x is either
n (meaning the program must not check the sequences for a match beyond the matching area of the
specified length. This is the default) or
y (meaning the program must make sure the complete overlapping area of the sequences matches
fully or up to the limit set by -cln parameter)
-i1FileName, where FileName is the name of the first input file in FASTA format (forward reads).
If this parameter is not supplied the default name is used: f_i.fa.
-i1 FileName - same as -i1FileName
-i2FileName, where FileName is the name of the second input file in FASTA format (reverse reads).
If this parameter is not supplied the default name is used: r_i.fa.
-i2 FileName - same as -i2FileName
-ln, where n specifies the maximum number of overlap references that will be stored and processed for
each sequence. A value of 0 (the default) means no limit.
-max, where x is either
s (meaning the more sophisticated merging algorithm is to be used. This is the default) or
b (meaning a basic merging algorithm must be used)
-mdn, where n specifies the minimum length of the overlapping area for each of the two ends of the
sequences that will cause the sequences to be considered overlapping. The default value is 16.
-mlx, where x is the upper limit (in GB) on the total amount of memory allocated for link descriptors.
If this limit is reached, threads that cannot get link descriptors stop looking for more matches.
-msn, where n specifies the minimum length of the overlapping area between two single (de-coupled)
sequences that will cause the sequences to be considered overlapping. The default value is 32.
-o1FileName, where FileName is the name of the first output file in FASTA format (thickened forward reads).
This file is only created if the program is run in "don't assemble single reads" mode (-an),
see the -ax command line option.
If this parameter is not supplied the default name is used: f_o.fa.
-o1 FileName - same as -o1FileName
-o2FileName, where FileName is the name of the second output file in FASTA format (thickened reverse reads).
This file is only created if the program is run in "don't assemble single reads" mode (-an),
see the -ax command line option.
If this parameter is not supplied the default name is used: r_o.fa.
-o2 FileName - same as -o2FileName
-osFileName, where FileName is the name of an output file in FASTA format containing thickened single reads.
This file is only created if the program is run in "assemble single reads" mode (-ay),
see the -ax command line option.
-os FileName - same as -osFileName
-pln, where n specifies the maximum number of passes. By default, the number of passes is unlimited and only
depends on the ability to merge the sequences
-px, where x is either
y (meaning the program must work with paired end sequences spread across the two input files.
This is the default) or
n (meaning the program must load just one data file and work with single sequences)
-rn, where n specifies the relative orientation of the reads in the second (reverse) file.
Depending on the actual value of n, the option specifies the following formats:
0 (which is also the default) means that the reads in the second file are reverse complements of the
opposite end of the respective sequence from the first file.
1 means that the reads in the second file are the opposite ends of the respective sequence from the
first file.
2 means that the reads in the second file are the reversed opposite ends of the respective sequence
from the first file.
-tn, where n specifies the maximum number of overlap finding threads that will be created for each pass.
The actual number of threads that will be used will never exceed the number of processors (CPUs)
available on the system. If this parameter is not specified, the actual number of threads will
equal the number of available processors.
-ux, where x is either
y (meaning the program must attempt unifying the opposite ends of the paired sequence after each pass.
This is the default) or
n (meaning the program must not attempt unifying the opposite ends of the paired sequence after each pass)
-vmx, where x is either
y (meaning the program must not create a link descriptor until it verifies the sequences actually match.
This causes the program to run much slower but use less memory) or
n (meaning the program must create a link descriptor based solely on the hash value match of any of the
subsequences. This is the default and it causes the program to run much faster but use much more
memory)
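The three read orientations selected by the -r option can be illustrated with a short Python sketch.
It is not taken from the Eloper source; the function names are made up for this illustration, and the
sketch only shows how the opposite end of a fragment would look in the second file for -r0, -r1 and -r2.

```python
# Illustration only: these helpers are hypothetical and not part of Eloper.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

def second_read_as_stored(opposite_end, r_format):
    """Return the opposite end of a fragment as it would appear
    in the second input file for the given -r value (0, 1 or 2)."""
    if r_format == 0:
        # -r0 (default): reverse complement of the opposite end
        return reverse_complement(opposite_end)
    if r_format == 1:
        # -r1: the opposite end as is
        return opposite_end
    if r_format == 2:
        # -r2: the opposite end reversed, without complementing
        return opposite_end[::-1]
    raise ValueError("r_format must be 0, 1 or 2")

if __name__ == "__main__":
    for fmt in (0, 1, 2):
        print(fmt, second_read_as_stored("AACGT", fmt))
```

Reversal and reverse complementing both undo themselves when applied twice, so the same function also
converts a read from the second file back to the plain opposite end.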
Sample commands:
Eloper -i1 Mesoplasma_f.fa -i2 Mesoplasma_r.fa -o1 Mesoplasma_f_cInf.fa -o2 Mesoplasma_r_cInf.fa -md10 -ms20 -ml10 -vmn -mab -fy -un -an
Eloper -i1CE_I_20_f -i2CE_I_20_r -o1CE_I_20_f_c1_a.fa -o2CE_I_20_r_c1_a.fa -pl10 -md15 -ms30 -ml10 -vmy -mab -fy
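The -md and -ms values in the commands above set the minimum overlap length at which two sequences are
considered overlapping. The Python sketch below shows what such a threshold means for a simple exact
suffix/prefix comparison. It is an assumption made for illustration: Eloper itself matches subsequences
by hash value (see -vmx) and its actual overlap test may differ. The function names are hypothetical.

```python
# Illustration only: a naive exact-match overlap test, not Eloper's algorithm.

def suffix_prefix_overlap(left, right):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for length in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:length]):
            return length
    return 0

def considered_overlapping(left, right, min_overlap):
    """True if the exact overlap reaches the threshold (the role played by -md or -ms)."""
    return suffix_prefix_overlap(left, right) >= min_overlap

if __name__ == "__main__":
    a = "ACGTACGTTT"
    b = "CGTTTGGA"
    print(suffix_prefix_overlap(a, b))
    print(considered_overlapping(a, b, 5))
    print(considered_overlapping(a, b, 6))
```

A lower threshold accepts shorter overlaps and therefore more candidate merges, at the cost of more
chance matches; a higher threshold is stricter.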