RECORD: Reference-Assisted Genome Assembly for Closely Related Genomes
1. Introduction
---------------
This software package contains the prototypical implementation of the approach
presented in the research paper titled "RECORD: Reference-Assisted Genome
Assembly for Closely Related Genomes" by K. Buza, B. Wilczynski and N. Dojer.
This file provides brief documentation for the software.
2. Licence, terms and conditions of usage
-----------------------------------------
By using the software you agree that:
- the software is a research prototype, and therefore, there is absolutely
no guarantee associated with the software, neither for its fitness for any
specific purpose, nor that it produces correct or accurate results,
- the fact that the software is a research prototype means that it may contain
many more bugs (errors) than usual (i.e., commercial) software. Such bugs may
cause the software to stop working unexpectedly or to produce incorrect
results; therefore, the software is neither designed nor suited for use in any
operational environment (including, but not limited to, industrial and medical
environments),
- you may only use this software or its components entirely on your own
responsibility,
- the author of the software is NOT responsible in any way for ANY kind of
damage caused by the software or associated with its usage,
- it is absolutely forbidden to use the software in any application that
does not conform to the law or applicable regulations,
- the author definitely FORBIDS the use of this software in ANY application
that aims to infect humans or to cause diseases to humans in any other way or
might be associated with killing human persons (including abortion in any stage
of embryonic development),
- whenever you use the software, you should properly acknowledge its authors
and/or the authors of the aforementioned research paper describing the
methodology behind the software (e.g. if you write a paper and use this
software, you are kindly asked to refer to the website from which you obtained
the software, and/or to the aforementioned research paper).
If you agree to the above terms and conditions, you can freely use this software.
3. Installation
---------------
The prototypical implementation of the RECORD approach consists of a set of
scripts, most of them written in Perl, and one program written in Java.
To run RECORD, you therefore need Perl and Java installed on your computer.
RECORD also calls the genome assembler Velvet and the genome aligner MUMmer,
so these tools must be installed as well.
Summary of the software tools that have to be installed to run RECORD:
- Perl,
- Java,
- genome assembler called "Velvet" (tested with version 1.2.08),
- genome alignment tool called "MUMmer" (tested with version 3.23).
If the required software listed above is installed, installing RECORD only
requires copying the scripts into a new folder and making them executable
(using e.g. the command
chmod +x [name_of_the_script_file] ).
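
Before the first run, it can be useful to verify that the required tools are
reachable from the command line. The following is a minimal sketch, not part of
RECORD itself. The function names are hypothetical, and the executable names
velveth, velvetg (Velvet) and nucmer (MUMmer) are assumptions about which
programs need to be on the PATH; adjust the list to match your installation.

```shell
#!/bin/sh
# Print the name of every given command that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed executable names: velveth/velvetg (Velvet) and nucmer (MUMmer).
# Which MUMmer programs RECORD actually calls is not stated in this file.
check_record_prerequisites() {
    missing=$(missing_tools perl java velveth velvetg nucmer)
    if [ -n "$missing" ]; then
        echo "Missing tools:" $missing >&2
        return 1
    fi
    echo "All required tools were found."
}
```

After loading these definitions into your shell, call
check_record_prerequisites ; it lists the tools that could not be found, if any.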
4. Running RECORD
-----------------
RECORD has many parameters, so they are provided in a structured text file
that is parsed by the program. To start RECORD, you therefore only need to
type
./main.pl configuration_file.txt
An example configuration file, together with an explanation of each
parameter, is included with the software, see
configuration_file.txt .
You have to prepare a workspace folder for each run of RECORD and provide
its name in the configuration file. Please make sure that nothing else is
stored in the workspace folder, because RECORD writes its intermediate
results there and may overwrite files that have the same name.
You have to place the reference genome in FASTA format in the
"ref" subfolder under the workspace folder:
[WORKSPACE]/ref/ - this folder should contain the reference
genome of the species, which is a necessary input of the RECORD pipeline
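
The workspace preparation described above can be sketched as a small shell
function. This is only an illustration under stated assumptions: the function
name prepare_workspace and the example paths are hypothetical, and the sketch
refuses to reuse a non-empty folder so that existing files cannot be
overwritten by a later run.

```shell
#!/bin/sh
# Create [WORKSPACE]/ref/ and copy the reference genome (FASTA) into it.
# Usage: prepare_workspace WORKSPACE REFERENCE_FASTA
prepare_workspace() {
    workspace=$1
    reference=$2
    if [ ! -f "$reference" ]; then
        echo "Reference file not found: $reference" >&2
        return 1
    fi
    # Refuse to use a folder that already contains anything.
    if [ -d "$workspace" ] && [ -n "$(ls -A "$workspace")" ]; then
        echo "Workspace is not empty: $workspace" >&2
        return 1
    fi
    mkdir -p "$workspace/ref" || return 1
    cp "$reference" "$workspace/ref/"
}
```

For example, prepare_workspace my_run reference.fasta would create
my_run/ref/reference.fasta ; the name of the workspace folder still has to be
entered in the configuration file.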
The results of the intermediate steps of RECORD will appear
in the "results" subfolder of your workspace folder and its
subfolders.
In particular:
[WORKSPACE]/results/pseudoreads1.fastq - the first mate of the pseudoreads generated
from the reference genome
[WORKSPACE]/results/pseudoreads2.fastq - the second mate of the pseudoreads generated
from the reference genome
[WORKSPACE]/results/velvet_assembly/ - the output of the genome assembler Velvet
[WORKSPACE]/results/alignment/ - this folder contains the alignment between
the contigs produced by Velvet and the reference genome; RECORD uses MUMmer to
obtain this alignment
[WORKSPACE]/results/edited_ref/ - this folder contains the edited reference,
i.e., the primary output of RECORD in FASTA format, and an additional file,
similar to MUMmer alignment files, which shows which parts of the reference
were replaced while editing the reference
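
After a run, the presence of the outputs listed above can be checked with a
short shell function. This is a sketch only: the function name missing_outputs
is hypothetical, and the check covers only the files and folders named in this
section, not their contents.

```shell
#!/bin/sh
# Print every expected RECORD output that is missing under WORKSPACE/results.
# Usage: missing_outputs WORKSPACE
missing_outputs() {
    results="$1/results"
    # Pseudoread files generated from the reference genome.
    for f in pseudoreads1.fastq pseudoreads2.fastq; do
        [ -f "$results/$f" ] || printf '%s\n' "$results/$f"
    done
    # Folders for the Velvet assembly, the alignment and the edited reference.
    for d in velvet_assembly alignment edited_ref; do
        [ -d "$results/$d" ] || printf '%s\n' "$results/$d/"
    done
}
```

If missing_outputs my_run prints nothing, all listed outputs exist; otherwise
the printed paths indicate which step did not produce its result.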
5. Contact
----------
In case of further questions, you may contact the author of the software via
e-mail: Krisztian Buza, chrisbuza@yahoo.com
Good luck!