ReferenceFree Code

Scripts for reference free genomic analysis

Status: Beta

Brought to you by: jrharting

File	Date	Author	Commit
ReferenceFreeSource	2012-04-17	John Harting	[c4a85b] initial commit
phylokmer	2012-04-17	John Harting	[c4a85b] initial commit
ABYSS	2012-04-17	John Harting	[c4a85b] initial commit
AssembleGroups.py	2012-04-17	John Harting	[c4a85b] initial commit
AssembleTypeI.py	2012-04-17	John Harting	[c4a85b] initial commit
Makefile	2012-04-17	John Harting	[c4a85b] initial commit
README.txt	2012-04-17	John Harting	[a30473] Added README
ReadsSelector	2012-04-17	John Harting	[c4a85b] initial commit

Read Me

Inside the zip file:
AssembleTypeI.py
AssembleGroups.py
ReferenceFreeSource (directory)
	ReferenceFree package
	ReadsSelector.cpp
phylokmer (directory)
	various, see below

ReferenceFreeSource is the package containing code for both Python applications. (Author: John Harting)
AssembleTypeI.py is an application for creating TypeI contigs. (Author: John Harting)
AssembleGroups.py is an application for creating group contigs. (Author: John Harting)
ReadsSelector.cpp is the C++ source for ReadsSelector. (Author: Ye Chengxi)
phylokmer is a directory containing several programs (C, Perl) for creating shared kmer files. (Author: Jue Ruan)


Dependencies:
Perl
Python 2.6 or a later 2.x release (NOT Python 3).
Biopython package for Python
ReadsSelector
ABYSS
g++/gcc compilers
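
Before installing, it can help to confirm that the command-line dependencies are reachable. The following is only a sketch: the helper names (find_on_path, missing_tools) are hypothetical, and the executable names passed in at the bottom are assumptions that may differ on your system.

```python
import os


def find_on_path(program, search_path=None):
    """Return the full path of an executable found on search_path, else None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(programs, search_path=None):
    """Return the programs from the list that cannot be found."""
    return [p for p in programs if find_on_path(p, search_path) is None]


if __name__ == "__main__":
    # Executable names below are assumptions; adjust them as needed.
    for name in missing_tools(["perl", "python", "g++", "gcc"]):
        print("not found on PATH: " + name)
```

This only checks that the programs exist; it does not check versions, and the Biopython package would need a separate import test.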

Installation:
1. Unpack the archive into a directory and cd into that directory.

2. Compile ReadsSelector and make the Assemble*.py files executable by typing 'make' at the command line.

make

3. Compile the phylokmer programs (used to build shared kmer files).

cd phylokmer
make

4. Compile ABYSS into the same directory as the Assemble*.py apps.  http://www.bcgsc.ca/platform/bioinfo/software/abyss
   (Support for other assemblers may be added in the future; for now, ABYSS is the assembler to use.)

(You can also compile ABYSS elsewhere and put a symbolic link to it in the ReferenceFree directory alongside the Python apps.)

5. Install the ReferenceFree package into Python (from the top-level directory).

cd ReferenceFreeSource
sudo python setup.py install

*If you do not wish to install the package into the system-wide Python site-packages directory using sudo, you can install it into a suitable directory on your PYTHONPATH using:

python setup.py install --prefix=/yourdir

**If you do not have admin access and/or want to set up a virtual Python interpreter with a local package directory (e.g. on a university computing grid), you can do something like the following, which sets it up in /mydirectory (see http://pypi.python.org/pypi/virtualenv for more info):

wget https://raw.github.com/pypa/virtualenv/master/virtualenv.py
python virtualenv.py /mydirectory
source /mydirectory/bin/activate
cd /path/to/ReferenceFreeSource
/mydirectory/bin/python setup.py install

Then, when you execute AssembleTypeI.py and/or AssembleGroups.py, make sure you invoke the virtual Python interpreter explicitly:

/mydirectory/bin/python AssembleTypeI.py [args] [options] 
/mydirectory/bin/python AssembleGroups.py [args] [options]

(You can also make this the default Python interpreter by adding /mydirectory/bin to your PATH variable in your .profile, .bashrc, or whatever shell initialization file you use.)

-------------------------------------------

Help menus for phylokmer programs can be displayed by typing the executable name at the terminal.  The following is an example set of commands to create a shared kmer file for analysis:

perl /path/to/phylokmer/phylokmer.pl -l 21 -n 3 -d /path/to/sequence_data -f FA -j 1 -o /path/to/outputdata/pkdat/somegroupname_l21_n3_j1.shared.pkdat
mkdir /path/to/outputdata/pkdat/somegroupname_21l/
mv /path/to/outputdata/pkdat/*pkdat /path/to/outputdata/pkdat/somegroupname_21l/
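
When the same command is run for several groups or parameter settings, it may be convenient to generate it from a script. The sketch below mirrors the example command above; the function names are hypothetical, and the output file naming pattern (group_l<L>_n<N>_j<J>.shared.pkdat) is inferred from that single example rather than from any documented rule.

```python
import os


def shared_pkdat_name(group, l_value, n_value, j_value):
    """Build an output file name following the pattern of the example above."""
    return "%s_l%d_n%d_j%d.shared.pkdat" % (group, l_value, n_value, j_value)


def build_phylokmer_cmd(script, data_dir, out_dir, group,
                        l_value=21, n_value=3, j_value=1, fmt="FA"):
    """Return the phylokmer.pl command as an argument list."""
    out_file = os.path.join(
        out_dir, shared_pkdat_name(group, l_value, n_value, j_value))
    return ["perl", script,
            "-l", str(l_value), "-n", str(n_value),
            "-d", data_dir, "-f", fmt, "-j", str(j_value),
            "-o", out_file]
```

The returned list can be passed to subprocess.call, or joined with spaces to print the command for inspection.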

NOTE:  The /path/to/sequence_data directory contains one directory of fa/fq files for each taxon/individual.  The directories should be labelled 'Genus_species' (or more generally 'identifier1_identifier2'), e.g.:

/path/to/sequence_data/genusA_species1
/path/to/sequence_data/genusA_species2
/path/to/sequence_data/genusB_species1
...

-------------------------------------------

Help menus for both Python applications can be displayed by running the following at the terminal:

./AssembleTypeI.py -h
./AssembleGroups.py -h

There are lots of options, but most of them have 'reasonable' defaults.  Each help menu has a set of 'minimum' arguments to run the steps in the applications.  Assuming you already have your shared kmer file and reads available, a typical run of one of the programs completing all steps would use the command:

./AssembleTypeI.py all -s /path/to/sharedkmerfile -i /path/to/kmer/dir -r /path/to/reads 
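
If runs are launched from another script, the command can be assembled as an argument list. The following is a sketch only: build_assemble_cmd is a hypothetical helper, it uses just the step name and the -s, -i, and -r options shown above, and the optional interpreter argument covers the case of calling a virtual Python interpreter explicitly. Check the help menu for the options each application actually requires.

```python
def build_assemble_cmd(app, step, shared_kmer_file, kmer_dir, reads,
                       interpreter=None):
    """Return the command for one application run as an argument list."""
    cmd = [app, step,
           "-s", shared_kmer_file,
           "-i", kmer_dir,
           "-r", reads]
    if interpreter:
        # e.g. /mydirectory/bin/python when using a virtual interpreter
        cmd.insert(0, interpreter)
    return cmd
```

The list can then be handed to subprocess.call(cmd) to start the run.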

Each step can take a little while, depending on the dataset size, so it's also possible to run steps independently (see help).

Data outputs go to the specified or generated directories (see options).  The stdout and stderr outputs from ReadsSelector/ABYSS are also captured and written to their respective text files in the folder containing the code.
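
For readers unfamiliar with this kind of capture, the general technique looks like the sketch below. It is not the package's own code: run_and_capture and the file paths passed to it are hypothetical, and the applications choose their own log file names.

```python
import subprocess


def run_and_capture(cmd, stdout_path, stderr_path):
    """Run cmd, writing its stdout and stderr to two separate text files.

    Returns the exit code of the command.
    """
    with open(stdout_path, "w") as out:
        with open(stderr_path, "w") as err:
            return subprocess.call(cmd, stdout=out, stderr=err)
```

A non-zero return value indicates that the external program reported a failure, in which case the stderr file is the first place to look.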
