GCB_package Code

Stand-alone version of the Genome Complexity Browser

Status: Alpha

Brought to you by: dnkonanov

File	Date	Author	Commit
app	2019-02-07	DNKonanov	[7c6d24] user-defined colors was added
build	2019-02-08	DNKonanov	[55c2a8] submit form, limits
source	2019-02-01	DNKonanov	[7ee31f] first commit
.flaskenv	2019-02-01	DNKonanov	[7ee31f] first commit
.gitignore	2019-02-08	DNKonanov	[99f60b] fix colors uploading
LICENSE	2019-02-01	Dmitry N. Konanov	[9789e7] Initial commit
README.md	2019-02-05	Dmitry N. Konanov	[70caa9] Update README.md
add_names.py	2019-02-01	DNKonanov	[7ee31f] first commit
dump_script.py	2019-02-01	DNKonanov	[7ee31f] first commit
export_data.py	2019-02-01	DNKonanov	[7ee31f] first commit
favicon.ico	2019-02-01	DNKonanov	[7ee31f] first commit
full_container.py	2019-02-07	DNKonanov	[7c6d24] user-defined colors was added
gcb_server.py	2019-02-01	DNKonanov	[7ee31f] first commit
requirements.txt	2019-02-01	DNKonanov	[72cb03] add readme
strains_decode.txt	2019-02-01	DNKonanov	[7ee31f] first commit

Read Me

GCB_package

Stand-alone version of the GCB-service

Installation manual

Install dependencies

sudo apt-get install graphviz graphviz-dev python3-graphviz python3-pygraphviz

pip3 install -r requirements.txt

Usage

Start server

To start the GCB server on your computer, run this command from the GCB_package folder:

python3 gcb_server.py

You can then open 127.0.0.1:5000 or localhost:5000 in your web browser and use GCB.
The GitHub version of GCB_package contains no pre-computed datasets. A version with a pre-computed Escherichia coli dataset can be downloaded from SourceForge.
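
To confirm that the server started by the command above is reachable before opening the browser, a small check such as the following can be used. This is a minimal sketch, not part of GCB_package: the helper name `is_listening` is hypothetical, and it only tests that something accepts TCP connections on the address given above.

```python
import socket


def is_listening(host="127.0.0.1", port=5000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper; the defaults match the address from this README.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Calling `is_listening()` with no arguments checks 127.0.0.1:5000.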

Add data

The easiest way to add a dataset is to use the full_container.py script.

Just type:

python3 full_container.py -i /PATH_TO_ORTHOGROUPS_FILE/Orthogroups_file.txt -name NAME_WHAT_YOU_WANT

For example:

python3 full_container.py -i /home/user/data/Orthogroups_mycoplasma.txt -name mycoplasma_dataset1

This script runs many steps and can take a very long time to complete.

Parameters:
* -i - input txt file with orthogroups, generated by OrthoFinder (ver 2.2.6)
* -name - name of the output directory
* --window - size of the window (default 20)
* --iterations - number of iterations in the probabilistic method (default 500)
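
When several datasets have to be built, the parameters above can be assembled programmatically. The following is a minimal sketch, assuming it is run from the GCB_package folder; the helper name `build_full_container_cmd` is hypothetical, and only the flags and defaults listed above are used.

```python
def build_full_container_cmd(orthogroups, name, window=20, iterations=500):
    """Return the argument list for one full_container.py run.

    Hypothetical helper: it only arranges the flags documented in this
    README, with the documented defaults for --window and --iterations.
    """
    return [
        "python3", "full_container.py",
        "-i", str(orthogroups),
        "-name", name,
        "--window", str(window),
        "--iterations", str(iterations),
    ]


cmd = build_full_container_cmd(
    "/home/user/data/Orthogroups_mycoplasma.txt", "mycoplasma_dataset1"
)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the script.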

Advanced

The output folder with data can also be created manually. First, create a folder with any name you like.
Next, create the graph structure with the geneGraph command-line tool. All geneGraph scripts are located in the source folder.

python3 orthofinder_parse.py -i PATH_TO_THE_ORTHOGROUPS_TXT_FILE -o PATH_TO_THE_CREATED_FOLDER/name

Here, name is the same as the folder name.

For example:

mkdir ~/data/results/Mycoplasma

python3 orthofinder_parse.py -i ~/data/orthofinderResults/Mycoplasma/Orthogroups.txt -o ~/data/results/Mycoplasma/Mycoplasma

This command creates several files: a .sif file with the graph structure, a .db database file, and others.
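
The rule that the -o value ends with the folder's own name is easy to get wrong by hand. The sketch below derives it from the folder path; the helper name `build_parse_cmd` is hypothetical, and the code only arranges the flags shown above. It does not expand `~`, so pass an absolute path or expand it first.

```python
from pathlib import Path


def build_parse_cmd(orthogroups, dataset_dir):
    """Return the orthofinder_parse.py arguments for one dataset folder.

    The -o value is the folder path followed by the folder's own name,
    e.g. /data/results/Mycoplasma -> /data/results/Mycoplasma/Mycoplasma.
    """
    dataset_dir = Path(dataset_dir)
    prefix = dataset_dir / dataset_dir.name
    return [
        "python3", "orthofinder_parse.py",
        "-i", str(orthogroups),
        "-o", str(prefix),
    ]
```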

Next, add complexity tables for the genomes you are interested in by running:

python3 start_computing.py -i PATH_TO_SIF_FILE -o PATH_TO_ANY_OUTPUT_FOLDER --reference CODE_NAME_OF_REFERENCE_GENOME --save_db PATH_TO_THE_CREATED_DB [other parameters]

For example:

python3 start_computing.py -i ~/data/results/Mycoplasma/Mycoplasma.sif -o ~/data/complexity_results/Mycoplasma/GCF_000027345.1_ASM2734v1_genomic --reference GCF_000027345.1_ASM2734v1_genomic --save_db ~/data/results/Mycoplasma/Mycoplasma.db --window 50

If you don't need the complexity data in txt format, you can delete the ~/data/complexity_results/Mycoplasma/GCF_000027345.1_ASM2734v1_genomic folder.
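
Since start_computing.py is run once per reference genome, the commands for several genomes can be generated in a loop. This is a minimal sketch under two assumptions taken from the example above: the .sif and .db files are named after the dataset folder, and each genome gets its own output subfolder. The helper name `build_complexity_cmds` is hypothetical.

```python
from pathlib import Path


def build_complexity_cmds(dataset_dir, out_root, references, window=None):
    """Return one start_computing.py argument list per reference genome."""
    dataset_dir = Path(dataset_dir)
    name = dataset_dir.name
    # Assumption: files are named after the folder, as in the example above.
    sif = dataset_dir / f"{name}.sif"
    db = dataset_dir / f"{name}.db"
    cmds = []
    for ref in references:
        cmd = [
            "python3", "start_computing.py",
            "-i", str(sif),
            "-o", str(Path(out_root) / ref),
            "--reference", ref,
            "--save_db", str(db),
        ]
        if window is not None:
            # Only pass --window when the caller overrides the default.
            cmd += ["--window", str(window)]
        cmds.append(cmd)
    return cmds
```

Each returned list can be run in turn with `subprocess.run(cmd, check=True)`.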

After adding complexity values for all genomes of interest, move or copy the dataset folder (~/data/results/Mycoplasma in the example above) to /GCB_package/data/. The organism is now uploaded, but the graph objects still need to be dumped for fast access.
To do this, go to the GCB_package folder and run in a terminal:

python3 dump_script.py

The last (optional) step is adding real genome names to the database. The strains_decode.txt file in the GCB_package folder contains all existing RefSeq genome codes.
Run python3 add_names.py in a terminal and all available names will be added to the database automatically.
If some of your genome codes are not in strains_decode.txt, you can add them to this file or set the values in the database manually (table "genomes_table", column "genomes_name"). By default, their names are set to 'none'.
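
Before setting names manually, it can help to list the genomes that still carry the default name. The following is a minimal sketch, assuming the .db file is an SQLite database (this README does not state the format). It uses only the table and column named above; the `genome_code` column in the demo table is a hypothetical stand-in for whatever other columns the real table has.

```python
import sqlite3


def unnamed_genomes(conn):
    """Return all rows of genomes_table whose genomes_name is still 'none'."""
    cur = conn.execute(
        "SELECT * FROM genomes_table WHERE genomes_name = ?", ("none",)
    )
    return cur.fetchall()


# Demo on an in-memory database with a hypothetical table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genomes_table (genome_code TEXT, genomes_name TEXT)")
conn.executemany(
    "INSERT INTO genomes_table VALUES (?, ?)",
    [("GCF_A", "Strain A"), ("GCF_B", "none")],
)
print(unnamed_genomes(conn))
```

For a real dataset, the in-memory connection would be replaced by `sqlite3.connect(...)` pointing at the dataset's .db file, provided the SQLite assumption holds.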

Now, if you reload the GCB web page, you will see the new organism in the list of organisms.

Links

Web-service Genome Complexity Browser

Python module gene-graph-lib

Command-line tool geneGraph

References

Will be added