GCB_package

Stand-alone version of the GCB (Genome Complexity Browser) service.

Installation manual

Install dependencies

sudo apt-get install graphviz graphviz-dev python3-graphviz python3-pygraphviz

pip3 install -r requirements.txt
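To quickly confirm that the Python bindings installed above can be imported, you can use a small check like the one below. It is only a sketch. It checks that the modules can be found, not that they work. The module name `pygraphviz` comes from the apt packages above, and you can add any other names yourself, for example the import names of the packages in requirements.txt.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: pygraphviz is the binding installed above; extend the list with
# the import names of the packages from requirements.txt as needed.
print("Missing modules:", missing_modules(["pygraphviz"]) or "none")
```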

Usage

Start server

To start the GCB server on your computer, run the following in the GCB_package folder:

python3 gcb_server.py

Then open 127.0.0.1:5000 (or localhost:5000) in your web browser to use GCB.
The GitHub version of GCB_package contains no pre-computed datasets. A version with a pre-computed Escherichia coli dataset can be downloaded from SourceForge.
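If you want to check from a script that the server is answering, for example before opening the browser, here is a minimal sketch that uses only the Python standard library. It assumes the default address 127.0.0.1:5000 given above. It treats any HTTP response as a sign that the server is running, including an error status. `server_is_up` is a hypothetical helper and is not part of GCB_package.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://127.0.0.1:5000/", timeout=3.0):
    """Return True if an HTTP server answers at url, False if nothing responds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with a non-2xx status, so it is still running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, unreachable host, or timeout.
        return False


print("GCB server running:", server_is_up())
```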

Add data

The easiest way to add a dataset is the full_container.py script.

Just type:

python3 full_container.py -i /PATH_TO_ORTHOGROUPS_FILE/Orthogroups_file.txt -name NAME_WHAT_YOU_WANT

For example:

python3 full_container.py -i /home/user/data/Orthogroups_mycoplasma.txt -name mycoplasma_dataset1

This script runs many steps and can take a very long time to finish.
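A full run can take a long time, so it may be worth sanity-checking the orthogroups file first. The sketch below assumes the legacy OrthoFinder 2.x Orthogroups.txt layout, in which each line looks like `OG0000000: geneA geneB ...`. `parse_orthogroups` and `summarize` are hypothetical helpers and are not part of GCB_package.

```python
def parse_orthogroups(lines):
    """Parse Orthogroups.txt lines into {orthogroup_id: [gene, ...]}.

    Assumes the legacy OrthoFinder 2.x layout "OG0000000: geneA geneB ...".
    """
    groups = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        og_id, sep, genes = line.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: no ':' separator in {line!r}")
        groups[og_id.strip()] = genes.split()
    return groups


def summarize(path):
    """Print the number of orthogroups and the size of the largest one."""
    with open(path, encoding="utf-8") as handle:
        groups = parse_orthogroups(handle)
    sizes = [len(genes) for genes in groups.values()]
    print(f"{len(groups)} orthogroups, largest has {max(sizes, default=0)} genes")
```

For example, `summarize("/home/user/data/Orthogroups_mycoplasma.txt")` fails early with a `ValueError` if the file does not match the expected layout.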

Parameters:
* -i - input .txt file with orthogroups, generated by OrthoFinder (version 2.2.6)
* -name - name of the output directory
* --window - window size (default 20)
* --iterations - number of iterations of the probabilistic method (default 500)
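If you want to drive full_container.py from Python, for example to process several orthogroups files in a row, you can build the command from the parameters listed above. This is a sketch. `full_container_cmd` and `run_full_container` are hypothetical helper names, and the sketch assumes the script is run from the GCB_package folder.

```python
import subprocess
import sys


def full_container_cmd(orthogroups, name, window=None, iterations=None):
    """Build the full_container.py argument list.

    --window and --iterations are only passed when given, so the script's
    own defaults (20 and 500, per the parameter list) apply otherwise.
    """
    cmd = [sys.executable, "full_container.py", "-i", orthogroups, "-name", name]
    if window is not None:
        cmd += ["--window", str(window)]
    if iterations is not None:
        cmd += ["--iterations", str(iterations)]
    return cmd


def run_full_container(package_dir, orthogroups, name, **options):
    """Run full_container.py inside package_dir; raises CalledProcessError on failure."""
    cmd = full_container_cmd(orthogroups, name, **options)
    subprocess.run(cmd, cwd=package_dir, check=True)
```

For example, `run_full_container("/path/to/GCB_package", "/home/user/data/Orthogroups_mycoplasma.txt", "mycoplasma_dataset1")` mirrors the command-line example above.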

Advanced

The output folder with data can also be created manually. First, create a folder with any name you like.
Next, build the graph structure with the geneGraph command-line tool. All geneGraph scripts are located in the source folder.

python3 orthofinder_parse.py -i PATH_TO_THE_ORTHOGROUPS_TXT_FILE -o PATH_TO_THE_CREATED_FOLDER/name

where name is the same as the folder name.

For example:

mkdir ~/data/results/Mycoplasma

python3 orthofinder_parse.py -i ~/data/orthofinderResults/Mycoplasma/Orthogroups.txt -o ~/data/results/Mycoplasma/Mycoplasma

This command creates several files, including a .sif file with the graph structure and a .db database file.
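You can combine the two manual steps above, creating the folder and passing an output prefix that repeats its name, in a small helper. This is a hypothetical sketch. It assumes orthofinder_parse.py lives in the source folder and that you run the returned command from the GCB_package folder, for example with `subprocess.run(cmd, check=True)`.

```python
import os
import sys


def orthofinder_parse_cmd(orthogroups, results_dir, name, source_dir="source"):
    """Create results_dir/name and return the orthofinder_parse.py command.

    The -o value is results_dir/name/name: the output prefix repeats the
    folder name, as required above.
    """
    out_dir = os.path.join(results_dir, name)
    os.makedirs(out_dir, exist_ok=True)  # like `mkdir -p`
    script = os.path.join(source_dir, "orthofinder_parse.py")
    return [sys.executable, script, "-i", orthogroups, "-o", os.path.join(out_dir, name)]
```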

Next, add complexity tables for the genomes you are interested in:

python3 start_computing.py -i PATH_TO_SIF_FILE -o PATH_TO_ANY_OUTPUT_FOLDER --reference CODE_NAME_OF_REFERENCE_GENOME --save_db PATH_TO_THE_CREATED_DB [other parameters]

For example:

python3 start_computing.py -i ~/data/results/Mycoplasma/Mycoplasma.sif -o ~/data/complexity_results/Mycoplasma/GCF_000027345.1_ASM2734v1_genomic --reference GCF_000027345.1_ASM2734v1_genomic --save_db ~/data/results/Mycoplasma/Mycoplasma.db --window 50

If you don't need the complexity data in .txt format, you can delete the ~/data/complexity_results/Mycoplasma/GCF_000027345.1_ASM2734v1_genomic folder.
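When several genomes need complexity tables, you repeat the start_computing.py call once for each reference genome code. The sketch below follows the path layout of the example above: `NAME/NAME.sif` and `NAME/NAME.db`, with one output folder per reference. `start_computing_cmds` is a hypothetical helper. The commands are meant to be run from the source folder, where start_computing.py is located.

```python
import os
import sys


def start_computing_cmds(results_dir, name, references, out_root, window=None):
    """Build one start_computing.py command per reference genome code."""
    prefix = os.path.join(results_dir, name, name)  # e.g. .../Mycoplasma/Mycoplasma
    cmds = []
    for ref in references:
        cmd = [
            sys.executable, "start_computing.py",
            "-i", prefix + ".sif",
            "-o", os.path.join(out_root, name, ref),  # per-reference output folder
            "--reference", ref,
            "--save_db", prefix + ".db",
        ]
        if window is not None:
            cmd += ["--window", str(window)]
        cmds.append(cmd)
    return cmds
```

Each command can then be run in turn, for example with `subprocess.run(cmd, cwd="source", check=True)`, so that a failure stops the loop.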

After you have added complexity values for all genomes of interest, move or copy the dataset folder (in the example above, ~/data/results/Mycoplasma) to /GCB_package/data/. The organism is now uploaded, but its graph objects still need to be dumped for fast access.
To do this, go to the GCB_package folder and run:

python3 dump_script.py
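You can also script the copy-and-dump step. Here is a minimal sketch, assuming `package_dir` points at your GCB_package folder. `install_dataset` and `dump_graphs` are hypothetical helper names.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def install_dataset(dataset_dir, package_dir):
    """Copy a finished dataset folder into package_dir/data/ and return the new path."""
    src = Path(dataset_dir)
    dest = Path(package_dir) / "data" / src.name
    # copytree creates missing parent folders and refuses to overwrite an
    # existing destination, so an already installed dataset is not clobbered.
    shutil.copytree(src, dest)
    return dest


def dump_graphs(package_dir):
    """Run dump_script.py from the GCB_package folder."""
    subprocess.run([sys.executable, "dump_script.py"], cwd=package_dir, check=True)
```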

The last (optional) step is to add real genome names to the database. The strains_decode.txt file in the GCB_package folder contains the codes of existing RefSeq genomes. Run

python3 add_names.py

and all available names will be added to the database automatically.
If some of your genome codes are not in strains_decode.txt, you can add them to that file or set the values in the database manually (table "genomes_table", column "genomes_name"). By default, the names of such genomes are set to 'none'.
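To see which genomes still need a name after running add_names.py, you can query the database directly. This sketch assumes the dataset .db file is an SQLite database, which the README does not state explicitly. The table and column names come from the text above. If unnamed genomes are stored as NULL rather than the string 'none', the query would need `IS NULL` instead. `unnamed_genomes` is a hypothetical helper.

```python
import sqlite3


def unnamed_genomes(db_path):
    """Return all genomes_table rows whose genomes_name is still 'none'."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(
            "SELECT * FROM genomes_table WHERE genomes_name = ?", ("none",)
        )
        return cursor.fetchall()
    finally:
        conn.close()  # sqlite3's own context manager commits but does not close
```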

If you now reload the GCB service web page, the new organism will appear in the list of organisms.

Web-service Genome Complexity Browser

Python module gene-graph-lib

Command-line tool geneGraph

References

Will be added