File | Date | Author | Commit |
---|---|---|---|
app | 2019-02-07 | DNKonanov | [7c6d24] user-defined colors was added |
build | 2019-02-08 | DNKonanov | [55c2a8] submit form, limits |
source | 2019-02-01 | DNKonanov | [7ee31f] first commit |
.flaskenv | 2019-02-01 | DNKonanov | [7ee31f] first commit |
.gitignore | 2019-02-08 | DNKonanov | [99f60b] fix colors uploading |
LICENSE | 2019-02-01 | Dmitry N. Konanov | [9789e7] Initial commit |
README.md | 2019-02-05 | Dmitry N. Konanov | [70caa9] Update README.md |
add_names.py | 2019-02-01 | DNKonanov | [7ee31f] first commit |
dump_script.py | 2019-02-01 | DNKonanov | [7ee31f] first commit |
export_data.py | 2019-02-01 | DNKonanov | [7ee31f] first commit |
favicon.ico | 2019-02-01 | DNKonanov | [7ee31f] first commit |
full_container.py | 2019-02-07 | DNKonanov | [7c6d24] user-defined colors was added |
gcb_server.py | 2019-02-01 | DNKonanov | [7ee31f] first commit |
requirements.txt | 2019-02-01 | DNKonanov | [72cb03] add readme |
strains_decode.txt | 2019-02-01 | DNKonanov | [7ee31f] first commit |
Stand-alone version of the GCB-service
sudo apt-get install graphviz graphviz-dev python3-graphviz python3-pygraphviz
pip3 install -r requirements.txt
To start the GCB server on your computer, run this in the GCB_package folder:
python3 gcb_server.py
You can then open 127.0.0.1:5000 or localhost:5000 in your web browser and use GCB.
There are no pre-computed datasets in the GitHub version of GCB_package. You can download a version with a pre-computed Escherichia coli dataset from SourceForge.
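To confirm that the server is actually listening before opening the browser, a small script can request the start page. This is a minimal sketch and not part of GCB_package: the function name gcb_is_up is hypothetical, and the default URL assumes the address given above.

```python
import urllib.error
import urllib.request


def gcb_is_up(url="http://127.0.0.1:5000", timeout=3.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, only with an error status (for example 404).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or host unreachable.
        return False


if __name__ == "__main__":
    print("GCB server reachable:", gcb_is_up())
```

If this reports False right after starting gcb_server.py, the server may still be starting up or may be bound to a different port.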
The easiest way to add a dataset is the full_container.py script. Just type:
python3 full_container.py -i /PATH_TO_ORTHOGROUPS_FILE/Orthogroups_file.txt -name NAME_WHAT_YOU_WANT
For example:
python3 full_container.py -i /home/user/data/Orthogroups_mycoplasma.txt -name mycoplasma_dataset1
This script consists of many steps and can take a very long time to run.
Parameters:
* -i: input txt file with orthogroups, generated by OrthoFinder (ver 2.2.6)
* -name: name of the output directory
* --window: window size (default 20)
* --iterations: number of iterations in the probabilistic method (default 500)
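When several datasets have to be added, it can help to assemble the command line in a script. The sketch below only builds the argument list from the four parameters listed above; the helper name full_container_cmd is hypothetical and not part of GCB_package.

```python
import shlex


def full_container_cmd(orthogroups, name, window=20, iterations=500):
    """Build the argument list for full_container.py.

    Only the documented flags are used: -i, -name, --window, --iterations.
    The defaults mirror the documented defaults (20 and 500).
    """
    return [
        "python3", "full_container.py",
        "-i", str(orthogroups),
        "-name", str(name),
        "--window", str(window),
        "--iterations", str(iterations),
    ]


if __name__ == "__main__":
    cmd = full_container_cmd(
        "/home/user/data/Orthogroups_mycoplasma.txt",
        "mycoplasma_dataset1",
        window=50,
    )
    # Print the command so it can be checked before running it.
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the GCB_package folder; keep in mind that the script itself may run for a long time.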
It is possible to create the output folder with data manually. First, create a folder with any name you like.
Next, create the graph structure with the geneGraph command-line tool. All geneGraph scripts are located in the source folder.
python3 orthofinder_parse.py -i PATH_TO_THE_ORTHOGROUPS_TXT_FILE -o PATH_TO_THE_CREATED_FOLDER/name
where name is the same as the folder name.
For example:
mkdir ~/data/results/Mycoplasma
python3 orthofinder_parse.py -i ~/data/orthofinderResults/Mycoplasma/Orthogroups.txt -o ~/data/results/Mycoplasma/Mycoplasma
This command creates a set of files: a .sif file with the graph structure, a .db database file, and others.
Next, you can add complexity tables for the genomes you are interested in with:
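The -o value is easy to get wrong, because the last path component must repeat the folder name. The following sketch derives it from a results directory and a dataset name; both helper names are hypothetical, and only the -i and -o flags shown above are used.

```python
from pathlib import Path


def parse_output_prefix(results_root, name):
    """Return the -o value for orthofinder_parse.py: <results_root>/<name>/<name>."""
    folder = Path(results_root).expanduser() / name
    return folder / name


def orthofinder_parse_cmd(orthogroups, results_root, name, create=False):
    """Build the orthofinder_parse.py command; optionally create the folder first."""
    prefix = parse_output_prefix(results_root, name)
    if create:
        # Equivalent of the mkdir step in the example above.
        prefix.parent.mkdir(parents=True, exist_ok=True)
    return [
        "python3", "orthofinder_parse.py",
        "-i", str(Path(orthogroups).expanduser()),
        "-o", str(prefix),
    ]


if __name__ == "__main__":
    print(orthofinder_parse_cmd(
        "~/data/orthofinderResults/Mycoplasma/Orthogroups.txt",
        "~/data/results",
        "Mycoplasma",
    ))
```

Run the resulting command from the source folder, where orthofinder_parse.py is located.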
python3 start_computing.py -i PATH_TO_SIF_FILE -o PATH_TO_ANY_OUTPUT_FOLDER --reference CODE_NAME_OF_REFERENCE_GENOME --save_db PATH_TO_THE_CREATED_DB [other parameters]
For example:
python3 start_computing.py -i ~/data/results/Mycoplasma/Mycoplasma.sif -o ~/data/complexity_results/Mycoplasma/GCF_000027345.1_ASM2734v1_genomic --reference GCF_000027345.1_ASM2734v1_genomic --save_db ~/data/results/Mycoplasma/Mycoplasma.db --window 50
If you don't need the complexity data in txt format, you can delete the ~/data/complexity_results/Mycoplasma/GCF_000027345.1_ASM2734v1_genomic folder.
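Since start_computing.py is run once per reference genome, the commands for several genomes can be generated in a loop. This is a sketch under stated assumptions: the helper name start_computing_cmds is hypothetical, each genome gets its own output subfolder named after its code (as in the example above), and only flags that appear in this README are used.

```python
from pathlib import Path


def start_computing_cmds(sif, db, out_root, references, window=None):
    """Build one start_computing.py command per reference genome code."""
    cmds = []
    for ref in references:
        cmd = [
            "python3", "start_computing.py",
            "-i", str(sif),
            "-o", str(Path(out_root) / ref),  # one output folder per genome
            "--reference", ref,
            "--save_db", str(db),
        ]
        if window is not None:
            cmd += ["--window", str(window)]
        cmds.append(cmd)
    return cmds


if __name__ == "__main__":
    # The second genome code is a placeholder, not a real RefSeq code.
    genomes = ["GCF_000027345.1_ASM2734v1_genomic", "SECOND_GENOME_CODE"]
    for cmd in start_computing_cmds(
        "Mycoplasma.sif", "Mycoplasma.db", "complexity_results", genomes, window=50
    ):
        print(" ".join(cmd))
```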
After you have added complexity values for all genomes of interest, move or copy the dataset folder you created (~/data/results/Mycoplasma in the example) to /GCB_package/data/. The organism is now uploaded, but the graph objects still need to be dumped for fast access.
To do this, go to the GCB_package folder and run in the terminal:
python3 dump_script.py
The last (optional) step is adding real genome names to the database. The strains_decode.txt file in the GCB_package folder contains all existing RefSeq genome codes.
Type python3 add_names.py in the terminal, and all available names will be added to the db automatically.
If some of your genome codes are not in strains_decode.txt, you can add them to this file or set the values in the db manually (table "genomes_table", column "genomes_name"). By default, their names are set to 'none'.
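Setting names manually could look like the sketch below. It assumes the .db file is an SQLite database, which this README does not state explicitly. Only the table "genomes_table" and the column "genomes_name" come from the text above; the genome_code column in the demo table and both function names are hypothetical, so rows are addressed through SQLite's built-in rowid instead of a guessed key column.

```python
import sqlite3


def unnamed_genomes(conn):
    """Return the rowids of genomes whose name is still the default 'none'."""
    rows = conn.execute(
        "SELECT rowid FROM genomes_table WHERE genomes_name = 'none'"
    ).fetchall()
    return [rowid for (rowid,) in rows]


def set_genome_name(conn, rowid, name):
    """Set genomes_name for a single row, identified by its rowid."""
    conn.execute(
        "UPDATE genomes_table SET genomes_name = ? WHERE rowid = ?",
        (name, rowid),
    )
    conn.commit()


if __name__ == "__main__":
    # Demo on an in-memory database with an ASSUMED schema, not the real one.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genomes_table (genome_code TEXT, genomes_name TEXT)")
    conn.execute(
        "INSERT INTO genomes_table VALUES ('GCF_000027345.1_ASM2734v1_genomic', 'none')"
    )
    for rowid in unnamed_genomes(conn):
        set_genome_name(conn, rowid, "Example strain name")
    print(conn.execute("SELECT * FROM genomes_table").fetchall())
```

Before changing a real dataset, inspect the actual schema (for example with the sqlite3 shell command .schema genomes_table) and work on a copy of the .db file.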
Now, if you refresh the GCB_service web page, you will see the new organism in the organisms list.
Web-service Genome Complexity Browser
Python module gene-graph-lib
Command-line tool geneGraph
Will be added