| File | Date | Author | Commit |
|---|---|---|---|
| BGM_assembler | 2022-06-01 | Przemysław Stawczyk | [b219ee] update readmes |
| analysis | 2021-04-30 | przestaw | [3d0d4a] Create open repository |
| compare | 2021-04-30 | przestaw | [3d0d4a] Create open repository |
| data | 2021-04-30 | przestaw | [3d0d4a] Create open repository |
| fasta | 2021-04-30 | przestaw | [3d0d4a] Create open repository |
| sandbox | 2021-04-30 | przestaw | [3d0d4a] Create open repository |
| valouev | 2021-04-30 | przestaw | [3d0d4a] Create open repository |
| .gitignore | 2022-06-01 | Przemysław Stawczyk | [b219ee] update readmes |
| LICENSE | 2021-04-30 | przestaw | [3d0d4a] Create open repository |
| README.md | 2022-06-01 | Przemysław Stawczyk | [b219ee] update readmes |

Read Me

Binary Genome Maps

This repository contains a computer program for assembling Optical Mapping (OM) reads without a reference genome.
The algorithm explores the use of a binary representation for genome maps.
We focused on the efficiency of data structures and algorithms, as well as the ability to scale on parallel platforms.
The algorithm consists of several steps, the most important of which are: (1) converting the restriction maps into binary strings, (2) detecting overlaps within the set of restriction maps, (3) determining the layout of the set of restriction maps, and (4) creating consensus genomic maps.
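
As a rough illustration of steps (1) and (2), a restriction map given as a list of site positions (in base pairs) could be quantized into a binary string with one bit per fixed-size bin, and two such strings could be compared at every relative shift. This is only a sketch under stated assumptions: the function names, the 1000 bp bin size, and the brute-force shift search are hypothetical and are not taken from the BGM_assembler sources, which may work quite differently.

```python
from typing import List


def map_to_bits(sites_bp: List[int], length_bp: int, quant_bp: int = 1000) -> List[int]:
    """Quantize restriction-site positions into a 0/1 list, one bit per bin.

    Hypothetical helper: bin i covers [i * quant_bp, (i + 1) * quant_bp).
    """
    n_bins = (length_bp + quant_bp - 1) // quant_bp  # ceiling division
    bits = [0] * n_bins
    for pos in sites_bp:
        if 0 <= pos < length_bp:
            bits[pos // quant_bp] = 1
    return bits


def shared_sites(a: List[int], b: List[int], shift: int) -> int:
    """Count bins set in both maps when b is placed `shift` bins to the right of a's start."""
    return sum(
        1
        for i, bit in enumerate(b)
        if bit and 0 <= i + shift < len(a) and a[i + shift]
    )


def best_shift(a: List[int], b: List[int]) -> int:
    """Naive stand-in for overlap detection: try every shift, keep the best-scoring one."""
    shifts = range(-(len(b) - 1), len(a))
    return max(shifts, key=lambda s: shared_sites(a, b, s))


# Toy data (invented): read_b starts 7000 bp into read_a.
read_a = map_to_bits([1200, 4500, 7800, 9100, 11500], length_bp=12000)
read_b = map_to_bits([800, 2100, 4500], length_bp=5000)
shift = best_shift(read_a, read_b)
print("".join(map(str, read_a)), "".join(map(str, read_b)), shift)
```

Storing maps as bits keeps the comparison down to simple per-bin checks, which is the kind of operation that lends itself to compact data structures and parallel execution; the quadratic shift search here is for readability only.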

For more information, see the accompanying article at ...

For the license, see the LICENSE file.

Directories:

  • BGM_assembler - contains the BGM_utils program with the C++ library, the assembler, and Python utilities
  • fasta - place where the analysed genome files are stored
  • data - contains raw data from analysis and tests
  • analysis - contains IPython notebooks that were used to analyse genomes and create plots for papers
  • compare - contains the intermediate scripts needed to feed the output of the Valouev et al. aligner into the Valouev et al. assembler
  • sandbox - contains scripts used to automate some of the work and analysis
  • valouev - contains a copy of the repositories with the source code of the dynamic programming approach

How to get started:

  1. Build the assembler tool:

    In the BGM_assembler directory:

    ```
    $ mkdir build

    $ cd build

    $ cmake ..

    $ make -j 4 BGM-asm-program
    ```

    In case of problems, make sure you have only ONE version of the Boost libraries installed on your system.
    The tool has been tested to build properly on various Debian versions and on Ubuntu bionic (18.04) and focal (20.04).

  2. Download example files:

    Datasets are available at:
    https://sourceforge.net/projects/binary-genome-maps/files/datasets/

    For this step you can use, for example, the e.coli BspQI-Exp-150k dataset with 40.0 coverage.

  3. Run the tool:

    Place the input in a known location. For this example it was placed alongside the program in the /BGM_assembler/bin/ folder.
    ```
    $ ls

    BGM-asm-program _bgm_module.so e.coli-enz-BspQI-cov-40.0-q-1000.binmaps libBGM-asm.a

    $ ./BGM-asm-program -i ./e.coli-enz-BspQI-cov-40.0-q-1000.binmaps -o ./my_result --margin_quant 10 --margin_part 0.65 --thresh 0.15 -f 4 -t 4
    ```

  4. Convert *.bnx file:

    To run the scripts you may need to install some Python libraries (they may have additional dependencies of their own):

    pip3 install bitarray numpy matplotlib

    Download the example *.bnx file (a.baumannii A22 bnx) and use the bnx_convert.py script to convert it into fast01. This may take some time.

    ```
    $ cd ~/BGM_assembler/python

    $ python3 ./bnx_convert.py -f ./AB22.bnx -p ./
    ```

  5. Now you can use the assembler on the converted files. For further information on usage, refer to the pydoc manuals of BGM_assembler/python/bgm_util/ and to BGM_assembler/README.md.
