MethyMer: Quick Start Guide
Design of combinations of specific primers for bisulfite sequencing of complete CpG islands
Short Description
MethyMer is a Python-based tool for selecting primers to amplify complete CpG islands. These regions are difficult targets for primer design because of their low complexity, poly-N stretches, and high CG content. Moreover, bisulfite treatment effectively reduces the 4-letter alphabet (ATGC) to a 3-letter one (ATG, except for methylated cytosines), which further reduces sequence complexity and increases the potential for mispriming.
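To make the alphabet reduction concrete, here is a minimal in-silico bisulfite conversion sketch in Python. It is not MethyMer's code: the function name `bisulfite_convert` and the `methylated_cpg` switch are hypothetical, only the given (top) strand is converted, and it assumes that only cytosines in a CpG context can be methylated.

```python
def bisulfite_convert(seq: str, methylated_cpg: bool = False) -> str:
    """Hypothetical helper (not part of MethyMer): in-silico bisulfite
    conversion of the given strand.

    Unmethylated C is read as T after conversion. If methylated_cpg is True,
    every C followed by G is assumed methylated and kept as C; all other
    cytosines are assumed unmethylated.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            out.append("C" if (methylated_cpg and in_cpg) else "T")
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    print(bisulfite_convert("ACGTCCGA"))                       # fully unmethylated
    print(bisulfite_convert("ACGTCCGA", methylated_cpg=True))  # methylated CpGs kept
```

With `methylated_cpg=False` the output contains only A, T and G, which is why converted CpG islands are even harder to prime uniquely than the original sequence.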
Main features:
- selection of an optimal combination of PCR primer pairs with minimal overlap for amplifying large genomic loci, e.g. CpG islands
- a flexible, tweakable scoring system for balancing characteristics such as nucleotide composition, thermodynamic features, presence of CpG sites in primers, and other parameters
- a primer specificity test based on bowtie alignment
- integrated ENCODE genome annotation data (promoters, enhancers, insulators, etc.)
- integrated CpG methylation data for 20 tumor types from The Cancer Genome Atlas (TCGA)
- integrated data on associations between TCGA RNA-Seq expression and CpG methylation
- visualization: standard plots (CG content, score plot, specificity), ENCODE genome annotation, and TCGA data
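How MethyMer actually searches for the optimal combination of pairs is not described in this guide. As one possible approach, the sketch below uses dynamic programming to pick a gap-free chain of candidate amplicons that covers a target region while minimizing pair penalties plus a weighted amount of overlap. The tuple layout `(start, end, pair_penalty)`, the function name `best_tiling` and the `overlap_weight` parameter are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical representation (not MethyMer's): (start, end, pair_penalty),
# with half-open coordinates [start, end).
Amplicon = Tuple[int, int, float]


def best_tiling(amplicons: List[Amplicon], region_start: int, region_end: int,
                overlap_weight: float = 1.0) -> Optional[Tuple[List[Amplicon], float]]:
    """Dynamic-programming sketch: choose a gap-free chain of amplicons covering
    [region_start, region_end) that minimizes the sum of pair penalties plus
    overlap_weight * (total overlap between neighbours).
    Returns (chain, cost), or None if no gap-free chain exists."""
    amps = sorted(amplicons, key=lambda a: (a[1], a[0]))  # by end, then start
    n = len(amps)
    inf = float("inf")
    cost = [inf] * n   # best cost of a chain ending with amplicon i
    prev = [-1] * n    # predecessor of i in that chain (-1 = chain start)
    for i, (s, e, p) in enumerate(amps):
        if s <= region_start:
            cost[i] = p  # amplicon i can open the chain
        for j in range(i):
            sj, ej, _ = amps[j]
            # j must reach into i (no gap) and i must extend coverage beyond j
            if sj < s <= ej < e:
                c = cost[j] + p + overlap_weight * (ej - s)
                if c < cost[i]:
                    cost[i], prev[i] = c, j
    finals = [i for i in range(n) if amps[i][1] >= region_end and cost[i] < inf]
    if not finals:
        return None
    best = min(finals, key=lambda i: cost[i])
    chain = []
    i = best
    while i != -1:  # walk predecessors back to the chain start
        chain.append(amps[i])
        i = prev[i]
    chain.reverse()
    return chain, cost[best]


if __name__ == "__main__":
    candidates = [(0, 300, 1.0), (250, 550, 2.0), (280, 600, 0.5),
                  (500, 800, 1.0), (580, 800, 3.0)]
    print(best_tiling(candidates, 0, 800))
```

Sorting by end coordinate guarantees that every possible predecessor of an amplicon has already been scored when that amplicon is processed, so the search runs in O(n²) time.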
Prerequisites
MethyMer requires Python 3.5 or later with the following packages:
- BioPython
- NumPy
- SciPy
- Python Imaging Library (PIL), installed via its fork Pillow
- openpyxl
To install these dependencies, run:
pip install numpy openpyxl biopython Pillow scipy
Alternatively, install SciPy from its GitHub repository:
pip install git+http://github.com/scipy/scipy/
If both Python 2 and Python 3 are installed and you want to install the packages for Python 3, run:
python3 -m pip install numpy openpyxl biopython Pillow
python3 -m pip install git+http://github.com/scipy/scipy/
Alternatively, install the Anaconda3 Python distribution, which includes all the required packages and many other useful libraries.
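To confirm that the installation worked, a small check script along these lines may help. It is a hypothetical helper, not shipped with MethyMer; it only assumes the usual import names of the packages listed above (biopython is imported as `Bio`, Pillow as `PIL`).

```python
import importlib
import sys
from typing import List

# Import names of the required packages (hypothetical checker, not part of MethyMer).
# Note that PyPI names differ: biopython -> Bio, Pillow -> PIL.
REQUIRED_MODULES = ["Bio", "numpy", "scipy", "PIL", "openpyxl"]


def missing_modules(modules: List[str] = REQUIRED_MODULES) -> List[str]:
    """Return the names from `modules` that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if sys.version_info < (3, 5):
        sys.exit("Python 3.5 or later is required, found {}.{}".format(*sys.version_info[:2]))
    absent = missing_modules()
    print("All dependencies found." if not absent else "Missing: " + ", ".join(absent))
```

The script avoids f-strings so that it still runs on Python 3.5, the minimum version listed above.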
Startup
MethyMer stores all parameters in the Excel workbook called PrimerParameters.xlsx.
1. Adjust the scoring parameters or accept the defaults
Each scoring component is described by three parameters:
- Desired value: values that meet this requirement incur no penalty.
- Limit value: if this requirement is violated, the primer or pair is discarded entirely.
- Penalty: if a parameter is within the limit value but does not meet the desired value, a penalty is calculated as follows.
For example, if Primer desired max. length = 26, Primer limit max. length = 33, and Primer max. length penalty = 2, a primer that is actually 29 nt long receives the following penalty for this component:
penalty = (29-26) x 2 = 6
All other parameters are scored in the same way.
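The rule above can be sketched as a pair of Python functions, one for parameters with an upper bound (such as maximum primer length) and a mirrored one for lower bounds. This is an illustration of the described scheme, not MethyMer's source: the function names are hypothetical, and treating a value exactly equal to the limit as still acceptable is an assumption.

```python
from typing import Optional


def max_param_penalty(value: float, desired_max: float, limit_max: float,
                      weight: float) -> Optional[float]:
    """Hypothetical sketch of the penalty for an upper-bounded parameter.
    Returns None if the limit is violated (primer/pair discarded),
    0 if the desired value is met, else (value - desired_max) * weight."""
    if value > limit_max:     # assumption: value == limit is still accepted
        return None
    if value <= desired_max:
        return 0.0
    return (value - desired_max) * weight


def min_param_penalty(value: float, desired_min: float, limit_min: float,
                      weight: float) -> Optional[float]:
    """Mirror image for a lower-bounded parameter (also hypothetical)."""
    if value < limit_min:
        return None
    if value >= desired_min:
        return 0.0
    return (desired_min - value) * weight


if __name__ == "__main__":
    # The worked example above: desired max 26, limit max 33, weight 2, length 29
    print(max_param_penalty(29, 26, 33, 2))
```

Returning None rather than an infinite penalty keeps "discarded" clearly distinct from "expensive".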
2. Pay special attention to the amplicon length parameter
This is a critical setting that strongly depends on your bisulfite conversion protocol, since bisulfite treatment fragments the template DNA. Typically, the maximum acceptable amplicon length ranges from 200 to 500 bp, and amplicons that are too long will cause the PCR to fail. Set the desired and limit amplicon lengths accordingly.
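Shorter amplicons mean more PCR reactions per locus. Bases under a primer reflect the primer sequence rather than the template, so neighbouring amplicons must overlap by at least roughly one primer length to cover a region completely. The back-of-the-envelope helper below estimates the minimum number of amplicons; the function name and the 25 bp overlap used in the example are hypothetical assumptions, not MethyMer settings.

```python
def amplicons_needed(locus_len: int, amplicon_len: int, overlap: int) -> int:
    """Hypothetical estimate: minimum number of amplicons of amplicon_len bp
    tiling a locus of locus_len bp with `overlap` bp between neighbours.

    n amplicons cover n * amplicon_len - (n - 1) * overlap bp, so
    n >= (locus_len - overlap) / (amplicon_len - overlap).
    """
    if amplicon_len <= overlap:
        raise ValueError("amplicon_len must be larger than overlap")
    if locus_len <= amplicon_len:
        return 1
    # integer ceiling division avoids floating-point rounding issues
    return -(-(locus_len - overlap) // (amplicon_len - overlap))


if __name__ == "__main__":
    for length in (200, 400, 500):
        print(length, "bp amplicons:", amplicons_needed(2500, length, 25))
```

For a 2.5 kb locus with an assumed 25 bp overlap, this gives 15 amplicons at 200 bp, 7 at 400 bp and 6 at 500 bp.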
3. Download genome sequence file
The human genome assembly hg19 is preferred over hg38, because MethyMer's integrated ENCODE genome annotation and TCGA CpG methylation data are available only for hg19. The last Ensembl release with hg19 (GRCh37) as the primary assembly is 75; it can be downloaded and decompressed as follows:
wget ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
Then enter the name of the decompressed file in the Genome File Name field of PrimerParameters.xlsx.
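Before starting a long run, it can be worth checking which sequence names the genome file contains: Ensembl FASTA headers use names such as "1", "X" and "MT", whereas the UCSC Genome Browser uses "chr1", "chrX" and "chrM". The hypothetical helper below (plain Python, not part of MethyMer) lists every sequence ID and its length, for plain or gzipped FASTA files.

```python
import gzip
from typing import Dict


def fasta_lengths(path: str) -> Dict[str, int]:
    """Hypothetical helper: return {sequence_id: length} for a FASTA file.
    The ID is the first whitespace-separated word of each '>' header.
    Files ending in '.gz' are read through gzip."""
    opener = gzip.open if path.endswith(".gz") else open
    lengths = {}  # type: Dict[str, int]
    current = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                current = line[1:].split()[0]
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


if __name__ == "__main__":
    import sys
    for name, size in fasta_lengths(sys.argv[1]).items():
        print(name, size)
```

Scanning the whole primary assembly this way takes a few minutes, but it only needs to be done once.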
4. Specify genomic regions of interest
Open the "target regions" sheet of the PrimerParameters.xlsx workbook and specify your genomic regions. You can look them up in the UCSC Genome Browser. Make sure to check the genome assembly version (hg19/hg38).
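When copying coordinates from the UCSC Genome Browser position box (for example "chr7:27,180,000-27,183,000"), a tiny parser can split them into chromosome, start and end. The layout of the "target regions" sheet is not described in this guide, so the function name `parse_ucsc_position` and the returned tuple are hypothetical; adapt them to whatever columns the sheet expects.

```python
import re
from typing import Tuple

# Hypothetical helper: matches strings like "chr7:27,180,000-27,183,000"
_UCSC_POS = re.compile(r"^\s*(chr[\w.]+):([\d,]+)-([\d,]+)\s*$")


def parse_ucsc_position(text: str) -> Tuple[str, int, int]:
    """Split a UCSC-style position string into (chromosome, start, end),
    removing thousands separators."""
    m = _UCSC_POS.match(text)
    if m is None:
        raise ValueError("not a UCSC position string: {!r}".format(text))
    chrom = m.group(1)
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start is after end in {!r}".format(text))
    return chrom, start, end


if __name__ == "__main__":
    print(parse_ucsc_position("chr7:27,180,000-27,183,000"))
```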
5. Run MethyMer
Go to the program folder and run:
python3 MethyMer.py
MethyMer reads the parameters from the PrimerParameters.xlsx workbook and starts picking primers. The workflow is split into three phases: picking primers and pairs, testing specificity, and identifying the optimal combination of pairs. After each phase, MethyMer saves its results to disk (in the ./MethyMer_DB folder by default), so the next run resumes from the last incomplete phase.
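The resume-from-last-incomplete-phase behaviour follows a common checkpointing pattern, sketched below in Python. MethyMer's actual on-disk format is not documented here: the pickle files, the file names and the function `run_with_checkpoints` are assumptions for illustration only.

```python
import os
import pickle
from typing import Any, Callable, List, Tuple

# A phase is (name, function taking the previous phase's result). Hypothetical layout.
Phase = Tuple[str, Callable[[Any], Any]]


def run_with_checkpoints(phases: List[Phase], data: Any,
                         db_dir: str = "./MethyMer_DB") -> Any:
    """Checkpointing sketch (not MethyMer's code): run phases in order,
    storing each result as <db_dir>/<name>.pickle. A phase whose checkpoint
    already exists is not recomputed; its stored result is loaded instead."""
    os.makedirs(db_dir, exist_ok=True)
    for name, func in phases:
        path = os.path.join(db_dir, name + ".pickle")
        if os.path.exists(path):
            with open(path, "rb") as fh:
                data = pickle.load(fh)  # phase already finished earlier
            continue
        data = func(data)
        tmp = path + ".tmp"
        with open(tmp, "wb") as fh:
            pickle.dump(data, fh)
        os.replace(tmp, path)  # atomic rename: no half-written checkpoints
    return data
```

Writing to a temporary file and renaming it means an interrupted run never leaves a truncated checkpoint that would be mistaken for a finished phase.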
A single HTML report is generated for all target loci, with a name such as "Run_2017-08-13__UTC.09-07-35.html".
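Judging only from the example name above, reports appear to be stamped with the UTC start time, which makes them sort chronologically by name. The sketch below reproduces that pattern and picks the newest report in a folder; the strftime pattern is inferred from a single example and, like the helper names, is an assumption.

```python
import glob
import os
from datetime import datetime, timezone
from typing import Optional

# Inferred (assumed) from the example "Run_2017-08-13__UTC.09-07-35.html"
REPORT_PATTERN = "Run_%Y-%m-%d__UTC.%H-%M-%S.html"


def report_name(when: Optional[datetime] = None) -> str:
    """Hypothetical helper: build a report name for a timezone-aware
    datetime (defaults to the current UTC time)."""
    when = when or datetime.now(timezone.utc)
    return when.astimezone(timezone.utc).strftime(REPORT_PATTERN)


def latest_report(folder: str = ".") -> Optional[str]:
    """Return the path of the newest report in `folder`, or None.
    Zero-padded timestamps make lexicographic order chronological."""
    reports = sorted(glob.glob(os.path.join(folder, "Run_*__UTC.*.html")))
    return reports[-1] if reports else None
```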
Typically, analyzing one locus (~2-3 kb) without the specificity test takes about 30-60 minutes; with the specificity test, it may take 1-3 hours.