Genomic Classification of CNVs Objectively
Genomic Classification of CNVs Objectively (GeCCO) is intended for prioritizing
genomic copy number variants found in individuals with developmental delays to distinguish
clinically relevant aberrations from normal / benign variants.
GeCCO is released under the GNU GPLv3 licence.
GeCCO is a Java-based program and requires a JRE of version 1.5 or higher to be installed.
To install GeCCO:
1. Download the file GeCCO_2.0.zip
2. Unzip the file GeCCO_2.0.zip
3. Update <GECCO_INSTALL>/bin/configfile.txt: change "DefaultPath" to the install location
To run GeCCO, execute the file <GECCO_INSTALL>/bin/CNVClassifierGUI.bat
(e.g. by double clicking on the file)
Affymetrix Segment Files (cn_segments) - these files are created by Affymetrix software packages
such as Genotyping Console (GTC) or Chromosome Analysis Suite (ChAS).
The following columns are listed in this file:
Sample - Sample ID
Copy Number State - HMM value
Loss/Gain - either "Loss" or "Gain" depending on the Copy number state
Chr - Chromosome
Cytoband_Start_Pos - Cytoband where the CNV starts
Cytoband_End_Pos - Cytoband where the CNV ends
Size(kb) - size of the CNV in kilobases
#Markers - Number of probes contained in the CNV
Avg_DistBetweenMarkers(kb) - average distance between these probes
%CNV_Overlap - percentage of the CNV that overlaps with the DGV (Database of Genomic Variants)
Start_Linear_Pos - start position of the CNV in base pairs
End_Linear_Position - end position of the CNV in base pairs
Start_Marker - start probe of the CNV
End_Marker - end probe of the CNV
CNV_Annotation - additional annotation information (based on DGV)
(For this file type a CNV ID will be automatically generated, and the CNV status will be assumed
to be equal to "?")
Plain Text Aberration Files - tab-separated files containing 7 columns; the column headers are mandatory:
cnvID: a unique identifier for the CNV
Chr: the chromosome where the CNV occurs
Start Position: the base pair start position of the CNV
End Position: the base pair end position of the CNV
Genomic Size: the length of the CNV in base pairs
CNV Type: either 'Loss' or 'Gain' for deletion or duplication events
Status (?): ? indicates that the classifier needs to predict the status of the CNV
Case  Chr  Start Position  End Position  Genomic Size  CNV Type  Status (?)
1     4    93769725        99612792      5843067       loss      ?
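A file in this layout can be produced with a short script. The following is a minimal Python sketch, not part of GeCCO: the function name write_aberrations is hypothetical, the header names are copied from the example row above, and the genomic size is assumed to be the end position minus the start position, which agrees with the example (99612792 - 93769725 = 5843067).

```python
import csv

# Header names copied from the example row; treat them as an assumption
# and adjust them if your GeCCO version expects different names.
HEADERS = ["Case", "Chr", "Start Position", "End Position",
           "Genomic Size", "CNV Type", "Status (?)"]


def write_aberrations(cnvs, handle):
    """Write (cnv_id, chrom, start, end, cnv_type) tuples as tab-separated rows."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADERS)
    for cnv_id, chrom, start, end, cnv_type in cnvs:
        if end < start:
            raise ValueError(f"CNV {cnv_id}: end position precedes start position")
        if cnv_type.lower() not in ("loss", "gain"):
            raise ValueError(f"CNV {cnv_id}: CNV type must be Loss or Gain")
        # Status is always "?" so that the classifier predicts it.
        writer.writerow([cnv_id, chrom, start, end, end - start, cnv_type, "?"])
```

Pass the function an open text file (for example, one opened with newline="") or an io.StringIO buffer.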
In the output file the classification result of each CNV will be written with supporting data.
The initial characteristics of the CNV, the features contained in the CNV, and the classification results are written.
Specifically the following columns:
CNVid - the CNV id (from the input file)
ChrNr - Chromosome of the CNV
StartPos - CNV start position
EndPos - CNV end position
Length - CNV length
Sort - Either Loss or Gain
NumberOfLINE - Number of LINE elements contained in the CNV
LINEDensity - a scaled density value of LINE elements in the CNV
NumberOfSINE - Number of SINE elements contained in the CNV
SINEDensity - a scaled density value of SINE elements in the CNV
NumberOfSegDup - Number of Segmental Duplication elements in the CNV
SegDupDensity - a scaled density value of Segmental Duplication elements in the CNV
NumberOfGenes - Number of Genes (UCSC) contained in the CNV
GeneDensity - a scaled density value of genes in the CNV
InPathway - True if the CNV contains a gene in the configured KEGG pathway (the default pathway is
the neurodegenerative pathway hsa01510)
InPhenotypes - True if the CNV contains gene(s) whose mouse orthologues, when knocked out, result in a nervous system phenotype (MP:0003631)
EvolutionDn - the mean
EvolutionDs - the mean
evolutionKDs/Dn - the mean
MR Distance - the distance that the classifier deems the CNV to be from the MR-associated CNV class
nonMR Distance - the distance that the classifier deems the CNV to be from the benign CNV class
Yes=MR - Yes if the CNV is classified as associated with MR (when the MR distance is > 0.5)
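Once a batch has been classified, the MR-associated CNVs can be pulled out of the output file with a short script. The sketch below is illustrative only: it assumes that the output file is tab-separated and has a header row using the column names listed above, neither of which is confirmed here, and the function name mr_associated_ids is hypothetical. It applies the rule stated for the Yes=MR column (MR distance > 0.5).

```python
import csv

MR_THRESHOLD = 0.5  # threshold stated for the Yes=MR column


def mr_associated_ids(handle):
    """Return the CNVid of every row whose MR distance exceeds the threshold."""
    # Assumption: tab-separated output with a header row.
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["CNVid"] for row in reader
            if float(row["MR Distance"]) > MR_THRESHOLD]
```

Pass the function an open text file handle for the output file written by GeCCO.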
In the configuration file (configfile.txt) the following values can be changed:
DefaultPath = D:\\GeCCO_2.0.0\\
UCSChost = jdbc:mysql://genome-mysql.cse.ucsc.edu:3306/hg19
UCSCuser = genome
UCSCdriver = com.mysql.jdbc.Driver
To use the internal cache instead of connecting to UCSC, set
UCSCcache = true
GenomeBuild = hg19
Note: only builds hg18 and hg19 are currently supported
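Before launching GeCCO it can be useful to sanity-check the configuration file. The following Python sketch is not part of GeCCO and makes few assumptions about the file syntax: it only expects one "key = value" setting per line and ignores everything else, since rules for comments or quoting are not described here. The function names read_config and check_build are hypothetical.

```python
SUPPORTED_BUILDS = ("hg18", "hg19")  # builds named in the note above


def read_config(text):
    """Collect "key = value" settings; lines without "=" are ignored."""
    settings = {}
    for line in text.splitlines():
        # Split on the first "=" only, so values such as JDBC URLs stay intact.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def check_build(settings):
    """Return the configured build, or raise ValueError if it is unsupported."""
    build = settings.get("GenomeBuild")
    if build not in SUPPORTED_BUILDS:
        raise ValueError(f"Unsupported GenomeBuild: {build!r}")
    return build
```

Read the contents of configfile.txt into a string, pass it to read_config, and hand the resulting dictionary to check_build.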
To remove GeCCO from your system, simply delete the directory containing the program. This will typically be called "GeCCO_" followed by the version number.
The program is completely self contained within this directory.
Q1. The batch of CNVs didn't process, with the error message "Error: Communications link failure due to underlying exception". What does this mean?
A1. The most common cause of this error is that GeCCO was unable to create a connection to the UCSC genome browser database. Check that the UCSC genome browser is accessible and confirm the connection details in the configuration file. See configuring the UCSC connection.
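One way to check whether the database server is reachable is to attempt a plain TCP connection to the host and port named in UCSChost. The Python sketch below is a hypothetical helper, not part of GeCCO: the names jdbc_host_port and can_connect are made up for illustration, the default port of 3306 is assumed from the configured URL, and a successful TCP connection only shows that the server can be reached, not that the login will succeed.

```python
import socket
from urllib.parse import urlsplit


def jdbc_host_port(jdbc_url, default_port=3306):
    """Extract (host, port) from a URL such as jdbc:mysql://host:3306/db."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError(f"Not a JDBC URL: {jdbc_url!r}")
    # Without the "jdbc:" prefix the remainder parses as an ordinary URL.
    parts = urlsplit(jdbc_url[len(prefix):])
    return parts.hostname, parts.port or default_port


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Call jdbc_host_port with the UCSChost value from the configuration file and pass the result to can_connect. If it returns False, a firewall or proxy between your machine and UCSC is a likely cause.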
Q2. My CNVs are mapped to genome build hg18, will that cause a problem?
A2. No. The genome build that GeCCO uses can be set in the configuration file (GenomeBuild). It is limited to builds supported by UCSC, of which hg18 and hg19 are currently supported.
Q3. Why is GeCCO taking so long?
A3. To classify the CNVs, GeCCO first needs to annotate them with structural and functional features. For the annotation step, a connection is made to the UCSC genome browser database and the features are downloaded. The time taken to perform this step depends on the number of CNVs that you wish to classify and on their length. Alternatively, if you have a local copy of the UCSC database, GeCCO can be configured to use this instead. In particular, the tables gene, rmsk and genomicSuperDups are used.