                                README

                                GECCO
               Genomic Classification of CNVS Objectively
                            Version 2.0


Genomic Classification of CNVS Objectively (GECCO) is intended for prioritizing
genomic copy number variants (CNVs) found in individuals with developmental delays, in order to
distinguish clinically relevant aberrations from normal / benign variants.

GeCCO is released under the GNU GPLv3 licence.

Pre-requisites
==============
GeCCO is a Java-based program and requires a JRE, version 1.5 or higher, to be installed.

Installation
============
To install GeCCO:
1. Download the file GeCCO_2.0.zip
2. Unzip the file GeCCO_2.0.zip
3. Edit <GECCO_INSTALL>/bin/configfile.txt and change "DefaultPath" to the install location

Run GeCCO
==========
To run GeCCO, execute the file <GECCO_INSTALL>/bin/CNVClassifierGUI.bat
(e.g. by double clicking on the file)


Input Files
============

Affymetrix Segment Files (cn_segments) - these files are created by Affymetrix software packages
such as Genotyping Console (GTC) or Chromosome Analysis Suite (ChAS).
These files contain the following columns:
Sample	- Sample ID
Copy Number State	- HMM value
Loss/Gain	- either "Loss" or "Gain" depending on the Copy number state
Chr	- Chromosome
Cytoband_Start_Pos - Cytoband where the CNV starts	
Cytoband_End_Pos	- Cytoband where the CNV ends
Size(kb)	- size of the CNV in kilobases
#Markers	- Number of probes contained in the CNV
Avg_DistBetweenMarkers(kb)	- average distance between these probes
%CNV_Overlap	- percentage overlap with the DGV
Start_Linear_Pos	- start position of the CNV in base pairs
End_Linear_Position	- end position of the CNV in base pairs
Start_Marker	- start probe of the CNV
End_Marker	- end probe of the CNV
CNV_Annotation	- additional annotation information (based on DGV)
(For this file type a CNV ID will be automatically generated, and the CNV status will be assumed
to be equal to "?")
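
As an illustration of the column layout above, here is a minimal Java sketch that reads one
cn_segments row by column name. It assumes the file is tab-separated with a single header line.
The class name SegmentRow and the sample values are hypothetical, and this is not GeCCO's own parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of GeCCO: pairs a header line with a data line.
class SegmentRow {
    // Split both lines on tabs and map each column name to its value.
    static Map<String, String> parse(String headerLine, String dataLine) {
        String[] names = headerLine.split("\t", -1);
        String[] values = dataLine.split("\t", -1);
        if (names.length != values.length) {
            throw new IllegalArgumentException(
                "expected " + names.length + " columns but found " + values.length);
        }
        Map<String, String> row = new LinkedHashMap<String, String>();
        for (int i = 0; i < names.length; i++) {
            row.put(names[i].trim(), values[i].trim());
        }
        return row;
    }

    public static void main(String[] args) {
        // Only a few of the columns are shown here; the values are made up.
        String header = "Sample\tLoss/Gain\tChr\tStart_Linear_Pos\tEnd_Linear_Position";
        String data = "S01\tLoss\t4\t93769725\t99612792";
        Map<String, String> row = parse(header, data);
        long start = Long.parseLong(row.get("Start_Linear_Pos"));
        long end = Long.parseLong(row.get("End_Linear_Position"));
        System.out.println("chr" + row.get("Chr") + ":" + start + "-" + end
            + " " + row.get("Loss/Gain"));
    }
}
```

Looking columns up by name rather than by position keeps the sketch independent of the
column order in a particular export.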


Plain Text Aberration Files - tab-separated files containing the following 7 columns (the column headers are mandatory):
cnvID: a unique identifier for the CNV
Chr: the chromosome where the CNV occurs
Start Position: the base pair start position of the CNV
End Position: the base pair end position of the CNV
Genomic Size: the length of the CNV in base pairs
CNV Type: either 'Loss' or 'Gain' for deletion or duplication events
Status (?): ? indicates that the classifier needs to predict the status of the CNV

For example:
Case	Chr	Start Position	End Position	Genomic Size	CNV Type	Status (?)
1	4	93769725	99612792	5843067	loss	?
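
A file in this format can also be produced programmatically. The following Java sketch prints the
header and one row. The class and method names are hypothetical. It computes Genomic Size as
End Position minus Start Position, which matches the example row above
(99612792 - 93769725 = 5843067) but is otherwise an assumption about how the size is defined.

```java
// Hypothetical helper, not part of GeCCO: formats rows of a plain text aberration file.
class AberrationFile {
    // Header copied from the example above.
    static final String HEADER =
        "Case\tChr\tStart Position\tEnd Position\tGenomic Size\tCNV Type\tStatus (?)";

    // Build one tab-separated row; the status is "?" so that the CNV gets classified.
    static String formatRow(String id, String chr, long start, long end, String type) {
        if (end < start) {
            throw new IllegalArgumentException("end position lies before start position");
        }
        long size = end - start; // assumption: size = end - start, as in the example row
        return id + "\t" + chr + "\t" + start + "\t" + end + "\t" + size + "\t" + type + "\t?";
    }

    public static void main(String[] args) {
        System.out.println(HEADER);
        System.out.println(formatRow("1", "4", 93769725L, 99612792L, "loss"));
    }
}
```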


Output File
=============
The output file contains the classification result for each CNV, together with supporting data:
the initial characteristics of the CNV, the features contained in the CNV, and the classification results.

Specifically the following columns:

CNVid	- the CNV id (from the input file)
ChrNr	- Chromosome of the CNV
StartPos	- CNV start position
EndPos		- CNV end position
Length		- CNV length
Sort		- Either Loss or Gain

NumberOfLINE	- Number of LINE elements contained in the CNV
LINEDensity	- a scaled density value of LINE elements in the CNV
NumberOfSINE	- Number of SINE elements contained in the CNV
SINEDensity	- a scaled density value of SINE elements in the CNV
NumberOfSegDup	- Number of Segmental Duplication elements in the CNV
SegDupDensity	- a scaled density value of Segmental Duplication elements in the CNV
NumberOfGenes	- Number of Genes (UCSC) contained in the CNV
GeneDensity	- a scaled density value of genes in the CNV
InPathway	- True if the CNV contains a gene in the configured KEGG pathway (the default pathway is
		  the neurodegenerative pathway hsa01510)
InPhenotypes	- True if the CNV contains gene(s) whose mouse orthologues, when knocked out, result in a nervous system phenotype (MP:0003631)
ExpressionGenes	- 
EvolutionDn	- the mean
EvolutionDs	- the mean
evolutionKDs/Dn	- the mean

MR Distance	- the distance that the classifier deems the CNV to be from the MR-associated CNV class
nonMR Distance	- the distance that the classifier deems the CNV to be from the benign CNV class
Yes=MR		- Yes if the CNV is classified as associated with MR (when the MR distance is > 0.5)
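
The rule behind the Yes=MR column can be written out as a short Java sketch. The class and method
names are hypothetical, the distance values in main are made up for illustration, and the "No"
label for the other case is an assumption. Only the 0.5 threshold comes from the description above.

```java
// Hypothetical illustration of the "Yes=MR" rule; not GeCCO's own code.
class MrCall {
    static final double THRESHOLD = 0.5; // threshold stated in the column description

    // A CNV is reported as MR-associated only when the MR distance exceeds the threshold.
    static String yesMr(double mrDistance) {
        return mrDistance > THRESHOLD ? "Yes" : "No"; // "No" is an assumed label
    }

    public static void main(String[] args) {
        double[] examples = {0.82, 0.5, 0.13}; // made-up values
        for (double d : examples) {
            System.out.println(d + " -> " + yesMr(d));
        }
    }
}
```

Note that a value of exactly 0.5 does not exceed the threshold and is therefore not reported as MR in this sketch.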



Configuration
==============

The following values can be changed in the configuration file (configfile.txt):

Installation path
---------------------
DefaultPath = D:\\GeCCO_2.0.0\\

UCSC Connection
-----------------
UCSChost = jdbc:mysql://genome-mysql.cse.ucsc.edu:3306/hg19
UCSCuser = genome
UCSCpassword = 
UCSCdriver = com.mysql.jdbc.Driver

To use the internal cache instead of connecting to UCSC, set
UCSCcache = true

Genome Build
----------------
GenomeBuild = hg19
Note: only builds hg18 and hg19 are currently supported
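
To check configuration values outside GeCCO, the "key = value" lines can be read with
java.util.Properties, as in the Java sketch below. This assumes that the file follows Java
properties syntax, which the doubled backslashes in DefaultPath suggest but this README does not
state. The class name is hypothetical, and the titles shown above (such as "UCSC Connection") are
taken to be README headings rather than file content.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of GeCCO: reads "key = value" settings from text.
class ConfigCheck {
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text)); // "\\" in the text becomes a single backslash
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        // In Java source each backslash is doubled again, so "\\\\" is "\\" in the text.
        String text = "DefaultPath = D:\\\\GeCCO_2.0.0\\\\\n"
            + "UCSChost = jdbc:mysql://genome-mysql.cse.ucsc.edu:3306/hg19\n"
            + "UCSCuser = genome\n"
            + "UCSCpassword = \n"
            + "UCSCcache = true\n"
            + "GenomeBuild = hg19\n";
        Properties cfg = parse(text);
        System.out.println("DefaultPath = " + cfg.getProperty("DefaultPath"));
        String build = cfg.getProperty("GenomeBuild");
        System.out.println("GenomeBuild = " + build);
        if (!"hg18".equals(build) && !"hg19".equals(build)) {
            System.out.println("Warning: only hg18 and hg19 are currently supported");
        }
    }
}
```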

Removing GECCO
==============
To remove GECCO from your system, simply delete the directory containing the program. This directory is typically named "GECCO_" followed by the version number.
The program is completely self contained within this directory.

FAQ
====

Q1. The batch of CNVs didn't process, with the error message "Error: Communications link failure due to underlying exception". What does this mean?
A1. The most common cause of this error is that GECCO was unable to connect to the UCSC genome browser database. Check that the UCSC genome browser is accessible and confirm the connection details in the configuration file (see "UCSC Connection" under Configuration).
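
One way to tell a network problem from a configuration problem is to test whether the host and
port in UCSChost accept a TCP connection at all. The Java sketch below does this for the UCSChost
value given in this README. The class and method names are hypothetical, and the fallback to port
3306 (the MySQL default) when the URL names no port is an assumption of the sketch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

// Hypothetical diagnostic, not part of GeCCO: checks that the UCSC MySQL host answers.
class UcscReachability {
    // Strip the "jdbc:" prefix so that the rest can be parsed as an ordinary URI.
    static URI toUri(String jdbcUrl) {
        if (!jdbcUrl.startsWith("jdbc:")) {
            throw new IllegalArgumentException("not a JDBC URL: " + jdbcUrl);
        }
        return URI.create(jdbcUrl.substring("jdbc:".length()));
    }

    // Try to open a TCP connection within the given timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            try { socket.close(); } catch (IOException ignored) { }
        }
    }

    public static void main(String[] args) {
        URI uri = toUri("jdbc:mysql://genome-mysql.cse.ucsc.edu:3306/hg19"); // UCSChost value
        int port = uri.getPort() == -1 ? 3306 : uri.getPort();
        boolean ok = canConnect(uri.getHost(), port, 5000);
        System.out.println(uri.getHost() + ":" + port
            + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

If the host is not reachable, the cause is likely to lie in the network (firewall, proxy, host
name) rather than in the user name or password.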

Q2. My CNVs are mapped to genome build hg18. Will that cause a problem?
A2. The genome build that GECCO uses can be set in the configuration file (GenomeBuild), and is limited by the builds supported by UCSC.

Q3. Why is GECCO taking so long?
A3. Before GECCO can classify the CNVs, it first needs to annotate them with structural and functional features. For the annotation step, a connection is made to the UCSC genome browser databases and the features are downloaded. The time taken to perform this step depends on the number of CNVs that you wish to classify and on the length of the CNVs. Alternatively, if you have a local copy of the UCSC database, GECCO can be configured to use this instead. In particular, the tables gene, rmsk and genomicSuperDups are used.
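
To give an impression of the kind of lookup the annotation step involves, the Java sketch below
counts the repeat elements of one class in the UCSC rmsk table that overlap a CNV. It is an
illustration only and not GECCO's actual query. The class and method names are hypothetical, the
connection details are those given under Configuration, and a MySQL JDBC driver is assumed to be
on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical illustration, not GECCO's own code: counts rmsk entries overlapping a region.
class RepeatCount {
    // UCSC tables name chromosomes "chr1", "chr2", ...; add the prefix if it is missing.
    static String ucscChrom(String chr) {
        return chr.startsWith("chr") ? chr : "chr" + chr;
    }

    // Simple interval overlap test; 0-based versus 1-based coordinates are ignored for brevity.
    static int countRepeats(Connection con, String chr, long start, long end, String repClass)
            throws SQLException {
        String sql = "SELECT COUNT(*) FROM rmsk"
            + " WHERE genoName = ? AND genoStart < ? AND genoEnd > ? AND repClass = ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, ucscChrom(chr));
            ps.setLong(2, end);
            ps.setLong(3, start);
            ps.setString(4, repClass);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next(); // COUNT(*) always returns exactly one row
                return rs.getInt(1);
            }
        }
    }

    public static void main(String[] args) {
        String url = "jdbc:mysql://genome-mysql.cse.ucsc.edu:3306/hg19";
        try (Connection con = DriverManager.getConnection(url, "genome", "")) {
            int lines = countRepeats(con, "4", 93769725L, 99612792L, "LINE");
            System.out.println("LINE elements overlapping the CNV: " + lines);
        } catch (SQLException e) {
            System.out.println("Query failed: " + e.getMessage());
        }
    }
}
```

Each CNV needs several lookups of this kind over the network, which is why large batches or
long CNVs take noticeably longer.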
Source: README.txt, updated 2013-07-09