README

GECCO
Genomic Classification of CNVS Objectively
Version 2.0

Genomic Classification of CNVS Objectively (GECCO) is intended for prioritizing genomic copy number variants (CNVs) found in individuals with developmental delays, to distinguish clinically relevant aberrations from normal / benign variants.

GeCCO is released under a GNU GPLv3 licence.

Pre-requisites
==============
GeCCO is a Java-based program and requires a JRE of version 1.5 or higher to be installed.

Installation
============
To install GeCCO:
1. Download the file GeCCO_2.0.zip
2. Unzip the file GeCCO_2.0.zip
3. Update <GECCO_INSTALL>/bin/configfile.txt: the "DefaultPath" value should be changed to the install location

Run GeCCO
==========
To run GeCCO, execute the file <GECCO_INSTALL>/bin/CNVClassifierGUI.bat (e.g. by double-clicking on the file).

Input Files
============
Affymetrix Segment Files (cn_segments) - these files are created by Affymetrix software packages such as Genotyping Console (GTC) or Chromosome Analysis Suite (ChAS).
The following columns are listed in this file:

  Sample                     - Sample ID
  Copy Number State          - HMM value
  Loss/Gain                  - either "Loss" or "Gain", depending on the copy number state
  Chr                        - Chromosome
  Cytoband_Start_Pos         - Cytoband where the CNV starts
  Cytoband_End_Pos           - Cytoband where the CNV ends
  Size(kb)                   - size of the CNV in kilobases
  #Markers                   - Number of probes contained in the CNV
  Avg_DistBetweenMarkers(kb) - average distance between these probes
  %CNV_Overlap               - percentage overlap with the DGV
  Start_Linear_Pos           - start position of the CNV in base pairs
  End_Linear_Position        - end position of the CNV in base pairs
  Start_Marker               - start probe of the CNV
  End_Marker                 - end probe of the CNV
  CNV_Annotation             - additional annotation information (based on DGV)

(For this file type a CNV ID will be generated automatically, and the CNV status will be assumed to be "?".)

Plain Text Aberration Files - tab-separated files containing 7 columns; the column headers are mandatory.

  cnvID          : a unique identifier for the CNV
  Chr            : the chromosome where the CNV occurs
  Start Position : the base pair start position of the CNV
  End Position   : the base pair end position of the CNV
  Genomic Size   : the length of the CNV in base pairs
  CNV Type       : either 'Loss' or 'Gain', for deletion or duplication events
  Status (?)     : ? indicates that the classifier needs to predict the CNV

For example:

  Case  Chr  Start Position  End Position  Genomic Size  CNV Type  Status (?)
  1     4    93769725        99612792      5843067       loss      ?

Output File
=============
In the output file, the classification result of each CNV is written together with supporting data: the initial characteristics of the CNV, the features contained in the CNV, and the classification results.
Specifically, the output file contains the following columns:

  CNVid            - the CNV id (from the input file)
  ChrNr            - Chromosome of the CNV
  StartPos         - CNV start position
  EndPos           - CNV end position
  Length           - CNV length
  Sort             - Either Loss or Gain
  NumberOfLINE     - Number of LINE elements contained in the CNV
  LINEDensity      - a scaled density value of LINE elements in the CNV
  NumberOfSINE     - Number of SINE elements contained in the CNV
  SINEDensity      - a scaled density value of SINE elements in the CNV
  NumberOfSegDup   - Number of Segmental Duplication elements in the CNV
  SegDupDensity    - a scaled density value of Segmental Duplication elements in the CNV
  NumberOfGenes    - Number of Genes (UCSC) contained in the CNV
  GeneDensity      - a scaled density value of genes in the CNV
  InPathway        - True if the CNV contains a gene in the configured KEGG pathway (the default pathway is the neurodegenerative pathway hsa01510)
  InPhenotypes     - True if the CNV contains gene(s) whose mouse orthologues, when knocked out, result in a nervous system phenotype (MP:0003631)
  ExpressionGenes  -
  EvolutionDn      - the mean
  EvolutionDs      - the mean
  evolutionKDs/Dn  - the mean
  MR Distance      - the distance that the classifier deems the CNV to be from the MR-associated CNV class
  nonMR Distance   - the distance that the classifier deems the CNV to be from the benign CNV class
  Yes=MR           - Yes if the CNV is classified as associated with MR (when the MR distance is > 0.5)

Configuration
==============
In the configuration file (configfile.txt) the following values can be changed:

Installation path
---------------------
DefaultPath = D:\\GeCCO_2.0.0\\

UCSC Connection
-----------------
UCSChost = jdbc:mysql://genome-mysql.cse.ucsc.edu:3306/hg19
UCSCuser = genome
UCSCpassword =
UCSCdriver = com.mysql.jdbc.Driver

To use the internal cache instead of connecting to UCSC, set
UCSCcache = true

Genome Build
----------------
GenomeBuild = hg19

Note: only builds hg18 and hg19 are currently supported.

Removing GECCO
==============
To remove GECCO from your system, simply
delete the directory containing the program. This will typically be called "GECCO_" followed by the version number. The program is completely self-contained within this directory.

FAQ
====
Q1. The batch of CNVs didn't process, with the error message "Error: Communications link failure due to underlying exception". What does this mean?
A1. The most common cause of this error is that GECCO was unable to create a connection to the UCSC genome browser database. Check that the UCSC genome browser is accessible and confirm the connection details in the configuration file (see "UCSC Connection" under Configuration).

Q2. My CNVs are mapped to genome build hg18, will that cause a problem?
A2. No, provided the matching build is set in the configuration file (see "Genome Build" under Configuration). The genome build that GECCO uses is configurable and is limited by the builds supported by UCSC; only hg18 and hg19 are currently supported.

Q3. Why is GECCO taking so long?
A3. For GECCO to be able to classify the CNVs, it first needs to annotate them with structural and functional features. For the annotation step, a connection is made to the UCSC genome browser database and the features are downloaded. The time taken to perform this step depends on the number of CNVs that you wish to classify and on the length of the CNVs. Alternatively, if you have a local copy of the UCSC database, GECCO can be configured to use this instead. In particular, the tables gene, rmsk and genomicSuperDups are used.
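
As a rough aid for the connection error in Q1, the sketch below may help to rule out a blocked host or port. It is written in Python and is not part of GECCO; the function names parse_jdbc_url and can_reach are illustrative only. It assumes the UCSChost value has the jdbc:mysql://host:port/database form shown under Configuration. It only tests whether a plain TCP connection can be opened; it does not check the user name, the password, or that a MySQL server is answering.

```python
import socket
from urllib.parse import urlsplit


def parse_jdbc_url(jdbc_url):
    """Split a JDBC MySQL URL into (host, port, database).

    Assumes the form jdbc:mysql://host[:port]/database, as used for
    the UCSChost value in configfile.txt.
    """
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("not a JDBC URL: " + jdbc_url)
    # Drop "jdbc:" so the rest parses like an ordinary URL.
    parts = urlsplit(jdbc_url[len(prefix):])
    if parts.hostname is None:
        raise ValueError("no host in JDBC URL: " + jdbc_url)
    # Fall back to the standard MySQL port when none is given.
    port = parts.port if parts.port is not None else 3306
    return parts.hostname, port, parts.path.lstrip("/")


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

For example, pass the UCSChost value to parse_jdbc_url and then call can_reach with the returned host and port. A False result suggests a network or firewall problem rather than a GECCO configuration problem.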
Source: README.txt, updated 2013-07-09

