This code collapses duplicate FPKM rows for the same gene into a single value.

Problem/Issue:
In the Cufflinks output files *_genes.expr (which report gene-level coordinates and expression values), the same gene sometimes appears in more than one row. In these cases it appears that the FPKM values of the transcripts belonging to that gene are not summed, even though the transcripts are assigned to the same gene.

Reasons and Solution:
The multiple-FPKM problem occurs when a gene has transcripts that do not overlap any other transcript in that gene. For example, this occurs in the ENSG00000125388 gene from ENSEMBL/hg19. We are aware of this issue and will eventually change the behavior, but for now a simple solution is to sum the FPKMs, since a gene's FPKM is just the sum of its transcript FPKMs anyway.
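
The summing step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the input is tab-delimited with a header row, and the column names `gene_id` and `FPKM` as well as the function name `collapse_fpkm` are assumptions chosen for the example, so adjust them to match your own *_genes.expr files.

```python
import csv
import io


def collapse_fpkm(lines, gene_col="gene_id", fpkm_col="FPKM"):
    """Sum FPKM values over all rows that share the same gene ID.

    `lines` is any iterable of text lines (an open file, a StringIO, ...).
    The column names are assumptions; check the header of your own file.
    Returns a dict mapping gene ID to summed FPKM, in first-seen order.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    totals = {}
    for row in reader:
        gene = row[gene_col]
        # Add this row's FPKM to the running total for the gene.
        totals[gene] = totals.get(gene, 0.0) + float(row[fpkm_col])
    return totals


# Small made-up example: one gene is reported on two rows.
example = io.StringIO(
    "gene_id\tFPKM\n"
    "ENSG00000125388\t1.5\n"
    "GENE_B\t4.0\n"
    "ENSG00000125388\t2.25\n"
)
collapsed = collapse_fpkm(example)
for gene, fpkm in collapsed.items():
    print(f"{gene}\t{fpkm}")
```

This sketch only sums the FPKM column. If you also need the gene-level coordinates or confidence bounds for a collapsed gene, you would have to decide separately how to merge those fields.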

The details are discussed at http://seqanswers.com/forums/showthread.php?t=5224

Additional Project Details

Registered

2014-05-18