Hi everyone,
I have 64 paired-end RNA-seq samples (human). I used cuffmerge to obtain a merged.gtf file, but 3700 entries in it contain '.' in the strand field instead of '+' or '-'. I checked these entries and each of them has class code "u", which stands for "unknown".
I used the merged.gtf file as input to htseq-count to get the counts (I eventually plan to do a differential gene expression analysis). However, I got the following error:
Error occured when processing GFF file (line 360833 of file ./merged_asm/merged.gtf):
Feature XLOC_003190 at chr1:[1285003,1285358)/. does not have strand information but you are running htseq-count in stranded mode. Use '--stranded=no'.
[Exception type: SystemExit, raised in count.py:59]
I know the error is due to missing strand information. The only options that I can think of are:
1) Use --stranded=no
2) Remove all the entries in the merged.gtf corresponding to missing strand information
Does anybody have a suggestion? Do you think removing the entries with missing strand information is better than using --stranded=no? I am afraid that using --stranded=no will affect the resulting counts (it may increase the number of ambiguous counts where exons/genes on opposite strands overlap).
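For reference, option 2 could be sketched roughly as below. This is only a minimal sketch, assuming a standard tab-separated GTF in which column 7 holds the strand; the function names and the output file name merged.stranded.gtf are hypothetical, not part of cuffmerge or htseq-count.

```python
def split_by_strand(lines):
    """Split GTF lines into (kept, dropped) based on the strand column.

    Comment and blank lines are kept. Feature lines are kept only when
    column 7 is '+' or '-'; lines with '.' (or with fewer than 7 columns)
    go to the dropped list.
    """
    kept, dropped = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # Assumption: column 7 (index 6) is the strand, as in standard GTF.
        if len(fields) >= 7 and fields[6] in ("+", "-"):
            kept.append(line)
        else:
            dropped.append(line)
    return kept, dropped


def filter_gtf(in_path, out_path):
    """Write only the stranded entries of in_path to out_path.

    Returns the number of dropped lines so it can be compared with the
    number of '.' entries expected (3700 in my case).
    """
    with open(in_path) as handle:
        kept, dropped = split_by_strand(handle)
    with open(out_path, "w") as out:
        out.writelines(kept)
    return len(dropped)


# Example call (hypothetical output name):
# n_dropped = filter_gtf("./merged_asm/merged.gtf", "merged.stranded.gtf")
```

The filtered file would then be passed to htseq-count in place of merged.gtf, at the cost of losing counts for the unstranded "u" transcripts.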
Thanks,
Komal S Rathi
Bioinformatics Application Developer
Perelman School of Medicine
University of Pennsylvania