A potential bug for mirpro_feature_pro. Here is the reason and my reasoning:
I have one sam file named "9.gsnap.sam", then, I used samtools to get its bam files: "9.gsnap.bam" and "9.gsnap.sorted.bam".
Here is what I did:
(1)
htseq-count -f bam -s no -a 10 9.gsnap.sorted.bam ../ensembl/release-81/Gallus_gallus.Galgal4.81.gtf > 9.ht_count
(2)
mirpro_feature_pro -i 9.gsnap.sam -f sam -s no -a 10 -t ../ensembl/release-81/Gallus_gallus.Galgal4.81.gtf -o 9.mirpro_count
(3)
mirpro_feature_pro -i 9.gsnap.bam -f bam -s no -a 10 -t ../ensembl/release-81/Gallus_gallus.Galgal4.81.gtf -o 9.mirpro_bam_count
(4)
mirpro_feature_pro -i 9.gsnap.sorted.bam -f bam -s no -a 10 -t ../ensembl/release-81/Gallus_gallus.Galgal4.81.gtf -o 9.mirpro_sortedbam_count
What is interesting is that the file I got from (1) is identical to all three files with the name "htseq_count_output" in all of these folders. However, the result.individual.csv and result.summary.csv are slight different. So, I think mipro_feature_pro has a bug somewhere. But, I am not sure where is the bug.
We found that some aligners (e.g., GSNAP) will tag a read with "NH:i:1" even this read has a transplicing mapping (e.g., half of the read is mapped to one locus whereas the second half is mapped to another locus). In mirpro_feature_pro of mirPRo, we will treat this read as aligned, but not uniquely mapped read. Based on HTSeq package, mirpro_feature_pro can be used for mapped read cataloging for both RNA-Seq data and small RNA-Seq data. We believe this is more accurate way for read cataloging.