Dear Dr Ye,
It is me again. I have been using APAtrap quite a lot recently and have run into error messages with a few bedgraph files:
The identifyDistal3UTR.pl script completes the utr coverage section without any problems, but generates an error during the UTR identification process.
I don't yet fully understand how the APAtrap algorithm and code work, but the problem seems to be that the algorithm is trying to process positions beyond the end of a given chromosome/scaffold.
In particular, when running identifyDistal3UTR.pl I printed the variables $curr_utr_event and @curr_utr_structure. For the problematic transcript, the following information is returned:
ENST00000601199|NA|KI270713.1|+
KI270713.1 35407 45916 35407 35916 +
This is with the GTF annotation from Ensembl release 92. However, as I understand it, this scaffold is only 40745 bp long.
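One way to look for other records like this one is to compare the end of each region against the known sequence lengths. The following is only a rough sketch in Python, not APAtrap's own code: the function name find_out_of_bounds and the tuple layout are made up for illustration, and the only values used are the ones quoted above. The reported end (45916) is exactly 10000 past the 35916 printed on the same line, which would be consistent with a fixed downstream extension, although that is a guess from the numbers alone.

```python
def find_out_of_bounds(regions, chrom_sizes):
    """Return regions whose end lies beyond the known length of their sequence.

    regions     : iterable of (chrom, start, end, name) tuples (hypothetical layout)
    chrom_sizes : dict mapping sequence name to its length in bp
    """
    flagged = []
    for chrom, start, end, name in regions:
        size = chrom_sizes.get(chrom)
        if size is None:
            continue  # length unknown, so nothing can be said about this record
        if end > size:
            flagged.append((name, chrom, end, size, end - size))
    return flagged


# Values taken from the output quoted above.
chrom_sizes = {"KI270713.1": 40745}
regions = [("KI270713.1", 35407, 45916, "ENST00000601199")]

for name, chrom, end, size, overshoot in find_out_of_bounds(regions, chrom_sizes):
    print(f"{name}: end {end} exceeds {chrom} length {size} by {overshoot} bp")
```

In practice the lengths would come from a chromosome sizes file or a FASTA index for the same assembly, rather than being typed in by hand.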
The error generated is actually an indexing error:
'Modification of non-creatable array value attempted, subscript -33683 at exe/identifyDistal3UTR.pl line 424'
When we look at the genomic region by printing the $extracted_utr_region variable, we see the following:
"35407 39116 39164 39305 39353 1724 1854 1902 2825 2867 2873 2900 2901 2915 2927 2948 2949 2975 2982 2985 3018 3030 3033 3066 ...."
So the genomic/utr locus seems to be going backwards, which I think is what is generating the indexing error.
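To pin down where the positions start going backwards, a small check like the one below could be run on the printed list. It is a sketch in Python rather than a change to the Perl script, and the helper name first_backward_step is my own. It uses only the first few values quoted above and says nothing about why the order breaks.

```python
def first_backward_step(positions):
    """Return the index of the first position smaller than its predecessor, or None."""
    for i in range(1, len(positions)):
        if positions[i] < positions[i - 1]:
            return i
    return None


# First values of $extracted_utr_region as printed above (the list is truncated there).
positions = [35407, 39116, 39164, 39305, 39353, 1724, 1854, 1902, 2825]

i = first_backward_step(positions)
if i is not None:
    print(f"positions drop from {positions[i - 1]} to {positions[i]} at index {i}")
```

If the same check were applied to the bedgraph lines for this scaffold, it would show whether the backward step is already present in the input or only appears during extraction.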
Best wishes,
Thomas
Hi Thomas,
It seems that you built your own gene model file from a GFF/GTF annotation file rather than downloading it from the UCSC Table Browser. Could you please send me your gene model file and the part of your bedgraph file which covers the regions of the genes you mentioned? That would help me to figure out the problem (I assume the error is caused by a wrong gene model file).
Best,
Congting Ye
ENST00000601199 also happens to be the very last transcript recorded in the following bed file:
"Homo_sapiens.GRCh38.92.bed"
I am not sure whether this is relevant for debugging.
If I delete the ENST00000601199 record from my bed file, I get an error at another annotated transcript on another scaffold. This time:
KI270726.1 26240 26534 ENST00000619729 0 + 26240 26534 0 1 294, 0,
The APAtrap output is as follows (with selected variables printed):
ENST00000619729|NA|KI270726.1|+
KI270726.1 26241 36534 26241 26534 +
extracted coverage: 0 0.25 0 0.25 0.25 0.5 0.25 0 0.25 0 0.25 0 0.25 0.5 0.25 0
extracted utr region: 26241 34043 34091 34266 34314 15070 15110 15118 15227 15275 15288 15336 34733 34764 34781 34812 36534
relative start: 26241
When I executed the same job with all scaffold bed records deleted, then the job finished successfully without any errors.
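For anyone wanting to reproduce this workaround, the scaffold records can be dropped with a few lines of Python. This is only a sketch under stated assumptions: it assumes Ensembl-style sequence names (1 to 22, X, Y, MT, without a "chr" prefix), the function name drop_scaffold_records is made up, and the first example record is hypothetical. The second record is the one quoted above.

```python
# Assumption: Ensembl-style names for the primary human chromosomes.
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def drop_scaffold_records(bed_lines, keep=PRIMARY):
    """Keep only BED lines whose first field is one of the names in `keep`.

    Blank lines and header lines (e.g. "track ...") are dropped as well,
    because their first field is not a primary chromosome name.
    """
    kept = []
    for line in bed_lines:
        fields = line.split()
        if fields and fields[0] in keep:
            kept.append(line)
    return kept


bed_lines = [
    "1 1000 2000 hypothetical_tx 0 + 1000 2000 0 1 1000, 0,",  # made-up record
    "KI270726.1 26240 26534 ENST00000619729 0 + 26240 26534 0 1 294, 0,",
]

for line in drop_scaffold_records(bed_lines):
    print(line)
```

This only hides the problem for the scaffolds. It does not explain why the extracted positions go backwards on them.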