Hi Hugo, Our APAtrap pipeline does not currently support parallel computing. One way to speed up the process is, as you mentioned, to separate bedgraph files and gene model file based on their genomic positions, and process them separately and/or simultaneously. Since the result for each gene corresponds to only one line in the output file, you can use the "cat" command line tool to combine them. Best, Congting
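The splitting step described above could be sketched as follows. This is only a minimal illustration, not part of APAtrap: the function name `split_by_chromosome` and the output naming scheme are assumptions, and it simply writes one file per chromosome for any input whose first column is the chromosome name (a bedgraph or a bed12 gene model).

```python
from pathlib import Path


def split_by_chromosome(path, out_dir):
    """Write one file per chromosome, keeping the original line order.

    Hypothetical helper: works for any whitespace-delimited file whose first
    column is the chromosome name. Returns the sorted chromosome names seen.
    """
    src = Path(path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}
    try:
        with src.open() as fh:
            for line in fh:
                # Skip blank lines and UCSC-style header lines.
                if not line.strip() or line.startswith(("track", "browser", "#")):
                    continue
                chrom = line.split(None, 1)[0]
                if chrom not in handles:
                    # Assumed naming scheme: <chromosome>.<original file name>
                    handles[chrom] = (out_dir / f"{chrom}.{src.name}").open("w")
                handles[chrom].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Each per-chromosome set of files can then be processed as a separate job and the per-chromosome outputs joined with "cat", as suggested above. One caveat of this sketch: it keeps one open file per chromosome, so an assembly with thousands of scaffolds may hit the operating system's open-file limit.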
Hi, I am trying to use your tools on a large set of samples (1000s of samples from public datasets). With such a large number of samples, the identifyUTR step is predicted to take ~20 days to run on a large HPC and has so far consumed ~400GB of RAM. This is not feasible to run on our HPC (we have a job time limit of up to 7 days). In any case, it seems like it should be possible to run this more efficiently by parallelising some of the steps. I'm not fluent in Perl and so cannot quite figure out...
Hi Marie, It's clear that you used an incorrect gene model file; the gene model file should be in bed12 format. See http://genome.ucsc.edu/FAQ/FAQformat#format1manual for details, and see the user manual page for how to make a suitable gene model file from a GTF genome annotation file. Congting Ye
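A quick way to see whether a gene model file looks like bed12 before running the pipeline might be a check along these lines. This is a rough sketch based on the UCSC BED description (12 fields, block lists matching blockCount, blocks spanning chromStart to chromEnd); the function name `check_bed12` is hypothetical, and APAtrap itself may accept or reject files differently.

```python
def check_bed12(lines):
    """Return (line_number, message) pairs for rows that do not look like bed12."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # Skip blank lines and UCSC-style header lines.
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        if len(fields) != 12:
            problems.append((number, f"expected 12 fields, found {len(fields)}"))
            continue
        try:
            start, end, block_count = int(fields[1]), int(fields[2]), int(fields[9])
            sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
            starts = [int(x) for x in fields[11].rstrip(",").split(",")]
        except ValueError:
            problems.append((number, "non-integer coordinate or block value"))
            continue
        if len(sizes) != block_count or len(starts) != block_count:
            problems.append((number, "blockCount does not match the block lists"))
        elif starts[0] != 0 or start + starts[-1] + sizes[-1] != end:
            problems.append((number, "blocks do not span chromStart to chromEnd"))
    return problems
```

An empty result only means the rows are structurally plausible bed12; a GTF/GFF line passed to this check is reported because it does not have 12 fields.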
Hello, First, thank you very much for your APAtrap tool; the applications are very interesting. However, I have trouble running the first step: identifyDistal3UTR. I tried to run it on my whole dataset (54 files, 18 conditions with 3 replicates each) and at the end of the UTR identification part I get this error message: UTR identification [| ] 0% done Modification of non-creatable array value attempted, subscript -1 at script/identifyDistal3UTR.pl line 420. I found a user with similar trouble on this forum,...
Hi Andrew, It would help if you could send me the ICF.APA.txt file so that I can figure out the problem. Congting
Hi, I'm trying to run APAtrap on some preliminary data and I've run into an error when I get to deAPA. When I try to run deAPA with my output from predictAPA, I get this error, which I have attached a picture of. I was able to run the test data successfully, so I'm wondering if there is a problem with my output file. I'll also attach a picture of that. Not sure if anyone has run into this problem before, but any help would be much appreciated. Thanks in advance, Andrew
User Manual
Hello Dr. Ye, I don't understand how to handle several groups with biological replicates. I have 32 samples, including 5 healthy controls, 6 Disease I and 21 Disease II samples. Following your manual (predictAPA -i Sample1.bedgraph Sample2.bedgraph -g 2 -n 1 1 -u hg19.utr.bed -o output.txt), I ran this: predictAPA -i Sample1.bedgraph Sample2.bedgraph ... Sample32.bedgraph -g 3 -n 5 6 21 -u hg19.utr.bed -o output.txt The bedgraph files were ordered by sample type, but I'm not sure if the command was right, and...
Hello, Just to add to this discussion, this type of error tends to occur when your bedgraph file is not properly sorted. For example, the two rows "chr1 100 200 5" and "chr1 90 180 6" will generate this type of error because chr1 is not correctly sorted. EDIT: Looking at your bedgraph file, I can see that it is not correctly sorted: "chr4 165341916 165342067 42" is followed by "chr4 165341760 165341911 42". For the same sequence feature, the end position of the first row cannot be greater than the start position of the second row, as this would imply...
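The ordering rule described in this post can be checked before running APAtrap with something like the sketch below. It is only an illustration of that rule, not APAtrap's own validation; the function name `find_unsorted_rows` is hypothetical, and it assumes whitespace-delimited bedgraph rows of the form chromosome, start, end, value.

```python
def find_unsorted_rows(lines):
    """Return (line_number, message) pairs for rows that break coordinate order."""
    problems = []
    finished = set()  # chromosomes whose block of rows has already ended
    prev_chrom, prev_end = None, None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # Skip blank lines, short lines and UCSC-style header lines.
        if len(fields) < 4 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom == prev_chrom:
            # Within one chromosome, a row must not start before the previous row ends.
            if start < prev_end:
                problems.append((number, "starts before the previous row ends"))
        else:
            if chrom in finished:
                problems.append((number, "chromosome appears in more than one block"))
            if prev_chrom is not None:
                finished.add(prev_chrom)
        prev_chrom, prev_end = chrom, end
    return problems
```

Applied to the two chr4 rows quoted above, this reports the second row. Sorting by chromosome and then by start position (for example with `sort -k1,1 -k2,2n`) is the usual remedy for misordered rows, although sorting alone does not remove rows that overlap each other.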
Hello, Just to add to this discussion, this type of error tends to occur when your bedgraph file is not properly sorted. For example, the two rows "chr1 100 200 5" and "chr1 90 180 6" will generate this type of error because chr1 is not correctly sorted. Many thanks, Thomas
Hello Congting, Apologies for the late reply. Thank you, that is very much appreciated. Best, Thomas
This is a normal prompt, indicating there is a region with noncontinuous coverage of reads.
I get frequent "rescan the region." alerts while running identifyDistal3UTR. Does this prompt mean that there is a problem with my input file, or does it affect my results?
Hi Thomas, Thanks for your suggestions. I have added a sentence to the user manual and ReadMe.md to specify that APAtrap follows the specifications on this website (http://creativecommons.org/licenses/by-nc-sa/3.0/). Hope this works! Best, Congting
Dear Dr. Ye, I hope you are well. One slight issue with the APAtrap tool is that I currently cannot find a license, which instructs people precisely what they are allowed and not allowed to do with the source code and binaries associated with the tool. It would help people who may want to use the APAtrap tool as a dependency in downstream applications, if they knew how they are permitted to use the tool. We have currently released the FilTar tool (https://academic.oup.com/bioinformatics/article/36/8/2410/5701647)...
Hi Yue, It seems that the program cannot locate the file 'hg19.utr.bed' in the working directory 'C:\Users\lenovo\Documents'. Please make sure the file name and location you typed are correct. Is there any special character in your file name, such as a space? Congting Ye
I don't know how to fix that.
Hi Dr. Ye, I ran into a problem when running your sample data at step 3.2. C:\Users\lenovo\Documents>predictAPA -i Sample1.bedgraph Sample2.bedgraph -g 2 -n 1 1 -u hg19.utr.bed -o output.txt Error: (-u) 3'UTR annotation file does not exist! at script/predictAPA.pl line 64. I checked the predictAPA.pl file and found that line 64 is not die "Error: (-u) 3'UTR annotation file does not exist!" if !(-e $utrAnnoFile); I have attached a screenshot of that. Regards, Yue Li
Hi Yao-Chung, I have run APAtrap with the annotation bed file downloaded from the above link and the test data (https://sourceforge.net/projects/apatrap/files/Test_Data.zip/download), and found that it ran successfully. Could you check whether there is anything wrong with your bedgraph file? The source code of APAtrap can be found at this website (https://sourceforge.net/projects/apatrap/files/Source%20Codes/). Congting Ye
Hello Dr. Ye, Thank you for your suggestion. I downloaded the bed file from the UCSC table browser link which you provided. However, when I re-run identifyDistal3UTR, I receive this message in my terminal: UTR identification [| ] 1% done Modification of non-creatable array value attempted, subscript -18459924 at script/identifyDistal3UTR.pl line 424. The first four rows of the hg38 annotation bed file I downloaded: chr1 201283451 201332993 NM_000299 0 + 201283702 201328836 0 15 453,104,395,145,208,178,63,115,156,177,154,187,85,107,2920,...
Hello Dr. Ye, Thank you for your suggestion. I downloaded the bed file from the UCSC table browser link which you provided. However, when I re-run identifyDistal3UTR, I receive this message in my terminal: UTR identification [| ] 1% done Modification of non-creatable array value attempted, subscript -18459924 at script/identifyDistal3UTR.pl line 424. The annotation bed file I downloaded: chr1 201283451 201332993 NM_000299 0 + 201283702 201328836 0 15 453,104,395,145,208,178,63,115,156,177,154,187,85,107,2920,...
Hi Yao-Chung, You used an incorrect annotation bed file in identifyDistal3UTR. You can find a correct annotation bed file at this link: http://genome.ucsc.edu/cgi-bin/hgTables?hgsid=787774005_ABQYAdCOlXgJmK8wIcJqn5Hwc2ad&clade=mammal&org=Human&db=hg38&hgta_group=genes&hgta_track=refSeqComposite&hgta_table=refGene&hgta_regionType=genome&position=chr1%3A11%2C102%2C837-11%2C267%2C747&hgta_outputType=bed&hgta_outFileName=test.txt Congting Ye
Hello Dr. Ye, I have installed APAtrap to analyze 3'UTRs in a human cancer cell line (control vs. treatment), and I run identifyDistal3UTR and predictAPA from a bash script inside tmux. However, I cannot generate a utr.bed file using identifyDistal3UTR. Therefore, predictAPA then generates an empty file. Can you help me or give me some ideas? Thank you. Yao-Chung Treatment bedgraph (base) urameshi@avis:~/projects/3utr/HFN53$ head -n 5 HFN53_treatment.bedgraph chr4 165341916...
In case someone else comes across this issue, the problem seems to be that I had an underscore in all of my chromosome identifiers. Congting replied to an email and said "Any annotation with a chromosome id containing underline symbol '_' will be discarded for further analysis in our design, which is used to filter out some special annotations. " The code is now running, and while I haven't evaluated the output yet, I'm optimistic.
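To spot this situation ahead of time, the chromosome identifiers in an annotation or bedgraph file could be scanned for underscores, roughly as sketched here. The function name `chromosomes_with_underscore` is hypothetical and the check only reflects the behaviour quoted in the email above; it does not reproduce APAtrap's own filtering code.

```python
def chromosomes_with_underscore(lines):
    """Return the sorted chromosome ids (first column) that contain '_'."""
    names = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines and UCSC-style header lines.
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        if "_" in fields[0]:
            names.add(fields[0])
    return sorted(names)
```

If the returned list covers most or all of the file, as with identifiers such as dmel_4, an empty output file is the expected consequence of the filter described above.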
Hi Peter, It would be great if you could send me the annotation and coverage files so that I can figure out the problem. My email address is yec@xmu.edu.cn. Congting Ye
Hi Dr Ye, I'm trying to use APAtrap on some fly data that I have, but I suspect that I have a subtly wrong annotation file(?), because the output file ref.novel.utr.bed is totally blank: $ head -n 5 melsim/Reference/ucsc_annotation.bed dmel_4 251355 266500 CG1674-RB 0 + 252579 266389 0 11 166,43,570,81,81,81,291,85,320,115,695, 0,1205,1549,3535,4134,5665,6539,9584,12536,12904,14450, dmel_4 252055 266500 CG1674-RC 0 + 252579 266389 0 9 198,43,81,81,291,85,320,115,695, 0,505,2835,4965,5839,8884,11836,12204,13750,...
Hi Nik dAK, (1) The input -u in step 3.2 refers to the output of step 3.1. (2) APAtrap does not currently support multithreading. One possible way to speed up your process is to divide your sample data by chromosome, and then run the divided data separately. Congting Ye
Dear Dr Ye, I have a question regarding step 3.2 (predictAPA). Does the input -u (3'UTR annotation file in bed format), which in our example is "hg19.utr.bed", refer to A) the output of 3.1 ("novel.utr.bed") or B) an independent file from UCSC (e.g. GENCODE v29, knownGene, output format BED, 3'UTR exons)? I am currently running both approaches on deep data (2 samples, 11498841 and 3797657 bedgraph lines). The APA estimation is expected to run for 127 hours (A) and 11 hours (B). Further I am only...
Hi Thomas, It seems that you built your own gene model file from a gff/gtf annotation file rather than downloading it from the UCSC Table Browser. Could you please send me your gene model file and the part of your bedgraph file which covers the regions of the genes you mentioned? It will help me figure out the problem (I assume the error was caused by a wrong gene model file). Best, Congting Ye
When I executed the same job with all scaffold bed records deleted, then the job finished successfully without any errors.
If I delete the ENST00000601199 record from my bed file, I get an error at another annotated transcript for another scaffold. This time: KI270726.1 26240 26534 ENST00000619729 0 + 26240 26534 0 1 294, 0, The APAtrap output is as follows (with selected variables printed): ENST00000619729|NA|KI270726.1|+ KI270726.1 26241 36534 26241 26534 + extracted coverage: 0 0.25 0 0.25 0.25 0.5 0.25 0 0.25 0 0.25 0 0.25 0.5 0.25 0 extracted utr region: 26241 34043 34091 34266 34314 15070 15110 15118 15227 15275...
ENST00000601199 also happens to be the very last transcript recorded in the following bed file: "Homo_sapiens.GRCh38.92.bed" I am not sure if this is relevant for debugging or not?
Dear Dr Ye, It is me again. I have been using APAtrap quite a lot recently, and came across a few error messages on a few bedgraphs: The identifyDistal3UTR.pl script completes the utr coverage section without any problems, but generates an error during the UTR identification process. I don't yet fully understand how the APAtrap algorithm and code work, but the problem seems to be that the algorithm is trying to conduct some processing beyond the length of a given chromosome/scaffold. In particular,...
Hi Thomas, Thanks for your interest in APAtrap! Because experiments often generate several biological replicates of a treatment/condition, the 'number of groups' in APAtrap represents the number of treatments/conditions in one study. You can use the '-g' parameter to set the number of groups (treatments/conditions) of your input files, and the '-n' parameter to set the number of biological replicates in each group. For example, -g 3 -n 1 2 3 indicates there are 6 input files divided...
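The relationship between the input files, '-g' and '-n' could be pictured with the sketch below. It assumes that files are assigned to groups in the order they are listed after '-i', which the truncated reply above does not confirm; the function name `assign_groups` and the file names in the usage note are hypothetical.

```python
def assign_groups(files, group_sizes):
    """Split the ordered file list into consecutive groups of the given sizes.

    group_sizes plays the role of the values passed to -n, and
    len(group_sizes) plays the role of -g. Assumption: groups are filled
    in the order the files are listed.
    """
    if sum(group_sizes) != len(files):
        raise ValueError(
            f"{len(files)} input files but the group sizes sum to {sum(group_sizes)}"
        )
    groups, position = [], 0
    for size in group_sizes:
        groups.append(files[position:position + size])
        position += size
    return groups
```

Under that assumption, calling `assign_groups` with six files and sizes `[1, 2, 3]` puts the first file alone in group 1, the next two in group 2 and the last three in group 3, which matches the file count given for -g 3 -n 1 2 3 above.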
Dear Mr. Ye, Firstly, I would like to thank you for the great tool. Secondly, I would just like to ask for some clarification on some terminology used in the documentation regarding the predictAPA tool: You write the following: “ -g number of groups in the input files, e.g. -g 2. -n number of files(duplications) in each group, e.g. -n 1 1. “ So does the -g option refer to number of samples/biological replicates? And -n refers to the number of technical replicates per sample/biological replicate?...
Hi Haifeng, There is no error in the gene model file 'hg19.genemodel.bed'. The 'NA' appears because you did not provide a gene symbol file with the '-s' parameter in the identifyDistal3UTR step. The content of a gene symbol file should look as follows: gene_id gene_name NM_001198993 NADK NM_017900 AURKAIP1 NM_001014980 FAM132A NM_001198994 NADK NM_003820 TNFRSF14 NM_003036 SKI NM_001242672 TTC34 NM_152492 CCDC27 Thanks for your interest in APAtrap! Congting Ye
Gene NM_003641|NA|chr11|+ NM_002391|NA|chr11|+ NM_001184740|NA|chr11|+ NM_001134774|NA|chr11|+ NM_001134775|NA|chr11|+ Hello, The result is as shown above; using your example data gives the same result. However, I would like the 'NA' to be replaced by the gene name, such as Ythdc2 and so on. I think the "hg19.genemodel.bed" file has some error, but I don't know how to get the right .bed file. Thanks very much if you can reply to me! Haifeng Sun Email:haifeng4432@gmail.com Nanjing Medical University