Thanks Julia, I will add the minimum requirements to the README file. Also, I just created a RaFAH repo on GitHub: https://github.com/felipehcoutinho/RaFAH It's about time I centralized everything there anyway. Feel free to submit PRs through there. Best, F
Ok thanks. I ran it on AWS and it was fine. I think it would be helpful for users to know the minimum requirements (or to improve the error handling). I cannot make a PR here on SourceForge, or else I would have submitted some proposals. Thanks for your help, Julia
My bad, I meant 80 GB (considering a dataset with thousands of viral sequences and using the predict mode). It only uses this much for a very short time, but long enough to get killed if there is not enough memory available.
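If the process is indeed being killed for lack of memory, one way to confirm it from the Docker side is to check the container's OOM flag after the run; a minimal sketch, with a placeholder container name:
docker inspect --format '{{.State.OOMKilled}}' <container_name_or_id>   # prints true if the kernel killed the container for exceeding its memory limit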
Thanks for the message. Tracking the memory usage of the Docker container, it appears to go up to 8 GB and then drops to closer to 3 GB of memory use when it hits the file-not-found error.
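For reference, a minimal sketch of how to watch the container's memory use while it runs (the container name is a placeholder):
docker stats <container_name_or_id>   # streams a live table showing MEM USAGE / LIMIT for the container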
Ok. And there is no "RaFAH_1_Host_Predictions.tsv" file generated, correct? It seems R is dying after loading the models into memory. This could happen if there is not enough RAM available. RaFAH needs at least 80 MB of RAM to run properly, so it will most likely not work on a personal computer. Can you try running it on your server and let me know if you still get the same error?
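A quick way to check how much RAM is actually available on the machine before launching RaFAH (Linux; a minimal sketch):
free -h   # the 'available' column shows how much memory can be allocated without swapping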
Yes, they are protein sequences in FASTA format. We can also see an issue (as in the original comment) using the test data - TS3_Toy_Genomes.fasta (downloaded from here https://sourceforge.net/projects/rafah/files/Data/Toy_Set.tgz/download). If I download the data to my "docker" folder and run the command as described in the README I get this: docker run --mount type=bind,source=/Users/juliagustavsen/Documents/thesis_projects/Kuwait_gp20_analysis/docker,target=/results/ fhcoutinho/rafah perl /usr/bin/RaFAH.pl...
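For anyone reproducing this with the toy set, a minimal sketch; the mount path mirrors the command above, and the --genomes_dir/--extension flags are assumptions based on the commands elsewhere on this page:
tar -xzvf Toy_Set.tgz -C /Users/juliagustavsen/Documents/thesis_projects/Kuwait_gp20_analysis/docker   # extract the toy genomes into the folder to be mounted
docker run --mount type=bind,source=/Users/juliagustavsen/Documents/thesis_projects/Kuwait_gp20_analysis/docker,target=/results/ fhcoutinho/rafah perl /usr/bin/RaFAH.pl --predict --genomes_dir /results/ --extension .fasta --output_dir /results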
Is "/results/Total_CP.fasta" empty by any chance? If there is something in it, are they protein sequences in FASTA format?
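Two quick sanity checks for that file, a minimal sketch using the path from the command below:
head -n 4 /results/Total_CP.fasta    # headers should start with '>' and the sequences should be amino acid letters
grep -c '>' /results/Total_CP.fasta  # counts the number of sequences in the file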
Thanks for the response! docker run --mount type=bind,source=/Users/juliagustavsen/Documents/viral_analysis/results,target=/results/ fhcoutinho/rafah perl /usr/bin/RaFAH.pl --predict --merged_cds_file_name /results/Total_CP.fasta --output_dir /results
Hi Julia, can you please specify the full command line you are using?
Hello, I think the comment refers to the Docker image, as I also get the same error when running the Docker image with "--predict --merged_cds_file_name". Do you have any suggestions? Thanks, Julia
thanks for the explanation Eric
Hi Eric, Thanks for the info. The changes between versions were restricted to the scripts, while the HMM database stayed the same, so there should be no impact on the results.
FYI: RaFAH --fetch grabs v1 data
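For reference, the fetch step referred to above would be invoked as something like this; a sketch assuming RaFAH.pl is called as in the other commands on this page:
perl RaFAH.pl --fetch   # per the post above, this currently downloads the v1 data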
Hi there! Which dependencies are you referring to specifically? There is no conda environment for RaFAH. Also, which version of RaFAH are you running, and with which command?
No such file or directory at RaFAH.pl line 310
Can you try running the same command with the attached version instead?
Thanks for the reply, Felipe. It's still showing the same error. =/
Hi Pedro, can you try running it while specifying --output_dir? Something like: perl RaFAH.pl --train --genomes_dir /home/pedro/Virus_CProject/Train_RaFAH_CRISPR/fna --extension .fna --true_host /home/pedro/Virus_CProject/Train_RaFAH_CRISPR/CP_Virus_Host_CRISPR.txt --file_prefix CRISPR_DB --threads 30 --output_dir Test
Error: problem open alignment files
Hi Felipe, I was trying to run it using an outdated R version (3.4.4). Somehow the server did not accept the update, so I created a conda environment with R version 4.1.1 and it worked with no problem. Thank you for your thoughtful reply. Best, Pedro
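For anyone hitting the same problem, a minimal sketch of the conda environment described above (the environment name is a placeholder; r-base and r-ranger are available from conda-forge):
conda create -n rafah_env -c conda-forge r-base=4.1.1 r-ranger   # R 4.1.1 plus the Ranger package
conda activate rafah_env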
Hi Pedro, can you please specify which versions of R and Ranger you are running?
Error : file ‘MMSeqs_Clusters_Ranger_Model_1+2+3_Clean.RData’ has magic number 'RDX3'
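The 'RDX3' magic number means the .RData file was written with the version-3 workspace format introduced in R 3.5.0, which older R releases cannot read; checking the R version in use is a quick first step, e.g.:
Rscript -e 'getRversion()'   # should print 3.5.0 or later to be able to load the model file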