Huge disk space used by intermediate files
kSNP4 does SNP discovery and SNP annotation from whole genomes
Brought to you by: barryghall, shea0
Dear kSNP3 team,
I chose kSNP3 because I need to find SNPs across 1,550 genomes. I have experience with Parsnp, but it is not able to analyze a dataset like mine.
P.S.: I am using a machine with 30 cores, 128 GB of RAM, and 950 GB of hard disk.
I have a few questions. My input files are draft genome assemblies, not reads, about 5 MB each.
1. When I ran kSNP3 it generated a lot of intermediate files, more than 900 GB for my data. Is there an option to remove the intermediate files that the program no longer needs?
2. I can see that some steps do not use all the cores I have, even with the option -CPU 25. For example, the step "Removing kmers that occur less than freq=average of median and mean kmer frequency for that genome" processes the genomes one by one. Is it possible to run these steps on multiple CPUs?
3. How much hard-disk space do you think I need to finish my analysis?
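While waiting for an answer on questions 1 and 3, one way to see where the space is going is to measure the per-genome split directories directly. This is only a sketch: the `Dir.fsplit*` and `TemporaryFilesToDelete` names are taken from the log output in this thread, and `ksnp_run` stands in for your actual output directory (the demo files are created here just so the commands run).

```shell
# Sketch: find which kSNP intermediate directories are eating the disk.
# "ksnp_run" is a stand-in for your real kSNP output directory.
RUN_DIR=./ksnp_run
mkdir -p "$RUN_DIR/Dir.fsplit1"                 # demo data for illustration
head -c 4096 /dev/zero > "$RUN_DIR/Dir.fsplit1/kmers"

# Largest intermediate directories first (sizes in KB):
du -sk "$RUN_DIR"/Dir.fsplit* | sort -nr | head -20

# Once the run has finished (or died), they can be removed by hand:
# rm -rf "$RUN_DIR"/Dir.fsplit* "$RUN_DIR"/TemporaryFilesToDelete
```

Deleting these directories while kSNP3 is still running would of course break the run; this is only for reclaiming space afterwards.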
Finally:
I got the following error while running the program. I think it is just a lack of hard-disk space, or maybe of RAM. What is your opinion?
Concatenate results for each genome and sort by locus to create SNPs_all_labelLoci
Thu Oct 11 11:13:36 UTC 2018
.
.
.
genome: 790-97_Peru_2007_C in Dir.fsplit99
genome: GCA_002221085_1_ASM222108v1_2013_NA_C in Dir.fsplit990
genome: GCA_002221095_1_ASM222109v1_2013_New_England_C in Dir.fsplit991
genome: GCA_002221145_1_ASM222114v1_2013_NA_C in Dir.fsplit992
genome: GCA_002221165_1_ASM222116v1_2013_NA_C in Dir.fsplit993
genome: GCA_002221175_1_ASM222117v1_2013_NA_C in Dir.fsplit994
genome: GCA_002221185_1_ASM222118v1_2013_NA_C in Dir.fsplit995
genome: GCA_002221225_1_ASM222122v1_2013_NA_E in Dir.fsplit996
genome: GCA_002221245_1_ASM222124v1_2013_NA_E in Dir.fsplit997
genome: GCA_002221265_1_ASM222126v1_2013_NA_C in Dir.fsplit998
genome: GCA_002221285_1_ASM222128v1_2014_NA_C in Dir.fsplit999
sort: write failed: /tmp/sort5v6kCZ: No space left on device
Number_SNPs: 1
$count_snps: 0
Finished finding SNPs
Thu Oct 11 12:36:46 UTC 2018
rm: cannot remove 'TemporaryFilesToDelete': No such file or directory
mv: No match.
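Note that the `sort` failure above spilled to `/tmp`, which is often a small partition regardless of how much space the data disk has. GNU sort honors the `TMPDIR` environment variable, so a possible workaround (directory name is an arbitrary example, and the commented kSNP3 invocation is only illustrative) is to point it at a filesystem with free space before launching the run:

```shell
# GNU sort writes its spill files to $TMPDIR (default /tmp).
# Point it at a directory on a large filesystem before running kSNP3.
mkdir -p "$PWD/ksnp_tmp"          # arbitrary example location
export TMPDIR="$PWD/ksnp_tmp"

# Then run kSNP3 as usual in the same shell, e.g.:
# kSNP3 -in in_list -outdir Run1 -k 19 -CPU 25
```

This only redirects the sort spill files; the `Dir.fsplit*` intermediates still land in the output directory.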
I hope you can help. Thank you so much.
Regards,
Orson
I have the same problem: ~1,300 genomes, and more than 800 GB of data generated on the hard drive.
Is there a solution?
SOLUTION: I used a dedicated high-capacity SSD.
I ran into a second problem:
Now it runs smoothly; jellyfish runs, but then there is a step where it has to write into dedicated directories. It works fine for the first 600 genomes, then for every remaining genome it reports
"awk: write failure (File too large) awk: close failed on file /dev/stdout (File too large)"
until the last one.
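For anyone hitting the same message: "File too large" from awk is the EFBIG error, which typically means either a per-process file-size limit or a filesystem cap (FAT32, for instance, cannot hold files over 4 GB). Two quick checks before re-running (assuming a Linux machine, since `df -T` is a GNU coreutils option):

```shell
# Per-process file size limit; "unlimited" is what you want here.
ulimit -f

# Filesystem type of the working directory (GNU df on Linux).
df -T .
```

If `ulimit -f` prints a number, raising it (or running the job from a shell without that limit) may get past the write failures; if the filesystem is the cap, moving the run directory to ext4/xfs would.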
It generates a fasta matrix of ~600 sequences and then starts building trees.
At that stage it prints "TOO FEW SPECIES" during the core-SNP stage.
I have used kSNP several times before, both in the past and recently, but this is the first time I have encountered such problems.
INFO: All genomes are from the same species; for every genome the input is the final assembly, a multifasta containing the multiple contigs produced by the assembler (~5 MB files); the machine has 32 cores, 64 GB of RAM, and 2x1 TB SSDs.
SOLUTION: I checked all my assemblies and some were of poor quality; removing/correcting them let me finish the work with good results. It still uses nearly 1 TB of disk space, and it is still unable to automatically delete the folder "TemporaryFilesToDelete".
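Since removing bad assemblies was the fix, a quick per-file sanity check (contig count and total length) can flag the outliers before a multi-day run. A minimal sketch, in which the `asm_check/` directory and demo file are invented for illustration; what counts as "poor quality" (e.g. far more contigs, or a total length far from ~5 Mb) is up to you:

```shell
# Sketch: report contig count and total bases for each assembly.
mkdir -p asm_check
printf '>c1\nACGTACGT\n>c2\nGGCC\n' > asm_check/demo.fna   # demo assembly

for f in asm_check/*.fna; do
  contigs=$(grep -c '^>' "$f")                     # FASTA header lines
  bases=$(grep -v '^>' "$f" | tr -d '\n' | wc -c)  # sequence characters
  printf '%s\t%s contigs\t%s bp\n' "$f" "$contigs" "$bases"
done
```

Sorting that report by contig count makes the suspect assemblies stand out immediately.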
Thanks for the support provided.
Last edit: Iowa 2019-03-26