Huge disk space used by intermediate files
kSNP4 does SNP discovery and SNP annotation from whole genomes
Brought to you by: barryghall, shea0
Dear kSNP3 team,
I chose this program, kSNP3, because I need to find SNPs in 1550 genomes. I have experience using Parsnp, but it was not able to analyze my data.
P.S.: I use a machine with 30 cores, 128 GB of RAM, and 950 GB of hard disk.
I have a few questions. My input files are draft genome assemblies, not reads, about 5 MB each.
1. When I run kSNP3 I can see that it generates a lot of intermediate files; for my data this was more than 900 GB. Is there an option to remove the intermediate files the program no longer needs?
2. I can see that some steps do not use all the cores I have, even with the option -CPU 25. For example, the step "Removing kmers that occur less than freq=average of median and mean kmer frequency for that genome" processes one file at a time. Is it possible to run these steps on multiple CPUs?
3. How much hard disk space do you think I need to finish my analysis?
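As far as I know kSNP3 has no flag to delete intermediates as it goes, but once the run has finished (or died) they can be removed by hand. A minimal sketch, assuming the layout seen in the run log below (per-genome `Dir.fsplit*` directories plus the `TemporaryFilesToDelete` folder); never run this while kSNP3 is still working:

```shell
# Sketch: reclaim disk space AFTER a kSNP3 run has finished or crashed.
# Dir.fsplit* are the per-genome intermediate directories from the log;
# deleting them mid-run would break the analysis.
RUN_DIR="${1:-.}"   # assumption: the kSNP3 output directory

# How much space do the intermediates occupy?
du -sch "$RUN_DIR"/Dir.fsplit* 2>/dev/null | tail -n 1

# Remove the intermediates and the leftover temp folder
rm -rf "$RUN_DIR"/Dir.fsplit*
rm -rf "$RUN_DIR"/TemporaryFilesToDelete
```

The `RUN_DIR` variable and the assumption that these directories are safe to delete post-run are mine, not from the kSNP3 documentation.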
Finally:
**I get the error below when running the program. I think it is just a problem of hard disk space, or maybe RAM. What is your opinion?**
```
Concatenate results for each genome and sort by locus to create SNPs_all_labelLoci
Thu Oct 11 11:13:36 UTC 2018
...
genome: 790-97_Peru_2007_C in Dir.fsplit99
genome: GCA_002221085_1_ASM222108v1_2013_NA_C in Dir.fsplit990
genome: GCA_002221095_1_ASM222109v1_2013_New_England_C in Dir.fsplit991
genome: GCA_002221145_1_ASM222114v1_2013_NA_C in Dir.fsplit992
genome: GCA_002221165_1_ASM222116v1_2013_NA_C in Dir.fsplit993
genome: GCA_002221175_1_ASM222117v1_2013_NA_C in Dir.fsplit994
genome: GCA_002221185_1_ASM222118v1_2013_NA_C in Dir.fsplit995
genome: GCA_002221225_1_ASM222122v1_2013_NA_E in Dir.fsplit996
genome: GCA_002221245_1_ASM222124v1_2013_NA_E in Dir.fsplit997
genome: GCA_002221265_1_ASM222126v1_2013_NA_C in Dir.fsplit998
genome: GCA_002221285_1_ASM222128v1_2014_NA_C in Dir.fsplit999
sort: write failed: /tmp/sort5v6kCZ: No space left on device
Number_SNPs: 1
$count_snps: 0
Finished finding SNPs
Thu Oct 11 12:36:46 UTC 2018
rm: cannot remove 'TemporaryFilesToDelete': No such file or directory
mv: No match.
```
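The `sort: write failed: /tmp/sort5v6kCZ: No space left on device` line points at /tmp filling up rather than the output disk: GNU sort spills its temporary chunks to `$TMPDIR`, which defaults to /tmp. A hedged workaround, relying on standard coreutils behavior rather than any documented kSNP3 option, is to point that temp space at a larger volume before launching the run:

```shell
# GNU sort writes temp chunks to $TMPDIR (default /tmp). If /tmp is a
# small partition, point it somewhere bigger before starting kSNP3.
# assumption: /big_disk is a mounted volume with plenty of free space
export TMPDIR=/big_disk/tmp
mkdir -p "$TMPDIR"

# Standalone sort also accepts -T to set the temp directory directly:
printf 'b\na\nc\n' | sort -T "$TMPDIR"   # prints a, b, c on separate lines
```

Whether kSNP3's internal sort calls inherit `TMPDIR` from the environment is an assumption on my part, but it is the usual behavior of GNU sort.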
I look forward to your help. Thank you so much.
Regards,
Orson
I have the same problem: ~1300 genomes, more than 800 GB of data generated on the hard drive.
Is there a solution?
SOLUTION: I used a dedicated high capacity SSD
I ran into a second problem:
Now it runs smoothly; jellyfish runs, but then there is a step where it has to write into dedicated directories. It runs fine for the first 600 genomes, then for every remaining genome it reports
"awk: write failure (File too large) awk: close failed on file /dev/stdout (File too large)"
until the last one.
It generates a FASTA matrix of ~600 sequences and then starts building trees.
At this stage it prints "TOO FEW SPECIES" during the core stage.
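For what it's worth, `File too large` (EFBIG) usually does not mean a full disk: it typically means either a per-process file-size limit or a filesystem cap on single-file size (e.g. 4 GB on FAT/exFAT volumes, which matters if the dedicated SSD was formatted that way). A quick hedged check, assuming a POSIX shell and GNU coreutils:

```shell
# Per-process file size limit; "unlimited" is what you want here.
ulimit -f

# Which filesystem is the working directory on? FAT32/exFAT cap single
# files at 4 GB, which a large awk output stream can exceed.
df -T .   # assumption: GNU df; -T adds a filesystem-type column
```

If `ulimit -f` is not "unlimited", raising it (or having the administrator raise the hard limit) may be enough; if the filesystem is FAT/exFAT, reformatting the SSD as ext4/XFS/NTFS would lift the 4 GB cap.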
I have already used kSNP several times in the past, and recently, but this is the first time I have encountered such problems.
INFO: All genomes are from the same species; the input for every genome is the final assembly, a multi-FASTA containing the contigs produced by the assembler (~5 MB files); the machine has 32 cores, 64 GB of RAM, and 2x1 TB SSDs.
**SOLUTION: I checked all my assemblies; some were of poor quality, and removing/correcting them let me finish the run with good results. It still uses nearly 1 TB of disk space, and it is still unable to automatically delete the folder "TemporaryFilesToDelete".**
Thanks for the support provided.
Last edit: Iowa 2019-03-26