
Huge disk space used by intermediate files

Orson
Created: 2018-10-11
Last updated: 2019-03-19
  • Orson

    Orson - 2018-10-11

    Dear kSNP3 team,

    I chose to use kSNP3 because I need to find SNPs in 1,550 genomes. I have experience with Parsnp, but it cannot analyze my data.

    P.S. I am using a machine with 30 cores, 128 GB of RAM, and 950 GB of hard disk.

    I have a few questions. My input files are draft genome assemblies, not reads, roughly 5 MB each.

    1. When I run kSNP3 it generates a huge number of intermediate files; for my data that came to more than 900 GB. Is there an option to remove the intermediate files the program no longer needs? (For a manual workaround, see the sketch after this list.)
    2. Some steps do not use all of my cores even though I pass the option -CPU 25. For example, the step "Removing kmers that occur less than freq=average of median and mean kmer frequency for that genome" works through the files one by one. Is it possible to run steps like this on several CPUs? (See the illustration after this list.)
    3. How much hard disk space do you think I will need to finish the analysis?
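
    For reference, the kind of manual cleanup meant in question 1 (a sketch only; it assumes the Dir.fsplitNNN directories shown in the log below and the TemporaryFilesToDelete folder are the intermediates in question, and that they are only safe to remove after a run has finished):

    # See which intermediates are eating the disk (run inside the output directory)
    du -sh Dir.fsplit* TemporaryFilesToDelete 2>/dev/null | sort -h | tail
    # Once the run is done and the final SNP files are copied somewhere safe:
    rm -rf Dir.fsplit* TemporaryFilesToDelete

    And for question 2, a generic illustration of spreading independent per-genome work across CPUs with GNU parallel (per_genome_command is a placeholder; parallelizing kSNP3's own internal steps would mean editing its scripts):

    ls genomes/*.fasta | parallel -j 25 'per_genome_command {}'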

    Finally:

    **I get this error when running the program. I think it is just a lack of hard disk space, or maybe RAM. What is your opinion? (The failing line is near the end of the log below; see the note after it.)**

    Concatenate results for each genome and sort by locus to create SNPs_all_labelLoci
    Thu Oct 11 11:13:36 UTC 2018
    .
    .
    .
    genome: 790-97_Peru_2007_C in Dir.fsplit99
    genome: GCA_002221085_1_ASM222108v1_2013_NA_C in Dir.fsplit990
    genome: GCA_002221095_1_ASM222109v1_2013_New_England_C in Dir.fsplit991
    genome: GCA_002221145_1_ASM222114v1_2013_NA_C in Dir.fsplit992
    genome: GCA_002221165_1_ASM222116v1_2013_NA_C in Dir.fsplit993
    genome: GCA_002221175_1_ASM222117v1_2013_NA_C in Dir.fsplit994
    genome: GCA_002221185_1_ASM222118v1_2013_NA_C in Dir.fsplit995
    genome: GCA_002221225_1_ASM222122v1_2013_NA_E in Dir.fsplit996
    genome: GCA_002221245_1_ASM222124v1_2013_NA_E in Dir.fsplit997
    genome: GCA_002221265_1_ASM222126v1_2013_NA_C in Dir.fsplit998
    genome: GCA_002221285_1_ASM222128v1_2014_NA_C in Dir.fsplit999
    sort: write failed: /tmp/sort5v6kCZ: No space left on device
    Number_SNPs: 1
    $count_snps: 0
    Finished finding SNPs
    Thu Oct 11 12:36:46 UTC 2018
    rm: cannot remove 'TemporaryFilesToDelete': No such file or directory
    mv: No match.
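
    The failing line suggests that sort ran out of space in /tmp specifically, which is often a small partition separate from the data disk. A possible workaround, assuming GNU coreutils sort (it honors the TMPDIR environment variable); the kSNP3 invocation below is only an example:

    # Point temporary files at the large data disk before launching kSNP3
    export TMPDIR=/path/on/big/disk/tmp   # example path; pick the 950 GB disk
    mkdir -p "$TMPDIR"
    kSNP3 -in in_list -outdir run_out -k 19 -CPU 25   # adjust to your own run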

    I hope you can help. Thank you so much.

    Regards,

    Orson.

  • Iowa

    Iowa - 2019-03-19

    I have the same problem: ~1,300 genomes and more than 800 GB of data generated on the hard drive.

    Is there a solution?

    SOLUTION: I used a dedicated high-capacity SSD.

    I ran into a second problem:

    Now it runs smoothly: Jellyfish runs, but then there is a step where it has to write into dedicated directories. It works fine for the first ~600 genomes, then for every remaining genome it reports
    "awk: write failure (File too large) awk: close failed on file /dev/stdout (File too large)"
    until the last one.
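
    "File too large" from awk is the EFBIG error: a per-process file size limit or a filesystem cap on single-file size was hit, not a full disk. Two quick checks (a sketch; run them in the shell that launches kSNP3):

    ulimit -f   # "unlimited" is what you want; a finite number means a cap is set
    df -Th .    # FAT32/exFAT and some network filesystems cap single-file size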

    It generates a FASTA matrix of ~600 sequences and then starts building trees. At that stage it prints "TOO FEW SPECIES" during the core stage.

    I have used kSNP several times in the past, including recently, but this is the first time I have encountered such problems.

    INFO: All genomes are from the same species; the input for each genome is its final assembly, a multi-FASTA file containing the contigs produced by the assembler (~5 MB per file); the machine has 32 cores, 64 GB of RAM, and 2x 1 TB SSDs.

    **SOLUTION: I checked all my assemblies and some were of poor quality; removing or correcting them let me finish the job with good results. The run still uses nearly 1 TB of disk space, and it is still unable to automatically delete the folder "TemporaryFilesToDelete".**
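
    A quick way to flag candidate assemblies for removal (a sketch only; the directory name and extension are placeholders, and file size and contig count are rough quality proxies at best):

    # List assemblies with contig count and byte size; the smallest and most
    # fragmented files are the first suspects
    for f in assemblies/*.fasta; do
        printf '%s\t%s\t%s\n' "$f" "$(grep -c '^>' "$f")" "$(wc -c < "$f")"
    done | sort -t$'\t' -k3,3n | head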

    Thanks for the support provided.


    Last edit: Iowa 2019-03-26

