I have been running into several memory-related issues while running PBJelly. Most of these are in the mapping step, but I anticipate them coming up again during assembly. I have access to a cluster where nodes are limited to ~32 GB of memory. I am trying to incorporate PacBio data into a highly fragmented 2.5 Gb Illumina assembly.
What steps can I take to reduce the memory requirements of the mapping/assembly stages? Should I submit a large number of jobs with very few PacBio reads each (for example, read files with 10 sequences instead of 10,000)? Is there a way to use blasr's bwt-fm index? Any suggestions/best practices would be greatly appreciated.
An easy and effective strategy is exactly what you described: submit a large number of smaller jobs. For the assembly step, each gap assembly is attempted independently, so you won't see as large a memory footprint.
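For anyone wanting to try the many-small-jobs approach, here is a rough sketch of how the reads could be split into smaller files before mapping. It is not part of PBJelly; the function name `split_fasta`, the chunk naming scheme, and the default of 1000 reads per chunk are all made up for illustration, and it assumes the reads are in plain FASTA (FASTQ would need different record detection).

```python
import os


def split_fasta(in_path, out_dir, reads_per_chunk=1000, prefix="chunk"):
    """Split a FASTA file into files holding at most reads_per_chunk records.

    Hypothetical helper, not a PBJelly function. Returns the chunk paths in order.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be at least 1")
    os.makedirs(out_dir, exist_ok=True)

    chunk_paths = []
    out = None
    n_in_chunk = 0
    try:
        with open(in_path) as fh:
            for line in fh:
                if line.startswith(">"):
                    # A header starts a new record; open a new chunk file
                    # if none is open yet or the current one is full.
                    if out is None or n_in_chunk == reads_per_chunk:
                        if out is not None:
                            out.close()
                        name = "%s_%04d.fasta" % (prefix, len(chunk_paths) + 1)
                        path = os.path.join(out_dir, name)
                        out = open(path, "w")
                        chunk_paths.append(path)
                        n_in_chunk = 0
                    n_in_chunk += 1
                # Sequence lines (possibly wrapped over several lines) follow
                # their header into the same chunk. Anything before the first
                # header is skipped.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Calling something like `split_fasta("reads.fasta", "chunks", reads_per_chunk=500)` (file names are placeholders) would give you one file per 500 reads, and each file can then go to its own mapping job. The right chunk size for a 32 GB node is something you would have to find by trial, since the reference index is likely to account for a large share of the memory regardless of how few reads are in each job.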