ngopt / Tickets / #17 Error in detecting missasemblies

Aaron Darling - 2014-11-20

Hi Julian, you are probably using A5-miseq if the pipeline made it to the A5qc step with 300nt reads. What kind of organism are you trying to assemble? If it is a bacterium you can probably just reduce the number of reads for assembly to get a dataset that is small enough to assemble on your machine. If you are hitting memory limits it is likely that you have far more data than necessary or useful for a bacterial genome.
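
As an illustration of reducing the number of reads without biasing the selection, here is a minimal Python sketch that draws a uniform random subset of read pairs using reservoir sampling. It is not part of A5-miseq. It assumes uncompressed, 4-line-per-record FASTQ files in which the forward and reverse files list mates in the same order. The function names and the file names in the usage note are hypothetical, and the number of pairs to keep is left to you.

```python
import random
from itertools import islice


def fastq_records(lines):
    """Yield FASTQ records as 4-line tuples (header, sequence, '+', qualities)."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def sample_pairs(fwd_lines, rev_lines, k, seed=0):
    """Reservoir-sample up to k read pairs uniformly at random.

    Mates stay together because each forward record is zipped with the
    reverse record at the same position (assumes both files share one order;
    if one file is shorter, extra records in the other are ignored).
    """
    rng = random.Random(seed)
    reservoir = []
    pairs = zip(fastq_records(fwd_lines), fastq_records(rev_lines))
    for i, pair in enumerate(pairs):
        if i < k:
            reservoir.append(pair)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = pair     # replace with probability k / (i + 1)
    return reservoir


def subsample_files(fwd_in, rev_in, fwd_out, rev_out, k, seed=0):
    """Write k randomly chosen pairs to new files; the inputs are left untouched."""
    with open(fwd_in) as f, open(rev_in) as r:
        pairs = sample_pairs(f, r, k, seed)
    with open(fwd_out, "w") as fo, open(rev_out, "w") as ro:
        for fwd, rev in pairs:
            fo.writelines(fwd)
            ro.writelines(rev)
```

A call such as `subsample_files("reads_R1.fastq", "reads_R2.fastq", "sub_R1.fastq", "sub_R2.fastq", k=500000)` (file names and `k` are placeholders) writes new files and leaves the originals unmodified. Only the `k` sampled pairs are held in memory at any time.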

Anonymous - 2014-11-20

Thanks for the answer!
Yes, it's a bacterial genome, but I did not want to have to modify the original files or subsample the sequences. I tried to allow Java to use more memory but got the same result, though I am not sure I did it properly. I also tried to see what the Perl script does, but I am not very comfortable with that language... I was just wondering whether the script sets its own memory limits, and whether it would be of any use to change them in my environment variables (the Java default is set much lower than the memory available on my computer). What would you suggest if I just want to make it work without reducing the number of reads? Counting the reads in the forward reads file, I have 1 552 081 reads.

Aaron Darling - 2014-11-20

OK, your best bet is to check out the latest source code (it contains important bugfixes) and edit around line 1473 to set the $mem variable to whatever you desire. Then build a Linux package by running the script ./build_pipeline.sh

The default behavior is to use 2/3 of available system memory for Java in the qc step.
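
To get a feel for what a 2/3 default means on a given machine, the following Python sketch computes that fraction of physical memory and formats it as a standard Java `-Xmx` heap flag. It is only an illustration: the thread does not show how the A5 Perl script computes or passes its `$mem` value, and the function names here are hypothetical. `physical_memory_bytes` relies on POSIX `sysconf` names that exist on Linux and may be missing on other systems.

```python
import os


def java_heap_flag(total_bytes, numerator=2, denominator=3):
    """Return a -Xmx flag for numerator/denominator of total_bytes, in whole MB.

    Integer arithmetic avoids floating-point rounding in the fraction.
    """
    megabytes = total_bytes * numerator // denominator // (1024 * 1024)
    return f"-Xmx{megabytes}m"


def physical_memory_bytes():
    """Total physical RAM in bytes via POSIX sysconf (assumed available)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
```

For example, `java_heap_flag(physical_memory_bytes())` gives the flag for the current machine, and `java_heap_flag(3 * 1024**3)` returns `-Xmx2048m` for a machine with 3 GiB of RAM.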

Anonymous - 2014-11-20

Sorry to bother you so much, Mr. Darling; your hints are helping me a lot right now!

I just downloaded the most recent update: a5_miseq_macOS_20141120.tar.gz

Will this include the very latest source code you just mentioned? Could you give me some hints on how to update my copy of A5-miseq with the new source code when it becomes available? (I am not sure of the procedure... And I also don't understand the ./build_pipeline.sh step, or why I need to do it...) I know where the lines are, but I am not sure what I should write to make sure I use as much memory as I can. (I don't just save the .pl file and use it on the command line.) Obviously I need to learn more about working with all these languages and the good habits needed to make them work.

I also read that having a lot of reads can actually make the genome alignment much worse than with fewer reads... Although I don't completely understand why (I always thought that more and longer reads would make things better), I am wondering what a good number of reads would be to get the optimal genome reconstruction with no memory problems in A5-miseq. How do you pick the reads to make sure the selection is not biased? What do you do when you have more sequence than you need?

Last edit: Anonymous 2014-11-20
  
Aaron Darling - 2014-11-20

Hi Julian, these are big questions about sequencing strategy. You're better off asking in a forum like SEQanswers or Biostars.

./build_pipeline.sh is only needed if you check out the source code with Subversion. Since you downloaded an already packaged build, you can proceed directly to editing the script at the aforementioned line.
