NGSEP / Discussion / Frequently Asked Questions: Map read process is complete (100%), but running for hours without an output

Anonymous - 2019-05-13

Hello,
Please find attached a fastq file of a bean (Phaseolus vulgaris) sample generated on an Illumina MiSeq for GBS analysis and trimmed using cutadapt on Linux. I am trying to map the reads against the P. vulgaris (G19833) reference genome. The map read process shows 100% progress, but it keeps running for hours without producing any output. I would greatly appreciate your help with this matter.

Regards,
Mohammad Erfatpour
PhD candidate, Dry bean Breeding & Genetics, Department of Plant Agriculture
University of Guelph, ON, Canada
Email: merfatpo@uoguelph.ca

Samplet_1.fastq

Jorge Duitama - 2019-05-14

Dear Mohammad

Many thanks for your interest in NGSEP. I was able to download the fastq file and reproduce the error. Unfortunately, it is a very unusual error that we did not observe in our test data, so I do not have an immediate answer. In the meantime, I could successfully map the reads from the command line, so, if possible, try mapping the reads directly with bowtie2 (you can also use bwa). Please also let me know whether the error occurs only in this sample or whether you have observed the same issue in other samples. I will update this forum as soon as I find a better solution.
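
A minimal sketch of running bowtie2 on single-end reads from Python, assuming bowtie2 is installed and on the PATH and that an index of the reference genome was already built with bowtie2-build. The index prefix and output file name are hypothetical; the thread does not show the exact command that was used.

```python
import subprocess
from typing import List


def bowtie2_single_end_command(index_prefix: str, fastq: str,
                               sam_out: str, threads: int = 1) -> List[str]:
    """Build a bowtie2 command line for unpaired reads.

    -x: prefix of the index built with bowtie2-build
    -U: fastq file with unpaired reads
    -S: output SAM file
    -p: number of alignment threads
    """
    return ["bowtie2", "-p", str(threads),
            "-x", index_prefix, "-U", fastq, "-S", sam_out]


def run_mapping(index_prefix: str, fastq: str, sam_out: str,
                threads: int = 1) -> None:
    """Run bowtie2 and raise CalledProcessError if it exits with an error."""
    cmd = bowtie2_single_end_command(index_prefix, fastq, sam_out, threads)
    subprocess.run(cmd, check=True)
```

For example, `run_mapping("Pvulgaris_index", "Samplet_1.fastq", "Samplet_1.sam")`, where `Pvulgaris_index` is a placeholder index prefix. bowtie2 prints the overall alignment rate to standard error when it finishes.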

Best regards

Jorge

Last edit: Jorge Duitama 2019-05-14

Jorge Duitama - 2019-05-14

Dear Mohammad

Looking back at the log of the test run that I made using the command line, I found the issue. The preprocessing that you are performing produces empty reads and reads with only one nucleotide. See for example the reads with id "M05499:20:000000000-CBWGL:1:1101:23232:3417" and "M05499:20:000000000-CBWGL:1:1118:14554:1883". These reads are somehow confusing the graphical interface. For the next release we will try to improve the error message and avoid getting into an infinite loop in this case.
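
A minimal Python sketch for finding such reads, assuming a plain (uncompressed) fastq file with exactly four lines per record. The function names and the `max_len` parameter are illustrative and not part of NGSEP.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record fastq stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # separator line starting with '+'
        qual = handle.readline().rstrip("\r\n")
        yield header.rstrip("\r\n"), seq, qual


def short_read_ids(handle: TextIO, max_len: int = 1) -> List[str]:
    """Return ids of reads whose sequence has at most max_len nucleotides."""
    ids = []
    for header, seq, _ in read_fastq(handle):
        if len(seq) <= max_len:
            # Drop the leading '@' and any description after the first space
            ids.append(header[1:].split(" ")[0])
    return ids
```

With the default `max_len=1`, `short_read_ids(open("Samplet_1.fastq"))` would list the empty and single-nucleotide reads described above.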

Because those reads will not be useful anyway, as a quick fix you can remove them by running the following awk command on each of your samples:

awk '{if(NR%4==1)id=$0;if(NR%4==2)r=$0;if(NR%4==0 && length(r)>20){print id;print r;print "+";print $0}}' Samplet_1.fastq > Samplet_1_L20.fastq

This command keeps only reads longer than 20 nucleotides. You can also take a look at our demultiplexing functionality as an alternative way to preprocess your reads. In our demultiplexing procedure you can choose the minimum length required to keep a read.
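
For readers who prefer Python, the following is a rough equivalent of the awk filter, assuming four lines per fastq record. Like the awk command, it keeps a read only if its sequence is strictly longer than the threshold and writes a bare "+" as the separator line. The function names are illustrative.

```python
from typing import Iterable, Iterator, List


def filter_fastq_lines(lines: Iterable[str], min_len: int = 20) -> Iterator[str]:
    """Yield the lines of reads strictly longer than min_len,
    mirroring the `length(r)>20` test in the awk command."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            header, seq, _, qual = record
            if len(seq) > min_len:
                # As in the awk command, the separator is written as a bare "+"
                yield from (header, seq, "+", qual)
            record = []


def filter_fastq_file(in_path: str, out_path: str, min_len: int = 20) -> None:
    """Write the reads of in_path that pass the length filter to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in filter_fastq_lines(src, min_len):
            dst.write(line + "\n")
```

For example, `filter_fastq_file("Samplet_1.fastq", "Samplet_1_L20.fastq")` uses the same input and output names as the awk command.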

Let me know if you have further issues running NGSEP.

Best regards

Jorge

- Mohammad Erfatpour - 2019-05-15
  
  Hi Jorge,
  
  Thanks a lot for your quick reply. I added the awk command at the beginning of the sample, but I got the error 'Map reads process has encountered a problem'. So far, with this file and without the awk command, I have been able to run 'Wizard Single End', which gives a 3.63% overall alignment rate. I think I still need to work with the dataset to figure out the issue and improve the alignment rate. By the way, thank you very much for making this program freely available and for taking the time to figure out the issue. I find the program very helpful and straightforward, and I hope that I can take full advantage of it soon.
  
  Best regards,
  Mohammad
  

Jorge Duitama - 2019-05-17

Hi Mohammad

Great, I am glad to know that you are finding the software useful. Please find attached a text file with the awk command (characters sometimes get lost when copying and pasting from the web page) and the fastq file I obtained. I aligned the file to the latest bean reference genome available in Phytozome (v2.1) and got an alignment rate of 87.89%. You can use diff to double-check whether the file you get from the awk command is identical to the attached fastq file.
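
Where diff is not available, the same check can be sketched in Python. This version reports the first line at which two files differ and, unlike plain diff, deliberately ignores line-ending differences (an assumption made here because files moved between Windows and Linux often differ only in that respect). The function name is illustrative.

```python
from itertools import zip_longest
from typing import Iterable, Optional


def first_difference(lines_a: Iterable[str],
                     lines_b: Iterable[str]) -> Optional[int]:
    """Return the 1-based number of the first differing line, or None
    if both inputs have the same lines (ignoring line endings)."""
    pairs = zip_longest(lines_a, lines_b)
    for number, (a, b) in enumerate(pairs, start=1):
        # None means one input ended before the other
        if a is None or b is None:
            return number
        if a.rstrip("\r\n") != b.rstrip("\r\n"):
            return number
    return None
```

For example, opening your own filtered file and the attached Samplet_1_L20.fastq and passing both file objects to `first_difference` returns None when the contents match.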

Let me know how things go

Jorge

Samplet_1_L20.fastq

removeShortReadsAwkCommand.txt

- Mohammad Erfatpour - 2019-05-20
  
  Dear Jorge,
  
  With your great help, I got the issue fixed. I am very happy with the progress in my GBS analysis and I would like to thank you again for your consideration and help.
  
  Best regards,
  
  Mohammad
  
  PhD Candidate, Dry Bean Breeding & Genetics
  
  Department of Plant Agriculture, University of Guelph
  
  Mobile: 519 760 5350
  

Jorge Duitama - 2019-05-21

No problem. Best regards

- Mohammad Erfatpour - 2019-07-24
  
  Mohammad Erfatpour has shared OneDrive for Business files with you. To view them, click the links below.
  
  Sample_4(1).fastq: https://uoguelphca-my.sharepoint.com/personal/merfatpo_uoguelph_ca/Documents/Attachments/Sample_4(1).fastq
  Sample_4_L20.fastq: https://uoguelphca-my.sharepoint.com/personal/merfatpo_uoguelph_ca/Documents/Attachments/Sample_4_L20.fastq
  Sample_9(2).fastq: https://uoguelphca-my.sharepoint.com/personal/merfatpo_uoguelph_ca/Documents/Attachments/Sample_9(2).fastq
  Sample_9_L20.fastq: https://uoguelphca-my.sharepoint.com/personal/merfatpo_uoguelph_ca/Documents/Attachments/Sample_9_L20.fastq
  Hello Jorge,
  
  Sorry to bother you again. I have some results from my last Illumina MiSeq run. Please find attached the fastq files of two bean genotypes before and after applying the awk command. As you will notice, the file sizes do not change after applying the awk command. As a result, I only get an alignment rate below 2% against the bean reference genome when I run them with NGSEP. I was wondering if you could give me some advice.
  
  Best,
  
  Mohammad
  
  PhD Candidate, Dry Bean Breeding & Genetics
  
  Department of Plant Agriculture, University of Guelph
  
  Mobile: 519 760 5350
  

Jorge Duitama - 2019-07-24

Hi Mohammad

I am more than happy to help you, but unfortunately I cannot reply instantly because we are not a for-profit company. Also, you do not need to spam the forum (or my e-mail, for that matter).

These new reads are all long (over 200 bp), but they definitely do not look like GBS reads. Almost all of them (99.46%) start with a long A mononucleotide run. You may want to run FastQC and see what comes up. Also, they are very unlikely to come from a bean sample, so you may want to check for a sample mix-up or contamination.
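
A minimal Python sketch of this kind of check, assuming four lines per fastq record. The post does not say how long a run counts as "long", so the `min_run=10` default is an assumption, and the function names are illustrative.

```python
from typing import Iterable, Iterator


def sequences_from_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (second line) of each 4-line fastq record."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.rstrip("\r\n")


def leading_run_length(seq: str, base: str = "A") -> int:
    """Length of the run of a single-character `base` at the start of seq."""
    return len(seq) - len(seq.lstrip(base))


def fraction_with_leading_run(seqs: Iterable[str], base: str = "A",
                              min_run: int = 10) -> float:
    """Fraction of sequences that start with at least min_run copies of base."""
    total = 0
    hits = 0
    for seq in seqs:
        total += 1
        if leading_run_length(seq.upper(), base) >= min_run:
            hits += 1
    return hits / total if total else 0.0
```

For example, `fraction_with_leading_run(sequences_from_fastq(open("Sample_4(1).fastq")))` gives the share of reads that begin with ten or more A bases. FastQC's per-base sequence content plot shows the same pattern graphically.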

Best regards

Jorge
