Hello,
Please find attached a fastq file of a bean (Phaseolus vulgaris) sample generated on an Illumina MiSeq for GBS analysis and trimmed using cutadapt on Linux. I am trying to map the reads against the P. vulgaris (G19833) reference genome. The map reads process shows 100% progress, but it keeps running for hours and hours without producing any output. I would greatly appreciate your help with this matter.
Regards,
Mohammad Erfatpour
PhD candidate, Dry bean Breeding & Genetics, Department of Plant Agriculture
University of Guelph, ON, Canada
Email: merfatpo@uoguelph.ca
Dear Mohammad
Many thanks for your interest in NGSEP. I was able to download the fastq file and reproduce the error. Unfortunately it is a very weird error that we did not observe in our test data, so I do not have an immediate answer. In the meantime, I could successfully map the reads using the command line, so, if possible, try to map the reads directly with bowtie2 (you can also use bwa). Please also let me know whether the error occurs only in this sample or whether you have observed the same issue in other samples. I will update this forum as soon as I find a better solution.
Best regards
Jorge
Last edit: Jorge Duitama 2019-05-14
Dear Mohammad
Looking back at the log of the test run that I made using the command line I found the issue. The preprocessing that you are performing produces empty reads and reads with only one nucleotide. See for example the read with id "M05499:20:000000000-CBWGL:1:1101:23232:3417" or id "M05499:20:000000000-CBWGL:1:1118:14554:1883". This issue is somehow confusing the graphical interface. For the next release we will try to improve the error message to avoid getting into an infinite loop in this case.
Because those reads will not be useful anyway, as a quick fix you can eliminate them by running the following awk command on each of your samples:
awk '{if(NR%4==1)id=$0;if(NR%4==2)r=$0;if(NR%4==0 && length(r)>20){print id;print r;print "+";print $0}}' Samplet_1.fastq > Samplet_1_L20.fastq
This command keeps only reads longer than 20 nucleotides. You can also take a look at our demultiplexing functionality as an alternative way to preprocess your reads. In our demultiplexing procedure you can choose the minimum length required to keep a read.
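To see how many reads a length filter like this would drop before running it, a small counting helper can be useful. This is only a sketch: the function name count_short_reads and its optional second argument are made up for illustration, and it assumes a plain FASTQ file with exactly four lines per record.

```shell
# Hypothetical helper (not part of NGSEP): count the reads in a 4-line-per-record
# FASTQ file whose sequence length is at most MAX (second argument, default 20).
# These are the reads that a "length > 20" filter would remove.
count_short_reads() {
  awk -v max="${2:-20}" '
    NR % 4 == 2 { total++; if (length($0) <= max) short++ }   # line 2 of each record is the sequence
    END { printf "%d of %d reads have length <= %d\n", short, total, max }
  ' "$1"
}
```

For example, count_short_reads Samplet_1.fastq would report how many reads have 20 nucleotides or fewer. If it reports zero, the filtered file should come out the same as the input.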
Let me know if you have further issues running NGSEP.
Best regards
Jorge
Hi Jorge,
Thanks a lot for your quick reply. I added the awk command at the beginning of the sample, but I got the error 'Map reads process has encountered a problem'. So far, with this file and without the awk command, I have been able to run 'Wizard Single End', which gives me a 3.63% overall alignment rate. I think I still need to work with the dataset to figure out the issue and improve the alignment rate. By the way, thanks very much for making this program freely available and for taking the time to figure out the issue. I find the program very helpful and straightforward, and I hope that I can take full advantage of it soon.
Best regards,
Mohammad
Hi Mohammad
Great, I am glad to know that you are finding the software useful. Please find attached a text file with the awk command (sometimes characters get lost when copying and pasting from the web page) and the fastq file I got. I aligned the file to the latest bean reference genome available in Phytozome (v2.1) and got an alignment rate of 87.89%. You can double-check with diff whether the file you get from the awk command is identical to the attached fastq file.
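The diff check can be wrapped in a small function, as a rough sketch. The name fastq_identical is hypothetical, and the function relies only on the exit status of diff (0 when the files are identical, non-zero otherwise, including when a file is missing).

```shell
# Hypothetical helper: report whether two FASTQ files are byte-for-byte identical.
# diff exits with status 0 only when it finds no differences.
fastq_identical() {
  if diff "$1" "$2" > /dev/null; then
    echo "identical"
  else
    echo "different"
  fi
}
```

It would be called with your own filtered file and the attached one as the two arguments.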
Let me know how things go
Jorge
Dear Jorge,
With your great help, I got the issue fixed. I am very happy with the progress in my GBS analysis and I would like to thank you again for your consideration and help.
Best regards,
Mohammad
PhD Candidate, Dry Bean Breeding & Genetics
Department of Plant Agriculture, University of Guelph
Mobile: 519 760 5350
No problem. Best regards
Mohammad Erfatpour has shared OneDrive for Business files with you. To view them, click the links below.
https://uoguelphca-my.sharepoint.com/personal/merfatpo_uoguelph_ca/Documents/Attachments/Sample_4(1).fastq
https://uoguelphca-my.sharepoint.com/personal/merfatpo_uoguelph_ca/Documents/Attachments/Sample_4_L20.fastq
https://uoguelphca-my.sharepoint.com/personal/merfatpo_uoguelph_ca/Documents/Attachments/Sample_9(2).fastq
https://uoguelphca-my.sharepoint.com/personal/merfatpo_uoguelph_ca/Documents/Attachments/Sample_9_L20.fastq
Hello Jorge,
Sorry to bother you again. I have some results from my last run on the Illumina MiSeq. Please find attached the fastq files of two bean genotypes before and after applying the awk command. As you will notice, the file sizes do not change after applying the awk command! I also get an alignment rate of less than 2% against the bean reference genome when I run them with NGSEP. I was wondering if you could give me some advice.
Best,
Mohammad
PhD Candidate, Dry Bean Breeding & Genetics
Department of Plant Agriculture, University of Guelph
Mobile: 519 760 5350
Hi Mohammad
I am more than happy to help you, but unfortunately I cannot reply instantly because we are not a for-profit company. Also, you do not need to spam the forum (or my e-mail, for that matter).
These new reads are all long (over 200 bp), but they definitely do not look like GBS reads. Almost all of them (99.46%) start with a long A mononucleotide run. You may want to run fastqc and see what comes up. Also, they are very unlikely to come from a bean sample, so you may want to check for a sample mix-up or contamination.
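A quick way to check other samples for the same pattern is to compute the percentage of reads that begin with a run of A's. The sketch below is not how the 99.46% figure above was obtained. The function name polya_start_percent and the default threshold of 10 consecutive A's are assumptions chosen for illustration, and it again assumes four lines per FASTQ record and uppercase bases.

```shell
# Hypothetical helper: print the percentage of reads whose sequence starts with
# at least MIN consecutive A's (second argument; the default of 10 is an assumption).
polya_start_percent() {
  awk -v min="${2:-10}" '
    NR % 4 == 2 {
      total++
      n = 0
      # Count the leading A characters of the sequence line.
      while (n < length($0) && substr($0, n + 1, 1) == "A") n++
      if (n >= min) hits++
    }
    END {
      if (total > 0) printf "%.2f\n", 100 * hits / total
      else print "0.00"
    }
  ' "$1"
}
```

A value close to 100 for a sample would point to the same problem described above, and a fastqc report on that file would be the next thing to look at.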
Best regards
Jorge