prinseq-news Mailing List for PRINSEQ
Brought to you by:
rschmieder
From: Kristina G. <kga...@bc...> - 2017-07-31 19:58:30
Hi,

I am running prinseq on paired-end reads:

perl $prinseq -noniupac -ns_max_p 5 -lc_method dust -lc_threshold 50 -trim_qual_right 20 -stats_all \
    -fastq <(zcat $fq1 | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n") \
    -fastq2 <(zcat $fq2 | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n")

The error I receive is the following:

#-------------------------------------------------------------------------------------------------------------
ERROR: The number of bases and quality scores are not the same for sequence "CCFFFFFHHHHGJJJJJJJJIJJJJGIIIIJJJJHGIJJJJJJJJJJJIJJJGHIIJHHHHFFFFFFDEDEDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDCDACDDDDDDDDCDDDDDDDDDDDDDCDDDEEEDCA3<B>@A<@?BDDDB?BB@>328A@::<BB@BC@:>A(:>C@C###################################################################################################################".
Try 'perl prinseq-lite.pl -h' for more information.
Exit program.
#-----------------------------------------------------------------------------------------------------------

This is the read that contains the error; the length of the read string is the same as the length of the quality string:

@MISEQ1_8:1:10:10002:19889/2
TGTTGTTTGTCGAAATCCAAAATATAGAGCGAATGTAGGCCAATATTTTGGGGTTTCGAGATTCAGGGCTTTGCGAGTACGCGAGCCAGAAATCAACAAAAAATATTTCCCGAAATTGCAACAAGATGTCGAGTATTTCAGGGTTTCACGGTTTGGGTTTTCGTGAACACAAAAGTCAATCATCAAAACACTATAACTCCCGAAAATGCAAAAGAGAATTAGTACTTGCTGAATTCAGAGTGCGGGGTTTTAAAAGTGAAAGGGCACAACAAACACAATATACAAACCACCCAAAAGAGG
+
@CCFFFFFHHHHGJJJJJJJJIJJJJGIIIIJJJJHGIJJJJJJJJJJJIJJJGHIIJHHHHFFFFFFDEDEDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDCDACDDDDDDDDCDDDDDDDDDDDDDCDDDEEEDCA3<B>@A<@?BDDDB?BB@>328A@::<BB@BC@:>A(:>C@C###################################################################################################################

The quality string, however, starts with "@", which is also the character that starts a read name. I guess that there is an error in the parser.

PRINSEQ version: prinseq-lite-0.20.4

Thank you in advance for any help.

--
Kristina Gagalova
Graduate Student
Canada's Michael Smith Genome Sciences Centre
Suite 100 - 570 West 7th Avenue
Vancouver, BC V5Z 4S6
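One way to narrow down an error like this is to validate the exact stream passed to -fastq/-fastq2 outside PRINSEQ. Below is a minimal Python sketch of a strict four-line FASTQ check (an illustration, not PRINSEQ's own Perl parser). It assumes unwrapped records and reports the first record whose sequence and quality lengths differ. Because it reads lines in fixed groups of four, a quality string that begins with "@", like the one above, is never treated as a header.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq_4line(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from an unwrapped, four-line FASTQ stream.

    Lines are consumed strictly in groups of four, so a quality string that
    starts with '@' is never mistaken for the next record's header.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines between records (e.g. at end of file)
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header[:40]!r}")
        if not plus.startswith("+"):
            raise ValueError(f"{header[1:]}: expected '+' line, got {plus[:40]!r}")
        if len(seq) != len(qual):
            raise ValueError(
                f"{header[1:]}: {len(seq)} bases but {len(qual)} quality scores"
            )
        yield header[1:], seq, qual


# The quality string deliberately starts with '@' to show it is handled.
example = io.StringIO("@r1/2\nACGT\n+\n@CCF\n")
print(list(read_fastq_4line(example)))  # [('r1/2', 'ACGT', '@CCF')]
```

To use it on real data, one could save the output of the same zcat/paste/sort/tr pipeline to a file and iterate over `read_fastq_4line(open(path))`. The first ValueError names the offending record.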
From: Robert S. <rsc...@gm...> - 2014-02-11 15:18:44
Hi Arnaud,

We moved the forum to Google Groups. Would you mind posting your question there?

http://groups.google.com/d/forum/EdwardsLabTools

Thanks,
Rob
From: Arnaud M. <Arn...@cr...> - 2014-02-11 14:38:20
Dear prinseq users,

Prinseq offers plenty of pertinent metrics in a really nice graphical output, using prinseq-graph. As far as I know, it was originally developed for 454 data, but the latest releases show good compatibility with Illumina data as well. I've tried to use it with single-end Illumina data, unfortunately without any success. prinseq-lite crashes after ~24 h (!) of running with the following command (notice the -exact_only option):

[prinseq-lite-0.20.4] [02/05/2014 09:44:33] Executing PRINSEQ with command: "perl prinseq-lite.pl -fastq ../raw/140110_SN7001136_0180_BC3C91ACXX_HS101/SAB-01R_10_S1.fastq -exact_only -graph_data ./SAB-01R_10_S1.gd -log SAB-01R_10_S1.litelog -verbose -out_good null -out_bad null"

The rest of the log follows (notice the time spent):

[prinseq-lite-0.20.4] [02/05/2014 10:00:20] Parsing and processing input data: "../raw/140110_SN7001136_0180_BC3C91ACXX_HS101/SAB-01R_10_S1.fastq"
[prinseq-lite-0.20.4] [02/05/2014 16:09:30] Done parsing and processing input data
[prinseq-lite-0.20.4] [02/05/2014 16:09:30] Input sequences: 56,614,059
[prinseq-lite-0.20.4] [02/05/2014 16:09:30] Input bases: 2,830,702,950
[prinseq-lite-0.20.4] [02/05/2014 16:09:30] Input mean length: 50.00
[prinseq-lite-0.20.4] [02/05/2014 16:09:30] Good sequences: 0 (0.00%)
[prinseq-lite-0.20.4] [02/05/2014 16:09:30] Bad sequences: 0 (0.00%)
[prinseq-lite-0.20.4] [02/05/2014 16:09:30] Sequences filtered by specified parameters:
[prinseq-lite-0.20.4] [02/05/2014 16:09:30] none
[prinseq-lite-0.20.4] [02/05/2014 16:09:30] Generate graph data
[prinseq-lite-0.17.1] [02/06/2014 09:29:29] ERROR: please specify the -derep option to remove forward (1) and/or reverse exact duplicates (4).
Exit program.

The FASTQ file is 8.4 GB, and the log shows ~56 million reads.

Is there a known limitation with prinseq-lite? If so, what do you think about randomly splitting off 1/5 of the input file (~1.7 GB), or 1/6 (~1.4 GB)? Up to which size should prinseq-lite be OK? Do you suggest other methods?

Best regards,

Arnaud
_________________________________
Arnaud Muller
Bioinformatician
Genomics Research Unit
Centre de Recherche Public de la Santé (CRP-Santé)
84, Val Fleuri, L-1526 Luxembourg
Luxembourg
Tel: +352 26970-305
Fax: +352 26970-390
Email: arn...@cr...
Website: http://www.crp-sante.lu
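Random subsampling, as Arnaud suggests, can be done by whole records. The following is a minimal Python sketch (not part of PRINSEQ) that assumes unwrapped four-line FASTQ: each read is kept independently with a fixed probability, so a fraction of 0.2 yields roughly, not exactly, one fifth of the reads. Keep in mind that duplicate rates measured on a subsample will generally be lower than on the full file, because duplicates become rarer as sequencing depth drops.

```python
import random
from typing import Iterable, Iterator, List


def subsample_fastq(lines: Iterable[str], fraction: float, seed: int = 1) -> Iterator[str]:
    """Keep each four-line FASTQ record independently with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed: the same subset on every run
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            if rng.random() < fraction:
                yield from record
            record = []
    if record:
        raise ValueError("input ended in the middle of a FASTQ record")


# Example (hypothetical file names), keeping roughly 1/5 of the reads:
# with open("SAB-01R_10_S1.fastq") as src, open("subset.fastq", "w") as dst:
#     dst.writelines(subsample_fastq(src, 0.2))
```

The records are streamed, so memory use stays constant regardless of input size. Only the subset written out would then be passed to prinseq-lite.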
From: Adriana F. <dri...@gm...> - 2014-01-06 21:21:15
Hi,

I have already used prinseq online, and it worked fine until now. I'm working with Illumina paired-end reads. After uploading the sequences and processing the statistics, the following error appeared while PRINSEQ was running:

Use of uninitialized value in numeric gt (>) at prinseq-lite.pl line 2553

I opened the Perl script "prinseq-lite.pl", and line 2553 contains the following:

$html .= '<br />'.&insert_image($png);

Has anyone experienced this problem before and knows how to solve it?

Thanks
--
Adriana M. Froes
From: Robert S. <rsc...@gm...> - 2013-11-10 22:05:50
*Release of lite version 0.20.4:* Fixed an error caused by empty lines at the end of paired-end datasets. Restricted sequences in the graph data complexity statistics output to 1000 bp to keep the .gd files small for inputs with very long sequences (e.g. whole genomes).

Rob
From: Robert S. <rsc...@gm...> - 2013-03-13 19:20:44
*Release of lite version 0.20.3:* Fixed incorrect duplicate counts when a sequence is both an exact duplicate and a reverse-complement exact duplicate of another sequence.

Rob
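A sequence can only be both an exact duplicate and a reverse-complement duplicate of the same other sequence if that sequence is its own reverse complement (e.g. ACGT). Below is a minimal Python sketch of one way to keep such reads from being counted twice. It is an illustration, not PRINSEQ's implementation: the exact-match test runs first, and the reverse-complement test is only a fallback, so each read falls into exactly one category.

```python
from typing import Dict, Iterable

# Only A, C, G, T and N are complemented; other IUPAC codes are left as-is
# (a simplification for this sketch).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def classify_duplicates(seqs: Iterable[str]) -> Dict[str, int]:
    """Count unique reads, exact duplicates and reverse-complement duplicates.

    The exact-match test is applied first, so a read that matches an earlier
    read both directly and as a reverse complement is counted only once.
    """
    seen = set()
    counts = {"unique": 0, "exact": 0, "revcomp": 0}
    for seq in seqs:
        seq = seq.upper()
        if seq in seen:
            counts["exact"] += 1
        elif reverse_complement(seq) in seen:
            counts["revcomp"] += 1
        else:
            counts["unique"] += 1
        seen.add(seq)
    return counts


# "ACGT" is its own reverse complement; its second copy counts once, as exact.
print(classify_duplicates(["ACGT", "ACGT", "AACC", "GGTT"]))
# {'unique': 2, 'exact': 1, 'revcomp': 1}
```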
From: Robert S. <rsc...@gm...> - 2013-01-27 04:55:05
Hi Luis,

The singletons are separated from the reads that are still in pairs. Separating the singletons by their origin (left or right) might not be useful for every user, but I had requests to separate them. You can simply join the singleton files if you do not need that information.

Best,
Rob

On Sat, Jan 26, 2013 at 8:33 PM, <lma...@ir...> wrote:
> Hi Rob, thank you for your quick answer.
> I've checked my original files and your answer is right. I have another
> question: why, when the paired-end FASTQ files are processed, does prinseq
> output singleton files for the left and right files?
>
> Does it mean that these singletons are dropped from the original left and
> right files, or is it only informative and these singletons still remain in
> the original files?
>
> Thanks a lot,
>
> Luis
> CINVESTAV, Plant Biotechnology, Mexico
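The bookkeeping behind this can be sketched in a few lines of Python. This is a simplified illustration, not PRINSEQ's code, and it assumes mates share an ID once the /1 and /2 suffixes are removed. A read stays in the paired output only if its mate also passed. Otherwise it becomes a singleton of its own file. Since the two singleton sets never overlap, joining them, as suggested above, only discards the left/right origin.

```python
from typing import Set, Tuple


def split_pairs(passed_1: Set[str], passed_2: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split read IDs that passed filtering into pairs and per-file singletons.

    passed_1 / passed_2: IDs (without /1, /2 suffixes) of reads from the left
    and right file that passed all filters.
    Returns (still_paired, singletons_1, singletons_2).
    """
    paired = passed_1 & passed_2
    return paired, passed_1 - paired, passed_2 - paired


paired, single_1, single_2 = split_pairs({"r1", "r2", "r3"}, {"r2", "r3", "r4"})
print(sorted(paired), sorted(single_1), sorted(single_2))
# ['r2', 'r3'] ['r1'] ['r4']
```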
From: Robert S. <rsc...@gm...> - 2013-01-27 04:52:04
*Release of lite version 0.20.2:* Added support for STDOUT output to paired-end processing.

Rob
From: Robert S. <rsc...@gm...> - 2013-01-27 04:08:26
Hi Luis,

PRINSEQ can be used for any type of sequence data (DNA, RNA, protein, etc.). You will have to adjust the parameters according to the type of data you are processing.

To answer your second question: PRINSEQ outputs FASTQ files according to the main FASTQ specification, which includes the header for quality data. You can turn this off with the no_header parameter. Assuming that your input data did not have headers for the quality entries, you will get output files with a bigger file size. In order to check if anything went wrong, the best test would be to count the number of lines in each FASTQ file (wc -l file). If the output file has more lines, then something went wrong.

Hope that answers your questions.

Best,
Rob
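The check Rob describes can be sketched in Python as follows. It mirrors `wc -l` and additionally estimates how much the quality header adds. The sketch assumes unwrapped four-line records and single-byte characters. The "header for quality data" is modelled here as the read name repeated after "+" (i.e. "+name" instead of a bare "+"), which is what the FASTQ format allows on that line.

```python
from typing import Iterable, Tuple


def count_fastq(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (line_count, record_count), i.e. `wc -l` and `wc -l` / 4,
    for an unwrapped four-line FASTQ stream."""
    n = sum(1 for _ in lines)
    if n % 4:
        raise ValueError(f"{n} lines is not a multiple of 4")
    return n, n // 4


def quality_header_overhead(lines: Iterable[str]) -> int:
    """Extra characters if every '+' line repeated its record's name
    ('+name' instead of '+')."""
    extra = 0
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line: '@name'
            extra += len(line.rstrip("\r\n")) - 1
    return extra


sample = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "GGCC\n", "+\n", "IIII\n"]
print(count_fastq(sample))              # (8, 2)
print(quality_header_overhead(sample))  # 4
```

For real files, an open file handle can be passed directly, e.g. `count_fastq(open("10diasrep1left_run1.fastq"))`. A filtered "good" file should never contain more records than its input.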
From: <lma...@ir...> - 2013-01-27 03:37:16
hello everyone!!

This is a curious problem. I'm using prinseq for quality filtering of Illumina cDNA reads. The program looks great, but I have 2 doubts:

1. Can the parameters be used in the same way for cDNA as for genomic DNA? I mean, is this tool applicable in the same way for RNA sequencing as for DNA genomic projects?

2. The other is a mystery that I found: why are the output files bigger than the input files? The size of the bad files is about 10 MB, but I can't see this reflected in the good files with respect to the original files uploaded.

For example, for my run 1 of Illumina:

luis@luis-NV57H:~/RUN1$ ls -s
total 3679164
198020 10diasrep1left_run1.fastq   257200 20diasrep1left_run1.fastq   228716 40diasrep1left_run1.fastq   261420 60diasrep1left_run1.fastq
203228 10diasrep1right_run1.fastq  265600 20diasrep1right_run1.fastq  241824 40diasrep1right_run1.fastq  268476 60diasrep1right_run1.fastq
217552 10diasrep2left_run1.fastq   191320 20diasrep2left_run1.fastq   220788 40diasrep2left_run1.fastq   234164 60diasrep2left_run1.fastq
225276 10diasrep2right_run1.fastq  197124 20diasrep2right_run1.fastq  226800 40diasrep2right_run1.fastq  241656 60diasrep2right_run1.fastq

After processing and downloading, these are the sizes:

luis@luis-NV57H:~/Downloads/RUN1-CLEANED2/GOOD$ ls -s
total 3908256
209680 10diasrep1left_run1_good_1.fastq  203588 20diasrep2left_run1_good_1.fastq   276984 60diasrep1left_run1_good_1.fastq
214456 10diasrep1left_run1_good_2.fastq  208924 20diasrep2left_run1_good_2.fastq   283448 60diasrep1left_run1_good_2.fastq
231936 10diasrep2left_run1_good_1.fastq  257108 40diasrep1right_run1_good_1.fastq  248364 60diasrep2left_run1_good_1.fastq
239036 10diasrep2left_run1_good_2.fastq  244920 40diasrep1right_run1_good_2.fastq  255248 60diasrep2left_run1_good_2.fastq
274124 20diasrep1left_run1_good_1.fastq  236520 40diasrep2left_run1_good_1.fastq
281852 20diasrep1left_run1_good_2.fastq  242068 40diasrep2left_run1_good_2.fastq

#Parameters used for data processing with PRINSEQ (http://prinseq.sf.net)
#[01/23/2013 14:48:25]
ns_max_p 2
out_format 3
derep 1
lc_method entropy
lc_threshold 70
noniupac
out_good '10diasrep1left_run1_good'
out_bad '10diasrep1left_run1_bad'
min_qual_mean 25
log 10diasrep1left_run1_log.txt
fastq 10diasrep1left_run1.fastq
fastq2 10diasrep1right_run1.fastq

Thanks a lot for your answer,

Luis
From: Robert S. <rsc...@gm...> - 2013-01-07 18:33:51
*Release of web version 0.20.1:* Released the files needed to run the web version on a local machine. Please let me know if you are using the web version and have trouble installing it. This will help me add information to the README file to ease the installation for other users.

Thanks,
Rob
From: Robert S. <rsc...@gm...> - 2013-01-04 01:17:24
*Release of lite version 0.20.1:* Fixed an issue with FASTA inputs that caused the program to exit.

Rob
From: Robert S. <rsc...@gm...> - 2012-12-25 22:38:37
*Release of lite version 0.20:* Fixed deprecated use of 'defined' on aggregates. Added options "trim_left_p" and "trim_right_p" to trim reads by a percentage value, in addition to the options that trim by number of nucleotides. Added option "stats_assembly" to report N50, N90, etc. contig sizes in the standalone version's summary statistics output. Added support for paired-end data (new options "fasta2" and "fastq2").

*Release of graphs version 0.6:* Added support for paired-end data.
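For reference, here is a Python sketch of the usual Nx definition behind statistics such as N50 and N90. It is not taken from PRINSEQ, and how PRINSEQ handles ties or rounding is not stated in this note. Contigs are sorted from longest to shortest and their lengths summed until the running total reaches x% of the assembly; the contig length at that point is Nx.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float) -> int:
    """Nx statistic: the largest length L such that contigs of length >= L
    together cover at least x% of the total assembly length."""
    if not lengths or not 0 < x <= 100:
        raise ValueError("need at least one contig and 0 < x <= 100")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered * 100 >= x * total:
            return length
    return min(lengths)  # not reached for 0 < x <= 100


contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(nx(contigs, 50), nx(contigs, 90))  # 8 4
```

In the example the total is 54 bp: 10 + 9 + 8 = 27 bp reaches half of it, so N50 is 8, and adding contigs down to 4 bp reaches 49 bp (at least 90%), so N90 is 4.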
From: Robert S. <rsc...@gm...> - 2012-11-01 23:11:29
|
Hi Chih-Ming, PRINSEQ does not process FASTQ files with wrapped sequence or quality entries. The other two differences you listed won't affect the file parsing. Here is an excerpt from the FASTQ entry on Wikipedia ( http://en.wikipedia.org/wiki/FASTQ_format): "The original Sanger FASTQ files also allowed the sequence and quality strings to be wrapped (split over multiple lines), but this is generally discouraged as it can make parsing complicated due to the unfortunate choice of "@" and "+" as markers (these characters can also occur in the quality string)." There was a request on the SamTools developer page ( http://sourceforge.net/mailarchive/message.php?msg_id=28686995), but I am not sure whether SamTools was actually modified. Heng also wrote a separate tool that can convert a multi-line FASTQ file to a single-line one. The tool can be found here: https://github.com/lh3/seqtk Command to use: seqtk seq -l0 in.fastq > out.fastq Best, Rob |
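As the quoted Wikipedia passage notes, wrapped records are hard to parse because a quality line may itself begin with "@" or "+". A minimal Python sketch of one way around this, assuming the usual layout (an "@" header, one or more sequence lines, a "+" separator line, then quality lines), reads quality lines until the quality string is as long as the sequence. The function names are hypothetical, and this is not how PRINSEQ or seqtk is implemented; for real data the seqtk command above is the practical route.

```python
from typing import Iterable, Iterator, List, Tuple


def read_wrapped_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from FASTQ whose records may be wrapped.

    Quality lines are consumed until the quality string is as long as the
    sequence, so quality lines starting with '@' or '+' are not mistaken
    for record markers.
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        seq_parts: List[str] = []
        for line in it:
            if line.startswith("+"):
                break  # separator line; any name after '+' is ignored
            seq_parts.append(line)
        else:
            raise ValueError(f"record {header!r} has no '+' line")
        seq = "".join(seq_parts)
        qual = ""
        while len(qual) < len(seq):
            try:
                qual += next(it)
            except StopIteration:
                raise ValueError(f"truncated quality for {header!r}") from None
        if len(qual) != len(seq):
            raise ValueError(f"sequence/quality length mismatch for {header!r}")
        yield header, seq, qual


def unwrap_fastq(lines: Iterable[str]) -> str:
    """Return the records as single-line FASTQ (4 lines per record, bare '+')."""
    return "".join(
        f"{header}\n{seq}\n+\n{qual}\n"
        for header, seq, qual in read_wrapped_fastq(lines)
    )
```

For example, `unwrap_fastq(open("in.fastq"))` returns the unwrapped text, ready to write to a new file.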
From: Hung Chih-M. <ym...@gm...> - 2012-11-01 10:23:35
|
Hi, I want to convert a FASTQ file to FASTA using prinseq, but I got an error message: ERROR: input file for -fastq is in UNKNOWN format not in FASTQ format. My FASTQ file was generated by samtools. After comparing my file with example1.fastq (which comes with prinseq), I found that my file (1) does not have length info after the sequence names, (2) has no sequence name after the + (which is followed by the quality info), and (3) has the sequence of each entry wrapped over several lines instead of on a single line. What can I do to make prinseq read my FASTQ input? Thanks, Chih-Ming |
From: Robert S. <rsc...@gm...> - 2012-10-03 03:58:49
|
Hi Ian, The current version does not handle paired-end data. Paired-end data will be supported in version 0.20, which is expected to be released later this month. Best, Rob |
From: Misner, I. <im...@to...> - 2012-10-03 01:31:10
|
Hello, How does the duplicate remover handle paired-end reads? Will it check for a duplicate on both reads, or will it create orphans? Cheers Ian -------------------------------------------- Ian Misner, Ph.D. Postdoc Computer and Information Sciences Towson University 7800 York Road, Rm 447 Towson, MD 21252 Ph. Fax 410-704-3868 -------------------------------------------- |
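To illustrate the distinction Ian asks about, here is a minimal Python sketch of pair-aware exact deduplication. The function names are hypothetical and this is not PRINSEQ's implementation. A pair is dropped only when the combination of mate-1 and mate-2 sequences has been seen before, so both mates are kept or removed together and no orphans arise. The second function counts how many orphans would result if each mate file were instead deduplicated on its own.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (mate-1 sequence, mate-2 sequence)


def dedup_pairs(pairs: Iterable[Pair]) -> Tuple[List[Pair], int]:
    """Keep the first occurrence of each (mate-1, mate-2) combination.

    Returns the kept pairs and the number of pairs removed as duplicates.
    """
    seen: Set[Pair] = set()
    kept: List[Pair] = []
    removed = 0
    for pair in pairs:
        if pair in seen:
            removed += 1  # both mates dropped together
        else:
            seen.add(pair)
            kept.append(pair)
    return kept, removed


def orphans_if_deduped_separately(pairs: Iterable[Pair]) -> int:
    """Count pairs where per-file deduplication would drop exactly one mate."""
    seen1: Set[str] = set()
    seen2: Set[str] = set()
    orphans = 0
    for r1, r2 in pairs:
        dup1, dup2 = r1 in seen1, r2 in seen2
        seen1.add(r1)
        seen2.add(r2)
        if dup1 != dup2:
            orphans += 1  # one mate removed, the other left without a partner
    return orphans
```

With the pairs `("AAA", "CCC")`, `("AAA", "GGG")`, `("AAA", "CCC")`, the pair-aware version removes only the third pair, while per-file deduplication would orphan the mate-2 read of the second pair.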
From: Robert S. <rsc...@gm...> - 2012-09-27 23:29:45
|
Fixed an issue with incorrect quality trimming when using the arguments "min" and "max" for option -trim_qual_type. Rob |
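For readers unfamiliar with -trim_qual_type, the following is a simplified Python sketch of 3' quality trimming with a window statistic. It assumes semantics suggested by the option names: trim from the right while the chosen statistic ("min", "max", "mean" or "sum") of the last window of Phred scores is below the threshold (a "less than" rule). The function and parameter names, the windowing and the stepping behavior are assumptions. This is not PRINSEQ's code. It shows why "min" and "max" can trim the same read very differently.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Window statistics, named after the -trim_qual_type arguments (assumed semantics).
STATS: Dict[str, Callable[[Sequence[int]], float]] = {
    "min": min, "max": max, "mean": mean, "sum": sum,
}


def phred33(qual: str) -> List[int]:
    """Convert a Phred+33 encoded quality string to integer scores."""
    return [ord(c) - 33 for c in qual]


def trim_qual_right(quals: Sequence[int], threshold: int, qual_type: str = "min",
                    window: int = 1, step: int = 1) -> int:
    """Return how many bases to keep after trimming from the 3' end.

    While the chosen statistic over the last `window` scores is below
    `threshold`, `step` bases are removed from the right.
    """
    stat = STATS[qual_type]
    keep = len(quals)
    while keep > 0:
        win = quals[max(0, keep - window):keep]
        if stat(win) >= threshold:
            break  # window passes: stop trimming
        keep = max(0, keep - step)
    return keep
```

With scores `[30, 30, 30, 10, 25, 5]`, a threshold of 20 and a window of 2, "min" keeps 3 bases while "max" keeps all 6; the kept read is `seq[:keep]` with quality `qual[:keep]`.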
From: Robert S. <rsc...@gm...> - 2012-09-05 19:35:55
|
Fixed issues related to the use of qw() in loops for Perl version 5.14+ (thanks to Evan Staton for pointing out the issue and providing the link with details: http://search.cpan.org/~jesse/perl-5.14.0/pod/perldelta.pod#Use_of_qw(...)_as_parentheses). Fixed an issue with 5'/3' duplicate removal that forced option -exact_only (thanks to Stephanie Pierson for reporting the issue). Fixed an issue with missing duplicate statistics in the graph data output if -derep or -graph_stats was not specified. Suppressed the output of the PCA module when generating PCA plots. Rob |
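As background to the 5'/3' duplicate fix, here is a minimal Python sketch that identifies such reads. It assumes a 5' duplicate is a read identical to the beginning (prefix) of a longer read, and a 3' duplicate one identical to the end (suffix) of a longer read. The function names are hypothetical, and the sketch is not PRINSEQ's implementation. It relies on a property of sorted order: if one distinct sequence is a prefix of another, it is also a prefix of its immediate successor. 3' duplicates are found the same way on reversed strings.

```python
from typing import Iterable, Set


def five_prime_duplicates(seqs: Iterable[str]) -> Set[str]:
    """Return the sequences that are a proper prefix of a longer sequence."""
    uniq = sorted(set(seqs))  # distinct sequences, so neighbours never compare equal
    dups: Set[str] = set()
    for cur, nxt in zip(uniq, uniq[1:]):
        if nxt.startswith(cur):
            dups.add(cur)
    return dups


def three_prime_duplicates(seqs: Iterable[str]) -> Set[str]:
    """Return the sequences that are a proper suffix of a longer sequence."""
    # A suffix of s is a prefix of reversed(s), so reuse the prefix check.
    reversed_dups = five_prime_duplicates(s[::-1] for s in seqs)
    return {s[::-1] for s in reversed_dups}
```

For the reads `ACGT`, `ACG`, `AC`, `TTT`, `GTT`, `ACGA` and `CGT`, the 5' duplicates are `AC` and `ACG`, and the only 3' duplicate is `CGT`. Sorting keeps the check at O(n log n) comparisons instead of comparing every pair of reads.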
From: Robert S. <rsc...@gm...> - 2012-06-27 03:10:42
|
*Release of lite version 0.19.3:* Added a new output file option to keep track of sequence identifier renaming (option -seq_id_mappings). Fixed the trim_qual_rule parameter being listed twice in the log file. Fixed an issue with sequences of length 3 bp when calculating DUST scores. Fixed an issue with the exact_only parameter check. Rob |
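The 3 bp DUST issue is easy to see from the standard DUST triplet score. The Python sketch below assumes PRINSEQ's score follows the usual formula: the sum over triplets t of c_t(c_t - 1)/2, divided by (l - 1), where c_t is the count of triplet t and l is the number of overlapping triplets. A 3 bp sequence has exactly one triplet, so l - 1 = 0 and the naive formula divides by zero. Returning 0.0 in that case is an assumption of this sketch, not necessarily what PRINSEQ does, and PRINSEQ's own rescaling of the score is not reproduced.

```python
from collections import Counter


def dust_score(seq: str) -> float:
    """Unscaled DUST-style low-complexity score from triplet counts.

    score = sum_t c_t * (c_t - 1) / 2 / (l - 1), with l = len(seq) - 2.
    """
    seq = seq.upper()
    l = len(seq) - 2  # number of overlapping triplets
    if l <= 1:
        # A 3 bp sequence has a single triplet, so l - 1 == 0; shorter ones have none.
        return 0.0
    counts = Counter(seq[i:i + 3] for i in range(l))
    repeated_pairs = sum(c * (c - 1) / 2 for c in counts.values())
    return repeated_pairs / (l - 1)
```

Higher scores mean lower complexity: a homopolymer such as `AAAAA` scores 1.5, while `ACGTACGT` scores 0.4.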
From: Robert S. <rsc...@gm...> - 2012-05-28 21:32:37
|
*Release of lite version 0.19.2:* Increased memory efficiency for graph data calculation on big input files. Rob |
From: Robert S. <rsc...@gm...> - 2012-05-28 04:44:27
|
*Release of lite version 0.19.1:* Fixed rounding issue in sequence complexity calculation. Rob |