From: Walenz, B. <bw...@jc...> - 2013-05-30 18:28:12
|
[Outlook is, at least, better than IBM/Lotus Notes, which Celera switched over to near the end]

On 5/30/13 2:11 PM, "Ole Kristian Tørresen" <o.k...@ib...> wrote:

> 1.
> 2. Use the same set of repetitive kmers for each assembly — save your favorite 0-mercounts/*fasta and pass that to future assemblies using obtFrequentMers and ovlFrequentMers (both are probably needed, even if OBT is turned off).
>
> singletons: They’re singletons because they have no overlaps. However, they’re also repeats, so the number of overlaps will tend to explode if you try to hold the ‘repeat content’ constant across assemblies. It’s a difficult parameter to set optimally. Too low and you get no repeats; too high and you spend a lot of time computing overlaps and run out of disk.
>
> Luckily, I got access to a lot of CPUs and a lot of disk space, so I can afford to set it a bit on the high side. Would it affect memory usage for bogart much? The run I have now, with 52x coverage of an 830 Mbp genome, ovlMinLen 48 and merThreshold 5000, has bogart using about 750 GB of memory. I can go up to 1 TB, but I shouldn't spend more than a week on unitigging.
>
> Ole

Bogart will pick out just the best overlaps to use so that it will fit into a specific memory size. By default, it will use all available memory on the machine. You can limit it with batMemory.

To help your run time, bogart can pre-build this dataset in a small memory footprint. All it does here is read the overlapStore (several TB, probably) and write out a binary file (roughly 750 GB). You’ll need to run bogart by hand, using the ‘-create’ option. Once created, the real run of bogart will memory-map this 750 GB-ish file and start processing. The real run might need the ‘-save’ option in order to load the file. As usual, test this out on a trivial assembly. For salmon, building the binary file took longer than the bogart run.

b
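A rough sketch of the two-pass bogart workflow described above, for reference. The '-create' and '-save' flags are as named in the message; the spec value, the log location, and the idea of copying the logged command are assumptions about a typical CA 7 run rather than a verbatim recipe, so check your own unitigger logs for the exact command line.

    # In the runCA spec: cap bogart's memory (GB); it keeps only the best
    # overlaps that fit in this budget instead of grabbing the whole machine.
    batMemory=750

    # Manual two-pass run (flags per the message above; assumed workflow):
    #   1. copy the exact bogart command runCA logged, typically in
    #      4-unitigger/unitigger.err
    #   2. run it once with '-create' appended  - writes the big binary overlap file
    #   3. run it again with '-save' appended   - memory-maps that file and unitigs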
From: Ole K. T. <o.k...@ib...> - 2013-05-30 18:11:15
|
On 30 May 2013 19:49, Walenz, Brian <bw...@jc...> wrote: > [Outlook really makes a mess of inline replies] > They're forcing you to use Outlook?! Fate is sadly cruel. > > mer thresholds: You have two options, both try to identify the set of > repetitive kmers for the genome. > > > 1. Use ‘fraction distinct’ to set the mer threshold. It’s the third > column in the meryl histogram output. The idea is to exclude the most > common x% of kmers from seeding overlaps. For example, if you pick > fraction distinct of 0.9995, you’d be excluding the most repetitive 0.05% > kmers from seeding overlaps, regardless of coverage. There sadly isn’t > runCA support for this. > > In the first assembly, I actually exclude 0.02 % of the most repetitive kmers: 5000 45 0.9998 0.8948 In the second, I "only" exclude the most 0.03 % of the repetitive kmers: 5000 49 0.9997 0.8656 I guess that might be enough difference to see the change in overlaps I see. I think I'll try to set merThreshold to 10000 and start a second assembly and see what happens. The assembly with merTreshold at 5000 might turn out to be pretty good. > > 1. > 2. Use the same set of repetitive kmers for each assembly — save your > favorite 0-mercounts/*fasta and pass that to future assemblies using > obtFrequentMers and ovlFrequentMers (both are probably needed, even if OBT > is turned off). > > > singletons: They’re singletons because they have no overlaps. However, > they’re also repeats so the number of overlaps will tend to explode if you > hold try to hold the ‘repeat content’ constant across assemblies. It’s a > difficult parameter to set optimally. Too low and you get no repeats; too > high and you spend a lot of time computing overlaps and run out of disk. > Luckily, I got access a lot of CPUs and a lot of disk space, so I can afford to set it a bit on the high side. Would it affect memory usage for bogart much? The run I have now, with 52x coverage of a 830 Mbp genome, 48 ovlMinLen, 5000 merThreshold, bogart use about 750 GB of memory. I can go up to 1 TB, but I shouldn't use more than a week at unitigging. Ole > > > b > > > > On 5/30/13 7:28 AM, "Ole Kristian Tørresen" <o.k...@ib...> > wrote: > > Hi Brian. > > > On 30 May 2013 13:09, Walenz, Brian <bw...@jc...> wrote: > > Hi, Ole- > > Check the mer thresholds. Adding reads will change the threshold, and if > the second assembly has a much lower threshold, it is possible the > additional singletons are just from more aggressive repeat masking. Off > the top of my head, I can’t think of an easy way to verify that the new > singletons are repeats. > > > I've set the mer threshold to 5000 in both assemblies. Maybe I should > increase it when I increase the coverage of the genome from 32x to 52x, but > that's should get repeats that are about 100 copies? Maybe set it to 7000? > > > > Unmated singletons aren’t used during scaffolding. > > Mated singletons can be placed within an existing contig, for better > consensus sequences (probably not useful) or in a gap. It would be better > to get them added during unitig construction though. CGW will probably run > slower; I don’t remember how inefficient it is at handling lots of > singletons. > > > "get them added during unitig construction"? How would I do that? Change > the mer threshold? > > Almost all singletons are mated, so they might get added during > scaffolding then. > > Thank you. > > Ole > > > > analyzeBest is counting, per library, the number of spurs (and contained > and container reads). 
It does seem to predate the best.singletons file, so > the singleton count will be zero. > > b > > > > > On 5/29/13 4:36 PM, "Ole Kristian Tørresen" <o.k...@ib... < > http://o.k...@ib...> > wrote: > > Hi, > I've got an assembly combining 454 and Illumina data (5k MP), using > bogart. Since this is a larger genome (830 Mbp), and relatively high > coverage, I tried setting ovlMinLen to 60 to avoid getting a large > ovlStore. The assembly didn't turn out as well as I had hoped, and it seems > that a lot of my Illumina reads are singletons (being between 64 and 100 bp > makes it harder to find overlaps). I guess that means that they don't > overlap with other reads. Of about 40 M reads (20 M pairs), 5 M are > singletons. This means that a larger fraction of this library is classified > as bothChaff or oneChaff (looking at the *mates file in 9-terminator) than > the other paired libraries (which are all 454 reads). > > I had hoped to reduce the amount of singletons by setting the ovlMinLen to > 48. Optimally, I would have only changed this with the assembly, but I > added a lot of Illumina PE (20x) too. > > If I now look at the best.singletons, more reads in each library are > singletons! This is the oposite of what I would expect, but maybe the > Illumina PE library affects this too. > > It would have been nice to run the analyzeBest program on both of these > assemblies to see if anything changes, but it can't handle the > best.singletons file (was written before bog and bogart output it I guess). > I guess it doesn't matter much though, but I'm not sure I understand the > output properly. > > So, I guess I can't automatically expect less singletons when I reduce the > ovlMinLen. I'm I correct in assuming that the singletons don't contribute > to the further assembly? They can't be used for scaffolding because they > don't have any overlaps with other reads. > > A bit rambling e-mail, but I hope it's mostly understandable. > > Thank you, > Ole > > > > > > > > |
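For the threshold bookkeeping discussed here, a small sketch, assuming the meryl histogram lines look like the ones quoted above (column 1 the mer count, column 3 the fraction of distinct mers at or below it) and that a histogram file sits under 0-mercounts/; the filename is a placeholder.

    HIST=0-mercounts/asm.histogram   # placeholder; use the histogram meryl wrote for your run

    # Smallest threshold that keeps 99.97% of distinct kmers, i.e. excludes the
    # most repetitive 0.03% from seeding overlaps:
    awk '$3 >= 0.9997 { print "threshold:", $1; exit }' "$HIST"

    # Fraction distinct reached by a candidate threshold, e.g. 10000:
    awk '$1 == 10000 { print "fraction distinct at 10000:", $3; exit }' "$HIST"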
From: Walenz, B. <bw...@jc...> - 2013-05-30 17:50:50
|
[Outlook really makes a mess of inline replies] mer thresholds: You have two options, both try to identify the set of repetitive kmers for the genome. 1. Use ‘fraction distinct’ to set the mer threshold. It’s the third column in the meryl histogram output. The idea is to exclude the most common x% of kmers from seeding overlaps. For example, if you pick fraction distinct of 0.9995, you’d be excluding the most repetitive 0.05% kmers from seeding overlaps, regardless of coverage. There sadly isn’t runCA support for this. 2. Use the same set of repetitive kmers for each assembly — save your favorite 0-mercounts/*fasta and pass that to future assemblies using obtFrequentMers and ovlFrequentMers (both are probably needed, even if OBT is turned off). singletons: They’re singletons because they have no overlaps. However, they’re also repeats so the number of overlaps will tend to explode if you hold try to hold the ‘repeat content’ constant across assemblies. It’s a difficult parameter to set optimally. Too low and you get no repeats; too high and you spend a lot of time computing overlaps and run out of disk. b On 5/30/13 7:28 AM, "Ole Kristian Tørresen" <o.k...@ib...> wrote: Hi Brian. On 30 May 2013 13:09, Walenz, Brian <bw...@jc...> wrote: Hi, Ole- Check the mer thresholds. Adding reads will change the threshold, and if the second assembly has a much lower threshold, it is possible the additional singletons are just from more aggressive repeat masking. Off the top of my head, I can’t think of an easy way to verify that the new singletons are repeats. I've set the mer threshold to 5000 in both assemblies. Maybe I should increase it when I increase the coverage of the genome from 32x to 52x, but that's should get repeats that are about 100 copies? Maybe set it to 7000? Unmated singletons aren’t used during scaffolding. Mated singletons can be placed within an existing contig, for better consensus sequences (probably not useful) or in a gap. It would be better to get them added during unitig construction though. CGW will probably run slower; I don’t remember how inefficient it is at handling lots of singletons. "get them added during unitig construction"? How would I do that? Change the mer threshold? Almost all singletons are mated, so they might get added during scaffolding then. Thank you. Ole analyzeBest is counting, per library, the number of spurs (and contained and container reads). It does seem to predate the best.singletons file, so the singleton count will be zero. b On 5/29/13 4:36 PM, "Ole Kristian Tørresen" <o.k...@ib... <http://o.k...@ib...> > wrote: Hi, I've got an assembly combining 454 and Illumina data (5k MP), using bogart. Since this is a larger genome (830 Mbp), and relatively high coverage, I tried setting ovlMinLen to 60 to avoid getting a large ovlStore. The assembly didn't turn out as well as I had hoped, and it seems that a lot of my Illumina reads are singletons (being between 64 and 100 bp makes it harder to find overlaps). I guess that means that they don't overlap with other reads. Of about 40 M reads (20 M pairs), 5 M are singletons. This means that a larger fraction of this library is classified as bothChaff or oneChaff (looking at the *mates file in 9-terminator) than the other paired libraries (which are all 454 reads). I had hoped to reduce the amount of singletons by setting the ovlMinLen to 48. Optimally, I would have only changed this with the assembly, but I added a lot of Illumina PE (20x) too. 
If I now look at the best.singletons, more reads in each library are singletons! This is the oposite of what I would expect, but maybe the Illumina PE library affects this too. It would have been nice to run the analyzeBest program on both of these assemblies to see if anything changes, but it can't handle the best.singletons file (was written before bog and bogart output it I guess). I guess it doesn't matter much though, but I'm not sure I understand the output properly. So, I guess I can't automatically expect less singletons when I reduce the ovlMinLen. I'm I correct in assuming that the singletons don't contribute to the further assembly? They can't be used for scaffolding because they don't have any overlaps with other reads. A bit rambling e-mail, but I hope it's mostly understandable. Thank you, Ole |
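To make option 2 above concrete, a minimal spec-file sketch for pinning the repeat set across assemblies, assuming the frequent-mer fasta files from the first run were saved; the filenames are placeholders for whichever 0-mercounts/*.fasta files that run produced.

    # New assembly's spec file: reuse the previous run's frequent-mer sets so
    # the masked repeat content stays constant between assemblies.
    obtFrequentMers=/path/to/old_run/0-mercounts/asm.nmers.obt.fasta
    ovlFrequentMers=/path/to/old_run/0-mercounts/asm.nmers.ovl.fasta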
From: Walenz, B. <bw...@jc...> - 2013-05-30 11:40:23
|
Hi, Ben- Somehow, various flags on reads are inconsistent. Unless you’ve got a LOT of time invested in this cgw run, I’d recommend deleting the 7* directories and starting scaffolding again. Is this near the start of the cgw process? Can you get a stack trace? If not, what does the *timing file contain? Were there any oddities in the scaffold run so far? It’s failing because it wants to compute the insert size of a specific mate pair, but the two reads are in different contigs. We can get around the current failure by explicitly ignoring this pair. Right before the assert that fails, add: if (frag->contigID != mate->contigID) continue; I.e., Doctor, it hurts when I do this. Then don’t do that! b On 5/28/13 10:06 AM, "Ben Elsworth" <el...@gm...> wrote: Hi Brian, That didn't seem to help. I tried editing an earlier section of code in a similar way by adding this: if (mateContig == NULL){ continue; } below this: mateContig = GetGraphNode( ScaffoldGraph->ContigGraph, mate->contigID); But that led to another error: cgw: GraphCGW_T.C:3232: void ComputeMatePairStatisticsRestricted(int, int32, char*): Assertion `frag->contigID == mate->contigID' failed. Any idea what's going on? Cheers, Ben On 28 May 2013 11:00, Walenz, Brian <bw...@jc...> wrote: Hi Ben- I think this is harmless, and can be patched around. > To patch it up, and maybe avoid the crash here, add the following at line > 3206 in AS_CGW/GraphCGW_T.c > > if (extremeContig == NULL) > continue; > > Line 3206 is just after "extremeContig = GetGraphNode(...)" and before the > call to GetContigPositionInScaffold(). Line numbers above are relative to the latest code base. In 6.1, unless masurca fiddled with GraphCGW_T.c, you want line 3296, in between these two lines: extremeContig = GetGraphNode( ScaffoldGraph->ContigGraph, scaff->info.Scaffold.BEndCI); GetContigPositionInScaffold ( extremeContig, &contigLeftEnd, &contigRightEnd, &contigScaffoldOrientation); The one other assembly that failed here (it was recent, too) finished successfully after the patch. b On 5/27/13 7:43 AM, "Ben Elsworth" <el...@gm... <http://el...@gm...> > wrote: Hi, I'm running v6.1 within MaSuRCA and keep getting this error during the cgw step: cgw: GraphCGW_T.C:3275: void ComputeMatePairStatisticsRestricted(int, int32, char*): Assertion `(mateContig) != __null' failed. It occurs after a lot of warnings about negative variance. I've tried following the advice here - http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Scaffolder_failure but keep getting the error. Any ideas? Cheers, Ben |
From: Ole K. T. <o.k...@ib...> - 2013-05-30 11:28:17
|
Hi Brian. On 30 May 2013 13:09, Walenz, Brian <bw...@jc...> wrote: > Hi, Ole- > > Check the mer thresholds. Adding reads will change the threshold, and if > the second assembly has a much lower threshold, it is possible the > additional singletons are just from more aggressive repeat masking. Off > the top of my head, I can’t think of an easy way to verify that the new > singletons are repeats. > I've set the mer threshold to 5000 in both assemblies. Maybe I should increase it when I increase the coverage of the genome from 32x to 52x, but that's should get repeats that are about 100 copies? Maybe set it to 7000? > > Unmated singletons aren’t used during scaffolding. > > Mated singletons can be placed within an existing contig, for better > consensus sequences (probably not useful) or in a gap. It would be better > to get them added during unitig construction though. CGW will probably run > slower; I don’t remember how inefficient it is at handling lots of > singletons. > "get them added during unitig construction"? How would I do that? Change the mer threshold? Almost all singletons are mated, so they might get added during scaffolding then. Thank you. Ole > > analyzeBest is counting, per library, the number of spurs (and contained > and container reads). It does seem to predate the best.singletons file, so > the singleton count will be zero. > > b > > > > > On 5/29/13 4:36 PM, "Ole Kristian Tørresen" <o.k...@ib...> > wrote: > > Hi, > I've got an assembly combining 454 and Illumina data (5k MP), using > bogart. Since this is a larger genome (830 Mbp), and relatively high > coverage, I tried setting ovlMinLen to 60 to avoid getting a large > ovlStore. The assembly didn't turn out as well as I had hoped, and it seems > that a lot of my Illumina reads are singletons (being between 64 and 100 bp > makes it harder to find overlaps). I guess that means that they don't > overlap with other reads. Of about 40 M reads (20 M pairs), 5 M are > singletons. This means that a larger fraction of this library is classified > as bothChaff or oneChaff (looking at the *mates file in 9-terminator) than > the other paired libraries (which are all 454 reads). > > I had hoped to reduce the amount of singletons by setting the ovlMinLen to > 48. Optimally, I would have only changed this with the assembly, but I > added a lot of Illumina PE (20x) too. > > If I now look at the best.singletons, more reads in each library are > singletons! This is the oposite of what I would expect, but maybe the > Illumina PE library affects this too. > > It would have been nice to run the analyzeBest program on both of these > assemblies to see if anything changes, but it can't handle the > best.singletons file (was written before bog and bogart output it I guess). > I guess it doesn't matter much though, but I'm not sure I understand the > output properly. > > So, I guess I can't automatically expect less singletons when I reduce the > ovlMinLen. I'm I correct in assuming that the singletons don't contribute > to the further assembly? They can't be used for scaffolding because they > don't have any overlaps with other reads. > > A bit rambling e-mail, but I hope it's mostly understandable. > > Thank you, > Ole > > > > > > |
From: Walenz, B. <bw...@jc...> - 2013-05-30 11:10:11
|
Hi, Ole- Check the mer thresholds. Adding reads will change the threshold, and if the second assembly has a much lower threshold, it is possible the additional singletons are just from more aggressive repeat masking. Off the top of my head, I can’t think of an easy way to verify that the new singletons are repeats. Unmated singletons aren’t used during scaffolding. Mated singletons can be placed within an existing contig, for better consensus sequences (probably not useful) or in a gap. It would be better to get them added during unitig construction though. CGW will probably run slower; I don’t remember how inefficient it is at handling lots of singletons. analyzeBest is counting, per library, the number of spurs (and contained and container reads). It does seem to predate the best.singletons file, so the singleton count will be zero. b On 5/29/13 4:36 PM, "Ole Kristian Tørresen" <o.k...@ib...> wrote: Hi, I've got an assembly combining 454 and Illumina data (5k MP), using bogart. Since this is a larger genome (830 Mbp), and relatively high coverage, I tried setting ovlMinLen to 60 to avoid getting a large ovlStore. The assembly didn't turn out as well as I had hoped, and it seems that a lot of my Illumina reads are singletons (being between 64 and 100 bp makes it harder to find overlaps). I guess that means that they don't overlap with other reads. Of about 40 M reads (20 M pairs), 5 M are singletons. This means that a larger fraction of this library is classified as bothChaff or oneChaff (looking at the *mates file in 9-terminator) than the other paired libraries (which are all 454 reads). I had hoped to reduce the amount of singletons by setting the ovlMinLen to 48. Optimally, I would have only changed this with the assembly, but I added a lot of Illumina PE (20x) too. If I now look at the best.singletons, more reads in each library are singletons! This is the oposite of what I would expect, but maybe the Illumina PE library affects this too. It would have been nice to run the analyzeBest program on both of these assemblies to see if anything changes, but it can't handle the best.singletons file (was written before bog and bogart output it I guess). I guess it doesn't matter much though, but I'm not sure I understand the output properly. So, I guess I can't automatically expect less singletons when I reduce the ovlMinLen. I'm I correct in assuming that the singletons don't contribute to the further assembly? They can't be used for scaffolding because they don't have any overlaps with other reads. A bit rambling e-mail, but I hope it's mostly understandable. Thank you, Ole |
From: Ole K. T. <o.k...@ib...> - 2013-05-29 20:36:22
|
Hi,

I've got an assembly combining 454 and Illumina data (5k MP), using bogart. Since this is a larger genome (830 Mbp), and relatively high coverage, I tried setting ovlMinLen to 60 to avoid getting a large ovlStore. The assembly didn't turn out as well as I had hoped, and it seems that a lot of my Illumina reads are singletons (being between 64 and 100 bp makes it harder to find overlaps). I guess that means they don't overlap with other reads. Of about 40 M reads (20 M pairs), 5 M are singletons. This means that a larger fraction of this library is classified as bothChaff or oneChaff (looking at the *mates file in 9-terminator) than the other paired libraries (which are all 454 reads).

I had hoped to reduce the number of singletons by setting ovlMinLen to 48. Ideally, I would have changed only this between the assemblies, but I added a lot of Illumina PE (20x) too.

If I now look at best.singletons, more reads in each library are singletons! This is the opposite of what I would expect, but maybe the Illumina PE library affects this too.

It would have been nice to run the analyzeBest program on both of these assemblies to see if anything changes, but it can't handle the best.singletons file (it was written before bog and bogart output it, I guess). I guess it doesn't matter much, though, but I'm not sure I understand the output properly.

So, I guess I can't automatically expect fewer singletons when I reduce ovlMinLen. Am I correct in assuming that the singletons don't contribute to the further assembly? They can't be used for scaffolding because they don't have any overlaps with other reads.

A bit of a rambling e-mail, but I hope it's mostly understandable.

Thank you,
Ole
From: Ben E. <el...@gm...> - 2013-05-28 14:07:24
|
Hi Brian, That didn't seem to help. I tried editing an earlier section of code in a similar way by adding this: if (mateContig == NULL){ continue; } below this: mateContig = GetGraphNode( ScaffoldGraph->ContigGraph, mate->contigID); But that led to another error: cgw: GraphCGW_T.C:3232: void ComputeMatePairStatisticsRestricted(int, int32, char*): Assertion `frag->contigID == mate->contigID' failed. Any idea what's going on? Cheers, Ben On 28 May 2013 11:00, Walenz, Brian <bw...@jc...> wrote: > Hi Ben- > > I think this is harmless, and can be patched around. > > > To patch it up, and maybe avoid the crash here, add the following at line > > 3206 in AS_CGW/GraphCGW_T.c > > > > if (extremeContig == NULL) > > continue; > > > > Line 3206 is just after "extremeContig = GetGraphNode(...)" and before > the > > call to GetContigPositionInScaffold(). > > Line numbers above are relative to the latest code base. In 6.1, unless > masurca fiddled with GraphCGW_T.c, you want line 3296, in between these two > lines: > > extremeContig = GetGraphNode( ScaffoldGraph->ContigGraph, > scaff->info.Scaffold.BEndCI); > GetContigPositionInScaffold ( extremeContig, &contigLeftEnd, > &contigRightEnd, &contigScaffoldOrientation); > > The one other assembly that failed here (it was recent, too) finished > successfully after the patch. > > b > > > > On 5/27/13 7:43 AM, "Ben Elsworth" <el...@gm...> wrote: > > Hi, > > I'm running v6.1 within MaSuRCA and keep getting this error during the cgw > step: > > cgw: GraphCGW_T.C:3275: void ComputeMatePairStatisticsRestricted(int, > int32, char*): Assertion `(mateContig) != __null' failed. > > It occurs after a lot of warnings about negative variance. I've tried > following the advice here - > http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Scaffolder_failurebut keep getting the error. > > Any ideas? > > Cheers, > > Ben > > |
From: Francois S. <fra...@ir...> - 2013-05-28 11:11:34
|
Ok, it seems to me that it could be something like that... However, the host is generally the same, as it has the largest amount of memory. I'll try to bypass it using the pathMap file and the .bashrc Francois On 28/05/2013 11:46, Walenz, Brian wrote: > Hi- > > This doesn't look like a runCA problem, at least, a problem that could be > fixed in runCA. > > The first four lines are complaining about errors in /etc/profile ('id' == > '/usr/bin/id'). After that, I'm guessing that the environment is not set up > correctly, and now 'uname' (== '/bin/uname') can't be found. This results > in runCA being unable to figure out what kind of host it is on. > > Not having /bin or /usr/bin in your path seems quite odd! > > Can you add some debug reports to the shell scripts? 'echo $PATH' is the > only interesting debug I can think of. > > I can think of a couple hacks to get around the immediate problem, but if > the path is bad, then runCA will fail on 'mkdir', 'find', etc. It did find > 'perl' though. > > One last thought - there isn't anything special about the runCA*06.sh > script. All of the runCA.*.sh scripts are the same. If it ran fine before, > there might be one host with a bad OS configuration (/etc/profile) that > caused this job to fail. > > b > > > On 5/24/13 7:36 AM, "Francois Sabot" <fra...@ir...> wrote: > >> Hi everyone >> >> We are using runCA 7.0 from SVN, and we are confronted to a weird >> problem in the design of the script runCA.sge.out.06.sh, at eight step >> >> Here is the log file runCA.sge.out.06: >>> >>> /etc/profile: line 31: id : commande introuvable # unknown command in French >>> /etc/profile: line 61: id : commande introuvable >>> /etc/profile: line 61: id : commande introuvable >>> /etc/profile: line 79: /etc/init.d/msm_profile: Aucun fichier ou dossier de >>> ce type # No file or directory >>> /var/spool/gridengine/default/node1/job_scripts/7228: line 17: uname : >>> commande introuvable >>> /var/spool/gridengine/default/node1/job_scripts/7228: line 18: uname : >>> commande introuvable >>> /var/spool/gridengine/default/node1/job_scripts/7228: line 19: uname : >>> commande introuvable >>> Can't open perl script "/runCA": Aucun fichier ou dossier de ce type >> >> However, all the nodes on our grip answer correctly to uname or id >> command... And all other scripts are ok. >> >> We add the pathMap option, to give in a 'hard' way the path, but it did >> not succeed even... >> >> Here is the runCA.sge.out.06.sh script >> >> >>> #!/bin/sh >>> # >>> if [ "x$SGE_ROOT" != "x" ]; then >>> # Attempt to (re)configure SGE. For reasons Bri doesn't know, >>> # jobs submitted to SGE, and running under SGE, fail to read his >>> # .tcshrc (or .bashrc, limited testing), and so they don't setup >>> # SGE (or ANY other paths, etc) properly. For the record, >>> # interactive SGE logins (qlogin, etc) DO set the environment. >>> >>> . $SGE_ROOT/$SGE_CELL/common/settings.sh >>> fi >>> >>> # On the off chance that there is a pathMap, and the host we >>> # eventually get scheduled on doesn't see other hosts, we decide >>> # at run time where the binary is. 
>>> >>> syst=`uname -s` >>> arch=`uname -m` >>> name=`uname -n` >>> >>> if [ "$arch" = "x86_64" ] ; then >>> arch="amd64" >>> fi >>> if [ "$arch" = "Power Macintosh" ] ; then >>> arch="ppc" >>> fi >>> >>> bin="/home/sabotf/sources/wgs/$syst-$arch/bin" >>> >>> if [ "$name" = "master0.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "node0.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "node1.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "node2.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "node3.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "node4.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "node5.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "node6.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "node7.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "node8.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "node9.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "node10.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "node11.alineos.net" ] ; then >>> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >>> fi >>> if [ "$name" = "" ] ; then >>> bin="" >>> fi >>> >>> /usr/bin/env perl $bin/runCA -d "/data/projects/assembling-glab/runCA_24mai" >>> -p "CG14_tog5681_test" "/data/projects/assembling-glab/Illumina_17mai.frg" -s >>> "/home/sabotf/CA_spec/asm_PACBIODEV.spec" >> >> >> Does someone have an idea ? >> >> Francois > > > -- -------------------------------------------------------- Francois Sabot, PhD Be realistic. Demand the Impossible. http://bioinfo.mpl.ird.fr/ http://www.mpl.ird.fr/rice ----------------------------------------- UMR DIversity, Adaptation & DEvelopment Centre IRD 911, Av Agropolis BP 64501 34394 Montpellier Cedex 5 France Phone: +33 4 67 41 64 18 ----------------------------------------- |
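A sketch of the pathMap workaround mentioned here, assuming the map is a plain text file of "hostname  bin-directory" pairs referenced from the spec; the format is inferred from the per-host bin= blocks in the generated script quoted above, so verify it against your runCA documentation before relying on it.

    # pathmap.txt (assumed format: one grid host per line, then the wgs bin directory)
    #   master0.alineos.net   /home/sabotf/sources/wgs/Linux-amd64/bin
    #   node0.alineos.net     /home/sabotf/sources/wgs/Linux-amd64/bin

    # In the spec file:
    pathMap=/home/sabotf/pathmap.txt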
From: Walenz, B. <bw...@jc...> - 2013-05-28 10:00:27
|
Hi Ben-

I think this is harmless, and can be patched around.

> To patch it up, and maybe avoid the crash here, add the following at line
> 3206 in AS_CGW/GraphCGW_T.c
>
>   if (extremeContig == NULL)
>     continue;
>
> Line 3206 is just after "extremeContig = GetGraphNode(...)" and before the
> call to GetContigPositionInScaffold().

Line numbers above are relative to the latest code base. In 6.1, unless masurca fiddled with GraphCGW_T.c, you want line 3296, in between these two lines:

  extremeContig = GetGraphNode( ScaffoldGraph->ContigGraph, scaff->info.Scaffold.BEndCI);
  GetContigPositionInScaffold ( extremeContig, &contigLeftEnd, &contigRightEnd, &contigScaffoldOrientation);

The one other assembly that failed here (it was recent, too) finished successfully after the patch.

b

On 5/27/13 7:43 AM, "Ben Elsworth" <el...@gm...> wrote:

Hi,

I'm running v6.1 within MaSuRCA and keep getting this error during the cgw step:

  cgw: GraphCGW_T.C:3275: void ComputeMatePairStatisticsRestricted(int, int32, char*): Assertion `(mateContig) != __null' failed.

It occurs after a lot of warnings about negative variance. I've tried following the advice here - http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Scaffolder_failure but keep getting the error.

Any ideas?

Cheers,

Ben
From: Walenz, B. <bw...@jc...> - 2013-05-28 09:47:02
|
Hi- This doesn't look like a runCA problem, at least, a problem that could be fixed in runCA. The first four lines are complaining about errors in /etc/profile ('id' == '/usr/bin/id'). After that, I'm guessing that the environment is not set up correctly, and now 'uname' (== '/bin/uname') can't be found. This results in runCA being unable to figure out what kind of host it is on. Not having /bin or /usr/bin in your path seems quite odd! Can you add some debug reports to the shell scripts? 'echo $PATH' is the only interesting debug I can think of. I can think of a couple hacks to get around the immediate problem, but if the path is bad, then runCA will fail on 'mkdir', 'find', etc. It did find 'perl' though. One last thought - there isn't anything special about the runCA*06.sh script. All of the runCA.*.sh scripts are the same. If it ran fine before, there might be one host with a bad OS configuration (/etc/profile) that caused this job to fail. b On 5/24/13 7:36 AM, "Francois Sabot" <fra...@ir...> wrote: > Hi everyone > > We are using runCA 7.0 from SVN, and we are confronted to a weird > problem in the design of the script runCA.sge.out.06.sh, at eight step > > Here is the log file runCA.sge.out.06: >> >> /etc/profile: line 31: id : commande introuvable # unknown command in French >> /etc/profile: line 61: id : commande introuvable >> /etc/profile: line 61: id : commande introuvable >> /etc/profile: line 79: /etc/init.d/msm_profile: Aucun fichier ou dossier de >> ce type # No file or directory >> /var/spool/gridengine/default/node1/job_scripts/7228: line 17: uname : >> commande introuvable >> /var/spool/gridengine/default/node1/job_scripts/7228: line 18: uname : >> commande introuvable >> /var/spool/gridengine/default/node1/job_scripts/7228: line 19: uname : >> commande introuvable >> Can't open perl script "/runCA": Aucun fichier ou dossier de ce type > > However, all the nodes on our grip answer correctly to uname or id > command... And all other scripts are ok. > > We add the pathMap option, to give in a 'hard' way the path, but it did > not succeed even... > > Here is the runCA.sge.out.06.sh script > > >> #!/bin/sh >> # >> if [ "x$SGE_ROOT" != "x" ]; then >> # Attempt to (re)configure SGE. For reasons Bri doesn't know, >> # jobs submitted to SGE, and running under SGE, fail to read his >> # .tcshrc (or .bashrc, limited testing), and so they don't setup >> # SGE (or ANY other paths, etc) properly. For the record, >> # interactive SGE logins (qlogin, etc) DO set the environment. >> >> . $SGE_ROOT/$SGE_CELL/common/settings.sh >> fi >> >> # On the off chance that there is a pathMap, and the host we >> # eventually get scheduled on doesn't see other hosts, we decide >> # at run time where the binary is. 
>> >> syst=`uname -s` >> arch=`uname -m` >> name=`uname -n` >> >> if [ "$arch" = "x86_64" ] ; then >> arch="amd64" >> fi >> if [ "$arch" = "Power Macintosh" ] ; then >> arch="ppc" >> fi >> >> bin="/home/sabotf/sources/wgs/$syst-$arch/bin" >> >> if [ "$name" = "master0.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "node0.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "node1.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "node2.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "node3.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "node4.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "node5.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "node6.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "node7.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "node8.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "node9.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "node10.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "node11.alineos.net" ] ; then >> bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" >> fi >> if [ "$name" = "" ] ; then >> bin="" >> fi >> >> /usr/bin/env perl $bin/runCA -d "/data/projects/assembling-glab/runCA_24mai" >> -p "CG14_tog5681_test" "/data/projects/assembling-glab/Illumina_17mai.frg" -s >> "/home/sabotf/CA_spec/asm_PACBIODEV.spec" > > > Does someone have an idea ? > > Francois |
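A small sketch of the debugging Brian suggests, plus a blunt workaround: drop these lines near the top of the failing runCA.*.sh job script (after the SGE settings block) so the job records its environment on whichever node it lands on. The log path is arbitrary.

    # Record the PATH this grid job actually sees, and which basic tools resolve.
    echo "PATH on $HOSTNAME: $PATH" >> /tmp/runCA-env-debug.log
    for t in uname id perl mkdir; do
      command -v "$t" || echo "missing: $t"
    done >> /tmp/runCA-env-debug.log 2>&1

    # Blunt workaround while the node's /etc/profile is broken:
    export PATH=/usr/bin:/bin:$PATH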
From: Ben E. <el...@gm...> - 2013-05-27 11:44:12
|
Hi,

I'm running v6.1 within MaSuRCA and keep getting this error during the cgw step:

  cgw: GraphCGW_T.C:3275: void ComputeMatePairStatisticsRestricted(int, int32, char*): Assertion `(mateContig) != __null' failed.

It occurs after a lot of warnings about negative variance. I've tried following the advice here - http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Scaffolder_failure - but keep getting the error.

Any ideas?

Cheers,

Ben
From: Francois S. <fra...@ir...> - 2013-05-24 11:36:29
|
Hi everyone We are using runCA 7.0 from SVN, and we are confronted to a weird problem in the design of the script runCA.sge.out.06.sh, at eight step Here is the log file runCA.sge.out.06: > > /etc/profile: line 31: id : commande introuvable # unknown command in French > /etc/profile: line 61: id : commande introuvable > /etc/profile: line 61: id : commande introuvable > /etc/profile: line 79: /etc/init.d/msm_profile: Aucun fichier ou dossier de ce type # No file or directory > /var/spool/gridengine/default/node1/job_scripts/7228: line 17: uname : commande introuvable > /var/spool/gridengine/default/node1/job_scripts/7228: line 18: uname : commande introuvable > /var/spool/gridengine/default/node1/job_scripts/7228: line 19: uname : commande introuvable > Can't open perl script "/runCA": Aucun fichier ou dossier de ce type However, all the nodes on our grip answer correctly to uname or id command... And all other scripts are ok. We add the pathMap option, to give in a 'hard' way the path, but it did not succeed even... Here is the runCA.sge.out.06.sh script > #!/bin/sh > # > if [ "x$SGE_ROOT" != "x" ]; then > # Attempt to (re)configure SGE. For reasons Bri doesn't know, > # jobs submitted to SGE, and running under SGE, fail to read his > # .tcshrc (or .bashrc, limited testing), and so they don't setup > # SGE (or ANY other paths, etc) properly. For the record, > # interactive SGE logins (qlogin, etc) DO set the environment. > > . $SGE_ROOT/$SGE_CELL/common/settings.sh > fi > > # On the off chance that there is a pathMap, and the host we > # eventually get scheduled on doesn't see other hosts, we decide > # at run time where the binary is. > > syst=`uname -s` > arch=`uname -m` > name=`uname -n` > > if [ "$arch" = "x86_64" ] ; then > arch="amd64" > fi > if [ "$arch" = "Power Macintosh" ] ; then > arch="ppc" > fi > > bin="/home/sabotf/sources/wgs/$syst-$arch/bin" > > if [ "$name" = "master0.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "node0.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "node1.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "node2.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "node3.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "node4.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "node5.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "node6.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "node7.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "node8.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "node9.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "node10.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "node11.alineos.net" ] ; then > bin="/home/sabotf/sources/wgs/Linux-amd64/bin/" > fi > if [ "$name" = "" ] ; then > bin="" > fi > > /usr/bin/env perl $bin/runCA -d "/data/projects/assembling-glab/runCA_24mai" -p "CG14_tog5681_test" "/data/projects/assembling-glab/Illumina_17mai.frg" -s "/home/sabotf/CA_spec/asm_PACBIODEV.spec" Does someone have an idea ? 
Francois -- -------------------------------------------------------- Francois Sabot, PhD Be realistic. Demand the Impossible. http://bioinfo.mpl.ird.fr/ http://www.mpl.ird.fr/rice ----------------------------------------- UMR DIversity, Adaptation & DEvelopment Centre IRD 911, Av Agropolis BP 64501 34394 Montpellier Cedex 5 France Phone: +33 4 67 41 64 18 ----------------------------------------- |
From: Ben E. <el...@gm...> - 2013-05-23 14:35:29
|
Hi Brian, Thanks for the reply. I followed your advice, although I added the final unitig to version 2 not 5 as that didn't seem to work, and that has got things working. I'm not entirely sure what the difference is between the partitioned and non-partitioned formats though, and why is it working with v2 and not v5? Regards, Ben On 22 May 2013 18:51, Walenz, Brian <bw...@jc...> wrote: > Hi, Ben- > > I’ve seen something similar to this before. As I recall, the store is > getting confused between the partitioned (-up 3) and non-partitioned > formats. Try adding the final unitig to unpartiitoned version 5: > > tigStore –g *gkpStore –t *tigStore 2 –up 3 –d layout –u 8049 > > unitig8049.withcns > tigStore –g *gkpStore –t *tigStore 5 –R unitig8049.withcns > > The first retrieves the output of utgcns (check that it has consensus > sequence at the top, and that there is exactly one UTG line at the end) and > the second adds this to version 5 (the input to cgw). > > I think that what happened is that you added unitig8049 without consensus > to the unpartitioned store, (the middle command, without ‘–up3’) and that > is masking the untiig that is stored in partition 3. You can hopefully test > this hypothesis by removing ‘-up 3’ from the tigStore retrieval above – it > should be reporting a unitig without consensus. > > b > > [Ben, sorry for the duplicate, forgot to send to the list] > > > > On 5/22/13 11:28 AM, "Ben Elsworth" <el...@gm...> wrote: > > Hi, > > I am having an issue with the consensus step as part of MaSuRCA. One > unitig fails and I have tried a number of methods to correct it, including > rearranging, splitting and even removing it. All of these methods fix the > alignment issue and produces a successful outcome when I test it: > > utgcns -g genome.gkpStore -t genome.tigStore 1 3 -T unitig8049 > > Based on this I insert the updated tig and compute the new consensus > sequence: > > tigStore -g genome.gkpStore -t genome.tigStore/ 1 -up 3 -R unitig8049 > > > utgcns -g genome.gkpStore -t genome.tigStore 1 3 -u 8049 > > > However, when I rerun runCA I get the following error from the cgw step: > > > Reading unitigs. > > > ...processed 100000 unitigs. > ERROR: Unitig 8049 has no placement; probably not run through consensus. > > > cgw: Input_CGW.C:117: int ProcessInput(int, int, char**): Assertion `1 == > GetNumIntUnitigPoss(uma->u_list)' failed. > > > > > > How can I make sure the new unitig arrangement is found? > > Cheers, > > Ben > > > > > > > > |
From: Walenz, B. <bw...@jc...> - 2013-05-22 17:51:17
|
Hi, Ben-

I’ve seen something similar to this before. As I recall, the store is getting confused between the partitioned (-up 3) and non-partitioned formats. Try adding the final unitig to unpartitioned version 5:

  tigStore -g *gkpStore -t *tigStore 2 -up 3 -d layout -u 8049 > unitig8049.withcns
  tigStore -g *gkpStore -t *tigStore 5 -R unitig8049.withcns

The first retrieves the output of utgcns (check that it has consensus sequence at the top, and that there is exactly one UTG line at the end) and the second adds this to version 5 (the input to cgw).

I think that what happened is that you added unitig8049 without consensus to the unpartitioned store (the middle command, without '-up 3'), and that is masking the unitig that is stored in partition 3. You can hopefully test this hypothesis by removing '-up 3' from the tigStore retrieval above - it should be reporting a unitig without consensus.

b

[Ben, sorry for the duplicate, forgot to send to the list]

On 5/22/13 11:28 AM, "Ben Elsworth" <el...@gm...> wrote:

Hi,

I am having an issue with the consensus step as part of MaSuRCA. One unitig fails and I have tried a number of methods to correct it, including rearranging, splitting and even removing it. All of these methods fix the alignment issue and produce a successful outcome when I test it:

  utgcns -g genome.gkpStore -t genome.tigStore 1 3 -T unitig8049

Based on this I insert the updated tig and compute the new consensus sequence:

  tigStore -g genome.gkpStore -t genome.tigStore/ 1 -up 3 -R unitig8049
  utgcns -g genome.gkpStore -t genome.tigStore 1 3 -u 8049

However, when I rerun runCA I get the following error from the cgw step:

  Reading unitigs.
  ...processed 100000 unitigs.
  ERROR: Unitig 8049 has no placement; probably not run through consensus.
  cgw: Input_CGW.C:117: int ProcessInput(int, int, char**): Assertion `1 == GetNumIntUnitigPoss(uma->u_list)' failed.

How can I make sure the new unitig arrangement is found?

Cheers,

Ben
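Putting Brian's check and fix together, using the store names and unitig ID from Ben's commands; the version numbers (2 for the post-consensus layouts, 5 as cgw's input) follow the message above.

    # Partitioned view: should show unitig 8049 WITH consensus sequence.
    tigStore -g genome.gkpStore -t genome.tigStore 2 -up 3 -d layout -u 8049 | head

    # Unpartitioned view of the same version: if this copy has no consensus,
    # it is masking the partitioned one, and cgw will trip over it.
    tigStore -g genome.gkpStore -t genome.tigStore 2 -d layout -u 8049 | head

    # Fix: load the consensus-bearing layout into unpartitioned version 5.
    tigStore -g genome.gkpStore -t genome.tigStore 2 -up 3 -d layout -u 8049 > unitig8049.withcns
    tigStore -g genome.gkpStore -t genome.tigStore 5 -R unitig8049.withcns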
From: Ben E. <el...@gm...> - 2013-05-22 15:29:13
|
Hi,

I am having an issue with the consensus step as part of MaSuRCA. One unitig fails and I have tried a number of methods to correct it, including rearranging, splitting and even removing it. All of these methods fix the alignment issue and produce a successful outcome when I test it:

  utgcns -g genome.gkpStore -t genome.tigStore 1 3 -T unitig8049

Based on this I insert the updated tig and compute the new consensus sequence:

  tigStore -g genome.gkpStore -t genome.tigStore/ 1 -up 3 -R unitig8049
  utgcns -g genome.gkpStore -t genome.tigStore 1 3 -u 8049

However, when I rerun runCA I get the following error from the cgw step:

  Reading unitigs.
  ...processed 100000 unitigs.
  ERROR: Unitig 8049 has no placement; probably not run through consensus.
  cgw: Input_CGW.C:117: int ProcessInput(int, int, char**): Assertion `1 == GetNumIntUnitigPoss(uma->u_list)' failed.

How can I make sure the new unitig arrangement is found?

Cheers,

Ben
From: Francois S. <fra...@ir...> - 2013-05-21 12:12:50
|
Oups... I forgot that ^^ I have relaunched using bogart, I'll keep in touch if any problem occurs again Thanks Francois On 21/05/2013 14:02, Ole Kristian Tørresen wrote: > Hi, > I think you just have to set the unitigger, because it is undefined by > default (not sure when that happened). Try to set > unitigger=bogart > in your spec file. > > If that does not work, could you show the content of your spec file? > > Ole > > > On 21 May 2013 13:07, Francois Sabot <fra...@ir... > <mailto:fra...@ir...>> wrote: > > Dear all, > > I am using runCA for assembling pacBio data on my cluster. > > The PacBioToCA script correction performed very well on our grid, > correcting 4 SMRTcells of C2 kit using a 25x coverage in Illumina (all > tests data) > > Then I launch runCA as followed, > /home/sabotf/sources/wgs/Linux-amd64/bin/runCA -d > /data/projects/assembling-glab/runCA_21mai -s > /home/sabotf/CA_spec/pacbio_SGE_PACBIODEV.spec -p CG14_tog5681_test > /data/projects/assembling-glab/Illumina_17mai.frg > > Every things seems fine, the data are accepted, overlapper run, store > bla bla bla... > > But it failed in weird manner at the end: > > > /home/sabotf/sources/wgs/Linux-amd64/bin/deduplicate \ > > -gkp > /data/projects/assembling-glab/runCA_21mai/CG14_tog5681_test.gkpStore \ > > -ovs > /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.obtStore > \ > > -ovs > /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.dupStore > \ > > -report > /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.deduplicate.log > \ > > -summary > /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.deduplicate.summary > \ > >> > /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.deduplicate.err > 2>&1 > > ----------------------------------------END Tue May 21 10:55:10 > 2013 (2 seconds) > > > ================================================================================ > > > > runCA failed. > > > > ---------------------------------------- > > Stack trace: > > > > at /home/sabotf/sources/wgs/Linux-amd64/bin/runCA line 1390. > > main::caFailure('unknown unitigger \'\' during finalTrim', > undef) called at /home/sabotf/sources/wgs/Linux-amd64/bin/runCA line > 4010 > > main::overlapTrim() called at > /home/sabotf/sources/wgs/Linux-amd64/bin/runCA line 6155 > > > > ---------------------------------------- > > Failure message: > > > > unknown unitigger '' during finalTrim > > > > > Can you help me ? I have no idea at all... > > My runCA version comes from thr subversion > > Francois > > -- > -------------------------------------------------------- > Francois Sabot, PhD > > Be realistic. Demand the Impossible. > http://bioinfo.mpl.ird.fr/ > http://www.mpl.ird.fr/rice > ----------------------------------------- > UMR DIversity, Adaptation & DEvelopment > Centre IRD > 911, Av Agropolis BP 64501 > 34394 Montpellier Cedex 5 > France > Phone: +33 4 67 41 64 18 <tel:%2B33%204%2067%2041%2064%2018> > ----------------------------------------- > > > ------------------------------------------------------------------------------ > Try New Relic Now & We'll Send You this Cool Shirt > New Relic is the only SaaS-based application performance monitoring > service > that delivers powerful full stack analytics. Optimize and monitor your > browser, app, & servers with just a few lines of code. Try New Relic > and get this awesome Nerd Life shirt! 
> http://p.sf.net/sfu/newrelic_d2d_may > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > <mailto:wgs...@li...> > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users > > -- -------------------------------------------------------- Francois Sabot, PhD Be realistic. Demand the Impossible. http://bioinfo.mpl.ird.fr/ http://www.mpl.ird.fr/rice ----------------------------------------- UMR DIversity, Adaptation & DEvelopment Centre IRD 911, Av Agropolis BP 64501 34394 Montpellier Cedex 5 France Phone: +33 4 67 41 64 18 ----------------------------------------- |
From: Ole K. T. <ol...@st...> - 2013-05-21 12:03:00
|
Hi, I think you just have to set the unitigger, because it is undefined by default (not sure when that happened). Try to set unitigger=bogart in your spec file. If that does not work, could you show the content of your spec file? Ole On 21 May 2013 13:07, Francois Sabot <fra...@ir...> wrote: > Dear all, > > I am using runCA for assembling pacBio data on my cluster. > > The PacBioToCA script correction performed very well on our grid, > correcting 4 SMRTcells of C2 kit using a 25x coverage in Illumina (all > tests data) > > Then I launch runCA as followed, > /home/sabotf/sources/wgs/Linux-amd64/bin/runCA -d > /data/projects/assembling-glab/runCA_21mai -s > /home/sabotf/CA_spec/pacbio_SGE_PACBIODEV.spec -p CG14_tog5681_test > /data/projects/assembling-glab/Illumina_17mai.frg > > Every things seems fine, the data are accepted, overlapper run, store > bla bla bla... > > But it failed in weird manner at the end: > > > /home/sabotf/sources/wgs/Linux-amd64/bin/deduplicate \ > > -gkp > /data/projects/assembling-glab/runCA_21mai/CG14_tog5681_test.gkpStore \ > > -ovs > /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.obtStore > \ > > -ovs > /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.dupStore > \ > > -report > /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.deduplicate.log > \ > > -summary > /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.deduplicate.summary > \ > >> > /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.deduplicate.err > 2>&1 > > ----------------------------------------END Tue May 21 10:55:10 2013 (2 > seconds) > > > ================================================================================ > > > > runCA failed. > > > > ---------------------------------------- > > Stack trace: > > > > at /home/sabotf/sources/wgs/Linux-amd64/bin/runCA line 1390. > > main::caFailure('unknown unitigger \'\' during finalTrim', > undef) called at /home/sabotf/sources/wgs/Linux-amd64/bin/runCA line 4010 > > main::overlapTrim() called at > /home/sabotf/sources/wgs/Linux-amd64/bin/runCA line 6155 > > > > ---------------------------------------- > > Failure message: > > > > unknown unitigger '' during finalTrim > > > > > Can you help me ? I have no idea at all... > > My runCA version comes from thr subversion > > Francois > > -- > -------------------------------------------------------- > Francois Sabot, PhD > > Be realistic. Demand the Impossible. > http://bioinfo.mpl.ird.fr/ > http://www.mpl.ird.fr/rice > ----------------------------------------- > UMR DIversity, Adaptation & DEvelopment > Centre IRD > 911, Av Agropolis BP 64501 > 34394 Montpellier Cedex 5 > France > Phone: +33 4 67 41 64 18 > ----------------------------------------- > > > > ------------------------------------------------------------------------------ > Try New Relic Now & We'll Send You this Cool Shirt > New Relic is the only SaaS-based application performance monitoring service > that delivers powerful full stack analytics. Optimize and monitor your > browser, app, & servers with just a few lines of code. Try New Relic > and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_may > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users > > |
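For anyone hitting the same "unknown unitigger" failure, the fix is a single spec-file line; bogart is the setting suggested above (the older bog unitigger, mentioned elsewhere in this thread, is the other common choice).

    # asm.spec: recent runCA leaves the unitigger undefined, so name it explicitly
    unitigger=bogart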
From: Francois S. <fra...@ir...> - 2013-05-21 11:27:21
|
Dear all,

I am using runCA for assembling PacBio data on my cluster.

The PacBioToCA correction script performed very well on our grid, correcting 4 SMRTcells of C2 kit using 25x coverage in Illumina (all test data).

Then I launched runCA as follows:

  /home/sabotf/sources/wgs/Linux-amd64/bin/runCA -d /data/projects/assembling-glab/runCA_21mai -s /home/sabotf/CA_spec/pacbio_SGE_PACBIODEV.spec -p CG14_tog5681_test /data/projects/assembling-glab/Illumina_17mai.frg

Everything seems fine: the data are accepted, the overlapper runs, the store is built, bla bla bla... But it fails in a weird manner at the end:

> /home/sabotf/sources/wgs/Linux-amd64/bin/deduplicate \
>   -gkp /data/projects/assembling-glab/runCA_21mai/CG14_tog5681_test.gkpStore \
>   -ovs /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.obtStore \
>   -ovs /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.dupStore \
>   -report /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.deduplicate.log \
>   -summary /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.deduplicate.summary \
>   > /data/projects/assembling-glab/runCA_21mai/0-overlaptrim/CG14_tog5681_test.deduplicate.err 2>&1
> ----------------------------------------END Tue May 21 10:55:10 2013 (2 seconds)
> ================================================================================
>
> runCA failed.
>
> ----------------------------------------
> Stack trace:
>
> at /home/sabotf/sources/wgs/Linux-amd64/bin/runCA line 1390.
>   main::caFailure('unknown unitigger \'\' during finalTrim', undef) called at /home/sabotf/sources/wgs/Linux-amd64/bin/runCA line 4010
>   main::overlapTrim() called at /home/sabotf/sources/wgs/Linux-amd64/bin/runCA line 6155
>
> ----------------------------------------
> Failure message:
>
> unknown unitigger '' during finalTrim

Can you help me? I have no idea at all...

My runCA version comes from the Subversion repository.

Francois

--
--------------------------------------------------------
Francois Sabot, PhD

Be realistic. Demand the Impossible.
http://bioinfo.mpl.ird.fr/
http://www.mpl.ird.fr/rice
-----------------------------------------
UMR DIversity, Adaptation & DEvelopment
Centre IRD
911, Av Agropolis BP 64501
34394 Montpellier Cedex 5
France
Phone: +33 4 67 41 64 18
-----------------------------------------
From: Walenz, B. <bw...@jc...> - 2013-04-30 16:51:45
|
Yes, that’s exactly the correct thing to do. I now see in the logs you posted that bogart did detect 70.5 gb of memory, then tried to load 66.5 gb worth of overlaps. The accounting of memory usage was a bit off in CA7, which results in bogart trying to load a few too many overlaps. If you’ve got a hard limit on memory, then the OS will kill the job for exhausting memory. The solution is to artificially reduce the memory limit. On 4/30/13 8:51 AM, "Rob Syme" <rob...@gm...> wrote: Aha. Thanks! Do you think it would be OK to set batMemory, delete the unitigger dir and then restart runCA? -r On 30 Apr 2013 20:09, "Walenz, Brian" <bw...@jc...> wrote: Hi- Definitely out of memory. By default, bogart tries to load the entire ovlStore into memory. There is some 'compression' - I forget exactly how much - but it will want around 60% to 75% of the size of the ovlStore. Set batMemory=64 to tell bogart to use about 64gb memory. It will filter out the lesser overlaps and retain only the longer ones. b ________________________________________ From: Rob Syme [rob...@gm...] Sent: Monday, April 29, 2013 11:08 PM To: wgs...@li... Subject: [wgs-assembler-users] Bogart unitigger failing with "terminate called after throwing an instance of 'std::bad_alloc'" Hi all I'm assembling a fungal genome with two illumina libraries and a set of sanger reads. The spec file, stderr, stdout and unitigger.err are available at: https://gist.github.com/robsyme/5486299 The assembly was running on CA 7.0 with 70GB of memory. The overlapping and overlapcorrection stages run without issue, but the unitigger (bogart) failed with the error: terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc I'm pretty sure CA still had 10GB of memory free when the error occured. Has anybody seen this error before or know a fix/workaround? Rob Syme PhD Student Curtin University Western Australia |
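The restart recipe confirmed above, as a sketch; the directory name 4-unitigger is the usual runCA layout, and the spec, prefix and frg names are placeholders for the actual run.

    # 1. Cap bogart's memory (GB) comfortably below the 70 GB physical limit,
    #    leaving headroom for the CA7 accounting slack mentioned above.
    echo "batMemory=64" >> asm.spec

    # 2. Throw away the failed unitigger attempt; earlier stages (gkpStore,
    #    ovlStore, overlap correction) are reused as-is.
    rm -rf 4-unitigger

    # 3. Re-issue the original runCA command; it resumes at unitigging.
    runCA -d run_dir -p asm -s asm.spec reads.frg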
From: Rob S. <rob...@gm...> - 2013-04-30 12:51:32
|
Aha. Thanks! Do you think it would be OK to set batMemory, delete the unitigger dir and then restart runCA? -r On 30 Apr 2013 20:09, "Walenz, Brian" <bw...@jc...> wrote: > Hi- > > Definitely out of memory. By default, bogart tries to load the entire > ovlStore into memory. There is some 'compression' - I forget exactly how > much - but it will want around 60% to 75% of the size of the ovlStore. > > Set batMemory=64 to tell bogart to use about 64gb memory. It will filter > out the lesser overlaps and retain only the longer ones. > > b > > > > > ________________________________________ > From: Rob Syme [rob...@gm...] > Sent: Monday, April 29, 2013 11:08 PM > To: wgs...@li... > Subject: [wgs-assembler-users] Bogart unitigger failing with "terminate > called after throwing an instance of 'std::bad_alloc'" > > Hi all > I'm assembling a fungal genome with two illumina libraries and a set of > sanger reads. > The spec file, stderr, stdout and unitigger.err are available at: > https://gist.github.com/robsyme/5486299 > The assembly was running on CA 7.0 with 70GB of memory. > > The overlapping and overlapcorrection stages run without issue, but the > unitigger (bogart) failed with the error: > > terminate called after throwing an instance of 'std::bad_alloc' > what(): std::bad_alloc > > I'm pretty sure CA still had 10GB of memory free when the error occured. > > Has anybody seen this error before or know a fix/workaround? > > Rob Syme > PhD Student > Curtin University > Western Australia > > > |
From: Walenz, B. <bw...@jc...> - 2013-04-30 12:09:43
|
Hi- Definitely out of memory. By default, bogart tries to load the entire ovlStore into memory. There is some 'compression' - I forget exactly how much - but it will want around 60% to 75% of the size of the ovlStore. Set batMemory=64 to tell bogart to use about 64gb memory. It will filter out the lesser overlaps and retain only the longer ones. b ________________________________________ From: Rob Syme [rob...@gm...] Sent: Monday, April 29, 2013 11:08 PM To: wgs...@li... Subject: [wgs-assembler-users] Bogart unitigger failing with "terminate called after throwing an instance of 'std::bad_alloc'" Hi all I'm assembling a fungal genome with two illumina libraries and a set of sanger reads. The spec file, stderr, stdout and unitigger.err are available at: https://gist.github.com/robsyme/5486299 The assembly was running on CA 7.0 with 70GB of memory. The overlapping and overlapcorrection stages run without issue, but the unitigger (bogart) failed with the error: terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc I'm pretty sure CA still had 10GB of memory free when the error occured. Has anybody seen this error before or know a fix/workaround? Rob Syme PhD Student Curtin University Western Australia |
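A minimal sketch of that spec change (the unitigger line is only for context; batMemory is the setting named above, in gigabytes):

  # runCA spec file
  unitigger = bogart
  # cap bogart at about 64 GB; lesser overlaps are filtered out and
  # only the longer ones are retained
  batMemory = 64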
From: Rob S. <rob...@gm...> - 2013-04-30 03:09:22
|
Hi all

I'm assembling a fungal genome with two Illumina libraries and a set of Sanger reads. The spec file, stderr, stdout and unitigger.err are available at: https://gist.github.com/robsyme/5486299
The assembly was running on CA 7.0 with 70 GB of memory.

The overlapping and overlapcorrection stages run without issue, but the unitigger (bogart) failed with the error:

terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc

I'm pretty sure CA still had 10 GB of memory free when the error occurred. Has anybody seen this error before, or know a fix/workaround?

Rob Syme
PhD Student
Curtin University
Western Australia
From: Walenz, B. <bw...@jc...> - 2013-04-29 09:04:58
|
Hi-

Ouch! Did it really spend 18 days here!? Out of curiosity, how much disk is it using?

A bit of background: the 'mer' overlapper uses a more sensitive seed for finding overlaps. It was developed specifically for the homopolymer-run problems in 454 reads. It has two problems: first, it doesn't scale much past a bacterial genome, and second, 454 reads have much less of a problem with homopolymer runs than they used to.

The easiest solution might be to just restart with overlapper=ovl. It should run much, much faster. Your run has also only finished computing the overlaps used to trim the reads; the overlaps used for assembly still need to be computed.

Some questions: Is it just job 79 that failed? Are there only 79 jobs? The number of jobs can be computed from the numbers at the start of olap-from-seeds.sh, or as 'numReads' divided by the runCA parameter merOverlapperExtendBatchSize. Would you be happy restarting with overlapper=ovl? I'd prefer this. Debugging over email is quite difficult, and we might just end up restarting anyway, depending on my time availability and your programming experience.

Let's try to debug it a bit. Rebuild the assembler with debug symbols:

% cd /home/btw434/work-yearly/Celera/wgs-7.0
% mv Linux-amd64 Linux-amd64-save
% cd src
% gmake BUILDDEBUG=1

Then modify olap-from-seeds.sh to add 'catchsegv' before the olap-from-seeds command, and rerun job 79:

% sh olap-from-seeds.sh 79 >& debug.err

This will run the command, but catch the crash and report a bit of debug info. From there, I can see where in the code it is failing.

b

On 4/26/13 4:55 AM, "Richard Buggs" <r....@qm...> wrote:

Hi all,

I am running CABOG for the first time on 4.3X 454 coverage of an 880 Mb plant genome. It has stopped working at the end of the 79th iteration of "olap-from-seeds", due to a "segmentation fault". Can anyone advise me how to resume the assembly, please? I have tried restarting it but the problem recurs.

The end of my standard output was:

/home/btw434/work-yearly/Celera/wgs-7.0/Linux-amd64/bin/AshCelera1/0-overlaptrim-overlap/olap-from-seeds.sh 77 > /home/btw434/work-yearly/Celera/wgs-7.0/Linux-amd64/bin/AshCelera1/0-overlaptrim-overlap/olaps/0077.err 2>&1
/home/btw434/work-yearly/Celera/wgs-7.0/Linux-amd64/bin/AshCelera1/0-overlaptrim-overlap/olap-from-seeds.sh 78 > /home/btw434/work-yearly/Celera/wgs-7.0/Linux-amd64/bin/AshCelera1/0-overlaptrim-overlap/olaps/0078.err 2>&1
/home/btw434/work-yearly/Celera/wgs-7.0/Linux-amd64/bin/AshCelera1/0-overlaptrim-overlap/olap-from-seeds.sh 79 > /home/btw434/work-yearly/Celera/wgs-7.0/Linux-amd64/bin/AshCelera1/0-overlaptrim-overlap/olaps/0079.err 2>&1
----------------------------------------END CONCURRENT Wed Apr 24 12:32:33 2013 (1582382 seconds)
================================================================================

runCA failed.

----------------------------------------
Stack trace:

 at ./runCA line 1237.
 main::caFailure('olap-from-seeds failed. See *.err in /home/btw434/work-yearl...', undef) called at ./runCA line 3200
 main::merOverlapper('trim') called at ./runCA line 3258
 main::createOverlapJobs('trim') called at ./runCA line 3647
 main::overlapTrim() called at ./runCA line 5876

----------------------------------------
Failure message:

olap-from-seeds failed. See *.err in /home/btw434/work-yearly/Celera/wgs-7.0/Linux-amd64/bin/AshCelera1/0-overlaptrim-overlap.

The tail of /home/btw434/work-yearly/Celera/wgs-7.0/Linux-amd64/bin/AshCelera1/0-overlaptrim-overlap/olaps/0079.err read:

Thread 1 processed 1170163 olaps at Wed Apr 24 12:21:07 2013
Thread 0 processed 1174288 olaps at Wed Apr 24 12:21:11 2013
Extracted 12711 of 12711 fragments in iid range 5900001 .. 5912794
Thread 1 processed 1374622 olaps at Wed Apr 24 12:29:16 2013
Thread 0 processed 1383950 olaps at Wed Apr 24 12:29:20 2013
Thread 1 processed 224905 olaps at Wed Apr 24 12:30:49 2013
Thread 0 processed 229187 olaps at Wed Apr 24 12:30:51 2013
Failed overlaps = 106922508
Start analyzing multi-alignments
Num_Frags = 62794
/home/btw434/work-yearly/Celera/wgs-7.0/Linux-amd64/bin/AshCelera1/0-overlaptrim-overlap/olap-from-seeds.sh: line 63: 40962 Segmentation fault  $bin/olap-from-seeds -a -b -t 2 -S /home/btw434/work-yearly/Celera/wgs-7.0/Linux-amd64/bin/AshCelera1/0-overlaptrim-overlap/Ash1.merStore -G -o /home/btw434/work-yearly/Celera/wgs-7.0/Linux-amd64/bin/AshCelera1/0-overlaptrim-overlap/olaps/$jobid.ovb.WORKING.gz /home/btw434/work-yearly/Celera/wgs-7.0/Linux-amd64/bin/AshCelera1/Ash1.gkpStore $minid $maxid

Do you know how I might solve this problem, please?

many thanks
Richard

_______
Dr Richard Buggs | Senior Lecturer | School of Biological and Chemical Sciences, Queen Mary University of London, E1 4NS, United Kingdom | email: r....@qm... | website: http://www.sbcs.qmul.ac.uk/staff/richardbuggs.html | office: +44(0)207 882 3058 | mobile: +44(0)772 992 0401 | twitter: @RJABuggs
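A sketch of the restart suggested in the reply above, using the overlapper=ovl setting it names (the new -d directory, spec file name, and fragment file name are assumptions; whether the existing AshCelera1 directory could be reused instead is not stated):

  # spec file: switch from the mer overlapper to the standard overlapper
  overlapper = ovl

  # start in a fresh assembly directory so no mer-overlapper state is reused
  % runCA -d AshCelera2 -p Ash1 -s ash.spec ash_454.frg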
From: Ole K. T. <o.k...@ib...> - 2013-04-26 14:27:25
|
Hi Sergey.

On 25 April 2013 02:11, Sergey Koren <se...@um...> wrote:
> Hi,
>
> That certainly doesn't look normal, especially the over 100% error rates.
> Could you post one of your asm.*.lay and asm.*.log files? Either one that
> failed or one that had over 100% error.
>

I uploaded a make-consensus-error.tgz to your public ftp with the log and lay files. MD5sum: 5d2cedc2c84fd41faeec3b064f37161a

If I read the pacBioToCA script correctly, it seems to first run make-consensus with the -x option, to exclude reads. If that fails for some reason, it will run make-consensus without excluding reads. I think the first try, with -x, failed because I ran it on the grid without enough memory. Could that be the reason? It didn't seem to work much better with more memory, but I might not have given it enough. I haven't been able to look into it properly yet.

Ole

> > Sergey
> > On Apr 23, 2013, at 1:32 PM, Ole Kristian Tørresen <o.k...@ib...> wrote:
> > Hi,
> > I'm correcting some XLXL PacBio reads, and hit upon some failed assertions in make-consensus. I hope it's possible to solve.
> >
> > This is the first error (last 10 lines of the out file from the runPartition.sh):
> > with 100 errors (99.01% error)
> > In contig 814 forced alignment of 65 bases of string 48036185 subscript 344 to 2 bases of consensus
> > with 64 errors (98.46% error)
> > In contig 814 forced alignment of 398 bases of string 52752554 subscript 345 to 1 bases of consensus
> > with 397 errors (99.75% error)
> > In contig 814 forced alignment of 227 bases of string 56213438 subscript 346 to 18 bases of consensus
> > with 214 errors (94.27% error)
> > In contig 814 forced alignment of 112 bases of string 75693310 subscript 347 to 2 bases of consensus
> > with 110 errors (98.21% error)
> > make-consensus: align.cc:4903: void Complete_Align(const char*, int, int, const char*, int, int, int, int, int, int, int, Align_Score_Entry_t*, Align_Score_Entry_t&, Alignment_t&): Assertion `t_lo < t_slip' failed.
> >
> > 98.21 % error is a bit weird. Got any idea what that assertion is?
> >
> > This is not an error, but even weirder:
> > with 62 errors (151.22% error)
> > In Reset_From_Votes in contig 1369
> > Forced alignment of string subscript 685 to consensus
> > with 169 errors (174.23% error)
> > In Reset_From_Votes in contig 1369
> > Forced alignment of string subscript 686 to consensus
> > with 154 errors (179.07% error)
> > In Reset_From_Votes in contig 1369
> > Forced alignment of string subscript 687 to consensus
> > with 78 errors (185.71% error)
> >
> > 185.71 % error? That's a lot.
> >
> > This is another error:
> > In contig 1135 forced alignment of 82 bases of string 52574042 subscript 809 to 11 bases of consensus
> > with 74 errors (90.24% error)
> > In contig 1135 forced alignment of 225 bases of string 41625223 subscript 810 to 3 bases of consensus
> > with 222 errors (98.67% error)
> > In contig 1135 forced alignment of 146 bases of string 55029895 subscript 811 to 2 bases of consensus
> > with 144 errors (98.63% error)
> > In contig 1135 forced alignment of 140 bases of string 40943538 subscript 812 to 3 bases of consensus
> > with 137 errors (97.86% error)
> > ERROR: Bad from = 3 in Global_Align
> > in file align.cc at line 6872 errno = 0
> >
> > How can I get this kind of error in the alignments?
> >
> > The genome is about 830 Mbp (it's cod). Correcting with 454 reads, about 27x total. This is about 3.5x coverage in PacBio reads:
> > Running with 2536048074 bp for pblr_XLXL_cod.
> > Correcting with 22252248087 bp.
> >
> > Could something have gone wrong in the correctPacBio step?
> >
> > Thank you.
> >
> > Ole