On Oct 11, 2011, at 7:56 AM, Keiran Raine wrote:
> Hi all,
>
> Can I draw your attention to the below email, especially this line:
>
>> * last_diff_pos is only initialised if is_diff is true. What should the value of last_diff_pos in a newly pushed gap entry be when gap_push() is called with is_diff = 0?
>
> This was submitted ~2 weeks ago and has had no response.
>
> Is this type of error not considered a serious bug? It would be good to know how this value should be initialised even if no-one has time to provide detailed insight as to the effect it has been having.
This is a bug, but not a serious one. I have luckily (and unluckily) never had any segfault caused by the bug. As to correctness, this bug may very rarely lead to missing suboptimal hits, but the optimal hits are never affected. The mapping quality will be inaccurate in this case, but apparently the mismapping rate outweighs this bug most of the time. On a couple of test data sets, fixing the bug or not does not affect the result at all.
In all, no worry. But it is a bug anyway that will be fixed.
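For illustration only, here is a minimal sketch of the kind of fix being discussed. It is not the actual bwa source: entry_t, substack_t and push() are simplified, hypothetical stand-ins for gap_entry_t, the substack and gap_push(), and the fallback value of 0 for last_diff_pos when is_diff = 0 is an assumption, since the thread leaves open what the right value is. The only point it shows is that the field gets written on every path, including right after a reallocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the structures discussed in this
 * thread. Only last_diff_pos, is_diff and m_entries are names taken from
 * the report; everything else is made up for the sketch. */
typedef struct {
    int pos;            /* position reached in the read (illustrative) */
    int n_diff;         /* differences used so far (illustrative) */
    int last_diff_pos;  /* position of the most recent difference */
} entry_t;

typedef struct {
    int n_entries;      /* slots in use */
    int m_entries;      /* slots allocated */
    entry_t *entries;
} substack_t;

/* Push one entry, growing the array when it is full.
 * Returns 0 on success, -1 if the allocation fails. */
static int push(substack_t *s, int pos, int n_diff, int is_diff)
{
    entry_t *p;

    assert(s != NULL);
    if (s->n_entries == s->m_entries) {
        int m = s->m_entries ? s->m_entries * 2 : 4;
        /* realloc() does not zero the new slots, so every field of a
         * freshly pushed entry must be written explicitly below. */
        entry_t *tmp = realloc(s->entries, (size_t)m * sizeof(entry_t));
        if (tmp == NULL)
            return -1;
        s->entries = tmp;
        s->m_entries = m;
    }
    p = &s->entries[s->n_entries++];
    p->pos = pos;
    p->n_diff = n_diff;
    /* Assign on both branches. The 0 used when is_diff is false is an
     * assumed placeholder, not a value confirmed anywhere in this thread. */
    p->last_diff_pos = is_diff ? pos : 0;
    return 0;
}
```

With the field assigned unconditionally, an entry that lands in freshly reallocated memory can no longer carry a garbage last_diff_pos into a later gap_shadow()-style call.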
Thank you,
Heng
>
> Kind regards,
>
> Keiran Raine
> Senior Bioinformatician
> The Cancer Genome Project
> Ext: 7703
> kr2@...
>
> On 30 Sep 2011, at 13:01, John Marshall wrote:
>
>> The gap_push() function finds the right substack for a new gap_entry_t and initialises the new entry from all the arguments given to gap_push(). With two exceptions, all the gap_entry_t fields are filled in all the time:
>>
>> * n_seed_mm is never initialised, but it's never used either, so that doesn't matter;
>>
>> * last_diff_pos is only initialised if is_diff is true. What should the value of last_diff_pos in a newly pushed gap entry be when gap_push() is called with is_diff = 0?
>>
>> This is causing a segfault in bwa-aln for several of our data sets. Usually last_diff_pos will inherit the value from the previous gap_entry_t in the particular slot in memory, which may or may not be correct for the new gap. But when a gap with is_diff = 0 is pushed at the same time as the substack's m_entries is hit and a memory reallocation occurs, the new gap_entry_t is in freshly-allocated memory and its last_diff_pos field will be very very uninitialised. This leads to gap_shadow() invocations like:
>>
>> gap_shadow(1431, 100, 3095693981, 1212696904, 0x6312030)
>>
>> which segfaults, as 1212696904 well exceeds the array width[], which has only 100 entries.
>>
>> Thus gap_push() needs to initialise last_diff_pos when is_diff = 0 too. It's not really clear to me exactly what the width[] array represents, so I don't know what an appropriate value to initialise last_diff_pos to would be.
>>
>> Investigating this crash also brought up a couple of questions:
>>
>> 1. The first argument to gap_shadow(), l - k + 1, mostly ranged from 0 to perhaps 100 in all the previous invocations of gap_shadow() in this test case. What is the expected range of this argument, and is 1431 suspiciously large?
>>
>> 2. Even when it doesn't get uninitialised memory and crash, many many gaps get pushed with is_diff = 0 and receive a value for last_diff_pos that doesn't necessarily pertain to that gap itself. This means that gap_shadow() adjusts a shorter or longer slice of the width[] array than perhaps it should. Might this have a significant adverse effect on the mappings produced?
>>
>> Thanks,
>>
>> John
>>
>> --
>> The Wellcome Trust Sanger Institute is operated by Genome Research
>> Limited, a charity registered in England with number 1021457 and a
>> company registered in England with number 2742969, whose registered
>> office is 215 Euston Road, London, NW1 2BE.
>>
>> _______________________________________________
>> Bio-bwa-help mailing list
>> Bio-bwa-help@...
>> https://lists.sourceforge.net/lists/listinfo/bio-bwa-help
>