Re: [MUMmer-help] dnadiff very slow for Human

Hi Davide,
In this context, "unique" means that there is only one alignment covering a
region. Since multiple alignments can overlap one another, this option
looks at a particular alignment and computes the fraction of its length
where it is the *only* alignment that exists at that reference/query
position.

Best,
-Adam
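Adam's definition can be made concrete with a small sweep-line sketch. It assumes alignments are given as 1-based, inclusive (start, end) intervals on a single sequence, and for each one it reports the percentage of its positions covered by no other alignment. `unique_percentages` is a hypothetical helper written for illustration, not delta-filter's actual code. delta-filter's -u option compares a percentage of this kind against a minimum threshold, and its description refers to both reference and query coordinates, so in practice the calculation would be done on each axis.

```python
def unique_percentages(intervals):
    """For each (start, end) interval (1-based, inclusive, start <= end) on one
    sequence, return the percentage of its positions covered by no other interval."""
    events = []
    for idx, (start, end) in enumerate(intervals):
        events.append((start, 1, idx))     # alignment starts covering `start`
        events.append((end + 1, -1, idx))  # ...and stops covering at end + 1
    events.sort()

    active = set()                  # alignments covering the current segment
    unique_len = [0] * len(intervals)
    prev_pos = None
    for pos, kind, idx in events:
        # Positions prev_pos .. pos-1 had a constant set of covering alignments;
        # if exactly one alignment covered them, they are unique to it.
        if prev_pos is not None and pos > prev_pos and len(active) == 1:
            (only,) = active
            unique_len[only] += pos - prev_pos
        if kind == 1:
            active.add(idx)
        else:
            active.discard(idx)
        prev_pos = pos
    return [100.0 * u / (end - start + 1)
            for u, (start, end) in zip(unique_len, intervals)]
```

For example, two alignments at 1..100 and 51..150 overlap on 50 positions, so `unique_percentages([(1, 100), (51, 150)])` gives `[50.0, 50.0]`. Sorting the start/end events keeps the cost at O(n log n) in the number of alignments rather than comparing every pair.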

On Wed, Jun 18, 2014 at 3:55 AM, Davide VERZOTTO (GIS) <
ver...@gi...> wrote:

> Hi Adam,
>
> May I ask how the -u option in delta-filter actually works, in more
> detail than the manual gives? For example, what exactly do you mean by
> 'unique reference' and 'unique query'?
>
> Thanks and regards,
>
> Davide
>
> *From:* Adam Phillippy [mailto:aph...@gm...]
> *Sent:* Wednesday, May 28, 2014 4:27 AM
> *To:* Davide VERZOTTO (GIS)
> *Cc:* mum...@li...
> *Subject:* Re: [MUMmer-help] dnadiff very slow for Human
>
> Hi Davide,
>
> I'm not aware of a converter between MUMmer formats and the UCSC format
> you referenced. However, all of the information required by that format is
> contained in the nucmer delta format, so it would be relatively
> straightforward to write such a converter.
>
> Best,
>
> -Adam
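As a rough illustration of how delta fields map onto a UCSC chain header, the sketch below converts one nucmer alignment's 1-based, inclusive coordinates into the chain format's 0-based, half-open coordinates, reporting minus-strand query positions on the reverse-complement strand as the chain format requires. `chain_header` is a hypothetical helper; the score and chain ID are caller-supplied placeholders, and the ungapped block lines that follow each header (which would be derived by walking the delta indel list) are deliberately left out.

```python
def chain_header(ref_name, ref_len, qry_name, qry_len, s1, e1, s2, e2,
                 score, chain_id):
    """Build a UCSC chain header line for one nucmer alignment.

    s1..e1 are 1-based inclusive reference coordinates (s1 <= e1); s2..e2 are
    1-based inclusive query coordinates, with s2 > e2 marking a reverse match.
    """
    # Chain coordinates are 0-based, half-open; the target strand is "+".
    t_start, t_end = s1 - 1, e1
    if s2 <= e2:
        q_strand, q_start, q_end = "+", s2 - 1, e2
    else:
        # Minus-strand query coordinates are given on the reverse complement:
        # forward interval [e2 - 1, s2) maps to [qry_len - s2, qry_len - e2 + 1).
        q_strand, q_start, q_end = "-", qry_len - s2, qry_len - e2 + 1
    return (f"chain {score} {ref_name} {ref_len} + {t_start} {t_end} "
            f"{qry_name} {qry_len} {q_strand} {q_start} {q_end} {chain_id}")
```

With a hypothetical 500 bp scaffold aligned in reverse to chr1:101-200 (delta query coordinates 100 to 1), the query interval becomes 400..500 on the "-" strand.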
>
> On Thu, May 22, 2014 at 11:20 PM, Davide VERZOTTO (GIS) <
> ver...@gi...> wrote:
>
> Hi Adam,
>
> Thank you for your kind reply and your hint on mum-reference mode. I am
> testing it now, and it does seem to dramatically reduce delta file sizes.
> I was already increasing the minimum match length, but not the minimum
> cluster length (what exactly does this parameter mean?). I also applied
> delta-filter first with the -l and -i options, and then used the -d option
> in dnadiff. The slowness seems to come from the large number of small
> contigs we have, since the big scaffolds are not really affected.
>
> Just another question: is it possible to use dnadiff (or other MUMmer
> suite) output to lift over annotation from the Reference genome to a de
> novo Human genome assembly with the UCSC liftOver tool, which first
> requires chaining the alignments (see the chain format:
> https://genome.ucsc.edu/goldenPath/help/chain.html), or with other tools
> you may know of?
>
> Thanks and regards,
>
> Davide
>
> On May 23, 2014, at 2:49 AM, Adam Phillippy wrote:
>
> Hi Davide,
>
> dnadiff was primarily designed for microbial genome comparison and
> currently does not scale well to large genomes. The delta-filter step is
> certainly one of the major bottlenecks: its run time grows with the number
> of matches it has to analyze, so you can speed things along by reducing the
> total number of matches. A few ways to do this:
>
> 1. Run nucmer in mum-reference mode to ignore repetitive alignment seeds
>
> 2. Increase the minimum match length and minimum cluster length (this will
> reduce sensitivity to low-identity alignments)
>
> 3. Run delta-filter with the -l and -i options to filter alignments by
> length and identity (these filters are quick compared to -1/-m/-r/-q, which
> all require a dynamic programming step)
>
> Once you have a delta file filtered according to the above
> recommendations, you can pass it directly to dnadiff with the -d option;
> dnadiff will then skip the alignment phase and process your delta file
> directly, hopefully faster than before.
>
> Best,
>
> -Adam
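The three recommendations plus the dnadiff -d step can be strung together as a small driver script. This sketch assumes the flag names from MUMmer 3's usage text (nucmer `--mumreference`, `-l`, `-c`, `-p`; delta-filter `-l`, `-i`, writing to stdout; dnadiff `-p`, `-d`); check the `-h` output of your installed version. The file names and threshold values are hypothetical and purely illustrative, not tuned recommendations, and `pipeline_commands`/`run_pipeline` are helpers invented for this example.

```python
import subprocess


def pipeline_commands(ref, qry, prefix, min_match, min_cluster, min_len, min_idy):
    """Return the nucmer, delta-filter and dnadiff argv lists plus the
    path of the filtered delta file."""
    filtered = f"{prefix}.filtered.delta"
    # Steps 1 and 2: mum-reference seeds, larger min match (-l) and cluster (-c).
    nucmer = ["nucmer", "--mumreference", "-l", str(min_match),
              "-c", str(min_cluster), "-p", prefix, ref, qry]
    # Step 3: cheap length/identity filtering of nucmer's <prefix>.delta.
    dfilter = ["delta-filter", "-l", str(min_len), "-i", str(min_idy),
               f"{prefix}.delta"]
    # dnadiff -d analyzes the given delta instead of aligning from scratch.
    dnadiff = ["dnadiff", "-p", f"{prefix}.dnadiff", "-d", filtered]
    return nucmer, dfilter, dnadiff, filtered


def run_pipeline(ref, qry, prefix, **thresholds):
    nucmer, dfilter, dnadiff, filtered = pipeline_commands(
        ref, qry, prefix, **thresholds)
    subprocess.run(nucmer, check=True)
    with open(filtered, "w") as out:  # delta-filter prints the result
        subprocess.run(dfilter, stdout=out, check=True)
    subprocess.run(dnadiff, check=True)
```

A call such as `run_pipeline("hg19_chr1.fa", "scaffolds.fa", "chr1", min_match=100, min_cluster=500, min_len=1000, min_idy=95.0)` would run one chromosome at a time; keeping command construction separate from execution makes it easy to print and inspect the commands before launching a multi-day job.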
>
> On Tue, May 20, 2014 at 1:34 AM, Davide VERZOTTO (GIS) <
> ver...@gi...> wrote:
>
> Dear MUMmer users,
>
> We are trying to use dnadiff to analyze breakpoints between our de novo
> Human genome assembly and the Reference genome, the latter split into
> multiple chromosomes in separate files.
>
> We have already computed a NUCmer comparison between the two assemblies
> and the corresponding delta file. After this, we tried to compare all our
> scaffolds against hg19 chromosome 1 using dnadiff; the tool ran for more
> than 12 days (on a single core, with a peak of 24 GB RAM) before crashing
> (for internal server reasons), without writing any temporary file apart
> from the log line "Filtering alignments", presumably while running
> "delta-filter -1". Have you already faced this problem? Is there a way or
> script to speed up dnadiff for Human genome comparison?
>
> Thanks and regards,
> Davide
>
> -------------------------------
> This e-mail and any attachments are only for the use of the intended
> recipient and may be confidential and/or privileged. If you are not the
> recipient, please delete it or notify the sender immediately. Please do not
> copy or use it for any purpose or disclose the contents to any other person
> as it may be an offence under the Official Secrets Act.
> -------------------------------
>
> _______________________________________________
> MUMmer-help mailing list
> MUM...@li...
> https://lists.sourceforge.net/lists/listinfo/mummer-help