Re: [Exist-open] deduplicating mixed-content results


Dear Matthew (cc eXist-open),

Bingo! I'm retrieving marked-up mixed content by matching on the string
value, so all returns will be string-equal and have the same
string-length. A useful place where they'll differ, as you point out, will
be in the string-length of the text() nodes before the <stress> element.
Some words won't have a <stress> element, and some will also have a
<secStress> element for secondary stress, but none will have more than one
<stress> element.
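
As a rough illustration of the idea above, here is a minimal Python sketch (not the XQuery actually used against eXist) that keys each <form> on the length of its direct text nodes preceding <stress>. The sample forms, the <forms> wrapper, and the function names stress_offset and dedupe_by_stress are hypothetical, and the sketch assumes every form passed in already has the same string value and at most one <stress> element.

```python
import xml.etree.ElementTree as ET


def stress_offset(form):
    """Total length of the form's direct text nodes that precede <stress>.

    Returns None when the form has no <stress> child.
    """
    length = len(form.text or "")
    for child in form:
        if child.tag == "stress":
            return length
        # Text that follows this child (e.g. after a <secStress>) still
        # precedes any later <stress>, so count it.
        length += len(child.tail or "")
    return None


def dedupe_by_stress(forms):
    """Keep the first form seen for each distinct stress offset."""
    seen = set()
    unique = []
    for form in forms:
        key = stress_offset(form)
        if key not in seen:
            seen.add(key)
            unique.append(form)
    return unique


# Hypothetical sample data: string-equal forms that differ only in markup.
sample = ET.fromstring(
    "<forms>"
    "<form>z<stress>a</stress>mok</form>"
    "<form>zam<stress>o</stress>k</form>"
    "<form>z<stress>a</stress>mok</form>"
    "<form>zamok</form>"
    "</forms>"
)
unique_forms = dedupe_by_stress(sample.findall("form"))
for form in unique_forms:
    print(ET.tostring(form, encoding="unicode"))
```

Forms without <stress> all share the key None here, so they collapse into one result. Forms that differ only in <secStress> placement would also collapse, which may or may not be what is wanted.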

Thank you for the reminder that sometimes the easiest way to solve a
problem is to pretend that it's a different problem.

Best,

David
dj...@gm...
__

From:  "Matthew L. Avizinis" <ma...@gl...>
Organization:  Gleim Publications, Inc.
Date:  Thursday, April 4, 2013 8:31 PM
To:  David Birnbaum <dj...@pi...>
Cc:  "exi...@li..." <exi...@li...>
Subject:  Re: [Exist-open] deduplicating mixed-content results

Hello David,
I'm not really familiar with Russian at all, so I don't know whether your
data would have more than one <stress> element within each <form> element.
However, based on what you've given, how about this?
1) string-length(form/text()) will be equal for a given group of <form>
elements
2) you'll also have the same number of text nodes for a given group
Hence,
3) all you'll have to do is check whether the string-lengths of the text
nodes for a given set of siblings are equal. If they're all the same length,
bang! You know it's not distinct.
Regards,
Matthew L. Avizinis
Gleim Publications, Inc <http://www.gleim.com>
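
Matthew's three steps can be sketched as a pairwise comparison. This is a minimal Python illustration under stated assumptions, not XQuery for eXist: the sample forms and the names text_node_lengths and is_duplicate are hypothetical, and the direct text nodes of a <form> (what form/text() would select in XPath) are approximated by the element's leading text plus the tail text of each child.

```python
import xml.etree.ElementTree as ET


def text_node_lengths(form):
    """Lengths of the form's direct, non-empty text nodes in document order."""
    runs = [form.text] + [child.tail for child in form]
    return tuple(len(run) for run in runs if run)


def is_duplicate(a, b):
    """Treat two string-equal forms as duplicates when their text nodes
    have the same lengths in the same order."""
    return text_node_lengths(a) == text_node_lengths(b)


# Hypothetical sample data: all three forms have the string value "zamok".
first = ET.fromstring("<form>z<stress>a</stress>mok</form>")
second = ET.fromstring("<form>zam<stress>o</stress>k</form>")
third = ET.fromstring("<form>z<stress>a</stress>mok</form>")

print(is_duplicate(first, third))
print(is_duplicate(first, second))
```

The comparison looks only at text-node lengths, so two forms that place different elements (say <stress> and <secStress>) at the same position would be reported as duplicates. Adding each child's tag name to the tuple would be one way to tell them apart.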
