From: Birnbaum, D. J <dj...@pi...> - 2013-04-06 15:28:58
Dear Matthew (cc eXist-open),

Bingo! I'm retrieving marked-up mixed content by matching on the string value, so all returns will be string-equal and have the same string-length. A useful place where they'll differ, as you point out, will be in the string-length of the text() nodes before the <stress> element. Some words won't have a <stress> element, and some will also have a <secStress> element for secondary stress, but none will have more than one <stress> element.

Thank you for the reminder that sometimes the easiest way to solve a problem is to pretend that it's a different problem.

Best,

David
dj...@gm...

__
From: "Matthew L. Avizinis" <ma...@gl...>
Organization: Gleim Publications, Inc.
Date: Thursday, April 4, 2013 8:31 PM
To: David Birnbaum <dj...@pi...>
Cc: "exi...@li..." <exi...@li...>
Subject: Re: [Exist-open] deduplicating mixed-content results

Hello David,

I'm not really familiar with Russian at all, so I don't know whether your data would have more than one <stress> element within each <form> element. However, based on what you've given, how about this?

1) string-length(form/text()) will be equal for a given group of <form> elements
2) you'll also have the same number of text nodes for a given group

Hence,
3) all you'll have to do is check if the string-length of all text nodes for a given set of siblings is equal. If they're all the same length, bang!, you know it's not distinct.

Regards,
Matthew L. Avizinis
Gleim Publications, Inc <http://www.gleim.com>
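
The approach discussed in this thread (results that are string-equal can be told apart by the length of the text preceding the <stress> element) can be sketched roughly as follows. This is only an illustration under assumptions, not code from the thread: it is written in Python with the standard library's ElementTree rather than XQuery, the sample <results> data is made up, and the helper names `markup_key` and `dedupe` are hypothetical. The element names <form> and <stress> are the ones mentioned above; a <secStress> child would be picked up by the same key.

```python
import xml.etree.ElementTree as ET


def markup_key(form):
    """Describe where each child element sits inside the form's string value.

    Returns a tuple of (tag, offset, length) entries, where offset is the
    length of all text preceding the child element.
    """
    key = []
    offset = len(form.text or "")  # text before the first child element
    for child in form:
        inner = "".join(child.itertext())
        key.append((child.tag, offset, len(inner)))
        offset += len(inner) + len(child.tail or "")
    return tuple(key)


def dedupe(forms):
    """Keep the first form for each (string value, markup position) pair."""
    seen = set()
    unique = []
    for form in forms:
        key = ("".join(form.itertext()), markup_key(form))
        if key not in seen:
            seen.add(key)
            unique.append(form)
    return unique


# Hypothetical sample data: the same string value with different stress marking.
SAMPLE = """<results>
  <form>з<stress>а</stress>мок</form>
  <form>зам<stress>о</stress>к</form>
  <form>з<stress>а</stress>мок</form>
  <form>замок</form>
</results>"""

forms = ET.fromstring(SAMPLE).findall("form")
unique = dedupe(forms)
for form in unique:
    print(ET.tostring(form, encoding="unicode").strip(), markup_key(form))
```

In this sketch the third <form> is dropped as a duplicate of the first, while the form stressed on a different syllable and the unstressed form are both kept, because their offsets differ even though their string values match.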