From: Robert H. <rh...@vi...> - 2014-04-11 21:08:52
<rant> The idea of providing a version of some data that should be used for sorting is a reasonable goal. The proposed solution, marking the sections that shouldn't be used for sorting inline within the data using new invisible control characters, is a bad idea. The description demonstrating their use with visible characters is an even worse idea.

I'm sure someone, somewhere thought this would be a good idea, but they are probably not the ones who will have to edit the data to specify the non-sorting sections, change how the editor works to support this, or modify the programs that will then have to deal with this newly introduced misfeature.

How do you type an invisible character that doesn't exist on any keyboard? How do you allow a user to see where the invisible character they typed is? Simple. Choose some other visible character to represent that invisible character in the editor, and change those visible characters to the control characters as the record is saved. Just like many editors do for the invisible sub-field separator character, which they represent with a '$' or a '|'.

And doubtless some implementations will gleefully transform *all* occurrences of the visible surrogates to the corresponding invisible control characters, even instances where the visible character is intended to represent itself rather than being a visible surrogate for the invisible control character. That is exactly what some existing commercial ILS software packages do when the '|' is used as a visible surrogate for the invisible sub-field separator character.

This seems a solution designed to maintain backward compatibility with design choices that were made decades ago in support of the then-requirements: 1) Make sure the MARC records are small, because memory and disk space are really expensive, and 2) Ensure that if you discard the control codes and labels, the record is suitable for printing as-is onto a small piece of card stock.
The simple fact that we are saddled with decisions made decades ago for reasons that are no longer relevant is no valid justification for making new bad decisions that adhere to those same reasons. </rant>

If this extension cannot be smothered in the proverbial cradle, then Tod's proposed approach seems like the quickest way to handle this.

A better way might be at a lower level in Marc4j, such that each field/subfield that supports this new "feature" would support getters such as "getSortableData", "getPrintableData" and "getRawData" instead of the current, simple "getData". However, that would require extensive changes throughout the Marc4j library, and extensive changes by every program that uses the Marc4j library. All for very little benefit, since I foresee that implementation of this proposal in actual records in actual libraries will be so slow and sparse that special-case string handling folded into SolrMarc, as Tod proposes, will likely be sufficient until binary Marc records finally disappear altogether.

-Bob Haschart

On 4/11/2014 1:12 PM, Joe Atzberger wrote:
> This is hugely problematic for MARC toolchains. Not fun stuff.
>
> On Thu, Apr 10, 2014 at 9:18 AM, Demian Katz <dem...@vi...> wrote:
>
> That's the first I've heard of this extension. I'm a bit surprised
> that this change was approved -- seems like it's going to create a
> lot of work for a lot of people! In any case, I agree with Tod's
> proposed approach. I'm also copying solrmarc-tech in case anyone
> there has already done work on this.
>
> - Demian
>
> > -----Original Message-----
> > From: Tod Olson [mailto:to...@uc...]
> > Sent: Thursday, April 10, 2014 8:03 AM
> > To: Frédéric Demians
> > Cc: vufind-tech
> > Subject: Re: [VuFind-Tech] Non-Sorting Control Characters
> >
> > For title, the script that populates the title_sort field currently
> > honors the non-filing indicator.
> > I suggest modifying that script to also honor the non-sorting
> > characters and contributing that back as a patch.
> >
> > To apply this to author, you would want to establish a separate Solr
> > field for the sortable version of the author and arrange for it to be
> > populated by a script that trims the non-sorting characters. If more
> > MARC records start to include those characters, that could also be a
> > useful patch to contribute.
> >
> > Best,
> >
> > -Tod
> >
> > On Apr 10, 2014, at 3:07 AM, Frédéric Demians <f.d...@ta...> wrote:
> >
> > > Hi,
> > >
> > > Non-sorting characters are available in biblio records:
> > >
> > > http://www.loc.gov/marc/nonsorting.html
> > >
> > > As far as I know, they aren't taken into account in VuFind for
> > > sorting by author/title. Is anyone working on adding support for
> > > non-sorting characters to VuFind? If not, does anyone have
> > > suggestions on how to implement this functionality?
> > >
> > > Kind regards,
> > > --
> > > Frédéric DEMIANS
> > > http://www.tamil.fr/fdemians
> > >
> > > ------------------------------------------------------------------------------
> > > Put Bad Developers to Shame
> > > Dominate Development with Jenkins Continuous Integration
> > > Continuously Automate Build, Test & Deployment
> > > Start a new project now. Try Jenkins in the cloud.
> > > http://p.sf.net/sfu/13600_Cloudbees
> > > _______________________________________________
> > > Vufind-tech mailing list
> > > Vuf...@li...
> > > https://lists.sourceforge.net/lists/listinfo/vufind-tech
>
> --
> You received this message because you are subscribed to the Google
> Groups "solrmarc-tech" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sol...@go...
> To post to this group, send email to sol...@go...
> Visit this group at http://groups.google.com/group/solrmarc-tech.
> For more options, visit https://groups.google.com/d/optout.
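
For illustration only, here is a minimal sketch of the kind of special-case string handling discussed above: one helper that trims the non-sorting spans to produce a sort key, and one that keeps the text but drops the control characters for display. It assumes the record has already been converted to Unicode and that the delimiters are U+0098 (non-sort begin) and U+009C (non-sort end), which should be checked against the LoC page linked in the thread. The class and method names (NonSortStripper, sortable, printable) are hypothetical and are not part of Marc4j or SolrMarc.

```java
// Hypothetical helper, not part of Marc4j or SolrMarc.
final class NonSortStripper {
    // Assumption: Unicode code points used for the non-sorting delimiters
    // (U+0098 begin, U+009C end). MARC-8 records use different byte values.
    static final char NON_SORT_BEGIN = '\u0098';
    static final char NON_SORT_END = '\u009C';

    private NonSortStripper() {}

    /**
     * Returns the text with every delimited non-sorting span removed,
     * delimiters included. An unmatched begin marker drops the rest of
     * the string; that is a choice made by this sketch, not a rule from
     * the MARC documentation.
     */
    static String sortable(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        boolean skipping = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == NON_SORT_BEGIN) {
                skipping = true;
            } else if (c == NON_SORT_END) {
                skipping = false;
            } else if (!skipping) {
                out.append(c);
            }
        }
        return out.toString().trim();
    }

    /** Returns the full text with only the two control characters removed. */
    static String printable(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c != NON_SORT_BEGIN && c != NON_SORT_END) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Given a title stored as U+0098, "The ", U+009C, "Hobbit", sortable would yield "Hobbit" and printable would yield "The Hobbit". Data without the markers passes through both helpers unchanged, so existing records would not be affected.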