Also, the non-sorting control characters have different code points in MARC ANSEL vs UCS, so that conversion must be added to marc4j.
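
To make the code point difference concrete, here is a minimal Python sketch rather than marc4j code. It assumes the values given on the LC non-sorting page (MARC-8 88/89 hex, UCS 0098/009C), which are worth re-checking there. The function name and the decode_rest callback are hypothetical stand-ins for whatever the real MARC-8 decoder does; a real converter would handle these two bytes inside its own decoding loop so that escape-sequence state is not lost.

```python
# Code points as listed on the LC non-sorting page (assumption: re-check there).
MARC8_NSB, MARC8_NSE = 0x88, 0x89        # MARC-8 (ANSEL) bytes
UCS_NSB, UCS_NSE = "\u0098", "\u009c"    # UCS/Unicode characters

MARC8_TO_UCS = {MARC8_NSB: UCS_NSB, MARC8_NSE: UCS_NSE}


def convert_nonsorting_controls(marc8_bytes, decode_rest):
    """Map the two MARC-8 non-sorting bytes to their UCS equivalents.

    decode_rest is a hypothetical callback standing in for the real
    MARC-8 decoder; it is handed each run of bytes between controls.
    """
    out, run = [], bytearray()
    for b in marc8_bytes:
        if b in MARC8_TO_UCS:
            if run:
                # Flush the pending ordinary bytes through the decoder.
                out.append(decode_rest(bytes(run)))
                run.clear()
            out.append(MARC8_TO_UCS[b])
        else:
            run.append(b)
    if run:
        out.append(decode_rest(bytes(run)))
    return "".join(out)


# ASCII-only toy input; real records need a full MARC-8 decoder here.
title = convert_nonsorting_controls(b"\x88The \x89Hobbit",
                                    lambda bs: bs.decode("ascii"))
```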

-Tod

On Apr 11, 2014, at 4:08 PM, Robert Haschart <rh9ec@virginia.edu> wrote:

<rant>

The idea of providing a version of some data that should be used for sorting is a reasonable goal.  The proposed solution, marking the sections that shouldn't be used for sorting inline within the data using new invisible control characters, is a bad idea.  Demonstrating their use with visible characters in the description is an even worse idea.  I'm sure someone, somewhere thought this would be a good idea, but they are probably not the ones who will have to edit the data to specify the non-sorting sections, change how the editor works to support this, or modify the programs that will then have to deal with this newly introduced misfeature.

How do you type an invisible character that doesn't exist on any keyboard?  How do you allow a user to see where the invisible character they typed is?
Simple.  Choose some other visible character to represent that invisible character in the editor, and change those visible characters to the control characters as the record is saved, just as many editors do for the invisible sub-field separator character, which they represent with a '$' or a '|'.  And doubtless some implementations will gleefully transform all occurrences of the visible surrogates to the corresponding invisible control characters, even where the visible character is intended to represent itself rather than serve as a surrogate for the control character.  That is exactly what some existing commercial ILS software packages do when the '|' is used as a visible surrogate for the invisible sub-field separator character.
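
The failure mode can be shown in a few lines of Python. This is only an illustration: the curly braces as visible surrogates and the backslash escape are assumptions made for the example, not something any particular editor or the proposal specifies.

```python
NSB, NSE = "\u0098", "\u009c"  # UCS non-sorting begin / end


def naive_save(text, begin="{", end="}"):
    """Blindly turn every visible surrogate into a control character."""
    return text.replace(begin, NSB).replace(end, NSE)


def escaped_save(text, begin="{", end="}", escape="\\"):
    """Same conversion, but an escaped surrogate is kept as a literal."""
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == escape and nxt and nxt in (begin, end, escape):
            out.append(nxt)  # keep the escaped character as itself
            i += 2
            continue
        out.append(NSB if ch == begin else NSE if ch == end else ch)
        i += 1
    return "".join(out)


# The title really contains braces; the naive version swallows them.
clobbered = naive_save("{The }set {a, b}")
preserved = escaped_save("{The }set \\{a, b\\}")
```

The escape-aware version keeps the literal braces, but only at the cost of teaching every cataloger and every tool yet another convention.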

This seems a solution designed to maintain backward compatibility with design choices that were made decades ago in support of the requirements of the time:
1) Make sure the MARC records are small, because memory and disk space are really expensive, and
2) Ensure that if you discard the control codes and labels, the record is suitable for printing as-is onto a small piece of card stock.

The simple fact that we are saddled with decisions made decades ago for reasons that are no longer relevant is no valid justification for making new bad decisions that adhere to those same reasons.
 
</rant>

If this extension cannot be smothered in the proverbial cradle, then Tod's proposed approach seems like the quickest way to handle this.  
A better way might be at a lower level in Marc4j, such that each field/subfield that supports this new "feature" would support getters such as "getSortableData", "getPrintableData" and "getRawData" instead of the current, simple "getData".  However, that would require extensive changes throughout the Marc4j library, and extensive changes by every program that uses the Marc4j library.  All for very little benefit, since I foresee that implementation of this proposal in actual records in actual libraries will be so slow and sparse that special-case string handling folded into SolrMarc, as Tod proposes, will likely be sufficient until binary Marc records finally disappear altogether.
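
A rough Python sketch of both options, for illustration only. The Subfield class below is a hypothetical stand-in, not the Marc4j class, and its three methods are snake_case analogues of the getter names above. The UCS code points (0098 and 009C) are taken from the LC page and assumed correct. The module-level strip_nonsorting function is roughly all that the special-case handling in an indexing script would need.

```python
import re

NSB, NSE = "\u0098", "\u009c"  # UCS non-sorting begin / end (assumed per LC page)
_NONSORT_SPAN = re.compile(NSB + ".*?" + NSE, re.DOTALL)


def strip_nonsorting(data):
    """Drop each NSB...NSE span, then any stray unpaired control character."""
    without_spans = _NONSORT_SPAN.sub("", data)
    return without_spans.replace(NSB, "").replace(NSE, "")


class Subfield:
    """Hypothetical stand-in for a subfield; not the Marc4j API."""

    def __init__(self, code, data):
        self.code = code
        self._data = data

    def get_raw_data(self):
        # The data exactly as stored, control characters included.
        return self._data

    def get_printable_data(self):
        # Keep the non-sorting text but hide the invisible markers.
        return self._data.replace(NSB, "").replace(NSE, "")

    def get_sortable_data(self):
        # Remove the marked spans entirely.
        return strip_nonsorting(self._data)


sf = Subfield("a", NSB + "The " + NSE + "Hobbit")
```

With this, sf.get_printable_data() gives "The Hobbit" and sf.get_sortable_data() gives "Hobbit", while sf.get_raw_data() still carries both control characters.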

-Bob Haschart



On 4/11/2014 1:12 PM, Joe Atzberger wrote:
This is hugely problematic for MARC toolchains.  Not fun stuff.


On Thu, Apr 10, 2014 at 9:18 AM, Demian Katz <demian.katz@villanova.edu> wrote:
That's the first I've heard of this extension. I'm a bit surprised that this change was approved -- seems like it's going to create a lot of work for a lot of people! In any case, I agree with Tod's proposed approach. I'm also copying solrmarc-tech in case anyone there has already done work on this.

- Demian

> -----Original Message-----
> From: Tod Olson [mailto:tod@uchicago.edu]
> Sent: Thursday, April 10, 2014 8:03 AM
> To: Frédéric Demians
> Cc: vufind-tech
> Subject: Re: [VuFind-Tech] Non-Sorting Control Characters
>
> For title, the script that populates the title_sort field currently honors
> the non-filing indicator. I suggest modifying that script to also honor the
> non-sorting characters and contributing that back as a patch.
>
> To apply this to author, you would want to establish a separate Solr field
> for the sortable version of the author and arrange for it to be populated
> by a script that trims the non-sorting characters. If more MARC records
> start to include those characters, that could also be a useful patch to
> contribute.
>
> Best,
>
> -Tod
>
> On Apr 10, 2014, at 3:07 AM, Frédéric Demians <f.demians@tamil.fr> wrote:
>
> > Hi,
> >
> > Non-sorting characters are available in biblio records:
> >
> >  http://www.loc.gov/marc/nonsorting.html
> >
> > As far as I know, they aren't taken into account in VuFind for
> > sorting by author/title. Is anyone working on adding non-sorting
> > character support to VuFind? If not, does anyone have suggestions on how
> > to implement this functionality?
> >
> > Kind regards,
> > --
> > Frédéric DEMIANS
> > http://www.tamil.fr/fdemians
> >
> > ------------------------------------------------------------------------------
> > Put Bad Developers to Shame
> > Dominate Development with Jenkins Continuous Integration
> > Continuously Automate Build, Test & Deployment
> > Start a new project now. Try Jenkins in the cloud.
> > http://p.sf.net/sfu/13600_Cloudbees
> > _______________________________________________
> > Vufind-tech mailing list
> > Vufind-tech@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/vufind-tech
>
>

--
You received this message because you are subscribed to the Google Groups "solrmarc-tech" group.
To unsubscribe from this group and stop receiving emails from it, send an email to solrmarc-tech+unsubscribe@googlegroups.com.
To post to this group, send email to solrmarc-tech@googlegroups.com.
Visit this group at http://groups.google.com/group/solrmarc-tech.
For more options, visit https://groups.google.com/d/optout.

