That makes sense to me.  Do you plan to add additional switches for these things or just handle them through environment variables?

- Demian

From: Tod Olson []
Sent: Friday, December 14, 2012 11:25 AM
To: Demian Katz
Cc: Tod Olson; Tech Mailinglist
Subject: Re: VF2.0 import scripts taking more that one file

Yes, I'm coming to the conclusion that the -p is a waste of time, but the logging, basepath, and no-move options are a good use of time.


On Dec 13, 2012, at 9:55 AM, Demian Katz <demian.katz@VILLANOVA.EDU> wrote:

Even if you simply revert your changes, the scripts are not really inconsistent in the sense that they do not use the same syntax to do different things. supports a switch that does not, and supports a second parameter that does not.
If you want consistent interfaces between and, then the thing to do would be:
1.)    Add -p support to so that users can override the default PROPERTIES_FILE value
2.)    Add a second parameter to so that users can provide additional mappings to be appended onto the MAPPINGS_FILE list
It’s really a question of whether this offers any value for anyone.  It certainly wouldn’t hurt to add these things, but if nobody uses them, it’s a waste of your time.
Does that make sense?
- Demian
From: Tod Olson [] 
Sent: Thursday, December 13, 2012 10:48 AM
To: Demian Katz
Cc: Tod Olson; Tech Mailinglist
Subject: Re: VF2.0 import scripts taking more that one file
Re. JIRA: good, will do.
Re. parameter handling: aha! Thanks for seeing that. in this area, I can easily change the -p to be correct, or I could just revert. Do you have a sense of which makes more sense? 
For me, while I like consistency in the command-line scripts, the worst outcome would be altering the scripts in a way that would not be rolled back into the trunk.
On Dec 13, 2012, at 8:01 AM, Demian Katz <>

Thanks for sharing this.  A couple of comments:
- It probably wouldn’t hurt to open a JIRA ticket for this; I don’t want to commit anything until I have time to port changes to the Windows batch versions for consistency, and it may be a while before I have time for that… so having a ticket will prevent it from getting lost and forgotten.
- The parameter handling in should probably be reverted or changed.  There are two different properties files used by SolrMarc: the “import properties” (which is general settings for the application) and the “marc properties” (which is the mappings for importing).  The -p parameter to is used to set “import properties,” but you have changed so that it instead sets “marc properties.”  If you want to implement -p in, it should actually affect the PROPERTIES_FILE variable, not the MAPPINGS_FILE variable.  The optional mapping overrides should probably remain an optional second parameter for backward compatibility.
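
The split described above can be sketched as follows. This is a minimal illustration, not the actual script: only PROPERTIES_FILE and MAPPINGS_FILE come from the thread, while the function name parse_import_args, the INPUT variable, the default file names, and the comma used to join mappings are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch: -p sets the "import properties" file, while an
# optional second positional parameter appends extra "marc properties".

parse_import_args() {
    PROPERTIES_FILE="import.properties"   # placeholder default (general settings)
    MAPPINGS_FILE="marc.properties"       # placeholder default (import mappings)

    OPTIND=1                              # reset so the function can be called again
    while getopts "p:" opt; do
        case "$opt" in
            p) PROPERTIES_FILE="$OPTARG" ;;   # -p touches import properties only
            *) return 1 ;;                    # unknown switch
        esac
    done
    shift $((OPTIND - 1))

    # First positional parameter: the file to import (INPUT is an assumed name).
    INPUT="$1"

    # Optional second parameter: extra mappings appended to the list.
    # The comma separator is an assumption for this sketch.
    if [ -n "$2" ]; then
        MAPPINGS_FILE="$MAPPINGS_FILE,$2"
    fi
}
```

Called as `parse_import_args -p custom.properties records.mrc extra.properties`, this leaves the mapping override in its backward-compatible position while -p changes only PROPERTIES_FILE.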
From: Tod Olson [] 
Sent: Wednesday, December 12, 2012 3:18 PM
To: Demian Katz
Cc: Tod Olson; Tech Mailinglist
Subject: Re: VF2.0 import scripts taking more that one file

Here's a patch. Let me know if you'd prefer this as a JIRA ticket.

The patch does the following:

- now takes a -p option to specify the properties file, matching
- batch-import-marc*.sh captures stderr to log file, stdout is not captured
- echoes the command to stderr, so it gets logged with the solrmarc messages
- per-input-file logs by default, setting LOG_FILE sends entire run to one log file
- output to log files now appends, so above is possible

The handling of LOG_FILE is a bit schizophrenic, with per-file logs vs. one big log, but it allows LOG_FILE=/dev/null to send all to the bit bucket. But the switch to appending may create log maintenance issues for some sites. I'm quite open to revising this. I could also create a command-line switch for this.
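
A rough sketch of the LOG_FILE behavior described above, under stated assumptions: the helper names log_for and run_import are hypothetical, and the per-file log name (input path plus ".log") is a placeholder rather than what the patch actually uses.

```shell
#!/bin/sh
# Sketch only: log_for and run_import are hypothetical helper names.

# Choose the log for one input file: LOG_FILE if set, else a per-file log.
log_for() {
    if [ -n "${LOG_FILE:-}" ]; then
        printf '%s\n' "$LOG_FILE"
    else
        printf '%s.log\n' "$1"    # placeholder per-file naming scheme
    fi
}

# Run a command for one input file. stdout is left alone (terminal);
# stderr, including an echo of the command itself, is appended to the log.
run_import() {
    file="$1"; shift
    log=$(log_for "$file")
    echo "Now Importing $file ..."
    {
        echo "$@" >&2     # the command line is logged with the other messages
        "$@"
    } 2>>"$log"           # append, so one LOG_FILE can cover a whole run
}
```

Because the redirection appends, setting LOG_FILE to a single path collects every file's messages in one place, and LOG_FILE=/dev/null discards them.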

You mentioned that the harvest directory is hard-coded, and allowing an override would be nice. The obvious way to do that would be to allow BASEPATH in the environment to take precedence.
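
One way the environment override might look, as a sketch: DEFAULT_BASEPATH and resolve_basepath are hypothetical names, and the default path shown is a placeholder, not the value the scripts actually hard-code.

```shell
#!/bin/sh
# Sketch: the default below is a placeholder, not the scripts' real value.
DEFAULT_BASEPATH="/usr/local/vufind/harvest"

resolve_basepath() {
    # Use BASEPATH from the environment when it is set and non-empty;
    # otherwise fall back to the built-in default.
    printf '%s\n' "${BASEPATH:-$DEFAULT_BASEPATH}"
}
```

With this shape, a site could run the script as `BASEPATH=/some/other/dir ./script.sh` without editing it, and existing installations that never set BASEPATH would see no change.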


On Dec 12, 2012, at 8:58 AM, Demian Katz <demian.katz@VILLANOVA.EDU> wrote:

> I don't have a strong preference, except that I think it would be wise to avoid merging the stdout/stderr streams when generating logs -- it's probably useful to keep that granularity in the form of multiple logs if nothing else.  I do think you're right that there may be some value in leaving the "Now importing" stuff as the stdout stream and capturing the rest to logs...
> - Demian
>> -----Original Message-----
>> From: Tod Olson []
>> Sent: Tuesday, December 11, 2012 5:08 PM
>> To: Demian Katz
>> Cc: Tod Olson; Tech Mailinglist
>> Subject: Re: VF2.0 import scripts taking more that one file
>> Returning to capturing output from the harvest scripts, I'd like some input on
>> a minor point.
>> Currently stdout gets informative messages like so:
>> Now Importing /data/magma/vufind2/local/harvest/auth/auth_full_marc_utf-8_00_121206230000.mrc ...
>> /usr/local/bin/java -Xms512m -Xmx512m -
>> /magma/vufind2/import/ -jar
>> /data/magma/vufind2/import/SolrMarc.jar
>> /data/magma/vufind2/local/import/
>> /data/magma/vufind2/local/harvest/auth/auth_full_marc_utf-8_00_121206230000.mrc
>> and all of the solrmarc messages (record number, stack traces on failure,
>> etc.) go to stderr.
>> I kind of think that the options are:
>> (a) everything goes to a log file,
>> (b) stdout can go to the terminal and stderr should go to the log file, or
>> (c) maybe that chatty "Now importing..." goes to stdout/terminal and all else
>> goes to stderr/the log.
>> Are there any strong feelings about which is the right way? Personally, I'm
>> kind of inclined towards (c), but maybe sites who are in production have a
>> different view.
>> -Tod
>> On Oct 31, 2012, at 12:32 PM, Demian Katz <demian.katz@VILLANOVA.EDU> wrote:
>>> I don't have a problem with changing the batch MARC import scripts to capture stderr; I believe that when they were originally written, all SolrMarc output was written to stdout -- it began using stderr more appropriately in relatively recent updates.
>>> The only other refactoring you might need to do is to allow a way of specifying a full directory path -- right now, the scripts assume that all files live under VuFind's harvest directory, but in a situation not linked to the OAI harvester, the files might be somewhere else.  You might also want to add a switch to disable the "move to processed directory" functionality and/or a switch to control logging (i.e. optionally disable by sending to null).
>>> - Demian
>>>> -----Original Message-----
>>>> From: Tod Olson []
>>>> Sent: Wednesday, October 31, 2012 1:14 PM
>>>> To: Demian Katz
>>>> Cc: Tod Olson; Tech Mailinglist
>>>> Subject: Re: VF2.0 import scripts taking more that one file
>>>> Yes, looking at the harvest/ scripts for marc records, I see that stdout is directed to an output file, but stderr is not written to disk. Since stderr has all of the error info, I'm inclined to capture it. I can also see where people would not want the error logs taking up disk space, since there's a message for every record. But sending all output to a file is a little more cron-friendly.
>>>> I may be willing to refactor a couple of those batch scripts (no commitment
>>>> yet), but I'd like a little input on what sort of requirements other sites
>>>> would have.
>>>> -Tod
>>>> On Oct 31, 2012, at 11:22 AM, Tod Olson <> wrote:
>>>>> Aha, I'd dismissed harvest as exclusively the province of OAI. Thanks for correcting that.
>>>>> I'll pop a patch into JIRA when I can.
>>>>> -Tod
>>>>> On Oct 30, 2012, at 10:52 PM, Demian Katz <> wrote:
>>>>>> There are batch import scripts in the harvest directory -- you might be able to use those.  If not, perhaps some refactoring can make all the existing tools more flexible.  Also, if you add -p support to the auth script, please submit a patch and I'll be happy to merge that into master.
>>>>>> thanks,
>>>>>> Demian
>>>>>> ________________________________________
>>>>>> From: Tod Olson []
>>>>>> Sent: Tuesday, October 30, 2012 8:06 PM
>>>>>> To: Tech Mailinglist
>>>>>> Subject: [VuFind-Tech] VF2.0 import scripts taking more that one file
>>>>>> I find that it would be useful for my site if the and
>>>> I could easily hack those two shell scripts to take
>> some
>>>> arbitrary number of files as arguments and loop over them, and submit a
>> patch.
>>>> Would that be of use to other sites?
>>>>>> Otherwise, I'll just write wrappers around them for local use.
>>>>>> The one interface change that I'd want to implement: it would be easier
>> if
>>>> I changed to take a profile file with a -p argument
>> like
>>>>>> -Tod
>>>>>> Tod Olson <>
>>>>>> Systems Librarian
>>>>>> University of Chicago Library
>>>>>> _______________________________________________
>>>>>> Vufind-tech mailing list