From: Demian K. <dem...@vi...> - 2012-12-13 14:01:31
Thanks for sharing this. A couple of comments:

- It probably wouldn't hurt to open a JIRA ticket for this; I don't want to commit anything until I have time to port changes to the Windows batch versions for consistency, and it may be a while before I have time for that... so having a ticket will prevent it from getting lost and forgotten.

- The parameter handling in import-marc-auth.sh should probably be reverted or changed. There are two different properties files used by SolrMarc: the "import properties" (general settings for the application) and the "marc properties" (the mappings for importing). The -p parameter to import-marc.sh is used to set "import properties," but you have changed import-marc-auth.sh so that it instead sets "marc properties." If you want to implement -p in import-marc-auth.sh, it should actually affect the PROPERTIES_FILE variable, not the MAPPINGS_FILE variable. The optional mapping overrides should probably remain an optional second parameter for backward compatibility.

thanks,
Demian

From: Tod Olson [mailto:to...@uc...]
Sent: Wednesday, December 12, 2012 3:18 PM
To: Demian Katz
Cc: Tod Olson; vuf...@li... Tech Mailinglist
Subject: Re: VF2.0 import scripts taking more that one file

Here's a patch. Let me know if you'd prefer this as a JIRA ticket.

The patch does the following:

- import-marc-auth.sh now takes a -p option to specify the properties file, matching import-marc.sh
- batch-import-marc*.sh captures stderr to log file, stdout is not captured
- import-marc.sh echoes the command to stderr, so it gets logged with the solrmarc messages
- per-input-file logs by default, setting LOG_FILE sends entire run to one log file
- output to log files now appends, so above is possible

The handling of LOG_FILE is a bit schizophrenic, with per-file logs vs. one big log, but it allows LOG_FILE=/dev/null to send all to the bit bucket. But the switch to appending may create log maintenance issues for some sites.
I'm quite open to revising this. I could also create a command-line switch for this.

You mentioned that the harvest directory is hard-coded, and allowing an override would be nice. The obvious way to do that would be to allow BASEPATH in the environment to take precedence.

-Tod

On Dec 12, 2012, at 8:58 AM, Demian Katz <demian.katz@VILLANOVA.EDU> wrote:

> I don't have a strong preference, except that I think it would be wise to avoid merging the stdout/stderr streams when generating logs -- it's probably useful to keep that granularity in the form of multiple logs if nothing else. I do think you're right that there may be some value in leaving the "Now importing" stuff as the stdout stream and capturing the rest to logs...
>
> - Demian
>
>> -----Original Message-----
>> From: Tod Olson [mailto:to...@uc...]
>> Sent: Tuesday, December 11, 2012 5:08 PM
>> To: Demian Katz
>> Cc: Tod Olson; vuf...@li... Tech Mailinglist
>> Subject: Re: VF2.0 import scripts taking more that one file
>>
>> Returning to capturing output from the harvest scripts, I'd like some input on
>> a minor point.
>>
>> Currently stdout gets informative messages like so:
>>
>> Now Importing /data/magma/vufind2/local/harvest/auth/auth_full_marc_utf-8_00_121206230000.mrc ...
>> /usr/local/bin/java -Xms512m -Xmx512m -Dsolr.core.name=authority -Dsolr.indexer.properties=/data/magma/vufind2/import/marc_auth.properties,/data/magma/vufind2/import/marc_auth.properties -jar /data/magma/vufind2/import/SolrMarc.jar /data/magma/vufind2/local/import/import_auth.properties /data/magma/vufind2/local/harvest/auth/auth_full_marc_utf-8_00_121206230000.mrc
>>
>> and all of the solrmarc messages (record number, stack traces on failure,
>> etc.) go to stderr.
>>
>> I kind of think that the options are:
>> (a) everything goes to a log file,
>> (b) stdout can go to the terminal and stderr should go to the log file, or
>> (c) maybe that chatty "Now importing..." goes to stdout/terminal and all else
>> goes to stderr/the log.
>>
>> Are there any strong feelings about which is the right way? Personally, I'm
>> kind of inclined towards (c), but maybe sites who are in production have a
>> different view.
>>
>> -Tod
>>
>> On Oct 31, 2012, at 12:32 PM, Demian Katz <demian.katz@VILLANOVA.EDU> wrote:
>>
>>> I don't have a problem with changing the batch MARC import scripts to
>>> capture stderr; I believe that when they were originally written, all SolrMarc
>>> output was written to stdout -- it began using stderr more appropriately in
>>> relatively recent updates.
>>>
>>> The only other refactoring you might need to do is to allow a way of
>>> specifying a full directory path -- right now, the scripts assume that all
>>> files live under VuFind's harvest directory, but in a situation not linked to
>>> the OAI harvester, the files might be somewhere else. You might also want to
>>> add a switch to disable the "move to processed directory" functionality and/or
>>> a switch to control logging (i.e. optionally disable by sending to null).
>>>
>>> - Demian
>>>
>>>> -----Original Message-----
>>>> From: Tod Olson [mailto:to...@uc...]
>>>> Sent: Wednesday, October 31, 2012 1:14 PM
>>>> To: Demian Katz
>>>> Cc: Tod Olson; vuf...@li... Tech Mailinglist
>>>> Subject: Re: VF2.0 import scripts taking more that one file
>>>>
>>>> Yes, looking at the harvest/ scripts for marc records, I see that stdout
>>>> is directed to an output file, but stderr is not written to disk. Since stderr
>>>> has all of the error info, I'm inclined to capture it. I can also see where
>>>> people would not want the error logs taking up disk space, since there's a
>>>> message for every record. But sending all output to a file is a little more
>>>> cron-friendly.
>>>>
>>>> I may be willing to refactor a couple of those batch scripts (no commitment
>>>> yet), but I'd like a little input on what sort of requirements other sites
>>>> would have.
>>>>
>>>> -Tod
>>>>
>>>> On Oct 31, 2012, at 11:22 AM, Tod Olson <to...@uc...>
>>>> wrote:
>>>>
>>>>> Aha, I'd dismissed harvest as exclusively the province of OAI. Thanks for
>>>>> correcting that.
>>>>>
>>>>> I'll pop a patch into JIRA when I can.
>>>>>
>>>>> -Tod
>>>>>
>>>>> On Oct 30, 2012, at 10:52 PM, Demian Katz <dem...@vi...>
>>>>> wrote:
>>>>>
>>>>>> There are batch import scripts in the harvest directory -- you might be
>>>>>> able to use those. If not, perhaps some refactoring can make all the existing
>>>>>> tools more flexible. Also, if you add -p support to the auth script, please
>>>>>> submit a patch and I'll be happy to merge that into master.
>>>>>>
>>>>>> thanks,
>>>>>> Demian
>>>>>> ________________________________________
>>>>>> From: Tod Olson [to...@uc...]
>>>>>> Sent: Tuesday, October 30, 2012 8:06 PM
>>>>>> To: vuf...@li... Tech Mailinglist
>>>>>> Subject: [VuFind-Tech] VF2.0 import scripts taking more that one file
>>>>>>
>>>>>> I find that it would be useful for my site if import-marc.sh and
>>>>>> import-marc-auth.sh could take more than one file. I could easily hack
>>>>>> those two shell scripts to take some arbitrary number of files as
>>>>>> arguments and loop over them, and submit a patch.
>>>>>> Would that be of use to other sites?
>>>>>>
>>>>>> Otherwise, I'll just write wrappers around them for local use.
>>>>>>
>>>>>> The one interface change that I'd want to implement: it would be easier if
>>>>>> I changed import-marc-auth.sh to take a properties file with a -p argument
>>>>>> like import-marc.sh.
>>>>>>
>>>>>> -Tod
>>>>>>
>>>>>> Tod Olson <to...@uc...>
>>>>>> Systems Librarian
>>>>>> University of Chicago Library
>>>>>>
>>>>>> _______________________________________________
>>>>>> Vufind-tech mailing list
>>>>>> Vuf...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/vufind-tech
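
The parameter handling and LOG_FILE behavior discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function names parse_auth_args and log_for, the default file names, the LOG_DIR variable, and the per-file log naming are all hypothetical. Only -p, PROPERTIES_FILE, MAPPINGS_FILE, and LOG_FILE come from the messages above. It also assumes that the optional second parameter is appended to the mappings as a comma-separated list, which the -Dsolr.indexer.properties value quoted in the thread suggests but does not confirm.

```shell
#!/bin/sh
# Sketch only. parse_auth_args, log_for, LOG_DIR and the defaults below are
# hypothetical; -p, PROPERTIES_FILE, MAPPINGS_FILE and LOG_FILE are from the thread.

PROPERTIES_FILE="import_auth.properties"   # "import properties": general settings
MAPPINGS_FILE="marc_auth.properties"       # "marc properties": import mappings

# Usage: parse_auth_args [-p properties_file] marc_file [mapping_overrides]
parse_auth_args() {
    OPTIND=1
    while getopts ":p:" opt; do
        case "$opt" in
            # -p sets the import properties, never the mappings.
            p) PROPERTIES_FILE="$OPTARG" ;;
            *) echo "usage: [-p properties_file] marc_file [mapping_overrides]" >&2
               return 1 ;;
        esac
    done
    shift $((OPTIND - 1))

    if [ -z "${1:-}" ]; then
        echo "No MARC file specified." >&2
        return 1
    fi
    MARC_FILE="$1"

    # Optional second positional parameter keeps the older calling convention:
    # extra mappings are appended (assumed comma-separated).
    if [ -n "${2:-}" ]; then
        MAPPINGS_FILE="$MAPPINGS_FILE,$2"
    fi
}

# Print the log file to use for one input file: LOG_FILE if it is set,
# otherwise a per-input log named after the file (naming is hypothetical).
log_for() {
    if [ -n "${LOG_FILE:-}" ]; then
        printf '%s\n' "$LOG_FILE"
    else
        printf '%s\n' "${LOG_DIR:-.}/$(basename "$1").log"
    fi
}
```

A caller could then run something like `run_import "$MARC_FILE" 2>> "$(log_for "$MARC_FILE")"`, where run_import is a stand-in for the actual SolrMarc invocation, so that stderr is appended to the chosen log while stdout stays on the terminal. Setting LOG_FILE=/dev/null discards the log output.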