From: Tod O. <to...@uc...> - 2012-10-31 00:06:38
|
I find that it would be useful for my site if import-marc.sh and import-marc-auth.sh could take more than one file. I could easily hack those two shell scripts to take an arbitrary number of files as arguments and loop over them, and submit a patch. Would that be of use to other sites? Otherwise, I'll just write wrappers around them for local use. The one interface change that I'd want to implement: it would be easier if I changed import-marc-auth.sh to take a properties file with a -p argument like import-marc.sh. -Tod Tod Olson <to...@uc...> Systems Librarian University of Chicago Library |
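The local wrapper described in this message could look roughly like the sketch below. It is only an illustration: `import_each` and the `IMPORTER` variable are hypothetical names, and the importer is assumed to accept a single MARC file as its argument.

```shell
# Hypothetical wrapper: run an importer once per input file.
# IMPORTER is an assumed variable; point it at import-marc.sh or
# import-marc-auth.sh in a real installation.
IMPORTER="${IMPORTER:-./import-marc.sh}"

import_each() {
  status=0
  for file in "$@"; do
    if [ ! -f "$file" ]; then
      echo "Skipping missing file: $file" >&2
      status=1
      continue
    fi
    # Record a failure but keep going with the remaining files.
    "$IMPORTER" "$file" || status=1
  done
  return "$status"
}

# Example: import_each /path/to/a.mrc /path/to/b.mrc
```

Continuing after a failed file, and reporting the failure through the exit status, is a design choice that tends to suit unattended runs; a site might equally prefer to stop at the first error.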
From: Demian K. <dem...@vi...> - 2012-10-31 03:53:04
|
There are batch import scripts in the harvest directory -- you might be able to use those. If not, perhaps some refactoring can make all the existing tools more flexible. Also, if you add -p support to the auth script, please submit a patch and I'll be happy to merge that into master. thanks, Demian ________________________________________ From: Tod Olson [to...@uc...] Sent: Tuesday, October 30, 2012 8:06 PM To: vuf...@li... Tech Mailinglist Subject: [VuFind-Tech] VF2.0 import scripts taking more that one file _______________________________________________ Vufind-tech mailing list Vuf...@li... https://lists.sourceforge.net/lists/listinfo/vufind-tech |
From: Tod O. <to...@uc...> - 2012-10-31 16:22:48
|
Aha, I'd dismissed harvest as exclusively the province of OAI. Thanks for correcting that. I'll pop a patch into JIRA when I can. -Tod |
From: Tod O. <to...@uc...> - 2012-10-31 17:14:31
|
Yes, looking at the harvest/ scripts for MARC records, I see that stdout is directed to an output file, but stderr is not written to disk. Since stderr has all of the error info, I'm inclined to capture it. I can also see where people would not want the error logs taking up disk space, since there's a message for every record. But sending all output to a file is a little more cron-friendly. I may be willing to refactor a couple of those batch scripts (no commitment yet), but I'd like a little input on what sort of requirements other sites would have. -Tod |
From: Demian K. <dem...@vi...> - 2012-10-31 17:32:15
|
I don't have a problem with changing the batch MARC import scripts to capture stderr; I believe that when they were originally written, all SolrMarc output was written to stdout -- it began using stderr more appropriately in relatively recent updates. The only other refactoring you might need to do is to allow a way of specifying a full directory path -- right now, the scripts assume that all files live under VuFind's harvest directory, but in a situation not linked to the OAI harvester, the files might be somewhere else. You might also want to add a switch to disable the "move to processed directory" functionality and/or a switch to control logging (i.e. optionally disable by sending to null). - Demian |
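The three switches suggested in this message (a full directory path, disabling the move to the processed directory, and controlling logging) might be parsed along these lines. This is a sketch under assumptions: the flags `-d`, `-n` and `-q`, the function name, and the variable names are all invented for illustration and are not options of the actual batch scripts.

```shell
# Hypothetical option parsing for a batch import script. The flags -d, -n
# and -q are illustrative assumptions, not options of the real scripts.
parse_batch_options() {
  SOURCE_DIR="harvest"   # assumed default: files live under the harvest directory
  MOVE_PROCESSED=1       # assumed default: move files to "processed" after import
  LOGGING=1              # assumed default: write log files
  OPTIND=1               # reset so the function can be called more than once
  while getopts "d:nq" opt "$@"; do
    case "$opt" in
      d) SOURCE_DIR="$OPTARG" ;;   # full path to the directory of MARC files
      n) MOVE_PROCESSED=0 ;;       # leave imported files where they are
      q) LOGGING=0 ;;              # caller would send log output to /dev/null
      *) echo "Usage: [-d dir] [-n] [-q]" >&2; return 1 ;;
    esac
  done
}

# Example: parse_batch_options -d /data/marc/incoming -n
```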
From: Tod O. <to...@uc...> - 2012-12-11 22:08:02
|
Returning to capturing output from the harvest scripts, I'd like some input on a minor point.

Currently stdout gets informative messages like so:

Now Importing /data/magma/vufind2/local/harvest/auth/auth_full_marc_utf-8_00_121206230000.mrc ...
/usr/local/bin/java -Xms512m -Xmx512m -Dsolr.core.name=authority -Dsolr.indexer.properties=/data/magma/vufind2/import/marc_auth.properties,/data/magma/vufind2/import/marc_auth.properties -jar /data/magma/vufind2/import/SolrMarc.jar /data/magma/vufind2/local/import/import_auth.properties /data/magma/vufind2/local/harvest/auth/auth_full_marc_utf-8_00_121206230000.mrc

and all of the solrmarc messages (record number, stack traces on failure, etc.) go to stderr.

I kind of think that the options are:
(a) everything goes to a log file,
(b) stdout can go to the terminal and stderr should go to the log file, or
(c) maybe that chatty "Now importing..." goes to stdout/terminal and all else goes to stderr/the log.

Are there any strong feelings about which is the right way? Personally, I'm kind of inclined towards (c), but maybe sites who are in production have a different view.

-Tod |
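Option (c) from this message can be sketched as follows. The function `run_import` and its argument order are hypothetical, and the sketch assumes the importer already writes everything other than the progress line to stderr.

```shell
# Sketch of option (c): the progress line stays on stdout, and whatever the
# importer writes to stderr is appended to a log file.
# run_import and its argument order are hypothetical.
run_import() {
  log_file="$1"
  marc_file="$2"
  shift 2                              # the remaining words are the importer command
  echo "Now Importing $marc_file ..."  # stays on stdout / the terminal
  "$@" "$marc_file" 2>> "$log_file"    # only stderr is appended to the log
}

# Example: run_import /tmp/auth.log /path/to/file.mrc ./import-marc-auth.sh
```

Under cron, the short stdout line would typically end up in the job's mail or be discarded, while the per-record detail lands in the log.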
From: Demian K. <dem...@vi...> - 2012-12-12 14:58:57
|
I don't have a strong preference, except that I think it would be wise to avoid merging the stdout/stderr streams when generating logs -- it's probably useful to keep that granularity in the form of multiple logs if nothing else. I do think you're right that there may be some value in leaving the "Now importing" stuff as the stdout stream and capturing the rest to logs... - Demian |
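Keeping the two streams apart, as suggested in this reply, amounts to giving each its own redirection instead of using `2>&1`. A minimal sketch, in which the function name and the `.out`/`.err` suffixes are assumptions:

```shell
# Keep stdout and stderr in separate log files rather than merging them.
# run_with_split_logs and the .out/.err naming are hypothetical.
run_with_split_logs() {
  log_base="$1"
  shift                                          # the remaining words are the command
  "$@" >> "$log_base.out" 2>> "$log_base.err"    # append each stream to its own file
}

# Example: run_with_split_logs /tmp/import ./import-marc.sh /path/to/file.mrc
```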
From: Demian K. <dem...@vi...> - 2012-12-13 14:01:31
|
Thanks for sharing this. A couple of comments:

- It probably wouldn't hurt to open a JIRA ticket for this; I don't want to commit anything until I have time to port changes to the Windows batch versions for consistency, and it may be a while before I have time for that... so having a ticket will prevent it from getting lost and forgotten.
- The parameter handling in import-marc-auth.sh should probably be reverted or changed. There are two different properties files used by SolrMarc: the "import properties" (which holds general settings for the application) and the "marc properties" (which holds the mappings for importing). The -p parameter to import-marc.sh is used to set "import properties," but you have changed import-marc-auth.sh so that it instead sets "marc properties." If you want to implement -p in import-marc-auth.sh, it should actually affect the PROPERTIES_FILE variable, not the MAPPINGS_FILE variable. The optional mapping overrides should probably remain an optional second parameter for backward compatibility.

thanks,
Demian

From: Tod Olson [mailto:to...@uc...]
Sent: Wednesday, December 12, 2012 3:18 PM
To: Demian Katz
Cc: Tod Olson; vuf...@li... Tech Mailinglist
Subject: Re: VF2.0 import scripts taking more that one file

Here's a patch. Let me know if you'd prefer this as a JIRA ticket.

The patch does the following:
- import-marc-auth.sh now takes a -p option to specify the properties file, matching import-marc.sh
- batch-import-marc*.sh captures stderr to log file, stdout is not captured
- import-marc.sh echoes the command to stderr, so it gets logged with the solrmarc messages
- per-input-file logs by default, setting LOG_FILE sends entire run to one log file
- output to log files now appends, so the above is possible

The handling of LOG_FILE is a bit schizophrenic, with per-file logs vs. one big log, but it allows LOG_FILE=/dev/null to send all to the bit bucket. But the switch to appending may create log maintenance issues for some sites. I'm quite open to revising this. I could also create a command-line switch for this.

You mentioned that the harvest directory is hard-coded, and allowing an override would be nice. The obvious way to do that would be to allow BASEPATH in the environment to take precedence.

-Tod |
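The argument handling recommended in this reply (-p sets PROPERTIES_FILE, while the mapping overrides stay as an optional second positional parameter) might be sketched as below. This is not the code from the patch or from the real script: the function name is hypothetical, the default file names are borrowed from the command line quoted earlier in the thread and used only as placeholders, and how the real script combines an override with its default mappings is not shown.

```shell
# Sketch of argument handling for an authority import script.
# parse_auth_args is a hypothetical name; the default file names are
# placeholders taken from the command line quoted earlier in the thread.
parse_auth_args() {
  PROPERTIES_FILE="import_auth.properties"   # "import properties" (general settings)
  MAPPINGS_FILE="marc_auth.properties"       # "marc properties" (import mappings)
  OPTIND=1                                   # reset so the function can be reused
  while getopts "p:" opt "$@"; do
    case "$opt" in
      p) PROPERTIES_FILE="$OPTARG" ;;        # -p affects the import properties only
      *) echo "Usage: [-p properties_file] marc_file [mappings_file]" >&2; return 1 ;;
    esac
  done
  shift $((OPTIND - 1))
  MARC_FILE="${1:-}"
  # Optional second positional argument: mapping overrides, kept for
  # backward compatibility. Replacing the default is an assumption here.
  if [ -n "${2:-}" ]; then
    MAPPINGS_FILE="$2"
  fi
}

# Example: parse_auth_args -p /path/to/import_auth.properties file.mrc
```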
From: Tod O. <to...@uc...> - 2012-12-13 15:48:37
|
Re. JIRA: good, will do. Re. parameter handling: aha! Thanks for seeing that. in this area, I can easily change the -p to be correct, or I could just revert. Do you have a sense of which makes more sense? For me, while I like consistency in the command-line scripts, the worst outcome would be altering the scripts in a way that would not be rolled back into the trunk. -Tod On Dec 13, 2012, at 8:01 AM, Demian Katz <dem...@vi...<mailto:dem...@vi...>> wrote: Thanks for sharing this. A couple of comments: - It probably wouldn’t hurt to open a JIRA ticket for this; I don’t want to commit anything until I have time to port changes to the Windows batch versions for consistency, and it may be a while before I have time for that… so having a ticket will prevent it from getting lost and forgotten. - The parameter handling in import-marc-auth.sh should probably be reverted or changed. There are two different properties files used by SolrMarc: the “import properties” (which is general settings for the application) and the “marc properties” (which is the mappings for importing). The -p parameter to import-marc.sh is used to set “import properties,” but you have changed import-marc-auth.sh so that it instead sets “marc properties.” If you want to implement -p in import-marc-auth.sh, it should actually affect the PROPERTIES_FILE variable, not the MAPPINGS_FILE variable. The optional mapping overrides should probably remain an optional second parameter for backward compatibility. thanks, Demian From: Tod Olson [mailto:to...@uc...<http://uchicago.edu/>] Sent: Wednesday, December 12, 2012 3:18 PM To: Demian Katz Cc: Tod Olson; vuf...@li...<mailto:vuf...@li...> Tech Mailinglist Subject: Re: VF2.0 import scripts taking more that one file Here's a patch. Let me know if you'd prefer this as a JIRA ticket. 
The path does the following: - import-marc-auth.sh now takes a -p option to specify the properties file, matching import-marc.sh - batch-import-marc*.sh captures stderr to log file, stdout is not captured - import-marc.sh echoes the command to stderr, so it gets logged with the solrmarc messages - per-input-file logs by default, setting LOG_FILE sends entire run to one log file - output to log files now appends, so above is possible The handling of LOG_FILE is a bit schizophrenic, with per-file logs vs. one big log, but it allows LOG_FILE=/dev/null to send all to the bit bucket. But the switch to appending may create log maintenance issues for some sites. I'm quite open to revising this. I could also create a command-line switch for this. You mentioned that the harvest directory is hard-coded, and allowing an override would be nice. The obvious way to do that would be to allow BASEPATH in the environment to take precedent. -Tod On Dec 12, 2012, at 8:58 AM, Demian Katz <demian.katz@VILLANOVA.EDU<mailto:demian.katz@VILLANOVA.EDU>> wrote: > I don't have a strong preference, except that I think it would be wise to avoid merging the stdout/stderr streams when generating logs -- it's probably useful to keep that granularity in the form of multiple logs if nothing else. I do think you're right that there may be some value in leaving the "Now importing" stuff as the stdout stream and capturing the rest to logs... > > - Demian > >> -----Original Message----- >> From: Tod Olson [mailto:to...@uc...] >> Sent: Tuesday, December 11, 2012 5:08 PM >> To: Demian Katz >> Cc: Tod Olson; vuf...@li...<mailto:vuf...@li...> Tech Mailinglist >> Subject: Re: VF2.0 import scripts taking more that one file >> >> Returning to capturing output from the harvest scripts, I'd like some input on >> a minor point. >> >> Currently stdout gets informative messages like so: >> >> Now Importing /data/magma/vufind2/local/harvest/auth/auth_full_marc_utf- >> 8_00_121206230000.mrc ... 
>> /usr/local/bin/java -Xms512m -Xmx512m -Dsolr.core.name=authority -Dsolr.indexer.properties=/data/magma/vufind2/import/marc_auth.properties,/data/magma/vufind2/import/marc_auth.properties -jar /data/magma/vufind2/import/SolrMarc.jar /data/magma/vufind2/local/import/import_auth.properties /data/magma/vufind2/local/harvest/auth/auth_full_marc_utf-8_00_121206230000.mrc
>>
>> and all of the solrmarc messages (record number, stack traces on failure, etc.) go to stderr.
>>
>> I kind of think that the options are:
>> (a) everything goes to a log file,
>> (b) stdout can go to the terminal and stderr should go to the log file, or
>> (c) maybe that chatty "Now importing..." goes to stdout/terminal and all else goes to stderr/the log.
>>
>> Are there any strong feelings about which is the right way? Personally, I'm kind of inclined towards (c), but maybe sites who are in production have a different view.
>>
>> -Tod
>>
>> On Oct 31, 2012, at 12:32 PM, Demian Katz <demian.katz@VILLANOVA.EDU> wrote:
>>
>>> I don't have a problem with changing the batch MARC import scripts to capture stderr; I believe that when they were originally written, all SolrMarc output was written to stdout -- it began using stderr more appropriately in relatively recent updates.
>>>
>>> The only other refactoring you might need to do is to allow a way of specifying a full directory path -- right now, the scripts assume that all files live under VuFind's harvest directory, but in a situation not linked to the OAI harvester, the files might be somewhere else. You might also want to add a switch to disable the "move to processed directory" functionality and/or a switch to control logging (i.e. optionally disable by sending to null).
>>>
>>> - Demian
>>>
>>>> -----Original Message-----
>>>> From: Tod Olson [mailto:to...@uc...]
>>>> Sent: Wednesday, October 31, 2012 1:14 PM
>>>> To: Demian Katz
>>>> Cc: Tod Olson; vuf...@li... Tech Mailinglist
>>>> Subject: Re: VF2.0 import scripts taking more that one file
>>>>
>>>> Yes, looking at the harvest/ scripts for marc records, I see that stdout is directed to an output file, but stderr is not written to disk. Since stderr has all of the error info, I'm inclined to capture it. I can also see where people would not want the error logs taking up disc space, since there's a message for every record. But sending all output to a file is a little more cron-friendly.
>>>>
>>>> I may be willing to refactor a couple of those batch scripts (no commitment yet), but I'd like a little input on what sort of requirements other sites would have.
>>>>
>>>> -Tod
>>>>
>>>> On Oct 31, 2012, at 11:22 AM, Tod Olson <to...@uc...> wrote:
>>>>
>>>>> Aha, I'd dismissed harvest as exclusively the province of OAI. Thanks for correcting that.
>>>>>
>>>>> I'll pop a patch into JIRA when I can.
>>>>>
>>>>> -Tod
>>>>>
>>>>> On Oct 30, 2012, at 10:52 PM, Demian Katz <dem...@vi...> wrote:
>>>>>
>>>>>> There are batch import scripts in the harvest directory -- you might be able to use those. If not, perhaps some refactoring can make all the existing tools more flexible. Also, if you add -p support to the auth script, please submit a patch and I'll be happy to merge that into master.
>>>>>>
>>>>>> thanks,
>>>>>> Demian
>>>>>> ________________________________________
>>>>>> From: Tod Olson [to...@uc...]
>>>>>> Sent: Tuesday, October 30, 2012 8:06 PM
>>>>>> To: vuf...@li... Tech Mailinglist
>>>>>> Subject: [VuFind-Tech] VF2.0 import scripts taking more that one file
>>>>>>
>>>>>> I find that it would be useful for my site if the import-marc.sh and import-marc-auth.sh. I could easily hack those two shell scripts to take some arbitrary number of files as arguments and loop over them, and submit a patch. Would that be of use to other sites?
>>>>>>
>>>>>> Otherwise, I'll just write wrappers around them for local use.
>>>>>>
>>>>>> The one interface change that I'd want to implement: it would be easier if I changed import-marc-auth.sh to take a profile file with a -p argument like import-marc.sh.
>>>>>>
>>>>>> -Tod
>>>>>>
>>>>>> Tod Olson <to...@uc...>
>>>>>> Systems Librarian
>>>>>> University of Chicago Library
|
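Editorial sketch of the logging behaviour listed in the patch notes quoted above: per-input-file logs by default, one shared log when LOG_FILE is set, stderr appended to the log, stdout left on the terminal, and BASEPATH taken from the environment when present. This is not the patch itself. The default path, the log directory layout, and the helper names (log_for, import_one, batch_import) are assumptions made for illustration only.

```shell
#!/bin/sh
# Illustrative only; not the actual VuFind batch import script.

# BASEPATH from the environment takes precedence over the built-in default.
# The default below is a placeholder path.
BASEPATH="${BASEPATH:-/usr/local/vufind2/harvest}"

# Choose the log for one input file: the shared LOG_FILE if it is set,
# otherwise a per-file log (the log/ subdirectory is an assumed layout).
log_for() {
    if [ -n "${LOG_FILE:-}" ]; then
        printf '%s\n' "$LOG_FILE"
    else
        printf '%s\n' "$BASEPATH/log/$(basename "$1").log"
    fi
}

# Stand-in for the real import command.
import_one() {
    echo "Now Importing $1 ..."          # stdout: stays on the terminal
    echo "import messages for $1" >&2    # stderr: stands in for SolrMarc output
}

batch_import() {
    for file in "$@"; do
        # Append (>>) so a shared LOG_FILE accumulates the whole run.
        import_one "$file" 2>>"$(log_for "$file")"
    done
}
```

With this shape, running with LOG_FILE=/dev/null discards all captured stderr, which is the "bit bucket" case mentioned in the message. Appending is what makes the single shared log possible, at the cost of logs growing until something rotates them.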
From: Demian K. <dem...@vi...> - 2012-12-13 15:55:57
|
Even if you simply revert your changes, the scripts are not really inconsistent in the sense that they do not use the same syntax to do different things. import-marc.sh supports a switch that import-marc-auth.sh does not, and import-marc-auth.sh supports a second parameter that import-marc.sh does not.

If you want consistent interfaces between import-marc.sh and import-marc-auth.sh, then the thing to do would be:

1.) Add -p support to import-marc-auth.sh so that users can override the default PROPERTIES_FILE value
2.) Add a second parameter to import-marc.sh so that users can provide additional mappings to be appended onto the MAPPINGS_FILE list

It's really a question of whether this offers any value for anyone. It certainly wouldn't hurt to add these things, but if nobody uses them, it's a waste of your time. Does that make sense?

- Demian
|
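Editorial sketch of the consistent interface described in the two numbered points above: -p overrides PROPERTIES_FILE only, and an optional second positional parameter is appended to the MAPPINGS_FILE list. It assumes a POSIX shell and getopts. The function name parse_import_args, the MARC_FILE variable, and the default file names are hypothetical, and the comma-separated mapping list is inferred from the java command line quoted earlier in the thread.

```shell
#!/bin/sh
# Illustrative only; not the actual import-marc.sh or import-marc-auth.sh.

parse_import_args() {
    # Placeholder defaults (hypothetical file names).
    PROPERTIES_FILE="import_auth.properties"   # "import properties": general settings
    MAPPINGS_FILE="marc_auth.properties"       # "marc properties": field mappings
    OPTIND=1
    while getopts "p:" opt; do
        case "$opt" in
            p) PROPERTIES_FILE="$OPTARG" ;;    # -p touches the import properties only
            *) return 1 ;;
        esac
    done
    shift $((OPTIND - 1))

    # First positional parameter: the MARC file to import (required).
    [ $# -ge 1 ] || return 1
    MARC_FILE="$1"

    # Optional second parameter: extra mappings appended to the list.
    if [ -n "${2:-}" ]; then
        MAPPINGS_FILE="$MAPPINGS_FILE,$2"
    fi
}
```

A call such as `parse_import_args -p custom.properties records.mrc extra.properties` would then leave the mappings default in place and extend it, instead of letting -p replace it.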
From: Tod O. <to...@uc...> - 2012-12-14 16:25:55
|
Yes, I'm coming to the conclusion that the -p is a waste of time, but the logging, basepath, and no-move options are a good use of time.

-Tod
|
From: Demian K. <dem...@vi...> - 2012-12-14 16:29:49
|
That makes sense to me. Do you plan to add additional switches for these things or just handle them through environment variables? - Demian ________________________________ From: Tod Olson [to...@uc...] Sent: Friday, December 14, 2012 11:25 AM To: Demian Katz Cc: Tod Olson; vuf...@li... Tech Mailinglist Subject: Re: VF2.0 import scripts taking more that one file Yes, I'm coming to the conclusion that the -p is a waste of time, but the logging, basepath, and no-move options are a good use of time. -Tod On Dec 13, 2012, at 9:55 AM, Demian Katz <demian.katz@VILLANOVA.EDU<mailto:demian.katz@VILLANOVA.EDU>> wrote: Even if you simply revert your changes, the scripts are not really inconsistent in the sense that they do not use the same syntax to do different things. import-marc.sh supports a switch that import-marc-auth.sh does not, and import-marc-auth.sh supports a second parameter that import-marc.sh does not. If you want consistent interfaces between import-marc.sh and import-marc-auth.sh, then the thing to do would be: 1.) Add -p support to import-marc-auth.sh so that users can override the default PROPERTIES_FILE value 2.) Add a second parameter to import-marc.sh so that users can provide additional mappings to be appended onto the MAPPINGS_FILE list It’s really a question of whether this offers any value for anyone. It certainly wouldn’t hurt to add these things, but if nobody uses them, it’s a waste of your time. Does that make sense? - Demian From: Tod Olson [mailto:to...@uc...<http://uchicago.edu>] Sent: Thursday, December 13, 2012 10:48 AM To: Demian Katz Cc: Tod Olson; vuf...@li...<mailto:vuf...@li...> Tech Mailinglist Subject: Re: VF2.0 import scripts taking more that one file Re. JIRA: good, will do. Re. parameter handling: aha! Thanks for seeing that. in this area, I can easily change the -p to be correct, or I could just revert. Do you have a sense of which makes more sense? 
For me, while I like consistency in the command-line scripts, the worst outcome would be altering the scripts in a way that would not be rolled back into the trunk. -Tod On Dec 13, 2012, at 8:01 AM, Demian Katz <dem...@vi...<mailto:dem...@vi...>> wrote: Thanks for sharing this. A couple of comments: - It probably wouldn’t hurt to open a JIRA ticket for this; I don’t want to commit anything until I have time to port changes to the Windows batch versions for consistency, and it may be a while before I have time for that… so having a ticket will prevent it from getting lost and forgotten. - The parameter handling in import-marc-auth.sh should probably be reverted or changed. There are two different properties files used by SolrMarc: the “import properties” (which is general settings for the application) and the “marc properties” (which is the mappings for importing). The -p parameter to import-marc.sh is used to set “import properties,” but you have changed import-marc-auth.sh so that it instead sets “marc properties.” If you want to implement -p in import-marc-auth.sh, it should actually affect the PROPERTIES_FILE variable, not the MAPPINGS_FILE variable. The optional mapping overrides should probably remain an optional second parameter for backward compatibility. thanks, Demian From: Tod Olson [mailto:to...@uc...<http://uchicago.edu/>] Sent: Wednesday, December 12, 2012 3:18 PM To: Demian Katz Cc: Tod Olson; vuf...@li...<mailto:vuf...@li...> Tech Mailinglist Subject: Re: VF2.0 import scripts taking more that one file Here's a patch. Let me know if you'd prefer this as a JIRA ticket. 
The patch does the following:

- import-marc-auth.sh now takes a -p option to specify the properties file, matching import-marc.sh
- batch-import-marc*.sh captures stderr to a log file; stdout is not captured
- import-marc.sh echoes the command to stderr, so it gets logged with the solrmarc messages
- per-input-file logs by default; setting LOG_FILE sends the entire run to one log file
- output to log files now appends, so the above is possible

The handling of LOG_FILE is a bit inconsistent, with per-file logs vs. one big log, but it allows LOG_FILE=/dev/null to send everything to the bit bucket. But the switch to appending may create log maintenance issues for some sites. I'm quite open to revising this. I could also create a command-line switch for this.

You mentioned that the harvest directory is hard-coded, and allowing an override would be nice. The obvious way to do that would be to allow BASEPATH in the environment to take precedence.

-Tod

On Dec 12, 2012, at 8:58 AM, Demian Katz <demian.katz@VILLANOVA.EDU> wrote:

> I don't have a strong preference, except that I think it would be wise to avoid merging the stdout/stderr streams when generating logs -- it's probably useful to keep that granularity in the form of multiple logs if nothing else. I do think you're right that there may be some value in leaving the "Now importing" stuff as the stdout stream and capturing the rest to logs...
>
> - Demian
>
>> -----Original Message-----
>> From: Tod Olson [mailto:to...@uc...]
>> Sent: Tuesday, December 11, 2012 5:08 PM
>> To: Demian Katz
>> Cc: Tod Olson; vuf...@li... Tech Mailinglist
>> Subject: Re: VF2.0 import scripts taking more that one file
>>
>> Returning to capturing output from the harvest scripts, I'd like some input on a minor point.
>>
>> Currently stdout gets informative messages like so:
>>
>> Now Importing /data/magma/vufind2/local/harvest/auth/auth_full_marc_utf-8_00_121206230000.mrc ...
>> /usr/local/bin/java -Xms512m -Xmx512m -Dsolr.core.name=authority -Dsolr.indexer.properties=/data/magma/vufind2/import/marc_auth.properties,/data/magma/vufind2/import/marc_auth.properties -jar /data/magma/vufind2/import/SolrMarc.jar /data/magma/vufind2/local/import/import_auth.properties /data/magma/vufind2/local/harvest/auth/auth_full_marc_utf-8_00_121206230000.mrc
>>
>> and all of the solrmarc messages (record number, stack traces on failure, etc.) go to stderr.
>>
>> I kind of think that the options are:
>> (a) everything goes to a log file,
>> (b) stdout can go to the terminal and stderr should go to the log file, or
>> (c) maybe that chatty "Now importing..." goes to stdout/terminal and all else goes to stderr/the log.
>>
>> Are there any strong feelings about which is the right way? Personally, I'm kind of inclined towards (c), but maybe sites who are in production have a different view.
>>
>> -Tod
>>
>> On Oct 31, 2012, at 12:32 PM, Demian Katz <demian.katz@VILLANOVA.EDU> wrote:
>>
>>> I don't have a problem with changing the batch MARC import scripts to capture stderr; I believe that when they were originally written, all SolrMarc output was written to stdout -- it began using stderr more appropriately in relatively recent updates.
>>>
>>> The only other refactoring you might need to do is to allow a way of specifying a full directory path -- right now, the scripts assume that all files live under VuFind's harvest directory, but in a situation not linked to the OAI harvester, the files might be somewhere else. You might also want to add a switch to disable the "move to processed directory" functionality and/or a switch to control logging (i.e. optionally disable by sending to null).
>>>
>>> - Demian
>>>
>>>> -----Original Message-----
>>>> From: Tod Olson [mailto:to...@uc...]
>>>> Sent: Wednesday, October 31, 2012 1:14 PM
>>>> To: Demian Katz
>>>> Cc: Tod Olson; vuf...@li... Tech Mailinglist
>>>> Subject: Re: VF2.0 import scripts taking more that one file
>>>>
>>>> Yes, looking at the harvest/ scripts for marc records, I see that stdout is directed to an output file, but stderr is not written to disk. Since stderr has all of the error info, I'm inclined to capture it. I can also see where people would not want the error logs taking up disc space, since there's a message for every record. But sending all output to a file is a little more cron-friendly.
>>>>
>>>> I may be willing to refactor a couple of those batch scripts (no commitment yet), but I'd like a little input on what sort of requirements other sites would have.
>>>>
>>>> -Tod
>>>>
>>>> On Oct 31, 2012, at 11:22 AM, Tod Olson <to...@uc...> wrote:
>>>>
>>>>> Aha, I'd dismissed harvest as exclusively the province of OAI. Thanks for correcting that.
>>>>>
>>>>> I'll pop a patch into JIRA when I can.
>>>>>
>>>>> -Tod |
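The logging behaviour listed in the patch notes above can be sketched roughly as follows. This is an illustration only, not the contents of the actual patch: `fake_import` is a hypothetical stand-in for import-marc.sh, and `LOG_DIR` is an assumed variable name for wherever per-file logs would live. Only `LOG_FILE` comes from the thread.

```shell
#!/bin/sh
# Hypothetical stand-in for import-marc.sh: the "Now Importing" notice
# goes to stdout, everything else (here a single line) goes to stderr.
fake_import() {
    echo "Now Importing $1 ..."
    echo "record messages for $1" >&2
}

# Import each file in turn. stderr is appended (>>) to a log; stdout is
# left alone so it still reaches the terminal or cron mail.
# If LOG_FILE is set, the whole run shares that one log; otherwise each
# input file gets its own log under LOG_DIR (an assumed name).
batch_import() {
    for file in "$@"; do
        if [ -n "${LOG_FILE:-}" ]; then
            log="$LOG_FILE"
        else
            log="$LOG_DIR/$(basename "$file").log"
        fi
        fake_import "$file" 2>> "$log"
    done
}
```

Because the redirect appends rather than truncates, setting LOG_FILE to a single path collects every file's messages in one place, and setting it to /dev/null discards them. The cost, as noted above, is that logs grow until something rotates them.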
From: Tod O. <to...@uc...> - 2012-12-14 16:52:38
|
Could do either, or both. Internally they will be variables anyhow, and easy to take from the environment. But making switches available is probably more accessible for many people, even though getopt is annoying. May as well do both.

Do you have a thought on whether the switch or the environment should take precedence?

-Tod

On Dec 14, 2012, at 10:29 AM, Demian Katz <dem...@vi...> wrote:

That makes sense to me. Do you plan to add additional switches for these things or just handle them through environment variables?

- Demian

________________________________
From: Tod Olson [to...@uc...]
Sent: Friday, December 14, 2012 11:25 AM
To: Demian Katz
Cc: Tod Olson; vuf...@li... Tech Mailinglist
Subject: Re: VF2.0 import scripts taking more that one file

Yes, I'm coming to the conclusion that the -p is a waste of time, but the logging, basepath, and no-move options are a good use of time.

-Tod

On Dec 13, 2012, at 9:55 AM, Demian Katz <demian.katz@VILLANOVA.EDU> wrote:

Even if you simply revert your changes, the scripts are not really inconsistent, in the sense that they do not use the same syntax to do different things. import-marc.sh supports a switch that import-marc-auth.sh does not, and import-marc-auth.sh supports a second parameter that import-marc.sh does not.

If you want consistent interfaces between import-marc.sh and import-marc-auth.sh, then the thing to do would be:

1.) Add -p support to import-marc-auth.sh so that users can override the default PROPERTIES_FILE value
2.) Add a second parameter to import-marc.sh so that users can provide additional mappings to be appended onto the MAPPINGS_FILE list

It’s really a question of whether this offers any value for anyone. It certainly wouldn’t hurt to add these things, but if nobody uses them, it’s a waste of your time.

Does that make sense?

- Demian

From: Tod Olson [mailto:to...@uc...]
Sent: Thursday, December 13, 2012 10:48 AM
To: Demian Katz
Cc: Tod Olson; vuf...@li... Tech Mailinglist
Subject: Re: VF2.0 import scripts taking more that one file

Re. JIRA: good, will do.

Re. parameter handling: aha! Thanks for seeing that. In this area, I can easily change the -p to be correct, or I could just revert. Do you have a sense of which makes more sense? For me, while I like consistency in the command-line scripts, the worst outcome would be altering the scripts in a way that would not be rolled back into the trunk.

-Tod

On Dec 13, 2012, at 8:01 AM, Demian Katz <dem...@vi...> wrote:

Thanks for sharing this. A couple of comments:

- It probably wouldn’t hurt to open a JIRA ticket for this; I don’t want to commit anything until I have time to port changes to the Windows batch versions for consistency, and it may be a while before I have time for that… so having a ticket will prevent it from getting lost and forgotten.

- The parameter handling in import-marc-auth.sh should probably be reverted or changed. There are two different properties files used by SolrMarc: the “import properties” (which is general settings for the application) and the “marc properties” (which is the mappings for importing). The -p parameter to import-marc.sh is used to set “import properties,” but you have changed import-marc-auth.sh so that it instead sets “marc properties.” If you want to implement -p in import-marc-auth.sh, it should actually affect the PROPERTIES_FILE variable, not the MAPPINGS_FILE variable. The optional mapping overrides should probably remain an optional second parameter for backward compatibility.

thanks,
Demian

From: Tod Olson [mailto:to...@uc...]
Sent: Wednesday, December 12, 2012 3:18 PM
To: Demian Katz
Cc: Tod Olson; vuf...@li... Tech Mailinglist
Subject: Re: VF2.0 import scripts taking more that one file

Here's a patch. Let me know if you'd prefer this as a JIRA ticket.

The patch does the following:

- import-marc-auth.sh now takes a -p option to specify the properties file, matching import-marc.sh
- batch-import-marc*.sh captures stderr to a log file; stdout is not captured
- import-marc.sh echoes the command to stderr, so it gets logged with the solrmarc messages
- per-input-file logs by default; setting LOG_FILE sends the entire run to one log file
- output to log files now appends, so the above is possible

The handling of LOG_FILE is a bit inconsistent, with per-file logs vs. one big log, but it allows LOG_FILE=/dev/null to send everything to the bit bucket. But the switch to appending may create log maintenance issues for some sites. I'm quite open to revising this. I could also create a command-line switch for this.

You mentioned that the harvest directory is hard-coded, and allowing an override would be nice. The obvious way to do that would be to allow BASEPATH in the environment to take precedence.

-Tod |
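The split Demian describes between the two properties files might look something like the sketch below. It is not taken from import-marc-auth.sh: `parse_args` is a hypothetical helper, and the default file names are assumptions borrowed from the java command line quoted earlier in the thread. The point it illustrates is that -p touches only PROPERTIES_FILE, while an optional second positional parameter is appended to the MAPPINGS_FILE list.

```shell
#!/bin/sh
# parse_args is a hypothetical helper, not code from import-marc-auth.sh.
# It sets PROPERTIES_FILE, MARC_FILE and MAPPINGS_FILE from its arguments.
parse_args() {
    PROPERTIES_FILE="import_auth.properties"   # assumed default ("import properties")
    MAPPINGS_FILE="marc_auth.properties"       # assumed default ("marc properties")

    OPTIND=1                                   # reset so the function can be reused
    while getopts "p:" opt; do
        case "$opt" in
            p) PROPERTIES_FILE="$OPTARG" ;;    # -p overrides import properties only
            *) return 1 ;;
        esac
    done
    shift $((OPTIND - 1))

    # The MARC file to import is required.
    [ $# -ge 1 ] || return 1
    MARC_FILE="$1"

    # An optional second parameter is appended to the comma-separated
    # mappings list, leaving the existing calling convention intact.
    if [ -n "${2:-}" ]; then
        MAPPINGS_FILE="$MAPPINGS_FILE,$2"
    fi
}
```

Under these assumptions, a call with only a MARC file leaves both defaults in place, so existing invocations keep working.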
From: Demian K. <dem...@vi...> - 2012-12-14 17:18:30
|
I would be inclined to let switches take precedence -- a switch is an instance of the user explicitly telling the program to do something, which seems more important than a condition of the program's environment.

- Demian |
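One way to get the precedence suggested here (built-in default, then environment, then switch) is to seed the variable from the environment and let option parsing overwrite it. The sketch below is illustrative only: `resolve_basepath` is a hypothetical helper, the -d switch letter is an assumption, and the default path is made up. Only the BASEPATH variable name comes from the thread.

```shell
#!/bin/sh
# resolve_basepath is a hypothetical helper that only illustrates precedence.
# Order, lowest to highest: built-in default, BASEPATH from the environment,
# then a -d switch (the switch letter is an assumption).
resolve_basepath() {
    # Start from the environment, falling back to an assumed default.
    basepath="${BASEPATH:-/usr/local/vufind/harvest}"

    OPTIND=1                          # reset so the function can be reused
    while getopts "d:" opt; do
        case "$opt" in
            d) basepath="$OPTARG" ;;  # explicit switch wins over the environment
            *) return 1 ;;
        esac
    done

    echo "$basepath"
}
```

Since the switch is parsed after the environment value has been read, an explicit -d always wins, and a site that sets BASEPATH once in its cron environment never has to pass the switch.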