From: Adam B. <ab...@br...> - 2008-01-30 02:30:03
|
We're in the process of loading our data records into VuFind, but we seem to be hitting a problem that I imagine others have encountered. We have a 2 GB MARC file we want to load, which is 6 GB as MARCXML. When we run the import-solr script against it, the script dies because PHP can't address a file that big. Has anyone found a solution besides just splitting it into multiple files?

Thanks,
adam
|
From: Chris D. <ce...@ui...> - 2008-01-30 05:01:02
|
On Tue, Jan 29, 2008 at 09:29:49PM -0500, Adam Brin wrote:
> Has anyone found a solution besides just splitting it into
> multiple files?

One solution would be to use the Java importer, since it shouldn't have a problem with large files. It also reads MARC files natively, so there is no need for yaz-marcdump or large catalog.xml files.

I haven't looked at the wiki or docs (I asked the author for help; maybe there is an INSTALL or HOWTO you could read by now), but for me, all it took to use the Java importer was:

1. Install ant.

2. In the main directory of VuFind, compile the source code:

       ant compile

3. Create a jar file:

       ant jar

4. Create a "properties" file, which the Java importer uses as input. For example, edit a file called import.properties with content similar to the following:

       solr.path=/vufind/solr
       marc.path=/path/to/input.mrc
       control.field=001

5. Run it:

       java -Xms1G -Xmx1G -jar ./vufind-0.8-dev.2008.01.28.10.35.00.jar import.properties

   (You may need to reduce the 1G if you don't have that much memory available. Maybe 512M or 256M would do.)

It will create a Solr (Lucene) index in ./solr/data.

BTW, the Java importer does not need a running Solr instance. Instead, it creates the indexes itself natively (using the Solr APIs). Once the index is created, you can configure Solr to use it.

If you don't want to go the Java importer route, you could modify the import-solr.php script to accept standard input as an argument. Instead of:

    // Load MARCXML File
    $fp = fopen('catalog.xml', 'r');
    if (!$fp) {
        exit('Error: Cannot open Catalog.xml file for import');
    }

you could make it take an optional argument of "-" to denote standard input:

    $input_file = 'catalog.xml';
    if ($argc > 1) {
        $input_file = $argv[1];
    }

    // Load MARCXML File
    if ($input_file == '-') {
        $fp = STDIN;
    } else {
        $fp = fopen($input_file, 'r');
        if (!$fp) {
            exit('Error: Cannot open ' . $input_file . ' file for import');
        }
    }

Then, you can "cat" the huge catalog.xml file into the import-solr.php script:

    cat catalog.xml | php import-solr.php

As a matter of fact, I used the second method until I figured out (with the help of the author, Wayne Graham) how to use the Java importer, which is faster and has much greater potential. As I mentioned earlier, the Java importer shouldn't have the problem with opening large files.

--Chris

P.S. I would have submitted patches for both methods, except that I have modified my versions too heavily (for personal reasons).

> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Vufind-tech mailing list
> Vuf...@li...
> https://lists.sourceforge.net/lists/listinfo/vufind-tech
|
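When several loads are run, the properties file from step 4 can be generated rather than edited by hand. The following is only a minimal sketch: the function name `write_import_properties` and its argument order are hypothetical, and the three keys are simply the ones quoted in the message above, not a complete list of what the importer may accept.

```shell
#!/bin/sh
# Hypothetical helper (not part of VuFind): write the properties file that
# the Java importer takes as its argument. The three keys are the ones
# shown in the message above; everything else here is an assumption.
write_import_properties() {
    solr_path=$1                      # e.g. /vufind/solr
    marc_path=$2                      # e.g. /path/to/input.mrc
    out_file=${3:-import.properties}  # default name used in the message

    # control.field=001 is the value given in the message above.
    cat > "$out_file" <<EOF
solr.path=$solr_path
marc.path=$marc_path
control.field=001
EOF
}
```

Calling `write_import_properties /vufind/solr /path/to/input.mrc` would produce the import.properties file shown in step 4, after which the `java` command from step 5 can be run unchanged.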
From: Chris D. <ce...@ui...> - 2008-01-30 05:04:57
|
On Tue, Jan 29, 2008 at 11:01:00PM -0600, Chris Delis wrote:
> Then, you can "cat" the huge catalog.xml file into the import-solr.php
> script:
>
>     cat catalog.xml | php import-solr.php

I forgot the "-" parameter. It should, instead, read:

    cat catalog.xml | php import-solr.php -

--Chris
|
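For readers less familiar with the "-" convention, here is the same idea as a small shell sketch. It is an illustration only: the function name `read_input` is hypothetical, and it mirrors the logic of the PHP patch quoted earlier in the thread (default file name, "-" for standard input, error message on failure) rather than anything shipped with VuFind.

```shell
#!/bin/sh
# Hypothetical illustration of the "-" convention: "-" means "read standard
# input"; any other argument is treated as a file name. With no argument,
# fall back to catalog.xml, as the PHP snippet in this thread does.
read_input() {
    input_file=${1:-catalog.xml}
    if [ "$input_file" = "-" ]; then
        cat                      # pass standard input through unchanged
    else
        cat "$input_file" 2>/dev/null || {
            echo "Error: Cannot open $input_file file for import" >&2
            return 1
        }
    fi
}
```

With the patched script, a plain redirect such as `php import-solr.php - < catalog.xml` should behave the same as the `cat` pipeline and saves one process.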
From: Adam B. <ab...@br...> - 2008-02-01 15:06:33
|
Thanks. I've tried both, and both work.

_____________________________________
Tri-Colleges Systems Coordinator
Bryn Mawr | Haverford | Swarthmore
610.526.5294
|
From: Jeffrey B. <jef...@ya...> - 2008-02-05 16:16:37
Attachments:
jeffrey_barnett.vcf
|
I'm not ready to try a large load yet. My question is how to verify that an import really worked at all, or rather why, when the import-solr.php script reports success, nothing shows up in the browse page. I tailed the Solr request log: I see POSTs for imports and GETs for queries, and no error messages, but also no results. FIND with no search terms returns a blank page. FIND for known content (e.g., a title) returns "no items match your search". Where is the window to see what is actually happening on import?
|
From: Jeffrey B. <jef...@ya...> - 2008-02-05 16:53:53
Attachments:
jeffrey_barnett.vcf
|
I had been trying to maintain a "clean" 0.7 build, thinking that would be more stable, but apparently not. How can I get the Java importer alone without waiting for version 0.8?
|
From: Wayne G. <ws...@wm...> - 2008-02-05 17:00:12
|
You can check the latest code out of the Subversion repository:

    svn co https://vufind.svn.sourceforge.net/svnroot/vufind/trunk vufind

You'll need to build the jar file with "ant jar".

Wayne

--
/**
 * Wayne Graham
 * Earl Gregg Swem Library
 * PO Box 8794
 * Williamsburg, VA 23188
 * 757.221.3112
 * http://swem.wm.edu/blogs/waynegraham/
 */
|
From: Antonio B. <abarrera@Princeton.EDU> - 2008-02-06 20:24:35
|
I had problems with a newer version of yaz-marcdump when using the PHP importer. Essentially, the dumped XML file didn't match the expected format. I didn't dig into why; instead I used XML files from my prior server, which did have the expected format. Both servers run VuFind 0.7, but the new Solaris server uses yaz version 3.20, I believe.

A quick check of the Solr stats will tell you all you need to know. When I had bad XML files, Solr reported only 1 document available:

    http://localhost:8080/solr/admin/stats.jsp

Check that. If the count comes up zero, it's likely the XML dump from yaz.

Antonio
|
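A document-count check like this can also be scripted after each load. The sketch below makes several assumptions: it reads a Solr XML select response (not the stats.jsp page), relies on the `numFound` attribute that Solr puts on the `result` element of such responses, and uses a hypothetical function name, `solr_num_found`.

```shell
#!/bin/sh
# Hypothetical helper: read a Solr XML select response on standard input
# and print the value of the numFound attribute (total matching documents).
# Prints nothing if the input contains no numFound attribute.
solr_num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}
```

Assuming `curl` is installed and the Solr version in use accepts the `*:*` match-all query, something like `curl -s 'http://localhost:8080/solr/select/?q=*:*&rows=0' | solr_num_found` would print the number of indexed documents; the host, port, and select path here are taken from or modeled on the URLs in this thread and may differ on your installation.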
From: Chris D. <ce...@ui...> - 2008-02-05 16:27:26
|
On Tue, Feb 05, 2008 at 11:16:32AM -0500, Jeffrey Barnett wrote:
> Where is the window to see what is actually
> happening on import?

I wouldn't count on the PHP script to discern errors; I think it will report success on failure. More reason to use the Java importer ;-)

The Java importer, too, could probably be a tad better in error reporting (but it is much, much better than the PHP script!). Yesterday, I didn't notice an error until many hours later: I ran out of disk space, but the importer kept trying to add new records. If the Solr APIs raise exceptions correctly (and I haven't looked into this yet), then it might be a good idea to check for fatal error messages during addToIndex calls. Also, I noticed that on success the Java importer returns 1 (which is usually an error code) instead of 0; these things matter a lot when you run a bunch of loads in a large script :-)

BTW, I usually run my scripts under "nohup" in the background. Then I periodically check my nohup.out for errors.

--Chris
|
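Because the importer's exit status may not follow the usual "0 means success" convention, a load script can make the expected status explicit for each step. This is a minimal sketch under that assumption; `run_step` is a hypothetical name, and the expected status for the Java importer should be confirmed against the version you actually run.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and compare its exit status with the
# status the caller expects on success.
# Usage: run_step LABEL EXPECTED_STATUS COMMAND [ARGS...]
run_step() {
    label=$1
    expected=$2
    shift 2

    status=0
    "$@" || status=$?   # capture the status without aborting under "set -e"

    if [ "$status" -ne "$expected" ]; then
        echo "$label: FAILED (exit status $status, expected $expected)" >&2
        return 1
    fi
    echo "$label: ok (exit status $status)"
}
```

A script built from such calls can be started with `nohup sh loads.sh &` (where loads.sh is a placeholder name), and a search of nohup.out for "FAILED" then shows which load went wrong.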
From: Wayne G. <ws...@wm...> - 2008-02-05 16:31:49
|
I'm in the process of integrating log4j functionality into the Java importer to do a better job of actually telling you what's going on, and what (if anything) went wrong. Nothing in the repository yet, but it's in the pipeline...

Wayne

--
/**
 * Wayne Graham
 * Earl Gregg Swem Library
 * PO Box 8794
 * Williamsburg, VA 23188
 * 757.221.3112
 * http://swem.wm.edu/blogs/waynegraham/
 */
|
From: Jon G. <jon...@gm...> - 2008-02-05 16:36:56
|
First, vufind.sh should be outputting Solr results, telling you when a document is added. Second, try doing a query for [* TO *] directly against Solr:

    http://localhost:8080/solr/select/?q=[* TO *]

(URL-encode that, although I believe your browser should do it automatically if you just copy and paste it.)

Otherwise go to http://localhost:8983/solr/admin/form.jsp.

You should get results. It may be that certain records are being rejected. Again, I wouldn't necessarily trust the output of the import script. Keep an eye on vufind.sh instead.

Of course, the actual VuFind folks might have better advice.

Jon Gorman
|
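When the query is sent from a script rather than pasted into a browser, the encoding has to be done by hand. The sketch below is deliberately limited: `encode_query` is a hypothetical name, and it only escapes the percent sign, spaces, and square brackets, which is enough for the `[* TO *]` query above but is not a general-purpose URL encoder.

```shell
#!/bin/sh
# Hypothetical, deliberately limited encoder: escapes only "%", " ", "["
# and "]". The percent sign is handled first so that the escapes added by
# the later rules are not themselves re-encoded.
encode_query() {
    printf '%s' "$1" | sed -e 's/%/%25/g' \
                           -e 's/ /%20/g' \
                           -e 's/\[/%5B/g' \
                           -e 's/]/%5D/g'
}
```

For example, the encoded form can be appended to the select URL from the message above, as in `"http://localhost:8080/solr/select/?q=$(encode_query '[* TO *]')"`.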
From: Wayne G. <ws...@wm...> - 2008-02-05 16:55:31
|
If you really want to see what's going on behind the scenes, start the Solr
instance (vufind.sh) with "run" (sh vufind.sh run). This will dump
everything that's going on with the Solr instance to the screen.

Would folks like some type of report to be generated after indexing is
finished? If so, what would you like on it? It probably won't be ready
for the 1.0 release, but could get in shortly thereafter...

Wayne

Jon Gorman wrote:
> First, the vufind.sh should be outputting solr results, telling you
> when a document is added. Second, try doing a query for [* TO *]
> directly to solr.
>
> http://localhost:8080/solr/select/?q=[* TO *]
>
> (urlencode that, although your browser should do that automatically if
> you just copy and paste it I believe.
>
> Otherwise go to http://localhost:8983/solr/admin/form.jsp.
>
> You should get results. It may be that certain records are being
> rejected. Again, I wouldn't
> necessarily trust the output of the import script. Keep an eye on
> vufind.sh instead.
>
> Of course, the actual VUfind folks might have better advice.
>
> Jon Gorman
>
>
> On Feb 5, 2008 10:16 AM, Jeffrey Barnett <jef...@ya...> wrote:
>> I'm not ready to try a large load yet, my question is how to verify that
>> an import really worked at all, or rather why when the import-solr.php
>> script reports success, nothing shows up in the browse page. I tailed
>> the solr request log, I see POSTs for imports and GETS for queries, no
>> error messages, but also no results. FIND with no search terms returns
>> a blank page. FIND for known content (e.g.) title returns "no items
>> match your search". Where is the window to see what is actually
>> happening on import?

--
/**
 * Wayne Graham
 * Earl Gregg Swem Library
 * PO Box 8794
 * Williamsburg, VA 23188
 * 757.221.3112
 * http://swem.wm.edu/blogs/waynegraham/
 */
|
From: Andrew N. <and...@vi...> - 2008-01-30 15:04:06
|
> One solution would be to use the Java importer (since it shouldn't
> have a problem with large files). Moreover, it reads MARC files
> natively (no need for yaz-marcdump or large catalog.xml files).

Adam - I agree that you should use the java importer. With the next
release the official import method will be through the java importer.
We are also refactoring vufind a bit to keep the full marc record inside
Solr so that vufind no longer creates local marcxml files for each
record. The code for the 0.8 release is not 100% complete yet but we are
getting closer. Feel free to try out the code as is now from the trunk -
I'd appreciate the feedback :)

Andrew
|
From: Adam B. <ab...@br...> - 2008-01-30 16:11:19
|
I'll try .8 (I had .7 before). Is there a quick/easy way to delete
everything from the catalog? I sent an XML doc [1] to SOLR and it didn't
seem to take.

[1] <delete><query>*:*</query></delete>\n<optimize/>

Thanks,

adam

_____________________________________
Tri-Colleges Systems Coordinator
Bryn Mawr | Haverford | Swarthmore
610.526.5294

-----Original Message-----
From: vuf...@li...
[mailto:vuf...@li...] On Behalf Of Andrew Nagy
Sent: Wednesday, January 30, 2008 10:04 AM
To: Chris Delis; vuf...@li...
Subject: Re: [VuFind-Tech] large catalog.xml file loads

> One solution would be to use the Java importer (since it shouldn't
> have a problem with large files). Moreover, it reads MARC files
> natively (no need for yaz-marcdump or large catalog.xml files).

Adam - I agree that you should use the java importer. With the next
release the official import method will be through the java importer.
We are also refactoring vufind a bit to keep the full marc record inside
Solr so that vufind no longer creates local marcxml files for each
record. The code for the 0.8 release is not 100% complete yet but we are
getting closer. Feel free to try out the code as is now from the trunk -
I'd appreciate the feedback :)

Andrew
|
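One possible explanation for the delete not taking (an assumption, not something confirmed in this thread) is that the posted document has two top-level elements, so it is not a single well-formed XML message, and that no <commit/> followed the delete. A minimal sketch of sending the delete, commit, and optimize as separate messages is below. The update URL and all helper names are hypothetical, and whether the *:* match-all syntax is accepted depends on the Solr version in use.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the XML update handler lives here; adjust host, port, and path.
SOLR_UPDATE = "http://localhost:8080/solr/update"


def wipe_messages():
    """Return the update messages in the order they would be sent."""
    return [
        "<delete><query>*:*</query></delete>",
        "<commit/>",
        "<optimize/>",
    ]


def is_single_document(message):
    """True if the message parses as one well-formed XML document."""
    try:
        ET.fromstring(message)
        return True
    except ET.ParseError:
        return False


def post_message(message, url=SOLR_UPDATE):
    """POST one XML message to Solr and return the response body."""
    request = urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    for message in wipe_messages():
        # Swap print for post_message(message) when pointing at a test index.
        print(message)
```

The is_single_document check shows why the combined document [1] may have been rejected: a parser sees the <optimize/> after the closing </delete> as content following the end of the document.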
From: Andrew N. <and...@vi...> - 2008-01-30 16:23:34
|
> I'll try .8 (I had .7 before).

0.8 is not out yet, but you can get the latest from SVN.

> Is there a quick/easy way to delete
> everything from the catalog. I sent an XML doc [1] to SOLR and it
> didn't seem to take.

To clean out the index, you can simply delete everything in the
vufind/solr/data directory. Make sure to stop solr first.

Andrew
|
From: Jeffrey B. <jef...@ya...> - 2008-02-05 18:36:44
Attachments:
jeffrey_barnett.vcf
|
Thanks to all who answered! This being the most specific response, I'll
continue the thread here:

The response to the direct solr query was:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">7</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

and the same for the solradmin. So now we know where the records are
*not*. How would you suggest finding the place/reason for the
disappearing records?

Jon Gorman wrote:
> First, the vufind.sh should be outputting solr results, telling you
> when a document is added. Second, try doing a query for [* TO *]
> directly to solr.
>
> http://localhost:8080/solr/select/?q=[* TO *]
>
> (urlencode that, although your browser should do that automatically if
> you just copy and paste it I believe.
>
> Otherwise go to http://localhost:8983/solr/admin/form.jsp.
>
> You should get results. It may be that certain records are being
> rejected. Again, I wouldn't
> necessarily trust the output of the import script. Keep an eye on
> vufind.sh instead.
>
> Of course, the actual VUfind folks might have better advice.
>
> Jon Gorman
>
>
> On Feb 5, 2008 10:16 AM, Jeffrey Barnett <jef...@ya...> wrote:
>> I'm not ready to try a large load yet, my question is how to verify that
>> an import really worked at all, or rather why when the import-solr.php
>> script reports success, nothing shows up in the browse page. I tailed
>> the solr request log, I see POSTs for imports and GETS for queries, no
>> error messages, but also no results. FIND with no search terms returns
>> a blank page. FIND for known content (e.g.) title returns "no items
>> match your search". Where is the window to see what is actually
>> happening on import?
|
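The check described in this message, looking at numFound in the Solr response, can be scripted so it is easy to repeat after each import attempt. This is a sketch only: the sample text is the response quoted above, and num_found is a name invented for the example.

```python
import xml.etree.ElementTree as ET

# The response body quoted in the message above.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">7</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>"""


def num_found(response_xml):
    """Return the numFound count from a Solr XML select response."""
    result = ET.fromstring(response_xml).find("result")
    if result is None:
        raise ValueError("no <result> element in response")
    return int(result.get("numFound"))


if __name__ == "__main__":
    count = num_found(SAMPLE_RESPONSE)
    print("documents in index:", count)
```

A status of 0 with numFound of 0 means the query itself ran without error and the index simply holds no matching documents.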
From: Jon G. <jon...@gm...> - 2008-02-05 19:03:28
|
I'm working on VUfind as a personal project so my code is on my home
computer and I don't have access to it right now. So some of the
following is off the top of my head.

They're probably not "disappearing" but never being added to the index.

First, if I remember correctly there's a bug where the first record is
not being indexed properly if you follow the readme as it stands in the
vufind distribution. Essentially, it's assuming a catalog element, which
yaz does not produce in the MARCXML output. But all the records after
that should be ok.

I'd also share the process you use to create your marcxml file and some
of the records from it with the list. Maybe someone can see something.
I suspect that vufind may not always be handling namespaces as it
should. I need to do some actual testing to be positive of that though ;).

I wish I had taken better notes, but here are some things I'd do.

Watch the output from vufind.sh and use a small catalog.xml sample size
at first to make it easy. (Say five records). Look for errors.

If you're desperate, comment out some of the php-specific parts of
marcxml2solr.xsl and run a transformation manually. Then try to manually
upload the created file to solr. I remember doing this once or twice,
but I don't have specific instructions I can give here besides the
following rough ones:

1. Comment out php-specific functions in the file.
2. Convert the template so that there's a template that matches on the
   record level.
3. Use your favorite xsl engine like saxon or xsltproc to transform the
   marcxml file into a solr file.
4. Upload the solr file manually to solr and see what happens.

(If I get time in the next day or two I'll dig around and see if I still
have files from when I did this before and upload them. That should let
you at least test it without using the php script).

Jon Gorman

On Feb 5, 2008 12:36 PM, Jeffrey Barnett <jef...@ya...> wrote:
> Thanks to all who answered! This being the most specific response, I'll
> continue the thread here:
> The response to the direct solr query was:
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">7</int>
>   </lst>
>   <result name="response" numFound="0" start="0"/>
> </response>
> and the same for the solradmin. So now we know where the records are
> *not*. How would you suggest finding the place/reason for the
> disappearing records?
>
>
> Jon Gorman wrote:
> > First, the vufind.sh should be outputting solr results, telling you
> > when a document is added. Second, try doing a query for [* TO *]
> > directly to solr.
> >
> > http://localhost:8080/solr/select/?q=[* TO *]
> >
> > (urlencode that, although your browser should do that automatically if
> > you just copy and paste it I believe.
> >
> > Otherwise go to http://localhost:8983/solr/admin/form.jsp.
> >
> > You should get results. It may be that certain records are being
> > rejected. Again, I wouldn't
> > necessarily trust the output of the import script. Keep an eye on
> > vufind.sh instead.
> >
> > Of course, the actual VUfind folks might have better advice.
> >
> > Jon Gorman
> >
> >
> > On Feb 5, 2008 10:16 AM, Jeffrey Barnett <jef...@ya...> wrote:
> >> I'm not ready to try a large load yet, my question is how to verify that
> >> an import really worked at all, or rather why when the import-solr.php
> >> script reports success, nothing shows up in the browse page. I tailed
> >> the solr request log, I see POSTs for imports and GETS for queries, no
> >> error messages, but also no results. FIND with no search terms returns
> >> a blank page. FIND for known content (e.g.) title returns "no items
> >> match your search". Where is the window to see what is actually
> >> happening on import?
|
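The five-record sample suggested above could be cut from a large MARCXML file with a small streaming script, so the full file never has to fit in memory. This is a rough sketch under stated assumptions: the function names are made up, the wrapper element name defaults to "collection" (the importer may expect something else, as noted above), and records are matched by local name so that both namespaced and un-namespaced input are handled.

```python
import io
import xml.etree.ElementTree as ET


def first_records(source, limit=5):
    """Yield up to `limit` <record> elements as strings, streaming the input."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Drop any namespace prefix such as {http://www.loc.gov/MARC21/slim}.
        if elem.tag.rsplit("}", 1)[-1] == "record":
            yield ET.tostring(elem, encoding="unicode")
            count += 1
            elem.clear()  # release the record's children once it is copied
            if count >= limit:
                return


def build_sample(source, limit=5, wrapper="collection"):
    """Wrap the first `limit` records in a single root element."""
    body = "".join(first_records(source, limit))
    return "<%s>%s</%s>" % (wrapper, body, wrapper)


if __name__ == "__main__":
    demo = (
        '<collection xmlns="http://www.loc.gov/MARC21/slim">'
        '<record><controlfield tag="001">1</controlfield></record>'
        '<record><controlfield tag="001">2</controlfield></record>'
        "</collection>"
    )
    print(build_sample(io.BytesIO(demo.encode("utf-8")), limit=1))
```

For a real file, pass an open binary file object (or a path) as source and write the returned string to a small sample file before running the import against it.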