From: Ludwig Z. <lud...@go...> - 2007-09-19 08:03:33
Hi Chris,

Yeah, that was actually obvious... ;) I didn't find it last night.
We were taking a <record> out of context, with the result that the
first <xml> line was missing.

May I put something on the todo list of vufind: More robust import
scripts! :) Haystack searches in XML files are probably too
context-dependent. Why not use a DOM parser from PEAR?

Yes! I am getting results now (for your file)! The only problem that
remains is my incorrectly formatted MARCXML dump.

Thanks,
Ludwig

Chris Delis wrote:
> On Tue, Sep 18, 2007 at 05:02:18PM -0500, Chris Delis wrote:
>> On Tue, Sep 18, 2007 at 11:22:11PM +0200, Ludwig Zeller wrote:
>>> Hi Chris,
>>>
>>>>> I'll attach one record that worked for me (testmarc.xml).
>>> Thanks for the file. I discovered why no files were created in data/ for
>>> me. The reason was that import-solr.php:83 matches against
>>> <controlfield tag="001"> and expects to have exactly two whitespaces in
>>> front of that string... Just in case anybody is wondering... ;)
>>>
>>> Cool, now I have the files in data/, but I am still having problems with
>>> solr, also with your file.
>>
>> Ludwig,
>>
>> I am also experiencing the same errors as you. I am actually using an
>> older version of import-solr.php and marcxml2solr.xsl (which works!).
>
> Looks like I had to comment out lines 78 - 80 in import-solr.php and
> all is fine now:
>
>   // Skip 1st 2 lines: XML Declaration and Collection tag
>   if ($lineCnt > 1) {
>       $record .= $line;
>   }
>
> becomes:
>
>   // Skip 1st 2 lines: XML Declaration and Collection tag
>   //if ($lineCnt > 1) {
>   //$record .= $line;
>   //}
>
> My catalog.xml files have neither an "XML Declaration" nor a
> "Collection tag" in them. The script was skipping 2 lines of valid
> XML (<record><leader></leader>) which caused the
> DOMDocument::LoadXML() method to fail on line 106. Without a valid
> XML file to begin with, everything else is sure to fail.
>
> Chris
>
>> I forgot to mention that. Sorry! In any case, it should still work
>> with the latest version from svn (since the input file shouldn't
>> matter). It looks like a bug may have been introduced here.
>>
>> Here are the results of running import-solr.php with an old version:
>> ....................................................................
>> % php import-solr.php
>> Begin Import
>> Format: Book
>> Import Completed
>>
>> Imported 1 Records of 1 total in 0.41270112991333 seconds with 0
>> failures
>>
>>
>> Here are the results with the latest version (and the script doesn't
>> even finish -- it hangs):
>> ....................................................................
>> % php import-solr.php
>> VuFind Importer
>>
>> This process may take a few hours depending on your collection size
>> and hardware.
>>
>> The importer will begin in a few moments...
>>
>>
>> Warning: DOMDocument::loadXML(): Extra content at the end of the
>> document in Entity, line: 3 in
>> /vufind/vufind-test/import/import-solr.php on line 106
>> Importing... 0/1
>> [----------------------------------------------------------------------]
>> 0% 00:00.00
>> [ETA: 00:00.00]
>>
>>
>> --Chris
>>
>>> I have added an echo $record; after the XSL transformation in line
>>> import-solr.php:107 and for your file I get:
>>>
>>> Warning: DOMDocument::loadXML(): Extra content at the end of the
>>> document in Entity, line: 3 in /usr/local/vufind/import/import-solr.php
>>> on line 106
>>> <?xml version="1.0" encoding="utf-8"?>
>>> <add xmlns:marc="http://www.loc.gov/MARC21/slim">
>>>   <doc>
>>>     <field name="id"/>
>>>     <field name="collection">Catalog</field>
>>>     <field name="format"/>
>>>     <field name="language"/>
>>>     <field name="title"> </field>
>>>   </doc>
>>> </add>
>>>
>>> The warning has been there even before the echo statement. Apparently
>>> the xml records cannot be interpreted correctly as XML. But why? What is
>>> the extra content? As you can see, the transformed XML is pretty empty.
>>>
>>> That is the file Chris gave me:
>>>
>>> <record xmlns="http://www.loc.gov/MARC21/slim">
>>>   <leader>00831nam a2200241 a 4500</leader>
>>>   <controlfield tag="001">1518744</controlfield>
>>>   <controlfield tag="005">20020415162200.0</controlfield>
>>>   <controlfield tag="008">881011s1988 moua b 00110 eng
>>> d</controlfield>
>>>   <datafield tag="020" ind1=" " ind2=" ">
>>>     <subfield code="a">0875273378 (pbk.)</subfield>
>>>   </datafield>
>>>   <datafield tag="035" ind1=" " ind2=" ">
>>>     <subfield code="a">(OCoLC)ocm18592502</subfield>
>>>   </datafield>
>>>   <datafield tag="035" ind1=" " ind2=" ">
>>>     <subfield code="9">AMC-5122</subfield>
>>>   </datafield>
>>>   <datafield tag="040" ind1=" " ind2=" ">
>>>     <subfield code="a">GEI</subfield>
>>>     <subfield code="c">GEI</subfield>
>>>     <subfield code="d">MDU</subfield>
>>>     <subfield code="d">IAS</subfield>
>>>   </datafield>
>>>   <datafield tag="100" ind1="1" ind2="0">
>>>     <subfield code="a">Amacher, A. Loren</subfield>
>>>   </datafield>
>>>   <datafield tag="245" ind1="1" ind2="0">
>>>     <subfield code="a">Pediatric head injuries :</subfield>
>>>     <subfield code="b">a handbook /</subfield>
>>>     <subfield code="c">by A. Loren Amacher.</subfield>
>>>   </datafield>
>>>   <datafield tag="260" ind1="0" ind2=" ">
>>>     <subfield code="a">St. Louis, MO :</subfield>
>>>     <subfield code="b">W.H. Green,</subfield>
>>>     <subfield code="c">c1988.</subfield>
>>>   </datafield>
>>>   <datafield tag="300" ind1=" " ind2=" ">
>>>     <subfield code="a">viii, 129 p. :</subfield>
>>>     <subfield code="b">ill. ;</subfield>
>>>     <subfield code="c">23 cm.</subfield>
>>>   </datafield>
>>>   <datafield tag="504" ind1=" " ind2=" ">
>>>     <subfield code="a">Includes bibliographies and index.</subfield>
>>>   </datafield>
>>>   <datafield tag="650" ind1=" " ind2="0">
>>>     <subfield code="a">Children</subfield>
>>>     <subfield code="x">Wounds and injuries</subfield>
>>>   </datafield>
>>>   <datafield tag="650" ind1=" " ind2="0">
>>>     <subfield code="a">Children</subfield>
>>>     <subfield code="x">Wounds and injuries</subfield>
>>>     <subfield code="x">Rehabilitation.</subfield>
>>>   </datafield>
>>>   <datafield tag="650" ind1=" " ind2="0">
>>>     <subfield code="a">Head</subfield>
>>>     <subfield code="x">Wounds and injuries</subfield>
>>>   </datafield>
>>>   <datafield tag="650" ind1=" " ind2="0">
>>>     <subfield code="a">Youth</subfield>
>>>     <subfield code="x">Wounds and injuries</subfield>
>>>   </datafield>
>>>   <datafield tag="650" ind1=" " ind2="0">
>>>     <subfield code="a">Youth</subfield>
>>>     <subfield code="x">Wounds and injuries</subfield>
>>>     <subfield code="x">Rehabilitation.</subfield>
>>>   </datafield>
>>>   <datafield tag="650" ind1=" " ind2="0">
>>>     <subfield code="a">Head</subfield>
>>>     <subfield code="x">Wounds and injuries</subfield>
>>>     <subfield code="x">Complications</subfield>
>>>   </datafield>
>>> </record>
>>>
>>> Can somebody supply me another catalog.xml?
>>>
>>> Thanks for your help,
>>> Ludwig
>>>
>>> -------------------------------------------------------------------------
>>> This SF.net email is sponsored by: Microsoft
>>> Defy all challenges. Microsoft(R) Visual Studio 2005.
>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>>> _______________________________________________
>>> VuFind-General mailing list
>>> VuF...@li...
>>> https://lists.sourceforge.net/lists/listinfo/vufind-general
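
P.S. To make the "more robust import scripts" idea in the reply at the
top of this message concrete, here is a rough sketch of splitting a
dump into records with a real parser instead of matching lines by
position and indentation. It is only an illustration, not the actual
import-solr.php code: the function name splitMarcRecords() is made up,
and it uses PHP's bundled XMLReader and DOM extensions rather than a
PEAR package. It assumes the input is well-formed XML, so it would not
repair an incorrectly formatted dump.

```php
<?php
// Hypothetical helper, not part of VuFind: returns every MARCXML
// <record> element as a standalone XML string. It does not care about
// indentation, nor about whether an XML declaration or a <collection>
// wrapper is present in the input.
function splitMarcRecords($xml)
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $records = array();
    $more = $reader->read();
    while ($more) {
        if ($reader->nodeType === XMLReader::ELEMENT
            && $reader->localName === 'record') {
            // Copy the whole record subtree into its own document.
            $doc = new DOMDocument('1.0', 'UTF-8');
            $doc->appendChild($doc->importNode($reader->expand(), true));
            $records[] = $doc->saveXML();
            // Skip past the subtree that was just copied.
            $more = $reader->next();
        } else {
            $more = $reader->read();
        }
    }
    $reader->close();

    return $records;
}
```

Each returned string could then be handed to DOMDocument::loadXML()
before the XSL transformation, so a bare <record> file and a full
<collection> dump would go through the same code path.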