
Foxml and Greek

Help
a7corsair
2007-06-14
2012-10-29
  • a7corsair

    a7corsair - 2007-06-14

    Dear all,

    I have the following problem!
    I have created a FOXML file according to the FOXML schema. Inside a datastream element I include a MODS XML record that contains Greek letters. When I try to ingest it into Fedora using the admin client, I get an error. When I remove the Greek, everything works fine!
    Has anyone had the same experience?

    Best regards,

    Kostas Stamatis

     
    • a7corsair

      a7corsair - 2007-06-18

      Dear Matt, Christiaan,

      It is working now. Indeed, the generated FOXML file was not a valid UTF-8 file, even though it contained Greek letters. I changed the Java code that created the FOXML file, and now the file it produces is UTF-8 and I can ingest it into Fedora!

      Thank you very much for the responses!

      Best regards,

      Kostas Stamatis
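Kostas doesn't post the change he made, so the following is only a sketch of the usual fix, assuming the Java code wrote the file through a FileWriter or an OutputStreamWriter created without a charset. Both use the platform default encoding, which on many systems was not UTF-8. Passing StandardCharsets.UTF_8 explicitly makes the bytes match the encoding="UTF-8" declaration. The class name, the default output file name and the MODS fragment (standing in for a full FOXML document) are hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8FoxmlWriter {

    // Writes the XML text to `target` encoded as UTF-8, independent of the
    // platform default charset (which FileWriter without a charset would use).
    public static void writeUtf8(Path target, String xml) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(xml);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data. The Greek title is written with \u escapes
        // so the example compiles the same whatever source encoding javac assumes.
        String title = "\u0391\u03b8\u03ae\u03bd\u03b1"; // "Athina" in Greek letters
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<mods:titleInfo xmlns:mods=\"http://www.loc.gov/mods/v3\">"
                + "<mods:title>" + title + "</mods:title></mods:titleInfo>\n";

        Path target = Paths.get(args.length > 0 ? args[0] : "sample-foxml.xml");
        writeUtf8(target, xml);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

For example, `java Utf8FoxmlWriter out.xml` writes the sample; each Greek letter of the title takes two bytes in the output.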

       
    • Matthew Smith

      Matthew Smith - 2007-06-15

      What does error_handler.log say at the exact time of ingest? There should be a timestamp followed by an error message with the XML that Fedora didn't like. It may be an encoding issue: XML must be UTF-8 encoded.
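Following Matt's point that the XML must be UTF-8 encoded, a file can be checked before ingest with a strict decoder from the Java standard library. This is a standalone sketch, not part of Fedora: it only checks that the bytes are well-formed UTF-8 (the kind of failure behind a Xerces UTFDataFormatException), not that the document is valid FOXML. The class name Utf8Check is hypothetical. Note that new String(bytes, UTF_8) would not work as a check, because it silently replaces bad bytes with U+FFFD.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true if `data` is well-formed UTF-8. The decoder is told to
    // REPORT (throw) on malformed input instead of substituting U+FFFD.
    public static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file.xml>");
            System.exit(2);
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```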

       
    • a7corsair

      a7corsair - 2007-06-15

      Dear Matt,

      The XML encoding is UTF-8. When I replace the Greek letters, everything works fine.
      The error I get is the following:


      ERROR 2007-06-15 08:55:07.156 [http-8083-Processor24] (DOValidatorSchematron) Schematron validation failed
      javax.xml.transform.TransformerException: Failure reading file:///usr/local/fedora/tomcat/temp/fedora-ingest-temp55077.xml
      at com.icl.saxon.om.Builder.build(Builder.java:267)
      at com.icl.saxon.Controller.transform(Controller.java:936)
      at fedora.server.validation.DOValidatorSchematron.validate(DOValidatorSchematron.java:104)
      at fedora.server.validation.DOValidatorSchematron.validate(DOValidatorSchematron.java:73)
      at fedora.server.validation.DOValidatorImpl.validateByRules(DOValidatorImpl.java:267)
      at fedora.server.validation.DOValidatorImpl.validate(DOValidatorImpl.java:205)
      at fedora.server.validation.DOValidatorModule.validate(DOValidatorModule.java:164)
      at fedora.server.storage.DefaultDOManager.getIngestWriter(DefaultDOManager.java:567)
      at fedora.server.management.DefaultManagement.ingestObject(DefaultManagement.java:170)
      .
      .
      .
      Caused by: java.io.UTFDataFormatException: Invalid byte 2 of 2-byte UTF-8 sequence.
      at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
      at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
      at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
      .
      .
      .
      at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
      at java.lang.Thread.run(Thread.java:619)
      ERROR 2007-06-15 08:55:07.158 [http-8083-Processor24] (DOValidatorImpl) VALIDATE: ERROR - failed Schematron rules validation.
      fedora.server.errors.ObjectValidityException: Failure reading file:///usr/local/fedora/tomcat/temp/fedora-ingest-temp55077.xml
      at fedora.server.validation.DOValidatorSchematron.validate(DOValidatorSchematron.java:109)
      at fedora.server.validation.DOValidatorSchematron.validate(DOValidatorSchematron.java:73)
      at fedora.server.validation.DOValidatorImpl.validateByRules(DOValidatorImpl.java:267)
      at fedora.server.validation.DOValidatorImpl.validate(DOValidatorImpl.java:205)
      at fedora.server.validation.DOValidatorModule.validate(DOValidatorModule.java:164)
      at fedora.server.storage.DefaultDOManager.getIngestWriter(DefaultDOManager.java:567)
      at fedora.server.management.DefaultManagement.ingestObject(DefaultManagement.java:170)
      at fedora.server.management.FedoraAPIMBindingSOAPHTTPImpl.ingest(FedoraAPIMBindingSOAPHTTPImpl.java:92)
      .
      .
      .
      and goes on!


      Is there a way to send you the FOXML file so you can check it?

      Best regards,

      Kostas Stamatis
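The stack trace bottoms out in Xerces' UTF8Reader. To see how a Greek letter can produce "Invalid byte 2 of 2-byte UTF-8 sequence", the sketch below prints the bytes of one letter, capital alpha (U+0391), in UTF-8 and in ISO-8859-7. ISO-8859-7 is an assumption, chosen as a common single-byte Greek encoding; the thread doesn't say which encoding the file actually used. The class name GreekBytes is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GreekBytes {

    // Formats bytes as space-separated upper-case hex, e.g. "CE 91".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String alpha = "\u0391"; // GREEK CAPITAL LETTER ALPHA
        Charset greek = Charset.forName("ISO-8859-7"); // assumed legacy encoding
        System.out.println("UTF-8:      " + hex(alpha.getBytes(StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-7: " + hex(alpha.getBytes(greek)));
    }
}
```

It prints CE 91 for UTF-8 and C1 for ISO-8859-7. In UTF-8, a byte with the bit pattern 110xxxxx (such as 0xC1) starts a two-byte sequence, and the second byte must have the pattern 10xxxxxx. Most Greek letters in ISO-8859-7 sit at 0xC1 and above (pattern 11xxxxxx) and ASCII markup is below 0x80, so the byte after a letter is usually not a continuation byte, which is consistent with the parser rejecting byte 2.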

       
      • Christiaan

        Christiaan - 2007-06-15

        Hi Kostas

        From the look of this line, the Greek characters you have typed in are not UTF-8:

        Caused by: java.io.UTFDataFormatException: Invalid byte 2 of 2-byte UTF-8 sequence.

        The fact that the XML declares UTF-8 encoding doesn't mean the Greek characters you manually typed into the Fedora admin client are UTF-8. PHP has a nice function for this: http://au.php.net/utf8_encode

        I am not sure what encoding your desktop OS / keyboard is entering into the admin client, but I don't think it is UTF-8!

        Cheers,
        Christiaan
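Re-encoding is another way out when the text already exists in a legacy encoding. PHP's utf8_encode, linked above, assumes its input is ISO-8859-1 (Latin-1), so it would not map Greek text from ISO-8859-7 or windows-1253 correctly. The Java equivalent is to decode with the charset the text really uses and then encode again as UTF-8. This is a sketch: the source charset has to be supplied by the user, and the class name ToUtf8 is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ToUtf8 {

    // Re-encodes bytes from `source` (e.g. ISO-8859-7) to UTF-8.
    // Decoding with the wrong source charset still yields valid UTF-8,
    // but with the wrong characters, so `source` must be the real encoding.
    public static byte[] reencode(byte[] input, Charset source) {
        String text = new String(input, source);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java ToUtf8 <in> <out> <source-charset>");
            System.exit(2);
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        Charset source = Charset.forName(args[2]);
        Files.write(out, reencode(Files.readAllBytes(in), source));
    }
}
```

For example, `java ToUtf8 mods-greek.xml mods-utf8.xml ISO-8859-7` (hypothetical file names). If the input has an XML declaration naming the old encoding, it also needs to be changed to encoding="UTF-8" afterwards, since the sketch copies the text unchanged.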

         
        • Matthew Smith

          Matthew Smith - 2007-06-15

          You should also check by right-clicking in Firefox on the metadata entry page and choosing 'Page Info'. Under the 'Encoding' item it should say UTF-8, or there should be a Content-Type entry in the 'Meta' section.

          If not, you can correct it with the php.ini setting default_charset = "utf-8", or, if you only want to restrict it to a certain <Location>, there is a way to set it in the Apache config.

          Matt
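Firefox's Page Info reports what the server sent, and the HTTP side of that check can also be scripted. The sketch below fetches a page and prints the charset parameter of its Content-Type response header. It assumes an http(s) URL, only looks at the header (not a meta element inside the HTML), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class PageCharset {

    // Extracts the charset parameter from a Content-Type header value,
    // e.g. "text/html; charset=UTF-8" -> "UTF-8". Returns null if absent.
    public static String charsetOf(String contentType) {
        if (contentType == null) return null;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = p.substring(8).trim();
                // Strip optional surrounding quotes, as in charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PageCharset <http-url>");
            System.exit(2);
        }
        // Assumes an http(s) URL; other schemes would fail the cast.
        HttpURLConnection conn = (HttpURLConnection) URI.create(args[0]).toURL().openConnection();
        try {
            String contentType = conn.getContentType(); // Content-Type response header
            System.out.println("Content-Type: " + contentType);
            System.out.println("charset:      " + charsetOf(contentType));
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, `java PageCharset http://localhost/metadata-entry.php` (a hypothetical URL). After setting default_charset = "utf-8" in php.ini, the charset line would be expected to read utf-8.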

           
