Import objects created by SIP Creator in FEZ

Help
a7corsair
2007-06-01
2012-10-29
  • a7corsair
    2007-06-01

    Hello there,

    here is my problem! I am using the SIP Creator distributed with Fedora and the Directory Ingest service, which is used for batch import of data into the Fedora repository! Everything goes fine, and the items are added to the repository.
    However, I cannot see them via the FEZ interface. From the Administrator menu, I click "Index Fedora Objects into Fez", where I can search for an item in the Fedora repository and add it to FEZ's MySQL db, so that I can see it via FEZ under the community and collection I have selected! The question is that if I add 1000 pics, I cannot do it for every pic. Is there a way to get all unindexed items and add them in a FEZ collection?

    Also, if I delete an item from fedora repository, and not via FEZ interface, fez is still showing the item with a broken link! In this case, I have to go to MySQL db and delete some entries in a table regarding these items. Then FEZ stops showing them! Is this the case??

    As far as the first question is concerned, I have also tried changing the relationship datastream of each object when SIP Creator creates them, so that these objects are "isMemberOf" a FEZ collection. But again, FEZ does not show them. I have again to "Index Fedora Objects into Fez"...

    I will be waiting for your response!

    Best regards,

    Kostas Stamatis

     
    • a7corsair
      2007-06-04

      Dear Christiaan,

      thank you very much for your response!
      I am new to digital repositories and I am trying to gather as much information as I can in order to get deeper into this area!
      I have contacted a person from Elated and they told me that they are not going to maintain Elated from now on, not even to support Fedora 2.2. Possibly someone from the community will have to volunteer to do it.
      When you say that Fez has been tested with over 40,000 objects, how did you ingest all these objects? Using Fez's batch import? OK, that is a possibility. But then how did you edit the metadata for each entry? Or did you only use the common template for the metadata? I ask because I thought that every entry should have a different title and metadata in order for the evaluation of searching for an item among 40,000 items to be sufficient! Isn't that right?

      We are still evaluating tools and repositories, but I guess that Fez on top of Fedora is the most likely to win! Thus, I hope that in the future we will have the chance to cooperate on creating that eprints-like modification for ingesting metadata!

      Again, thank you very much for your help!

      Cheers,

      Kostas Stamatis

       
      • Christiaan
        2007-06-04

        Hi Kostas

        Those 40,000 objects were our 3000 eprints records ingested 14 times, plus 10,000 other records. We have written data conversion PHP scripts for our own internal Oracle systems, which house about 30,000 records of research. These scripts convert Oracle SQL views into FOXML object files that we transform into Fez style Fedora objects; we ingest them directly into Fedora with the directory FOXML batch import (not SIP Creator), then index them into Fez afterwards. Our production site has 30,000 objects, and will have another 28,000 ingested next week, so about 60,000 objects in the near term, with probably around 100,000 at least in the next year or two depending on the outcomes of internal projects.
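
The conversion step described above (one database row becomes one FOXML file for directory ingest) might look roughly like the sketch below. It is written in Python rather than the PHP mentioned in the post, the `record` fields and the `build_foxml` helper are hypothetical, and the FOXML shown is a simplified subset based on the FOXML 1.0 layout, so check it against the schema for your Fedora version before relying on it.

```python
import xml.etree.ElementTree as ET

FOXML_NS = "info:fedora/fedora-system:def/foxml#"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("foxml", FOXML_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_foxml(record):
    """Return a simplified FOXML document (as a string) for one record.

    `record` is a hypothetical dict with 'pid' and 'title' keys, standing in
    for one row of a database view.
    """
    obj = ET.Element(f"{{{FOXML_NS}}}digitalObject", {"PID": record["pid"]})

    # Object properties: state and label (assumed FOXML 1.0 property names).
    props = ET.SubElement(obj, f"{{{FOXML_NS}}}objectProperties")
    ET.SubElement(props, f"{{{FOXML_NS}}}property", {
        "NAME": "info:fedora/fedora-system:def/model#state",
        "VALUE": "Active",
    })
    ET.SubElement(props, f"{{{FOXML_NS}}}property", {
        "NAME": "info:fedora/fedora-system:def/model#label",
        "VALUE": record["title"],
    })

    # Inline XML datastream carrying a minimal Dublin Core record.
    ds = ET.SubElement(obj, f"{{{FOXML_NS}}}datastream", {
        "ID": "DC", "STATE": "A", "CONTROL_GROUP": "X",
    })
    version = ET.SubElement(ds, f"{{{FOXML_NS}}}datastreamVersion", {
        "ID": "DC.0", "MIMETYPE": "text/xml", "LABEL": "Dublin Core",
    })
    content = ET.SubElement(version, f"{{{FOXML_NS}}}xmlContent")
    dc = ET.SubElement(content, f"{{{OAI_DC_NS}}}dc")
    ET.SubElement(dc, f"{{{DC_NS}}}title").text = record["title"]
    ET.SubElement(dc, f"{{{DC_NS}}}identifier").text = record["pid"]

    return ET.tostring(obj, encoding="unicode")
```

Writing one such file per record into a directory gives the batch import something to work on; the objects would still need to be indexed into Fez afterwards, as described above.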

        The fez batch import will ignore the 'set template' if it finds eprints or other ingest material that is not simply images or foxml.

        Personally, if I were looking at repositories right now and wanted a 100% open source solution I would seriously evaluate:

        1) Fez + Fedora - growing community - great if you prefer PHP.
        2) ePrints 3.0 - strong community, the new 3.0 version looks like a vast improvement over 2.x. Especially if you like Perl, but with 3.0 much can be customised with xml config files I believe.
        3) DSpace - big community, complete GUI (especially with Manakin), but Fedora has a much better core architecture by most accounts. Especially suitable if you have good Java programming skills and want to customise more than what can be done with configuration settings.

        It really depends on whether you want a simple IR, or something more flexible like Fedora. It also depends on your IT team and whether you want to get your hands a bit dirty with the code sometimes. Or perhaps you want to just buy a product from a vendor and not dedicate any of your IT team's time to it. It's not an easy thing to decide! We are glad we went with Fedora, and Fez is really giving us the power to do exactly what we want to do. We have also found great benefit in releasing our software as open source, from the contributions given back by the community.

        Cheers,
        Christiaan

         
    • a7corsair
      2007-06-04

      Thank you Christiaan,

      very useful info, that was actually the idea I had for these 3 systems!

      Cheers,
      Kostas

       
    • Christiaan
      2007-06-01

      Hi Kostas

      > The question is that if I add 1000 pics, I cannot do it for every pic. Is there a way to get all unindexed items and add them in a FEZ collection?

      Yes - click the 'Index All' button on the index fedora objects into fez page. I am pretty sure this is in 1.3, although if not it may only be in the latest code in our subversion trunk.

      > Also, if I delete an item from fedora repository, and not via FEZ interface, fez is still showing the item with a broken link! In this case, I have to go to MySQL db and delete some entries in a table regarding these items. Then FEZ stops showing them! Is this the case??

      Yes this is the case. It is better to delete them through the fez interface to delete the fez index for that pid AND the fedora object itself.

      > But again, FEZ does not show them. I have again to "Index Fedora Objects into Fez"...

      Yes you will need to index them in fez.

      You could always try the Fez batch importer rather than the SIP Creator in Fedora. Just point the Fez batch importer to a directory full of images (eg 1000 TIFFs) and it will create Fez and Fedora objects, including thumbnail and JHOVE preservation metadata generation. This way is preferable when using Fez.

      Cheers,
      Christiaan

       
    • a7corsair
      2007-06-01

      Dear Christiaan, thank you for your response.

      1. I cannot find the "Index All" button that you mention. I think you mean the button "Index All Pages Into Fez". But let's say that I am importing some items with pids from changeme:1 to changeme:20. If I search for changeme:1 I get the result and I index it into Fez. The same goes for the others. If I search for changeme:* I get nothing. Can I use wildcards to search for all unindexed items? Then I would press the button "Index All Pages Into Fez".

      2. I found the batch import of Fez, and I am definitely going to use this method. However, now I am getting another problem. I am importing an image and I cannot see it (I think that was exactly the problem in previous posts). I imported images in the morning and everything was OK. Now I am importing the same image and I cannot see it. I ran the sanity check again and I am getting the following error at the top of the page:

      Warning: exec() [function.exec]: Unable to fork [/usr/bin/php "/var/www/html/fezora/misc/run_background_process.php" 262 "/var/www/html/fezora/" > /tmp/fezbgp_262.log 2>&1 &] in /var/www/html/fezora/include/class.background_process.php on line 176

      The errors from the sanity check are the following:

      Failed: File Jhove Result = '/tmp/presmd_test.xml' This file doesn't exist, check the path and the permissions so that webserver user can read the file (the webserver must have 'rx' permission on any parent directories as well as 'r' permission on the file)

      Failed: File Check Image Convert Result = '/tmp/thumbnail_test.jpg' This file doesn't exist, check the path and the permissions so that webserver user can read the file (the webserver must have 'rx' permission on any parent directories as well as 'r' permission on the file)

      Failed: backgroundProcess Run Background Process = '262' The background process doesn't seem to have run. On windows this can be a problem with the version of apache or php - try different versions.

      Indeed, these files do not exist! What about the "fork" error at the top of the page?

      Thanks again!

      Kostas Stamatis

       
      • Christiaan
        2007-06-01

        Hi Kostas

        1. Try searching for just * - that should show all (this is just the fedora findObjects API so works the same way).
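
For readers who want to run the same wildcard search outside the Fez page, here is a minimal sketch that only builds a search URL. The base URL is hypothetical, and the `/search` path and the `terms`, `pid`, `title`, `maxResults` and `xml` parameter names are assumptions based on the Fedora 2.x API-A-LITE findObjects interface, so verify them against the documentation for your Fedora version.

```python
from urllib.parse import urlencode


def find_objects_url(base_url, terms="*", max_results=100):
    """Build a findObjects search URL.

    Parameter names are assumptions taken from the Fedora 2.x API-A-LITE
    search interface; nothing is sent over the network here.
    """
    params = {
        "terms": terms,          # "*" should match every object
        "pid": "true",           # ask for the PID field in the results
        "title": "true",         # ask for the title field in the results
        "maxResults": str(max_results),
        "xml": "true",           # request XML rather than HTML output
    }
    return base_url.rstrip("/") + "/search?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Fedora instance.
    print(find_objects_url("http://localhost:8080/fedora"))
```

Building the URL separately from fetching it keeps the sketch testable without a running repository.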

        2. It looks like you have a few things to fix from your sanity check.

        a. A quick google on "unable to fork" showed: http://www.somacon.com/p255.php
        It looks like you are running windows and perhaps IIS rather than Apache? I would suggest Apache over IIS, however following the instructions in that link (or other general googling for that error) should resolve your problem. We only test Fez in Apache, but will try to help with your IIS install (however personally I recommend Apache over IIS, even in Windows).

        b. Those jhove, imagemagick and background process errors will be due to the above forking security issue you have.

        Cheers,
        Christiaan

         
    • a7corsair
      2007-06-01

      Dear Christiaan,

      I did some changes in my "php.ini" file and I managed to resolve the php fork error! All the errors in sanity check were resolved, actually.

      However, when I do a batch import there seem to be some problems! I have tested it many times!
      Firstly, all the images were imported but I could not see the image. I checked in Fedora and there were no datastreams for the thumbnail and the preview.
      Next, I did it again, and now there were thumbnails for 2 out of 5 images!

      Finally, the PHP error appeared again (what I did to fix it was to increase the memory a PHP application is allowed to use).
      If I restart my Apache, the error disappears.

      Any clue?

      Regards,

      Kostas

      PS: I'll do more tests on batch import and I will let you know the results!

       
      • Christiaan
        2007-06-02

        Hi Kostas

        Were you previously using the IIS webserver and now you are using Apache? Did you turn off the IIS service? You may have two php.ini files. One for apache php, one for PHP CLI (command line). You will need to change the settings for memory in both files.
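
A minimal sketch for comparing the memory setting in two php.ini files is given below. It assumes both files use the standard `memory_limit = VALUE` directive with `;` comments; the helper names are hypothetical, and you would need to find the actual file locations yourself (for example, `php --ini` reports the file used by the command line PHP, and the `phpinfo()` output reports the one used under Apache).

```python
import re


def read_memory_limit(ini_text):
    """Return the last uncommented memory_limit value in php.ini text, or None."""
    value = None
    for line in ini_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop ';' comments
        match = re.match(r"memory_limit\s*=\s*(\S+)", line)
        if match:
            value = match.group(1)
    return value


def limits_differ(apache_ini_text, cli_ini_text):
    """True when the two php.ini texts set different memory limits."""
    return read_memory_limit(apache_ini_text) != read_memory_limit(cli_ini_text)
```

If `limits_differ` returns True for the contents of the two files, a background process started through the command line PHP may be running with less memory than the web pages are.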

        Can you post here the exact error message it displays?

        Cheers,
        Christiaan

         
    • a7corsair
      2007-06-04

      Christiaan,

      I am only using Apache, from the beginning!
      Actually, I increased the memory of the virtual machine that hosts my Fedora system and Fez, and in the "php.ini" file I also increased the memory that a PHP application can use. And now everything works fine! I batch imported images and I can see all of them (since all the "preview", "thumbnail", etc. datastreams have been created)!

      I have 4 more questions!

      1) I have noticed that batch import takes a long time. For 20 images it took about 2-3 minutes. Is this the case?

      2) I am currently testing 3 tools for fedora system (Fez, Elated and Vital). Do you have any evaluations about these 3 tools like how many institutes use each one? Or something like that? By the way, Fez is the only one that supports fedora 2.2!

      3) Let's say that I have an image to import and I have a separate XML file for its metadata! Can I import the file along with its metadata, or is this not possible? I know that I can import metadata from eprints. But what happens if I have files with their metadata separately?

      4) I want to test Fedora and Fez with many, many items! About, let us say, 1 million items. How can I import all these items into Fez??? And what about their metadata! Do you know of any similar work done by others??

      I am really really looking forward to your response!

      Best regards,

      Kostas Stamatis

       
      • Christiaan
        2007-06-04

        Hi Kostas

        I'm glad to learn it is working for you!

        1) Most of the time will be taken by converting the image into web, thumbnail, and preview copies. So for each of those 20 objects it is converting your archival version images (100MB TIFFs perhaps?) into 3 JPEG images for web display. ImageMagick is the tool doing this, and the time will increase as the file size of the original image increases (eg a 100MB TIFF is slower than a 30MB TIFF). So 150 secs / (20 x 3) = 2.5 secs per image conversion. That sounds about what we expect. This is why we run batch image imports as background processes (so you don't have to wait for the page to come back - you can track them in 'my fez - my processes'). You can test the performance of ImageMagick by running some command line based conversions of images (manually) to confirm they take about 2.5 secs per conversion. If you visit the ImageMagick website there may be some ways you can improve its performance if that is unacceptable. Oh, it also creates a JHOVE preservation metadata stream per image as well, and that can take up to a second sometimes depending on the input images. So really - 3x ImageMagick + 1x JHOVE preservation metadata extraction per image, plus Fedora object generation, Fedora object ingest and then Fez indexing - so a fair bit happens per object!
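
The estimate above, and the suggested manual timing check, can be sketched as follows. The helper names are hypothetical, the calculation deliberately ignores the JHOVE, ingest and indexing time mentioned in the post, and the ImageMagick call in the comment uses made-up file names and a made-up size.

```python
import subprocess
import sys
import time


def seconds_per_conversion(total_seconds, images, derivatives_per_image=3):
    """Average time per derivative image, ignoring JHOVE, ingest and indexing."""
    return total_seconds / (images * derivatives_per_image)


def time_command(argv):
    """Run one command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # 20 images in about 150 seconds, 3 derivatives each (figures from the post).
    print(seconds_per_conversion(150, 20))
    # Harmless stand-in command; for a real measurement pass an ImageMagick
    # call instead, e.g. ["convert", "archival.tif", "-resize", "250x250",
    # "thumbnail.jpg"] (hypothetical file names and size).
    print(time_command([sys.executable, "-c", "pass"]))
```

Timing a handful of real conversions this way and comparing them with the average from a batch run shows whether ImageMagick is where the time goes.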

        2) Here is a full report evaluation of DSpace, ePrints and Fedora (using Fez):
        https://eduforge.org/docman/view.php/131/1062/Repository%20Evaluation%20Document.pdf

        The Catalyst group who worked on this report are now active Fez software developers.

        I don't know of any reports that evaluate Elated or Vital, however I am sure VTLS could help you find some. From memory Elated was a college based summer project that is updated only really to support Fedora as it is updated, but is only really intended as an example of what you could do or as a starting block to build more - although that's just my understanding.

        I believe Vital 3.0 is now a JSP based web application (and has come a long way) however it is not open source (last I heard). You could contact the Australian ARROW project (who are the biggest Vital users last I heard) if you wanted a user based evaluation - www.arrow.edu.au, and of course VTLS themselves for sales inquiries.

        3) You would have to add an ePrints-like modification for the batch importer to do this. It would not be very hard, as you could use the ePrints import plugin as an example. We could assist in design planning with this (eg through campfire) as we think this would be a great feature.

        4) I have recently created a roadmap for Fez performance and scalability to make it able to handle 1 million records in its index. It will follow much the same theory as the Fedora 2.2 MPTStore and includes support for PostgreSQL for this amount of data (over MySQL). This is what I am actively working on right now so expect to see commits in our SVN trunk related to scaling and performance in the next few weeks.
        The current trunk architecture has been tested against 40,000 objects (1 million records in the main index table) and performs acceptably on mid-priced hardware - although our hardware is also changing to virtual machine hardware with much more recent linux and software versions so I expect to see performance improve for all these reasons.

        I aim to put this roadmap up on the Fez Wiki soon (it is only in our internal private wiki so far).

        For Fedora itself I am sure I have read that it has been tested with over 1 million objects by the NSDL.

        Cheers,
        Christiaan

         
    • Christiaan
      2007-06-04

      > to make it able to handle 1 million records in its index.

      This should read 1 million Fedora objects in its index. This would equate to about 30 million Fez index rows. Each object has around 30-40 rows in the Fez index; however, the roadmap has ways to split this up, like the Fedora 2.2 MPTStore does.
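
The row counts quoted in this thread can be checked with a little arithmetic. The sketch below uses only figures from the posts (30-40 rows per object, 1 million objects, and the tested trunk with 40,000 objects and 1 million index rows); the function names are hypothetical.

```python
def estimated_index_rows(objects, rows_per_object):
    """Total Fez index rows for a given number of Fedora objects."""
    return objects * rows_per_object


def observed_rows_per_object(index_rows, objects):
    """Average index rows per object in an existing installation."""
    return index_rows / objects


if __name__ == "__main__":
    # 1 million objects at 30 and at 40 rows each.
    print(estimated_index_rows(1_000_000, 30))
    print(estimated_index_rows(1_000_000, 40))
    # The tested trunk: 1 million index rows for 40,000 objects.
    print(observed_rows_per_object(1_000_000, 40_000))
```

At 30-40 rows per object, 1 million objects come to between 30 and 40 million index rows, while the tested trunk figures work out to 25 rows per object, a little below that range.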