
#137 Indexer ignores setting for Excel documents in settings.xml

Milestone: 5.0.0
Status: closed
Owner: nobody
Labels: None
Type: bug
Updated: 2015-04-17
Created: 2014-03-23
Creator: Daniel
Private: No

This issue was reported in the thread https://sourceforge.net/p/seeddms/discussion/general/thread/89d13e49/?limit=25

I tested this as described in the thread. The config file contains the following entry:

<converter mimetype="application/vnd.ms-excel">xls2csv %s</converter>

When the indexer runs, it indexes PDF and Word documents without any issue, but the following error occurs for Excel files:

sh: ssconvert: not found

The indexer is also not picking up the terms in the Excel document when it is started with the "Update Fulltext" button.

I also tested Word documents by replacing the entry there with "blabla" and running the indexer again, but Word documents are still indexed.

After investigating, I found that the settings in Lucene/IndexedDocument.php seem to override the user settings in settings.xml. After editing line 34 of that file, the Excel document was indexed successfully.
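The behaviour described above fits a common pattern: a class ships its own default converter commands and uses the configured ones only if the caller passes them in. The PHP sketch below illustrates that pattern under that assumption. `resolveConverter()` and the default command string are hypothetical and are not taken from the SeedDMS source; the report only tells us that the built-in Excel default invokes `ssconvert`.

```php
<?php
// Hypothetical sketch, not SeedDMS's actual IndexedDocument code:
// choose the text-extraction command for a given MIME type.
function resolveConverter(string $mimetype, ?array $converters): ?string
{
    // Built-in fallback. The Excel command is a placeholder; the error
    // message only shows that the real default calls ssconvert.
    $defaults = [
        'application/vnd.ms-excel' => 'ssconvert %s',
    ];

    // Converters passed in by the caller (e.g. from settings.xml) take priority.
    if ($converters !== null && isset($converters[$mimetype])) {
        return $converters[$mimetype];
    }

    // With no converters passed in, the hard-coded default is used
    // no matter what settings.xml contains.
    return $defaults[$mimetype] ?? null;
}

$fromSettings = ['application/vnd.ms-excel' => 'xls2csv %s'];
echo resolveConverter('application/vnd.ms-excel', null), "\n";          // ssconvert %s
echo resolveConverter('application/vnd.ms-excel', $fromSettings), "\n"; // xls2csv %s
```

In this sketch the configured `xls2csv` command is only reached when the caller actually hands over the converter map, which is the gap that the later fix to utils/indexer.php closes.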

Discussion

  • Daniel

    Daniel - 2014-03-23

    Sorry, I phrased that badly. What I meant is: when I tested Word documents, I replaced the msword converter string in the config with "blabla" to see whether the indexer would fail afterwards.

     
  • Uwe Steinmann

    Uwe Steinmann - 2014-03-25
    • status: open --> pending
     
  • Uwe Steinmann

    Uwe Steinmann - 2014-03-25

    Try to replace

    $index->addDocument(new SeedDMS_Lucene_IndexedDocument($dms, $document));

    with

    $index->addDocument(new SeedDMS_Lucene_IndexedDocument($dms, $document, $settings->_converters ? $settings->_converters : null));

    in utils/indexer.php

    Uwe

     
  • Daniel

    Daniel - 2014-03-25

    OK, I changed that. Now it throws an error for every document:

    Notice: Trying to get property of non-object in /volume1/web/seeddms/applikation/utils/indexer.php on line 61
    5:Word Document
    PHP Notice: Undefined variable: settings in /volume1/web/seeddms/applikation/utils/indexer.php on line 61

     

    Last edit: Daniel 2014-03-25
  • Uwe Steinmann

    Uwe Steinmann - 2014-03-25

    Too bad. $settings must be declared global in the tree() function. I missed that.

    global $index, $dms;

    must be

    global $index, $dms, $settings;

     
  • Daniel

    Daniel - 2014-03-25

    That does the trick. It works perfectly now and picks up the values from the settings. I verified this by putting some fake entries into the fulltext index settings.

     
  • Uwe Steinmann

    Uwe Steinmann - 2014-11-17
    • status: pending --> closed
     
  • kaihowl

    kaihowl - 2015-04-13

    This problem is still present in the current 4.3.16 release.
    Both op.AddDocument.php and utils/indexer.php reference _convcmd in Settings. AFAIK, Settings no longer offers this property; converter['fulltext'] has to be consulted instead. Otherwise, AddDocument will always use the defaults supplied in Lucene/IndexedDocument.php, and utils/indexer.php will silently fail to convert any document's content, since no document has the MIME type "Fulltext". Correct me if I am wrong. I am happy to open a pull request (or whatever the equivalent is on SourceForge). Let me know how I can help.

     
  • Uwe Steinmann

    Uwe Steinmann - 2015-04-16

    Thanks for finding this. 4.3.17 will finally fix it.

     
  • kaihowl

    kaihowl - 2015-04-17

    Have you noticed the merge request (https://sourceforge.net/p/seeddms/code/merge-requests/2/) for this issue? A few more places than I originally mentioned needed to be fixed.

     
  • Uwe Steinmann

    Uwe Steinmann - 2015-04-17

    Yes. I did the merge manually, but applied all the changes from the merge request.

     
  • kaihowl

    kaihowl - 2015-04-17

    Very nice. Thanks for the fast response and in general for maintaining SeedDMS!

     
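The "Undefined variable: settings" notice and the fix of adding `$settings` to the `global` line in `tree()` come down to PHP variable scope: a function does not see variables defined at the top level of the script unless it declares them `global`. The self-contained PHP illustration below uses a `stdClass` stand-in for the real settings object; the function names are hypothetical and only mirror the role of `tree()` in utils/indexer.php.

```php
<?php
// Stand-in for the settings object loaded at the top of utils/indexer.php;
// only the _converters property mentioned in the thread is modelled.
$settings = new stdClass();
$settings->_converters = ['application/vnd.ms-excel' => 'xls2csv %s'];

// Without a global declaration, $settings inside a function is a new,
// unset local variable, so the converters are not visible.
function treeWithoutGlobal(): bool
{
    return isset($settings);
}

// "global $settings" binds the local name to the script-level variable.
function treeWithGlobal(): bool
{
    global $settings;
    return isset($settings->_converters);
}

var_dump(treeWithoutGlobal()); // bool(false)
var_dump(treeWithGlobal());    // bool(true)
```

Reading `$settings->_converters` directly in the first function, instead of guarding it with `isset()`, would produce the same kind of "Undefined variable" and "property of non-object" notices that appear in the log above.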

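The point that utils/indexer.php "will silently fail" because no document has the MIME type "Fulltext" follows if the converter settings are grouped by purpose, with the full-text commands under a 'fulltext' key (converter['fulltext'] in the thread). The PHP sketch below assumes such a nested array; its exact shape in SeedDMS is an assumption. It shows why handing the indexer the whole array finds nothing, while handing it the 'fulltext' group works.

```php
<?php
// Hypothetical shape of the converter settings: commands grouped by
// purpose, with 'fulltext' as one group. The real structure may differ.
$converters = [
    'fulltext' => [
        'application/vnd.ms-excel' => 'xls2csv %s',
    ],
];

// Handing the indexer the whole array means every lookup by MIME type
// misses, because the only top-level key is 'fulltext'.
var_dump(isset($converters['application/vnd.ms-excel'])); // bool(false)

// Handing it the 'fulltext' group gives a map keyed by MIME type.
$fulltext = $converters['fulltext'] ?? null;
var_dump(isset($fulltext['application/vnd.ms-excel']));   // bool(true)
```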
