This issue has been reported in Thread https://sourceforge.net/p/seeddms/discussion/general/thread/89d13e49/?limit=25
I tested this as outlined in the thread. The config file contains the following entry:
<converter mimetype="application/vnd.ms-excel">xls2csv %s</converter>
If the indexer is run it captures PDF and Word without any issue but the following error occurs for Excel files:
sh: ssconvert: not found
The indexer is also not picking up the terms in the Excel document when it is started with the "Update Fulltext" button.
I also tested with Word documents: I replaced the entry for them with blabla and ran the indexer again, but indexing still worked.
After investigating, I found that the settings in Lucene/IndexedDocument.php seem to override the user settings in settings.xml. After editing line 34, the Excel document was indexed successfully.
Sorry, I messed something up. I meant that when I tested the Word documents, I replaced the string for msword in there with blabla to see whether the indexer would fail afterwards.
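The behaviour described above can be illustrated with a minimal sketch. It is not the actual SeedDMS code: the class name IndexedDocumentSketch, its $defaults table and the command strings are hypothetical stand-ins, assuming only that the real class falls back to built-in commands when no converter list is passed to it.

```php
<?php
// Hypothetical stand-in for SeedDMS_Lucene_IndexedDocument; NOT the real class.
class IndexedDocumentSketch {
    // Built-in fallback commands keyed by MIME type (illustrative values only).
    private static $defaults = array(
        'application/vnd.ms-excel' => 'ssconvert %s',
    );

    public $cmd;

    public function __construct($mimetype, $convcmd = null) {
        // Without a user-supplied list, the hardcoded defaults win.
        $converters = $convcmd ? $convcmd : self::$defaults;
        $this->cmd = isset($converters[$mimetype]) ? $converters[$mimetype] : null;
    }
}

// Entry as it appears in the config file quoted in this thread.
$userConverters = array('application/vnd.ms-excel' => 'xls2csv %s');

// Caller forgets to pass the settings: the default command is used.
$withoutSettings = new IndexedDocumentSketch('application/vnd.ms-excel');

// Caller passes the settings: the configured command is used.
$withSettings = new IndexedDocumentSketch('application/vnd.ms-excel', $userConverters);
```

If the real class works along these lines, a caller that omits the third argument would always end up with the built-in command, which would match the "ssconvert: not found" message even though the config names xls2csv.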
Try to replace
$index->addDocument(new SeedDMS_Lucene_IndexedDocument($dms, $document));
with
$index->addDocument(new SeedDMS_Lucene_IndexedDocument($dms, $document, $settings->_converters ? $settings->_converters : null));
in utils/indexer.php
Uwe
OK, I changed that, but it now throws an error for every document:
Notice: Trying to get property of non-object in /volume1/web/seeddms/applikation/utils/indexer.php on line 61
5:Word Document
PHP Notice: Undefined variable: settings in /volume1/web/seeddms/applikation/utils/indexer.php on line 61
Last edit: Daniel 2014-03-25
Too bad. $settings must be declared global in the tree() function. I missed that.
global $index, $dms;
must be
global $index, $dms, $settings;
This does the trick, works perfectly now and captures the values from the settings. Verified that by putting some fake entries in the settings for fulltext index.
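The reason the global declaration matters is PHP's scoping: a variable assigned at script level is not visible inside a function unless it is imported with the global keyword. A small sketch of that rule follows; the function names and the contents of $settings are made up for illustration and are not taken from utils/indexer.php.

```php
<?php
// Script-level variable, comparable to $settings in the indexer script.
// The array contents are illustrative only.
$settings = array('application/vnd.ms-excel' => 'xls2csv %s');

// Hypothetical function that forgets the global declaration.
function treeWithoutGlobal() {
    // $settings is a fresh, undefined local variable here.
    // isset() is used so the check itself raises no notice.
    return isset($settings);
}

// Hypothetical function that imports the script-level variable.
function treeWithGlobal() {
    global $settings;
    return isset($settings);
}

$seenWithoutGlobal = treeWithoutGlobal(); // false
$seenWithGlobal = treeWithGlobal();       // true when run as a top-level script
```

Reading a property from the undefined local variable is what produces the "Undefined variable" and "Trying to get property of non-object" notices quoted earlier.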
This problem is still present in the current 4.3.16 release.
Both op.AddDocument.php and utils/indexer.php reference _convcmd in Settings. AFAIK, Settings does not offer this property anymore, and converter['fulltext'] needs to be consulted instead. Otherwise, AddDocument will always use the defaults supplied in Lucene/IndexedDocument.php, and utils/indexer.php will silently fail to convert any document's content, since no document will have the mimetype "Fulltext". Correct me if I am wrong. I am totally willing to do a pull request (or whatever the SourceForge equivalent is). Let me know how to help.
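To make the lookup problem concrete, here is a hedged sketch. It assumes, based only on the description above, that the converters are grouped by target first ('fulltext') and by MIME type second; the variable $converters and the helper findConverter() are hypothetical and not part of SeedDMS.

```php
<?php
// Assumed layout: target name first, then MIME type => command.
$converters = array(
    'fulltext' => array(
        'application/vnd.ms-excel' => 'xls2csv %s',
    ),
);

// Hypothetical helper: look up the command for a MIME type in a flat map.
function findConverter(array $map, $mimetype) {
    return isset($map[$mimetype]) ? $map[$mimetype] : null;
}

$mimetype = 'application/vnd.ms-excel';

// Passing the whole structure: its only key is 'fulltext', so nothing matches.
$wrong = findConverter($converters, $mimetype);

// Passing the 'fulltext' sub-array: the configured command is found.
$right = findConverter($converters['fulltext'], $mimetype);
```

Under that assumption, handing over the outer array fails silently, because the lookup returns nothing for every real MIME type instead of raising an error.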
Thanks for finding this. 4.3.17 will finally fix it.
Have you noticed the merge request (https://sourceforge.net/p/seeddms/code/merge-requests/2/) for this issue? A few more places than I originally mentioned needed to be fixed for this issue.
Yes, I did the merge manually but applied all the changes from the merge request.
Very nice. Thanks for the fast response and in general for maintaining SeedDMS!