I wonder if anyone can offer some help with full text searching.
I have Windows 10 with XAMPP and pdftotext installed, all latest versions as of 2 days ago. All seems well except there is no full text search content. I have updated the application/pdf converter to pdftotext -nopgbrk %s - and when I run a re-index it creates a text file in the directory where the PDF is located and populates the Lucene folder. However, you cannot search on any of the words in the text file. It's as if the text file is created but the content is not passed on to SeedDMS.
I've been googling in circles for 2 days now and tried lots of different resolutions, but I just can't get it to work.
EDIT: Switched over to SQLiteFTS (and applied the fixes in ticket 340), as this seems to be the preferred method for full text search, and I still cannot search on any text. I am getting an index.db file in the lucene folder that is 24k; it contains information about the documents but still no content. I searched the DB and there is no content in there. When you view the full text index info there is also no content. If you manually add content to the database, full text searching works OK.
EDIT: Have added some logging to SQLiteFTS\Indexer.php and can see that when the INSERT INTO command runs there is no content being passed, despite the pdftotext command running and creating the text file with the content next to the original PDF file. Do I maybe have the text content files from pdftotext in the wrong place?
EDIT: Logging shows the following when re-indexing:
$path = $dms->contentDir . $version->getPath(); = C:/xampp/htdocs/seeddms51x/data/\1048576/5/1.pdf
$mimetype = $version->getMimeType(); = application/pdf
$cmd = sprintf($convcmd[$mimetype], $path); = pdftotext -nopgbrk C:/xampp/htdocs/seeddms51x/data/\1048576/5/1.pdf
I have confirmed the text file was created successfully.
$content = self::execWithTimeout($cmd, $timeout); = ''
SQL INSERT INTO statement also has nothing in the content field.
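For anyone wanting to reproduce the command string outside SeedDMS, here is a minimal standalone sketch. It is only an illustration under assumptions: buildCommand is a hypothetical helper, not part of SeedDMS, and the path is the one from the log above. It shows what sprintf produces for the converter string with and without the trailing " -". pdftotext only sends text to standard output when the output file argument is "-"; without it, it writes a .txt file next to the PDF. If the configured string really lacks the trailing "-", as the logged command suggests, that might explain the empty $content, but this has not been confirmed.

```php
<?php
// Hypothetical helper (not part of SeedDMS): fills the file path into a
// converter command template, the same way the logged sprintf() call does.
function buildCommand(string $template, string $path): string
{
    return sprintf($template, $path);
}

// Path exactly as it appears in the log (single quotes keep the backslash).
$path = 'C:/xampp/htdocs/seeddms51x/data/\1048576/5/1.pdf';

// Template with the trailing "-": pdftotext is asked to write to stdout.
$withStdout = buildCommand('pdftotext -nopgbrk %s -', $path);

// Template without the trailing "-": pdftotext writes 1.txt beside the PDF
// and prints nothing, so a caller reading stdout gets an empty string.
$withoutStdout = buildCommand('pdftotext -nopgbrk %s', $path);

echo $withStdout, PHP_EOL;
echo $withoutStdout, PHP_EOL;
```

Comparing the second line of output with the logged $cmd shows which form of the converter string is actually being used during the re-index.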
Last edit: Simon Bendall 2017-11-10