Hi,
I'm using SeedDMS 5.1.19 on CentOS 7.6 (PHP 7.1 / MariaDB 10.2).
The docx documents are not indexed.
I tried docx2txt.pl, but it doesn't solve the problem.
Extract of my settings.xml
I also have a problem with accented characters (éèàê etc.) in RTF documents.
Edit: I enabled "Override MimeType:" in the settings.
Best regards,
Emmanuel
Last edit: EKC 2020-11-12
Does your Perl script output the converted document to stdout? Do the other converters work?
It's OK for .pdf (and .rtf without accented characters).
How can I try the Perl script on the command line?
I have the same issue. PDF and text files work, but docx documents do not. I use 6.0.13.
I tested it on the command line. Doc files work, while docx does not seem to be supported by catdoc. What is the alternative?
Last edit: Michael 2020-12-08
I fixed it. In my case I needed to add a converter with mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
and use docx2txt.
This is the line I added:
<converter mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document">docx2txt %s -</converter>
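For anyone who wants to check a converter line like this outside SeedDMS, here is a minimal Python sketch. It assumes that "%s" in the command stands for the path of the document and that the converted text is expected on stdout, as asked earlier in the thread. The function name run_converter and the timeout are only illustrative; this is not how SeedDMS itself is implemented.

```python
import shlex
import subprocess


def run_converter(command_template: str, document_path: str, timeout: float = 30.0) -> str:
    """Run a converter command and return whatever it writes to stdout.

    Assumption: "%s" in the template stands for the document path,
    as in the converter line quoted above.
    """
    # Quote the path so spaces or shell metacharacters in file names are safe.
    command = command_template.replace("%s", shlex.quote(document_path))
    result = subprocess.run(
        command,
        shell=True,  # run through a shell so templates with pipes also work
        capture_output=True,
        timeout=timeout,
        check=False,
    )
    if result.returncode != 0:
        stderr = result.stderr.decode("utf-8", errors="replace").strip()
        raise RuntimeError(f"converter exited with {result.returncode}: {stderr}")
    # Decode as UTF-8; bytes that are not valid UTF-8 become replacement characters.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling run_converter("docx2txt %s -", path_to_a_docx) with a real file should return the document text if the converter is installed and writes to stdout. An empty string or a RuntimeError suggests the command itself is the problem rather than the indexer.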