DocumentGrep Overview
Name of the program:
Further development of my project (PDF)GrepGui led to the new name DocumentGrep. Since it is now possible to search text in PDF, DOC, DOCX, ODT, TXT, RTF and HTML files, it was time to change the name of the project.
Description:
This is a GUI for the command line tools grep, pdfgrep, pdftotext, unrtf, odt2txt, antiword, docx2txt, html2text and libreoffice. DocumentGrep searches text in multiple file types. You can use regular expressions for the search (https://en.wikipedia.org/wiki/Regular_expression). Neither this GUI nor the command line tools use an index. Each document is either converted into text and then processed by the RegExpr library of Andrey V. Sorokin, or handled directly by a command line tool (such as pdfgrep).
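The "convert to text, then apply a regular expression" path described above can be sketched in Python. This is an illustration only, not DocumentGrep's own code: Python's `re` module stands in for Sorokin's RegExpr library, and the mapping `CONVERTERS` (a subset of the tools, each printing plain text to stdout), the `{file}` placeholder and the function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical extension -> converter mapping (a subset). Each command prints
# plain text to stdout; "{file}" is replaced by the document path.
CONVERTERS = {
    ".pdf": ["pdftotext", "{file}", "-"],  # "-" sends the text to stdout
    ".doc": ["antiword", "{file}"],
    ".odt": ["odt2txt", "{file}"],
    ".rtf": ["unrtf", "--text", "{file}"],
    ".html": ["html2text", "{file}"],
}


def build_command(path: Path) -> list[str]:
    """Return the argv list that converts `path` to text (KeyError if unknown)."""
    template = CONVERTERS[path.suffix.lower()]
    return [str(path) if arg == "{file}" else arg for arg in template]


def search_text(text: str, pattern: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for every line matching the regex."""
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


def grep_document(path: Path, pattern: str) -> list[tuple[int, str]]:
    """Convert one document with its CLI tool, then search the resulting text."""
    result = subprocess.run(
        build_command(path), capture_output=True, text=True, check=True
    )
    return search_text(result.stdout, pattern)
```

For example, `grep_document(Path("report.odt"), r"invoice\s+\d+")` would run `odt2txt report.odt` and return the matching lines together with their line numbers, which is why a text-based search can only point to a line, not a page.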
Performance:
This GUI works well when searching in several hundred documents, depending on the speed of your system and the length of the documents. LibreOffice is also used to convert documents to text, but only as a last resort: it is not designed for fast conversion, so using it for this purpose takes a very long time. If you want to search text in DOC (antiword), DOCX (docx2txt), ODT (odt2txt) or RTF (unrtf) files, please install the additional packages with "sudo apt-get install grep pdfgrep poppler-utils unrtf odt2txt antiword docx2txt html2text" (on Debian/Ubuntu, pdftotext is part of the poppler-utils package). You can check whether the additional packages are installed via the menu Info - About. If the packages are installed, LibreOffice/OpenOffice won't be used for these file types anymore.
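The fallback order described above (dedicated converter if it is installed, otherwise LibreOffice) fits in a small Python sketch. The names `DEDICATED`, `pick_converter` and `missing_tools` are hypothetical, and using `shutil.which` to test whether a program is on the PATH is an assumption about how such a check could work, not a description of the Info - About dialog.

```python
import shutil
from typing import Callable, Optional

Which = Callable[[str], Optional[str]]

# Fast dedicated converters named in this README; LibreOffice is the slow fallback.
DEDICATED = {
    ".doc": "antiword",
    ".docx": "docx2txt",
    ".odt": "odt2txt",
    ".rtf": "unrtf",
}


def pick_converter(suffix: str, which: Which = shutil.which) -> str:
    """Return the dedicated tool for `suffix` if it is on PATH, else 'libreoffice'."""
    tool = DEDICATED.get(suffix.lower())
    if tool is not None and which(tool) is not None:
        return tool
    return "libreoffice"


def missing_tools(which: Which = shutil.which) -> list[str]:
    """Return the dedicated converters that are not installed, sorted by name."""
    return sorted(tool for tool in DEDICATED.values() if which(tool) is None)
```

Passing `which` as a parameter keeps the logic testable without installing anything: a fake lookup that only "finds" antiword makes every ODT, DOCX and RTF file fall back to LibreOffice.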
Results/Viewers:
It is recommended to use pdfgrep instead of pdftotext, because pdfgrep reports the page of each match: you can then open the result of your search on the correct page, with the search text highlighted, in the PDF viewer (if the viewer supports this). If you use the text-based search, you will only see the line number. You can choose your favorite text editor or PDF viewer in the options; all other documents will be opened with the standard applications set by your desktop environment.
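When pdfgrep is run with `-H` (print the file name) and `-n` (print the page number), each match is reported as `file:page:text`, and that page number is what lets a viewer open on the right page. A minimal Python sketch of parsing one such line, assuming that output format; `PdfHit` and `parse_pdfgrep_line` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class PdfHit(NamedTuple):
    file: str  # path of the PDF
    page: int  # page number reported by pdfgrep -n
    text: str  # the matching line


# Assumed format of `pdfgrep -H -n PATTERN FILE...`: "file:page:text".
# The non-greedy file group stops at the first ":<digits>:", so a file name
# that itself contains such a sequence would be split in the wrong place.
HIT_LINE = re.compile(r"^(?P<file>.*?):(?P<page>\d+):(?P<text>.*)$")


def parse_pdfgrep_line(line: str) -> Optional[PdfHit]:
    """Turn one line of pdfgrep output into a PdfHit, or None if it doesn't fit."""
    match = HIT_LINE.match(line)
    if match is None:
        return None
    return PdfHit(match["file"], int(match["page"]), match["text"])
```

The `page` field is the value a viewer option such as `okular -p $PAGE $FILE` needs for `$PAGE`.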
Config Dir:
[homedir]/.config/DocumentGrep
Available Languages:
- English
- German
You can add your own language by editing the file [homedir]/.config/DocumentGrep/language.set:
copy the existing entries of one language and paste them at the end of the file (please don't leave empty lines in the file).
Change all numbers at the beginning of the copied text to new, unique numbers and replace the values (after the '=') with your translation.
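The two rules above (no empty lines, unique numbers) can be checked before restarting the program. The Python sketch below assumes each entry starts with its number and has an '=' before the translated text; the exact layout of language.set is not documented here, so the pattern `ENTRY` and the helper `check_language_file` are hypothetical.

```python
import re

# Hypothetical entry layout: a leading number, then anything up to "=",
# then the translated text.
ENTRY = re.compile(r"^(\d+)[^=]*=(.*)$")


def check_language_file(content: str) -> list[str]:
    """Return a list of problems: empty lines, malformed lines, duplicate numbers."""
    problems: list[str] = []
    seen: dict[str, int] = {}  # number -> line where it was first used
    for lineno, line in enumerate(content.splitlines(), start=1):
        if not line.strip():
            problems.append(f"line {lineno}: empty line")
            continue
        match = ENTRY.match(line)
        if match is None:
            problems.append(f"line {lineno}: no leading number or '='")
            continue
        number = match.group(1)
        if number in seen:
            problems.append(
                f"line {lineno}: number {number} already used on line {seen[number]}"
            )
        else:
            seen[number] = lineno
    return problems
```

Reading the file with `Path(...).read_text(encoding="utf-8")` and printing the returned list is enough to spot a forgotten renumbering.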
If you have made your own translation, please send it to stephan.stein@online.de; I will put it on the homepage so that everyone can download it.
PDF Viewers:
To open a search result in an external document viewer by double-clicking it, a viewer has to be installed and its name has to be entered in the options (menu/options).
On the first start, the program tries to find an installed document viewer. The following viewers are checked:
Viewer            Example for the options (not all tested)
Okular (KDE)      okular -p $PAGE $FILE
Evince (GNOME)    evince -p $PAGE -l $SEARCH $FILE
Atril (MATE)      atril -p $PAGE -l $SEARCH $FILE
XPDF              xpdf $FILE $PAGE
Ghostview         gv -page=$PAGE $FILE
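A Python sketch of how the viewer detection and the placeholders in the table might work together, assuming $PAGE, $FILE and $SEARCH are replaced by the page number, the document path and the search text. The order of checking and the names `VIEWERS`, `detect_viewer` and `expand` are hypothetical; the program's real detection logic may differ.

```python
import re
import shlex
import shutil
from typing import Callable, Optional

# Option templates from the viewer table; the order of checking is an assumption.
VIEWERS = [
    ("okular", "okular -p $PAGE $FILE"),
    ("evince", "evince -p $PAGE -l $SEARCH $FILE"),
    ("atril", "atril -p $PAGE -l $SEARCH $FILE"),
    ("xpdf", "xpdf $FILE $PAGE"),
    ("gv", "gv -page=$PAGE $FILE"),
]

PLACEHOLDER = re.compile(r"\$(PAGE|FILE|SEARCH)")


def detect_viewer(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the option template of the first viewer found on PATH, or None."""
    for program, template in VIEWERS:
        if which(program) is not None:
            return template
    return None


def expand(template: str, file: str, page: int, search: str) -> list[str]:
    """Split the template into arguments, then fill in the placeholders."""
    values = {"PAGE": str(page), "FILE": file, "SEARCH": search}
    # One regex pass per token, so a value containing "$FILE" is not re-expanded.
    return [
        PLACEHOLDER.sub(lambda m: values[m.group(1)], token)
        for token in shlex.split(template)
    ]
```

Splitting the template before substituting keeps a path such as "/home/me/My File.pdf" as a single argument, so the resulting list can be passed to `subprocess.Popen` without going through a shell.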