Command line tool to find documents which contain strings matched by a regular expression. Supports pdf, Microsoft Office 98-2003, chm and text files. Will soon support html as well. A precompiled win32 version is available.
Version 1.1

Most command line options may now be placed in a configuration file called $HOME/_searchinc; the search file and regular expression must still be given on the command line. Use one option per line and drop the leading "/".

Specify regular expressions using the command line option re, e.g. to search for a line containing ice cream use /re="[\\n\\r]*[^\\n\\r]*ice[^\\n\\r]*cream[^\\n\\r]*[\\n\\r]+". Those less comfortable with regular expressions can instead search for several keywords within a single line using the sl program option; e.g. the same search becomes /sl=ice,cream

Microsoft Compiled HTML files (chm) are now supported. However, they will not be fully supported until html is (i.e. until then html tags will be output and searchable).

Readable but not copyable (secure) pdfs are now searchable.

Either Perl or POSIX extended regular expressions can be used to search; specify which using /REtype=perl or /REtype=ere (the latter is the default).

Certain file extensions can be forced to be treated as text with the txtSfx program option, e.g. to search C++ language files use /txtsfx=cc,h,cpp,hdr

Antiword's mapping error output has been suppressed. Reading the file still seems to work on the documents I have tested (2003-compatible files written by Microsoft Word 2007), but the errors obscured the search matches. This fix requires that the antiword library link against the new unix_aw.c instead of its own unix.c (change makefile.aw to not include unix.o in the static lib, and compile unix_aw.o in CompileMingGW.bat).
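To illustrate how a keyword search such as /sl=ice,cream relates to the longer single-line regular expression, here is a minimal Python sketch. It is not the tool's own code: the function names keywords_to_line_regex and find_matching_lines are hypothetical, and it assumes the keywords must appear in the order given, as they do in the regular expression quoted in the notes.

```python
import re


def keywords_to_line_regex(keywords):
    """Build a pattern matching one line that contains the keywords in order.

    Assumption: keywords must appear in the given order on a single line,
    mirroring the hand-written expression from the release notes.
    """
    gap = r"[^\n\r]*"  # any run of characters that stays on the same line
    return gap + gap.join(re.escape(word) for word in keywords) + gap


def find_matching_lines(text, keywords):
    """Return every line of text that contains all keywords in order."""
    pattern = re.compile(keywords_to_line_regex(keywords))
    return [match.group(0) for match in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "I like ice and cream\nno match here\nice cream!\r\ncream then ice\n"
    for line in find_matching_lines(sample, ["ice", "cream"]):
        print(line)
```

Because the gap pattern excludes newline and carriage return characters, a match can never span two lines, which is the same idea the character classes in the /re example express.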
Copyright © 2010 Geeknet, Inc. All rights reserved. Terms of Use