Hello All,
Thanks for this very useful program. I am just getting started with 1.1.3 on
Windows XP.
How can I include files which have no file extension in the index as text
files? Thunderbird stores email in ASCII text files with no extension. I can
read and search these files with Notepad or Word (or any text editor), but
because they have no file extension, DocFetcher ignores the contents. I can
change the folder name (in T'bird) to folder.txt and DocFetcher will index
them, but it seems there should be a more straightforward way to do this.
Regards and Best Wishes, Bob
Bob,
Sorry, at the moment DocFetcher doesn't support indexing files with no
extension. You might have some luck, though, by writing a regex pattern that
matches all those files and turning "mime-type detection" on for it, which will
cause DocFetcher to recognize them as plain text files. For more info about
regexes, have a look at the manual subpage "Regular expressions".
Best regards
q:-) <= Quang
Quang,
Not necessary to apologize that your very excellent program is not quite
perfect. The real villain here is T'bird, which creates files without an
extension.
I followed your suggestion without success. Namely, I added a line to the
exclude files/detect mime type list, put a single period . in the Pattern
(regex) column, and changed the Action to "Detect mime type (slower)". But
DocFetcher does not seem to have indexed the files.
Regards and Best Wishes, Bob
sorry if this message is a duplicate, the first time didn't seem to work
The single period won't work, because the pattern has to match the whole
filename, and a single period only matches filenames that are one character
long.
Try this:
.*
or this:
[^\.]*
The first pattern matches any filename, and the second one any filename that
doesn't contain a period (= filename without extension).
Last edit: Nam-Quang Tran 2014-03-27
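For anyone who wants to check the three patterns before re-indexing, here is a minimal sketch in Python. It assumes the pattern is matched against the entire filename, which is what the explanation above implies; `re.fullmatch` stands in for whatever DocFetcher does internally, and the filenames are made up for illustration.

```python
import re

# The three patterns discussed in this thread.
PATTERNS = [r".", r".*", r"[^\.]*"]

# Hypothetical filenames: Thunderbird-style mail folders have no extension.
FILENAMES = ["Inbox", "Sent", "a", "notes.txt", "Inbox.msf"]


def whole_name_matches(pattern, filename):
    """Return True if the pattern matches the entire filename."""
    return re.fullmatch(pattern, filename) is not None


def matching_names(pattern, filenames):
    """Return the filenames that the pattern matches in full."""
    return [name for name in filenames if whole_name_matches(pattern, name)]


for pattern in PATTERNS:
    print(pattern, "->", matching_names(pattern, FILENAMES))
```

With these sample names, the single period should only pick up the one-character name, `.*` should pick up everything, and `[^\.]*` should pick up only the names without a period.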
The second option seems to have worked.
Thanks for the help.
Bob
Hey Bob,
can you please explain the exact steps?
On the indexing dialog, there's a file exclusion table. Add a new exclusion rule to that table with the following values:
Pattern: [^\.]*
Match Against: Filename
Action: Detect mime-type
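If you want to preview which files such a rule would pick up before adding it, something like the following Python sketch may help. It is not part of DocFetcher; the function name `files_without_extension` is made up here, and it assumes the rule is applied to each file's name only (the "Match Against: Filename" setting), matched in full.

```python
import re
from pathlib import Path

# Same pattern as in the exclusion rule: a filename with no period in it.
PATTERN = re.compile(r"[^\.]*")


def files_without_extension(root):
    """Return all files under root whose whole name matches PATTERN, sorted."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and PATTERN.fullmatch(path.name)
    )


if __name__ == "__main__":
    # List the matching files below the current directory.
    for path in files_without_extension("."):
        print(path)
```

Pointing it at a Thunderbird mail folder should list the extensionless mail files and skip the ones that have an extension.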
Cool, and thanks, it works! :) However, a feature request:
if DocFetcher could index Unix plain-text inbox-style emails
http://en.wikipedia.org/wiki/Mbox
that'd be great! :)
Mozilla Thunderbird saves them in that format (without extensions on Unix, not sure about Windows though).
I'm aware of MBOX, but I don't have time to work on new features at the moment.
No problem, thanks a lot for writing the app in the first place :)
I also vote for mbox support someday; I just bumped into the same issue trying to index my mail. Thanks for posting the workaround - it worked for me to get the files indexed.
Thanks for a great app!
Any update on this issue? There are some well-known regexes for digging into mbox and maildir email messages (O'Reilly's Mastering Regular Expressions uses these as examples, or at least did in the 2nd edition).
This would let you pull out the header and metadata and email text, and you should be able to link the attachment to the message. Linking conversations, sifting through listserv, etc. might take a bit more.
No need for regexes, any of the readily available email parsers would suffice. The only problem is that I don't have time to integrate such a parser into DocFetcher. Note that just "adding the parser" wouldn't be enough. Some modifications in the GUI and elsewhere, as well as tweaking and testing, would also be needed.
See also the question "Can you please add feature XY?" at the top of the DocFetcher FAQ.
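To illustrate the point about readily available parsers: the sketch below uses Python's standard `mailbox` module to read an mbox file and pull out a few headers plus the plain-text body. It only shows what such a parser gives you and says nothing about how DocFetcher (which is a Java program) would integrate one; `summarize_mbox` and `plain_text_body` are names made up for this example.

```python
import mailbox


def plain_text_body(message):
    """Return the first text/plain part that is not an attachment, or ''."""
    for part in message.walk():
        if part.get_content_type() != "text/plain" or part.get_filename():
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name in the message: fall back to UTF-8.
            return payload.decode("utf-8", errors="replace")
    return ""


def summarize_mbox(path):
    """Return one dict per message with a few headers and the text body."""
    summaries = []
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            summaries.append({
                "from": message.get("From", ""),
                "subject": message.get("Subject", ""),
                "date": message.get("Date", ""),
                "body": plain_text_body(message),
            })
    finally:
        box.close()
    return summaries
```

Calling `summarize_mbox` on the path of an extensionless Thunderbird mail file should return one entry per message, which is the kind of per-message data an indexer would need. Attachments, threading and the GUI side are left out entirely.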
In DocFetcher Pro, there is a checkbox "Index files without file extension as text files" that does as the name suggests.