
indexing text files with no file extension

Bob
2012-09-01
2021-06-08
  • Bob

    Bob - 2012-09-01

    Hello All,

    Thanks for this very useful program. I am just getting started with version
    1.1.3 on Windows XP.

    How can I include files which have no file extension in the index as text
    files? Thunderbird stores email in ASCII text files with no extension. I can
    read and search these files with Notepad or Word (or any text editor), but
    because they have no file extension, DocFetcher ignores their contents. I can
    change the folder name (in T'bird) to folder.txt and DocFetcher will index
    it, but it seems there should be a more straightforward way to do this.

    Regards and Best Wishes, Bob

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2012-09-01

    Bob,

    Sorry, at the moment DocFetcher doesn't support indexing files with no
    extension. You might have some luck, though, by writing a regex pattern that
    matches all those files and turning on "mime-type detection" for it, which
    will cause DocFetcher to recognize them as plain text files. For more info
    about regexes, have a look at the manual subpage "Regular expressions".

    Best regards

    q:-) <= Quang

     
  • Bob

    Bob - 2012-09-02

    Quang,

    No need to apologize that your very excellent program is not quite
    perfect. The real villain here is T'bird, which creates files without an
    extension.

    I followed your suggestion without success. Namely, I added a line to the
    exclude files/detect mime type list, put a single period . in the Pattern
    (regex) column, and changed the Action to "Detect mime type (slower)". But
    DocFetcher does not seem to have indexed the files.

    Regards and Best Wishes, Bob

    Sorry if this message is a duplicate; the first attempt to post didn't seem to work.

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2012-09-02

    The single period won't work, because it only matches filenames that are one
    character long.

    Try this:

    .*

    or this:

    [^\.]*

    The first pattern matches any filename, and the second one any filename that
    doesn't contain a period (= filename without extension).
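
To see why the single period fails while the other two patterns behave as described, here is a small Python sketch. It assumes the pattern has to match the whole filename, which is what the explanation above implies. The sample filenames are made up, and Python's `re` module only stands in for whatever regex engine DocFetcher actually uses.

```python
import re

# Hypothetical filenames: two Thunderbird-style mail folders without an
# extension, a one-character name, and two files with extensions.
filenames = ["Inbox", "Sent", "a", "Inbox.msf", "notes.txt"]

patterns = {
    "single period": r".",
    "any filename": r".*",
    "no period": r"[^\.]*",
}

def matching(pattern, names):
    """Return the names that the pattern matches in full."""
    return [n for n in names if re.fullmatch(pattern, n)]

results = {label: matching(p, filenames) for label, p in patterns.items()}

for label, matched in results.items():
    print(f"{label}: {matched}")
```

With these sample names, the single period selects only `a`, `.*` selects everything, and `[^\.]*` selects `Inbox`, `Sent` and `a`.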

     

    Last edit: Nam-Quang Tran 2014-03-27
  • Bob

    Bob - 2012-09-02

    The second option seems to have worked.

    Thanks for the help.

    Bob

     
  • Mike

    Mike - 2014-03-27

    Hey Bob,
    can you please explain the exact steps?

     
    • Nam-Quang Tran

      Nam-Quang Tran - 2014-03-27

      On the indexing dialog, there's a file exclusion table. Add a new exclusion rule to that table with the following values:

      Pattern: [^\.]*
      Match Against: Filename
      Action: Detect mime-type
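
Before re-indexing, it can help to preview which files such a rule would pick up. The following Python sketch walks a folder and lists the files whose name matches the pattern in full. It runs independently of DocFetcher, and the function name `files_without_extension` is just an illustrative choice.

```python
import os
import re

# Same pattern as in the exclusion rule: a filename with no period in it.
NO_EXTENSION = re.compile(r"[^\.]*")

def files_without_extension(root):
    """Return sorted paths (relative to root) of files whose name has no period."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # fullmatch mirrors matching the pattern against the whole filename.
            if NO_EXTENSION.fullmatch(name):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return sorted(found)
```

Calling `files_without_extension(...)` on your Thunderbird mail folder should list the extensionless mail files while skipping companions such as `.msf` index files. Only file names are tested, so files inside a directory whose own name contains a period are still reported.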

       
  • andrew goh

    andrew goh - 2014-04-23

    Cool, and thanks, it works! :) However, a feature request:
    it would be great if DocFetcher could index Unix plain-text mbox-style mailboxes
    http://en.wikipedia.org/wiki/Mbox
    Mozilla Thunderbird saves mail in that format (without extensions on Unix; not sure about Windows, though).

     
    • Nam-Quang Tran

      Nam-Quang Tran - 2014-04-23

      I'm aware of MBOX, but I don't have time to work on new features at the moment.

       
  • andrew goh

    andrew goh - 2014-04-25

    No problem, thanks very much for writing the app in the first place :)

     
  • ardentperf

    ardentperf - 2014-08-21

    I also vote for mbox support someday; I just bumped into the same issue trying to index my mail. Thanks for posting the workaround - it worked for me to get the files indexed.

    Thanks for a great app!

     
  • Kevin Coonan, MD

    Any update on this issue? There are some well-known regexes for digging into mbox and maildir email messages (the O'Reilly book Mastering Regular Expressions uses these as examples, or at least did in the 2nd edition).
    This would let you pull out the headers, metadata and email text, and you should be able to link each attachment to its message. Linking conversations, sifting through listserv traffic, etc. might take a bit more.

     
    • Nam-Quang Tran

      Nam-Quang Tran - 2017-11-13

      No need for regexes, any of the readily available email parsers would suffice. The only problem is that I don't have time to integrate such a parser into DocFetcher. Note that just "adding the parser" wouldn't be enough. Some modifications in the GUI and elsewhere, as well as tweaking and testing, would also be needed.

      See also the question "Can you please add feature XY?" at the top of the DocFetcher FAQ.
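
As a rough illustration of what a ready-made parser provides, Python's standard library includes a `mailbox` module that reads mbox files. This is only a sketch of the parsing step, not of any DocFetcher integration (DocFetcher is a Java application and would need a Java parser). The function name `summarize_mbox` is hypothetical.

```python
import mailbox

def summarize_mbox(path):
    """Return (sender, subject, plain-text body) for each message in an mbox file."""
    summaries = []
    box = mailbox.mbox(path)
    try:
        for msg in box:
            body_parts = []
            # walk() visits the message itself and, for multipart mail, every part.
            for part in msg.walk():
                if part.get_content_type() != "text/plain":
                    continue
                payload = part.get_payload(decode=True)
                if payload is None:
                    continue
                charset = part.get_content_charset() or "utf-8"
                try:
                    text = payload.decode(charset, errors="replace")
                except LookupError:
                    # Unknown charset label: fall back to UTF-8.
                    text = payload.decode("utf-8", errors="replace")
                body_parts.append(text)
            summaries.append((msg["From"], msg["Subject"], "".join(body_parts)))
    finally:
        box.close()
    return summaries
```

Even with the parsing handled like this, the GUI changes, tweaking and testing mentioned above would still be separate work.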

       
  • Nam-Quang Tran

    Nam-Quang Tran - 2021-06-08

    In DocFetcher Pro, there is a checkbox "Index files without file extension as text files" that does as the name suggests.

     
