Complete binary content of some mp3 files is indexed

Moodolf
2015-10-11
2015-10-31
  • Moodolf

    Moodolf - 2015-10-11

    Hi

    in the preview of some of my mp3 files I see something like this:
    TIT2=title
    TYER=1985
    TCON=(17)
    TPE1=artist

    This is OK. I like that DocFetcher indexes the mp3-tags within the file.
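For readers unfamiliar with these names: TIT2, TYER, TCON and TPE1 are ID3v2 frame identifiers (title, year, genre, artist). The following is a minimal sketch of how such frames can be read from the start of a file without touching the audio data, assuming an ID3v2.3 tag with no extended header, no unsynchronisation, and ISO-8859-1 text. The helper names (`read_text_frames`, `make_tag`) are made up for illustration; this is not the code DocFetcher or its tag library uses.

```python
import struct


def synchsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte synchsafe integer (7 significant bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def read_text_frames(data: bytes) -> dict:
    """Return {frame_id: text} for ISO-8859-1 text frames of an ID3v2.3 tag.

    Assumption: no extended header and no unsynchronisation.
    """
    if data[:3] != b"ID3" or data[3] != 3:
        return {}  # no ID3v2.3 tag at the start of the data
    # The tag header states how long the tag is; nothing past it is read.
    tag_end = 10 + synchsafe_to_int(data[6:10])
    frames = {}
    pos = 10
    while pos + 10 <= tag_end:
        frame_id = data[pos:pos + 4]
        size = struct.unpack(">I", data[pos + 4:pos + 8])[0]
        if frame_id == b"\x00\x00\x00\x00" or size == 0:
            break  # reached padding
        body = data[pos + 10:pos + 10 + size]
        pos += 10 + size
        # Text frames start with "T" (TXXX has a different layout).
        if frame_id.startswith(b"T") and frame_id != b"TXXX" and body and body[0] == 0:
            frames[frame_id.decode("ascii")] = body[1:].decode("latin-1").rstrip("\x00")
    return frames


def make_tag(frames: dict) -> bytes:
    """Build a tiny ID3v2.3 tag for the demonstration below."""
    payload = b""
    for frame_id, text in frames.items():
        body = b"\x00" + text.encode("latin-1")  # 0x00 = ISO-8859-1
        payload += frame_id.encode("ascii") + struct.pack(">I", len(body)) + b"\x00\x00" + body
    n = len(payload)
    size = bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])
    return b"ID3\x03\x00\x00" + size + payload


tags = {"TIT2": "title", "TYER": "1985", "TCON": "(17)", "TPE1": "artist"}
# Fake "audio" bytes after the tag, containing the short word from this thread.
sample = make_tag(tags) + b"\xff\xfbamt" * 5
for key, value in read_text_frames(sample).items():
    print(f"{key}={value}")
```

Because the loop stops at the length stated in the tag header, the bytes after the tag never reach the extracted text, so a word like "amt" inside the audio data would not be found.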

    But it seems that DocFetcher is saving the whole binary content of SOME mp3 files in its database.
    Then the preview looks like this:
    https://picload.org/image/paircoc/docfetcher-binary.png
    And then it finds small words like "amt" anywhere inside the binary data, not only within the mp3-tags.

    Problems:
    1. This gives me false hits.
    2. I think this is bloating my DocFetcher database with several GB of garbage.
    3. If I select such a file, it can take many seconds (or even some minutes) until the preview is displayed. While this happens, DocFetcher does not respond to user input.

    How can I disable this indexing of binary content?
    If it helps, I could send you some of these mp3 files for debugging.

    Another minor problem:
    In some mp3 files the mp3-tags seem to be UTF-16 encoded (or something like that), and the DocFetcher preview looks like this:
    https://picload.org/image/paircop/docfetcher-utf-16.png
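As background on the encoding question: in ID3v2, the first byte of a text frame body says how the rest is encoded (0 = ISO-8859-1, 1 = UTF-16 with byte order mark, and in ID3v2.4 also 2 = UTF-16BE and 3 = UTF-8). The sketch below shows what happens when that byte is honoured versus ignored. Whether this is what causes the preview in the screenshot is only a guess, and `decode_text_frame` is a hypothetical helper, not part of DocFetcher.

```python
# Encoding byte of an ID3v2 text frame mapped to a Python codec name.
ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def decode_text_frame(body: bytes) -> str:
    """Decode a text frame body using its leading encoding byte."""
    codec = ENCODINGS.get(body[0])
    if codec is None:
        raise ValueError(f"unknown text encoding byte: {body[0]}")
    # Python's "utf-16" codec consumes the byte order mark itself.
    return body[1:].decode(codec).rstrip("\x00")


# "artist" stored as UTF-16 little-endian with a byte order mark (FF FE).
body = b"\x01\xff\xfe" + "artist".encode("utf-16-le")

print(repr(decode_text_frame(body)))        # honours the encoding byte
print(repr(body[1:].decode("latin-1")))     # ignores it: BOM and NUL bytes leak into the text
```

Reading the same bytes as single-byte text leaves the byte order mark and a NUL after every character in the output, which tends to show up as stray symbols or gaps in a preview.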

    Thank you.

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2015-10-12

    Hi,

    This could be a problem with your indexing settings. More precisely, it looks like DocFetcher was configured to treat some MP3 files as text files. If you think that might be the case, please select "Rebuild index" on the relevant index or indexes and post a screenshot of the indexing dialog that opens, so I can see what your indexing settings look like.

    Best regards
    q:-) <= Quang

     
  • Moodolf

    Moodolf - 2015-10-12

    DocFetcher 1.1.14

    Index-Settings:
    https://picload.org/image/paipwil/docfetcher-reindex.png

    Program-Settings:
    https://picload.org/image/paipwii/docfetcher-settings.png

    C:\Tools\DocFetcher\conf\program-conf.txt
    (I removed all comments and re-sorted the entries just for this posting)
    AllowIndexCreation = true
    AllowIndexDeletion = true
    AllowIndexRebuild = true
    AllowIndexUpdate = true
    Analyzer = 0
    AppName = DocFetcher
    ColoredTabs = true
    CurvyTabs = false
    DryRun = false
    FixWindowSizes = false
    HtmlExtensions = html;htm;xhtml;shtml;shtm;php;asp;jsp
    IgnoreJunctionsAndSymlinks = true
    IndexExcelFormulas = true
    InitialSorting = 0
    MaxLinesInProgressPanel = 1000
    MaxResultsTotal = 10000
    PatternTableHeight = 4
    ReportObsoleteIndexFiles = true
    SaveSettings = true
    SearchBoxMaxWidth = 200
    SearchHistorySize = 50
    ShowAdvancedSettingsLink = true
    SkipTarArchives = false
    TextEncodingOverride =
    UnpackCacheCapacity = 20
    WebInterfacePageSize = 50

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2015-10-12

    I don't see any problems with these settings.

    You can send the sample files to my (reversed) email address: users.sourceforge.net <- qforce@

    Please note, however, that the MP3 tag extraction is handled by an external library, so depending on what the exact cause of the problem is, there might not be much I can do about it.

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2015-10-31

    The above issues have been fixed. The bugfix will be included in the next official release. A preliminary download is available here: http://docfetcher.sf.net/docfetcher-1.1.16-portable.zip

     
