Complete binary content of some mp3 files is indexed

Forum: General Discussion

Creator: Moodolf

Created: 2015-10-11

Updated: 2015-10-31

Moodolf - 2015-10-11

Hi

In the preview of some of my mp3 files I see something like this:
TIT2=title
TYER=1985
TCON=(17)
TPE1=artist

This is OK. I like that DocFetcher indexes the mp3-tags within the file.
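
For readers unfamiliar with the four-letter names above: they are ID3v2 frame identifiers (TIT2 = title, TYER = year, TCON = content type/genre, TPE1 = lead artist), and a TCON value such as "(17)" refers to an entry in the ID3v1 genre list. The following is only a minimal sketch for interpreting preview lines of this shape. The function name `parse_preview` and the lookup tables are hypothetical helpers written for this example and are not part of DocFetcher.

```python
import re

# Readable names for the four frame IDs that appear in the preview above.
FRAME_NAMES = {
    "TIT2": "title",
    "TYER": "year",
    "TCON": "genre",
    "TPE1": "artist",
}

# First 18 entries (indexes 0-17) of the ID3v1 genre list.
ID3V1_GENRES = [
    "Blues", "Classic Rock", "Country", "Dance", "Disco", "Funk",
    "Grunge", "Hip-Hop", "Jazz", "Metal", "New Age", "Oldies",
    "Other", "Pop", "R&B", "Rap", "Reggae", "Rock",
]


def parse_preview(text):
    """Turn 'FRAME=value' lines into a dict with readable keys (hypothetical helper)."""
    tags = {}
    for line in text.splitlines():
        frame_id, sep, value = line.partition("=")
        if not sep:
            continue  # not a FRAME=value line
        frame_id = frame_id.strip()
        value = value.strip()
        if frame_id == "TCON":
            # "(17)" is a numeric reference into the ID3v1 genre list.
            match = re.fullmatch(r"\((\d+)\)", value)
            if match and int(match.group(1)) < len(ID3V1_GENRES):
                value = ID3V1_GENRES[int(match.group(1))]
        tags[FRAME_NAMES.get(frame_id, frame_id)] = value
    return tags


preview = "TIT2=title\nTYER=1985\nTCON=(17)\nTPE1=artist"
print(parse_preview(preview))
```

Frame IDs that are not in the small table are kept under their raw four-letter name, so the sketch does not silently drop tags it does not know.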

But it seems that DocFetcher is saving the whole binary content of SOME mp3 files in its database.
Then the preview looks like this:
https://picload.org/image/paircoc/docfetcher-binary.png
And then it finds short words like "amt" anywhere inside the binary data, not only within the mp3 tags.

Problems:
1. This gives me false hits in the search results.
2. I think this is bloating my DocFetcher database with several GB of garbage.
3. If I select such a file, it can take many seconds (or even minutes) until the preview is displayed. Meanwhile, DocFetcher does not respond to user input.

How can I disable this indexing of binary content?
If helpful, I could send you such mp3-files for debugging.

Another minor problem:
In some mp3 files the mp3 tags seem to be UTF-16 encoded (or something like that), and the DocFetcher preview looks like this:
https://picload.org/image/paircop/docfetcher-utf-16.png
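
As background on the UTF-16 guess: in ID3v2, the first byte of a text frame's payload names the encoding of the rest (0 = ISO-8859-1, 1 = UTF-16 with byte order mark, and in ID3v2.4 also 2 = UTF-16BE and 3 = UTF-8). A reader that ignores this byte and treats UTF-16 data as a single-byte encoding would show stray characters between the letters, which may or may not be what the screenshot shows. The sketch below only illustrates how that byte could be honored. `decode_text_frame` is a hypothetical function for this example and is not how DocFetcher or its tag library is known to work.

```python
# Encoding byte at the start of an ID3v2 text frame payload.
ENCODINGS = {
    0: "latin-1",    # ISO-8859-1
    1: "utf-16",     # UTF-16 with byte order mark
    2: "utf-16-be",  # UTF-16 big-endian without BOM (ID3v2.4)
    3: "utf-8",      # UTF-8 (ID3v2.4)
}


def decode_text_frame(payload: bytes) -> str:
    """Decode a text frame payload according to its encoding byte (hypothetical helper)."""
    if not payload:
        return ""
    encoding = ENCODINGS.get(payload[0])
    if encoding is None:
        raise ValueError(f"unknown ID3v2 text encoding byte: {payload[0]}")
    text = payload[1:].decode(encoding, errors="replace")
    # Strip any trailing string terminator.
    return text.rstrip("\x00")


# A UTF-16 frame decoded correctly, and the same bytes misread as ISO-8859-1.
frame = b"\x01" + "artist".encode("utf-16")
print(decode_text_frame(frame))
print(repr(frame[1:].decode("latin-1")))
```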

Thank you.

Nam-Quang Tran - 2015-10-12

Hi,

This could be a problem with your indexing settings. More precisely, it looks like DocFetcher was configured to treat some MP3 files as text files. If you think that might be the case, please select "Rebuild index" on the relevant index or indexes and post a screenshot of the indexing dialog that opens, so I can see what your indexing settings look like.

Best regards
q:-) <= Quang

Moodolf - 2015-10-12

DocFetcher 1.1.14

Index-Settings:
https://picload.org/image/paipwil/docfetcher-reindex.png

Program-Settings:
https://picload.org/image/paipwii/docfetcher-settings.png

C:\Tools\DocFetcher\conf\program-conf.txt
(I removed all comments and re-sorted the entries just for this post)
AllowIndexCreation = true
AllowIndexDeletion = true
AllowIndexRebuild = true
AllowIndexUpdate = true
Analyzer = 0
AppName = DocFetcher
ColoredTabs = true
CurvyTabs = false
DryRun = false
FixWindowSizes = false
HtmlExtensions = html;htm;xhtml;shtml;shtm;php;asp;jsp
IgnoreJunctionsAndSymlinks = true
IndexExcelFormulas = true
InitialSorting = 0
MaxLinesInProgressPanel = 1000
MaxResultsTotal = 10000
PatternTableHeight = 4
ReportObsoleteIndexFiles = true
SaveSettings = true
SearchBoxMaxWidth = 200
SearchHistorySize = 50
ShowAdvancedSettingsLink = true
SkipTarArchives = false
TextEncodingOverride =
UnpackCacheCapacity = 20
WebInterfacePageSize = 50
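
To compare a listing like the one above against another installation, the "Key = Value" lines can be loaded into a dictionary. This is a minimal sketch that assumes one setting per line, empty values being allowed (as with TextEncodingOverride), and comment lines starting with "#". The comment syntax and the function name `parse_conf` are assumptions made for this example, not taken from the DocFetcher documentation.

```python
def parse_conf(text):
    """Parse 'Key = Value' lines into a dict of strings (hypothetical helper)."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and lines starting with '#' carry no setting.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on this line, so skip it
        settings[key.strip()] = value.strip()
    return settings


sample = """\
Analyzer = 0
HtmlExtensions = html;htm;xhtml;shtml;shtm;php;asp;jsp
TextEncodingOverride =
"""
conf = parse_conf(sample)
print(conf["HtmlExtensions"].split(";"))
print(repr(conf["TextEncodingOverride"]))
```

All values stay strings here. Converting entries such as MaxResultsTotal to integers is left to the caller, since the sketch does not know the intended type of each key.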

Nam-Quang Tran - 2015-10-12

I don't see any problems with these settings.

You can send the sample files to my (reversed) email address: users.sourceforge.net <- qforce@

Please note, however, that the MP3 tag extraction is handled by an external library, so depending on what the exact cause of the problem is, there might not be much I can do about it.

Nam-Quang Tran - 2015-10-31

The above issues have been fixed. The bugfix will be included in the next official release. A preliminary download is available here: http://docfetcher.sf.net/docfetcher-1.1.16-portable.zip
