Still working on this media filter issue -- maybe this will point me in the right direction: how are bitstreams selected for filtering? Is it something like SELECT * FROM bitstream WHERE ???
If so, what is in the WHERE clause? Or is there some other basis for selection?

Thanks,
Bill


On Wed, Sep 18, 2013 at 2:09 PM, Bill Tantzen <wilee53@gmail.com> wrote:
Here's a snip from my dspace.cfg:

#Names of the enabled MediaFilter or FormatFilter plugins
filter.plugins = \
  PDF Text Extractor, \
  PDF Thumbnail, \
  HTML Text Extractor, \
  Word Text Extractor, \
  JPEG Thumbnail, \
  Branded Preview JPEG, \
  PowerPoint Text Extractor

# [To enable Branded Preview]: remove last line above, and uncomment 2 lines below
#                        Word Text Extractor, JPEG Thumbnail, \
#                        Branded Preview JPEG

#Assign 'human-understandable' names to each filter
plugin.named.org.dspace.app.mediafilter.FormatFilter = \
  org.dspace.app.mediafilter.XPDF2Text = PDF Text Extractor, \
  org.dspace.app.mediafilter.XPDF2Thumbnail = PDF Thumbnail, \
  org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \
  org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \
  org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail, \
  org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG, \
  org.dspace.app.mediafilter.PowerPointFilter = PowerPoint Text Extractor

Specifically, I *think* the PDF filter should be enabled. As I said, the majority of the files are .pdf.
Bill


On Wed, Sep 18, 2013 at 2:00 PM, helix84 <helix84@centrum.sk> wrote:
Hi Bill,

check your configuration to see which media filters you actually have enabled:
https://wiki.duraspace.org/pages/viewpage.action?pageId=32474041#TransformingDSpaceContent(MediaFilters)-AvailableMediaFilters

It's possible that you have only a mediafilter for one file type
enabled and thus it skips the majority of your files.
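
If it helps, here is a rough way to double-check what the config really enables, without reading the continuation lines by eye. This is only a sketch: it assumes dspace.cfg follows the usual Java-properties conventions (backslash line continuation, # comments), and read_property and enabled_filters are made-up helper names, not part of DSpace.

```python
def read_property(cfg_text, key):
    """Return the value of `key` from Java-properties-style text, or None.

    Assumption: a trailing backslash continues the logical line, and lines
    starting with '#' are comments unless they sit inside a continuation.
    """
    logical_lines = []
    buffer = ""
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments, but only outside a continuation.
        if not buffer and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            # Drop the backslash and keep accumulating the logical line.
            buffer += line[:-1].strip() + " "
            continue
        logical_lines.append(buffer + line)
        buffer = ""
    if buffer:
        # The text ended in the middle of a continuation; keep what we have.
        logical_lines.append(buffer.strip())

    for entry in logical_lines:
        name, sep, value = entry.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def enabled_filters(cfg_text):
    """List the filter names found in the filter.plugins property."""
    value = read_property(cfg_text, "filter.plugins")
    if value is None:
        return []
    return [name.strip() for name in value.split(",") if name.strip()]
```

Calling enabled_filters() on the text of your dspace.cfg (wherever it lives in your install) should return one name per enabled filter. Any name that is missing from that list, or that has no matching entry under plugin.named.org.dspace.app.mediafilter.FormatFilter, would be worth a second look.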


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette