Can you check that your Media Filter config isn't doing something strange, like trying to feed Word docs to the PDFFilter?
https://github.com/DSpace/DSpace/blob/master/dspace/config/dspace.cfg#L428
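A quick way to see the mappings is to pull the inputFormats lines out of your dspace.cfg. This is only a sketch: list_filter_formats is a hypothetical helper name, and the config path in the usage comment is an assumption about your install. In a stock config I would expect the PDFFilter line to list "Adobe PDF" and the WordFilter line to list "Microsoft Word", but check against your own version.

```shell
# Hypothetical helper: print the active media-filter format mappings
# from a dspace.cfg file. Commented-out lines are skipped.
# Usage (path is an assumption): list_filter_formats /dspace/config/dspace.cfg
list_filter_formats() {
    # Match uncommented "filter.<plugin class>.inputFormats = ..." lines only.
    grep -E '^[[:space:]]*filter\.[A-Za-z0-9_.]+\.inputFormats[[:space:]]*=' "$1"
}
```

If a Word format shows up on the PDFFilter line (or the other way round), that would be the first thing to fix.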

I think the text "Invalid Word Format" is only produced by the Word Filter:
https://github.com/LongsightGroup/DSpace/blob/bookreader/dspace-api/src/main/java/org/dspace/app/mediafilter/WordFilter.java#L92

Try running the media filter on just this single item and check the output. Also see whether anything additional gets written to dspace.log, since System.out and log.error might not carry the same messages.
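For the single-item run, something along these lines should work. It is a sketch, not a tested recipe: filter_single_item is a hypothetical wrapper name, and DSPACE_HOME defaulting to /dspace is an assumption about where your install lives. The -i option restricts filter-media to one handle and -v turns on verbose output.

```shell
# Hypothetical wrapper: run filter-media against a single item handle.
# Assumption: DSPACE_HOME is the DSpace install directory (default /dspace).
filter_single_item() {
    # -v: verbose output; -i: limit the run to the given handle.
    # stderr is merged into stdout so the stack trace and messages stay in order.
    "${DSPACE_HOME:-/dspace}/bin/dspace" filter-media -v -i "$1" 2>&1
}

# Example call, using the handle from the report:
#   filter_single_item 88435/dsp01x920fw884
```

Comparing that output with what lands in dspace.log for the same run should show whether the two are telling the same story.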

________________
Peter Dietz
Longsight
www.longsight.com
peter@longsight.com
p: 740-599-5005 x809


On Wed, Aug 6, 2014 at 2:14 PM, Monika Mevenkamp <monikam@princeton.edu> wrote:
I get an exception from filter-media; see below.

What puzzles me is that it complains about 'Invalid Word Format'
although the system lists the format as PDF and the stack trace shows that
DSpace tries to parse it as PDF. I can view the file without issue. When
I download the file and look at its stated format on the command line, I
get:

> wget http://dataspace.princeton.edu/jspui/bitstream/88435/dsp01x920fw884/1/8ers.pdf

> file 8ers.pdf

8ers.pdf: PDF document, version 1.5


I am not sure what to look at next.

Monika

        Item Handle: 88435/dsp01x920fw884
        Bundle Name: ORIGINAL
        Bitstream: 3597
        Name: 8ers.pdf
        File Size: 370978
        Checksum: cc6054581d069e06cf72f2749cd7163b (MD5)
        Asset Store: 0
java.lang.IllegalArgumentException
java.lang.IllegalArgumentException
        at org.apache.fontbox.cff.CFFParser.readEntry(CFFParser.java:150)
        at org.apache.fontbox.cff.CFFParser.readDictData(CFFParser.java:117)
        at org.apache.fontbox.cff.CFFParser.parseFont(CFFParser.java:461)
        at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:71)
        at org.apache.pdfbox.pdmodel.font.PDType1CFont.load(PDType1CFont.java:313)
        at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:104)
        at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:162)
        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:75)
        at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
        at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
        at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
        at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:101)
        at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:715)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:537)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:487)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:455)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersCollection(MediaFilterManager.java:433)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersCommunity(MediaFilterManager.java:417)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:379)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:309)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
Invalid Word Format
ERROR filtering, skipping bitstream:
PDF at


--
Monika Mevenkamp
phone: 609-258-4161
123 693 Alexander Street, Princeton University, Princeton, NJ 08544


_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette