Ivan,

Thanks for checking in...

dspace filter-media returns exit status 0. The DSpace log shows no errors, only entries of the form:

2013-09-23 10:37:41,012 INFO  org.dspace.search.DSIndexer @ Writing Community: 2408/104859 to Index

or:

2013-09-23 10:37:40,336 INFO  org.dspace.search.DSIndexer @ Writing Collection: 2408/55874 to Index

The output on the command line is short. Normally I would expect to see one line for each bitstream examined, beginning with 'FILTERED' or 'SKIPPED'. Instead I see only a few errors for .doc files (Invalid Format), followed by a couple of SKIPPED entries for bitstreams that already have a .txt file.
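To tell a run that finished with nothing to do apart from one that never reached the bitstreams, it can help to capture the exit status and the per-bitstream lines together. Below is a minimal Python sketch. The helper name run_and_tally is hypothetical, and it assumes, as described above, that filter-media reports each examined bitstream on its own line starting with FILTERED or SKIPPED. Every other non-blank line is counted as OTHER.

```python
import subprocess
from collections import Counter


def run_and_tally(cmd):
    """Run cmd and return (exit_status, Counter of per-line outcomes).

    Assumption: each examined bitstream is reported on a line starting
    with 'FILTERED' or 'SKIPPED'; any other non-blank line is 'OTHER'.
    """
    # Merge stderr into stdout so error lines are tallied too.
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          universal_newlines=True)
    tally = Counter()
    for line in proc.stdout.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("FILTERED"):
            tally["FILTERED"] += 1
        elif stripped.startswith("SKIPPED"):
            tally["SKIPPED"] += 1
        else:
            tally["OTHER"] += 1
    return proc.returncode, tally
```

For example, run_and_tally(["/dspace/bin/dspace", "filter-media"]) (the install path is an assumption; use your own [dspace]/bin) returns the exit status plus the counts. An exit status of 0 together with almost no FILTERED or SKIPPED lines matches the symptom described here.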

All the .pdf files are in the ORIGINAL bundle.  For instance:

dspace=> select * from item2bundle where item_id = 34950;
-[ RECORD 1 ]----
id        | 39982
item_id   | 34950
bundle_id | 39983
-[ RECORD 2 ]----
id        | 39983
item_id   | 34950
bundle_id | 39984

dspace=> select * from bundle where bundle_id in ( 39983, 39984 );
-[ RECORD 1 ]--------+---------
bundle_id            | 39983
name                 | LICENSE
primary_bitstream_id | 
-[ RECORD 2 ]--------+---------
bundle_id            | 39984
name                 | ORIGINAL
primary_bitstream_id | 

dspace=> select * from bundle2bitstream where bundle_id = 39984;
-[ RECORD 1 ]---+------
id              | 40042
bundle_id       | 39984
bitstream_id    | 40065
bitstream_order | 2

dspace=> select * from bitstream where bitstream_id = 40065;
-[ RECORD 1 ]-----------+------------------------------------------------
bitstream_id            | 40065
bitstream_format_id     | 3
name                    | 8175706.pdf
size_bytes              | 6587102
checksum                | 164de17195af1d0de45cd17a431fc2b9
checksum_algorithm      | MD5
description             | 
user_format_description | 
source                  | /dspace/assetstore/dspace-sr/upload/8175706.pdf
internal_id             | 104968051252620967298398595849898250327
deleted                 | f
store_number            | 0
sequence_id             | 2

This bitstream, however, is neither FILTERED nor SKIPPED.
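The four lookups above can be collapsed into one join that lists the non-deleted bitstreams in an item's ORIGINAL bundle. The sketch below runs that join against an in-memory SQLite copy of only the rows and columns shown above, so it is self-contained; the function names are hypothetical. The SQL string uses only table and column names visible in this thread, so it should also work in psql against the real PostgreSQL database once the ? placeholder is replaced by an item_id, though that has not been tested here.

```python
import sqlite3

# One join replacing the item2bundle -> bundle -> bundle2bitstream ->
# bitstream lookups above. "NOT bs.deleted" works for SQLite integers
# and PostgreSQL booleans alike.
ORIGINAL_BITSTREAMS_SQL = """
SELECT bs.bitstream_id, bs.name
FROM item2bundle i2b
JOIN bundle b            ON b.bundle_id = i2b.bundle_id
JOIN bundle2bitstream bb ON bb.bundle_id = b.bundle_id
JOIN bitstream bs        ON bs.bitstream_id = bb.bitstream_id
WHERE i2b.item_id = ? AND b.name = 'ORIGINAL' AND NOT bs.deleted
ORDER BY bb.bitstream_order
"""


def build_sample_db():
    """In-memory copy of the rows shown above (only the columns the join needs)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE item2bundle (id INTEGER, item_id INTEGER, bundle_id INTEGER);
        CREATE TABLE bundle (bundle_id INTEGER, name TEXT);
        CREATE TABLE bundle2bitstream (id INTEGER, bundle_id INTEGER,
                                       bitstream_id INTEGER, bitstream_order INTEGER);
        CREATE TABLE bitstream (bitstream_id INTEGER, name TEXT, deleted INTEGER);
        INSERT INTO item2bundle VALUES (39982, 34950, 39983), (39983, 34950, 39984);
        INSERT INTO bundle VALUES (39983, 'LICENSE'), (39984, 'ORIGINAL');
        INSERT INTO bundle2bitstream VALUES (40042, 39984, 40065, 2);
        INSERT INTO bitstream VALUES (40065, '8175706.pdf', 0);
    """)
    return con


def original_bitstreams(con, item_id):
    """Return (bitstream_id, name) rows from the item's ORIGINAL bundle."""
    return con.execute(ORIGINAL_BITSTREAMS_SQL, (item_id,)).fetchall()
```

On the sample rows, original_bitstreams(build_sample_db(), 34950) returns the single PDF shown above. An empty result on the live database for an item that clearly has a PDF would suggest a broken link in one of these four tables.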

This database was recently upgraded from v1.42 to v3, and I suspect the problem lies somewhere in the database rather than in the code, but everything *looks* right to me. I can trace the relations from community to collection to item, but for some reason the bitstreams are simply never checked.

What do you think?
Bill


On Sun, Sep 22, 2013 at 12:35 PM, helix84 <helix84@centrum.sk> wrote:
Hi Bill, please remember to keep dspace-tech in CC.

Can you please tell me the result of each of my suggestions?
1) What was the errorlevel of your filter-media command?
2) Did you look at the log while it was running using "tail -f"?
3) Were all the bitstreams you expected to be filtered in the ORIGINAL
bundle? (check at least a few)


On Fri, Sep 20, 2013 at 10:09 PM, Bill Tantzen <wilee53@gmail.com> wrote:
> Hi Ivan!
>
> I've tried all these suggestions, and still, no success.
>
> There are no errors in the log, only entries of the form:
>
> 2013-09-20 15:00:24,802 INFO  org.dspace.search.DSIndexer @ Writing Community: 2408/36293 to Index
>
> And
>
> 2013-09-20 15:00:17,990 INFO  org.dspace.search.DSIndexer @ Writing Collection: 2408/35292 to Index
>
> One for each community and collection.  The bundles are ORIGINAL, nothing
> special here...
>
> The database seems OK; I am able to follow the communities to collections to
> items just fine, but no bitstreams are being filtered.
>
> I'll keep debugging on my end, but if you have any other ideas, do pass them
> my way!
> Bill
>
>
> On Thu, Sep 19, 2013 at 9:08 AM, helix84 <helix84@centrum.sk> wrote:
>>
>> Hi Bill,
>>
>> Jose's suggestion to look at the logs for errors is a good one. First
>> of all, we should determine whether the filtering failed during
>> processing some item or whether it completed with nothing else to
>> process.
>>
>> Also check the errorlevel (exit status) of the command: 1 means error, 0 means success.
>>
>>
>> On Thu, Sep 19, 2013 at 3:03 PM, Bill Tantzen <wilee53@gmail.com> wrote:
>> > Still working on this media filter issue -- maybe this will point me in the
>> > right direction: how are bitstreams selected for filtering? Is it
>> > something like SELECT * FROM bitstream WHERE ???
>> > What is in the WHERE clause? Or is there some other basis for
>> > selection?
>>
>> No, it's not SQL. It's a recursive call down the hierarchy, as you can
>> see in this method and the few following it: [1]
>>
>> However, your WHERE suggestion got me thinking about which bitstreams are
>> being processed, and the answer is: bitstreams in the ORIGINAL bundle.
>> So please check that your content bundles are called ORIGINAL and not
>> something else (e.g. THUMBNAIL or something custom).
>>
>> [1]
>> https://github.com/DSpace/DSpace/blob/dspace-3.2/dspace-api/src/main/java/org/dspace/app/mediafilter/MediaFilterManager.java#L393
>> [2]
>> https://github.com/DSpace/DSpace/blob/dspace-3.2/dspace-api/src/main/java/org/dspace/app/mediafilter/MediaFilterManager.java#L502
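To make the "recursive call down the hierarchy" concrete, here is a minimal Python sketch of that walk. The classes are hypothetical stand-ins that model only the fields the walk needs; this is not the DSpace API, and the real MediaFilterManager code linked as [1] may apply further checks that are not modelled here (such as which formats each filter accepts).

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# Hypothetical stand-ins for DSpace's Community/Collection/Item/Bundle/
# Bitstream objects; only the fields used by the walk are modelled.


@dataclass
class Bitstream:
    name: str


@dataclass
class Bundle:
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    handle: str
    bundles: List[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    handle: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    handle: str
    subcommunities: List["Community"] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)


def candidate_bitstreams(community: Community) -> Iterator[Bitstream]:
    """Yield every bitstream in an ORIGINAL bundle below this community."""
    for sub in community.subcommunities:
        # Recurse into sub-communities first.
        yield from candidate_bitstreams(sub)
    for collection in community.collections:
        for item in collection.items:
            for bundle in item.bundles:
                # Per the reply above, only ORIGINAL bundles are considered.
                if bundle.name == "ORIGINAL":
                    yield from bundle.bitstreams
```

The bundle name comparison in this sketch is exact, so a bundle named anything other than ORIGINAL contributes nothing, which is the failure mode the reply above asks you to rule out.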
>>
>> Regards,
>> ~~helix84
>>
>> Compulsory reading: DSpace Mailing List Etiquette
>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>
>



Regards,
~~helix84
