I developed a set of four related command-line apps for DSpace:
    1) a lister / report generator
    2) a policy tool that adds / removes policies
    3) a metadata tool that adds / removes specific metadata values
    4) a bitstream replacer tool

All except the fourth work on a set of DSpace objects specified by two command-line arguments:
    --root  ROOT      - where ROOT is a handle or an object type followed by an ID
    --type  TYPE      - where TYPE is one of collection, item, bundle, or bitstream

--root COLLECTION.10  --type BITSTREAM
    means work on all bitstreams in the collection with ID 10
--root handle/12345  --type ITEM
    means work on all items contained in the object designated by the handle
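
To make the ROOT syntax concrete, here is a minimal Python sketch of how such a value could be told apart. It is not the actual implementation: the function name parse_root, the returned tuple, and the rule "anything containing a slash is a handle" are assumptions made purely for illustration.

```python
# Illustrative only: parse_root and its return shape are hypothetical
# and are not taken from the bulk-do code.
def parse_root(root):
    """Split a ROOT argument into (kind, object_type, identifier)."""
    if "/" in root:
        # Assumption: a value containing a slash is a handle, e.g. "handle/12345".
        return ("handle", None, root)
    # Otherwise expect TYPE.ID, e.g. "COLLECTION.10".
    obj_type, sep, obj_id = root.partition(".")
    if not sep or not obj_id.isdigit():
        raise ValueError(f"unrecognized ROOT value: {root!r}")
    return ("id", obj_type.upper(), int(obj_id))


print(parse_root("COLLECTION.10"))  # ('id', 'COLLECTION', 10)
print(parse_root("handle/12345"))   # ('handle', None, 'handle/12345')
```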

There is an additional argument, --doWorkFlowItems, that restricts sets to items in workflows and by extension to bundles or bitstreams in items in workflows. 


The lister generates tsv- or txt-formatted output, printing properties of the selected set of DSpace objects. Its --include option determines which properties are printed. You can choose to print IDs and handles, as well as policy information, or specify selected item metadata fields. You can include an item's 'withdrawn' status or a bundle's embargo state. Bitstream reports may print mimeType, checksum, ... When printing DSpace objects, you can also print properties of the enclosing DSpace objects. For example, when printing bitstreams in a collection, you can include bundle names, item handles, and even item metadata values by using options like these:
    --include 'object,name,mimeType,BUNDLE.name,ITEM.handle,ITEM.dc.contributor.author'
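
As a rough illustration of how an --include value can be read, the following Python sketch splits it into (enclosing object, property) pairs. The function name parse_include, the set of recognized prefixes, and the dc.title field used in the example are assumptions for illustration; the real tool may interpret the option differently.

```python
# Illustrative only: parse_include and ENCLOSING are hypothetical names.
# Assumption: an upper-case prefix such as BUNDLE. or ITEM. refers to an
# enclosing object; everything else belongs to the listed object itself.
ENCLOSING = ("COMMUNITY", "COLLECTION", "ITEM", "BUNDLE")


def parse_include(spec):
    """Return a list of (enclosing_type_or_None, property_name) pairs."""
    fields = []
    for name in spec.split(","):
        name = name.strip()
        if not name:
            continue
        prefix, sep, rest = name.partition(".")
        if sep and prefix in ENCLOSING:
            fields.append((prefix, rest))
        else:
            fields.append((None, name))
    return fields


for field in parse_include("object,name,mimeType,BUNDLE.name,ITEM.handle,ITEM.dc.title"):
    print(field)
```

With this reading, ITEM.dc.title is the dc.title metadata value of the item that contains the bitstream, while name and mimeType are properties of the bitstream itself.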

The lister works nicely with the other commands, since all four commands use the same mechanism to select the objects they work on. For example, you might use the lister to review which DSpace objects need policy or metadata changes. After applying changes, it comes in handy for making sure that the changes performed are in fact the ones that were intended.


The policy tool decides which action to apply to each DSpaceObject selected by the --root and --type parameters based on three options: 
    --action   [ADD | DEL ]      - whether to add or delete policies 
    --dspace_action  [READ | WRITE | REMOVE | ... ]       
    --who [group  | eperson]               

For example:
    dspace bulk-pols -r handle/712657 -t BITSTREAM --action ADD --dspace_action WRITE --who EPERSON.monikam
        gives the eperson monikam WRITE privileges on all bitstreams contained in the object behind the given handle, which may be a community, collection, or item.

    dspace bulk-pols -r handle/712657 -t BITSTREAM -a DEL -d READ -w GROUP.Anonymous 
        removes the READ permission from the Anonymous group 
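
If you want to drive the policy tool from a script, a small wrapper along these lines may help. It is only a sketch: the helper name bulk_pols_args is hypothetical, it uses just the long options listed above, and it merely builds the argument list without running anything.

```python
import shlex


# Illustrative only: bulk_pols_args is a hypothetical helper, not part of
# the tool. It uses only the options described above.
def bulk_pols_args(root, obj_type, action, dspace_action, who):
    """Build the argument list for a 'dspace bulk-pols' call."""
    if action not in ("ADD", "DEL"):
        raise ValueError("action must be ADD or DEL")
    kind, sep, name = who.partition(".")
    if kind.upper() not in ("GROUP", "EPERSON") or not sep or not name:
        raise ValueError("who must look like GROUP.name or EPERSON.name")
    return [
        "dspace", "bulk-pols",
        "--root", root,
        "--type", obj_type,
        "--action", action,
        "--dspace_action", dspace_action,
        "--who", who,
    ]


args = bulk_pols_args("handle/712657", "BITSTREAM", "DEL", "READ", "GROUP.Anonymous")
print(shlex.join(args))
# To actually run it, one could pass args to subprocess.run(args, check=True).
```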

The metadata tool works similarly to the policy tool. Of course, it only makes sense to apply it to item sets.

The bitstream replacer works on single bitstreams. It is related to the other tools in that it selects the bitstream to work on in the same fashion, i.e. with the --root and --type arguments.

I developed these commands in connection with a project here at Princeton, where I needed to add a cover page to all bitstreams in ORIGINAL bundles in a community. The lister gave me the list of bitstreams. Printing the list in txt format allowed me to grep for name=ORIGINAL. I included the mimeType in the listing so that I would only work on PDF documents. Including the internalId allowed me to take the file right from the assetstore and feed it into my 'add the cover page' script. I replaced the old bitstreams using the IDs printed earlier to define the --root parameter for the bitstream replacer. Finally, I used the lister to check the access policies of the bitstreams. Right now I run the lister command in a cron job to watch the submission progress in one of our communities.
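
The filtering step of that workflow could also be done in a few lines of Python instead of grep. The sketch below assumes that each line of the txt output is a tab-separated series of key=value pairs; that format, the key names, and the sample lines are invented for illustration and may not match what the lister really prints.

```python
# Illustrative only: the line format and the sample data are assumptions.
def parse_line(line):
    """Turn 'key=value<TAB>key=value...' into a dict."""
    record = {}
    for token in line.rstrip("\n").split("\t"):
        key, sep, value = token.partition("=")
        if sep:
            record[key] = value
    return record


def pdfs_in_original(lines):
    """Yield records for PDF bitstreams that sit in an ORIGINAL bundle."""
    for line in lines:
        rec = parse_line(line)
        if rec.get("BUNDLE.name") == "ORIGINAL" and rec.get("mimeType") == "application/pdf":
            yield rec


# Made-up sample lines standing in for lister output.
sample = [
    "object=BITSTREAM.101\tmimeType=application/pdf\tinternalId=id-a\tBUNDLE.name=ORIGINAL",
    "object=BITSTREAM.102\tmimeType=text/plain\tinternalId=id-b\tBUNDLE.name=LICENSE",
    "object=BITSTREAM.103\tmimeType=image/png\tinternalId=id-c\tBUNDLE.name=ORIGINAL",
]

selected = list(pdfs_in_original(sample))
for rec in selected:
    print(rec["object"], rec["internalId"])
```

Each selected record carries the object ID (usable as the --root value for the replacer) and the internalId needed to locate the file in the assetstore.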

I wrote more detailed documentation, which is part of the pull request that I created for this code. Here at Princeton we are still running 1.8. The bulk-do code mostly lives in its own package and should play well with version 3 (I have not tried it). The PR is based on master. In other words, unless you run a pre-1.8 version, merging this into your version should be relatively painless, and, it goes without saying, I'd help sort out conflicts.

The PR is HERE and the documentation is THERE 

I believe this code would be useful for many DSpace administrators. It would be straightforward to add a JSON/XML output format to offer this functionality in the REST API. So please have a look, send feedback, and possibly step up as a volunteer tester / reviewer.


Monika

-- 
Monika Mevenkamp      
phone: 609-258-4161
123 693 Alexander Street, Princeton University, Princeton, NJ 08544