[ https://jira.duraspace.org/browse/DS-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=24083#comment-24083 ]
Mark Diggory commented on DS-893:
So, Sarah, it does sound from your use-case that what you need are the following features:
1.) An ability to restrict access and completely hide a bitstream from view in the GUI for regular users.
2.) An ability for the CollectionManager / Administrators to manage all Bitstreams while enforcing the above restrictions.
3.) Have the ability to "classify" a Bitstream as the "original source" of the content.
As an aside, it would be possible to create features in the DSpace Item Edit interface to allow you to manage/move bitstreams between bundles. However, there are "assumptions" in the coding of the system that introduce complexity if we are going to just let administrators have full access to the bundles and be able to manipulate them. For example, with both search and browse, the system "assumes" the following "relationships":
1.) Full Text Indexing : ORIGINAL/file.pdf has text extract TEXT/file.pdf.txt
2.) Thumbnail Generation : ORIGINAL/file.pdf has thumbnail THUMBNAIL/file.pdf.gif
3.) SWORDv2 Versioning : ORIGINAL/file.pdf has previous version [SOMEBUNDLENAME]/file.pdf
4.) Vireo Sources : ORIGINAL/file.pdf has source version SOURCE/file.doc
All these relationships are "assumed" in the code of DSpace and not "expressed" in any way in the structural metadata of DSpace METS packages.
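To make the first two "assumed" relationships concrete, here is a minimal sketch of the naming convention written down in one place. The class and method names (DerivativeNames, expectedDerivatives) are hypothetical and not part of the DSpace API, and the suffixes simply follow the examples listed above. The SWORDv2 and Vireo relationships are left out because the examples above give no fixed rule for deriving those names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not DSpace code: spells out the implicit bundle/name convention. */
final class DerivativeNames {
    private DerivativeNames() {}

    /**
     * For a bitstream name in the ORIGINAL bundle, returns the bundle name
     * and bitstream name where each derivative is assumed to live.
     */
    static Map<String, String> expectedDerivatives(String originalName) {
        Map<String, String> derived = new LinkedHashMap<>();
        // Full text indexing: ORIGINAL/file.pdf -> TEXT/file.pdf.txt
        derived.put("TEXT", originalName + ".txt");
        // Thumbnail generation: ORIGINAL/file.pdf -> THUMBNAIL/file.pdf.gif (suffix as in the example above)
        derived.put("THUMBNAIL", originalName + ".gif");
        return derived;
    }
}
```

Because the link between files exists only as a string convention like this, renaming or moving a bitstream by hand can silently break it.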
Finally, in summary, in addition to the TEXT, THUMBNAIL, PREVIEW, LICENSE and SWORDv2 versioning bundles, we now need to add Vireo "SOURCE" bundles as well.
This really exposes the "tension" between the theoretical and the practical here. Given that we have "bundles" and need to preserve these different formats or derivatives of the uploaded resource, do we want to propose to the community that these align in any way with FRBR Expressions/Manifestations?
A FRBR Expression is the set of all FRBR manifestations that are identical in "information content". A text extract of a keynote presentation with significant audiovisual content doesn't seem to me to be a manifestation, nor does a thumbnail of the first page of a thesis dissertation.
In the Vireo case we see the only case of having two different files with possibly the same information content, but even then, is it the case that different file formats constitute manifestations or are they just the same manifestation expressed in different file formats (FRBR Items)?
I think we can quickly get derailed trying to address Bundles as anything but system level constructs that were not meant to be exposed to DSpace Endusers (Submitters, Curators, Administrators). Both the Vireo and SWORD usages cause concern for me.
As an aside, I am currently working on how we contribute the Dryad Versioning work to DSpace 3.x. This would include a rewrite of SWORDv2 versioning that makes it possible to version the entire Item (the capability that Dryad provides), not just the included Bitstreams. So IMO, SWORDv2 versioning can be eliminated as a concern if there is an interest in getting rid of bundles.
> suggestion for a re-implementation of Bundles in the DSpace data model
> Key: DS-893
> URL: https://jira.duraspace.org/browse/DS-893
> Project: DSpace
> Issue Type: Improvement
> Components: DSpace API
> Affects Versions: 1.8.0
> Reporter: Bill Hays
> Preliminary ideas for a new implementation of "Bundle" in the DSpace data model
> Current database model relationships:
> Item <- Item2Bundle -> Bundle <- Bundle2Bitstream -> Bitstream
> Current java object model relationships:
> Item <-> Bundle <-> Bitstream
> Proposed database model relationships (1):
> Item <- Item2Bitstream -> Bitstream(id, ..., bundlename, ...)
> or even more succinctly:
> Item <- Bitstream(id, item_id, bundlename, ...)
> In current DSpace, there is no realized benefit from the container complexity in the current model for Bundles.
> This first step in the proposal removes the Bundle table and directly associates Bitstream to Item. The concept of "bundle" is replaced by an enum field in the Bitstream that identifies a bundle type (ORIGINAL, THUMBNAIL, etc). Functionally this is very similar to what we get now: A bitstream belongs to one item and is associated with one bundle. The bundle names are not constrained, but some names are expected in various parts of the codebase.
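A minimal sketch of how proposal (1) might look in Java, assuming the bundle becomes an enum-typed field on the bitstream. The names BundleType, BitstreamRow and ItemBitstreams are hypothetical, and the enum values shown are only a sample of the bundle names mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical enum standing in for the proposed "bundlename" field. */
enum BundleType { ORIGINAL, TEXT, THUMBNAIL, LICENSE }

/** Hypothetical in-memory stand-in for a row of Bitstream(id, item_id, bundlename, ...). */
final class BitstreamRow {
    final int id;
    final int itemId;
    private BundleType bundle;

    BitstreamRow(int id, int itemId, BundleType bundle) {
        this.id = id;
        this.itemId = itemId;
        this.bundle = bundle;
    }

    BundleType getBundle() { return bundle; }

    /** "Moving" between bundles is a field update, not a delete and re-add. */
    void moveTo(BundleType target) { this.bundle = target; }
}

final class ItemBitstreams {
    private ItemBitstreams() {}

    /** Selects the bitstreams of one item that carry the given bundle type. */
    static List<BitstreamRow> inBundle(List<BitstreamRow> rows, int itemId, BundleType bundle) {
        List<BitstreamRow> result = new ArrayList<>();
        for (BitstreamRow row : rows) {
            if (row.itemId == itemId && row.getBundle() == bundle) {
                result.add(row);
            }
        }
        return result;
    }
}
```

A strict enum would rule out ad hoc bundle names, so an instance that relies on custom names would need either extra enum values or the string-valued variant of the field.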
> Proposed database model relationships (2):
> Item <- Bitstream -> MBundle(id, name, collection_id, derivative ...)
> This variation replaces the bundlename enum with a new class and database table "MBundle." Here bundles are not implemented as containers but are an associated type concept for a bitstream. With the association to a collection, bundles can be managed per collection or use a default set. Other properties of MBundle can be added to further enhance management capabilities, e.g.:
> isDerivative - identify bundles for Thumbnails and DerivativeText
> isVisible - indicate that the related bitstreams should be visible in display contexts
> isReserved - such as for very large "source" objects not for display or filtermedia
> [needs work - how complex does bundle "metadata" need to be?]
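One possible reading of proposal (2), sketched in Java. Everything here is an assumption for illustration: the class names MBundleSketch and MBundleRegistry, the use of a null collection id to mark the default set, and the rule that a collection-specific bundle overrides the default with the same name.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for MBundle(id, name, collection_id, derivative ...). */
final class MBundleSketch {
    final int id;
    final String name;
    final Integer collectionId; // assumption: null marks the default set
    final boolean derivative;   // isDerivative
    final boolean visible;      // isVisible
    final boolean reserved;     // isReserved

    MBundleSketch(int id, String name, Integer collectionId,
                  boolean derivative, boolean visible, boolean reserved) {
        this.id = id;
        this.name = name;
        this.collectionId = collectionId;
        this.derivative = derivative;
        this.visible = visible;
        this.reserved = reserved;
    }
}

final class MBundleRegistry {
    private MBundleRegistry() {}

    /** Prefers the collection's own bundle definition, else falls back to the default set. */
    static Optional<MBundleSketch> resolve(List<MBundleSketch> bundles, String name, int collectionId) {
        MBundleSketch fallback = null;
        for (MBundleSketch b : bundles) {
            if (!b.name.equals(name)) {
                continue;
            }
            if (b.collectionId == null) {
                fallback = b;
            } else if (b.collectionId == collectionId) {
                return Optional.of(b);
            }
        }
        return Optional.ofNullable(fallback);
    }
}
```

With this shape, making thumbnails visible for one collection would mean adding a single collection-specific row rather than touching any bitstreams.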
> Primary Bitstream Id: This is currently only used for the ORIGINAL bundle, so conceptually there is one per item. Note that the current model (API and database) allows for multiple ORIGINAL bundles which therefore allows multiple primary bitstream ids; however, the implementation doesn't expose this possibility.
> Possible replacement API calls, depending on the implementation:
> item.setPrimaryBitstream(Bitstream b)
> bitstream.isPrimary(Boolean b)
> Various db solutions:
> 1.) item.primaryBitstreamId - not standard database normalization but consistent with DSpace practice
> 2.) item2bitstream.primaryBitstream - a boolean, standard normalization but requires some management to avoid duplicates
> 3.) mbundle.primaryBitstreamId - not standard database normalization but consistent with DSpace practice
> 4.) item.primaryBitstream - a boolean, standard normalization but requires some management to avoid duplicates
> In the event that someone has used primaryBitstreamId in non-ORIGINAL bundles for special purposes, only 3 or 4 would work.
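The boolean-flag options above note that "some management" is needed to avoid duplicates. A minimal sketch of that bookkeeping follows, assuming one primary bitstream per item. FlaggedBitstream and PrimaryBitstreams are hypothetical names and not existing DSpace classes.

```java
import java.util.List;

/** Hypothetical bitstream carrying a boolean "primary" flag. */
final class FlaggedBitstream {
    final int id;
    boolean primary;

    FlaggedBitstream(int id, boolean primary) {
        this.id = id;
        this.primary = primary;
    }
}

final class PrimaryBitstreams {
    private PrimaryBitstreams() {}

    /**
     * Marks exactly one bitstream of the item as primary and clears the flag
     * on all others, so duplicate primaries cannot accumulate.
     */
    static void setPrimary(List<FlaggedBitstream> itemBitstreams, int primaryId) {
        boolean found = false;
        for (FlaggedBitstream b : itemBitstreams) {
            if (b.id == primaryId) {
                found = true;
            }
        }
        if (!found) {
            // Leave the existing flags untouched if the id does not belong to this item.
            throw new IllegalArgumentException("No bitstream with id " + primaryId);
        }
        for (FlaggedBitstream b : itemBitstreams) {
            b.primary = (b.id == primaryId);
        }
    }
}
```

Storing a single id instead (the primaryBitstreamId options) avoids this bookkeeping, at the cost of the normalization concern noted above.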
> Affected Java code: Item and Bitstream would need to be adjusted. This is fairly low-level so should not be visible to much of the API. Group authorizations would need some work (this has not been fully analyzed). Custom code that uses the API might be affected. Custom SQL such as for reporting might break, but the replacement is shorter code. Collection management of bundles types would need a new tab on the collection page (XMLUI).
> Upgrading a DSpace instance: The database can be modified with queries. No effect on assetstores.
> Simpler, more concise model which removes unused/unnecessary containership structure.
> Enhanced bitstream management with bundle properties.
> Enumeration of names instead of uncontrolled strings, preventing typos in bundle names (e.g. from ItemImport)
> Provides easy solution to making derivative bitstreams visible.
> Moving bitstreams between bundles does not require deleting and re-adding the bitstream.
> Fixes data model problem with primary bitstream and multiple bundles with the same name
> Not a backwardly compatible change. A fundamental change to the data model.
> Custom SQL code that connects items to bitstreams through bundles will require rework.
> Bundles are categories for Bitstreams and do not need to be implemented as containers.
> Bundles could be improved with added metadata and management features.
> The current Bundle implementation may not be a priority issue to merit the work suggested. However, the ideas
> above may be suggestive for other work, including metadata for all DSpace objects and exposing the data model
> to external systems (e.g. Fedora)
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://jira.duraspace.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira