Hi Beth, Shane and John,
I think your discussion raises a very interesting repository policy question.
My view on this is to regard PDF (and PDF/A) as a production container format,
i.e. to use it as a consumable, just like Flash, WMF, MP3, etc. Don't use it as
your standard archive format unless policy defines it as the archive format.
This is happening with PDF at some government bodies here in the Netherlands.
That is fine if your retention period is 5 years and you don't need to reuse
the content in another context.
If, however, you need to preserve data for a long time (or forever), it is best
to store it in the original (uncompressed) single-bitstream formats, preferably
captured at the source where they were created. These are your master
bitstreams, which can always be used to recreate containers any way you like in
the future. Depending on how important the presentation format is, you need to
add (XML) metadata. If image quality is important, you need to think about
resolution, raw vs. processed, and standards like TIFF. For video it gets even
more complex. The most important thing here is being able to use converters
for future presentation without significant quality loss. I would definitely
advise working together with broadcast experts on this; let them help you
with preservation and migration/conversion.
The actual storage and management of bitstreams (e.g. asset store design) is
very important for DSpace. The IT industry is currently moving towards general
storage abstraction/virtualization. As a result, we no longer need to worry
about file or directory structures: we assign unique identifiers to objects
and describe their metadata. The question is whether such an object should be
a single bitstream or a container Archival Information Package (AIP), and
based on which standard. I suspect that the AIP discussion might end up with
the same problems Shane described for PDF/A. My feeling is that the most
important thing is not to carve things in stone, but to stay very flexible
and open to change. This probably means managing relations and metadata at
the higher collection and item levels, as DSpace does, and making sure that
persistence at the more technical bitstream level is guaranteed by storage
management systems. These systems need to be regularly checked for
I wonder whether these more policy-related issues are something to put into
a section of the DSpace Wiki?
----- Original Message -----
From: "John Murtagh" <John.Murtagh@...>
To: <ssadler@...>; <dspace-general@...>
Sent: Monday, February 11, 2008 10:25 AM
Subject: Re: [Dspace-general] Dspace-general Digest, Vol 55, Issue 8
> Hi Beth
> Here at Brunel University we get our submitters to use Open Office, an
> open-source office suite that can convert Word documents to PDF.
> Due to expensive licensing, Adobe Professional is used only by me for
> Word conversion, but I've not seen much difference in the quality of the
> PDF produced. In fact, the only converter I have ever seen get bogged
> down with large files is Adobe's.
> John Murtagh (on behalf of) BURA-manager@...
> Brunel University Research Archive (BURA)
> Brunel Library
> Kingston Road
> UB8 3PH
> Tel: 0189 526 5417
> Fax: 01895269741
> E-mail: BURA-manager@...
> Website: http://bura.brunel.ac.uk
> -----Original Message-----
> From: ssadler@... [mailto:ssadler@...]
> Sent: 09 February 2008 15:32
> To: dspace-general@...
> Subject: Re: [Dspace-general] Dspace-general Digest, Vol 55, Issue 8
> Hi Beth,
> We are using PDF/A and these features for our theses and other PDF work
> in our repository.
> Hope this helps, feel free to e-mail follow-up questions.
> Shawna Sadler
> Coordinator, Digital Initiatives
> University of Calgary
> (403) 229-2477
>> Today's Topics:
>> 1. PDF/A (Beth Black)
>> 2. Re: PDF/A (Shane Beers)
>> Message: 1
>> Date: Fri, 08 Feb 2008 06:47:56 -0500
>> From: Beth Black <black.367@...>
>> Subject: [Dspace-general] PDF/A
>> To: "dspace-general@..." <dspace-general@...>
>> Message-ID: <59c3956eaf.56eaf59c39@...>
>> Content-Type: text/plain; charset=us-ascii
>> We are investigating using PDF/A in our repository and wonder what
>> others are doing. Do you recommend or require PDF/A? If you require,
>> do you convert for submitters in some way?
>> Any other thoughts on PDF/A?
>> Beth Black
>> Systems Librarian and Assistant Professor University Libraries 610
>> Ackerman Road, Room 5855 Columbus, Ohio 43202
>> 614-688-5428 phone
>> 614-292-7859 fax
>> Message: 2
>> Date: Fri, 08 Feb 2008 09:27:59 -0500
>> From: Shane Beers <sbeers@...>
>> Subject: Re: [Dspace-general] PDF/A
>> To: Beth Black <black.367@...>
>> Cc: "dspace-general@..." <dspace-general@...>
>> Message-ID: <BBA92DA4-9B39-47C3-A3F6-937D04F626F7@...>
>> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>> When I first arrived here, I was interested in attempting to
>> standardize the use of PDF/A, especially for our ETD collections where
>> the submissions were far more controllable.
>> The major hurdle is that all content of a PDF/A must be embedded in
>> the document itself; this can become impossible if the person
>> creating the PDF does not know how to embed everything correctly.
>> You would be surprised (or maybe you wouldn't...) how fairly simple
>> technology can confuse even PhD students.
>> That also limits the content of PDF/As: as of now, they cannot
>> contain things like multimedia objects. This may not be a real
>> concern of yours, but it's worth noting.
>> Additionally, converting regular PDFs to PDF/A is often impossible,
>> because the information was not embedded in the PDF in the first
>> place and you no longer have things like the original fonts.
>> Note that these are only my cursory findings, not gleaned from a
>> deep investigation on my part, so you may wish to investigate it
>> yourself. Essentially, what I'm saying is that it will probably work
>> quite well in an extremely structured workflow. Outside of that, you
>> may have many issues to overcome!
>> Shane Beers
>> Digital Repository Services Librarian
>> George Mason University
>> Dspace-general mailing list
>> End of Dspace-general Digest, Vol 55, Issue 8
> Dspace-general mailing list