On Wed, Dec 17, 2008 at 10:04 PM, John Mark Ockerbloom <ockerblo@pobox.upenn.edu> wrote:
Andrew Marlow wrote:
Your web site is great. Yes, the subject hierarchy you have is the sort of thing I am after. I did not see the traditional breadcrumbs that show the hierarchy, though. You have opted for showing 'broader' and 'narrower' classifications instead.

Yes, because LC subject headings doesn't represent a traditional
hierarchy, but more of a conceptual network.  That is, there's more
than one "broader term" in use for many terms, so there's no
canonical hierarchy to hang bread crumbs off of (unlike DDC,
LC call numbers, or UDC).
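To make the distinction concrete, here is a minimal Java sketch. It is not the Perl library itself, and it assumes a simple in-memory map of broader/narrower links with hypothetical term names. It shows that a single breadcrumb trail exists only when every term on the way up has exactly one broader term.

```java
import java.util.*;

// Toy model of a subject "conceptual network": a term may have several
// broader terms, so the broader-term links form a directed graph rather
// than a tree. Class and method names are hypothetical.
class SubjectNetwork {
    private final Map<String, Set<String>> broader = new HashMap<>();
    private final Map<String, Set<String>> narrower = new HashMap<>();

    // Record "child has broader term parent" plus the inverse narrower link.
    void addBroader(String child, String parent) {
        broader.computeIfAbsent(child, k -> new TreeSet<>()).add(parent);
        narrower.computeIfAbsent(parent, k -> new TreeSet<>()).add(child);
    }

    Set<String> broaderTerms(String term) {
        return broader.getOrDefault(term, Collections.emptySet());
    }

    Set<String> narrowerTerms(String term) {
        return narrower.getOrDefault(term, Collections.emptySet());
    }

    // A canonical breadcrumb trail exists only if every term on the way up
    // has at most one broader term (and the walk does not loop).
    boolean hasCanonicalBreadcrumb(String term) {
        Set<String> seen = new HashSet<>();
        String current = term;
        while (seen.add(current)) {
            Set<String> up = broaderTerms(current);
            if (up.size() > 1) return false; // several routes up: no single trail
            if (up.isEmpty()) return true;   // reached a top term
            current = up.iterator().next();
        }
        return false; // cycle: no well-defined top
    }
}
```

A strict tree such as DDC would pass `hasCanonicalBreadcrumb` for every term; a network where some term has two broader terms fails it for that term and everything beneath it.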

This is very interesting. I have been talking to one of our business analysts, and she says that modern computer-based subject classifications need not be hierarchy-based, as DDC is. Yet the screen mockups of the new system, which she helped create, still show breadcrumbs during subject navigation. She knows what she is talking about, so I take this as a sign that actual implementations of conceptual networks are very thin on the ground. The mockups probably show breadcrumbs because we are so used to systems like DDC, but she is aware that we could, and probably should, do better.

To see a clearer example of the complexity involved,
look at "Information storage and retrieval systems",
which I have a subject map for at

http://onlinebooks.library.upenn.edu/webbin/book/browse?type=lcsubc&key=Information storage and retrieval systems

Here, notice that there are multiple broader, narrower, and related terms;
no single thing "above" it in a breadcrumb hierarchy.  (Instead, you
can go up whichever route you choose.)
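The "go up whichever route you choose" idea can be sketched as enumerating every path from a term up to a top term (one with no broader terms); each path is one candidate breadcrumb trail. This is a hypothetical Java sketch over a plain map of broader-term links, not the onlinebooks implementation.

```java
import java.util.*;

// Sketch: list every route "up" from a term to a top term in a network
// where terms can have several broader terms. Each route is returned
// top-down, like a breadcrumb trail. The map representation is an assumption.
class BroaderRoutes {
    static List<List<String>> routesUp(Map<String, List<String>> broader, String term) {
        List<List<String>> routes = new ArrayList<>();
        walk(broader, term, new ArrayDeque<>(), routes);
        return routes;
    }

    private static void walk(Map<String, List<String>> broader, String term,
                             Deque<String> path, List<List<String>> routes) {
        if (path.contains(term)) return; // skip cycles in the network
        path.addFirst(term);             // prepend so finished routes read top-down
        List<String> parents = broader.getOrDefault(term, List.of());
        if (parents.isEmpty()) {
            routes.add(new ArrayList<>(path)); // reached a top term: record the trail
        } else {
            for (String parent : parents) {
                walk(broader, parent, path, routes);
            }
        }
        path.removeFirst(); // backtrack before trying the next route
    }
}
```

For a term "Topic" with broader terms "Parent A" and "Parent B", where "Parent A" has the broader term "Root", `routesUp` returns two trails, [Root, Parent A, Topic] and [Parent B, Topic]; a browsing interface could offer all of them rather than a single breadcrumb.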

This is fantastic. I want it!
Is your system based on DSpace? You mentioned having to do some non-trivial programming. If your system is DSpace-based will you be able to contribute the code back to DSpace?

It's not DSpace-based. It's currently a Perl library drawing on
somewhat out-of-date metadata for the LC subject headings taxonomy, local
to Penn.

Hmm. Whilst your link shows some great categorization, stuff local to Penn might be a problem when it comes to international systems (I am in the UK).
I'm considering porting the subject mapping library to Java, though,
for some other developments, which could in theory make it usable by DSpace
if interest warrants.  (Or if anyone else wants to work on this, I can
point folks to downloadable XML versions of the LC subject headings
conceptual network.)


I hope other people will be interested in this too. I think it would be a fantastic addition to DSpace, and worth adding even with the Penn-local stuff present. After all, the controlled vocabularies that come as standard are particular to certain locales, e.g. the Swedish Research Subject Categories. I am not sure it is possible to come up with a subject categorization that is locale-neutral. This reminds me of some work I did in the financial sector, where we had to classify certain equity time series as belonging to particular industries. I found that industry classifications are very locale-specific; in Italy, for example, there is a dedicated industry classification for terracotta! So perhaps the best we can do is add locale-specific controlled vocabularies and let the administrator/configurator choose the closest of those available.

Andrew M.