Good suggestion from Jakob, though it turned out David's problem was
two-fold.
First, some of the dates in his EAD files were of the form "December 1999,
copyright (C) 1999". The prefilter was getting confused by the two years and
causing the indexer to abort those files prematurely. I have a stylesheet
fix for this if anyone else experiences that problem.
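To illustrate the kind of normalization involved (a sketch only, not the
stylesheet fix mentioned above): assuming an XSLT 2.0 processor, a named
template can reduce a free-text date to the first four-digit year it contains.
The template name "first-year" and its "date" parameter are hypothetical and
do not come from the XTF stylesheets.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical helper (not part of XTF): emit only the first
       four-digit run found in a free-text date string. -->
  <xsl:template name="first-year">
    <xsl:param name="date"/>
    <xsl:variable name="text" select="string($date)"/>
    <!-- Emit nothing if the string contains no four-digit run. -->
    <xsl:if test="matches($text, '[0-9]{4}')">
      <!-- The reluctant .*? stops at the first four-digit run;
           the 's' flag lets '.' match across line breaks. -->
      <xsl:value-of
          select="replace($text, '^.*?([0-9]{4}).*$', '$1', 's')"/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Called with "December 1999, copyright (C) 1999", this template yields "1999".
Whether a single year is the right value to hand to the indexer depends on
your own prefilter, so treat this as a starting point.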
The second problem was that XTF needs "ead" to be in the directory path or
filename in order to invoke the correct prefilter. When he moved the files
to a subdirectory called "ead" and reindexed, it worked fine.
That second point will be important to anyone upgrading to the 1.8
stylesheets: XTF used to assume every XML file was TEI; now it needs to
distinguish TEI from EAD, and your data layout has to give it that clue.
Thus the layout of the new sample data directory:
  data/
    tei/
      teiFiles...
    ead/
      eadFiles...
If you don't want to organize your data that way, you can use whatever
directory names you want, and instead name the individual XML files like
this: "myfile.ead.xml" or "myfile.tei.xml".
Of course, if you don't like the way it's differentiating files, you can
always code whatever custom logic you like in the docSelector.xsl.
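As a rough sketch of what such custom logic might look like (not the shipped
docSelector.xsl): the version below assumes the selector receives a
<directory dirPath="..."> element containing <file fileName="..."/> children
and answers with <indexFiles>/<indexFile> elements. Those element and
attribute names, and the prefilter paths, are assumptions here; check them
against the docSelector.xsl in your own installation before borrowing
anything.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: <directory dirPath="..."><file fileName="..."/></directory> -->
  <xsl:template match="directory">
    <indexFiles>
      <xsl:apply-templates select="file"/>
    </indexFiles>
  </xsl:template>

  <xsl:template match="file">
    <xsl:variable name="name" select="string(@fileName)"/>
    <!-- Split the directory path into components so that only a
         directory named exactly "ead" or "tei" counts. -->
    <xsl:variable name="dirs"
        select="tokenize(string(../@dirPath), '[/\\]')"/>
    <xsl:choose>
      <!-- Skip the .dc.xml metadata companions. -->
      <xsl:when test="ends-with($name, '.dc.xml')"/>
      <!-- EAD: explicit .ead.xml suffix, or any .xml under an "ead" directory. -->
      <xsl:when test="ends-with($name, '.ead.xml')
                      or (ends-with($name, '.xml') and $dirs = 'ead')">
        <!-- Placeholder prefilter path; substitute your own. -->
        <indexFile fileName="{$name}"
            preFilter="style/textIndexer/ead/eadPreFilter.xsl"/>
      </xsl:when>
      <!-- TEI: explicit .tei.xml suffix, or any .xml under a "tei" directory. -->
      <xsl:when test="ends-with($name, '.tei.xml')
                      or (ends-with($name, '.xml') and $dirs = 'tei')">
        <!-- Placeholder prefilter path; substitute your own. -->
        <indexFile fileName="{$name}"
            preFilter="style/textIndexer/tei/teiPreFilter.xsl"/>
      </xsl:when>
      <!-- Anything else is left out of the index. -->
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Matching whole path components rather than any substring is a deliberate
choice in this sketch: it keeps a directory such as "headers" from being
treated as EAD just because the name happens to contain "ead".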
--Martin
On 8/29/06, Jakob Saternus <jakob@...> wrote:
>
>
> David,
>
> A solution would be to write a simple stylesheet to extract
> those Dublin Core elements occurring below
> /ead/eadheader/filedesc/titlestmt
> from all your EAD files and save them in those little .dc.xml files.
> Look at the TEI sample files - you'll notice they come in twos.
>
> /Jakob
>
> Friday, August 25, 2006, 6:39:06 PM, you wrote:
>
> > Dear List:
>
> > Yesterday I downloaded and successfully installed the XTF application in
> > about one hour. The author(s) of the Deployment and Programming Guide did
> > a wonderful job. This morning I began to explore under the hood of the
> > system to understand how the application works. To that end I added some
> > of my own ead encoded xml files and used "textIndexer" to add additional
> > content to the default index. From the new index I was able to search and
> > retrieve my own content. The one problem I did encounter was that my ead
> > document (the search result was formatted properly) did not display
> > metadata for the tags: Author, Title, Collection, Published, Matches and
> > Similar. In addition, words in context, although rendered in red text,
> > were not hot linked.
>
> > I am not an expert XSLT programmer but I knew enough (I think) to
> > determine that textIndexer points to docSelector.xsl where pre-filtering
> > and display options are identified. I looked at docSelector and observed
> > that ead files are pre-filtered by eadPreFilter.xsl, which is pre-pended
> > with preFilterCommon.xsl. Together these files operate to extract
> > metadata about the above tags and then write out a new ead file that is
> > sent to the index engine. I examined the code in eadPreFilter.xsl, which
> > can obtain metadata for the tags either from a "dc" file if present or
> > from the ead document itself. The templates get-ead-title,
> > get-ead-author, get-ead-date, get-ead-description and get-ead-identifier
> > had test conditions that aligned with my ead document. For example the
> > template get-ead-author says:
>
> > <!-- Fetch creator (author) info from the eadHeader or the first archdesc/did. -->
> > <xsl:template name="get-ead-author">
> >   <xsl:choose>
> >     <xsl:when test="/ead/eadheader/filedesc/titlestmt/author">
> >       <creator xtf:meta="true">
> >         <xsl:value-of select="/ead/eadheader/filedesc/titlestmt/author"/>
> >       </creator>
> >     </xsl:when>
> >     <xsl:when test="/ead/archdesc[1]/did[1]/origination[starts-with(@label, 'Creator')]">
> >       <creator xtf:meta="true">
> >         <xsl:value-of select="/ead/archdesc[1]/did[1]/origination[@label, 'Creator']"/>
> >       </creator>
> >     </xsl:when>
> >   </xsl:choose>
> > </xsl:template>
>
> > Here is a code snippet from my ead document:
>
> > <?xml version="1.0" encoding="ASCII"?>
> > <!DOCTYPE ead PUBLIC "-//Society of American Archivists//DTD ead.dtd
> > (Encoded Archival Description (EAD) Version 1.0)//EN" "ead.dtd">
> > <ead id="mssa.ms.0205">
> > <eadheader langencoding="ISO639-2"
> > findaidstatus="edited-full-draft"
> > audience="external" id="ACP7866">
> > <eadid systemid="xml">mssa.ms.0205</eadid>
> > <filedesc>
> > <titlestmt>
> > <titleproper>Guide to the Percival Farquhar Papers
> > <num>Manuscript Group 205</num></titleproper>
> > <author>compiled by Manuscripts and Archives Staff</author>
> > </titlestmt>
>
> > Why, on output, is the data "compiled by Manuscripts and Archives"
> > missing from the Author tag, with the word "none" substituted? Any
> > general assistance to resolve this type of problem, and any insight as to
> > why words in context were not hot linked, would be greatly appreciated.
>
> > Thanks Much,
>
> > David
>
>
>
> > David Gewirtz
> > Yale University
> > Project Manager and Digital Preservation Architect
> > Academic Media & Technology - Library Systems Group
> > Phone:203-432-3195 Cell: 203-530-6218
> > e-mail: David.Gewirtz@...
>
>
> _______________________________________________
> Xtf-user mailing list
> Xtf-user@...
> https://lists.sourceforge.net/lists/listinfo/xtf-user
>