Thread: Re: [eXist-TEIXML] Encoding dates for use in range index


Hi Peter,

Thanks for your reply.  I'd always envisioned constraining the date facets to year data, but had wanted to record the month and day when it was supplied as well.  It's really only an issue for articles published in newspapers, where the exact date is essential, but it would be handy if eXist's range index could work with xs:gYear values.

I had not thought beyond the range index as far as constructing my own index or concordance in the way you've done.  One complexity is that I have ~1300 TEI files (one for each author), and then many <biblStruct> elements within each TEI file (probably 20000+). However, I could match the xml:id for each <biblStruct> to a (publication) date similar to your approach. Thanks for sharing this.
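
A rough sketch of what such a concordance could look like for the <biblStruct> case. The collection path /db/bibliography and the monogr/imprint/date location of @when are assumptions for illustration only; the year is taken from the first four characters of @when, so YYYY, YYYY-MM and YYYY-MM-DD values are all handled the same way:

```xquery
xquery version "3.0";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Assumption: all TEI files live in this hypothetical collection, and each
   <biblStruct> carries an @xml:id plus a monogr/imprint/date/@when value. :)
let $bibls := collection("/db/bibliography")//tei:biblStruct[@xml:id]
return
    <concordance>{
        for $bibl in $bibls
        let $when := ($bibl/tei:monogr/tei:imprint/tei:date/@when)[1]
        (: YYYY, YYYY-MM and YYYY-MM-DD all start with the four-digit year. :)
        let $year := substring($when, 1, 4)
        (: Entries with a missing or unparseable date are skipped. :)
        where $year castable as xs:gYear
        return
            <entry id="{$bibl/@xml:id}" year="{$year}"/>
    }</concordance>
```

The result could be stored as a single document and indexed on @year, so that facet queries touch one small file rather than all ~1300 TEI files.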

I also wonder whether this might be a good use for the map datatype?

let $pub-dates :=
    map {
        "id1": 1938,
        "id2": 1975,
        "id3": 1989
    }
(: look up the year recorded for one id :)
return $pub-dates("id1")

Each id would be a <biblStruct> xml:id.  The resulting map would be large, so I don't know if this is viable or not. But an advantage would be having the map functions to manipulate / manage it.
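
To make the idea a little more concrete, here is a minimal sketch that builds the map from a list of id/year pairs and then queries it. It assumes an XQuery 3.1 processor in which the map prefix is predeclared; the three <entry> elements are made-up sample data:

```xquery
xquery version "3.1";

(: Hypothetical sample data standing in for the real id/year pairs. :)
let $entries := (
    <entry id="id1" year="1938"/>,
    <entry id="id2" year="1975"/>,
    <entry id="id3" year="1989"/>
)
(: Build the map from the entries instead of typing it out by hand. :)
let $pub-dates := map:merge(
    for $e in $entries
    return map:entry(string($e/@id), xs:integer($e/@year))
)
return (
    map:size($pub-dates),
    map:contains($pub-dates, "id2"),
    map:get($pub-dates, "id3"),
    (: ids of everything published in or after 1970 :)
    for $id in map:keys($pub-dates)
    where $pub-dates($id) ge 1970
    order by $id
    return $id
)
```

Whether this stays practical at 20000+ entries would need testing; the sketch only shows the shape of the map functions.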

Thanks,
Chris

________________________________________
From: Peter Stadler [st...@we...]
Sent: 03 April 2013 03:13
To: Christopher Thomson
Cc: exist-teixml [exi...@li...]
Subject: [SPAM: 11.000] Re: [eXist-TEIXML] Encoding dates for use in range index

Hi Chris,

I've been struggling with TEI dates for the same reasons: the TEI @when (as well as @from, @to, et al.) is a blend of several datatypes, which makes it very hard to work with. My solution has been to a) constrain the possible datatypes to xs:gYear and xs:date in the schema and b) regularize the dates in a concordance file. This file simply lists all my document IDs together with their respective regularized date values (xs:date in my case). That way I can perform all operations on that file alone, which is pretty fast. (The file should then be updated, e.g. via triggers, every time your documents change.)
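
A minimal sketch of the regularization step, assuming @when only ever holds YYYY, YYYY-MM or YYYY-MM-DD values. The function name local:regularize is made up for illustration, and padding partial dates to the first day of the month or year is just one possible convention:

```xquery
xquery version "3.0";

(: Hypothetical helper: pad a partial @when value to a full xs:date.
   Tries the value as given, then with "-01" and "-01-01" appended, and
   returns the first candidate that is a valid xs:date.
   Returns the empty sequence if none of them is. :)
declare function local:regularize($when as xs:string?) as xs:date? {
    let $candidates := ($when, concat($when, "-01"), concat($when, "-01-01"))
    return
        (for $c in $candidates
         where $c castable as xs:date
         return xs:date($c))[1]
};

(: Made-up sample values; the last one cannot be regularized. :)
for $when in ("1938", "1975-06", "1989-11-09", "n.d.")
return
    <entry when="{$when}">{local:regularize($when)}</entry>
```

Values that cannot be regularized come back empty, so they can be listed and corrected by hand rather than silently guessed.
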
But I think you can make your life much easier if you constrain your facets to years (which should be sufficient granularity for bibliographies, IMHO).
If you go down that road you probably won't need the util:index-keys approach but can simply do a distinct-values on your regularized year values.
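
As a small illustration of the distinct-values route, the following sketch derives year and decade facets with counts from a flat list of years. The list itself is invented sample data; in practice it would come from the regularized year values:

```xquery
xquery version "3.0";

(: Hypothetical regularized year values, one per bibliography entry. :)
let $years := (1938, 1975, 1975, 1989, 1931, 1982)
return
    <facets>{
        (: one facet per distinct year, with the number of entries :)
        for $year in distinct-values($years)
        order by $year
        return
            <year value="{$year}" count="{count($years[. = $year])}"/>
    }{
        (: round each year down to its decade, e.g. 1938 becomes 1930 :)
        for $decade in distinct-values(for $y in $years return $y idiv 10 * 10)
        order by $decade
        return
            <decade value="{$decade}s"
                    count="{count($years[. idiv 10 * 10 = $decade])}"/>
    }</facets>
```

Counting by re-scanning the list is fine for a sketch; with tens of thousands of entries an index-backed lookup would probably be preferable.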

Does that help?
All the best
Peter

On 28.03.2013 at 01:57, Christopher Thomson <chr...@ca...> wrote:

> Hi all,
>
> I'm working on obtaining data using util:index-keys in order to build a faceted search filter, as described in this tutorial: http://rvdb.wordpress.com/2010/10/06/mimicking-faceted-searching-in-exist/
>
> My question relates to dates.  I have been encoding dates using the TEI @when attribute.  However, I understand eXist's range index only works on valid xs:date attributes which need to be YYYY-MM-DD. The data is a humanities bibliography, so largely has publication dates which are year only, or sometimes year-month.  I can see a couple of options, but maybe there are others:
>
> 1) Index the @when as a string, ie <create qname="@when" type="xs:string"/>. This seems like a good option right now.  As my goal is to create a faceted search interface, I'm not sure if I lose much by recording dates as strings in the index. I want to be able to, say, narrow down search results by decade.  Any thoughts?  If I extract the date strings from my index and create facets, can I then cast them back to xs:date if that proves necessary?
>
> 2) Use a different attribute for non xs:date conforming date data, eg the TEI has @when-custom as an option for the date element.  The obvious downside to this is it puts my statistics for faceting into two places, ie @when and @when-custom.  The main use case is likely to be, 'Find all publications between these dates', so I suspect the data with year of publication only is going to be the most important to users.
>
> Does anyone have thoughts or advice on this?
>
> Thanks,
> Chris
>

--
Peter Stadler
Carl-Maria-von-Weber-Gesamtausgabe
Arbeitsstelle Detmold
Gartenstr. 20
D-32756 Detmold
Tel. +49 5231 975-665
Fax: +49 5231 975-668
stadler at weber-gesamtausgabe.de
www.weber-gesamtausgabe.de
