Re: [eXist-TEIXML] Transforming text


Hi Peter,

>> There is a very useful blog by Joe Wicentowski on transforming text into
>> XML which you can find at
>> http://joewiz.posterous.com/an-under-appreciated-use-for-xquery-wrangling.

Thanks for sharing the post!  I should've mentioned it here before.  I
think it could be useful to many TEI projects that require importing
data from non-XML formats that use "flat text" to represent distinct
structural/semantic features (e.g., tabs representing list levels, and
numbers indicating footnote texts), whereas in TEI we use explicit
tags to indicate these features.  Processing flat text to identify
these features and translate them into TEI can be challenging but also
rewarding.

>> As I have quite a lot of this to do, I started experimenting, and my
>> effort is shown below.  This successfully achieves a basic
>> transformation of nearly 200 pages of text with footnotes at the bottom
>> of each page and headers at the top.

Wow, 200 pages!  That's great.

> Ok, I think I may have a solution to the problem of moving the notes to the
> correct position in the text.  After much experimentation, the only way I
> could find to sort out what turned out to be a context problem was to move
> to the typeswitch function.  Now for the paragraphs!

Excellent.  Nice work!  I think the technique illustrated in the blog
post could definitely be extended to deal with many flat text
features.  The trick is in getting the "pipeline" correct: the steps
need to be executed in the right order so that each feature is
captured correctly.
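
To make the ordering point concrete, here is a rough sketch. It is in
Python rather than XQuery, and it is not the code from the blog post.
The input conventions are assumptions made up for the illustration:
running headers start with "PAGE ", footnote texts sit on their own
lines as "[n] text", in-text markers look like "[n]", and blank lines
separate paragraphs. All function names are hypothetical.

```python
import re
from xml.sax.saxutils import escape


def strip_headers(lines):
    # Step 1: drop running page headers (assumed to start with "PAGE ").
    return [ln for ln in lines if not ln.startswith("PAGE ")]


def extract_notes(lines):
    # Step 2: pull footnote lines ("[n] text", an assumed convention)
    # out of the body and remember them by number.
    notes, body = {}, []
    for ln in lines:
        m = re.match(r"\[(\d+)\]\s+(.*)", ln)
        if m:
            notes[m.group(1)] = m.group(2)
        else:
            body.append(ln)
    return body, notes


def inline_notes(body, notes):
    # Step 3: replace each in-text marker with a <note> element.
    # A marker with no matching footnote is left untouched.
    def repl(m):
        n = m.group(1)
        if n not in notes:
            return m.group(0)
        return '<note n="%s">%s</note>' % (n, escape(notes[n]))

    return [re.sub(r"\[(\d+)\]", repl, escape(ln)) for ln in body]


def wrap_paragraphs(body):
    # Step 4: blank lines separate paragraphs; wrap each run in <p>.
    paras, current = [], []
    for ln in body + [""]:
        if ln.strip():
            current.append(ln.strip())
        elif current:
            paras.append("<p>" + " ".join(current) + "</p>")
            current = []
    return paras


def to_tei(text):
    # The order matters: footnote lines must be removed before the
    # paragraphs are wrapped, or they would turn into <p> elements.
    lines = strip_headers(text.splitlines())
    body, notes = extract_notes(lines)
    body = inline_notes(body, notes)
    return "\n".join(wrap_paragraphs(body))
```

If steps 2 and 4 were swapped, each footnote line would be wrapped as
an ordinary paragraph before it could be moved to its marker.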

Keep us posted,
Joe
