From: Joe W. <jo...@gm...> - 2012-03-17 17:20:21
Hi Peter,

>> There is a very useful blog by Joe Wicentowski on transforming text into
>> XML which you can find at
>> http://joewiz.posterous.com/an-under-appreciated-use-for-xquery-wrangling.

Thanks for sharing the post! I should've mentioned it here before. I think
it could be useful to many TEI projects that require importing data from
non-XML formats that use "flat text" to represent distinct
structural/semantic features (e.g., tabs representing list levels, and
numbers indicating footnote texts), whereas in TEI we use explicit tags to
indicate these features. Processing flat text to identify these features
and translate them into TEI can be challenging but also rewarding.

>> As I have quite a lot of this to do, I started experimenting, and my
>> effort is shown below. This successfully achieves a basic
>> transformation of nearly 200 pages of text with footnotes at the bottom
>> of each page and headers at the top.

Wow, 200 pages! That's great.

> Ok, I think I may have a solution to the problem of moving the notes to the
> correct position in the text. After much experimentation, the only way I
> could find to sort out what turned out to be a context problem was to move
> to the typeswitch function. Now for the paragraphs!

Excellent. Nice work! I think the technique illustrated in the blog post
could definitely be extended to deal with many flat text features. The
trick is in getting the "pipeline" correct: the steps need to be executed
in the right order so that each feature is captured correctly.

Keep us posted,
Joe
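
To illustrate why the order of the pipeline steps matters, here is a minimal sketch in Python (not the XQuery from the blog post, and not Peter's code). It assumes two hypothetical flat-text conventions that may not match your source: footnote texts appear on their own lines as "1. note text", and references in the running text look like "[1]". The function names are made up for the example.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical conventions, assumed only for this sketch:
FOOTNOTE_LINE = re.compile(r"^(\d+)\.\s+(.*)$")  # footnote text: "1. note text"
FOOTNOTE_REF = re.compile(r"\[(\d+)\]")           # inline reference: "[1]"


def split_footnotes(lines):
    """Step 1: separate footnote lines from body lines."""
    body, notes = [], {}
    for line in lines:
        match = FOOTNOTE_LINE.match(line)
        if match:
            notes[match.group(1)] = match.group(2)
        else:
            body.append(line)
    return body, notes


def inline_notes(line, notes):
    """Step 2: escape the text, then turn each [n] into a TEI-style <note>."""
    def replace(match):
        n = match.group(1)
        if n not in notes:
            return match.group(0)  # leave unknown references untouched
        return '<note n="%s">%s</note>' % (n, escape(notes[n]))
    return FOOTNOTE_REF.sub(replace, escape(line))


def wrap_paragraphs(lines):
    """Step 3: wrap each non-blank line in <p>."""
    return ["<p>%s</p>" % line for line in lines if line.strip()]


def flat_text_to_tei(text):
    """Run the steps in order: footnotes out, notes in, then paragraphs."""
    body, notes = split_footnotes(text.splitlines())
    return "\n".join(wrap_paragraphs([inline_notes(l, notes) for l in body]))


sample = "First paragraph.[1]\n\nSecond & last.\n1. A note."
print(flat_text_to_tei(sample))
```

If the paragraph step ran first, the footnote lines would already be wrapped in <p> and the footnote step would no longer recognise them, which is the kind of ordering problem described above. One limitation of this sketch: a body line that happens to begin with "1. " would be mistaken for a footnote, so real data would need a stricter rule.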