From: Peter W. <pet...@ke...> - 2012-03-15 21:32:07
Ok, I think I may have a solution to the problem of moving the notes to the correct position in the text. After much experimentation, the only way I could find to sort out what turned out to be a context problem was to switch to a typeswitch expression.

Now for the paragraphs!

Peter

On 15/03/2012 10:49, Peter Watson wrote:
> Hi
>
> There is a very useful blog post by Joe Wicentowski on transforming text into
> XML, which you can find at
> http://joewiz.posterous.com/an-under-appreciated-use-for-xquery-wrangling.
> As I have quite a lot of this to do, I started experimenting, and my
> effort is shown below. This successfully achieves a basic
> transformation of nearly 200 pages of text with footnotes at the bottom
> of each page and headers at the top. Incidentally, it is useful to look
> at the text with formatting codes revealed, e.g. in Word, as this shows
> what is likely to work best when tokenizing.
>
> I run into problems when I try to replace the specially marked note
> numbers in the text with the associated notes (I tried using
> following-sibling::note and matching the numbers, without success), and
> I am also unsure what to do about amalgamating paragraphs that cross page
> breaks. Again, I can see what might work reasonably well, e.g. treating
> paragraphs that start with a lower-case letter after a page break as
> continuations, but I'm not sure how to set out the XQuery to get there.
> Maybe I'll need to revert to manual editing from this point, but I would
> be grateful to hear from anyone who has suggestions as to how I might
> proceed.
>
> Thanks
>
> Peter
>
> xquery version "1.0";
>
> declare function local:transform-block($block)
> {
>     (: To replace the note number in the text with a more identifiable marker :)
>     let $text-with-marked-note-number :=
>         replace($block, '\s(\d)\s|\s(\d)$', ' XXX$1$2 ')
>     (: The next two lines replace the page headers with <pb/> :)
>     return
>         if (matches($text-with-marked-note-number, "THE OKEOVERS OF OKEOVER.*")) then <pb/>
>         else if (matches($text-with-marked-note-number, "\dTHE OKEOVERS OF OKEOVER")) then <pb/>
>         (: To identify footnotes and mark up accordingly :)
>         else if (matches($text-with-marked-note-number, '^\d\s')) then
>             <note>{$text-with-marked-note-number}</note>
>         (: Everything else marked up as element p :)
>         else <p>{$text-with-marked-note-number}</p>
> };
>
> let $file := doc('chsokeover.txt')/root
> (: Text tokenised on the basis of 3 carriage returns/spaces :)
> let $content := tokenize($file, '\s{3,}')
> return
>     <root>
>         <head>The Okeovers of Okeover.</head>
>         {
>             for $block in $content
>             (: Sends the tokenized text to be transformed :)
>             let $result := local:transform-block($block)
>             return $result
>         }
>     </root>
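
The reply above mentions a typeswitch but does not include the code, so what follows is only a minimal sketch of how a typeswitch-based pass over the <root> output might put the notes in place. It is not the solution referred to above. The function names local:place-notes and local:expand-markers, the n attribute, and the sample input are all hypothetical. The sketch assumes single-digit XXXn markers, as produced by the query quoted above, and assumes that each note is a following sibling of the paragraph that cites it.

```xquery
xquery version "1.0";

(: Hypothetical helper: splits a text node on the XXXn markers and puts the
   matching note in place of each marker. Assumes single-digit note numbers
   and that the note text starts with that number followed by a space. :)
declare function local:expand-markers($text as text()) as node()* {
    for $piece at $i in tokenize($text, 'XXX')
    return
        if ($i = 1) then
            (: Text before the first marker is copied unchanged :)
            text { $piece }
        else
            let $n := substring($piece, 1, 1)
            (: $text is still a node of the input tree, so the sibling
               axis can be followed from its parent paragraph :)
            let $note :=
                ($text/parent::p/following-sibling::note
                    [starts-with(., concat($n, ' '))])[1]
            return
                if (exists($note)) then (
                    <note n="{$n}">{ substring-after($note, ' ') }</note>,
                    text { substring($piece, 2) }
                )
                else
                    (: No matching note found: leave the marker as it was :)
                    text { concat('XXX', $piece) }
};

(: Hypothetical recursive walk: text nodes have their markers expanded,
   the original trailing notes are dropped, other elements are rebuilt. :)
declare function local:place-notes($node as node()) as item()* {
    typeswitch ($node)
        case text() return local:expand-markers($node)
        case element(note) return ()
        case element() return
            element { node-name($node) } {
                $node/@*,
                for $child in $node/node()
                return local:place-notes($child)
            }
        default return $node
};

(: Made-up sample input in the shape produced by the quoted query :)
let $input :=
    <root>
        <pb/>
        <p>The manor passed XXX1 to his son.</p>
        <note>1 Hypothetical note text.</note>
    </root>
return local:place-notes($input)
```

With the sample input, the XXX1 marker in the paragraph should be replaced by a <note n="1"> element holding the note text, and the trailing <note> should disappear from the end of the page. Picking the first matching following sibling relies on every marker having its note on the same page; if a note is missing, a same-numbered note from a later page would be picked up instead, so a real run would probably need to stop the search at the next <pb/>.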