From: Joe W. <jo...@gm...> - 2011-12-14 20:21:23
Hi Peter,

Great, glad to hear this does the trick. My one other thought, since it
sounds like you're keeping the resulting XHTML snippet in your RDF, is that
you might also consider forcing the markup into the xhtml namespace, just to
ensure you don't inherit some other namespace from the surrounding RDF. Just
add the appropriate xmlns=... namespace declaration to the div in the
$wrap-with-div section.

Joe

On Wed, Dec 14, 2011 at 10:11 AM, Peter Watson <pet...@ke...> wrote:
> Thanks Joe. That's brilliant. Pasted into Sandbox it does just what's
> needed. Information that fills a gap that's difficult to discover. Now I
> aim to incorporate this into the query that pulls out information relating
> to any specified book and eventually put it into a type switch routine that
> will edit the underlying Zotero .rdf file before it is saved into eXist.
>
> Best wishes
>
> Peter
>
>
> On 13/12/2011 22:03, Joe Wicentowski wrote:
>>
>> Hi Peter,
>>
>> First, for everyone's benefit, here's a link to the documentation for the
>> function Peter mentioned, util:parse() --
>> http://demo.exist-db.org/functions/util/parse. This function is very useful
>> for taking escaped or garbled strings of HTML (such as the stuff in Zotero
>> fields) and having eXist turn it into valid XML.
>>
>> > This is it. The would-be <p>s come from the way the returns are
>> > treated, and I added markup for a <per>son!
>>
>> Great, that's very helpful and illuminates why you might be encountering
>> some problems. If it were just a matter of a string like
>>
>> &lt;p&gt;page 40 ditto&lt;/p&gt;
>>
>> needing to be unescaped, then yes, util:parse() would do the trick. But the
>> text you have needs a little extra work to be parsed:
>>
>> 1. util:parse() expects to be fed text which, once parsed, will contain a
>> single root element. Since your snippet of escaped HTML doesn't contain a
>> root element (e.g., <div>), you need to prepend and append a <div> and
>> </div> to your string, so that the result will come out with a nice root
>> element.
>>
>> 2. There's some doubly-escaped text, e.g., &amp;nbsp;. This is an escaped
>> version of &nbsp;, which itself is the entity for a non-breaking space. If
>> you run util:parse() on this, it will be thrown off by &amp;nbsp;. Even if
>> you replace every instance of &amp;nbsp; with &nbsp;, util:parse() doesn't
>> know how to treat this entity. My suggestion is to pre-process the string,
>> and replace instances of &amp;nbsp; with &#160;, which is the pure
>> (numeric) version of the non-breaking-space entity.
>>
>> So, taking that into account, here's the script that will produce the
>> desired results:
>>
>>
>> xquery version "1.0";
>>
>> declare namespace rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#";
>>
>> let $rdf-element :=
>>     <rdf:value>&lt;h6&gt;1256 to 1272&lt;/h6&gt;
>> &lt;p&gt;&amp;nbsp;&lt;/p&gt;
>> &lt;p&gt;page 32&amp;nbsp; roll 1218a 1272 John the Clerk against William de
>> Grendon regarding the warrant of 8 acres&lt;/p&gt;
>> &lt;p&gt;page 40 ditto&lt;/p&gt;
>> &lt;p&gt;&amp;nbsp;p108 roll 144 1269 Claim by Margery who was the wife of
>> Henry of Ashbourne&amp;nbsp; re dower from various individuals including
>> Stephen of Ireton the third part of an acre of meadow in Snelston, and ?(
>> William de ) Hulton in Clifton .&amp;nbsp; William de hylton gives up dower
>> amongst others.&amp;nbsp; Makes one wonder whether &amp;lt;per
>> corresp='#williamofhultonclerk' role='m' &amp;gt;William de
>> Hulton&amp;lt;/per&amp;gt; and William the clerk are the same person.&lt;/p&gt;
>> &lt;p&gt;page 109 ditto Roger is the son of Henry of Ashbourne and is in the
>> custody of Margaret countess Derby&amp;nbsp; and lands in the custody of
>> Edmund king's son&lt;/p&gt;
>> &lt;p&gt;page 9&amp;nbsp; and 10 1258 Information re Henry of
>> Ashbourne.&amp;nbsp; Holds a court. Case of villeinage.&amp;nbsp; Confirms
>> Henry heir of&amp;nbsp; Robert of Ashbourne.&amp;nbsp; Stephen of Ireton one
>> of the pledges for Henry.&lt;/p&gt;
>> </rdf:value>
>> (: the escaped markup arrives as a single text node :)
>> let $rdf-text := $rdf-element/text()
>> (: swap the undeclared nbsp entity for its numeric character reference :)
>> let $fix-nbsp := replace($rdf-text, '&amp;nbsp;', '&#160;')
>> (: util:parse() needs a single root element :)
>> let $wrap-with-div := concat('<div>', $fix-nbsp, '</div>')
>> return
>>     util:parse($wrap-with-div)
>>
>>
>> This will yield the following results - which I think is what you want:
>>
>> <div>
>>     <h6>1256 to 1272</h6>
>>     <p> </p>
>>     <p>page 32 roll 1218a 1272 John the Clerk against William de Grendon
>> regarding the warrant of 8 acres</p>
>>     <p>page 40 ditto</p>
>>     <p> p108 roll 144 1269 Claim by Margery who was the wife of Henry of
>> Ashbourne re dower from various individuals including Stephen of Ireton the
>> third part of an acre of meadow in Snelston, and ?( William de ) Hulton in
>> Clifton . William de hylton gives up dower amongst others. Makes one
>> wonder whether &lt;per corresp='#williamofhultonclerk' role='m' &gt;William de
>> Hulton&lt;/per&gt; and William the clerk are the same person.</p>
>>     <p>page 109 ditto Roger is the son of Henry of Ashbourne and is in the
>> custody of Margaret countess Derby and lands in the custody of Edmund
>> king's son</p>
>>     <p>page 9 and 10 1258 Information re Henry of Ashbourne. Holds a court.
>> Case of villeinage. Confirms Henry heir of Robert of Ashbourne. Stephen
>> of Ireton one of the pledges for Henry.</p>
>> </div>
>>
>>
>> Cheers,
>> Joe
>
>
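
The namespace suggestion at the top of this thread could look roughly like the sketch below. It assumes that the elided xmlns=... refers to the standard XHTML namespace URI (http://www.w3.org/1999/xhtml) and that util:parse() is available as in the script quoted above. The short $fix-nbsp string is only a stand-in for the pre-processed Zotero text, not data from the thread.

```xquery
xquery version "1.0";

(: Stand-in for the pre-processed string built in the quoted script (assumption). :)
let $fix-nbsp := '<h6>1256 to 1272</h6><p>page 40 ditto</p>'

(: Declare a default namespace on the wrapping div so that the parsed
   markup does not pick up a namespace from the surrounding RDF.
   The URI is assumed to be the XHTML namespace. :)
let $wrap-with-div :=
    concat('<div xmlns="http://www.w3.org/1999/xhtml">', $fix-nbsp, '</div>')
return
    util:parse($wrap-with-div)
```

Because the declaration is a default namespace on the root div, the child elements produced by the parse (h6, p) should fall into the same namespace without any further changes to the string.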