From: Peter W. <pet...@ke...> - 2011-12-13 19:50:52
|
Hi, I use Zotero to collect bibliographical data and the .rdf export to transfer the data into my eXist database. This works fine generally, but in trying to transfer notes I find that any markup within the notes is escaped. I can handle this by using find and replace to edit the file, but I am trying to get this to work on a less time-consuming basis. Can anyone help me by suggesting a method for achieving this? I understand that util:parse may be relevant in this process, but as yet I haven't managed to figure out a way of using it in an XQuery to clean up the file. (I guess shifting to the TEI export might prove better in the long run but at the moment I cannot test this as it does not seem to function for me, using the Zotero 3.02 beta.) Thanks Peter |
From: Joe W. <jo...@gm...> - 2011-12-13 21:11:11
|
Hi Peter, > I use Zotero to collect bibliographical data and the .rdf export to > transfer the data into my eXist database. This works fine generally, > but in trying to transfer notes I find that any markup within the notes > is escaped. Could you post a small example of a note with escaped text that you're trying to clean up? Joe |
From: Peter W. <plw...@bl...> - 2011-12-13 22:00:57
|
Hi Joe

This is it. The would-be <p>s come from the way the returns are treated, and I added markup for a <per>son!

<rdf:value>&lt;h6&gt;1256 to 1272&lt;/h6&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;page 32&amp;nbsp; roll 1218a 1272 John the Clerk against William de Grendon regarding the warrant of 8 acres&lt;/p&gt;
&lt;p&gt;page 40 ditto&lt;/p&gt;
&lt;p&gt;&amp;nbsp;p108 roll 144 1269 Claim by Margery who was the wife of Henry of Ashbourne&amp;nbsp; re dower from various individuals including Stephen of Ireton the third part of an acre of meadow in Snelston, and ?( William de ) Hulton in Clifton .&amp;nbsp; William de hylton gives up dower amongst others.&amp;nbsp; Makes one wonder whether &amp;lt;per corresp='#williamofhultonclerk' role='m' &amp;gt;William de Hulton&amp;lt;/per&amp;gt; and William the clerk are the same person.&lt;/p&gt;
&lt;p&gt;page 109 ditto Roger is the son of Henry of Ashbourne and is in the custody of Margaret countess Derby&amp;nbsp; and lands in the custody of Edmund king's son&lt;/p&gt;
&lt;p&gt;page 9&amp;nbsp; and 10 1258 Information re Henry of Ashbourne.&amp;nbsp; Holds a court. Case of villeinage.&amp;nbsp; Confirms Henry heir of&amp;nbsp; Robert of Ashbourne.&amp;nbsp; Stephen of Ireton one of the pledges for Henry.&lt;/p&gt;
</rdf:value>
</bib:Memo>

Peter |
From: Joe W. <jo...@gm...> - 2011-12-20 03:14:50
|
Hi Peter, I happened to mention this Zotero export "problem" (double-escaped markup) in a tweet, and I got a reply (presumably from a Zotero developer or enthusiast) asking how the exported text was created. Could you share your steps for arriving at this text? Here's the link to the question I got: https://twitter.com/ajlyon/status/148852215342313472 Joe |
From: Joe W. <jo...@gm...> - 2011-12-13 22:03:32
|
Hi Peter,

First, for everyone's benefit, here's a link to the documentation for the function Peter mentioned, util:parse() -- http://demo.exist-db.org/functions/util/parse. This function is very useful for taking escaped or garbled strings of HTML (such as the stuff in Zotero fields) and having eXist turn it into valid XML.

> This is it. The would be <p> s come from the way the returns are treated
> and I added markup for a <per>son!

Great, that's very helpful and illuminates why you might be encountering some problems. If it were just a matter of a string like

&lt;p&gt;page 40 ditto&lt;/p&gt;

needing to be unescaped, then yes, util:parse() would do the trick. But the text you have needs a little extra work to be parsed:

1. util:parse() expects to be fed text which, once parsed, will contain a single root element. Since your snippet of escaped HTML doesn't contain a root element (e.g., <div>), you need to prepend and append a <div> and </div> to your string, so that the result will come out with a nice root element.

2. There's some doubly-escaped text, e.g., &amp;nbsp;. This is an escaped version of &nbsp;, which itself is the entity for non-breaking-space. If you run util:parse() on this, it will be thrown off by &amp;nbsp;. Even if you replace every instance of &amp;nbsp; with &nbsp;, util:parse() doesn't know how to treat this entity, because &nbsp; is defined by HTML and not by XML. My suggestion is to pre-process the string, and replace instances of &amp;nbsp; with &#160;, which is the pure numeric version of the non-breaking-space entity.

So, taking that into account, here's the script that will produce the desired results:

xquery version "1.0";

declare namespace rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#";

(: The note as it arrives in the RDF export: one text node of escaped HTML :)
let $rdf-element :=
<rdf:value>&lt;h6&gt;1256 to 1272&lt;/h6&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;page 32&amp;nbsp; roll 1218a 1272 John the Clerk against William de Grendon regarding the warrant of 8 acres&lt;/p&gt;
&lt;p&gt;page 40 ditto&lt;/p&gt;
&lt;p&gt;&amp;nbsp;p108 roll 144 1269 Claim by Margery who was the wife of Henry of Ashbourne&amp;nbsp; re dower from various individuals including Stephen of Ireton the third part of an acre of meadow in Snelston, and ?( William de ) Hulton in Clifton .&amp;nbsp; William de hylton gives up dower amongst others.&amp;nbsp; Makes one wonder whether &amp;lt;per corresp='#williamofhultonclerk' role='m' &amp;gt;William de Hulton&amp;lt;/per&amp;gt; and William the clerk are the same person.&lt;/p&gt;
&lt;p&gt;page 109 ditto Roger is the son of Henry of Ashbourne and is in the custody of Margaret countess Derby&amp;nbsp; and lands in the custody of Edmund king's son&lt;/p&gt;
&lt;p&gt;page 9&amp;nbsp; and 10 1258 Information re Henry of Ashbourne.&amp;nbsp; Holds a court. Case of villeinage.&amp;nbsp; Confirms Henry heir of&amp;nbsp; Robert of Ashbourne.&amp;nbsp; Stephen of Ireton one of the pledges for Henry.&lt;/p&gt;
</rdf:value>
(: The element's text is now a string of HTML markup :)
let $rdf-text := $rdf-element/text()
(: Swap the HTML-only entity for its numeric equivalent, which XML accepts :)
let $fix-nbsp := replace($rdf-text, '&amp;nbsp;', '&amp;#160;')
(: Supply the single root element that util:parse() requires :)
let $wrap-with-div := concat('<div>', $fix-nbsp, '</div>')
return
    util:parse($wrap-with-div)

This will yield the following results - which I think is what you want:

<div>
<h6>1256 to 1272</h6>
<p> </p>
<p>page 32 roll 1218a 1272 John the Clerk against William de Grendon regarding the warrant of 8 acres</p>
<p>page 40 ditto</p>
<p> p108 roll 144 1269 Claim by Margery who was the wife of Henry of Ashbourne re dower from various individuals including Stephen of Ireton the third part of an acre of meadow in Snelston, and ?( William de ) Hulton in Clifton . William de hylton gives up dower amongst others. Makes one wonder whether &lt;per corresp='#williamofhultonclerk' role='m' &gt;William de Hulton&lt;/per&gt; and William the clerk are the same person.</p>
<p>page 109 ditto Roger is the son of Henry of Ashbourne and is in the custody of Margaret countess Derby and lands in the custody of Edmund king's son</p>
<p>page 9 and 10 1258 Information re Henry of Ashbourne. Holds a court. Case of villeinage. Confirms Henry heir of Robert of Ashbourne. Stephen of Ireton one of the pledges for Henry.</p>
</div>

Cheers,
Joe |
From: Peter W. <pet...@ke...> - 2011-12-14 15:43:14
|
Thanks Joe. That's brilliant. Pasted into Sandbox it does just what's needed. Information that fills a gap that's difficult to discover. Now I aim to incorporate this into the query that pulls out information relating to any specified book and eventually put it into a typeswitch routine that will edit the underlying Zotero .rdf file before it is saved into eXist. Best wishes Peter |
From: Joe W. <jo...@gm...> - 2011-12-14 20:21:23
|
Hi Peter, Great, glad to hear this does the trick. My one other thought, since it sounds like you're keeping the resulting XHTML snippet in your RDF, is that you might also consider forcing the markup into the xhtml namespace, just to ensure you don't inherit some other namespace from the surrounding RDF. Just add the appropriate xmlns=... namespace declaration to the div in the $wrap-with-div section. Joe |
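A minimal sketch of that change, assuming the note text is escaped only once. The sample string is hypothetical and stands in for the real $rdf-text; the xmlns declaration simply goes inside the opening tag of the wrapper string:

```xquery
xquery version "1.0";

(: Hypothetical one-paragraph note; the literal below evaluates to <p>page 40 ditto</p> :)
let $rdf-text := '&lt;p&gt;page 40 ditto&lt;/p&gt;'
(: Declare the XHTML namespace on the wrapper so the parsed children inherit it :)
let $wrap-with-div :=
    concat('<div xmlns="http://www.w3.org/1999/xhtml">', $rdf-text, '</div>')
return
    util:parse($wrap-with-div)
```

Elements parsed this way are in the XHTML namespace, so any later path expression would need a matching prefix, for example declare namespace xhtml="http://www.w3.org/1999/xhtml"; followed by //xhtml:p.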
From: Peter W. <pet...@ke...> - 2011-12-14 21:17:14
|
Hi Joe

A good thought, because I find that the markup for the inner element <per> reverts to being escaped when part of the XQuery running on eXist. I'm not sure of the syntax for adding the namespace to <div>. What you seem to be suggesting is

let $wrap-with-div := concat('<div xmlns = "http://www.w3.org/1999/xhtml" >', $fix-nbsp, '</div>')

However, doing this makes all other markup, including the <div> wrapper, disappear except for the escaped markup around 'per', so I'm a bit stumped, though the page does load the text. Trying to add the namespace as an attribute of div as part of the typeswitch doesn't seem to do it either: it creates another error and the page fails to load. I've also carefully tried to ensure that the variables are of the correct 'type'.

Peter |
From: Joe W. <jo...@gm...> - 2011-12-15 02:44:38
|
Hi Peter,

> A good thought because I find that the markup for the inner element <per>
> reverts to being escaped when part of the xquery running on eXist. I'm not
> sure of the syntax for adding the namespace to <div>. What you seem to be
> suggesting is
>
> let $wrap-with-div := concat('<div xmlns = "http://www.w3.org/1999/xhtml" >', $fix-nbsp, '</div>')

Yes, that's what I was suggesting. But in my testing the problem with the <per> remaining escaped is the same whether you provide a namespace declaration for the div or not. The reason? We missed two other doubly-escaped entities:

&amp;gt; and &amp;lt;

I think if you account for these in the same way as we did for &amp;nbsp; you'll be all set. The only difference is that unlike &nbsp;, &lt; and &gt; are defined XML entities, so no need to use numeric versions like &#160;.

Cheers,
Joe |
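Putting the pieces together, the clean-up might look like the following sketch. The sample note is a shortened, hypothetical stand-in for the real rdf:value text; only replace() and eXist's util:parse() are used, as in the earlier script, and the variable names $fix-lt and $fix-gt are made up for illustration.

```xquery
xquery version "1.0";

(: Hypothetical shortened note. The literal evaluates to HTML in which the
   <p> tags are already plain markup, while &nbsp; and the <per> tags are
   still escaped one level. :)
let $rdf-text := '&lt;p&gt;page 32&amp;nbsp; roll 1218a &amp;lt;per role=''m''&amp;gt;William de Hulton&amp;lt;/per&amp;gt;&lt;/p&gt;'
(: &nbsp; is not defined in XML, so use the numeric character reference :)
let $fix-nbsp := replace($rdf-text, '&amp;nbsp;', '&amp;#160;')
(: Turn the remaining escaped angle brackets back into markup :)
let $fix-lt := replace($fix-nbsp, '&amp;lt;', '&lt;')
let $fix-gt := replace($fix-lt, '&amp;gt;', '&gt;')
(: util:parse() needs a single root element :)
let $wrap-with-div := concat('<div>', $fix-gt, '</div>')
return
    util:parse($wrap-with-div)
```

One caveat with this blanket replacement: a note that contains a less-than sign typed as ordinary text would also be turned into markup, and the string may then fail to parse.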
From: Peter W. <plw...@bl...> - 2011-12-15 10:34:55
|
Hi Joe Yes, that's fixed it. Thank you very much indeed for your help. A surprisingly steep little learning curve. Best wishes Peter |
From: Joe W. <jo...@gm...> - 2011-12-15 11:44:22
|
Hi Peter, > Yes, that's fixed it. Thank you very much indeed for your help. A > surprisingly steep little learning curve. Great! I hadn't encountered doubly-escaped entities before either. I think Zotero could do a better job at handling text like this. It seems they err on the side of retaining the text you enter into the program, by escaping the text, rather than treating it as XML (with its notions of well formedness and schema compliance). As a result, Zotero makes you go through contortions to get information out. I was looking at a Zotero database that a colleague had exported to RDF for me with attachments, and noticed lots of escaped HTML in the RDF file. Also, while some attachments were simply PDFs, others were saved copies of web pages. The HTML pages hadn't been subjected to any sort of "tidy" routine, so many were not well formed XML. This fits in the pattern we observed above: Zotero is erring on the side of preserving data 'as is', rather than imposing well-formedness on the data. I think the reality is that, in order to get Zotero databases into eXist, some extra processing is going to be necessary. If the TEI/Zotero community is already working on a "TEI" export of Zotero databases, it might be worth seeing what tools they're developing, or joining their effort, so you don't have to deal with the nastier side of Zotero data. Joe |
From: Peter W. <plw...@bl...> - 2011-12-15 16:19:32
|
Helpful comments Joe. Certainly the TEI route is preferable if it covers data requirements as there is no sense in duplication of effort. I've demonstrated to myself how the bibliographical data can be integrated into the rest of my research data both from the Zotero notes and in collecting bibliographical references to the same source, as well as the construction of the bibliography. TEI structure would make implementing and extending this much simpler. As noted elsewhere, at the moment I cannot get access to this on my current setup, but I'm living in hope. Peter |
From: Stefan M. <xm...@st...> - 2011-12-15 12:10:58
|
Peter Watson wrote: > the long run but at the moment I cannot test this as it does not seem to > function for me, using the Zotero 3.02 beta.) I am happy to report that we are making good progress with the TEI translator and have got it included in the 3.0 branch of Zotero. Therefore, it will be available in the stock installation of the Zotero add-on and Zotero standalone. If you are using the FF add-in, you might want to try the development packages from the 3.0 branch at http://www.zotero.org/support/dev_builds . Interested users who are currently using Zotero2 in combination with my tei-zotero-translator add-on need to uninstall the tei-zotero-translator add-on prior to installation. The reason the translator was not working properly was a minor bug in Zotero that affected (i.e. broke) the TEI, RIS and LaTeX exports. I have reported this and it has been fixed in Zotero, so everything should be fine now. In any case I'd be quite keen on receiving feedback regarding the pre-release version included in Zotero 3.0. cheers, Stefan |
From: Peter W. <plw...@bl...> - 2011-12-15 16:07:27
|
Thanks Stefan. I installed the latest version 3.0b r10701 from the development branch of Zotero but still get the same error message when I try to download using the TEI option. I'm certainly keen to go down the TEI route if it preserves the data I need. Best wishes Peter |
From: Stefan M. <xm...@st...> - 2011-12-16 11:46:05
|
On 15/12/11 17:07, Peter Watson wrote: > Thanks Stefan. I installed the latest version 3.0b r10701 That's the version that reports as 3.0b3r10701 in the add-ons manager, right? Then it is the version I am currently using as well. > from the development branch of Zotero but still get the same error > message when I try to download using the TEI option. As I don't get an error message with this version of Zotero and I would really like to fix this, I would be very grateful for some more details on the error you are seeing. Maybe off-list, as I fear that this might be boring and off-topic for the other list members. 1. What steps are you performing? (Are you exporting items, collections, libraries?) 2. What options are set for export? (exportNotes, generateXMLIDs, fullTEIDocument, createCollections?) 3. What error messages do you see in the JavaScript console? (These are _really_ helpful; you can open the JS console via the Tools menu (Error Console) or Ctrl+Shift+J.) 4. If you could share a part of your bibliography that causes problems, an export (preferably Zotero RDF) would be extremely helpful. You could send it directly to me, and of course I would keep the contents confidential. > I'm certainly keen to go down the TEI route if it preserves the data > I need. I'd be very keen to do this too, so let's join forces. kind regards, Stefan |
From: Peter W. <plw...@bl...> - 2011-12-20 09:26:08
|
Hi Joe What I am doing is creating a note in Zotero that includes a small amount of markup to tag names and anything else I want to highlight when the Zotero file is loaded into eXist. Any <p>s are created by Zotero, presumably from the carriage returns I put in to format the Zotero text. I see there is an HTML button on the editor which has a pop-up saying 'remove formatting' but I haven't used that - I'll test to see what happens when I do. I create my Zotero file for eXist using export with the 'Zotero RDF' option with 'Export Notes' ticked. I can then load this file into eXist via oXygen and perform any transformations as I pull information out of it with XQuery. Best wishes Peter > Hi Peter, > > I happened to mention this Zotero export "problem" (double-escaped markup) in a tweet, and I got a reply (presumably from a Zotero developer or enthusiast) asking how the exported text was created. Could you share your steps for arriving at this text? > > Here's the link to the question I got: > > https://twitter.com/ajlyon/status/148852215342313472 > > Joe > > Sent from my iPad > >> Hi Joe >> >> This is it. The would be<p> s come from the way the returns are treated and I added markup for a<per>son! 
>> >> <rdf:value><h6>1256 to 1272</h6> >> <p>&nbsp;</p> >> <p>page 32&nbsp; roll 1218a 1272 John the Clerk against William de Grendon regarding the warrant of 8 acres</p> >> <p>page 40 ditto</p> >> <p>&nbsp;p108 roll 144 1269 Claim by Margery who was the wife of Henry of Ashbourne&nbsp; re dower from various individuals including Stephen of Ireton the third part of an acre of meadow in Snelston, and ?( William de ) Hulton in Clifton .&nbsp; William de hylton gives up dower amongst others.&nbsp; Makes one wonder whether&lt;per corresp='#williamofhultonclerk' role='m'&gt;William de Hulton&lt;/per&gt; and William the clerk are the same person.</p> >> <p>page 109 ditto Roger is the son of Henry of Ashbourne and is in the custody of Margaret countess Derby&nbsp; and lands in the custody of Edmund king's son</p> >> <p>page 9&nbsp; and 10 1258 Information re Henry of Ashbourne.&nbsp; Holds a court. Case of villeinage.&nbsp; Confirms Henry heir of&nbsp; Robert of Ashbourne.&nbsp; Stephen of Ireton one of the pledges for Henry.</p> >> </rdf:value> >> </bib:Memo> >> >> Peter -- Peter Watson |
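For readers following the thread, the clean-up Peter describes (turning the escaped note text back into real markup) can be sketched outside eXist as a pre-processing step. This is only an illustration in Python, not the approach anyone in the thread settled on. The function name `unescape_note` is hypothetical, and it assumes that the note string has already been read out of `rdf:value`, that the hand-typed tags are well-formed, and that no literal `&lt;` or `&gt;` is meant to survive as text.

```python
import xml.etree.ElementTree as ET


def unescape_note(note_text: str) -> ET.Element:
    """Turn the string content of an rdf:value note into an element tree.

    Hypothetical helper: assumes the hand-typed markup is well-formed.
    """
    # &nbsp; is an HTML entity that a plain XML parser does not know,
    # so swap it for the equivalent numeric character reference.
    text = note_text.replace("&nbsp;", "&#160;")
    # Hand-typed tags arrive escaped a second time; restore the brackets.
    text = text.replace("&lt;", "<").replace("&gt;", ">")
    # A note holds several sibling elements, so wrap them in one root.
    return ET.fromstring("<note>" + text + "</note>")


# Shortened version of the note quoted in the thread.
sample = (
    "<h6>1256 to 1272</h6>"
    "<p>Makes one wonder whether "
    "&lt;per corresp='#williamofhultonclerk' role='m' &gt;"
    "William de Hulton&lt;/per&gt; and William the clerk "
    "are the same person.</p>"
)

note = unescape_note(sample)
per = note.find("./p/per")
print(per.get("corresp"), per.text)  # prints "#williamofhultonclerk William de Hulton"
```

Wrapping the fragments in a single `<note>` root is a design choice made here so that a standard XML parser accepts the note; the same unescape-then-parse idea would presumably carry over to an XQuery using util:parse, which is not shown.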