From: Zac M. <mar...@gm...> - 2009-10-29 17:27:37
|
Hi, I was trying to run an XQuery full-text search within a CDATA field, using both the Lucene and the built-in full-text configurations. Apparently it did not work. After searching the mail archive, I found the following thread: http://exist-open.markmail.org/search/?q=cdata+full-text#query:cdata%20full-text+page:1+mid:l5b2k2nmd7j5xpzq+state:results Is that still the case? I understand CDATA was intended for comments. But in the XML files we have to consume, many CDATA fields are used to convey useful information (which would otherwise need to be escaped). I'd like to know of any alternatives that support full-text XQuery within CDATA sections. Thank you Zac |
From: Mike S. <so...@if...> - 2009-10-29 17:48:37
|
I think the thread you referenced dealt with CDATA in a pretty definitive way: basically, indexes are created for elements and attributes: CDATA is neither - it's just a special kind of text, so to speak, so its contents are only indexed as part of any (indexed) element that contains it. Changing that would break other things, so it's not likely to change. It's possible that if you post some of your content and describe the intended query behaviour, that readers of the list would be able to propose a workable solution for you. -Mike |
From: Zac M. <mar...@gm...> - 2009-10-29 18:32:42
|
Hi Mike, Thanks for the response. The following is a snippet from the XML we are consuming. When using XQuery to do a full-text search within the CDATA section, eXist does not return results.

<docset>
<document><![CDATA[
Oct. 29 (Bloomberg) -- The U.S. economy returned to growth in the
third quarter after a yearlong contraction as government incentives
spurred consumers to spend more on homes and cars.

The world’s largest economy expanded at a 3.5 percent pace from July
through September, figures from the Commerce Department showed today
in Washington. Household purchases climbed 3.4 percent, the most in
two years.
]]>
</document>
<docTitle/>
</docset> |
From: Mike S. <so...@if...> - 2009-10-29 18:35:04
|
It looks as if you just need to query the document element; what's the query you're trying that's failing, and what indexes do you have defined? -Mike |
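Mike's point that CDATA is just text belonging to the enclosing element can be checked outside eXist. The following is a minimal sketch using Python's standard-library parser on a shortened copy of the sample Zac posted; it only illustrates how an XML parser exposes CDATA and says nothing about eXist's own indexer.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample posted in this thread.
SAMPLE = """<docset>
  <document><![CDATA[
Oct. 29 (Bloomberg) -- The U.S. economy returned to growth in the
third quarter after a yearlong contraction.
]]>
  </document>
  <docTitle/>
</docset>"""

root = ET.fromstring(SAMPLE)
document = root.find("document")

# The parser returns the CDATA content as ordinary element text;
# there is no separate CDATA node that a query could address.
has_keyword = "economy" in document.text
children = list(document)

print(has_keyword, len(children))  # prints: True 0
```

Since the text belongs to the document element, an index defined on that element is the natural place to look for it, which is why the query is written against the element rather than against the CDATA section.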
From: Zac M. <mar...@gm...> - 2009-10-29 18:47:11
|
In <index> of collection.xconf, we tried:

<fulltext default="none" attributes="no">
<text qname="document"/>
</fulltext>

as well as:

<lucene>
<text qname="document"/>
</lucene>

Neither works. However, if we put another field (which is not CDATA) in the index, XQuery returns the correct results.

The query we used:

//document[ft:query(., '"economy"')]

Thanks |
From: Mike S. <so...@if...> - 2009-10-29 18:52:34
|
OK - now I'm stumped. That does sound like a bug: I would be really surprised if CDATA prevented the indexer from working properly, so I would first check all the dumb things: is there a namespace issue? Are you sure you re-indexed the content after changing the index settings? Etc,etc. Otherwise, if there is a problem as you describe, perhaps someone smarter than me can fix it! Also - I think this is different from the thread you referenced, in which the original poster seemed to be wanting to treat CDATA as some kind of element and find things in it (something like ft:query(//CDATA?,"words"))? Sorry not to be much help here... -Mike |
From: Wolfgang <wol...@ex...> - 2009-10-29 19:47:33
|
> Neither works. However if we put another field (which is not CDATA) in > the index, xquery will return the correct results. I just checked your example. It is indeed a bug. The CDATA is stored but the text is not passed on to the indexing pipeline. I'll fix this tonight. Wolfgang |
From: Dmitriy S. <sha...@gm...> - 2009-10-29 17:56:39
|
Use xsl to transform your xml. -- Cheers, Dmitriy Shabanov |
From: Vyacheslav S. <vya...@gm...> - 2009-10-29 18:01:12
|
Why XSLT? It should just search the same way as without CDATA, I guess. |
From: Wolfgang <wol...@ex...> - 2009-10-29 22:03:38
|
Fixed: http://exist.svn.sourceforge.net/exist/?rev=10290&view=rev Wolfgang |
From: Zac M. <mar...@gm...> - 2009-10-30 21:37:05
|
Hi Wolfgang, Thanks. I downloaded the latest code from SVN trunk and built it. Full-text search now seems to work in CDATA fields. There is one problem though: if namespaces are involved, full-text search does not return any results. For example, in collection.xconf, I have:

<index xmlns:example="http://www.example.com/ns/">
<lucene>
<text qname="example:document"/>
</lucene>

The XQuery statement

declare namespace example="http://www.example.com/ns/";
//example:document[ft:query(., '"keyword"')]

does not return any results. Did I miss anything? Thank you Zac |
From: Wolfgang <wol...@ex...> - 2009-10-30 22:24:18
|
Hi Zac, I just checked again: the index does work properly with namespaces here (this is covered by the test suite anyway, so I would have been surprised). For example, I use the following config:

<collection xmlns="http://exist-db.org/collection-config/1.0">
<index xmlns:atom="http://www.w3.org/2005/Atom" xmlns:html="http://www.w3.org/1999/xhtml">
<lucene>
<text qname="atom:title"/>
<text qname="html:div"/>
</lucene>
</index>
</collection>

I can't see what's wrong in your example though. Wolfgang |
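One way to rule out a namespace mismatch independently of eXist is to check that the URI bound in collection.xconf is character-for-character the URI used in the stored documents. This is only a sketch: the two sample documents below are hypothetical, and the check uses Python's standard-library parser rather than anything from eXist.

```python
import xml.etree.ElementTree as ET

# URI taken from the collection.xconf fragment quoted in this thread.
NS_URI = "http://www.example.com/ns/"

# Hypothetical document: binds a different prefix ("ex") to the same URI.
DOC = """<ex:docset xmlns:ex="http://www.example.com/ns/">
  <ex:document><![CDATA[a keyword inside CDATA]]></ex:document>
</ex:docset>"""

# Hypothetical near miss: the same document with the trailing slash dropped.
NEAR_MISS = DOC.replace("http://www.example.com/ns/", "http://www.example.com/ns")


def count_documents(xml_text: str) -> int:
    """Count 'document' elements that live in NS_URI, whatever their prefix."""
    root = ET.fromstring(xml_text)
    # Matching is on namespace URI plus local name; the prefix is irrelevant.
    return len(root.findall(f".//{{{NS_URI}}}document"))


matching = count_documents(DOC)
mismatched = count_documents(NEAR_MISS)

print(matching, mismatched)  # prints: 1 0
```

The prefix may differ between the config, the query, and the documents; only the URI has to agree, and a difference as small as a trailing slash is enough to make an element invisible to a namespaced qname.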
From: Zac M. <mar...@gm...> - 2009-11-10 21:39:25
|
Hi Wolfgang, It seems the problem is not the namespace but something else, related to re-indexing. When I modify a file and save it back to the collection, full-text search on CDATA works. However, if I re-index the collection, full-text search on CDATA does not return any results. Could it be something related to re-indexing CDATA fields? I was using the latest SVN trunk. Thanks, zac |