Sorry about that ;-)
I made the data available at https://dl.dropbox.com/u/31947664/data.zip
Each document has a root tag like the one below:
<TypeName id="BAS16">
…
</TypeName>
My goal is to retrieve the document URIs (e.g. BAS16.xml) for the complete collection as fast as possible, but a workaround using an index on @id is also fine if that is faster. In that case I can append the .xml extension myself. For me it takes more than 1 minute to retrieve the results. I was expecting this to be faster.
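For the index-based workaround, the client-side step could look roughly like the Python sketch below. It assumes the @id values have already been fetched as plain strings; the function name and the second id are made up for illustration, and duplicates are dropped in case the index returns the same key more than once.

```python
def ids_to_document_uris(ids, extension=".xml"):
    """Drop duplicate ids (keeping first-seen order) and append the extension."""
    unique_ids = dict.fromkeys(ids)  # dict keys keep insertion order
    return [f"{doc_id}{extension}" for doc_id in unique_ids]


if __name__ == "__main__":
    # "BAS16" is the id from the root tag above; "BAS21" is a hypothetical second id.
    print(ids_to_document_uris(["BAS16", "BAS21", "BAS16"]))
```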
Thx in advance for looking into this.
Kind regards,
Robby
From: Олег Борисенко [mailto:al...@so...]
Sent: Monday, January 14, 2013 3:40 PM
To: Robby Pelssers
Cc: sed...@li...
Subject: Re: [Sedna-discussion] poor performance for large collection
Yes, it would be great if you give us the collection, it's much easier to debug with real data :) And we will try to discover the reason as soon as possible.
P.S: Don't forget to send a copy to sed...@li... (it's public) or se...@is... (it's our private mail) please, it's much easier when all the team sees what's going on. Thanks!
Best regards, Borisenko Oleg, Sedna team
On Mon, Jan 14, 2013 at 6:31 PM, Robby Pelssers <Rob...@nx...> wrote:
Hi Oleg,
I ran the following profile statements:
**************************************************************************************************
profile
for $i in index-scan("chemicalcontent_id", "", "GE")/@id return string($i)
which results in
<profile xmlns="http://www.modis.ispras.ru/sedna">
<total-time>14.941</total-time>
</profile>
<prolog xmlns="http://www.modis.ispras.ru/sedna"/>
<query xmlns="http://www.modis.ispras.ru/sedna">
<operation name="PPQueryRoot" time="14.941" calls="1">
<operation name="PPReturn" position="2:5" time="14.910" calls="24250">
<produces>
<variable descriptor="0" name="i"/>
</produces>
<operation name="PPDDO" position="2:54" time="14.678" calls="24250">
<operation name="PPAxisStep" step="attribute::attribute(id)" position="2:54" time="14.486" calls="24250">
<operation name="PPSeqChecker" mode="node" position="2:54" time="1.043" calls="24250">
<operation name="PPIndexScan" index-scan-condition="GE" position="2:11" time="1.040" calls="24250">
<operation name="PPConst" type="xs:string" value="chemicalcontent_id" position="2:22" time="0.000" calls="2"/>
<operation name="PPConst" type="xs:string" value="" position="2:44" time="0.000" calls="2"/>
<operation name="PPConst" type="xs:integer" value="0" position="2:11" time="0.000" calls="0"/>
</operation>
</operation>
</operation>
</operation>
<operation name="PPFnString" position="2:65" time="0.215" calls="48498">
<operation name="PPVariable" descriptor="0" variable-name="i" position="2:72" time="0.008" calls="48498"/>
</operation>
</operation>
</operation>
</query>
**************************************************************************************************
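One way to read the profile above is to compute each operation's own share of the time. The Python sketch below does that on an abridged copy of the output (the zero-cost PPConst children and the `<produces>` element are left out). It assumes that each `time` attribute includes the time of the nested operations, which the numbers suggest but which is not confirmed in this thread. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.modis.ispras.ru/sedna}"

# Abridged copy of the <query> element from the profile output above.
PROFILE_XML = """
<query xmlns="http://www.modis.ispras.ru/sedna">
  <operation name="PPQueryRoot" time="14.941" calls="1">
    <operation name="PPReturn" position="2:5" time="14.910" calls="24250">
      <operation name="PPDDO" position="2:54" time="14.678" calls="24250">
        <operation name="PPAxisStep" step="attribute::attribute(id)" position="2:54" time="14.486" calls="24250">
          <operation name="PPSeqChecker" mode="node" position="2:54" time="1.043" calls="24250">
            <operation name="PPIndexScan" index-scan-condition="GE" position="2:11" time="1.040" calls="24250"/>
          </operation>
        </operation>
      </operation>
      <operation name="PPFnString" position="2:65" time="0.215" calls="48498">
        <operation name="PPVariable" descriptor="0" variable-name="i" position="2:72" time="0.008" calls="48498"/>
      </operation>
    </operation>
  </operation>
</query>
"""


def self_times(query_xml):
    """Return (operation name, self time in seconds) pairs, largest first.

    Assumption: a parent's time includes its children's time, so
    self time = own time minus the sum of the direct children's times.
    """
    root = ET.fromstring(query_xml)
    rows = []
    for op in root.iter(NS + "operation"):
        children = op.findall(NS + "operation")
        own = float(op.get("time")) - sum(float(c.get("time")) for c in children)
        rows.append((op.get("name"), round(own, 3)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, seconds in self_times(PROFILE_XML):
        print(f"{name:14s} {seconds:7.3f}")
```

Under that assumption, the attribute step (PPAxisStep) accounts for roughly 13.4 of the 14.9 seconds, while the index scan itself takes about 1 second.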
I also ran the following:
profile
for $doc in collection("chemicalContent/released") return document-uri($doc)
resulting in
<profile xmlns="http://www.modis.ispras.ru/sedna">
<total-time>1.594</total-time>
</profile>
<prolog xmlns="http://www.modis.ispras.ru/sedna"/>
<query xmlns="http://www.modis.ispras.ru/sedna">
<operation name="PPQueryRoot" time="1.594" calls="1">
<operation name="PPReturn" position="2:5" time="1.568" calls="24250">
<produces>
<variable descriptor="0" name="doc"/>
</produces>
<operation name="PPAbsPath" root="collection(chemicalContent/released)" position="2:13" time="1.247" calls="24250"/>
<operation name="PPFnDocumentURI" position="2:59" time="0.302" calls="48498">
<operation name="PPVariable" descriptor="0" variable-name="doc" position="2:72" time="0.012" calls="48498"/>
</operation>
</operation>
</operation>
</query>
BUT !!!
Serializing and sending that data over the wire takes far longer, well over 1 minute. I know it’s about 24k strings in total, but it still smells fishy to me, to be honest.
Thx upfront… If you want I can actually zip the collection and make it available via dropbox so you can e.g. simulate my issue
Robby
From: Олег Борисенко [mailto:al...@so...]
Sent: Monday, January 14, 2013 2:16 PM
To: Robby Pelssers
Subject: Re: [Sedna-discussion] poor performance for large collection
It's difficult to say anything particular in that case. But we have one more diagnostic query named "profile", look through documentation here: http://www.sedna.org/progguide/ProgGuidesu10.html#x16-650002.7.4.
Could you send us the output please?
Best regards, Borisenko Oleg, Sedna team
On Fri, Jan 11, 2013 at 1:58 PM, Robby Pelssers <Rob...@nx...> wrote:
Someone on the list gave me the following tip:
*********************************************************
for $i in index-scan("chemicalcontent_id", "", "GE")/@id return string($i)
As a result, you will get the keyset of the index, possibly with duplicate keys (if they are present in the index), which can be removed with the distinct-values function.
Here, the blank key to compare with ("") is assumed to be less than any other key in the index.
*********************************************************
But I tried that and it is still not responsive. I suspect Sedna is not using only the index but is still doing a full collection scan. Can someone shed some light on this?
I also looked into the documentation on how to run an explain plan. To my surprise, the output shows nothing about index usage, which is something I would typically expect from an explain plan.
http://www.sedna.org/progguide/ProgGuidesu10.html#x16-640002.7.3
To give an example, I have the following index and XQuery module:
*******************************************************************
create index "package_id"
on fn:collection("packages/released")/Package
by @identifier
as xs:string
*******************************************************************
module namespace packages = "http://www.nxp.com/packages";
declare function packages:getPackage($id as xs:string) as element(Package)? {
index-scan('package_id', $id, 'EQ')
};
*******************************************************************
Next, I tried to explain a function call that uses an index:
*******************************************************************
explain
import module namespace packages = "http://www.nxp.com/packages";
packages:getPackage("SOT669")
*******************************************************************
It shows me the following plan, but no evidence of an index being used.
*******************************************************************
<prolog xmlns="http://www.modis.ispras.ru/sedna"/>
<query xmlns="http://www.modis.ispras.ru/sedna">
<operation name="PPQueryRoot">
<operation name="PPFunCall" id="0" function-name="packages:getPackage" position="3:1">
<operation name="PPConst" type="xs:string" value="SOT669" position="3:21"/>
</operation>
</operation>
</query>
*******************************************************************
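To check a plan for index usage mechanically, one can list the operation names it contains. The Python sketch below does this for the explain output above and looks for PPIndexScan, the operation name that shows up when the index-scan query is profiled directly elsewhere in this thread. The helper names are made up for illustration. One possible reading, not confirmed here, is that explain shows the function call (PPFunCall) without expanding the function body, so the absence of PPIndexScan in this plan does not by itself prove the index is unused.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.modis.ispras.ru/sedna}"

# Copy of the <query> element from the explain output above.
EXPLAIN_XML = """
<query xmlns="http://www.modis.ispras.ru/sedna">
  <operation name="PPQueryRoot">
    <operation name="PPFunCall" id="0" function-name="packages:getPackage" position="3:1">
      <operation name="PPConst" type="xs:string" value="SOT669" position="3:21"/>
    </operation>
  </operation>
</query>
"""


def operation_names(query_xml):
    """Return the names of all operations in the plan, in document order."""
    root = ET.fromstring(query_xml)
    return [op.get("name") for op in root.iter(NS + "operation")]


def uses_index_scan(query_xml):
    """True if the plan contains at least one PPIndexScan operation."""
    return "PPIndexScan" in operation_names(query_xml)


if __name__ == "__main__":
    print(operation_names(EXPLAIN_XML))
    print(uses_index_scan(EXPLAIN_XML))
```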
So any tips on debugging my performance problem are very welcome !!
Thx in advance,
Robby
-----Original Message-----
From: Robby Pelssers [mailto:rob...@nx...]
Sent: Thursday, January 10, 2013 12:09 PM
To: sed...@li...
Subject: [Sedna-discussion] poor performance for large collection
Hi all,
I have a single collection of 24468 documents.
<count>{count(collection("chemicalContent/released")/TypeName)}</count> == 24468
When I run the statement below, it takes very long to execute. The documents themselves are not even that big, varying between 2 KB and 12 KB.
for $i in collection("chemicalContent/released") return document-uri($i)
I also have an index on that collection:
create index "chemicalcontent_id"
on fn:collection("chemicalContent/released")/TypeName
by @id
as xs:string
Is it normal for that statement to take that long to execute (> 1 minute)?
Is there a way to perhaps speed up fetching a list of all @id's for that particular collection?
Thx in advance,
Robby
_______________________________________________
Sedna-discussion mailing list
Sed...@li...
https://lists.sourceforge.net/lists/listinfo/sedna-discussion