Ok.
I did not see Blowfish-cbc but I did see blowfish so I went with that.
It seemed to run a bit faster this time, but at some point I ran into the same exception:
74LV132PW_C.xml
Client did not understand instruction 19, body length = 1446064952.
Exception in thread "Thread-0" java.lang.NullPointerException
Is that amount of data (about 24k URIs) just too big to handle properly? I’m wondering if using an index would even help at all. But my use case is that I receive a full new dump every x days and I have 2 options:
- I remove the complete collection and do a full import again. But that also takes considerable time and meanwhile the users don’t see any data
- I do a diff between existing data (fetch all URIs) and the new files in the zip. I remove any files from Sedna that are no longer in the new dataset and update the other ones. The benefit of this approach is that it has the least user impact.
But fetching all URIs is now the issue I’m trying to solve ;-)
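For illustration, the diff described in the second option can be sketched with plain set operations once both lists of names are available. This is only a minimal sketch: the class and method names are hypothetical, and it assumes the existing URIs and the file names in the zip have already been collected into sets of strings.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class UriDiff {

    /** Documents in the database that are absent from the new dump. */
    static Set<String> toDelete(Set<String> existing, Set<String> incoming) {
        Set<String> result = new TreeSet<>(existing);
        result.removeAll(incoming);
        return result;
    }

    /** Documents in the new dump that are not in the database yet. */
    static Set<String> toAdd(Set<String> existing, Set<String> incoming) {
        Set<String> result = new TreeSet<>(incoming);
        result.removeAll(existing);
        return result;
    }

    /** Documents present on both sides, to be overwritten with the new version. */
    static Set<String> toUpdate(Set<String> existing, Set<String> incoming) {
        Set<String> result = new TreeSet<>(incoming);
        result.retainAll(existing);
        return result;
    }

    public static void main(String[] args) {
        // Sample names taken from the console output in this thread.
        Set<String> existing = new TreeSet<>(Arrays.asList(
                "25860-14Z.xml", "25864-19.xml", "2PD601ASL_DG.xml"));
        Set<String> incoming = new TreeSet<>(Arrays.asList(
                "25864-19.xml", "2PD601ASL_DG.xml", "74LV14PW_C1.xml"));

        System.out.println("delete: " + toDelete(existing, incoming));
        System.out.println("add:    " + toAdd(existing, incoming));
        System.out.println("update: " + toUpdate(existing, incoming));
    }
}
```

With about 24k names the set operations themselves are cheap; the expensive part remains getting the list of existing URIs out of the database.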
Robby
From: Robby Pelssers [mailto:Rob...@nx...]
Sent: Wednesday, January 16, 2013 12:47 PM
To: Олег Борисенко
Cc: sed...@li...; Ivan Shcheklein
Subject: Re: [Sedna-discussion] poor performance for large collection
Thx for the quick reply… I will dive into it immediately.
Robby
From: Олег Борисенко [mailto:al...@so...]
Sent: Wednesday, January 16, 2013 12:42 PM
To: Robby Pelssers
Cc: Charles Foster; sed...@li...; Ivan Shcheklein
Subject: Re: [Sedna-discussion] poor performance for large collection
This error makes a lot of sense but I don't think that it's related to the first letter you wrote.
1. About this error. I'm 100% sure that the reason for this error is data corruption while the client communicates with Sedna. Our protocol doesn't support any checksum algorithms, so if even one message is lost or corrupted, Sedna throws an exception, and that's what you see. This may happen in the following cases:
* OS error while sending the message (very unlikely)
* Transfer error. This may happen if the communication line between the client and Sedna is bad, or if errors were introduced in transit through the tunnel (the most probable case).
Could you tell me more about your connection settings and tunnel? What kind of tunnel do you use? What cipher do you use?
2. About the first letter. If you use xmldb:api for communication with Sedna, it's very possible that your tunnel is the bottleneck. I do not remember which cipher is the default in putty, but the ssh client on *nix systems uses the aes128-ctr algorithm, and it's very slow. Here you can see a chart of cipher benchmarks: http://blog.famzah.net/2010/06/11/openssh-ciphers-performance-benchmark/
When I need to use Sedna via a tunnel, I use blowfish-cbc because it seems to me the fastest and the most reliable at the same time.
Please try it and check the results one more time for your last letter.
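To illustrate the point about checksums above, here is a minimal sketch of what a checksummed message frame could look like. It is not Sedna's protocol: the frame layout (4-byte instruction, 4-byte body length, body, 8-byte CRC-32 trailer) and all names are assumptions made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksummedFrame {

    /** Hypothetical layout: instruction (4) | body length (4) | body | CRC-32 (8). */
    static byte[] encode(int instruction, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(8 + body.length + 8);
        buf.putInt(instruction).putInt(body.length).put(body);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, 8 + body.length); // covers header and body
        buf.putLong(crc.getValue());
        return buf.array();
    }

    /** Returns the body, or throws if the frame was altered in transit. */
    static byte[] decodeBody(byte[] frame) {
        if (frame.length < 16) {
            throw new IllegalStateException("frame too short");
        }
        ByteBuffer buf = ByteBuffer.wrap(frame);
        CRC32 crc = new CRC32();
        crc.update(frame, 0, frame.length - 8);
        if (crc.getValue() != buf.getLong(frame.length - 8)) {
            throw new IllegalStateException("checksum mismatch");
        }
        buf.getInt(); // instruction code, not needed here
        int length = buf.getInt();
        if (length != frame.length - 16) {
            throw new IllegalStateException("implausible body length " + length);
        }
        byte[] body = new byte[length];
        buf.get(body);
        return body;
    }

    public static void main(String[] args) {
        byte[] frame = encode(1, "74LV14PW_C1.xml".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(decodeBody(frame), StandardCharsets.US_ASCII));

        frame[10] ^= 0x01; // simulate a single flipped bit on the wire
        try {
            decodeBody(frame);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a trailer like this, a damaged message is rejected with a clear error instead of being interpreted as an unknown instruction with a huge body length.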
Best regards, Borisenko Oleg, Sedna team
On Wed, Jan 16, 2013 at 2:58 PM, Robby Pelssers <Rob...@nx...> wrote:
I decided to write a little unit test omitting any unnecessary code that might result in a performance penalty.
A little bit about my setup:
I use a putty tunnel to connect to a remote sedna instance.
Java Unit test:
package com.nxp.spider2.xmldb.profiling;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.*;
import org.xmldb.api.modules.XQueryService;
import java.util.Date;

public class ChemicalContentTest {

    private String databaseURI = "xmldb:sedna://localhost/nxp";
    private String databaseUsername = "SYSTEM";
    private String databasePassword = "MANAGER";

    @Before
    public void setup() {
    }

    @After
    public void tearDown() {
    }

    @Test
    public void getChemicalContentIds() throws XMLDBException {
        Date start = new Date();
        registerXMLDBDriver();
        Collection chemicalcontentCollection = DatabaseManager.getCollection(
            databaseURI + "/chemicalContent/released", databaseUsername, databasePassword);
        XQueryService queryService =
            (XQueryService)chemicalcontentCollection.getService("XQueryService","1.0");
        ResourceSet resourceSet =
            queryService.query("for $doc in collection(\"chemicalContent/released\") return document-uri($doc)");
        System.out.println("The results were as follows:");
        System.out.println("----------------------------");
        ResourceIterator iterator = resourceSet.getIterator();
        while(iterator.hasMoreResources())
        {
            Resource resource = iterator.nextResource();
            System.out.println(resource.getContent());
        }
        System.out.println("----------------------------");
        Date end = new Date();
        long duration = end.getTime() - start.getTime();
        System.out.println("It took " + duration / 1000 + " seconds");
    }

    public void registerXMLDBDriver() throws XMLDBException
    {
        try
        {
            Database dbDriver =
                (Database)Class.forName(
                    "net.cfoster.sedna.DatabaseImpl").newInstance();
            DatabaseManager.registerDatabase(dbDriver);
        }
        catch(ClassNotFoundException e) {
            System.err.println("ClassNotFoundException: "+
                e.getMessage());
        }
        catch(InstantiationException e) {
            System.err.println("InstantiationException: "+
                e.getMessage());
        }
        catch(IllegalAccessException e) {
            System.err.println("IllegalAccessException: "+
                e.getMessage());
        }
    }
}
Output console:
----------------------------------------------------------
Sedna XML:DB API Client started, Version 1.2.5 26/Oct/12
Copyright (C) 2007 Charles Foster, www.cfoster.net.
----------------------------------------------------------
The results were as follows:
----------------------------
25860-14Z.xml
25864-19.xml
2PD601ASL_DG.xml
….
74LV14PW_C1.xml
Client did not understand instruction 23, body length = 1446065232.
Exception in thread "Thread-0" java.lang.NullPointerException
at net.cfoster.sedna.xmldb.o.a(Unknown Source)
at net.cfoster.sedna.xmldb.g.c(Unknown Source)
at net.cfoster.sedna.xmldb.g.e(Unknown Source)
at net.cfoster.sedna.xmldb.h.a(Unknown Source)
at net.cfoster.sedna.xmldb.b.run(Unknown Source)
at java.lang.Thread.run(Thread.java:662)
It kept running for at least a minute before it ran into this NullPointerException.
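One observation about the two error messages, offered as a sketch under an assumption: if the reported body length is a 4-byte big-endian integer (the thread does not confirm the wire format), then both values decode to printable ASCII that looks like a fragment of a document name rather than a real length. The class name below is made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BodyLengthDecoder {

    /** Renders a 32-bit value as the four ASCII characters of its big-endian bytes. */
    static String asAscii(int value) {
        byte[] bytes = ByteBuffer.allocate(4).putInt(value).array(); // big-endian by default
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "Client did not understand instruction 19, body length = 1446064952."
        System.out.println(asAscii(1446064952)); // prints "V138"
        // "Client did not understand instruction 23, body length = 1446065232."
        System.out.println(asAscii(1446065232)); // prints "V14P"
    }
}
```

"V14P" occurs in 74LV14PW_C1.xml, the last name printed before the second error. That would be consistent with the client reading result data where it expected a message header, i.e. the reader being out of step with the message boundaries. It does not by itself show whether the tunnel or something else caused that.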
Cheers,
Robby
From: Robby Pelssers [mailto:Rob...@nx...]
Sent: Tuesday, January 15, 2013 6:07 PM
To: Charles Foster
Cc: sed...@li...; Ivan Shcheklein
Subject: Re: [Sedna-discussion] poor performance for large collection
Sorry…
My mistake… I really don’t know why I always mix up those two. For this particular test it looks like we use the sedna-xmldb-api. I played with XQJ and Apache Cocoon 3 in the past, hence my confusion.
Robby
From: Charles Foster [mailto:ch...@cf...]
Sent: Tuesday, January 15, 2013 5:54 PM
To: Robby Pelssers
Cc: Konstantin Abakumov; sed...@li...; Ivan Shcheklein
Subject: Re: [Sedna-discussion] poor performance for large collection
Do you have some sample XQJ API code that I could see Robby?
Regards,
Charles
On 15 Jan 2013, at 15:07, Robby Pelssers <Rob...@nx...> wrote:
Thx for the investigation. We indeed use the XQJ API. I guess I will need to check out the packaged Java API.
Robby
From: Konstantin Abakumov [mailto:rus...@gm...]
Sent: Tuesday, January 15, 2013 3:16 PM
To: Robby Pelssers
Cc: sed...@li...; Ivan Shcheklein; Oleg Borisenko
Subject: Re: [Sedna-discussion] poor performance for large collection
Hi again!
I suppose you are using the Sedna XQJ API, is that right?
I've compared three drivers on your queries:
1. Java API, packaged with Sedna
2. Sedna XML:DB API
3. Sedna XQJ API
(2 and 3 are both from Charles Foster)
It took only a few seconds to execute the queries using 1 and 2, but the XQJ API took about half a minute on each query, so it seems it could be the bottleneck in your case.
2013/1/15 Robby Pelssers <Rob...@nx...>
Hi,
We use the driver from Charles Foster. But a test using the Sedna Database Administrator also freezes temporarily. It is indeed a remote host.
Robby
From: Konstantin Abakumov [mailto:rus...@gm...]
Sent: Tuesday, January 15, 2013 1:49 AM
To: Robby Pelssers
Cc: Олег Борисенко; sed...@li...
Subject: Re: [Sedna-discussion] poor performance for large collection
Hello again!
Sorry for the late reply. And thank you for sending us the data!
Both of the queries you profiled in your previous letters executed fast on my machine:
for $i in index-scan("chemicalcontent_id", "", "GE")/@id return string($i)
and
for $doc in collection("chemicalContent/released") return document-uri($doc)
It took less than a second to execute each of them. I executed the queries locally through the built-in terminal se_term.
You noticed that:
Serializing and sending that data over the wire… takes like forever.. >> 1 minute.
It seems the slowdown occurs while connecting to Sedna and sending the query results. Is the Sedna server located on a remote host? Which driver do you use?
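One way to tell query execution apart from result transfer is to time the two phases separately. The sketch below is generic and uses hypothetical names. It assumes the driver's result can be exposed as a java.util.Iterator (an XML:DB ResourceIterator would need a small adapter), and whether a given driver fetches results eagerly or lazily is not something this thread states.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class PhaseTimer {

    /** Elapsed milliseconds for each phase plus the number of items received. */
    static final class Timing {
        final long executeMillis;
        final long fetchMillis;
        final int items;

        Timing(long executeMillis, long fetchMillis, int items) {
            this.executeMillis = executeMillis;
            this.fetchMillis = fetchMillis;
            this.items = items;
        }

        @Override
        public String toString() {
            return "execute=" + executeMillis + "ms, fetch=" + fetchMillis
                    + "ms, items=" + items;
        }
    }

    /** Times obtaining the iterator and draining it as two separate phases. */
    static <T> Timing measure(Supplier<Iterator<T>> query) {
        long t0 = System.nanoTime();
        Iterator<T> results = query.get();   // phase 1: run the query
        long t1 = System.nanoTime();
        int count = 0;
        while (results.hasNext()) {          // phase 2: pull every result
            results.next();
            count++;
        }
        long t2 = System.nanoTime();
        return new Timing((t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, count);
    }

    public static void main(String[] args) {
        // Stand-in for a real query; replace with a call into the driver.
        Timing timing = measure(() ->
                Arrays.asList("25860-14Z.xml", "25864-19.xml").iterator());
        System.out.println(timing);
    }
}
```

If most of the time lands in the second phase on the remote setup but not locally, that points at the connection or the driver rather than at the query itself.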
--
I've run some tests on generated data. As expected, the more the schemas of the individual documents differ from each other, the more performance degrades. But for similar or only slightly different documents (which is your case), Sedna performs satisfactorily fast. I hope we will resolve your performance issue.
Best regards,
Konstantin Abakumov
------------------------------------------------------------------------------
Master SQL Server Development, Administration, T-SQL, SSAS, SSIS, SSRS
and more. Get SQL Server skills now (including 2012) with LearnDevNow -
200+ hours of step-by-step video tutorials by Microsoft MVPs and experts.
SALE $99.99 this month only - learn more at:
http://p.sf.net/sfu/learnmore_122512
_______________________________________________
Sedna-discussion mailing list
Sed...@li...
https://lists.sourceforge.net/lists/listinfo/sedna-discussion
--
Best regards,
Konstantin Abakumov.