From: Michael P. <mic...@gm...> - 2024-10-31 07:44:01

Hi Tino,

since 6.3 is not binary compatible with 4.1 (see
https://exist-db.org/exist/apps/doc/incompatibilities), you need to follow
https://exist-db.org/exist/apps/doc/upgrading.xml#non-binary-compatible-upgrades.
(But please be aware that I'm neither an eXist-db core developer nor an
active community member.)

All best,
Michael

On Wed, 30 Oct 2024 at 18:06, Dai, Tino W <td...@lo...> wrote:
> Hi,
>
> I have a web app that I have been asked to bring up to 6.3.0. Currently,
> the web app is running on 4.1 and is a set of files and directories
> located in /usr/local/eXist/webapp. Can I just migrate this over to 6.3.0
> as is, or do I need to package it up into a .xar file?
>
> Thanks,
> Tino

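For a migration like the one discussed above, it can help to compare the two
servers before and after. A minimal XQuery sketch, assuming only the standard
system and repo modules that eXist-db predeclares; the wrapper element names
are invented for illustration:

    xquery version "3.1";

    (: Sketch: report the server version and the installed EXPath packages,
       so the old and new instances can be compared side by side. :)
    <upgrade-check>
        <version>{ system:get-version() }</version>
        <packages>{
            for $pkg in repo:list()
            order by $pkg
            return <package uri="{ $pkg }"/>
        }</packages>
    </upgrade-check>

Running this once against each instance makes it easy to spot packages that
still need to be reinstalled after a non-binary-compatible upgrade.
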
From: Adam R. <ad...@ex...> - 2024-10-30 21:04:18

You don't need to package it into a .xar, although that wouldn't hurt.
However, in more modern versions of eXist-db all content, including XQuery
files, needs to be stored inside the database itself.

On Wed, 30 Oct 2024, 18:07 Dai, Tino W, <td...@lo...> wrote:
> Hi,
>
> I have a web app that I have been asked to bring up to 6.3.0. Currently,
> the web app is running on 4.1 and is a set of files and directories
> located in /usr/local/eXist/webapp. Can I just migrate this over to 6.3.0
> as is, or do I need to package it up into a .xar file?
>
> Thanks,
> Tino

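Moving filesystem content into the database, as described above, can be done
with the standard xmldb module. A minimal sketch; the collection name and the
module source here are hypothetical:

    xquery version "3.1";

    (: Sketch: create an application collection and store an XQuery file
       inside the database instead of on the filesystem. :)
    let $collection := xmldb:create-collection("/db/apps", "my-app")
    let $source := 'xquery version "3.1"; <ok/>'
    return
        xmldb:store($collection, "controller.xq", $source, "application/xquery")

xmldb:store() returns the path of the stored resource, so the result can be
fed directly into a follow-up query or logged during a bulk migration.
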
From: Dai, T. W <td...@lo...> - 2024-10-30 17:10:59

Hi,

It seems that the documentation is wrong:
https://exist-db.org/exist/apps/doc/development-starter. There is no
Application/New Application drop-down in the eXide application. It seems
that the New Application option was removed back in 3.x, according to this
post:
https://stackoverflow.com/questions/70184724/create-new-web-app-in-exist-db-5-3-0-and-import-old-from-4-3-1

Is there a way to remove that section from the site?

-Tino

From: Dai, T. W <td...@lo...> - 2024-10-30 17:00:56

Hi,

I have a web app that I have been asked to bring up to 6.3.0. Currently,
the web app is running on 4.1 and is a set of files and directories located
in /usr/local/eXist/webapp. Can I just migrate this over to 6.3.0 as is, or
do I need to package it up into a .xar file?

Thanks,
Tino

From: Duncan P. <dun...@gm...> - 2024-10-30 15:29:32

In an abstract for his presentation at the upcoming Declarative Amsterdam
conference, Adam Retter announced his decision to fork eXist-db and launch
a competing project. This action is commonplace in open source. However, in
justifying his decision, Adam maligned the eXist community and project in a
strikingly unprofessional manner.

On October 7 and 14, 2024, during the weekly Community Call, eXist-db's
core developers and active community members discussed this matter and
unanimously concluded that we no longer have the necessary trust in Adam to
speak or act in the best interests of the eXist community. We have
therefore revoked his status as a core developer and his administrative
privileges from the eXist-db GitHub organization and related community
resources.

We regret having to take these measures, but we believe that they are vital
to restoring the collegial and welcoming atmosphere that has marked the
eXist-db community since its founding by Wolfgang Meier in 2001. We thank
Adam for all of his work on eXist-db and wish him well with his new
project.

== Signatories ==

Dannes Wessels, Duncan Paterson, Joseph Wicentowski, Joern Turner, Juri
Leino, Lars Windauer, Magdalena Turska, Olaf Schreck, Patrick Reinhart,
Wolfgang Meier (founder, eXist-db)

From: Adam R. <ad...@ex...> - 2024-10-28 11:43:39

Hi Nick,

Yes, sorry about that. The eXist-db blog has been broken for what feels
like forever, which means things get scrambled when new articles are
created. I am afraid that I have no control over that. I think I saw in
Slack that Joe Wicentowski kindly offered to fix that.

Thanks, Adam.

On Sun, 27 Oct 2024 at 01:01, Nick Sincaglia <nsi...@nu...> wrote:
> I just wanted to make you aware that the eXist-db 6.3.0 release notes do
> not appear to be published yet.
>
> https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb630
>
> Nick
>
> On 10/26/24 9:08 AM, Craig Berry via Exist-open wrote:
> >> On Oct 26, 2024, at 6:21 AM, Adam Retter <ad...@ex...> wrote:
> >>
> >> We are very happy to announce releases of eXist-db 4, 5, and 6.
> >>
> >> eXist-db 4.11.2 and 5.5.2 are minor releases that fix just a few
> >> small bugs - see:
> >> * https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb4112
> >> * https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb552
> >>
> >> eXist-db 6.3.0 is a feature release and update, and as such should be
> >> 100% API and storage compatible with eXist-db 6.2.0. The full release
> >> notes are available at:
> >> https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb630
> >
> > Thanks to Adam and everyone else involved for all of those fixes and
> > improvements.
> >
> > For anyone on macOS Sequoia (like I am since yesterday), it's now a
> > bit harder to open an application that isn't notarized. You have to
> > attempt to open the app once, dismiss the dialog, go into Privacy and
> > Security in settings, scroll down to where it says '"eXist-db" was
> > blocked to protect your Mac' and click "Open Anyway." You can no
> > longer right-click or control-click to get around lack of
> > notarization.
> >
> > Note that the app *is* signed, just apparently not notarized by Apple.
> > You can check the signature with:
> >
> >   % codesign -dv --verbose=4 /Applications/eXist-db.app

--
Adam Retter

eXist Core Developer
{ United Kingdom }
ad...@ex...

From: Nick S. <nsi...@nu...> - 2024-10-26 23:01:19

I just wanted to make you aware that the eXist-db 6.3.0 release notes do
not appear to be published yet.

https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb630

Nick

On 10/26/24 9:08 AM, Craig Berry via Exist-open wrote:
>> On Oct 26, 2024, at 6:21 AM, Adam Retter <ad...@ex...> wrote:
>>
>> We are very happy to announce releases of eXist-db 4, 5, and 6.
>>
>> eXist-db 4.11.2 and 5.5.2 are minor releases that fix just a few small
>> bugs - see:
>> * https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb4112
>> * https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb552
>>
>> eXist-db 6.3.0 is a feature release and update, and as such should be
>> 100% API and storage compatible with eXist-db 6.2.0. The full release
>> notes are available at:
>> https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb630
>
> Thanks to Adam and everyone else involved for all of those fixes and
> improvements.
>
> For anyone on macOS Sequoia (like I am since yesterday), it's now a bit
> harder to open an application that isn't notarized. You have to attempt
> to open the app once, dismiss the dialog, go into Privacy and Security in
> settings, scroll down to where it says '"eXist-db" was blocked to protect
> your Mac' and click "Open Anyway." You can no longer right-click or
> control-click to get around lack of notarization.
>
> Note that the app *is* signed, just apparently not notarized by Apple.
> You can check the signature with:
>
>   % codesign -dv --verbose=4 /Applications/eXist-db.app

--
Nick Sincaglia
President/Founder
NueMeta, LLC
Digital Media & Technology
Phone: +1-630-303-7035
nsi...@nu...
http://www.nuemeta.com
Skype: nsincaglia

From: Craig B. <cra...@ma...> - 2024-10-26 14:28:21

> On Oct 26, 2024, at 6:21 AM, Adam Retter <ad...@ex...> wrote:
>
> We are very happy to announce releases of eXist-db 4, 5, and 6.
>
> eXist-db 4.11.2 and 5.5.2 are minor releases that fix just a few small
> bugs - see:
> * https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb4112
> * https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb552
>
> eXist-db 6.3.0 is a feature release and update, and as such should be
> 100% API and storage compatible with eXist-db 6.2.0. The full release
> notes are available at:
> https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb630

Thanks to Adam and everyone else involved for all of those fixes and
improvements.

For anyone on macOS Sequoia (like I am since yesterday), it's now a bit
harder to open an application that isn't notarized. You have to attempt to
open the app once, dismiss the dialog, go into Privacy and Security in
settings, scroll down to where it says '"eXist-db" was blocked to protect
your Mac' and click "Open Anyway." You can no longer right-click or
control-click to get around lack of notarization.

Note that the app *is* signed, just apparently not notarized by Apple. You
can check the signature with:

    % codesign -dv --verbose=4 /Applications/eXist-db.app

________________________________________
Craig A. Berry

"... getting out of a sonnet is much more
difficult than getting in."
                      Brad Leithauser

From: Adam R. <ad...@ex...> - 2024-10-26 11:46:58

We are very happy to announce releases of eXist-db 4, 5, and 6.

eXist-db 4.11.2 and 5.5.2 are minor releases that fix just a few small
bugs - see:
* https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb4112
* https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb552

eXist-db 6.3.0 is a feature release and update, and as such should be 100%
API and storage compatible with eXist-db 6.2.0. The full release notes are
available at: https://exist-db.org/exist/apps/wiki/blogs/eXist/eXistdb630

*Features*
- Add mail:get-mail-session#2 with authentication https://github.com/eXist-db/exist/pull/4801
- Improve JMX output https://github.com/eXist-db/exist/pull/4964
- Add support in `cache:create()` for 'expireAfterWrite' in `$config` https://github.com/eXist-db/exist/pull/4975
- Allow the user to override the JDWP Suspend at Docker build time https://github.com/eXist-db/exist/pull/5008
- Parameterise the XML-RPC parse method with the Media Type https://github.com/eXist-db/exist/pull/5070
- New exist:time XQuery Pragma https://github.com/eXist-db/exist/pull/5077
- Optimise Path Expressions that are visited by the BasicExpressionVisitor https://github.com/eXist-db/exist/pull/5083
- Refactor to highlight materialization query execution model https://github.com/eXist-db/exist/pull/5080
- Add Schema for EXPath Packaging System https://github.com/eXist-db/exist/pull/5113
- Backport build scripts https://github.com/eXist-db/exist/pull/5091
- Adds ContentFilePool and companion tests https://github.com/eXist-db/exist/pull/5217
- Adds in-memory cache for RPC query results https://github.com/eXist-db/exist/pull/5191
- Switch CI from Temurin to Liberica https://github.com/eXist-db/exist/pull/5466
- Allow setting the namespace on the result of sql:execute https://github.com/eXist-db/exist/pull/5186

*Bugfixes*
- Fix an issue where XQuery Trigger state may leak https://github.com/eXist-db/exist/pull/5481
- XQSuite assertXPath annotation: Support default element namespace https://github.com/eXist-db/exist/pull/4818
- Add missing dependency to copy-maven-plugin https://github.com/eXist-db/exist/pull/4858
- Tails of subsequences off by one https://github.com/eXist-db/exist/pull/4851
- Ensure that EXPath packages installed in $EXIST_HOME/data/expathrepo are filesystem portable https://github.com/eXist-db/exist/pull/4913
- Fix version number in `develop-6.x.x` branch https://github.com/eXist-db/exist/pull/4961
- Fix missing fn:transform global parameters https://github.com/eXist-db/exist/pull/4893
- Restore compatibility with Java language level 8 https://github.com/eXist-db/exist/pull/4974
- InputStream#available() should not be used to determine if there is data available https://github.com/eXist-db/exist/pull/4978
- Fixes to fn:replace, fn:tokenize, and fn:analyze-string https://github.com/eXist-db/exist/pull/4865
- Cardinality of CastExpression#toFunction https://github.com/eXist-db/exist/pull/4976
- Correct function signatures that return empty sequences https://github.com/eXist-db/exist/pull/4981
- Log exception if the response is already committed https://github.com/eXist-db/exist/pull/4993
- Correct the XDM type for the Sequence holding a representation of a Java Stack Trace when an error occurs https://github.com/eXist-db/exist/pull/4994
- Fix an issue with XQuery transient imports within EXPath Package https://github.com/eXist-db/exist/pull/5012
- Fix the use of Debian Stretch repositories in the Docker builds https://github.com/eXist-db/exist/pull/5005
- Fix a bug in Node Path equality https://github.com/eXist-db/exist/pull/5046
- Fix an NPE in exist:time pragma https://github.com/eXist-db/exist/pull/5081
- Handle IllegalStateException in file:directory-list#2 https://github.com/eXist-db/exist/pull/5093
- Named Function References can have Postfix Expressions https://github.com/eXist-db/exist/pull/5115
- Remove sonar no longer supporting Java 8 https://github.com/eXist-db/exist/pull/5218
- Repair 6.x.x build for macOS https://github.com/eXist-db/exist/pull/5385
- Address vulnerabilities as indicated by OWASP/NVD https://github.com/eXist-db/exist/pull/5387
- Fix flaky storage test timing https://github.com/eXist-db/exist/pull/5492
- Fix CI Windows stream corruption https://github.com/eXist-db/exist/pull/5513
- Fix errors on update operations https://github.com/eXist-db/exist/pull/5296

*Updated Libraries*
- Update to appbundler-1.2.0 https://github.com/eXist-db/exist/pull/4781
- Use newer copy-maven-plugin which does not have CVE issues with its dependencies https://github.com/eXist-db/exist/pull/4960
- Bump org.codehaus.izpack:izpack-maven-plugin from 5.1.3 to 5.2.0 https://github.com/eXist-db/exist/pull/5029
- Bump jetty from 9.4.50.v20221201 to 9.4.54.v20240208 https://github.com/eXist-db/exist/pull/5369
- Bump jackson-core from 2.13.4 to 2.15.2 https://github.com/eXist-db/exist/pull/5367
- Upgrade all build related maven plugins https://github.com/eXist-db/exist/pull/5386
- Make sure the latest jdom(1) jar is used https://github.com/eXist-db/exist/pull/5394

*Backwards Compatibility*
eXist-db 6.3.0 is binary compatible with previous eXist-db 6.x.x versions.
Regardless, before upgrading to this version of eXist-db, it is strongly
recommended to perform a full backup and restore. Users who are upgrading
should always consult the Upgrading Guide
https://exist-db.org/exist/apps/doc/upgrading.xml in the documentation.

Thanks, Adam.

--
Adam Retter

eXist Core Developer
{ United Kingdom }
ad...@ex...

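The release notes above do not show the new exist:time pragma in use.
Assuming it follows eXist's usual (# ... #) { ... } pragma syntax, a call
would look roughly like the following sketch; the wrapped expression is
arbitrary, and where the timing is reported (log vs. inline) is not
specified in the notes:

    xquery version "3.1";

    (: Sketch: time the evaluation of the enclosed expression with the
       exist:time pragma introduced in 6.3.0. :)
    (# exist:time #) {
        count(collection("/db/apps")//*)
    }
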
From: Erik S. <er...@xa...> - 2024-10-23 12:23:44

Reminder: in two weeks, November 7 and 8, the Declarative Amsterdam 2024
conference will be held.

This year's program (https://declarative.amsterdam/program) contains a
variety of technical and non-technical topics, such as: Affix grammars,
Answer Set Programming, banking, custom elements, ixml, JSONiq, linked
data, music, Petal, RumbleDB, StratML, Syntax highlighting, TEI, XForms,
XProc, XSLT, XJSLT, XSL-FO.

Some names: presentations and tutorials by Bahloul, von Criegern & Retter;
Boot; Buzatu & Fourny; Ellensburg; Firsov, Graham; Groeneveld; Hetzner;
Holman; Loughlin & Fourny; Meertens; Pemberton; Sanchez Rodriguez; Siegel;
Verwer & Lamers.

Register now at: https://declarative.amsterdam/registration. We hope to see
you there, either in person or virtually!

Best wishes,
The Declarative Amsterdam Conference Committee.

From: <pra...@gm...> - 2024-10-01 14:54:45

Dear community, can you help?

> eXistDB 6.2.0
>
> Saving to a locally running database via the remote interface is several
> orders of magnitude slower compared to using an embedded database. I
> can't figure out where the problem lies, or how to store the data more
> efficiently.
>
> I need to store about 100,000 records. When saving via the remote
> interface, the speed is around 100 records per second. However, when
> using the embedded database, the speed is about 9,000 records per second.
>
> What am I doing wrong, or how can I improve this?
>
> Below is a code example.
>
> remote uri: xmldb:exist://localhost:8080/exist/xmlrpc
> embedded uri: xmldb:exist://
>
> In both cases the db is running on the same machine. The db configuration
> file is attached.
>
> private void createRandomTestData(String prefix, int records) throws Exception {
>     final Random random = new Random();
>     final HashMap<String, String> data = new HashMap<>();
>
>     for (int i = 0; i < records; i++) {
>         double latitude = MIN_LAT + (MAX_LAT - MIN_LAT) * random.nextDouble();
>         double longitude = MIN_LON + (MAX_LON - MIN_LON) * random.nextDouble();
>
>         String randomId = UUID.randomUUID().toString();
>         String randomName = generateRandomString(random, 30);
>         String randomValue = generateRandomString(random, 10);
>
>         String resourceId = prefix + "/" + randomId;
>         String xml = String.format(
>             """
>             <entity>
>                 <name>%s</name>
>                 <value>%s</value>
>                 <pos>%.8f %.8f</pos>
>             </entity>""",
>             randomName, randomValue, latitude, longitude
>         );
>         data.put(resourceId, xml);
>     }
>     xmlDbService.saveEntity(data);
> }
>
> public void saveEntity(@NonNull HashMap<String, String> resources) throws Exception {
>     var sortedResources = new TreeMap<>(resources);
>
>     Collection col = null;
>     String currentCollectionUri = null;
>     long start = System.currentTimeMillis();
>
>     int i = 0;
>     int c = 0;
>     try {
>         for (var entry : sortedResources.entrySet()) {
>             String resourceId = entry.getKey();
>             String xmlData = entry.getValue();
>
>             var param = splitResourceId(resourceId);
>             String collectionUri = param[0];
>             String resourceName = param[1];
>
>             if (col == null || !collectionUri.equals(currentCollectionUri)) {
>                 if (col != null) {
>                     col.close();
>                 }
>                 col = getOrCreateCollection(collectionUri);
>                 col.setProperty("indent", "no");
>                 currentCollectionUri = collectionUri;
>             }
>
>             i++;
>             if (i % 1000 == 0) {
>                 long executionTime = System.currentTimeMillis() - start;
>                 log.info("Inserted: {}, rate {} / sec", i, c * 1000L / executionTime);
>                 start = System.currentTimeMillis();
>                 c = 0;
>             }
>             XMLResource res = (XMLResource) col.createResource(resourceName, XMLResource.RESOURCE_TYPE);
>             res.setContent(xmlData);
>             col.storeResource(res);
>             c++;
>             log.trace("Resource saved: {} in collection: {}", resourceName, currentCollectionUri);
>         }
>     } finally {
>         if (col != null) {
>             col.close();
>         }
>     }
> }
>
> <dependency>
>     <groupId>org.exist-db</groupId>
>     <artifactId>exist-core</artifactId>
>     <version>6.2.0</version>
> </dependency>
>
> <dependency>
>     <groupId>net.sf.xmldb-org</groupId>
>     <artifactId>xmldb-api</artifactId>
>     <version>1.7.0</version>
> </dependency>
>
> Thank you for your help and/or advice.
>
> V.

From: Benoit M. <ben...@us...> - 2024-09-30 15:52:46

I had the exact same problem a few weeks ago with the char 😊 (U+1F60A,
https://www.compart.com/fr/unicode/U+1F60A) in a TEI document.

I retested 10 minutes ago. Just creating a file with the following content
from OxygenXML, performing an XQuery on //badChar, and deleting the file
(from Oxygen again) makes my eXist server crash. But a few weeks ago the
problem appeared while running XQueries and/or reindexing; I cannot
remember, sorry.

<debug>
    <badChar>😊</badChar>
</debug>

[eXist Version : 6.2.0]
[eXist Build : 2023-02-04T22:42:29Z]

Hope this will help,
Benoit

On 2024-09-30 at 09:55, Joe Wicentowski wrote:
> Hi Jannik,
>
> I see from the screenshot that you're using version 6.2.0.
>
> When the problem occurs, do you see any errors in exist.log?
>
> If you isolate the character(s) in question, could you open an issue?
> (Or, feel free to email me the file, and I can take a look.)
>
> Joe
>
> On Sun, Sep 29, 2024 at 4:58 PM Jean-Paul Rehr <re...@gm...> wrote:
>> Dear Jannik,
>>
>> I ran into a similar issue some time ago, and it was due to a problem
>> with a hidden character inserted into the file. This may or may not be
>> your case, but if you have a backup version of the file from when it
>> worked fine, you could transform it into code points, and do the same
>> with the current file, and see if they are truly identical.
>>
>> Hope this helps,
>> JPR

From: Joe W. <jo...@gm...> - 2024-09-30 13:56:18

Hi Jannik,

I see from the screenshot that you're using version 6.2.0.

When the problem occurs, do you see any errors in exist.log?

If you isolate the character(s) in question, could you open an issue? (Or,
feel free to email me the file, and I can take a look.)

Joe

On Sun, Sep 29, 2024 at 4:58 PM Jean-Paul Rehr <re...@gm...> wrote:
> Dear Jannik,
>
> I ran into a similar issue some time ago, and it was due to a problem
> with a hidden character inserted into the file. This may or may not be
> your case, but if you have a backup version of the file from when it
> worked fine, you could transform it into code points, and do the same
> with the current file, and see if they are truly identical.
>
> Hope this helps,
> JPR
>
> On Sun, Sep 29, 2024 at 10:29 PM Jannik Franz <fr...@md...> wrote:
>> Dear eXist-db community,
>>
>> in our project we have been using eXist-db for over two years now, and
>> for a few weeks we have been having a problem uploading a specific XML
>> file (via eXide's upload interface). As soon as the file is uploaded,
>> the database crashes and the server has to be restarted. After the
>> restart there remains an empty file in the database as a result of the
>> interrupted upload process.
>>
>> The problem occurs only with this single file. It is a simple TEI-XML
>> document, 68 KB small and not special in any way (and it didn't cause
>> any problems for the last 2 years).
>>
>> Attached is a screenshot of the Monex report after uploading the file.
>>
>> Has anyone had a similar problem or has any advice how to deal with it?
>>
>> Many thanks in advance and best regards
>> Jannik Franz

From: Jean-Paul R. <re...@gm...> - 2024-09-29 20:57:07

Dear Jannik,

I ran into a similar issue some time ago, and it was due to a problem with
a hidden character inserted into the file. This may or may not be your
case, but if you have a backup version of the file from when it worked
fine, you could transform it into code points, and do the same with the
current file, and see if they are truly identical.

Hope this helps,
JPR

On Sun, Sep 29, 2024 at 10:29 PM Jannik Franz <fr...@md...> wrote:
> Dear eXist-db community,
>
> in our project we have been using eXist-db for over two years now, and
> for a few weeks we have been having a problem uploading a specific XML
> file (via eXide's upload interface). As soon as the file is uploaded, the
> database crashes and the server has to be restarted. After the restart
> there remains an empty file in the database as a result of the
> interrupted upload process.
>
> The problem occurs only with this single file. It is a simple TEI-XML
> document, 68 KB small and not special in any way (and it didn't cause any
> problems for the last 2 years).
>
> Attached is a screenshot of the Monex report after uploading the file.
>
> Has anyone had a similar problem or has any advice how to deal with it?
>
> Many thanks in advance and best regards
> Jannik Franz

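The code-point comparison suggested above can be done directly in XQuery.
A minimal sketch; the document path is hypothetical, and the filter for
code points above 65535 targets supplementary-plane characters such as the
U+1F60A emoji reported elsewhere in this thread:

    xquery version "3.1";

    (: Sketch: list the code points of a stored document's text content,
       keeping only characters outside the Basic Multilingual Plane. Run
       against the working and the broken version of the file and diff
       the output. :)
    let $text := string(doc("/db/apps/data/suspect.xml"))
    for $cp in string-to-codepoints($text)
    where $cp > 65535
    return $cp

Dropping the where clause yields the full code-point dump, which is what
you would diff when hunting for invisible BMP characters instead.
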
From: Jannik F. <fr...@md...> - 2024-09-23 14:17:04

Dear eXist-db community,

in our project we have been using eXist-db for over two years now, and for
a few weeks we have been having a problem uploading a specific XML file
(via eXide's upload interface). As soon as the file is uploaded, the
database crashes and the server has to be restarted. After the restart
there remains an empty file in the database as a result of the interrupted
upload process.

The problem occurs only with this single file. It is a simple TEI-XML
document, 68 KB small and not special in any way (and it didn't cause any
problems for the last 2 years).

Attached is a screenshot of the Monex report after uploading the file.

Has anyone had a similar problem or has any advice how to deal with it?

Many thanks in advance and best regards
Jannik Franz

Jannik Franz
Universität für Musik und darstellende Kunst Wien
Institut für Musikwissenschaft und Interpretationsforschung
Abteilung Wissenschaftszentrum Arnold Schönberg und die Wiener Schule
Schwarzenbergplatz 6/Zaunergasse 1-3
1030 Wien
E-Mail: fr...@md...

From: Vladimir P. <pra...@gm...> - 2024-09-19 15:59:34

eXistDB 6.2.0

Saving to a locally running database via the remote interface is several
orders of magnitude slower compared to using an embedded database. I can't
figure out where the problem lies, or how to store the data more
efficiently.

I need to store about 100,000 records. When saving via the remote
interface, the speed is around 100 records per second. However, when using
the embedded database, the speed is about 9,000 records per second.

What am I doing wrong, or how can I improve this?

Below is a code example.

remote uri: xmldb:exist://localhost:8080/exist/xmlrpc
embedded uri: xmldb:exist://

In both cases the db is running on the same machine. The db configuration
file is attached.

private void createRandomTestData(String prefix, int records) throws Exception {
    final Random random = new Random();
    final HashMap<String, String> data = new HashMap<>();

    for (int i = 0; i < records; i++) {
        double latitude = MIN_LAT + (MAX_LAT - MIN_LAT) * random.nextDouble();
        double longitude = MIN_LON + (MAX_LON - MIN_LON) * random.nextDouble();

        String randomId = UUID.randomUUID().toString();
        String randomName = generateRandomString(random, 30);
        String randomValue = generateRandomString(random, 10);

        String resourceId = prefix + "/" + randomId;
        String xml = String.format(
            """
            <entity>
                <name>%s</name>
                <value>%s</value>
                <pos>%.8f %.8f</pos>
            </entity>""",
            randomName, randomValue, latitude, longitude
        );
        data.put(resourceId, xml);
    }
    xmlDbService.saveEntity(data);
}

public void saveEntity(@NonNull HashMap<String, String> resources) throws Exception {
    var sortedResources = new TreeMap<>(resources);

    Collection col = null;
    String currentCollectionUri = null;
    long start = System.currentTimeMillis();

    int i = 0;
    int c = 0;
    try {
        for (var entry : sortedResources.entrySet()) {
            String resourceId = entry.getKey();
            String xmlData = entry.getValue();

            var param = splitResourceId(resourceId);
            String collectionUri = param[0];
            String resourceName = param[1];

            if (col == null || !collectionUri.equals(currentCollectionUri)) {
                if (col != null) {
                    col.close();
                }
                col = getOrCreateCollection(collectionUri);
                col.setProperty("indent", "no");
                currentCollectionUri = collectionUri;
            }

            i++;
            if (i % 1000 == 0) {
                long executionTime = System.currentTimeMillis() - start;
                log.info("Inserted: {}, rate {} / sec", i, c * 1000L / executionTime);
                start = System.currentTimeMillis();
                c = 0;
            }
            XMLResource res = (XMLResource) col.createResource(resourceName, XMLResource.RESOURCE_TYPE);
            res.setContent(xmlData);
            col.storeResource(res);
            c++;
            log.trace("Resource saved: {} in collection: {}", resourceName, currentCollectionUri);
        }
    } finally {
        if (col != null) {
            col.close();
        }
    }
}

<dependency>
    <groupId>org.exist-db</groupId>
    <artifactId>exist-core</artifactId>
    <version>6.2.0</version>
</dependency>

<dependency>
    <groupId>net.sf.xmldb-org</groupId>
    <artifactId>xmldb-api</artifactId>
    <version>1.7.0</version>
</dependency>

Thank you for your help and/or advice.

V.

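One plausible explanation for the gap reported above is the per-document
round trip that the remote XML-RPC driver makes for each storeResource()
call, which the embedded driver avoids. A commonly suggested mitigation is
to ship a whole batch to the server in a single request and let the server
do the individual stores. A minimal XQuery sketch under that assumption;
the collection path and the $batch wrapper element are hypothetical, and
the query would be posted once (e.g. via the REST interface) with the
records bound to the external variable:

    xquery version "3.1";

    (: Sketch: store a batch of records in one server round trip.
       $batch is expected to look like
       <batch><entity>...</entity><entity>...</entity>...</batch>. :)
    declare variable $batch external;

    for $e at $i in $batch/entity
    return
        xmldb:store("/db/test-data", "entity-" || $i || ".xml", $e)

Whether this closes the whole 100-vs-9,000 records/second gap depends on
the setup, but it removes the one-network-call-per-record pattern that the
Java loop above produces over xmldb:exist://localhost:8080/exist/xmlrpc.
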
From: Kampkaspar, D. <dar...@tu...> - 2024-09-11 14:40:19

Hi Vincent,

when I'm sure that no write has taken place, I go along the same route as
Pieter Lamers said in his post, i.e. deleting the log and all lock files.
eXist then does not notice there has been an interruption, and it should
come up without a re-index.

As I, too, have a lot of files and a large index definition, I usually try
to avoid re-indexing. Meaning, if I know it's going to re-index, I delete
the whole thing and just do a re-ingest of all files from a recent DB dump.
1) You can go through your files collection by collection instead of doing
everything at once (which gives you a much better overview of where in the
process you actually are); and 2) the DB is not completely locked (which it
is during the automatic re-index). In my experience, the re-ingest is not
significantly slower* than the automatic re-index, and to me the benefit of
knowing where you are in the process is worth more than a small speed
bonus.

* While I have not timed it, it feels as though a re-ingest is faster than
the re-index. But I do not know nearly enough about eXist's internal
handling of re-indexing (and of indexing when ingesting) to say whether
there's really a difference or other effects play into this. Of course,
when re-ingesting from a dump that is situated on a different piece of
hardware, a re-ingest would also benefit from the separation of the I/O
tasks, which might give some added boost (again, I have not really timed
this in a clean environment).

That being said, almost 1 month of re-indexing and 100 GB memory usage seem
quite a lot. I do not dare say "excessive", as I do not know your data and
index definition, but my data is usually done after about 3 to 4 days (no
dedicated hardware and a max. of 6 GB RAM; a re-ingest is around 2-3 days,
sometimes less). How big are your data\lucene and data\range directories,
and how big are the dbx files? That might help compare these figures.

All best,
Dario

On Tuesday, 2024-09-10 at 19:10 +0000, Lizzi, Vincent wrote:
> Hello eXist-db community,
>
> By comparing the folders data\lucene and data\range to a previous backup,
> going by the number of files and total size of these folders, it looks
> like the reindexing process is about 50% done. The reindexing process has
> been running since August 13 and is using over 100 GB of memory.
>
> Is there any way to start eXist-db and allow it to go through its
> recovery process but stop it from reindexing database files? I'm
> wondering if that could be a way to get the database operational again,
> and then I could manually run xmldb:reindex().
>
> Thanks,
> Vincent

--
Dario Kampkaspar
Leitung ZEiD – Zentrum für digitale Editionen
Universitäts- und Landesbibliothek Darmstadt
Postadresse: Magdalenenstr. 8, 64289 Darmstadt
Besucheradresse: Residenzschloss 1, 64283 Darmstadt
+49 6151 16-76292
+49 151 29121599

From: Pieter L. <pie...@be...> - 2024-09-11 11:03:28

Hi Vincent,

When I want to skip the re-indexing I stop, remove journal.lck and *.log
from the exist-db/data folder, and restart. If the database was taken down
during an update, then this will lead to corruption. If no write was taking
place, then usually (in my experience) this is harmless. Please note that
my exist-db/data folder is not the eXist resource data folder (which we put
somewhere else) but the journaling folder. There are usually only two files
in that one.

Good luck! If I end up needing to re-index on our 20 GB data set it usually
takes an hour and a half. But I might be using lighter indexes than you, so
I'm not sure if this is comparable.

Best,
Pieter

On 10/09/2024 21:10, Lizzi, Vincent wrote:
> Hello eXist-db community,
>
> By comparing the folders data\lucene and data\range to a previous backup,
> going by the number of files and total size of these folders, it looks
> like the reindexing process is about 50% done. The reindexing process has
> been running since August 13 and is using over 100 GB of memory.
>
> Is there any way to start eXist-db and allow it to go through its
> recovery process but stop it from reindexing database files? I'm
> wondering if that could be a way to get the database operational again,
> and then I could manually run xmldb:reindex().
>
> Thanks,
> Vincent

--
Pieter Lamers
John Benjamins Publishing Company
Postal Address: P.O. Box 36224, 1020 ME AMSTERDAM, The Netherlands
Visiting Address: Klaprozenweg 75G, 1033 NN AMSTERDAM, The Netherlands
Warehouse: Kelvinstraat 11-13, 1446 TK PURMEREND, The Netherlands
tel: +31 20 630 4747
web: www.benjamins.com

From: Lizzi, V. <Vin...@ta...> - 2024-09-10 19:27:16

Hello eXist-db community,

By comparing the folders data\lucene and data\range to a previous backup,
going by the number of files and total size of these folders, it looks like
the reindexing process is about 50% done. The reindexing process has been
running since August 13 and is using over 100 GB of memory.

Is there any way to start eXist-db and allow it to go through its recovery
process but stop it from reindexing database files? I'm wondering if that
could be a way to get the database operational again, and then I could
manually run xmldb:reindex().

Thanks,
Vincent

_____________________________________________
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vin...@ta...

From: Lizzi, Vincent
Sent: Tuesday, September 3, 2024 11:36 AM
To: Exist-open <exi...@li...>
Subject: eXist-db repair and reindex process taking a very long time

Hello eXist-db community,

I've been monitoring an eXist-db database that is going through its
automated recovery process, and am wondering if there is any way to get
more information about its progress and how soon the process will finish.

This eXist-db has full text indexing and range indexes configured on
several large collections. The EC2 server on which eXist-db is hosted had
an outage. When eXist-db was restarted its automatic repair process began.
That was about 3 weeks ago. Through Windows Resource Monitor I can see that
the eXist-db process is reading from dob.dbx and structure.dbx, writing to
structure.dbx, and writing to files in the "lucene" and "range" folders.
The process is using about 6% of CPU consistently, and memory usage has
increased gradually to over 100 GB. The last line in exist.log is still:

2024-08-13 17:02:59,710 [main] INFO (NativeBroker.java [repair]:3692) - Reindexing database files ...

Is there any way to find out more about what the process is doing, estimate
when it will finish, or release any bottlenecks, without interrupting the
process?

Thanks,
Vincent

______________________________________________
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
530 Walnut St., Suite 850, Philadelphia, PA 19106
E-Mail: vin...@ta...
Web: www.tandfonline.com

Taylor & Francis is a trading name of Informa UK Limited,
registered in England under no. 1072954

"Everything should be made as simple as possible, but not simpler."

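The manual, per-collection re-index mentioned above can be scripted so that
progress stays observable, unlike the all-at-once startup re-index. A small
recursive XQuery sketch; the root collection path is hypothetical:

    xquery version "3.1";

    (: Sketch: walk the collection tree and re-index it piece by piece.
       xmldb:reindex() returns a boolean per collection, so the result
       sequence doubles as a progress report. :)
    declare function local:reindex($col as xs:string) {
        xmldb:reindex($col),
        for $child in xmldb:get-child-collections($col)
        return local:reindex($col || "/" || $child)
    };

    local:reindex("/db/apps/my-data")

Running this per subtree (rather than on /db at once) also limits how much
of the database is locked for writing at any given moment.
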
From: Ted H. <meg...@gm...> - 2024-09-10 01:23:17

If I could use the await command, I would do so. But when I use the await
command, I get the following error in the eXist database: "async functions
are only available in es8". And I don't know how to set eXist so it will
use es8. But I digress.

What I want to do is use one function to retrieve data in a .json file and
then use another function to store data from the JSON file into global
variables. Then I will use those variables to set attributes of SVG
objects. But I'm doing something wrong and I don't know what I'm doing
wrong. Here is my code so far:

My HTML Code:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg"
      xmlns:xs="http://www.w3.org/1999/xhtml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/1999/xhtml SVG_Bezier_Curve_Webpage_XML_Schema.xsd">
    <head>
        <title>SVG_Diagonal_Line</title>
        <link rel="stylesheet" type="text/css"
              href="http://localhost:8080/exist/rest/db/apps/HTML_Student/SVG_Diagonal_Line.css"/>
        <script language="javascript"
                src="http://localhost:8080/exist/rest/db/apps/HTML_Student/SVG_Diagonal_Line.js"/>
    </head>
    <body onload="Store_Data()">
        <div id="Button_Box">
            <input type="text" id="Data_Text"/>
        </div>
        <div id="SVG_Position">
            <svg:svg id="My_SVG" height="500px" width="600px">
                <svg:line id="Diagonal_Line_1" x1="300px" x2="395px" y1="200px" y2="300px"/>
                <svg:line id="Diagonal_Line_2" x1="500px" x2="405px" y1="200px" y2="300px"/>
                <svg:rect id="Base_Rectangle"/>
                <svg:ellipse id="Germanium_Ellipse"/>
                <svg:ellipse id="Battery_1_Bottom" rx="20" ry="10" cx="250" cy="350" fill="green" stroke="black" stroke-width="3"/>
                <svg:rect id="Battery_1__Body" x="230px" y="300px" width="40px" height="50px" stroke="black" stroke-width="3" fill="green"/>
                <svg:ellipse id="Battery_1_Top" rx="20" ry="10" cx="250" cy="300" stroke="black" stroke-width="3"/>
                <svg:rect id="Battery_2_Body" x="530px" y="300px" width="40px" height="50px" stroke="black" stroke-width="3" fill="green"/>
                <svg:path id="Battery_2_Bottom" d="M 530,350 A 20,10 0,0,0 570,350" stroke="black" stroke-width="3" fill="green"/>
                <svg:ellipse id="Battery_2_Top" rx="20" ry="10" cx="550" cy="300" stroke="black" stroke-width="3"/>
            </svg:svg>
        </div>
    </body>
</html>

My CSS Code:

#Diagonal_Line_1 {
    position: absolute;
    stroke: #FFD700;
    stroke-width: 3;
    fill: none;
}

#Diagonal_Line_2 {
    stroke: #FFD700;
    stroke-width: 3;
    fill: none;
}

#Data_Text {
    position: absolute;
    top: 50px;
    left: 300px;
    height: 50px;
    width: 300px;
}

#My_Svg {
    position: absolute;
    top: 300px;
    left: 200px;
    height: 500px;
    width: 500px;
}

#Button_Box {
    position: absolute;
    left: 150px;
}

#Base_Rectangle {
    position: absolute;
    height: 15px;
    width: 100px;
    fill: #b87333;
    stroke: black;
    stroke-width: 2;
}

Germanium_Ellipse {
    position: absolute;
    fill: #d5d5d7;
    stroke: blue;
    stroke-width: 2;
}

My Javascript Code:

var Retrieved_Data;
var Rectangle_X_Coordinate;
var URL;
var Rectangle_Object;
var Rectangle_Y_Coordinate;
var Germanium_Object;
var Germanium_X_Radius;
var Germanium_Y_Radius;
var Germanium_X_Center;
var Germanium_Y_Center;
var Set_Attributes;

function Retrieve_Data(){
    URL = 'http://localhost:8080/exist/rest/db/apps/HTML_Student/SVG_Diagonal_Line.json';
    fetch(URL)
        .then((res) => response.json());
}

function Store_Data() {
    Retrieved_Data = Retrieve_Data();
    document.getElementById("Data_Text").value = data.Diagonal_Lines.Diagonal_Line_2.First_X1_Coordinate;
    document.getElementById("Data_Text").value = data.Diagonal_Lines.Base_Rectangle.Rectangle_Y_Coordinate;
    Rectangle_Object = document.getElementById('Base_Rectangle');
    Germanium_Object = document.getElementById('Germanium_Ellipse');
    Rectangle_X_Coordinate = JSON.stringify(Retrieved_Data.Rectangle_X_Coordinate);
    Rectangle_Y_Coordinate = JSON.stringify(data.Diagonal_Lines.Base_Rectangle.Rectangle_Y_Coordinate);
    Germanium_X_Radius = JSON.stringify(data.Diagonal_Lines.Germanium_Ellipse.Germanium_X_Radius);
    Germanium_Y_Radius = JSON.stringify(data.Diagonal_Lines.Germanium_Ellipse.Germanium_Y_Radius);
    Germanium_X_Center = JSON.stringify(data.Diagonal_Lines.Germanium_Ellipse.Germanium_X_Center);
    Germanium_Y_Center = JSON.stringify(data.Diagonal_Lines.Germanium_Ellipse.Germanium_Y_Center);
    document.getElementById("Data_Text").value = Rectangle_X_Coordinate;
}

function Draw_Objects(){
    Draw_Objects(Rectangle_X_Coordinate, Rectangle_Y_Coordinate);
    document.getElementById("Data_Text").value = Rectangle_X_Coordinate;
    Rectangle_Object.setAttribute('x', Rectangle_X_Coordinate);
    Rectangle_Object.setAttribute('y', Rectangle_Y_Coordinate);
    Germanium_Object.setAttribute('rx', Germanium_X_Radius);
    Germanium_Object.setAttribute('ry', Germanium_Y_Radius);
    Germanium_Object.setAttribute('cx', Germanium_X_Center);
    Germanium_Object.setAttribute('cy', Germanium_Y_Center);
    Germanium_Object.style = "fill: #d5d5d7; stroke: black; stroke-width: 3;"

And my JSON Data:

{"Diagonal_Lines": {
    "Diagonal_Line_1": {"First_X1_Coordinate": "200px", "Second_X1_Coordinate": "295px", "First_Y1_Coordinate": "200px", "Second_Y1_Coordinate": "300px"},
    "Diagonal_Line_2": {"First_X1_Coordinate": "400px", "Second_X1_Coordinate": "305px", "First_Y1_Coordinate": "200px", "Second_Y1_Coordinate": "300px"},
    "Base_Rectangle": {"Rectangle_X_Coordinate": 350, "Rectangle_Y_Coordinate": 300},
    "Germanium_Ellipse": {"Germanium_X_Radius": 90, "Germanium_Y_Radius": 25, "Germanium_X_Center": 400, "Germanium_Y_Center": 340},
    "Battery_1_Bottom": {"Battery_1_X_Radius": 20, "Battery_1_Y_Radius": 10, "Battery_1_X_Center": 250, "Battery_1_Y_Center": 350},
    "Battery_1_Body": {"Battery_1_X": 230, "Battery_1_Y": 300},
    "Battery_2_Top": {"Battery_2_X_Radius": 20, "Battery_2_Y_Radius": 10, "Battery_2_X_Center": 250, "Battery_2_Y_Center": 350},
    "Battery_2_Body": {"Battery_2_X": 530, "Battery_2_Y": 300}}}

From: Erik S. <er...@xa...> - 2024-09-04 09:34:27

Declarative Amsterdam 2024 will be happening at CWI, Amsterdam Science Park,
on Thursday/Friday, November 7 and 8, 2024.

* On Thursday morning we start with tutorials on XForms, in-source testing
  and iXML. Make sure to bring your own device to participate optimally in
  the hands-on sessions.
* The symposium will take place on Thursday afternoon and continue on
  Friday. The program offers a blend of renowned speakers alongside
  lesser-known experts. The presentations cover a wide range of topics,
  including various techniques, tools, applications, and implementations.

As in previous editions, we have an engaging and informative lineup for our
attendees, as you can see for yourself at
https://declarative.amsterdam/program

The conference is a hybrid event, live at the Science Park in Amsterdam,
and live-streamed. Registration is open; early bird registration closes
October 1st, so hurry along to https://declarative.amsterdam/registration.

Best wishes,
The Declarative Amsterdam Conference Committee.

From: Lizzi, V. <Vin...@ta...> - 2024-09-03 16:13:24
|
Hi Lars, I’ve had a few observations from using eXist-db that might be of interest. * Queries that take a long time to run and access documents in different collections can run be made to run in less time by placing the documents into a single collection. * Simultaneously reading from and writing to the same collection can cause blocks. * The tracing function in MoneX is useful to see what indexes are being used by a query. It looks like your Person XML and Article XML are using different namespaces, so that would provide a way differentiate the two kinds of documents if they are in the same collection. I hope this is helpful in some way. Kind regards, Vincent _____________________________________________ Vincent M. Lizzi Head of Information Standards | Taylor & Francis Group vin...@ta...<mailto:vin...@ta...> Information Classification: General From: Lars Scheidelerl <sch...@sa...> Sent: Tuesday, September 3, 2024 7:46 AM To: Claudius Teodorescu <cla...@gm...>; exi...@li... Subject: Re: [Exist-open] Fwd: Cross reference Index - why is so slow Hey Claudius, thank you for the insight. But with the solution you presented, I ask myself why should I use eXistDb at all? Then I could simply use an XML parser and read the fields directly from the files at file level or use another document-based DB that can do XML. But I want to use the features that are in a complete system - I think the diversions with a custom search engine for combined fields is somehow good, but also thought around the corner. Surely there must be a solution to the problem in eXistdb? Best regards Lars Am 03.09.24 um 11:27 schrieb Claudius Teodorescu: Dear Lars, My name is Claudius Teodorescu, and I am currently working for the Academy of Mainz. For the cases with publishing dictionaries (around 8, see https://clre.solirom.ro<https://clre.solirom.ro>), I have chosen the static website approach, along with static indexes, and a static search engine that is browser-based and is written in Rust and compiled to WebAssembly. One can see some test data from the DWDS dictionary (around 250,000 entries, and four indexes) at https://claudius-teodorescu.gitlab.io/dwds-site/<https://claudius-teodorescu.gitlab.io/dwds-site/>. Maybe this is of any help for you. I think that I do not have to mention that the indexing takes seconds, as I have written the indexing engine in Rust language. :) Best regards, Claudius On Tue, 3 Sept 2024 at 11:30, Lars Scheidelerl <sch...@sa...<mailto:sch...@sa...>> wrote: Dear Boris, thank you very much for the answer. The person data and the article data are in different collections. For every person we have one file, for every article we have one file. We don't have a fixed order of when data is imported or written in, as I understand it, what you are suggesting would make it necessary for all the person or item data to always be available before the index of the other collection, so that it can be accessed in the index of the other. Maybe we have to, like you pointed out, tests the module functions for better performance. But in the past however, after several different approaches that we have already tried, we have realized that querying the data, no matter how good it is in eXide, for example, is significantly slower when it is used for the index. Could it be that the structural index is not used when re-indexing? Our assumption was that the data is iterated over differently or that new blobs are written over the structure and content in addition to indexing. 
|
From: Lizzi, V. <Vin...@ta...> - 2024-09-03 16:04:02
|
Hello eXist-db community,

I've been monitoring an eXist-db database that is going through its automated recovery process, and am wondering if there is any way to get more information about its progress and how soon the process will finish.

This eXist-db instance has full-text indexing and range indexes configured on several large collections. The EC2 server on which eXist-db is hosted had an outage. When eXist-db was restarted, its automatic repair process began. That was about 3 weeks ago. Through Windows Resource Monitor I can see that the eXist-db process is reading from dob.dbx and structure.dbx, and writing to structure.dbx and to files in the "lucene" and "range" folders; the process is consistently using about 6% of CPU, and memory usage has increased gradually to over 100 GB. The last line in exist.log is still:

2024-08-13 17:02:59,710 [main] INFO (NativeBroker.java [repair]:3692) - Reindexing database files ...

Is there any way to find out more about what the process is doing, estimate when it will finish, or release any bottlenecks, without interrupting the process?

Thanks,
Vincent

______________________________________________
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
530 Walnut St., Suite 850, Philadelphia, PA 19106
E-Mail: vin...@ta...
Web: www.tandfonline.com

Taylor & Francis is a trading name of Informa UK Limited, registered in England under no. 1072954
"Everything should be made as simple as possible, but not simpler."

Information Classification: General
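One possible way to get more detail without stopping the repair: eXist-db logs through Log4j2 (etc/log4j2.xml), so raising the level for the class named in that last log line might surface finer-grained progress. This is a sketch under stated assumptions: whether NativeBroker emits per-collection detail at debug level, and the exact appender name, both depend on the installation; if the Configuration element sets monitorInterval, Log4j2 reloads the file without restarting the JVM.

    <!-- sketch for etc/log4j2.xml; "exist.core" is assumed to be the
         appender that feeds exist.log and may be named differently -->
    <Logger name="org.exist.storage.NativeBroker" level="debug" additivity="false">
        <AppenderRef ref="exist.core"/>
    </Logger>

|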
From: Lars S. <sch...@sa...> - 2024-09-03 11:46:49
|
Hey Claudius,

thank you for the insight. But with the solution you presented, I ask myself: why should I use eXist-db at all? Then I could simply use an XML parser and read the fields directly from the files at file level, or use another document-oriented DB that can handle XML. But I want to use the features that come with a complete system. I think the detour via a custom search engine for combined fields is good in its way, but also somewhat roundabout. Surely there must be a solution to the problem within eXist-db?

Best regards
Lars

On 03.09.24 at 11:27, Claudius Teodorescu wrote:
> Dear Lars,
>
> My name is Claudius Teodorescu, and I am currently working for the
> Academy of Mainz.
>
> For the cases with publishing dictionaries (around 8, see
> https://clre.solirom.ro), I have chosen the static website approach,
> along with static indexes and a static search engine that is
> browser-based, written in Rust and compiled to WebAssembly.
>
> One can see some test data from the DWDS dictionary (around 250,000
> entries, and four indexes) at
> https://claudius-teodorescu.gitlab.io/dwds-site/.
>
> Maybe this is of some help to you.
>
> I think I do not have to mention that the indexing takes seconds,
> as I have written the indexing engine in Rust. :)
>
> Best regards,
> Claudius
>
> [earlier quoted messages trimmed; they appear in full below]

--
Lars Scheideler
- research technical staff member -
Althochdeutsches Wörterbuch & Digital Humanities

Sächsische Akademie der Wissenschaften zu Leipzig
Karl-Tauchnitz-Straße 1
04107 Leipzig

sch...@sa...
www.saw-leipzig.de
|
From: Lars S. <sch...@sa...> - 2024-09-03 08:29:59
|
Dear Boris,

thank you very much for the answer.

The person data and the article data are in different collections. For every person we have one file, and for every article we have one file.

We don't have a fixed order in which data is imported or written. As I understand it, what you are suggesting would require all the person or article data to always be available before the index of the other collection is built, so that it can be accessed from the index of the other.

Maybe we have to, as you pointed out, test the module functions for better performance. In the past, however, after several different approaches that we have already tried, we have realized that querying the data, no matter how fast it is in eXide, for example, is significantly slower when it is used for the index. Could it be that the structural index is not used when re-indexing? Our assumption was that the data is iterated over differently, or that new blobs for structure and content are written in addition to the indexing. In other words, that it validates, saves and indexes the data at the same time as re-indexing.

Other approaches include the use of a cache or a helper file, where the fields are composed beforehand and then indexed accordingly, but this also takes a long time and unfortunately also blocks working on the files. So if we write a helper file with all fields in eXide, the whole process takes about 90 s; if we work with the fields as we would build the index, i.e. with xml:id in the helper file, ~110 s. A re-index takes 2-4 hours. Not really understandable.

We are now primarily trying to improve this using the xml:id/id() function, but we do not have much hope of improving the re-index at production scale. But if all data is re-indexed, i.e. the xml:id fields are not available, it is in vain, too.

Would love to learn more about this topic and continue to share experiences.

Lars

On 03.09.24 at 00:49, Boris Lehečka wrote:
> Dear Lars,
>
> I have similar issues with indexing dictionaries: my indexing
> procedure asks for data from the taxonomy (like expansions of
> abbreviations), and sometimes indexing a dictionary with about 36,000
> entries took a whole day.
>
> I don't remember who (Juri Leino, I guess) pointed out to me that
> the index is saved only after the whole document is parsed. After
> moving each dictionary entry to a separate file, indexing took much
> less time (several hours).
>
> However, this does not seem to be the cause of your problem.
>
> In my opinion, your indexing code (in the module) is very
> complicated; sometimes it can be much simpler, for example without
> explicit conversion to string (as in tei:persName[string(@ref) eq
> $identifier ...]), or without normalizing spaces in ip:getFullText
> (full-text search usually uses only the parts between spaces).
>
> My suggestion is the following: first, create an index for persons in
> a separate collection (with a separate collection.xconf): compute fields
> with the values you will query or want to return when you index the
> articles (in the other collection). And second, use Lucene and
> full-text search in your "index-persons" module to find the data in the
> index from the first phase.
>
> This is just an idea, not tested; I hope someone else is much more
> experienced in the magic of indexing.
>
> Best,
>
> Boris Lehečka
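A minimal sketch of Boris's two-phase idea, assuming the person documents carry the gndID and fullname fields from the collection.xconf quoted further down, and that eXist-db stores those fields for retrieval (collection path illustrative; untested):

    xquery version "3.1";

    declare namespace ft = "http://exist-db.org/xquery/lucene";

    (: phase 1 is the person index itself, with cheap precomputed fields;
       phase 2 looks a person up through Lucene instead of scanning files :)
    let $gndId := '120156784'  (: GND id from the sample data :)
    for $hit in collection('/db/projects/jwo/data/persons')//adcache[ft:query(., 'gndID:' || $gndId)]
    return ft:field($hit, 'fullname')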
> On 02.09.2024 at 16:57, Lars Scheidelerl wrote:
>> Hey,
>>
>> we assume that we are not using the index in our project as intended,
>> because when we try to build the index we have created, it takes a
>> very long time.
>>
>> We have two collections: one with 687 XML files in which the person
>> data is stored, and one with 400 XML files where the articles are stored.
>>
>> For the person articles we want certain information from the
>> articles, and vice versa.
>>
>> Person XML:
>>
>> <person xml:id="i0c9ab7e2-2e21-39ff-aea8-c56ad4702a7f" status="safe"
>>     modified="2024-07-30T13:26:09.154+02:00">
>>     <name>Marcanton Zimara</name>
>>     <identifier preferred="YES">https://d-nb.info/gnd/120156784</identifier>
>>     <alternateName>Marcusantonius Zimara</alternateName>
>>     <alternateName>Marcus Anthonius Zimara</alternateName>
>>     <alternateName>Antonius Zimara</alternateName>
>>     <alternateName>M. Antonius Zimarra</alternateName>
>>     <alternateName>Marc Antoine Zimara</alternateName>
>>     <alternateName>M. Anto. Zimare</alternateName>
>>     <alternateName>Marco A. Zimara</alternateName>
>>     <alternateName>Marcus A. Zimara</alternateName>
>>     <alternateName>Marc Ant. Zimara</alternateName>
>>     <alternateName>Marcantonio Zimara</alternateName>
>>     <alternateName>Marcus Antonius Zimara</alternateName>
>>     <alternateName>Marcus Antonius Zimarra</alternateName>
>>     <alternateName>Marcianto Zimare</alternateName>
>>     <alternateName>Marco Antonio Zimarra</alternateName>
>>     <alternateName>Marco Antonio Zimare</alternateName>
>>     <birthDate>1460</birthDate>
>>     <deathDate>1532</deathDate>
>>     <description>JWO</description>
>>     <sortableName>Zimara, Marcanton </sortableName>
>> </person>
>>
>> Article XML:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <TEI xmlns="http://www.tei-c.org/ns/1.0">
>>     <teiHeader>
>>         <fileDesc>
>>             <titleStmt>
>>                 <title>a nihilo nihil fit</title>
>>                 <author>
>>                     <persName ref="/db/projects/jwo/data/lists/personenListe.xml#BS_d1e509"
>>                         xml:id="author_BS_d1e509">
>>                         <forename>Marcanton</forename>
>>                         <surname>Zimara</surname>
>>                     </persName>
>>                 </author>
>>             </titleStmt>
>>             <sourceDesc>
>>                 <p xml:id="p_sourceDesc_igw_tvr_pzb">born digital</p>
>>             </sourceDesc>
>>         </fileDesc>
>>     </teiHeader>
>>     <text xml:lang="de-DE" type="main">
>>         <body>
>>             <div1 xml:id="div1_d1e23_2">
>>                 <p xml:id="p_d1e27_1" n="1"> Lorem ipsum dolor sit
>>                 amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt
>>                 ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
>>                 nostrud exercitation <persName xml:id="persName_sa123"
>>                 ref="https://d-nb.info/gnd/120156784"
>>                 rend="smallcaps">Zimara</persName> ullamco laboris nisi ut aliquip ex
>>                 ea commodo consequat. Duis aute irure dolor in reprehenderit in
>>                 voluptate velit esse cillum dolore eu <persName
>>                 xml:id="persName_s123"
>>                 ref="https://d-nb.info/gnd/120156784">Zimara</persName> fugiat nulla
>>                 pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
>>                 culpa qui officia deserunt mollit anim id est laborum. </p>
>>             </div1>
>>         </body>
>>     </text>
>> </TEI>
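For orientation, a small join over the two samples above: given a person record, find the TEI articles whose persName/@ref points at the person's preferred GND URI. The collection paths are invented for illustration; note the general comparison '=', which avoids the per-item string() calls Boris comments on above:

    xquery version "3.1";

    declare namespace tei = "http://www.tei-c.org/ns/1.0";

    let $persons  := collection('/db/projects/jwo/data/persons')   (: illustrative :)
    let $articles := collection('/db/projects/jwo/data/articles')  (: illustrative :)
    let $person   := $persons/person[@xml:id = 'i0c9ab7e2-2e21-39ff-aea8-c56ad4702a7f']
    let $gndUri   := $person/identifier[@preferred = 'YES']/string()
    (: every article that references this person by GND URI :)
    return $articles/tei:TEI[.//tei:persName/@ref = $gndUri]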
>> Collection.xconf:
>>
>> <collection xmlns="http://exist-db.org/collection-config/1.0">
>>     <index xmlns:gndo="https://d-nb.info/standards/elementset/gnd#"
>>         xmlns:owl="http://www.w3.org/2002/07/owl#"
>>         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>         xmlns:xs="http://www.w3.org/2001/XMLSchema">
>>         <lucene>
>>             <module uri="http://place.sok.org/xquery/index-persons"
>>                 prefix="ip"
>>                 at="xmldb:exist:///db/apps/sok-application/modules/index-persons.xqm"/>
>>             <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
>>             <analyzer class="org.exist.indexing.lucene.analyzers.NoDiacriticsStandardAnalyzer"
>>                 id="nodiacritics"/>
>>             <text qname="adcache">
>>                 <field name="basicId" expression="//person/@id/string()"/>
>>                 <field name="fullname" expression="string(./basic/person/name)"/>
>>                 <field name="gndURI" expression="string(./basic/person/identifier[@preferred eq 'YES'])"/>
>>                 <field name="gndID" expression="substring-after(./basic/person/identifier[@preferred eq 'YES']/string(), '/gnd/')"/>
>>                 <field name="status" expression="./basic/person/@status/string()"/>
>>                 <field name="articleID" expression="ip:getArticleFromPersonCache(.)"/>
>>                 <field name="articleRole" expression="ip:getArticleRoleFromPersonCache(.)"/>
>>                 <field name="fulltext" expression="ip:getFullText(.)"/>
>>             </text>
>>         </lucene>
>>     </index>
>> </collection>
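A rough reading of why this configuration re-indexes slowly: the articleID, articleRole and fulltext fields each call a module function that scans the article collection with regular expressions over concatenated strings, and field expressions run once per person document on every store and on every re-index. Back-of-envelope, using the thread's own numbers and assuming each function visits each article once:

    (: hypothetical cost estimate, in XQuery for concreteness :)
    let $persons  := 687   (: person documents, from the thread :)
    let $articles := 400   (: article documents, from the thread :)
    let $fields   := 3     (: articleID, articleRole, fulltext :)
    return $persons * $fields * $articles
    (: = 824,400 article-document visits per full re-index, before any
       regex or string-join work inside each visit :)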
>> Module Functions:
>>
>> module namespace ip = "http://place.sok.org/xquery/index-persons";
>>
>> declare namespace basic = "http://place.sok.org/xquery/basic";
>> declare namespace xs = "http://www.w3.org/2001/XMLSchema";
>> declare namespace tei = "http://www.tei-c.org/ns/1.0";
>> declare namespace util = "http://exist-db.org/xquery/util";
>>
>> declare function ip:getArticleFromPersonCache($adcache as element()) as xs:string* {
>>     let $parentCollectionPath as xs:anyURI? := ip:getParentCollection($adcache),
>>         $basicId as xs:string := $adcache/basic/person/@id/string(),
>>         $identifier as xs:string? := $adcache/basic/person/identifier[@preferred eq 'YES']/string(),
>>         $listId as xs:string? := collection( $variables:jwo-lists-path )/tei:TEI//tei:person[
>>             basic:basic-id-from-url(string(@sameAs)) eq $basicId]/@xml:id/string(),
>>         $foundInDocumentIds as xs:string* :=
>>             if ( matches($parentCollectionPath,'prepublish') )
>>             then (
>>                 collection($parentCollectionPath)/tei:TEI[./tei:teiHeader//tei:idno[1]/string() ne ''][matches(replace((normalize-space('||'||string-join(distinct-values(.//tei:persName[@ref]/@ref/string()) ! replace(.,'.*?#',''), '||')||'||')||normalize-space('||'||string-join(distinct-values(.//tei:persName[@source]/@source/string()) ! replace(.,'.*?#',''), '||')||'||')),'\|{4}',''),'\|{2}('||$basicId||'|'||$listId||'|'||$identifier||')\|{2}')]//tei:idno/string()
>>             )
>>             else (
>>                 collection($parentCollectionPath)/tei:TEI[./tei:teiHeader//tei:idno[1]/string() ne ''][matches(replace((normalize-space('||'||string-join(distinct-values(.//tei:persName[@ref][not(parent::editor)]/@ref/string()) ! replace(.,'.*?#',''), '||')||'||')||normalize-space('||'||string-join(distinct-values(.//tei:persName[@source][not(parent::editor)]/@source/string()) ! replace(.,'.*?#',''), '||')||'||')),'\|{4}',''),'\|{2}('||$basicId||'|'||$listId||'|'||$identifier||')\|{2}')]//tei:idno/string()
>>             )
>>     return (
>>         $foundInDocumentIds
>>     )
>> };
>>
>> declare function ip:getAuthenticatedArticleCollection($collection-name as xs:string) as item()* {
>>     if ($collection-name eq 'prepublish')
>>     then xmldb:xcollection($variables:jwo-prepublish-path)
>>     else xmldb:xcollection($variables:jwo-publish-path)
>> };
>>
>> declare function ip:getPersNamesInCollectionFromCachedPerson($cached-person as element(),
>>         $collection-name as xs:string) as element()* {
>>     let $basicId := $cached-person/basic/person/@id/string()
>>     let $identifier := $cached-person/basic/person/identifier[@preferred eq 'YES']/string()
>>     let $listId := collection( $variables:jwo-lists-path
>>         )/tei:TEI/tei:text[1]/tei:body[1]/tei:listPerson[1]/tei:person[
>>         basic:basic-id-from-url(string(@sameAs)) eq $basicId]/@xml:id/string()
>>     let $collection := ip:getAuthenticatedArticleCollection($collection-name)
>>     return (
>>         $collection//tei:persName[
>>             string(@ref) eq $identifier
>>             or ip:getIdFromUri(string(@ref)) eq $listId
>>             or substring-after(string(@ref), '#') eq $basicId
>>             or substring-before(substring-after(string(@source), 'persons/'), '.xml') eq $basicId]
>>     )
>> };
>>
>> declare function ip:getRoleFromPersName($persName as element(), $collection-name as xs:string) as xs:string? {
>>     if ($persName/ancestor::*/local-name() = 'author')
>>     then ( 'author' )
>>     else if ($persName/ancestor::*/local-name() = 'editor')
>>     then (
>>         if ($collection-name eq 'prepublish') then ( 'editor' )
>>         (: Ignore editors in published case :)
>>         else ()
>>     )
>>     else ( 'annotated' )
>> };
>>
>> declare function ip:getArticleRoleFromPersonCache($cached-person as element(),
>>         $collection-name as xs:string) as xs:string* {
>>     let $allPersNames :=
>>         if ($collection-name ne 'prepublish') then (
>>             ip:getPersNamesInCollectionFromCachedPerson($cached-person,
>>                 $collection-name)[not(ancestor::*/local-name() = 'editor')]
>>         ) else (
>>             ip:getPersNamesInCollectionFromCachedPerson($cached-person, $collection-name)
>>         )
>>     return (
>>         for $articleGroup in $allPersNames
>>         let $articleID := $articleGroup/ancestor::tei:TEI//tei:idno[1]
>>         group by $articleID
>>         return (
>>             $articleID || '@@' || string-join(distinct-values(
>>                 for $persName in $articleGroup
>>                 let $role := ip:getRoleFromPersName($persName, $collection-name)
>>                 order by $role
>>                 return $role
>>             ), ' ')
>>         )
>>     )
>> };
>>
>> declare function ip:getParentCollection($element as node()) as xs:anyURI? {
>>     resolve-uri('../../', $element/base-uri())
>> };
>>
>> declare function ip:getIdFromUri($uri as xs:string) as xs:string {
>>     substring-after($uri, '#')
>> };
>>
>> declare function basic:basic-id-from-url($url as xs:string) as xs:string? {
>>     substring-after(substring-before($url, '?dataset'),'persons/')
>> };
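Following Boris's point about avoiding explicit string() conversion, a hedged sketch of how the persName lookup above could collapse into a single sequence comparison. The local:* names are invented for illustration, and this is untested against the project's data:

    xquery version "3.1";

    declare namespace tei = "http://www.tei-c.org/ns/1.0";

    (: collect, once, the identifiers under which a person may be referenced :)
    declare function local:person-keys($adcache as element()) as xs:string* {
        $adcache/basic/person/identifier[@preferred eq 'YES']/string()
        (: list ids and other aliases could be appended here :)
    };

    declare function local:persNames($adcache as element(), $articles as node()*) as element()* {
        let $keys := local:person-keys($adcache)
        (: '=' compares against the whole key sequence at once and can use
           a range index on @ref; fragment-style refs such as "...#BS_d1e509"
           would still need ends-with() and stay expensive :)
        return $articles//tei:persName[@ref = $keys]
    };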
>> declare function ip:getFullText($element) as xs:string {
>>     let $parentCollection as xs:anyURI? := ip:getParentCollection($element)
>>     return (
>>         normalize-space(string-join(
>>             let $basicId as xs:string := $element/basic/person/@id/string(),
>>                 $identifier as xs:string* := $element/basic/person/identifier[@preferred eq 'YES']/string(),
>>                 $listId as item()* := collection( $variables:lists-path
>>                     )/tei:TEI/tei:text[1]/tei:body[1]/tei:listPerson[1]/tei:person[
>>                     basic:basic-id-from-url(string(@sameAs)) eq $basicId]/@xml:id/string(),
>>                 $element-string as xs:string* := string($element),
>>                 $collections as item()* := collection($parentCollection)//tei:persName[string(@ref) eq
>>                     $identifier or ip:getIdFromUri(string(@ref)) eq $listId or
>>                     substring-after(string(@ref), '#') eq $basicId or
>>                     substring-before(substring-after(string(@source), 'persons/'), '.xml') eq $basicId][1],
>>                 $element-cache-string as xs:string* := string-join(for $found-element in $collections
>>                     where count($found-element) > 0 return $found-element, ' ')
>>             return (
>>                 $element-string, $element-cache-string
>>             ), ' '))
>>     )
>> };
>>
>> Please help.

--
Lars Scheideler
- research technical staff member -
Althochdeutsches Wörterbuch & Digital Humanities

Sächsische Akademie der Wissenschaften zu Leipzig
Karl-Tauchnitz-Straße 1
04107 Leipzig

sch...@sa...
www.saw-leipzig.de
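On the xml:id/id() route Lars mentions above: a minimal sketch, assuming the person documents keep their @xml:id as in the sample (path illustrative; untested). The caveat from the thread stands: during a full re-index such lookups may run before the target documents are available again.

    xquery version "3.1";

    (: fn:id() resolves xml:id values through eXist-db's structural
       machinery rather than through a configured range index :)
    let $person := collection('/db/projects/jwo/data/persons')
                   /id('i0c9ab7e2-2e21-39ff-aea8-c56ad4702a7f')
    return $person/name/string()

|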