From: Alberto S. <has...@gm...> - 2025-05-20 08:17:45
Hello, Michael

Thanks for sharing your use case. Indeed, it might be useful.

Thanks

On Mon, May 19, 2025 at 2:23 PM Michael Westbay <wes...@ja...> wrote:

> Hi Alberto,
>
> For me, splitting them makes them more manageable when I am going through
> a given collection with a WebDAV editor.
>
> For example, I have a database of baseball players. The XML file for a
> given player is in the format: "surname-givenname.xml." I sort them under
> the persons collection as:
>
> [image: image.png]
>
> Each first letter is divided into two or three letter sub-collections. I
> try to keep each to around 100 names, but as the database grows, some
> have grown as large as 300 names. That usually means that I want to divide
> it up some more. (The _ collection is for names in Kanji -- Japanese
> characters.)
>
> The reason I break them up is that WebDAV is really slow when there are
> a lot of files in a single collection. If I only processed the XML files,
> it wouldn't be an issue. But I often go in and manually edit files, so the
> hierarchy helps.
>
> A quick count of the number of players I have:
>
> xquery version "3.0";
>
> let $start-time := current-dateTime()
> let $players := collection('/db/uni/persons')/*:person
> let $count := count($players)
> let $end-time := current-dateTime()
>
> return <result start-time="{$start-time}" end-time="{$end-time}"
>     count="{$count}"/>
>
> <result start-time="2025-05-19T22:20:24.288+09:00"
>     end-time="2025-05-19T22:20:24.288+09:00" count="43434"></result>
>
> Looks like it's pretty much instantaneous to get 43,434 players. In
> reality, it took a couple of seconds to display the result.
>
>
> On Mon, May 19, 2025 at 20:12, Alberto Simões <has...@gm...> wrote:
>
>> Hello, Michael
>>
>> I cannot split them so that I can specify different collection names.
>> In that case, splitting does not bring any additional value?
>>
>> Thanks
>>
>> On Mon, May 19, 2025 at 10:25 AM Michael Westbay <wes...@ja...> wrote:
>>
>>> Hi Alberto,
>>>
>>> collection("/db/records")/record will match all <record>...</record>
>>> documents under /db/records and sub-folders (sub-collections?).
>>>
>>> If you can organize them by date (year sub-folders), including that in
>>> the collection parameter will mean fewer records to search. And all
>>> sub-folders under that collection will still be included in the XPath
>>> search.
>>>
>>>
>>> On Mon, May 19, 2025 at 17:23, Alberto Simões <has...@gm...> wrote:
>>>
>>>> Hello
>>>>
>>>> Are there differences in terms of performance between having a large
>>>> collection (150k docs) with or without a folder structure?
>>>>
>>>> I want to treat them as a single collection, but I don't know if it
>>>> helps to have sub-collections to organise them, or if that is
>>>> irrelevant to eXist.
>>>>
>>>> I appreciate any help you can provide.
>>>> Alberto
>>>>
>>>> --
>>>> Alberto Simões
>>>> _______________________________________________
>>>> Exist-open mailing list
>>>> Exi...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/exist-open
>>>>
>>>
>>> --
>>> Michael Westbay
>>> Writer/System Administrator
>>> http://www.japanesebaseball.com/
>>>
>>
>> --
>> Alberto Simões
>>
>
> --
> Michael Westbay
> Writer/System Administrator
> http://www.japanesebaseball.com/
>

--
Alberto Simões
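
A note on the timing query quoted in this thread: fn:current-dateTime() is stable within a single query execution, so both calls return the same value. That would explain why start-time and end-time are identical even though the result took a couple of seconds to appear. A minimal sketch of one way to measure elapsed time, assuming eXist-db's util:system-dateTime() (which reads the system clock on each call) and assuming the let clauses are evaluated in the order written; the collection path is the one from the thread:

```xquery
xquery version "3.1";

(: Assumption: eXist-db's built-in util module is available.
   util:system-dateTime() is not stable, unlike fn:current-dateTime(),
   so two calls in the same query can return different values. :)
import module namespace util = "http://exist-db.org/xquery/util";

let $start := util:system-dateTime()
(: Count every person document under the collection and its sub-collections. :)
let $count := count(collection('/db/uni/persons')/*:person)
let $end := util:system-dateTime()

(: Subtracting two xs:dateTime values yields an xs:dayTimeDuration. :)
return
    <result count="{$count}" elapsed="{$end - $start}"/>
```

The elapsed attribute covers only the count itself, not serialization or the time the client takes to display the result.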