From: Jo C. <Jo....@ha...> - 2025-05-21 09:51:03
Hi all,

If you do a lot of element lookups against a highly structured set of collections (for example, .../0/0, .../0/1 for storing files under UUIDs), you end up having to put a read lock on every sub-collection in turn (256 + 16 + 1 = 273 collections in the UUID case) for each search.

Best regards,
--
Jo

On Tue, May 20, 2025 at 9:17 AM Alberto Simões <has...@gm...> wrote:
> Hello, Michael
>
> Thanks for sharing your use case.
> Indeed, it might be useful.
> Thanks
>
> On Mon, May 19, 2025 at 2:23 PM Michael Westbay <wes...@ja...> wrote:
>
>> Hi Alberto,
>>
>> For me, splitting them makes them more manageable when I am going through a given collection with a WebDAV editor.
>>
>> For example, I have a database of baseball players. The XML file for a given player is in the format "surname-givenname.xml". I sort them under the persons collection as:
>>
>> [image: image.png]
>>
>> Each first letter is divided into two- or three-letter sub-collections. I try to keep each to around 100 names, but as the database grows, some have grown as large as 300 names. That usually means that I want to divide it up some more. (The _ collection is for names in Kanji -- Japanese characters.)
>>
>> I break them up because WebDAV is really slow when there are a lot of files in a single collection. If I only processed the XML files, it wouldn't be an issue. But I often go in and manually edit files, so the hierarchy helps.
>>
>> A quick count of the number of players I have:
>>
>> xquery version "3.0";
>>
>> let $start-time := current-dateTime()
>> let $players := collection('/db/uni/persons')/*:person
>> let $count := count($players)
>> let $end-time := current-dateTime()
>>
>> return <result start-time="{$start-time}" end-time="{$end-time}" count="{$count}"/>
>>
>> <result start-time="2025-05-19T22:20:24.288+09:00" end-time="2025-05-19T22:20:24.288+09:00" count="43434"></result>
>>
>> It looks pretty much instantaneous to get 43,434 players. In reality, it took a couple of seconds to display the result.
>>
>>
>> On Mon, May 19, 2025 at 20:12, Alberto Simões <has...@gm...> wrote:
>>
>>> Hello, Michael
>>>
>>> I cannot split them in a way that lets me specify different collection names. In that case, does splitting bring any additional value?
>>>
>>> Thanks
>>>
>>> On Mon, May 19, 2025 at 10:25 AM Michael Westbay <wes...@ja...> wrote:
>>>
>>>> Hi Alberto,
>>>>
>>>> collection("/db/records")/record will match all <record>...</record> documents under /db/records and its sub-folders (sub-collections?).
>>>>
>>>> If you can organize them by date (year sub-folders), including that in the collection parameter will mean fewer records to search. And all sub-folders under that collection will still be included in the XPath search.
>>>>
>>>>
>>>> On Mon, May 19, 2025 at 17:23, Alberto Simões <has...@gm...> wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> Are there differences in terms of performance between having a large collection (150k docs) with or without a folder structure?
>>>>>
>>>>> I want to treat them as a single collection, but I don't know if it helps to have sub-collections to organise them, or if that is irrelevant to eXist.
>>>>>
>>>>> I appreciate any help you can provide.
>>>>> Alberto
>>>>>
>>>>> --
>>>>> Alberto Simões
>>>>> _______________________________________________
>>>>> Exist-open mailing list
>>>>> Exi...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/exist-open
>>>>
>>>> --
>>>> Michael Westbay
>>>> Writer/System Administrator
>>>> http://www.japanesebaseball.com/
>>>
>>> --
>>> Alberto Simões
>>
>> --
>> Michael Westbay
>> Writer/System Administrator
>> http://www.japanesebaseball.com/
>
> --
> Alberto Simões
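Jo's figure of 273 is the size of a two-level hexadecimal tree: 1 root + 16 first-level + 16 × 16 = 256 second-level collections. A minimal sketch for counting how many collections sit under a given root in eXist, i.e. how many collections a search over that whole tree spans. It assumes eXist's built-in `xmldb:get-child-collections()`, which returns child collection *names* rather than full paths; the function name `local:count-collections` and the path `/db/files` are hypothetical. On a fully populated two-level hex layout with nothing deeper, it should return 273.

```xquery
xquery version "3.1";

(: Counts $uri itself plus every descendant collection, recursively.
   xmldb:get-child-collections() returns names only, so the child path
   is rebuilt by appending '/' and the name. :)
declare function local:count-collections($uri as xs:string) as xs:integer {
    1 + sum(
        for $child in xmldb:get-child-collections($uri)
        return local:count-collections($uri || '/' || $child)
    )
};

(: Hypothetical root of a UUID-style .../0/0, .../0/1 layout :)
local:count-collections('/db/files')
```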
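The identical start and end times in Michael's result are expected: the XQuery Functions and Operators specification makes `fn:current-dateTime()` return the same value for the whole of a single query evaluation, so it cannot measure elapsed time. eXist's `util:system-dateTime()` is documented as not stable in that way. A minimal sketch of the same count with a measured duration, reusing Michael's `/db/uni/persons` path; the function name `local:timed-count` and the `elapsed` attribute are illustrative, not from the thread.

```xquery
xquery version "3.1";

declare function local:timed-count($path as xs:string) as element(result) {
    (: util:system-dateTime() is eXist-specific and re-reads the clock on
       each call, unlike fn:current-dateTime(). This assumes the let
       clauses are evaluated in order, which is eXist's usual behaviour. :)
    let $start := util:system-dateTime()
    let $count := count(collection($path)/*:person)
    let $end := util:system-dateTime()
    (: Subtracting two xs:dateTime values yields an xs:dayTimeDuration :)
    return <result count="{$count}" elapsed="{$end - $start}"/>
};

local:timed-count('/db/uni/persons')
```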
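Michael's point about `collection()` can be seen side by side: a call on a parent collection includes documents in every sub-collection, while passing a narrower path restricts the search to that branch only. A minimal sketch, assuming a hypothetical layout with year sub-collections such as `/db/records/2024`; the function `local:count-records` and the `counts`, `all` and `year-2024` element names are illustrative.

```xquery
xquery version "3.1";

(: Hypothetical layout: <record> documents split into year sub-collections
   such as /db/records/2024 and /db/records/2025. :)
declare function local:count-records($collection as xs:string) as xs:integer {
    (: collection() covers $collection and all of its descendant
       sub-collections :)
    count(collection($collection)/record)
};

<counts>
    <!-- every record under /db/records, whatever the year -->
    <all>{ local:count-records('/db/records') }</all>
    <!-- only the 2024 branch is searched -->
    <year-2024>{ local:count-records('/db/records/2024') }</year-2024>
</counts>
```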