Re: [Exist-open] Organizing large collection

Hi all,

If you do a lot of element lookups against a highly structured set of
collections (for example, .../0/0, .../0/1 for storing files under uuids),
you end up having to put a read lock on every subcollection in turn
(256+16+1=273 collections in the uuid case) for each search.
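
To make that arithmetic concrete, here is a rough sketch. It assumes a
hypothetical root collection /db/files with one hex digit per level, two
levels deep, as in the .../0/0, .../0/1 layout. The root path and variable
names are illustrative only. It just enumerates the collection paths a
search would have to walk; it does not inspect any locks.

```xquery
xquery version "3.0";

(: Hypothetical root collection; not taken from a real database. :)
let $root := '/db/files'
(: One sub-collection per hex digit at each level. :)
let $hex := ('0','1','2','3','4','5','6','7',
             '8','9','a','b','c','d','e','f')
(: First level: /db/files/0 ... /db/files/f :)
let $level1 := for $a in $hex return $root || '/' || $a
(: Second level: /db/files/0/0 ... /db/files/f/f :)
let $level2 := for $a in $level1, $b in $hex return $a || '/' || $b
return
    <collections root="1"
                 level1="{count($level1)}"
                 level2="{count($level2)}"
                 total="{1 + count($level1) + count($level2)}"/>
```

The total attribute comes out as 1 + 16 + 256 = 273, matching the count
above.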

Best regards, -- Jo

On Tue, May 20, 2025 at 9:17 AM Alberto Simões <has...@gm...> wrote:

> Hello, Michael
>
> Thanks for sharing your use case.
> Indeed, it might prove useful.
> Thanks
>
> On Mon, May 19, 2025 at 2:23 PM Michael Westbay <
> wes...@ja...> wrote:
>
>> Hi Alberto,
>>
>> For me, splitting them makes them more manageable when I am going through
>> a given collection with a WebDAV editor.
>>
>> For example, I have a database of baseball players. The XML file for a
>> given player is named in the format "surname-givenname.xml". I sort them
>> under the persons collection as:
>>
>> [image: image.png]
>>
>> Each first letter is divided into two- or three-letter sub-collections. I
>> try to keep each to around 100 names, but as the database grows, some
>> have grown as large as 300 names. That usually means that I want to divide
>> them up some more. (The _ collection is for names in Kanji -- Japanese
>> characters.)
>>
>> I break them up because WebDAV is really slow when there
>> are a lot of files in a single collection. If I only processed the XML
>> files, it wouldn't be an issue. But I often go in and manually edit files,
>> so the hierarchy helps.
>>
>> A quick count of the number of players I have:
>>
>> xquery version "3.0";
>>
>> let $start-time := current-dateTime()
>> let $players := collection('/db/uni/persons')/*:person
>> let $count := count($players)
>> let $end-time := current-dateTime()
>>
>> return <result start-time="{$start-time}" end-time="{$end-time}"
>> count="{$count}"/>
>>
>> <result start-time="2025-05-19T22:20:24.288+09:00"
>> end-time="2025-05-19T22:20:24.288+09:00" count="43434"></result>
>>
>> Looks like it's pretty much instantaneous to get 43,434 players. In
>> reality, it took a couple of seconds to display the result.
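>>
>> A caveat on the timing above: fn:current-dateTime() is stable within a
>> single query execution, so start-time and end-time will always come out
>> identical regardless of how long the query takes. A rough sketch of a
>> variant follows, assuming eXist's util:system-dateTime() extension
>> function (which reads the system clock on each call) and assuming the
>> let clauses are evaluated in the order written:
>>
>> ```xquery
>> xquery version "3.0";
>>
>> (: util:system-dateTime() is eXist-specific, not standard XQuery. :)
>> let $start-time := util:system-dateTime()
>> let $players := collection('/db/uni/persons')/*:person
>> let $count := count($players)
>> let $end-time := util:system-dateTime()
>>
>> (: Subtracting two xs:dateTime values yields an xs:dayTimeDuration. :)
>> return <result elapsed="{$end-time - $start-time}" count="{$count}"/>
>> ```
>>
>> This measures only the query itself, not the time a client needs to
>> fetch and display the result.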
>>
>>
>> On Mon, May 19, 2025 at 20:12 Alberto Simões <has...@gm...> wrote:
>>
>>> Hello, Michael
>>>
>>> I cannot split them in a way that would let me specify different
>>> collection names. In that case, does splitting bring any additional value?
>>>
>>> Thanks
>>>
>>> On Mon, May 19, 2025 at 10:25 AM Michael Westbay <
>>> wes...@ja...> wrote:
>>>
>>>> Hi Alberto,
>>>>
>>>> collection("/db/records")/record will match all <record>...</record>
>>>> documents under /db/records and sub-folders (sub-collections?).
>>>>
>>>> If you can organize them by date (year sub-folders), including the year in
>>>> the collection parameter will mean fewer records to search. And all
>>>> sub-folders under that collection will still be included in the XPath
>>>> search.
>>>>
>>>>
>>>>
>>>> On Mon, May 19, 2025 at 17:23 Alberto Simões <has...@gm...> wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> Are there differences in terms of performance between having a large
>>>>> collection (150k docs) with or without a folder structure?
>>>>>
>>>>> I want to treat them as a single collection, but I don't know if it
>>>>> helps to have sub-collections to organise them, or if that is irrelevant to
>>>>> eXist.
>>>>>
>>>>> I appreciate any help you can provide.
>>>>> Alberto
>>>>>
>>>>> --
>>>>> Alberto Simões
>>>>> _______________________________________________
>>>>> Exist-open mailing list
>>>>> Exi...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/exist-open
>>>>>
>>>>
>>>>
>>>> --
>>>> Michael Westbay
>>>> Writer/System Administrator
>>>> http://www.japanesebaseball.com/
>>>>
>>>
>>>
>>> --
>>> Alberto Simões
>>>
>>
>>
>> --
>> Michael Westbay
>> Writer/System Administrator
>> http://www.japanesebaseball.com/
>>
>
>
> --
> Alberto Simões
>
