Re: [Exist-open] Backup of collections with more than 1000000 sub-collections.

Now, with only 50,000 collections and XML files and no updates at all, the incremental backup file is 60 MB, while the full backup file is only 178 MB.

And I have a question: if I make a full backup on 2013-06-20 and then an incremental backup every day, do I need to use the full backup file plus all of the incremental backup files when I restore?
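For reference, in a conventional full-plus-incremental scheme the restore set is the most recent full backup taken on or before the target date plus every incremental taken after it, applied oldest first. The sketch below shows only that selection logic. The `Backup` type and `restore_chain` function are hypothetical, and whether eXist-db's own restore requires exactly this chain is an assumption, not something confirmed in this thread.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Backup:
    # Hypothetical description of one backup archive.
    taken: date
    kind: str  # "full" or "incremental"


def restore_chain(backups: List[Backup], target: date) -> List[Backup]:
    """Return the backups to apply, oldest first, to restore the state at `target`."""
    # Only backups taken on or before the target date are usable.
    candidates = sorted((b for b in backups if b.taken <= target), key=lambda b: b.taken)
    # Locate the most recent full backup among them.
    last_full = None
    for i, b in enumerate(candidates):
        if b.kind == "full":
            last_full = i
    if last_full is None:
        raise ValueError("no full backup on or before the target date")
    # The full backup, then every later incremental in date order.
    return [candidates[last_full]] + [
        b for b in candidates[last_full + 1:] if b.kind == "incremental"
    ]
```

Under this assumption, restoring the state of 2013-06-23 from a full backup taken on 2013-06-20 needs four archives (the full one plus three incrementals); taking a fresh full backup periodically keeps the chain short.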

--

Best regards,

easy

"Do not fear that on the road ahead you will find no friends; who under heaven does not know you?"

At 2013-06-30 00:03:26, easy <li...@12...> wrote:

Hi Jens,

Thanks.

I described my application scenario on the list before. I plan to manage residents' electronic health records (EHRs) for a city with more than 1,000,000 people. Because there are so many people and each person has more than 100 files, I had no good way to organize the database structure, and this approach looked like an option.
I plan to map each person's ID to a collection path. For example, for a man with ID 510210195502043434, I put all his EHR files into /db/510210/195502/043434. When I want to query this man's records by ID, I only need to query collection('/db/510210/195502/043434'), so I expect the performance to be good. It also keeps any single collection from holding too many files (if one collection held 1,000,000 files, what would happen to queries? To reindexing?)
So...
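A minimal sketch of the ID-to-path mapping described above, assuming the 6/6/6 split used in the example ID. `id_to_collection` is a hypothetical helper; it only checks the length and does not validate the ID's contents.

```python
def id_to_collection(person_id: str, root: str = "/db") -> str:
    """Map an 18-character ID to a three-level collection path (6/6/6 split)."""
    if len(person_id) != 18:
        raise ValueError("expected an 18-character ID")
    # Split into three 6-character path segments, as in the example.
    parts = [person_id[0:6], person_id[6:12], person_id[12:18]]
    # Drop any trailing slash on the root so segments join cleanly.
    return "/".join([root.rstrip("/")] + parts)
```

The returned path would be the argument to `collection()` in an XQuery, so each person's query touches only that person's subtree.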

Because the data is important, I need to back it up every day. Because the database is large, I first create a full backup and then an incremental backup every day, but I found that the incremental backup file is more than 1 GB even when there are few updates to the database.

--

Best regards,

At 2013-06-29 23:22:07,"Jens Østergaard Petersen" <oe...@gm...> wrote:
Hi Xiaodong,

I think everyone can see your problem, but can you describe a restore process that does not have __contents__.xml when collections or resources have not been changed? I am not saying that this is impossible, but a restore process like that will be a lot more complicated than the present restore process. Perhaps things as important as backup and restore have to be kept simple, even if the backups take up a lot of space?
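One way to picture the extra complexity Jens describes: if an incremental backup omitted descriptors for unchanged collections, the restore would have to look for each collection's descriptor in the newest backup first and fall back through older ones, and deletions would need an explicit marker to be told apart from "unchanged". The sketch models each backup as a hypothetical dictionary from collection path to descriptor text, with `None` as an assumed deletion marker; it is not how eXist-db's restore works.

```python
from typing import Dict, List, Optional

# Hypothetical model: collection path -> descriptor text, or None to mark a deletion.
BackupContents = Dict[str, Optional[str]]


def resolve_descriptor(chain: List[BackupContents], path: str) -> Optional[str]:
    """Walk from the newest backup back to the full one; return the first entry found.

    Returns None if the collection was deleted or never backed up.
    """
    for backup in reversed(chain):
        if path in backup:
            return backup[path]
    return None
```

Every restore of such a scheme needs the whole chain available, which is part of why the current "always write every descriptor" approach is simpler.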

I do not know the structure of your data, but perhaps it is not necessary for one person to have one collection?

One thing that is odd is that (as can be seen below) for each collection and resource listed in __contents__.xml both a "name" and a "filename" are given. These are not identical, since "name" is percent escaped and "filename" is not. Having only one of these will not solve your problem, but I don't see why this information has to be duplicated.
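If "name" is simply the percent-encoded form of "filename", one attribute can be derived from the other. A sketch assuming ordinary URL percent-encoding as done by Python's `urllib.parse.quote`; eXist-db's exact escaping rules are not confirmed here, and both helper names are hypothetical.

```python
from urllib.parse import quote, unquote


def name_from_filename(filename: str) -> str:
    # Assumption: "name" is the percent-encoded form of "filename".
    # safe="" also escapes "/", since a single entry name should not contain path separators.
    return quote(filename, safe="")


def filename_from_name(name: str) -> str:
    # Inverse of the above: decode percent escapes back to the raw file name.
    return unquote(name)
```

For plain names such as A0201 both attributes come out identical, which matches the example below.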

Jens

On Jun 29, 2013, at 9:13 AM, easy <li...@12...> wrote:

Here is an example __contents__.xml:

<collection xmlns="http://exist.sourceforge.net/NS/exist" name="/db/XMLDB/130635/19780718/1468" version="1" owner="admin" group="dba" mode="755" created="2013-04-10T08:00:02.15+08:00">
    <acl entries="0" version="1"/>
    <subcollection name="A0201" filename="A0201"/>
    <subcollection name="A0202" filename="A0202"/>
    <subcollection name="A0203" filename="A0203"/>
    <subcollection name="A0204" filename="A0204"/>
    <subcollection name="A0103" filename="A0103"/>
    <subcollection name="B0006" filename="B0006"/>
    <subcollection name="B0001" filename="B0001"/>
    <subcollection name="B0002" filename="B0002"/>
    <subcollection name="A0402" filename="A0402"/>
    <subcollection name="B0004" filename="B0004"/>
    <subcollection name="B0003" filename="B0003"/>
    <subcollection name="A0401" filename="A0401"/>
    <subcollection name="A0301" filename="A0301"/>
    <subcollection name="A0302" filename="A0302"/>
</collection>
--------------------
It only records the creation time and version of the collection "/db/XMLDB/130635/19780718/1468"; there is no modification time. If an incremental backup runs every hour and this collection is never updated, each incremental backup still includes this file, identical every time. Is that necessary?
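The descriptor above carries a `created` attribute but no modification time, so a backup tool that wanted to detect changes would have to compare content instead. A sketch that reads the subcollection entries (using the namespace from the example) and hashes them in sorted order, so that reordering alone does not count as a change. The helper names and the choice of SHA-256 over subcollection names only are assumptions for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import List

# Namespace taken from the example __contents__.xml above.
NS = {"ex": "http://exist.sourceforge.net/NS/exist"}


def subcollection_names(xml_text: str) -> List[str]:
    """Return the subcollection names in document order."""
    root = ET.fromstring(xml_text)
    return [sc.get("name") for sc in root.findall("ex:subcollection", NS)]


def descriptor_digest(xml_text: str) -> str:
    """Hash the sorted subcollection names so that ordering alone is not a change."""
    names = sorted(subcollection_names(xml_text))
    return hashlib.sha256("\n".join(names).encode("utf-8")).hexdigest()
```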

--

Best regards,

easy

At 2013-06-29 15:05:47, easy <li...@12...> wrote:

I don't understand why a __contents__.xml has to be added for every collection, even when neither the collection nor its child collections have been updated.
For a database with a large number of collections, this puts a huge number of otherwise empty __contents__.xml files (containing just the list of child collections) into the incremental backup. Moreover, does every collection really need to be created in the backup file even if it has not been updated?

So I think the key issue is the method used to check whether anything differs from the previous backup. The current implementation wastes a lot of disk space when there is a large number of collections.
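A sketch of the kind of check suggested here: remember a digest of each collection's descriptor from the previous run and write only descriptors whose digest changed. The function `plan_incremental` and the in-memory digest map are hypothetical; the digests from the previous run would have to be persisted somewhere between runs, and this sketch does not handle deleted collections.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def plan_incremental(
    current: Iterable[Tuple[str, str]],
    previous_digests: Dict[str, str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (descriptors to write, digests to remember for the next run)."""
    to_write: Dict[str, str] = {}
    digests: Dict[str, str] = {}
    for path, descriptor in current:
        digest = hashlib.sha256(descriptor.encode("utf-8")).hexdigest()
        digests[path] = digest
        # New collections have no previous digest, so they are always written.
        if previous_digests.get(path) != digest:
            to_write[path] = descriptor
    return to_write, digests
```

With a million unchanged collections this writes no descriptors at all, at the cost of keeping one digest per collection between runs.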

--

Best regards,

easy

At 2013-06-28 20:37:37,"Dmitriy Shabanov" <sha...@gm...> wrote:

I didn't manage to check, but I think that this __contents__.xml must carry this information, because the next backup run needs to check whether anything differs from the previous one.

The only alternative I see is an "archive" flag... any other ideas?
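A minimal sketch of the "archive" flag idea, modeled on the archive attribute of DOS/Windows file systems: every create or update sets a per-collection flag, and an incremental backup writes only flagged collections and then clears the flags. The `Store` and `Collection` classes are hypothetical and not an eXist-db API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Collection:
    path: str
    children: List[str] = field(default_factory=list)
    archive: bool = True  # hypothetical "needs backup" flag


class Store:
    def __init__(self) -> None:
        self.collections: Dict[str, Collection] = {}

    def put(self, path: str, children: List[str]) -> None:
        # Any create or update marks the collection for the next backup.
        self.collections[path] = Collection(path, list(children), archive=True)

    def incremental_backup(self) -> List[str]:
        """Return the paths of flagged collections and clear their flags."""
        flagged = sorted(p for p, c in self.collections.items() if c.archive)
        for p in flagged:
            self.collections[p].archive = False
        return flagged
```

In a real system the flags should be cleared only after the backup archive has been written successfully; this sketch clears them immediately for brevity.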

On Fri, Jun 28, 2013 at 2:59 PM, easy <li...@12...> wrote:

I found a problem with backup. I have an eXist-db database with more than 1,000,000 collections (how else would you store everyone's EHR in a large city?). I set up a backup job with incremental backup enabled, but I found that the backup file includes a __contents__.xml for every collection even when the collection has not been updated, so the incremental backup file is always more than 100 MB (just the collection listings in __contents__.xml).
So the backup method needs to consider how to handle a database or collection with a large number of child collections. Is that right?

--
Dmitriy Shabanov

_______________________________________________
Exist-open mailing list
Exi...@li...
https://lists.sourceforge.net/lists/listinfo/exist-open
