From: Hungerburg <pc...@my...> - 2013-06-30 10:13:03
On 2013-06-30 09:41, Jens Østergaard Petersen wrote:
> Since you have one million files, I would divide them into 256
> collections (00 to FF) and store the files in them according to the
> first characters of their UUID (< 4,000 in each). Then you only have
> to worry about 256 __contents__.xml files!

Regarding binary data: eXist is a database on top of another database
(the filesystem), isn't it? Now, if each of one million records has more
than 100 files, some or many of them non-XML data, then organising the
catalogue into 256 bins would put hundreds of thousands of files in each
directory. In my view, a number like that calls for a filesystem
benchmark, or at the very least the underlying filesystem should be
chosen carefully.

> On Jun 29, 2013, at 6:03 PM, easy <li...@12...> wrote:
>
>> Hi Jens,
>>
>> Thanks. I have described my application scenario here before. I plan
>> to manage residents' electronic health records for a city with more
>> than 1,000,000 people. Because the number of people is large, and
>> everyone has more than 100 files, I had no good way to organize the
>> database structure, and I found this approach to be an option.
>> I plan to map each person's ID onto a collection path. For example,
>> for a man with ID 510210195502043434, I put all his EHR files into
>> /db/510210/195502/043434. Then when I want to query this man's
>> records by ID, I only have to query
>> collection('/db/510210/195502/043434'), so I think the performance
>> will be good, and no single collection will hold too many files.
>> (If there were 1,000,000 files in one collection, what would happen
>> to queries and reindexing?)
>>
>> Because the data is important, I need to back it up every day.
>> Because it is large, I first create a full backup and then an
>> incremental backup every day, but I found that the incremental
>> backup file is more than 1 GB even when there are few updates to
>> the database.
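For illustration, the two layout schemes discussed above can be sketched in a few lines of Python. This is a minimal sketch, not anything from eXist itself: the function names, the `/db/records/` prefix for the UUID bins, and the 6-digit path segments are my assumptions based on the thread.

```python
import uuid

def uuid_bin(resource_uuid: str) -> str:
    """Jens's suggestion: shard by the first two hex characters of the
    resource UUID, yielding 256 bins named 00 .. ff.
    (The '/db/records/' prefix is a hypothetical example.)"""
    return "/db/records/" + resource_uuid[:2].lower()

def id_path(person_id: str) -> str:
    """easy's scheme: split the 18-digit resident ID into three
    6-digit segments and use them as nested collection names."""
    assert len(person_id) == 18 and person_id.isdigit()
    return "/db/{}/{}/{}".format(
        person_id[:6], person_id[6:12], person_id[12:])

# The example ID from the thread maps to /db/510210/195502/043434:
print(id_path("510210195502043434"))

# A random UUID lands in one of 256 bins:
print(uuid_bin(str(uuid.uuid4())))
```

Note the trade-off the thread is circling: the UUID scheme caps the collection count at 256 but lets each bin grow large, while the ID scheme keeps each leaf collection small at the cost of very many intermediate collections.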