I am not sure that there is any simple solution to the compression
problem I explain below, but maybe...
I am using eXist to store 150 MB of XML in UTF-8 on a CD-ROM.
I currently get a 2.67 expansion factor.
I use the native backend and I have set the compress option to "true" in my
configuration file.
I'd like to reduce this expansion factor, as I also have to store 400 MB
of images on the CD.
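To make the space problem concrete, here is a quick back-of-the-envelope check (a sketch; the 700 MB CD capacity is my assumption, not stated above):

```python
# Back-of-the-envelope space budget for the CD (sizes in MB).
xml_size = 150          # raw XML in UTF-8
expansion = 2.67        # observed .dbx expansion factor
images = 400            # images to store alongside the database
cd_capacity = 700       # assumed capacity of a standard CD-ROM

dbx_size = xml_size * expansion
total = dbx_size + images
print(f".dbx files: {dbx_size:.1f} MB, total: {total:.1f} MB")
print(f"over budget by {total - cd_capacity:.1f} MB on a {cd_capacity} MB disc")
```

So at the current expansion factor the database alone already eats roughly 400 MB, which together with the images overshoots a standard CD by about 100 MB.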
So, is it possible to compress the .dbx files further, while eXist can still
use them in read-only mode?
Are the .dbx file sizes sensitive to the length of the XML tag and
attribute names?
thanks for any help,
From: Wolfgang Meier <meier@if...> - 2002-03-11 12:28:20
> I am using eXist to store 150 MB of XML in UTF-8 on a CD-ROM.
> I currently get a 2.67 expansion factor.
> I use the native backend and I have set the compress option to "true" in my
> configuration file.
Indexing usually needs a lot of disk space. Most pages are used by the
B+-trees, and for a B+-tree, pages are only guaranteed to be at least half
full. I still have some ideas for reducing disk usage (e.g. by using fewer
bytes for the keys), but I think we will never achieve an expansion factor
better than 2.
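The half-full guarantee alone explains much of the overhead: pages filled to only 50% need twice as many pages as perfectly packed ones. A small illustrative sketch (the page size and key volume are invented numbers, not eXist's actual internals):

```python
# Illustrative only: how B+-tree page fill factor inflates index size.
# Page size and key volume are assumed values, not eXist internals.
PAGE_SIZE = 4096            # bytes per page (assumed)
key_bytes = 10_000_000      # total bytes of keys to store (assumed)

def pages_needed(fill_factor):
    """Pages required if every page is filled to `fill_factor`."""
    usable = int(PAGE_SIZE * fill_factor)
    return -(-key_bytes // usable)       # ceiling division

best = pages_needed(1.0)    # perfectly packed pages
worst = pages_needed(0.5)   # minimum guaranteed fill: half full
print(best, worst, worst / best)         # worst case is ~2x the packed size
```

This is why a worst-case expansion factor around 2 is a hard floor for this page layout, regardless of how compactly individual keys are encoded.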
Yet one way to reduce storage size would be to restrict fulltext indexing to
distinct parts of your documents. Thus I ported some of the code I had already
written for the relational backend to the native backend. The native backend
now processes indexing information it finds in conf.xml, e.g.:
<index doctype="PLAY" default="none" attributes="false">
This way I've been able to reduce the size of words.dbx for some sample
collections by half, leaving out attributes and rarely used elements. You can
still query non-indexed elements with =, contains() or starts-with(), but
fulltext functions will not find any results.
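For example, against a PLAY document one could still run plain XPath string comparisons on non-indexed elements (the element names below are hypothetical, in the style of a Shakespeare DTD):

```
//SPEECH[contains(LINE, 'love')]       (substring comparison, works unindexed)
//SPEECH[starts-with(LINE, 'O ')]      (likewise, no words.dbx entry needed)
```

These presumably fall back to scanning the stored nodes, so they still work without index entries, just without the speed-up the fulltext index provides.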
I put my latest development version into CVS (accessible via SourceForge);
the current CVS module is called eXist-0.8. Hope it works (I'll be on holiday
for the next 7 days).