From: Bryan T. <br...@sy...> - 2006-07-23 22:31:38
Jason,

This sounds like it is quite achievable. I would use a single store file to contain the repository records and the indices. It sounds like you have several indices based on the metadata in the repository records. That is fine - you can create multiple indices over the same records and store the records and the indices in a single file. Please note that jdbm does not maintain the relationship between the metadata records and the indices -- that is something that you would have to do yourself on insert, update, and delete.

Based on what you have outlined, it seems that there is a short-term requirement to have the files exist outside of the store and to use the metadata records and indices to avoid filename length problems (reminds me of the browser URL limitation workaround ;-) and for fast access. In the longer term, it might make sense to store the files themselves within the store. While jdbm is not well-oriented to this last task (storing the files in the store) today, that is something on my short list.

Thanks,

-bryan

-----Original Message-----
From: jdb...@li... [mailto:jdb...@li...] On Behalf Of Jason Dillon
Sent: Sunday, July 23, 2006 1:27 PM
To: Bryan Thompson
Cc: jdb...@li...; 'Trygve Laugstøl'
Subject: Re: [Jdbm-general] Virtual file system based on JDBM

The initial idea was to use JDBM to avoid the really long paths on Windows that sometimes happen with the Maven 2 repository format.

For example, Apache Geronimo has a sample app called DayTrader, which uses this groupId:

    org.apache.geronimo.samples.daytrader

And then artifactIds like:

    daytrader-wsappclient

Say we are using 1.2-SNAPSHOT for version and jar for type, then to put into a Maven 2 repository, this particular artifact would need to be put into a file named:

    org/apache/geronimo/samples/daytrader/daytrader-wsappclient/1.2-SNAPSHOT/daytrader-wsappclient-1.2-SNAPSHOT.jar

That's about 110 chars, almost 50% of the path length limit on Windows.
If users extract the Geronimo dist on their desktop (typical), then that means that the 'org' directory is really under something like:

    c:/documents and settings/some user name/desktop/geronimo-jetty-1.2-SNAPSHOT/repository

Which adds another 70-80 chars (87 in this case), which brings us to 197 and much closer to the limit.

For more nested components (which have longer groupIds) or more descriptive (longer) artifactIds the long file name limit is easily breached... and with no control over where users put these files to play with, it is really hard to predict how a dist will behave.

* * *

I'd be happy to put everything into the JDBM database if that was scalable and efficient. Since I'm not that familiar with JDBM I can't say one way or the other, hence I had leaned towards only putting metadata into JDBM and leaving the content on a normal file system, using this format:

    hash (of groupId, artifactId, version) / artifactId-version.type

I could just use a hash for everything, but I thought it would be better to at least have some idea what the file was. This scheme reduces the chances of running into the Windows long filename problem significantly, and maybe almost enough to say it is fixed for most sane usages. And if really needed, the filename could just be the hash (of groupId, artifactId, version, type) to reduce the chance of long file names even further.

But we need a way to quickly look up artifacts by full or partial artifact definition. For example, we may want to list all artifacts with groupId=org.apache.geronimo.samples.daytrader and type=jar. This is where the JDBM metadata and indexes come in, I believe.

Another small issue is that the Apache Geronimo repository API (roughly based on the Maven repo concept, but not using the Maven code) expects java.io.File objects to operate upon.
I hope to fix that, but in the short term it will be easier to just have a java.io.File to pass up for consumption.

* * *

More generally... I would still like to see a VFS filesystem backed fully by JDBM. If such a VFS provider existed, then I would have just used it and modified our repo impl to delegate to it and keep using the Maven 2 repository format.

My guess is that it would be relatively simple to implement... assuming full understanding of the constraints of using JDBM. Maybe a store + btree for filesystem nodes and then another store for file data.

--jason

On Jul 23, 2006, at 4:38 AM, Bryan Thompson wrote:

> I had sort of assumed that there was an interest in being able to ship
> around these file systems as maven repositories and that encapsulation was
> therefore of interest. If you are going to use the file system to support a
> virtual file system, then what is the point of using a persistence store for
> the indices?
>
> -bryan
>
> -----Original Message-----
> From: jdb...@li...
> [mailto:jdb...@li...] On Behalf Of Trygve Laugstøl
> Sent: Saturday, July 22, 2006 1:23 PM
> To: jdb...@li...
> Subject: Re: [Jdbm-general] Virtual file system based on JDBM
>
> Bryan Thompson wrote:
>> One other thing. Jdbm uses serializers to transform objects into byte[]s.
>> This works great for many record oriented applications. However, I presume
>> that you will be storing moderately large files, e.g., up to a few megabytes
>> for larger maven dependencies. As things stand, you will have to have those
>> files in memory when they are serialized (using a NOP serialization since I
>> presume that they will already be represented as a byte[]).
>
> For this particular use-case (and most other use cases where people want
> to store file data in databases) I would recommend storing the file
> content outside of the database itself using the primary key as the
> filename + ".data" or something similar.
>
> --
> Trygve
>
> _______________________________________________
> Jdbm-general mailing list
> Jdb...@li...
> https://lists.sourceforge.net/lists/listinfo/jdbm-general
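
A minimal Java sketch of the scheme discussed in this thread: content files live under hash(groupId, artifactId, version)/artifactId-version.type, and a secondary index is kept in step with the metadata records by hand on insert, update, and delete. Everything here is an assumption for illustration. The class and method names are hypothetical, SHA-1 is an arbitrary choice of hash (the thread does not name one), and plain java.util maps stand in for the JDBM store and B-tree rather than showing JDBM's own API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch; none of these names come from JDBM, Maven, or Geronimo. */
final class ArtifactCatalog {

    /** Illustrative metadata record for one artifact. */
    static final class Artifact {
        final String groupId, artifactId, version, type;

        Artifact(String groupId, String artifactId, String version, String type) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.type = type;
        }

        /** Primary key: the full artifact definition. */
        String key() {
            return groupId + ":" + artifactId + ":" + version + ":" + type;
        }
    }

    // Primary records: key -> metadata. In-memory stand-in for records in the store.
    private final Map<String, Artifact> records = new HashMap<>();
    // Secondary index: "groupId|type" -> primary keys. Stand-in for a persistent B-tree.
    private final TreeMap<String, Set<String>> byGroupAndType = new TreeMap<>();

    /** Builds hash(groupId, artifactId, version) / artifactId-version.type. */
    static String relativePath(Artifact a) {
        String seed = a.groupId + ":" + a.artifactId + ":" + a.version;
        try {
            // SHA-1 is an assumed choice; it yields a fixed 40-character hex directory name.
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest(seed.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex + "/" + a.artifactId + "-" + a.version + "." + a.type;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Insert or update: the record and its index entry are changed together. */
    void put(Artifact a) {
        remove(a.key()); // drop any existing record and index entry first
        records.put(a.key(), a);
        byGroupAndType
            .computeIfAbsent(indexKey(a.groupId, a.type), k -> new TreeSet<>())
            .add(a.key());
    }

    /** Delete: remove the record and its index entry. */
    void remove(String key) {
        Artifact old = records.remove(key);
        if (old == null) {
            return;
        }
        String ik = indexKey(old.groupId, old.type);
        Set<String> keys = byGroupAndType.get(ik);
        keys.remove(key);
        if (keys.isEmpty()) {
            byGroupAndType.remove(ik);
        }
    }

    /** Partial lookup: all artifacts with the given groupId and type. */
    List<Artifact> find(String groupId, String type) {
        List<Artifact> out = new ArrayList<>();
        Set<String> keys = byGroupAndType.getOrDefault(
            indexKey(groupId, type), Collections.<String>emptySet());
        for (String key : keys) {
            out.add(records.get(key));
        }
        return out;
    }

    private static String indexKey(String groupId, String type) {
        return groupId + "|" + type;
    }
}
```

With this layout the path below the repository root is a 40-character directory name plus the short file name, no matter how deeply nested the groupId is. A java.io.File for the existing repository API could then be built from the repository root and relativePath(...).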