From: Kern S. <ke...@si...> - 2004-08-24 18:35:51
On Tue, 2004-08-24 at 00:43, Christopher Hull wrote:
> Kern Sibbald wrote:
...
> >
> >postgresql seems to be pretty slow for inserts. I guess it is optimized
> >for lookups.
> >
> >This change might give you problems (maybe not) as Bacula expects only
> >one filename entry to be in the File table. If it finds multiple ones,
> >I'm not sure whether it just prints a warning message or creates an
> >error.
> >
> That should not happen since I'm starting with an empty database, and I
> don't allow any job to be imported more than once.

Unless you have a very unusual filesystem, you are going to have some
duplicate filenames on it. In that case, you will definitely get
multiple entries in the database:

/etc/abc
/lib/abc

There are two files with the same name there.

...
> I need to avoid doing just that, so I need to copy the data to tape in
> such a way that it will be possible for bacula to think it had
> originally written them that way.

OK, that is a real copy in the sense of updating the database rather
than just copying the data. It requires a record by record copy so that
the database can be updated at the same time -- as you say, it is sort
of a bcopy with bscan merged in.

...
> >
> If I need to populate a new database won't I need to look at every
> record anyway? I think this is what bscan does if I'm not mistaken.

Yes.

> >If you are looking for a feature where you would have a current database
> >that corresponds to what is on a tape to be archived or taken offsite,
> >then you will probably want something within Bacula that I have been
> >planning for the release after 1.35, which is a Copy job and a Migrate
> >job. A Copy job would make a copy of an existing job or existing jobs,
> >and Migrate would move the job/jobs to the new Volume from the old disk
> >or tape Volume. It sounds a bit like you want to do this, but starting
> >from an empty database.
> >
> I think this sounds very close to what I'm looking to do.

OK

...
> Yes!
> Restored from the new database, directly from the new media (tape
> volumes).
...
> More than likely, the database will be permanently online, on our
> appliance. I have not decided whether to try to do an offline copy of
> the database as well.
...
> Well perhaps "modified spooling" was a bad choice of terminology. I was
> thinking of it as spooling with a permanent on-disk storage (instead of
> the transient on-disk storage as spooling exists today) but it sounds
> like your Migrate and Copy job is very much along the lines that I am
> thinking of, only with a 2nd database keeping track of the contents of
> the migrated/copied jobs. So basically what I need to do is copy a job
> or set of jobs from their disk based storage to tape based storage,
> whilst storing the copied jobs' information in a separate database so
> that the jobs may be restored directly from tape. And come to think of
> it I'll have to figure out a way to have an easily restorable copy of the
> database on the last or a separate tape, so the tapes can be used
> efficiently in the event the resident (on-disk) database is not available
> for whatever reason.

Now to answer your question:

  My main question is, in your opinion will it be easier to adapt
  bcopy + bscan to do what I need or the director spooling code?

First here are my thoughts about what a final implementation would
result in, without considering the work to get there:

bcopy+bscan:
- stand alone implementation.
- difficult to select individual Jobs for copying as opposed to Volumes.
- possibly complicated list of options to do what you want (both are
  rather obscure already).
- not integrated with Bacula
- possible conflict of tape drives if Bacula is running

Bacula:
- not stand alone -- must run Bacula
- harder to do one of a kind copies
- much easier to select individual Jobs for copying
- requires more conf directives (and a lot of thought before
  implementing them).
- integrated solution
- can automate Copy/Migration on total usage, time, lots of other
  things that Bacula knows, but requires the directives to do so.
- fewer problems of tape drive conflicts

Implementation difficulty: -- probably overall the same

bcopy+bscan:
- bcopy can easily read a Volume.
- bscan is made to generate a database from the input records, but
  you need to generate the database from the output records. This
  would require some work.
- any command line options would easily be picked up and used in the
  code.
- Bottom line: probably a lot of new code, but examples of most pieces
  exist and it would all be done in a single file.

Bacula:
- It is sort of like a restore feeding into a backup.
- All the code to do this already exists in stored/read.c and
  stored/append.c. By moving some of the code to subroutines in each
  of those routines, one could easily connect them.
- The difficulties are in the design of the Director directives and
  their data structures to implement this -- requires a good deal of
  careful thought.
- The information would then need to be passed from the Director to
  the SD (I think the FD would be bypassed in this). This requires
  new inter-daemon protocols. Not really hard but some work.
- Bottom line: the basic code exists, it is just a lot of little
  details to handle the .conf files, their new data structures, and
  the communications between the daemons.

I prefer to see an integrated Bacula solution, but even a bcopy+bscan
implementation would be useful.

I hope this helps.

Best regards,

Kern
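
The "restore feeding into a backup" idea discussed above (a record by record copy that updates a database as each record is written) can be sketched roughly as follows. This is only an illustrative Python sketch, not Bacula code: `Record`, `Catalog`, `copy_jobs`, and the volume name `Tape-0001` are all hypothetical names invented for the example, and the "volumes" are plain Python lists standing in for disk and tape storage.

```python
from dataclasses import dataclass, field


# All names here are hypothetical; none of them are Bacula APIs.
@dataclass
class Record:
    job_id: int   # job the record belongs to
    path: str     # full path of the backed-up file
    data: bytes   # payload (ignored by the catalog)


@dataclass
class Catalog:
    # Maps job_id -> list of (path, volume_name, position) entries.
    entries: dict = field(default_factory=dict)

    def add(self, job_id, path, volume_name, position):
        self.entries.setdefault(job_id, []).append((path, volume_name, position))


def copy_jobs(source_records, wanted_jobs, dest_volume, dest_name, catalog):
    """Copy records of the selected jobs and catalog each one as it is written."""
    for rec in source_records:
        if rec.job_id not in wanted_jobs:
            continue  # select by Job rather than by whole Volume
        position = len(dest_volume)  # index the record will occupy on the destination
        dest_volume.append(rec)      # the "append" half of the copy
        catalog.add(rec.job_id, rec.path, dest_name, position)
    return catalog


# Usage: copy only job 1 from a disk "volume" to a tape "volume".
disk = [
    Record(1, "/etc/abc", b"a"),
    Record(2, "/home/x", b"b"),
    Record(1, "/lib/abc", b"c"),
]
tape = []
cat = copy_jobs(disk, {1}, tape, "Tape-0001", Catalog())
print(cat.entries[1])
```

Note that the two copied records share the file name `abc` but have different paths, which is the duplicate-filename situation mentioned earlier. The catalog here is built from the output side (what was written to the destination), which is the direction the copy needs.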