[Dar-libdar_api] Re: libdar polling, KDar forking...

Johnathan Burchill wrote:
> I talked with a programmer friend last night (Mark) about interrupting 
> threads. A real programming challenge it seems. The apparent way to go is 
> to poll. In fact, the pthread_cancel() function, and the ENABLE and 
> DISABLE flags are a type of polling. The libdar would have to have polling 
> points to make it work. Mark explained that the stigma of polling comes 
> from asynchronous communication, as an example. But the real problem is to 
> ascertain how long you are willing to wait between polls. 

OK, I see.

> 
> As an example, suppose you have a process that loops through a list of 
> files, compressing each in turn. Let's say the program polls the outside 
> world once per iteration to see if it should cancel the operation. If the 
> compression time is small, then the polling interval is small, and the 
> cancel operation takes effect almost immediately. 
> 
> If, on the other hand, the compression takes a long time, say 30 seconds to 
> possibly several minutes for a large file or a slow computer, then the 
> user will have to wait that time before the execution polls again. This 
> might be an unacceptably long delay to cancel the operation.
> 
> In either case, most of the loop is involved in doing real work (the 
> compression in this case) with a very small part devoted to polling. 
> 
> So a tradeoff must be made between how complicated one wants to get in the 
> code --- perhaps by implementing the compression algorithm directly and 
> adding new polling points --- and how much delay is acceptable for things 
> like cancelling the file compression operation. If users typically 
> compress only small files, then the delay will be minimal and acceptable. 
> But if they compress large files, the delay associated with cancelling the 
> operation can be a problem. 

Ideas come with the night. Your description makes me think of another 
possibility. I think all operations done by dar involve reading from 
and writing to files (even virtual files when running in dry-run 
mode). So instead of polling at each file boundary, the polling could be 
done at each read() or write() call. Since the file I/O operations are 
wrapped in classes (in fact a stack of classes: one compressing, 
another scrambling, yet another slicing), I could add a new class 
whose only job is to poll a certain variable for cancellation 
while transparently passing the data to the upper or lower class in 
the stack. This class could sit near the bottom of the stack, 
and if a cancellation is seen, it would throw an Euser_abort exception. 
Using an exception here would have the advantage of freeing allocated 
memory along the exception path.
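
A minimal sketch of such a pass-through class, to make the idea 
concrete. All the names here (io_layer, memory_layer, cancel_layer, 
user_abort) are hypothetical stand-ins, not libdar's real classes; in 
particular user_abort only plays the role of Euser_abort. It assumes the 
cancellation variable is a std::atomic<bool> set by another thread.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Euser_abort (not libdar's actual class).
struct user_abort : std::runtime_error {
    user_abort() : std::runtime_error("operation cancelled by user") {}
};

// Hypothetical minimal interface shared by every layer of the I/O stack.
struct io_layer {
    virtual ~io_layer() = default;
    virtual std::size_t read(char *buf, std::size_t size) = 0;
    virtual void write(const char *buf, std::size_t size) = 0;
};

// Bottom layer for demonstration only: an in-memory "file".
class memory_layer : public io_layer {
    std::string data_;
    std::size_t pos_ = 0;
public:
    std::size_t read(char *buf, std::size_t size) override {
        assert(pos_ <= data_.size());
        std::size_t n = std::min(size, data_.size() - pos_);
        std::memcpy(buf, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }
    void write(const char *buf, std::size_t size) override {
        data_.append(buf, size);
    }
    const std::string &data() const { return data_; }
};

// Pass-through layer: polls the flag before each read() or write(),
// then forwards the call unchanged to the layer below.
class cancel_layer : public io_layer {
    io_layer &below_;
    const std::atomic<bool> &cancel_;
    void poll() const {
        if (cancel_.load()) throw user_abort();
    }
public:
    cancel_layer(io_layer &below, const std::atomic<bool> &cancel)
        : below_(below), cancel_(cancel) {}
    std::size_t read(char *buf, std::size_t size) override {
        poll();
        return below_.read(buf, size);
    }
    void write(const char *buf, std::size_t size) override {
        poll();
        below_.write(buf, size);
    }
};
```

With this shape the polling interval is bounded by the time between two 
I/O calls rather than by the time needed to process a whole file, and 
objects on the stack between the throw and the catch are destroyed as 
the exception unwinds.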

>[...]
> 
> Another way would be to fork the process and either exec a new program 
> dedicated to the task, or just run the copy of the original process. 
> Communications between the parent and child could be established through a 
> pipe or shared memory. Then if the user cancels the operation, we just 
> kill the process, and don't worry a bit about dangling threads, locked 
> mutexes, and the lot. The advantage of this technique is that we are 
> guaranteed to kill the process quickly and get control back to the user. 
> At least as quickly as a <Ctrl>-C on the command-line.

Yes, I agree.
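
For reference, the fork-and-kill approach could look roughly like the 
sketch below, assuming a POSIX system. The names worker, spawn_worker, 
cancel_worker and slow_job are made up for illustration; slow_job merely 
stands in for a long libdar operation and reports progress by writing 
one byte per step to the pipe.

```cpp
#include <signal.h>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Handle kept by the parent: child pid and read end of the progress pipe.
struct worker {
    pid_t pid;
    int progress_fd;
};

// Fork a child that runs job(write_fd); the parent keeps the read end.
worker spawn_worker(void (*job)(int)) {
    int fds[2];
    if (pipe(fds) != 0) throw std::runtime_error("pipe failed");
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        throw std::runtime_error("fork failed");
    }
    if (pid == 0) {          // child: do the work, never return to caller
        close(fds[0]);
        job(fds[1]);
        _exit(0);
    }
    close(fds[1]);           // parent: only reads progress
    return worker{pid, fds[0]};
}

// Cancel by killing the child, then reap it so no zombie is left behind.
// Returns true if the child was terminated by a signal.
bool cancel_worker(const worker &w) {
    kill(w.pid, SIGTERM);
    int status = 0;
    pid_t reaped = waitpid(w.pid, &status, 0);
    close(w.progress_fd);
    return reaped == w.pid && WIFSIGNALED(status);
}

// Stand-in for a long-running operation: one progress byte per second.
void slow_job(int out_fd) {
    for (;;) {
        if (write(out_fd, ".", 1) < 0) _exit(1);
        sleep(1);
    }
}
```

The parent would call spawn_worker(slow_job), read progress from 
progress_fd to update the display, and call cancel_worker() when the 
user asks to stop. No mutex or thread state has to be cleaned up, since 
the kernel reclaims everything the child owned.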

> 
> The issue of whether threads are better than forked processes depends on 
> the application. In the case of the backup program, we might typically 
> just run a backup, or a restore, by itself. I could see where we are even 
> testing one archive while diffing another one. But there won't be more 
> than several libdar operations going on at once. So the extra overhead 
> associated with forking a kdar is not really a problem.
> 
> If I decided to go with the fork technique, I wouldn't have to worry about 
> how you implement the cancelling operation in libdar, and so our lines of 
> development could in principle be carried out independently from that 
> point of view.

yes.

> 
> I have to get educated on forking before I make the plunge into those 
> waters, but it may be a few weeks as I have some jobs to apply for and 
> other work-related priorities that will eat at my spare time. But I will 
> try to get the cancel operation implemented because it is important not to 
> leave the user "hanging" and frustrated when they want to stop an 
> operation.

Forking or not forking...  :-)

To summarize: either you fork and get easy cancellation, at the cost of 
extra development (in kdar only) and run-time overhead for returning 
information about the processing job, or you keep threads, which 
implies extra development (in both libdar and kdar) to get 
cancellation but keeps user feedback about the running task easy.

I like the second option, but it will take more time before it is 
available to users. So I suggest starting with a forking kdar while 
waiting for a libdar that allows thread cancellation.

As for spare time, I have the same restriction at the moment, at least 
until the end of July. So I will concentrate on the many little tasks I 
have to do in dar, as it is hard to keep a global view of what you are 
doing when the work is interrupted too often and spread over too long a 
period.

> 
> JB

Denis.
