- priority: 5 --> 4
One thing we have been silent about in the papers is the actual running time of the program. Our approach is considerably more expensive than the Horn approach.
I note that, at a high level at least, it should be trivial to parallelize the processing of the individual documents to improve the running time of the application.
On the Perl side (on UNIX at least), I could use fork() to pull this off, but threads are probably more appropriate. I am not sure how all of this plays out in the other languages/OSes, but I think it is worth pursuing. I have four processors on my current system, and we are only pegging one of them.
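To make the idea concrete, here is a rough sketch of document-level parallelism. It is in Python rather than Perl, purely for illustration, and everything in it is an assumption: PATTERNS, process_document() and process_all() are hypothetical stand-ins, not names from our code, and the per-document work is reduced to counting regex matches.

```python
import re
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical stand-in patterns; the real rule set is not shown here.
PATTERNS = [re.compile(p) for p in (r"\bcat\b", r"\d+", r"^The")]


def process_document(text):
    """Return, for one document, the number of matches of each pattern."""
    return [len(p.findall(text)) for p in PATTERNS]


def process_all(documents, workers=4, executor_cls=ProcessPoolExecutor):
    """Process each document independently in a pool of workers.

    pool.map() returns results in input order, so the output lines up
    with `documents` no matter which worker finishes first.
    Pass executor_cls=ThreadPoolExecutor to use threads instead; in
    CPython that keeps the same structure but will generally not use
    more than one core for CPU-bound regex work.
    """
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(process_document, documents))
```

With the process-based default, the call to process_all() should sit under an `if __name__ == "__main__":` guard so that it also works on platforms that start workers by spawning rather than forking. The only real requirement is that documents do not share mutable state while they are being processed.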
Indeed, since I don't think the order in which the regexes are evaluated matters, we could probably parallelize at a lower level as well.
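Before relying on that, the order-independence assumption is easy to check empirically. The sketch below is in Python and entirely hypothetical: PATTERNS, matching_patterns() and order_independent() are made-up names, and it assumes the regexes only test for matches. If any of our regexes rewrite the text, ordering may well matter and this check would not cover it.

```python
import random
import re

# Hypothetical patterns, keyed by name; not our real rule set.
PATTERNS = {"digits": r"\d+", "caps": r"[A-Z]{2,}", "email": r"\S+@\S+"}


def matching_patterns(text, names):
    """Evaluate the named patterns in the given order; return those that match."""
    return {name for name in names if re.search(PATTERNS[name], text)}


def order_independent(text, trials=20, seed=0):
    """Return True if shuffled evaluation orders all give the same result."""
    rng = random.Random(seed)  # fixed seed keeps the check repeatable
    names = list(PATTERNS)
    baseline = matching_patterns(text, names)
    for _ in range(trials):
        rng.shuffle(names)
        if matching_patterns(text, names) != baseline:
            return False
    return True
```

If a check like this holds on real documents with the real patterns, each regex could be handed to a separate worker and the per-regex results merged afterwards.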
Thoughts?
Cheers,
Dave