I think I'm coming to realize that the Processor has to be a singleton, but the compiler should be allocated per-thread.
Use a singleton Processor, which ensures that all documents (both parsed inputs and results) share the same namespace tables and other configuration, but create a new compiler object for every thread, since a compiler holds mutable settings and should not be used concurrently. I have found compilers to be fairly lightweight, so I create them on demand rather than once per thread, unless I am in a known tight loop and happen to have one available to reuse.
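Below is a minimal Java sketch of the object lifetimes described above. It deliberately does not depend on Saxon: `SharedProcessor` and `Compiler` are hypothetical stand-ins for Saxon's `Processor` and its compiler classes (such as `XQueryCompiler`), and the `ConcurrentHashMap` "name table" is a simplified model of the shared namespace tables. It shows one processor for the whole JVM, a fresh compiler created on demand for each task, and a `ThreadLocal` compiler kept only for a tight loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ProcessorPerThreadDemo {

    // Hypothetical stand-in for Saxon's Processor: created once, thread-safe,
    // and owning the shared name table that every document must agree on.
    static final class SharedProcessor {
        private static final SharedProcessor INSTANCE = new SharedProcessor();
        private final ConcurrentHashMap<String, Integer> nameTable = new ConcurrentHashMap<>();
        private final AtomicInteger nextCode = new AtomicInteger();

        private SharedProcessor() {}

        static SharedProcessor getInstance() { return INSTANCE; }

        // Thread-safe: computeIfAbsent runs at most once per name,
        // so the same name always maps to the same code.
        int internName(String name) {
            return nameTable.computeIfAbsent(name, n -> nextCode.getAndIncrement());
        }

        // Compilers are cheap, so hand out a fresh one on every request.
        Compiler newCompiler() { return new Compiler(this); }
    }

    // Hypothetical stand-in for a Saxon compiler: lightweight, holds
    // mutable settings, and must not be shared between threads.
    static final class Compiler {
        private final SharedProcessor processor;
        private String defaultPrefix = "";  // per-compiler mutable state

        Compiler(SharedProcessor processor) { this.processor = processor; }

        void setDefaultPrefix(String prefix) { this.defaultPrefix = prefix; }

        // "Compiles" a name by interning it in the processor's shared table.
        int compileName(String localName) {
            return processor.internName(defaultPrefix + localName);
        }
    }

    // Each task creates its own compiler on demand; all share one processor.
    static List<Integer> compileInThreads(int threads, String localName) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(() -> {
                    Compiler compiler = SharedProcessor.getInstance().newCompiler();
                    return compiler.compileName(localName);
                }));
            }
            List<Integer> codes = new ArrayList<>();
            for (Future<Integer> f : futures) codes.add(f.get());
            return codes;
        } finally {
            pool.shutdown();
        }
    }

    // For a known tight loop: keep one compiler per thread and reuse it.
    private static final ThreadLocal<Compiler> PER_THREAD =
            ThreadLocal.withInitial(() -> SharedProcessor.getInstance().newCompiler());

    static int compileInTightLoop(String localName, int iterations) {
        Compiler compiler = PER_THREAD.get();
        int code = -1;
        for (int i = 0; i < iterations; i++) code = compiler.compileName(localName);
        return code;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compileInThreads(4, "item"));
        System.out.println(compileInTightLoop("item", 1000));
    }
}
```

With real Saxon, the singleton would typically be a single `static final Processor`, and compilers would come from `processor.newXQueryCompiler()` or `processor.newXsltCompiler()` wherever they are needed. Because every compiler shares the same processor, documents built in different threads stay compatible.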
I use this strategy in xmlsh (www.xmlsh.org) and have had no problems with it in more than five years.
Just keep one Processor around and create everything else on demand: per thread, per function, or whatever suits your code.