The "final" version of this patch is up here:

Main things:

1. The compute_shape_functions API (libMesh internal) had to be updated to take the (quadrature) points at which the shape functions are calculated.
2. The calculation of the shape functions for H1-conforming elements has been moved to the H1FETransformation class. We had to do this anyway for HCurl, etc., but I think it's also more conceptually correct. I haven't seen any noticeable performance difference with Intel on the examples (if anything it's been slightly faster, but nothing worth mentioning).
3. I had to remove the calculate_dphi = true assignment in the get_dphidxi, etc. calls, but I've at least left the asserts there. The problem is that we may need dphidxi without actually needing dphi. The only case this will break is if the user calls get_phi and get_dphidxi but never calls get_dphi. Can anyone think of a case where this would happen (I couldn't come up with one)? If so, any suggestions?
4. I went ahead and added the curl and div calculations for the H1-conforming, vector-valued elements. Roy and I tried to avoid stashing the curl_phi and div_phi data structures when the element is scalar-valued, but it seems rather difficult, and it should only cost 24 bytes * 2 per finite element, so assuming FE objects are being cached, this shouldn't be a big deal.
5. For giggles, I included the HCurlFETransformation, just to show why the internal API change is necessary and to get early feedback on it. It doesn't actually get instantiated yet (that will come with the Nedelec patch).
6. interior_curl and interior_div have been added to FEMContext (others will be added later).
7. The H1 curl and div calculations are untested.
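To make items 1, 2 and 5 a bit more concrete, here is a minimal 2D sketch of the two mappings involved, assuming J is the Jacobian of the reference-to-physical map with J[i][j] = dx_i/dxi_j. An H1 shape function keeps its reference value and only its gradient is mapped by J^{-T}, whereas an H(curl) shape function under the covariant Piola map has its value itself multiplied by J^{-T}. That suggests why shape-function evaluation needs geometric data at the (quadrature) points rather than reference-element values alone. The names Vec2, Mat2, inverse_transpose, apply, H1Transformation and HCurlTransformation are hypothetical and are not the actual libMesh classes.

```cpp
#include <array>
#include <cassert>

// 2D only, for brevity. Assumed convention: J[i][j] = d x_i / d xi_j.
using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

// Returns J^{-T}, the inverse transpose of the reference-to-physical Jacobian.
Mat2 inverse_transpose(const Mat2& J) {
  const double det = J[0][0] * J[1][1] - J[0][1] * J[1][0];
  assert(det != 0.0 && "degenerate element");
  return {{{J[1][1] / det, -J[1][0] / det},
           {-J[0][1] / det, J[0][0] / det}}};
}

// Matrix-vector product A * v.
Vec2 apply(const Mat2& A, const Vec2& v) {
  return {A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]};
}

// Hypothetical H1 mapping: values are unchanged, gradients are mapped by J^{-T}.
struct H1Transformation {
  static double map_phi(double phi_ref) { return phi_ref; }
  static Vec2 map_dphi(const Mat2& J, const Vec2& dphi_ref) {
    return apply(inverse_transpose(J), dphi_ref);
  }
};

// Hypothetical H(curl) mapping (covariant Piola): the vector value itself is
// mapped by J^{-T}, so it cannot be evaluated without J at the point.
struct HCurlTransformation {
  static Vec2 map_phi(const Mat2& J, const Vec2& phi_ref) {
    return apply(inverse_transpose(J), phi_ref);
  }
};
```

For a sheared element with J = [[1, 1], [0, 1]], J^{-T} = [[1, 0], [-1, 1]], so a reference gradient (1, 2) maps to (1, 1).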
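The trade-off in item 3 can be modelled with a toy class. This assumes, hypothetically, that the remaining assert has the form !calculations_started || calculate_dphi, i.e. requests are only checked once reinit has run. LazyFE and its members are illustrative stand-ins, not libMesh code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of lazily requested FE data; not libMesh code.
class LazyFE {
public:
  // Requesting phi or dphi switches on the corresponding calculation.
  const std::vector<double>& get_phi() { calculate_phi = true; return phi; }
  const std::vector<double>& get_dphi() { calculate_dphi = true; return dphi; }

  // After the patch: only checks the flag, no longer sets calculate_dphi.
  const std::vector<double>& get_dphidxi() const {
    assert(!calculations_started || calculate_dphi);
    return dphidxi;
  }

  // Stand-in for reinit(): fills only what was requested (placeholder values).
  void reinit(std::size_t n_qp) {
    calculations_started = true;
    if (calculate_phi) phi.assign(n_qp, 1.0);
    if (calculate_dphi) {
      dphi.assign(n_qp, 0.0);
      dphidxi.assign(n_qp, 0.0);
    }
  }

private:
  bool calculate_phi = false;
  bool calculate_dphi = false;
  bool calculations_started = false;
  std::vector<double> phi, dphi, dphidxi;
};
```

Calling get_phi and get_dphidxi before reinit, but never get_dphi, passes the assert and then leaves dphidxi empty, which is exactly the case item 3 asks about.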
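For item 4, curl and div of a vector-valued field follow directly from its gradient tensor G[i][j] = du_i/dx_j; for an H1 vector shape function of the form phi * e_k, that tensor has grad(phi) in row k and zeros elsewhere. The sketch below also includes a hypothetical helper in the spirit of FEMContext::interior_curl (item 6) that contracts per-shape-function curls with coefficients U at one quadrature point. All names here are assumptions made for illustration, not the patch's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Tensor3 = std::array<Vec3, 3>;  // G[i][j] = d u_i / d x_j

// Divergence is the trace of the gradient tensor.
double div_from_grad(const Tensor3& G) { return G[0][0] + G[1][1] + G[2][2]; }

// Curl from the antisymmetric part of the gradient tensor.
Vec3 curl_from_grad(const Tensor3& G) {
  return {G[2][1] - G[1][2], G[0][2] - G[2][0], G[1][0] - G[0][1]};
}

// Gradient of the vector shape function phi * e_k:
// grad(phi) in row k, zeros elsewhere.
Tensor3 vector_h1_grad(const Vec3& dphi, std::size_t k) {
  assert(k < 3);
  Tensor3 G{};
  G[k] = dphi;
  return G;
}

// Hypothetical helper: curl(u_h) at quadrature point qp, given
// coefficients U and curl_phi[i][qp] for each shape function i.
Vec3 interior_curl_at_qp(const std::vector<double>& U,
                         const std::vector<std::vector<Vec3>>& curl_phi,
                         std::size_t qp) {
  assert(curl_phi.size() == U.size());
  Vec3 c{};
  for (std::size_t i = 0; i < U.size(); ++i)
    for (std::size_t d = 0; d < 3; ++d)
      c[d] += U[i] * curl_phi[i][qp][d];
  return c;
}
```

As a sanity check, the field u = (-y, x, 0) has gradient [[0, -1, 0], [1, 0, 0], [0, 0, 0]], giving div u = 0 and curl u = (0, 0, 2).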

The patch was made against r5903. In addition to --enable-everything, I tested with --enable-complex, --disable-second, and --enable-2D-only, all in dbg mode.

I'd like to commit this ASAP because I'm sitting on another patch that depends on it, which implements the first-type Nedelec triangle along with a converging curl-curl example...

Any feedback is appreciated.