#1082 saxon:parse-html() on .NET

Milestone: v9.2
Status: closed
Owner: Michael Kay
Priority: 5
Updated: 2012-10-08
Created: 2010-08-07
Creator: Michael Kay
Private: No

The extension function saxon:parse-html() relies on John Cowan's TagSoup parser, which is not issued with Saxon. In consequence, the function does not work "out of the box" on .NET.

For a future version (9.3), I've decided that it makes sense to include the TagSoup code in the saxon9pe and saxon9ee assemblies, so that this function works without any special configuration.

In the meantime, Joe Edwards advises of this workaround:

(a) cross-compile the TagSoup JAR file to a DLL assembly using IKVMC.

(b) set a custom ClassLoader on the Saxon Configuration as follows:

Processor.Implementation.getDynamicLoader().setClassLoader( new CustomClassLoader( Assembly.GetEntryAssembly() ) );

// …

// Routes TagSoup class names to the cross-compiled TagSoup assembly;
// every other name goes to the loader for the assembly passed to the constructor.
private class CustomClassLoader: ClassLoader {
   private readonly AssemblyClassLoader _TagSoupClassLoader;

   public CustomClassLoader( Assembly assembly ): base( new AssemblyClassLoader( assembly ) ) {
      // Parser is presumably TagSoup's parser class; its assembly is the DLL produced in step (a).
      _TagSoupClassLoader = new AssemblyClassLoader( Assembly.GetAssembly( typeof( Parser ) ) );
   }

   public override Class loadClass( string name ) {
      if( name.StartsWith( "org.ccil.cowan" ) ) return _TagSoupClassLoader.loadClass( name );
      else return base.loadClass( name );
   }
}

This looks like an interesting technique that could be extended to deal with other dynamic loading problems on .NET.
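As a rough illustration of how the technique might be extended, the sketch below generalizes the workaround's class loader so that several Java package prefixes can each be routed to a different .NET assembly. The class name PrefixRoutingClassLoader and its AddRoute method are hypothetical and are not part of Saxon, IKVM, or the original workaround. It assumes the IKVM runtime assemblies are referenced, as in the workaround above, and it has not been tested against any particular Saxon or IKVM release.

```csharp
using System.Collections.Generic;
using System.Reflection;
using ikvm.runtime;   // AssemblyClassLoader (assumed to come from the IKVM runtime)
using java.lang;      // ClassLoader, Class

// Hypothetical generalization of CustomClassLoader: each registered
// package prefix is served by the loader of a specific .NET assembly.
public class PrefixRoutingClassLoader : ClassLoader {

    // Routes are checked in registration order; the first matching prefix wins.
    private readonly List<KeyValuePair<string, ClassLoader>> _routes =
        new List<KeyValuePair<string, ClassLoader>>();

    // Names that match no route fall through to the default assembly's loader.
    public PrefixRoutingClassLoader(Assembly defaultAssembly)
        : base(new AssemblyClassLoader(defaultAssembly)) {
    }

    // Registers a package prefix and the assembly that should serve it.
    public PrefixRoutingClassLoader AddRoute(string packagePrefix, Assembly assembly) {
        _routes.Add(new KeyValuePair<string, ClassLoader>(
            packagePrefix, new AssemblyClassLoader(assembly)));
        return this;
    }

    public override Class loadClass(string name) {
        foreach (KeyValuePair<string, ClassLoader> route in _routes) {
            if (name.StartsWith(route.Key, System.StringComparison.Ordinal)) {
                return route.Value.loadClass(name);
            }
        }
        return base.loadClass(name);
    }
}
```

With this in place, the TagSoup case would reduce to registering one route, for example new PrefixRoutingClassLoader( Assembly.GetEntryAssembly() ).AddRoute( "org.ccil.cowan", Assembly.GetAssembly( typeof( Parser ) ) ), and passing the result to setClassLoader() as in step (b). Keeping the routes in an ordered list rather than a dictionary lets a more specific prefix take precedence if it is registered first.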

Discussion

  • Michael Kay
    2010-11-16

    Fixed in 9.3.0.1