I have to parse a lot of web sites.
I want to take each site's HTML and turn it into XML. My idea is to convert the HTML to XML, apply an XSLT stylesheet to that XML, and obtain my own custom XML. Each site will have its own XSLT stylesheet, with XPath expressions suited to its markup.
So far I have something like this:
org.xml.sax.XMLReader reader = org.xml.sax.helpers.XMLReaderFactory.createXMLReader ("org.htmlparser.sax.XMLReader");
org.xml.sax.ContentHandler content = new MyContentHandler ();
org.xml.sax.ErrorHandler errors = new MyErrorHandler ();
As I understand it, MyContentHandler is responsible for processing the XML tags. For the moment I have only implemented it with System.out calls to test it.
I really don't know how to get from here to what I want.
For example: how can I apply an XSLT stylesheet to the XML of Google's site to obtain another XML document?
I don't want to handle each tag with Java code in MyContentHandler; I want the XSLT to take care of that. Once I retrieve clean XML from the HTML, I will pass it to the XSLT to get my custom XML.
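The pipeline described above could be sketched roughly as follows, using the standard javax.xml.transform API: a SAXSource wraps an XMLReader, so the SAX events go straight into the XSLT transformer and no hand-written ContentHandler is needed. This is only a sketch under assumptions. The class name HtmlToCustomXml, the sample page, and the sample stylesheet are made up for illustration, and the JDK's own SAX parser stands in for the HTML parser so that the example runs without extra libraries. In principle the reader created with "org.htmlparser.sax.XMLReader" could be passed in instead, but whether it supports the features the transformer expects, and which element names it reports, is something I have not verified.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical class name, used only for this illustration.
public class HtmlToCustomXml {

    // Feeds the SAX events produced by `reader` into an XSLT transformation
    // and returns the transformed document as a string.
    public static String transform(XMLReader reader, InputSource page, String xslt)
            throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        // The transformer installs its own ContentHandler on the reader and calls parse().
        transformer.transform(new SAXSource(reader, page), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader (assumption): the JDK's SAX parser, which needs well-formed input.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Made-up page and per-site stylesheet; the output element names are hypothetical.
        String page = "<html><head><title>Hello</title></head>"
                + "<body><a href='/a'>A</a></body></html>";
        String xslt = "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:template match='/'><site>"
                + "<name><xsl:value-of select='/html/head/title'/></name>"
                + "<xsl:for-each select='//a'>"
                + "<link><xsl:value-of select='@href'/></link>"
                + "</xsl:for-each>"
                + "</site></xsl:template></xsl:stylesheet>";

        System.out.println(transform(reader, new InputSource(new StringReader(page)), xslt));
    }
}
```

With one stylesheet per site, only the `xslt` argument would change from site to site; the Java code stays the same.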
Can someone help me?
Thanks a lot, guys.