I tend to use John Cowan's TagSoup in preference to JTidy. TagSoup has the option to run as a SAX parser, so you can use it to build a Saxon tree directly. However, you may have reasons to prefer JTidy - they don't do quite the same job, and I don't know what your requirements are.
 

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay



From: Gaurav sharma [mailto:sham.gaurav@gmail.com]
Sent: 20 November 2009 06:43
To: Mailing list for the SAXON XSLT and XQuery processor
Subject: Re: [saxon] issue while compile and execute an XQuery expression with Saxon.

Thanks for the namespace clue. I was able to resolve the problem, and I am now getting all the input elements, which is really good.

Actually, I am using JTidy to convert the response into a DOM object, and Saxon with XQuery to get the desired result.

Do you think I can remove the JTidy layer and use Saxon alone for this?


Many Thanks :),
Gaurav

On Fri, Nov 20, 2009 at 5:31 PM, Michael Kay <mike@saxonica.com> wrote:
Sorry if I didn't make myself clear. I don't think the problem has anything to do with your Java source code. It has to do with your data. I suspect that in the data supplied as input to the query, the input elements are in a namespace, whereas you are searching for input elements in no namespace.
 
Try replacing your query with the query "." (which copies the input document to the result) to see what the input document actually looks like.
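A similar inspection can be sketched without Saxon, using only the JDK: serialize the DOM with an identity transformer and look for an xmlns declaration on the root element. This is only an illustration under assumptions: the class name DumpDocument, the sample markup, and the use of a namespace-aware JDK parser are not from the thread, and a DOM produced by JTidy may serialize differently.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the code posted in this thread.
public class DumpDocument {

    /** Parses an XML string into a namespace-aware DOM document. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Copies the document to a string unchanged, so its namespaces become visible. */
    public static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input: an XHTML page whose elements are in the XHTML namespace.
        String sample = "<html xmlns='http://www.w3.org/1999/xhtml'>"
                + "<body><input name='q'/></body></html>";
        System.out.println(serialize(parse(sample)));
    }
}
```

If the printed root element carries an xmlns attribute, the input elements are in that namespace and an unqualified //input will not select them.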
 
Incidentally, you should note that using DOM with Saxon is not very efficient: it runs 5-10 times slower than using Saxon's native tree format.

From: Gaurav sharma [mailto:sham.gaurav@gmail.com]
Sent: 20 November 2009 06:05
To: Mailing list for the SAXON XSLT and XQuery processor
Subject: Re: [saxon] issue while compile and execute an XQuery expression with Saxon.

Thanks Michael,

I have attached a sample of the source code for your reference.
It would be great if you could give me some direction or a solution to the problem.

I am trying to get all the input elements from the response.

Note: I am using the HttpClient and JTidy APIs.



Regards,
Gaurav

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import net.sf.saxon.Configuration;
import net.sf.saxon.dom.DocumentWrapper;
import net.sf.saxon.query.DynamicQueryContext;
import net.sf.saxon.query.StaticQueryContext;
import net.sf.saxon.query.XQueryExpression;
import net.sf.saxon.trans.XPathException;

import org.apache.commons.httpclient.HttpException;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class HttpclientTutorial1 {

    private static String uri = "https://superseeker.super.ato.gov.au/SuperSeekerWeb/default.aspx?pid=71";
    private static String url="https://superseeker.super.ato.gov.au";
    private String restURI=null;
    private static String query = "for $x in  //input \n" +
                                  "return $x \n";

    public static void main(String[] args) {

        HttpclientTutorial1 hct = new HttpclientTutorial1();
        try {
            hct.getScrapedData();
        } catch (ClientProtocolException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }

    private void getScrapedData() throws ClientProtocolException, IOException {

        // Create an instance of HttpClient.
        DefaultHttpClient httpclient = new DefaultHttpClient();

        HttpGet httpget = new HttpGet(uri);

        try {

            HttpResponse responseBody = httpclient.execute(httpget);
            Pattern p = Pattern.compile("window.location.*pid=71");
            Matcher m = p.matcher(EntityUtils.toString(responseBody.getEntity()));
            boolean found = false;

            while (m.find()) {
                System.out.println("I found the text " + m.group() + " starting at index " + m.start() + " and ending at index " + m.end());
                restURI=org.apache.commons.lang.StringUtils.removeStart(m.group(), "window.location = '");
                System.out.println(restURI);
                found = true;
            }
            if(!found){
                System.out.println("No match found");
                // Without a redirect target there is nothing to fetch.
                return;
            }
            HttpGet httpget1=new HttpGet(url+restURI);
            responseBody = httpclient.execute(httpget1);
            List result=getElementFromResponse(responseBody,query);
           
        } catch (HttpException e) {
            System.err.println("Fatal protocol violation: " + e.getMessage());
            e.printStackTrace();
        } catch (IOException e) {
            System.err.println("Fatal transport error: " + e.getMessage());
            e.printStackTrace();
        } catch (XPathException e) {
            e.printStackTrace();
        } finally {
            // Release the connection.
            httpclient.getConnectionManager().shutdown();
        }

    }


    private List getElementFromResponse(
            HttpResponse responseBody,String query) throws IllegalStateException, IOException, XPathException {
        HttpEntity entity = responseBody.getEntity();
        List result=null;
        if (entity != null) {
            InputStream responseBodyStream = entity.getContent();
            // Convert the response into a DOM Document object.
            Document tidyDOM = ConvertResponseIntoDomObject(responseBodyStream);
            // Run the XQuery against the document to extract the
            // input elements.
            result = retriveDomElementFromDocumentObject(
                    query, url+restURI, tidyDOM);
            System.out.print("size:" +result.size());

        }

        return result;
    }

   
    private List retriveDomElementFromDocumentObject(
            String query, String url, Document doc) throws XPathException {
        Configuration c = new Configuration();
        StaticQueryContext qp = c.newStaticQueryContext();
        XQueryExpression xe = qp.compileQuery(query);
        DynamicQueryContext dqc = new DynamicQueryContext(c);
        dqc.setContextItem(new DocumentWrapper(doc, url, c));
        List domElement = xe.evaluate(dqc);
        return domElement;
    }

    private Document ConvertResponseIntoDomObject(InputStream responseBody) {

        Tidy tidy = new Tidy();
        //tidy.setXHTML(true);
        tidy.setQuiet(false);
        tidy.setShowWarnings(true);
        OutputStream o=System.out;
        Document dom = tidy.parseDOM(responseBody, o);
       
       
        return dom;
    }


}


On Fri, Nov 20, 2009 at 7:32 AM, Michael Kay <mike@saxonica.com> wrote:
You haven't shown your source document, so I can't tell why this query retrieves nothing. My guess would be that the <input> elements are in a namespace, whereas your query is only selecting <input> elements that are in no namespace.
 
Incidentally, the query
 
   for $x in //input return $x
 
can be abbreviated to
 
   //input
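
The namespace mismatch guessed at above can be reproduced with the JDK's built-in XPath 1.0 engine, where an unprefixed name test likewise matches only elements in no namespace. This is a minimal sketch, not Saxon code: the class name NamespaceMismatch, the sample documents, and the assumption that the real page uses the XHTML namespace are all illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the code posted in this thread.
public class NamespaceMismatch {

    /** Returns how many nodes the XPath expression selects in the given XML text. */
    public static int count(String xml, String xpath) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for namespaces to be visible to XPath
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String body = "<body><input name='a'/><input name='b'/></body>";
        String plain = "<html>" + body + "</html>";
        // Assumed namespace: XHTML. The real page may use a different one, or none.
        String xhtml = "<html xmlns='http://www.w3.org/1999/xhtml'>" + body + "</html>";

        System.out.println(count(plain, "//input"));                     // prints 2
        System.out.println(count(xhtml, "//input"));                     // prints 0
        System.out.println(count(xhtml, "//*[local-name() = 'input']")); // prints 2
    }
}
```

In XQuery itself, the usual remedy is to put a declaration such as declare default element namespace "http://www.w3.org/1999/xhtml"; in the query prolog, so that //input refers to elements in that namespace; the URI shown is the assumed XHTML one.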

From: Gaurav sharma [mailto:sham.gaurav@gmail.com]
Sent: 19 November 2009 10:10
Subject: [saxon] issue while compile and execute an XQuery expression with Saxon.


Hi All,

I am a Saxon user, and I am trying to compile and execute an XQuery expression with Saxon.

Objective: execute the XQuery on a Document object and get the list of DOM elements.


I am using the code below for that.
 
Configuration c = new Configuration();
StaticQueryContext qp = new StaticQueryContext(c);
XQueryExpression xe = qp.compileQuery(query);
DynamicQueryContext dqc = new DynamicQueryContext(c);
dqc.setContextNode(new DocumentWrapper(dom, url, c));
List result = xe.evaluate(dqc);

 

 

Here query = "for $x in  //input \n" + "return $x \n";

and dom is an object of type org.w3c.dom.Document.

The Document object contains input elements, but when I evaluate the XQuery expression, the resulting list has size zero. I am also not getting any exception.

I tried the same code with another URL, where I am able to see results. So what is the problem with this one?

 

Can anyone please answer the following questions?

 

-        Why is the list size zero when the Document object contains input elements?

-        How can I find the base URI (variable name: url)?

-        Is there any way to trace logs, warnings, or errors?

 

Thanks,

Gaurav



------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/saxon-help




--
Regards,
Gaurav Sharma
HCL Tech , Gurgaon
Mobile +919818305458