VTD parser issue: intermittent behaviour

User234
2012-07-05
2013-05-15
  • User234

    User234 - 2012-07-05

    Hi,
    Our application uses the VTD parser (vtd-xml-2.7.jar), and we see an intermittent issue. Thousands of XML messages arrive at our Java application, where we parse each one with VTD and store the values in a database. Occasionally some values are not stored, even though they are present in the XML for that transaction. If we retry the same XML message, it is parsed correctly and all values from the XML appear in the database. We are not sure why VTD behaves intermittently. We would appreciate your views and suggestions.

     
  • User234

    User234 - 2012-08-03

    Hi,

    We have tested with multiple threads in Java, and it looks like the VTD parser is not thread safe: we get null values. If the blocks that call the VTD parser are synchronized, the results are correct. This looks like the same thread-safety issue as in C.
    I am not sure whether this is the right place to discuss VTD. Is it?

     
  • jimmy zhang

    jimmy zhang - 2012-09-06

    Can you be specific about what you mean by thread safe? Also, sorry for the late reply. Can you post the question to the vtd-xml users mailing list?

     
  • Leena

    Leena - 2012-09-27

    Hi,

    I am facing the same issue too. It fails in a multithreaded environment.

    public class CompiledExpressions {
        private Map<String, CloneableAutoPilot> compiledExpressions = new HashMap<>();
        public synchronized void add(String xpath, CloneableAutoPilot cloneableAutoPilot) {
            compiledExpressions.put(xpath, cloneableAutoPilot);
        }
        public synchronized AutoPilot get(String xpath) throws CloneNotSupportedException {
            CloneableAutoPilot cloneableAutoPilot = compiledExpressions.get(xpath);
            return (cloneableAutoPilot == null) ? null : (AutoPilot) cloneableAutoPilot.clone();
        }
    }
    
    package com.tesco.clock.lookup.util.xmlutils;
    import com.tesco.clock.lookup.util.ThreadUniqueTransactionNo;
    import com.ximpleware.AutoPilot;
    import com.ximpleware.VTDGen;
    import com.ximpleware.VTDNav;
    import org.apache.log4j.Logger;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Component;
    import java.util.*;
    import static org.apache.commons.lang.StringUtils.isBlank;
    @Component
    public class VTDUtil {
        private static final Logger LOGGER = Logger.getLogger(VTDUtil.class);
        @Autowired
        private CompiledExpressions compiledExpressions;
        VTDUtil() {
        }
        public VTDUtil(CompiledExpressions compiledExpressions) {
            this.compiledExpressions = compiledExpressions;
        }
        public VTDNav parse(String xml) throws Exception {
            VTDGen vtdGen = new VTDGen();
            vtdGen.setDoc(xml.getBytes());
            vtdGen.parse(true);
            VTDNav nav = vtdGen.getNav();
            return nav;
        }
        public AutoPilot compileXPath(String xpath) throws Exception {
            CloneableAutoPilot cloneableAutoPilot = new CloneableAutoPilot();
            cloneableAutoPilot.selectXPath(xpath);
            compiledExpressions.add(xpath, cloneableAutoPilot);
            return (AutoPilot) cloneableAutoPilot.clone();
        }
        public String getNodeValue(VTDNav vtdNav, String xpath) {
            if (LOGGER.isDebugEnabled()) {
                LOGGER.debug("TxNo: " + ThreadUniqueTransactionNo.get() + " - " + String.format("getNodeValue for : %s", xpath));
            }
            long startTime = System.currentTimeMillis();
            String textContent = null;
            try {
                AutoPilot compiledXPathExpression = getCompiledExpression(vtdNav, xpath);
                String value = compiledXPathExpression.evalXPathToString();
                textContent = isBlank(value) ? null : value;
                compiledXPathExpression.resetXPath();
            } catch (Exception e) {
    //            e.printStackTrace();
                LOGGER.warn("TxNo: " + ThreadUniqueTransactionNo.get() + " - " + String.format("Exception while reading expression : %s", xpath));
            }
            long endTime = System.currentTimeMillis();
            long timeTaken = endTime - startTime;
            if (timeTaken > 10)
                LOGGER.warn("TxNo: " + ThreadUniqueTransactionNo.get() + " - " + " xpath parsing time: " + timeTaken + "  for expression : " + xpath);
            return textContent;
        }
        public List<String> getNodeValues(VTDNav vtdNav, String xpath) throws Exception {
            if (LOGGER.isDebugEnabled()) {
                LOGGER.debug("TxNo: " + ThreadUniqueTransactionNo.get() + " - " + String.format("getNodes for : %s", xpath));
            }
            AutoPilot compiledXPathExpression = getCompiledExpression(vtdNav, xpath);
            int index = 0;
            List<String> values = new ArrayList<>();
            while ((index = compiledXPathExpression.evalXPath()) != -1) {
                String value = vtdNav.toNormalizedXPathString(index);
                String val = isBlank(value) ? null : value;
                values.add(val);
            }
            compiledXPathExpression.resetXPath();
            return values;
        }
        private AutoPilot getCompiledExpression(VTDNav vtdNav, String xpath) throws Exception {
            AutoPilot compiledExpression = compiledExpressions.get(xpath);
            if (compiledExpression == null) {
                compiledExpression = compileXPath(xpath);
            }
            compiledExpression.bind(vtdNav);
            return compiledExpression;
        }
    }
    

    This VTDUtil is accessed by multiple threads to get node values through getNodeValue() and getNodeValues().

    Thanks in advance!

    Leena

     
  • Leena

    Leena - 2012-09-27

    More specific exception:

    java.lang.NullPointerException
            at com.ximpleware.LocationPathExpr.process_child(LocationPathExpr.java:382)
            at com.ximpleware.LocationPathExpr.evalNodeSet(LocationPathExpr.java:2713)
            at com.ximpleware.AutoPilot.evalXPath(AutoPilot.java:876)
            at com.tesco.clock.lookup.util.xmlutils.VTDUtil.getNodeValues(VTDUtil.java:75)
            at com.tesco.clock.lookup.util.xmlutils.adapter.VTDAdapter.getNodeValues(VTDAdapter.java:40)
            at com.tesco.clock.lookup.jsonutil.transformer.generic.SummationTransformer.transform(SummationTransformer.java:36)
            at com.tesco.clock.lookup.jsonutil.StringValue.evaluate(StringValue.java:34)
            at com.tesco.clock.lookup.jsonutil.TemplateBasedJsonConverter.convert(TemplateBasedJsonConverter.java:64)
            at com.tesco.clock.lookup.jsonutil.TemplateBasedJsonConverter.convert(TemplateBasedJsonConverter.java:44)
            at com.tesco.clock.lookup.service.ClockResultService.getClockResultByTransactionNo(ClockResultService.java:37)
            at com.tesco.clock.lookup.resources.BasketResource.basketByTransaction(BasketResource.java:53)
            at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:601)
            at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
            at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
            at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
            at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
    
     
  • Leena

    Leena - 2012-09-27

    If I mark all the methods in the VTDUtil class as "synchronized", everything works fine.
    What is going wrong here?
    Your help is appreciated.
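
A narrower alternative to synchronizing every method can be sketched as follows: lock only the shared expression object for the duration of its bind, evaluate, and reset sequence, so that different XPaths can still be evaluated in parallel. `StatefulExpr` and `LockedEvaluator` are hypothetical stand-ins written for this illustration, not vtd-xml classes; with the real library, the locked section would wrap the `bind()`, `evalXPath()` and `resetXPath()` calls on one AutoPilot, assuming separately created AutoPilot instances share no state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a stateful compiled expression such as an AutoPilot:
// bind() stores the document, eval() reads it back, reset() clears it.
class StatefulExpr {
    private final String xpath;
    private String boundDoc; // mutable per-evaluation state

    StatefulExpr(String xpath) {
        this.xpath = xpath;
    }

    void bind(String doc) {
        boundDoc = doc;
    }

    String eval() {
        return xpath + "@" + boundDoc;
    }

    void reset() {
        boundDoc = null;
    }
}

class LockedEvaluator {
    private final Map<String, StatefulExpr> cache = new ConcurrentHashMap<>();

    String evaluate(String xpath, String doc) {
        // One shared instance per XPath, created at most once.
        StatefulExpr expr = cache.computeIfAbsent(xpath, StatefulExpr::new);
        // Hold the lock for the whole bind/evaluate/reset sequence so that no
        // other thread can rebind this instance halfway through.
        synchronized (expr) {
            expr.bind(doc);
            try {
                return expr.eval();
            } finally {
                expr.reset();
            }
        }
    }
}
```

Threads evaluating the same XPath still wait for each other here, so this trades some throughput for simplicity.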

     
  • jimmy zhang

    jimmy zhang - 2012-09-29

    Did you actually add the synchronized keyword and make it work?

     
  • Leena

    Leena - 2012-09-29

    Well! I found the root cause. It is the AutoPilot.bind() method, which mutates the AutoPilot.

    1. I pre-compile all XPaths on server start-up and store them in a map that is shared by all threads.
       The compiled XPath map looks like this:
      xpath -> AutoPilot

    2. When a request comes in to parse a different XML, I get the pre-compiled XPath from the map for a given XPath and call bind(vtdNav) on it.
       AutoPilot compiledXPath = compiledMap.get(xpath);
       compiledXPath.bind(vtdNav);

      But bind() ties the AutoPilot to that XML (vtdNav), so it is no longer generic.

      So my shared map is corrupted.

    3. To solve this what I did is:
        
    class   CloneableAutoPilot extends AutoPilot implements Cloneable{
      @Override
        protected Object clone() throws CloneNotSupportedException {
            return super.clone();
        }
    }

    In a map I store xpath -> CloneableAutoPilot

    And when I fetch it,
    AutoPilot compiledXPath = (AutoPilot) compiledMap.get(xpath).clone();
    compiledXPath.bind(vtdNav);

    But of course this is not a deep clone, so I still faced the issue.

    4. So I tried one more trick: I used a cloning library to deep-clone it. Then everything works, but as expected it hurts performance.
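
Why a shallow clone does not isolate threads can be illustrated with a small sketch. It assumes, without having checked the vtd-xml source, that the cloned object keeps a reference to a compiled expression that carries mutable evaluation state. `ExprState` and `CloneableEvaluator` are hypothetical stand-ins invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the compiled expression tree an evaluator holds.
class ExprState {
    final List<String> visited = new ArrayList<>(); // mutated during evaluation
}

// Hypothetical stand-in for a cloneable evaluator such as CloneableAutoPilot.
class CloneableEvaluator implements Cloneable {
    ExprState state = new ExprState();

    @Override
    public CloneableEvaluator clone() {
        try {
            // Object.clone() copies field values, so the clone's 'state'
            // field points at the very same ExprState object.
            return (CloneableEvaluator) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we implement Cloneable
        }
    }

    CloneableEvaluator deepCopy() {
        CloneableEvaluator copy = clone();
        copy.state = new ExprState();             // give the copy its own state
        copy.state.visited.addAll(state.visited); // and copy the contents over
        return copy;
    }
}

class ShallowCloneDemo {
    public static void main(String[] args) {
        CloneableEvaluator original = new CloneableEvaluator();
        CloneableEvaluator shallow = original.clone();
        CloneableEvaluator deep = original.deepCopy();

        shallow.state.visited.add("/a/b/c");

        System.out.println(original.state == shallow.state); // true
        System.out.println(original.state == deep.state);    // false
        System.out.println(original.state.visited);          // [/a/b/c]
    }
}
```

If the real object graph behaves like this, two threads holding shallow clones would still mutate the same inner expression, which would be consistent with the failures described above.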

    Summary:

    1. I want to pre-compile the XPaths so I can meet the NFRs, and I want to share these pre-compiled XPaths across multiple requests.
       If I compile on the fly, it badly impacts performance.

    2. AutoPilot.bind(vtdNav) mutates the AutoPilot and is the root cause of the problem.

    How can I solve this?

    Thanks in advance for a prompt reply. I look forward to a solution.

    Thanks!
    Leena

     
  • jimmy zhang

    jimmy zhang - 2012-10-02

    There has been a fix for a multithreading issue reported by another user. Basically, I don't think there should be any thread-safety issues with vtd-xml. Could you check out the latest release, plus VTDNav.java from CVS, and give it a try?

     
  • Leena

    Leena - 2012-10-02

    I am using vtd-xml 2.11. Is that not the latest version? I downloaded it from the link below:
    https://sourceforge.net/projects/vtd-xml/files/latest/download?source=files

    As I mentioned, the root cause of the problem is the bind() method of the AutoPilot class. It is a mutating method: when I call bind(vtdNav) on an AutoPilot, it changes the state of the AutoPilot object that I cache in the map. This results in failures in a multithreaded environment. In a single-threaded environment everything works fine.

    Can you please suggest how to do the following?
    1. I want to pre-compile the XPath expressions and store them in a cache (HashMap).
    2. On the fly, I want to bind these AutoPilots to different XMLs (vtdNav) in a multithreaded environment.

    How can we fix the bind() issue above?

     
  • Leena

    Leena - 2012-10-02

    Hi,

    Please let me know if you need any more details from me. I will be more than happy to provide detailed steps to reproduce the issue.
    Your help is appreciated.

    Regards,
    Leena B.
    -

     
  • Leena

    Leena - 2012-10-02

    I was just going through the vtd-xml library and found CachedExpr. Can we use CachedExpr instead of AutoPilot?
    Will it have any performance impact? In which scenarios should I use AutoPilot, and in which CachedExpr?

     
  • jimmy zhang

    jimmy zhang - 2012-10-02

    bind() is not supposed to cause any failure in multithreaded apps unless multiple threads try to bind to the same AutoPilot object; that could cause an issue. But you will need to submit a more detailed test case (preferably a simple one).

    In 2.11 CachedExpr is a new feature. It is, however, transparent to users; you should not need to worry about its existence in the app.

     
  • Leena

    Leena - 2012-10-03

    Hi,

    1. I have a shared HashMap that stores each XPath string and the AutoPilot for that XPath.
       On server startup, I create an AutoPilot object for every XPath:

        AutoPilot compiledXPathExpression = new AutoPilot();
        compiledXPathExpression.selectXPath(xpathString);
    

    2. The application receives multiple requests for different XMLs, and we need to evaluate the XPaths (stored in the shared map) against all of them.

    Since bind() mutates the AutoPilot, calling autoPilot.bind(vtdNav) changes the internal state of the AutoPilot stored in the shared map.

    I have two solutions for this:

    1. Do not pre-compile any XPath expression and do not share anything in a shared map.
       This gives me no multithreading issues, but without pre-compiled XPaths performance is very slow.

    2. Pre-compile all XPath expressions and share them in a shared map.
       When fetching an AutoPilot from the shared map, I clone it so that the original instance in the map remains untouched. But somehow this approach also fails in a multithreaded environment.

    So my requirement is:
    1. I want to pre-compile the XPath expressions and store them in a cache (HashMap).
    2. On the fly, I want to bind these AutoPilots to different XMLs (vtdNav) in a multithreaded environment.

    Do you need any more details from me?

    Regards,
    Leena B
    -

     
  • jimmy zhang

    jimmy zhang - 2012-10-04

    If an AutoPilot object is bound to a VTDNav object, you should not attach a different VTDNav object until the evaluation of the XPath has finished. If you attach a different VTDNav object while an XPath evaluation against the previous VTDNav instance is still in progress, it will fail. That is expected behavior. The solution is to make sure you have more pre-compiled instances of AutoPilot ready than the maximum number of threads attempting to attach a VTDNav object.

    So to summarize: make sure you have more AutoPilot instances with pre-compiled XPaths than threads trying to attach a VTDNav object. You have to architect your app to avoid that… let me know if this makes sense or not.
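
One way to guarantee that no two threads ever touch the same pre-compiled instance is to give each thread its own private cache. The sketch below uses `ThreadLocal` for that. `PerThreadCache` and `CompiledXPath` are hypothetical names invented for this illustration; with vtd-xml, the compiler function would presumably create an AutoPilot and call `selectXPath(xpath)` on it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a compiled, stateful XPath object.
class CompiledXPath {
    final String xpath;

    CompiledXPath(String xpath) {
        this.xpath = xpath;
    }
}

class PerThreadCache<K, V> {
    private final Function<K, V> compiler;
    // Each thread lazily gets its own private map, so no instance is shared.
    private final ThreadLocal<Map<K, V>> perThread = ThreadLocal.withInitial(HashMap::new);

    PerThreadCache(Function<K, V> compiler) {
        this.compiler = compiler;
    }

    V get(K key) {
        // Compiles on first use per thread, then reuses that thread's instance.
        return perThread.get().computeIfAbsent(key, compiler);
    }
}

class PerThreadCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        PerThreadCache<String, CompiledXPath> cache = new PerThreadCache<>(CompiledXPath::new);

        CompiledXPath mine = cache.get("/a/b/c");
        CompiledXPath[] theirs = new CompiledXPath[1];
        Thread other = new Thread(() -> theirs[0] = cache.get("/a/b/c"));
        other.start();
        other.join();

        System.out.println(mine == cache.get("/a/b/c")); // true: reused on this thread
        System.out.println(mine == theirs[0]);           // false: the other thread has its own
    }
}
```

The cost is one compilation per XPath per thread, which is bounded when requests are served from a fixed-size thread pool.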

     
  • Leena

    Leena - 2012-10-04

    Thanks for the reply.

    Yes, it completely makes sense. I had also thought of keeping a pool of AutoPilot objects and allocating each thread an AutoPilot instance from this pool. I was looking for a simpler solution, though.

    I have another simple approach, but somehow it also causes issues in a multithreaded environment.
    1. We pre-compile the XPaths and store them in a shared map.
    2. Whenever a thread wants an AutoPilot instance, we clone it and hand out the clone.

    Something like this:
    public class CloneableAutoPilot extends AutoPilot implements Cloneable {

        @Override
        protected Object clone() throws CloneNotSupportedException {
            // Shallow copy; a deep copy of the AutoPilot object could go here instead
            return super.clone();
        }

    }

    But this has some issues in a multithreaded environment, and it is not fast either.

     
  • jimmy zhang

    jimmy zhang - 2012-10-04

    Have you considered creating multiple AutoPilot objects for every XPath expression? I don't know how clone would work, but I highly doubt….
    In other words,

    ap1.selectXPath("/a/b/c");
    ap2.selectXPath("/a/b/c");

    ….
    Do this at the beginning of your app….
    And make sure that you have more AutoPilot instances than running threads. Of course you have to make sure your app assigns them properly…
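
The "assign it properly" part could be handled by a small blocking pool, sketched below under the assumption that separately compiled instances are fully independent of one another. `ExpressionPool` is a hypothetical helper, not part of vtd-xml; the factory passed in would be where a new AutoPilot is created and `selectXPath` is called at startup, with one pool kept per XPath.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool of pre-compiled, stateful expression objects.
class ExpressionPool<T> {
    private final BlockingQueue<T> idle;

    ExpressionPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // compile every instance up front
        }
    }

    // Borrows an instance, runs the work, and always returns the instance.
    <R> R withInstance(Function<T, R> work) throws InterruptedException {
        T instance = idle.take(); // blocks while all instances are in use
        try {
            return work.apply(instance);
        } finally {
            idle.add(instance);
        }
    }

    int idleCount() {
        return idle.size();
    }
}

class ExpressionPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // StringBuilder is only a placeholder for a compiled expression here.
        ExpressionPool<StringBuilder> pool =
                new ExpressionPool<>(4, () -> new StringBuilder("/a/b/c"));

        String result = pool.withInstance(expr -> "evaluated " + expr);
        System.out.println(result);           // evaluated /a/b/c
        System.out.println(pool.idleCount()); // 4
    }
}
```

Because `take()` blocks when the pool is empty, a pool smaller than the thread count would slow callers down rather than let two threads share an instance.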

     
  • Leena

    Leena - 2012-10-04

    Sure! I have not tried that yet. I will try and let you know.

     
