Problem while running SnippetAnnotator

Help
2012-11-06
2013-05-30
  • Harsh Mahajan
    2012-11-06

    Hi, I am trying to run the SnippetAnnotator class, but I get the following runtime error:

    Must gather ids: false
    12/11/06 13:15:24 WARN annotation.Disambiguator: 'label' database has not been cached, so this will run significantly slower than it needs to.
    12/11/06 13:15:24 WARN annotation.Disambiguator: 'pageLinksIn' database has not been cached, so this will run significantly slower than it needs to.
    12/11/06 13:15:24 INFO annotation.Disambiguator: loading classifier
    12/11/06 13:15:24 INFO weighting.LinkDetector: loading classifier
    Enter snippet to annotate (or ENTER to quit):
    sachin tendulkar

    All detected topics:

    Topics that are probably good links:

    Augmented markup:
    sachin tendulkar

    Enter snippet to annotate (or ENTER to quit):
    Sachin Tendulkar
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
    at weka.core.Instance.isMissing(Unknown Source)
    at weka.classifiers.trees.j48.C45Split.whichSubset(Unknown Source)
    at weka.classifiers.trees.j48.ClassifierTree.getProbs(Unknown Source)
    at weka.classifiers.trees.j48.ClassifierTree.getProbs(Unknown Source)
    at weka.classifiers.trees.j48.ClassifierTree.getProbs(Unknown Source)
    at weka.classifiers.trees.j48.ClassifierTree.distributionForInstance(Unknown Source)
    at weka.classifiers.trees.J48.distributionForInstance(Unknown Source)
    at weka.classifiers.meta.Bagging.distributionForInstance(Unknown Source)
    at weka.wrapper.Decider.getRawDistributionForInstance(Decider.java:104)
    at weka.wrapper.BinaryDecider.getDecisionDistribution(BinaryDecider.java:37)
    at org.wikipedia.miner.annotation.Disambiguator.getProbabilityOfSense(Unknown Source)
    at org.wikipedia.miner.annotation.TopicDetector.getTopics(Unknown Source)
    at org.wikipedia.miner.annotation.TopicDetector.getTopics(Unknown Source)
    at SnippetAnnotator.annotate(SnippetAnnotator.java:47)
    at SnippetAnnotator.main(SnippetAnnotator.java:81)

    Could someone please help me work out what went wrong?

     
  • Harsh Mahajan
    2013-02-01

    Hi all, modifying the configuration file to provide the correct paths to the disambiguation and detection models fixed the problem. Sorry for the late reply.

     
  •  peace
    2013-02-12

    Hi harshmahan,

    May I know what the paths of the disambiguation and detection models should be? I've set up the API on my local machine, but it can't run the annotation demo because no disambiguation model is specified. I couldn't find any documentation on how to set it up either. Thank you for your help!

     
  • Harsh Mahajan
    2013-02-12

    Hi, they are inside the models directory. Here are the paths from my config file (wikipedia-template.xml). Do tell me if you run into any other problems.

            <articleComparisonModel>
            /data/knowledgesource/models/artCompare_en_In.model
            </articleComparisonModel>
           
            <!-- A file containing a Weka classifier for disambiguating pairs of labels -->
            <labelDisambiguationModel>
            /data/knowledgesource/models/labelDisambig_en_In.model
            </labelDisambiguationModel>
           
            <!-- A file containing a Weka classifier for generating relatedness measures between labels -->
            <labelComparisonModel>
            /data/knowledgesource/models/labelCompare_en_In.model
            </labelComparisonModel>

            <!-- A file containing a Weka classifier for performing automatic disambiguation of topics in documents -->
            <topicDisambiguationModel>
            /data/knowledgesource/models/disambig_en_In.model
            </topicDisambiguationModel>

            <!-- A file containing a Weka classifier for performing automatic link detection -->
            <linkDetectionModel>
            /data/knowledgesource/models/detect_en_In.model
            </linkDetectionModel>
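
    A minimal sketch for checking these paths before launching the annotator, assuming the config is well-formed XML and uses the five element names shown above. The class name ModelPathCheck and the default file name are only illustrative; nothing here calls Wikipedia Miner itself, it just reports any model entry that is missing or does not point to a readable file.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Sketch only: reads the model elements from a config file and reports
// any whose path does not point to a readable file.
class ModelPathCheck {

    // Element names taken from the config excerpt in this thread.
    static final String[] MODEL_ELEMENTS = {
        "articleComparisonModel", "labelDisambiguationModel", "labelComparisonModel",
        "topicDisambiguationModel", "linkDetectionModel"
    };

    // Returns one message per model element that is missing, empty, or unreadable.
    static List<String> findProblems(File configFile) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(configFile);
        List<String> problems = new ArrayList<>();
        for (String name : MODEL_ELEMENTS) {
            NodeList nodes = doc.getElementsByTagName(name);
            if (nodes.getLength() == 0) {
                problems.add(name + ": element not found");
                continue;
            }
            // trim() because the paths sit on their own lines inside the elements
            String path = nodes.item(0).getTextContent().trim();
            File model = new File(path);
            if (path.isEmpty()) {
                problems.add(name + ": no path given");
            } else if (!model.isFile() || !model.canRead()) {
                problems.add(name + ": cannot read " + path);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // Assumed default location; pass your own config path as the first argument.
        File config = new File(args.length > 0 ? args[0] : "wikipedia-template.xml");
        List<String> problems = findProblems(config);
        if (problems.isEmpty()) {
            System.out.println("All model paths look readable.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

    A readable file is not necessarily the right model, so this only catches missing or mistyped paths.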

     
  • Harsh Mahajan
    2013-02-12

    The models directory is inside the wikiminer root directory.

     
  •  peace
    2013-02-12

    Hi harshmahan,

    That works. Thanks!

    However, annotation quality on the Wikipedia Miner demo site seems much better than that of my own installation. I used a recent (8 Jan 2013) dump of the English Wikipedia and generated the CSV summaries myself. I'm also using all the default values from wikipedia-template.xml and the supplied models, as you instructed. I also tried tuning minSenseProbability, minLinkProbability, and minLinksIn, but that didn't help much. I guess I'm hitting the same issue as in this unresolved post: https://sourceforge.net/projects/wikipedia-miner/forums/forum/676405/topic/4907156

    May I know if you have had any luck tuning the annotation results to come closer to the online demo?

    Many thanks!

     
  • Harsh Mahajan
    2013-02-12

    I didn't get a chance to try the online API, but with this snippet annotator I also got somewhat different results. However, apart from the linkProbability parameter, there are other parameters associated with topics, such as relatednessToContext and relatednessToOtherTopics. You could use these to tune the results. I was also wondering whether it would be better to generate our own disambiguation and detection models. I'm not doing that at the moment, but do tell me if you give it a try.
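
    One rough way to try this kind of tuning is to filter the detected topics on several scores at once. The sketch below assumes you have already copied each topic's scores into a simple holder; ScoredTopic and TopicFilter are hypothetical classes written for this example, not part of Wikipedia Miner, and the threshold values are arbitrary starting points to experiment with.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for per-topic scores; not a Wikipedia Miner class.
class ScoredTopic {
    final String title;
    final double linkProbability;
    final double relatednessToContext;
    final double relatednessToOtherTopics;

    ScoredTopic(String title, double linkProbability,
                double relatednessToContext, double relatednessToOtherTopics) {
        this.title = title;
        this.linkProbability = linkProbability;
        this.relatednessToContext = relatednessToContext;
        this.relatednessToOtherTopics = relatednessToOtherTopics;
    }
}

class TopicFilter {

    // Keeps only the topics that meet or exceed every threshold.
    static List<ScoredTopic> filter(List<ScoredTopic> topics,
                                    double minLinkProbability,
                                    double minRelatednessToContext,
                                    double minRelatednessToOtherTopics) {
        List<ScoredTopic> kept = new ArrayList<>();
        for (ScoredTopic t : topics) {
            if (t.linkProbability >= minLinkProbability
                    && t.relatednessToContext >= minRelatednessToContext
                    && t.relatednessToOtherTopics >= minRelatednessToOtherTopics) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up scores, purely to show the call; real values would come from the annotator.
        List<ScoredTopic> topics = new ArrayList<>();
        topics.add(new ScoredTopic("Topic A", 0.9, 0.8, 0.7));
        topics.add(new ScoredTopic("Topic B", 0.9, 0.1, 0.7));
        for (ScoredTopic t : filter(topics, 0.5, 0.5, 0.5)) {
            System.out.println(t.title);
        }
    }
}
```

    Requiring every threshold to pass is the strictest combination; a weighted sum of the scores would be a softer alternative.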