Problem while running SnippetAnnotator

Help
2012-11-06
2013-05-30
  • Harsh Mahajan

    Harsh Mahajan - 2012-11-06

    Hi, I am trying to run the SnippetAnnotator class, but I get the following runtime error:

    Must gather ids: false
    12/11/06 13:15:24 WARN annotation.Disambiguator: 'label' database has not been cached, so this will run significantly slower than it needs to.
    12/11/06 13:15:24 WARN annotation.Disambiguator: 'pageLinksIn' database has not been cached, so this will run significantly slower than it needs to.
    12/11/06 13:15:24 INFO annotation.Disambiguator: loading classifier
    12/11/06 13:15:24 INFO weighting.LinkDetector: loading classifier
    Enter snippet to annotate (or ENTER to quit):
    sachin tendulkar

    All detected topics:

    Topics that are probably good links:

    Augmented markup:
    sachin tendulkar

    Enter snippet to annotate (or ENTER to quit):
    Sachin Tendulkar
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
    at weka.core.Instance.isMissing(Unknown Source)
    at weka.classifiers.trees.j48.C45Split.whichSubset(Unknown Source)
    at weka.classifiers.trees.j48.ClassifierTree.getProbs(Unknown Source)
    at weka.classifiers.trees.j48.ClassifierTree.getProbs(Unknown Source)
    at weka.classifiers.trees.j48.ClassifierTree.getProbs(Unknown Source)
    at weka.classifiers.trees.j48.ClassifierTree.distributionForInstance(Unknown Source)
    at weka.classifiers.trees.J48.distributionForInstance(Unknown Source)
    at weka.classifiers.meta.Bagging.distributionForInstance(Unknown Source)
    at weka.wrapper.Decider.getRawDistributionForInstance(Decider.java:104)
    at weka.wrapper.BinaryDecider.getDecisionDistribution(BinaryDecider.java:37)
    at org.wikipedia.miner.annotation.Disambiguator.getProbabilityOfSense(Unknown Source)
    at org.wikipedia.miner.annotation.TopicDetector.getTopics(Unknown Source)
    at org.wikipedia.miner.annotation.TopicDetector.getTopics(Unknown Source)
    at SnippetAnnotator.annotate(SnippetAnnotator.java:47)
    at SnippetAnnotator.main(SnippetAnnotator.java:81)

    Can anyone help me figure out what went wrong?

     
  • Harsh Mahajan

    Harsh Mahajan - 2013-02-01

    Hi all, editing the configuration file to give the correct paths to the disambiguation and detection models fixed the problem. Sorry for the late reply.

     
  •  peace

    peace - 2013-02-12

    Hi harshmahan,

    May I know what the paths to the disambiguation and detection models should be? I've set up the API on my local machine, but the annotation demo won't run because no disambiguation model is specified. I couldn't find any documentation on how to set this up either. Thank you for your help!

     
  • Harsh Mahajan

    Harsh Mahajan - 2013-02-12

    Hi, they are inside the models directory. Here are the paths from my config file (wikipedia-template.xml). Let me know if you run into any other problems.

            <articleComparisonModel>
            /data/knowledgesource/models/artCompare_en_In.model
            </articleComparisonModel>
           
            <!-- A file containing a Weka classifier for disambiguating pairs of labels -->
            <labelDisambiguationModel>
            /data/knowledgesource/models/labelDisambig_en_In.model
            </labelDisambiguationModel>
           
            <!-- A file containing a Weka classifier for generating relatedness measures between labels -->
            <labelComparisonModel>
            /data/knowledgesource/models/labelCompare_en_In.model
            </labelComparisonModel>

            <!-- A file containing a Weka classifier for performing automatic disambiguation of topics in documents -->
            <topicDisambiguationModel>
            /data/knowledgesource/models/disambig_en_In.model
            </topicDisambiguationModel>

            <!-- A file containing a Weka classifier for performing automatic link detection -->
            <linkDetectionModel>
            /data/knowledgesource/models/detect_en_In.model
            </linkDetectionModel>
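    Since the fix in this thread came down to pointing the config at the right model files, a quick pre-flight check can catch a missing or mistyped path before the annotator starts. Below is a minimal sketch that uses only the JDK's DOM parser. It assumes the config is one well-formed XML file that uses the element names quoted above. The class and method names are hypothetical and not part of Wikipedia Miner. Note that XML comments must be written as `<!-- ... -->`, otherwise the parser rejects the whole file.

    ```java
    import java.io.File;
    import java.io.IOException;
    import java.io.StringReader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    // Hypothetical helper, not part of Wikipedia Miner.
    public class ModelPathCheck {
        // Element names copied from the wikipedia-template.xml excerpt in this thread.
        static final String[] MODEL_ELEMENTS = {
            "articleComparisonModel", "labelDisambiguationModel",
            "labelComparisonModel", "topicDisambiguationModel", "linkDetectionModel"
        };

        /** Parses the config text; malformed XML (e.g. "<!-" comments) is rejected. */
        static Document parse(String xml) {
            try {
                return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)));
            } catch (Exception e) {
                throw new IllegalArgumentException("Config is not well-formed XML", e);
            }
        }

        /** Lists every model element that is absent or does not name an existing file. */
        static List<String> findProblems(Document doc) {
            List<String> problems = new ArrayList<>();
            for (String name : MODEL_ELEMENTS) {
                NodeList nodes = doc.getElementsByTagName(name);
                if (nodes.getLength() == 0) {
                    problems.add(name + ": not set");
                    continue;
                }
                // trim() because the template puts each path on its own line
                String path = nodes.item(0).getTextContent().trim();
                // Relative paths resolve against the current working directory
                if (!new File(path).isFile()) {
                    problems.add(name + ": no file at " + path);
                }
            }
            return problems;
        }

        public static void main(String[] args) throws IOException {
            String configPath = args.length > 0 ? args[0] : "wikipedia-template.xml";
            String xml = new String(Files.readAllBytes(Paths.get(configPath)), StandardCharsets.UTF_8);
            List<String> problems = findProblems(parse(xml));
            if (problems.isEmpty()) {
                System.out.println("All model paths point to existing files.");
            } else {
                problems.forEach(System.out::println);
            }
        }
    }
    ```

    Run it as `java ModelPathCheck /path/to/wikipedia-template.xml`. If your paths are relative, run it from the Wikipedia Miner root directory, because that is where the models directory lives.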

     
  • Harsh Mahajan

    Harsh Mahajan - 2013-02-12

    The models directory is inside the wikiminer root directory.

     
  •  peace

    peace - 2013-02-12

    Hi harshmahan,

    That works. Thanks!

    However, the annotation quality on the Wikipedia Miner demo site seems much better than that of my own installation. I used a recent (8 Jan 2013) dump of the English Wikipedia and generated the CSV summaries myself. I'm using all the default values from wikipedia-template.xml and the supplied models, as you suggested. I also tried tuning minSenseProbability, minLinkProbability and minLinksIn, but that didn't help much. I suspect I'm hitting the same issue as in this unresolved post: https://sourceforge.net/projects/wikipedia-miner/forums/forum/676405/topic/4907156
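    To make it clearer what tuning those three settings can and can't do, here is a rough sketch of threshold-based candidate pruning. It assumes, based only on the parameter names, that minLinkProbability drops labels that are rarely used as links, minSenseProbability drops unlikely senses of a label, and minLinksIn drops articles with few incoming links. The `Candidate` class and `prune` method are hypothetical and are not Wikipedia Miner's internals.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class CandidateFilter {
        // Hypothetical holder for one label -> article candidate; not a Wikipedia Miner class.
        static final class Candidate {
            final String label;
            final String article;
            final double linkProbability;  // assumed: share of the label's occurrences that are links
            final double senseProbability; // assumed: share of the label's links that go to this article
            final int linksIn;             // assumed: number of articles linking to this article

            Candidate(String label, String article, double linkProbability,
                      double senseProbability, int linksIn) {
                this.label = label;
                this.article = article;
                this.linkProbability = linkProbability;
                this.senseProbability = senseProbability;
                this.linksIn = linksIn;
            }
        }

        /** Keeps only candidates that meet all three minimum thresholds. */
        static List<Candidate> prune(List<Candidate> candidates, double minLinkProbability,
                                     double minSenseProbability, int minLinksIn) {
            List<Candidate> kept = new ArrayList<>();
            for (Candidate c : candidates) {
                if (c.linkProbability >= minLinkProbability
                        && c.senseProbability >= minSenseProbability
                        && c.linksIn >= minLinksIn) {
                    kept.add(c);
                }
            }
            return kept;
        }
    }
    ```

    Pruning of this kind can only remove candidates. It cannot bring back topics that the classifiers never rank highly. Raising the thresholds mostly trades recall for precision and speed.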

    Have you had any luck tuning the annotation results to come closer to the online demo?

    Many thanks!

     
  • Harsh Mahajan

    Harsh Mahajan - 2013-02-12

    I didn't get a chance to try the online API, but with this snippet annotator I also got somewhat different results. Apart from the linkProbability parameter, topics have other scores, such as relatednessToContext and relatednessToOtherTopics, which you can use to tune the results. I was also wondering whether it would be better to generate our own models, since we can train our own disambiguation and detection models. I'm not doing that for now, but do tell me if you give it a try.
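    Here is one way to act on the suggestion of using the other topic scores. The sketch keeps only topics that clear a minimum on each of the three scores mentioned above, then sorts the survivors by link probability. The `ScoredTopic` holder and the `select` method are hypothetical. How you actually read these scores off the detected topics depends on your Wikipedia Miner version, and the thresholds have to be found by experiment.

    ```java
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class TopicFilter {
        // Hypothetical holder for the per-topic scores named in this thread.
        static final class ScoredTopic {
            final String title;
            final double linkProbability;
            final double relatednessToContext;
            final double relatednessToOtherTopics;

            ScoredTopic(String title, double linkProbability,
                        double relatednessToContext, double relatednessToOtherTopics) {
                this.title = title;
                this.linkProbability = linkProbability;
                this.relatednessToContext = relatednessToContext;
                this.relatednessToOtherTopics = relatednessToOtherTopics;
            }
        }

        /** Keeps topics that clear every threshold, highest link probability first. */
        static List<ScoredTopic> select(List<ScoredTopic> topics, double minLinkProbability,
                                        double minRelatednessToContext,
                                        double minRelatednessToOtherTopics) {
            List<ScoredTopic> kept = new ArrayList<>();
            for (ScoredTopic t : topics) {
                if (t.linkProbability >= minLinkProbability
                        && t.relatednessToContext >= minRelatednessToContext
                        && t.relatednessToOtherTopics >= minRelatednessToOtherTopics) {
                    kept.add(t);
                }
            }
            kept.sort(Comparator.comparingDouble((ScoredTopic t) -> t.linkProbability).reversed());
            return kept;
        }
    }
    ```

    Each threshold is applied independently, so you can change one and see its effect without touching the other two. That makes it easier to find which score is letting unwanted topics through.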

     
