Question about Example Java code

Help
Irfan
2006-09-06
2012-09-04
  • Irfan

    Irfan - 2006-09-06

    In the example Java code shown at http://web-harvest.sourceforge.net/usage.php, when I compile and run the WebHarvestTest class, I get a NullPointerException on the line where I try to print out the articles using toString(). I also tried toList().size(), and it threw the same NullPointerException.

    IVariable articles = (IVariable) scraper.getContext().get("articles");

    List lst = articles.toList();
    System.out.println("Size::"+lst.size());
    System.out.println("Results::"+articles.toString());

    The file nytimes/nytimes20060906.xml gets created properly and is valid XML. Is the example code referring to the wrong node in the XML? I don't see an "articles" node in the generated XML. So is getContext().get("articles") the correct code? I also tried get("article"), with no luck.

    What would be the exact code to, say, print out the authors of all the articles?

     
    • Vladimir Nikic

      Vladimir Nikic - 2006-09-06

      The configuration name in the usage example was chosen arbitrarily. It doesn't have to be the one from the examples. OK, maybe that causes confusion; I'll change it on the Usage page. The problem is that there is no "articles" variable defined in nytimes.xml from the examples. That's why it throws a NullPointerException.
      If you want to have that variable, it would be enough to put something like:

      <var-def name="articles">
      ....
      </var-def>
      

      around the file processor in the nytimes.xml configuration, or even to replace the file processor with a variable definition processor if you don't need a file as the result.
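A minimal sketch of the failure mode described above, assuming a plain java.util.Map stands in for the scraper context (the real Web-Harvest context class is not used here, and the class name ContextLookupSketch and the helper describe are hypothetical). It shows that looking up a name nobody defined yields null, and that checking for null before calling toString() or toList() avoids the exception:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ContextLookupSketch {

    // Hypothetical helper: describes the variable, or reports that it is
    // missing instead of dereferencing null.
    public static String describe(Map<String, Object> context, String name) {
        Object value = context.get(name); // null when nothing defined this name
        if (value == null) {
            return "Variable '" + name + "' is not defined by the configuration";
        }
        return name + " = " + value;
    }

    public static void main(String[] args) {
        // Stand-in for the scraper context: only "titles" was defined.
        Map<String, Object> context = new HashMap<>();
        context.put("titles", Arrays.asList("first", "second"));

        System.out.println(describe(context, "titles"));
        System.out.println(describe(context, "articles"));
    }
}
```

With the real library the same guard would go around the value returned by scraper.getContext().get("articles") before any method is called on it.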

       
  • Al Waltrip

    Al Waltrip - 2012-01-21

    I've got the same problem. I'm using canon.xml and the Variable always
    comes back null. Any ideas? The code is below:

    import org.apache.log4j.BasicConfigurator;
    import org.apache.log4j.PropertyConfigurator;
    import org.webharvest.definition.ScraperConfiguration;
    import org.webharvest.runtime.Scraper;
    import org.webharvest.runtime.variables.Variable;

    import java.io.IOException;
    import java.util.Properties;

    public class Test {

        public static void main(String[] args) throws IOException {
            // Set up a simple configuration that logs on the console.
            BasicConfigurator.configure();

            ScraperConfiguration config = new ScraperConfiguration(
                    "C:/Users/alex/Dev/Projects/JBDS-4.0/TestScraper/temp/scrapertest/canon.xml");
            Scraper scraper = new Scraper(config,
                    "C:/Users/alex/Dev/Projects/JBDS-4.0/TestScraper/temp/scrapertest/");
            scraper.setDebug(true);
            scraper.execute();

            // This is the lookup that comes back null.
            Variable products = (Variable) scraper.getContext().get("products");
            System.out.println("Done...");
        }
    }

     
  • Steven P. Goldsmith

    This works with the source from trunk (assumes a WH variable called mapList is
    returned):

        /**
         * Execute WebHarvest script which should parse data into a
         * {@code List<Map<String, Object>>} called mapList. Parameters passed are:
         * 
         * sourceFilePath - Script to execute
         * workingDir     - Web Harvest working directory
         * 
         * @param params Key/value pairs of parameters.
         * @return List of database field names/values.
         */
        @Override
        public final List<Map<String, Object>> execute(
                final Map<String, String> params) {
            List<Map<String, Object>> mapList = null;
            try {
                final ScraperConfiguration config = new ScraperConfiguration(params.
                        get("sourceFilePath"));
                final Scraper scraper =
                        new Scraper(config, params.get("workingDir"));
                scraper.setDebug(false);
                log.info(String.format("Executing script %s", params.get(
                        "sourceFilePath")));
                final long startTime = System.currentTimeMillis();
                scraper.execute();
                log.info(String.format("Executed in: %d ms", System.
                        currentTimeMillis()
                        - startTime));
                // Script is expected to return Web Harvest Variable "mapList" which
                // should be a List of Map objects 
                final Variable listVar = (Variable) scraper.getContext().getVar(
                        "mapList");
                // List to return
                mapList = new ArrayList<Map<String, Object>>();
                // If list returned from scraper is null there was a problem with the
                // script, so return null mapList
                if (listVar != null) {
                    // Convert Web Harvest list to Java List of Web Harvest Variable
                    // objects
                    final List<Variable> list = listVar.toList();
                    for (Variable var : list) {
                        // Get wrapped object (Map in this case)
                        mapList.add((Map) var.getWrappedObject());
                    }
                } else {
                    mapList = null;
                }
            } catch (FileNotFoundException e) {
                throw new CommandException(e);
            }
            return mapList;
        }
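A caller of the method above has to handle both outcomes: a populated list, or null when the script did not define mapList. The following is a sketch only, using plain Java collections with no Web-Harvest classes; the class name MapListConsumerSketch, the helper formatRows, and the sample field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapListConsumerSketch {

    // Turns each row into a "key=value, key=value" line.
    // A null list (variable missing from the script) yields no lines.
    public static List<String> formatRows(List<Map<String, Object>> mapList) {
        List<String> lines = new ArrayList<>();
        if (mapList == null) {
            return lines;
        }
        for (Map<String, Object> row : mapList) {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, Object> entry : row.entrySet()) {
                if (sb.length() > 0) {
                    sb.append(", ");
                }
                sb.append(entry.getKey()).append('=').append(entry.getValue());
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample data standing in for the scraper result.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("title", "Hello");
        row.put("author", "Ann");
        List<Map<String, Object>> mapList = new ArrayList<>();
        mapList.add(row);

        for (String line : formatRows(mapList)) {
            System.out.println(line);
        }
        System.out.println("Rows from a null result: " + formatRows(null).size());
    }
}
```

Returning an empty list for a null input keeps the calling loop simple; a caller that needs to tell "script failed" apart from "no rows" would check for null itself.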
    
     
