ScraperContext warning

2011-11-05
2012-09-04
  • Steven P. Goldsmith

    I'm building WH from trunk as a Maven project, and when I run my own test app
    against the snapshot JAR I get the following warning:

    WARN org.webharvest.runtime.Scraper - You are using the DEPRECATED scraper
    configuration version. We urge you to migrate to a newer one! Please visit
    http://web-harvest.sourceforge.net/release.php for details.

    I've read that you are reworking the variable mechanism, so is there a way to
    use the new context by default?

     
  • Steven P. Goldsmith

    I figured this out by looking at the source.

    <config xmlns="http://web-harvest.sourceforge.net/schema/2.1/core" charset="UTF-8">

    Without this namespace declaration, the 2.1 trunk source falls back to the
    1.0 syntax.

     
  • Alex Wajda

    Alex Wajda - 2011-11-17

    Yep, in WH 2.1 a lot of improvements have been made to variable handling; in
    particular, a real dynamic scope was introduced. That required some syntax
    cleanup. The new dynamic scope also breaks compatibility with existing WH
    scrapers. For both of these reasons we decided to introduce scraper
    configuration versioning, so that WH knows how to interpret a configuration,
    either the "old" or the "new" way. The work on this has just been finished
    and it needs a lot of QA. Any help is appreciated.
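
    As a rough illustration of namespace-based versioning, the sketch below
    checks whether a configuration's root element declares the 2.1 core
    namespace mentioned in this thread. It is not Web-Harvest's actual
    detection code: the class ConfigVersionSniffer and the method
    usesNewSyntax are hypothetical names, and only standard JDK XML APIs are
    used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Illustrative only: a hypothetical helper, not part of Web-Harvest. */
final class ConfigVersionSniffer {

    // Namespace quoted in this thread for 2.1 configurations.
    static final String NS_2_1 =
            "http://web-harvest.sourceforge.net/schema/2.1/core";

    /** Returns true if the root element is in the 2.1 core namespace. */
    static boolean usesNewSyntax(String configXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Without this, getNamespaceURI() would not report the namespace.
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(configXml)))
                    .getDocumentElement();
            // getNamespaceURI() is null when no namespace is declared.
            return NS_2_1.equals(root.getNamespaceURI());
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration", e);
        }
    }

    public static void main(String[] args) {
        String oldStyle = "<config charset=\"UTF-8\"/>";
        String newStyle = "<config xmlns=\"" + NS_2_1 + "\" charset=\"UTF-8\"/>";
        System.out.println("old style uses new syntax: " + usesNewSyntax(oldStyle));
        System.out.println("new style uses new syntax: " + usesNewSyntax(newStyle));
    }
}
```

    A check like this could be run over existing scripts before migrating, to
    list which ones would still trigger the deprecation warning.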

     
  • Steven P. Goldsmith

    Any way to post this in the wiki (i.e. the 2.1 differences) with examples? I
    figured out a lot by converting existing scripts (thanks for no more <empty>
    tags for var defs) after reading your responses and looking at
    wh-core-2.0.xsd. One thing that would be nice is getting back non-wrapped
    objects (not org.webharvest.runtime.variables.Variable), say a
    List<Map<String,Object>> or any other plain object. Consider the following
    code. I have to convert the Variable list to a Java List and then iterate
    that list to extract my Map objects.

        /**
         * Return List of Map<String, Object> containing key/value pairs of data
         * after Web Harvest script completes.
         * 
         * @param sourceFilePath Web Harvest script file path
         * @param workingDir Web Harvest working directory path
         * @return List of Map<String, Object> containing key/value pairs of data
         * @throws FileNotFoundException Possible exception
         */
        public List<Map<String, Object>> getList(final String sourceFilePath,
                final String workingDir) throws FileNotFoundException {
            final ScraperConfiguration config = new ScraperConfiguration(
                    sourceFilePath);
            final Scraper scraper = new Scraper(config, workingDir);
            scraper.setDebug(false);
            log.info(String.format("Executing script %s", sourceFilePath));
            final long startTime = System.currentTimeMillis();
            scraper.execute();
            log.info(String.format("Executed in: %d ms", System.currentTimeMillis()
                    - startTime));
            // Script is expected to return Web Harvest Variable "mapList" which
            // should be a List of Map objects 
            final Variable listVar = (Variable) scraper.getContext().getVar(
                    "mapList");
            // List to return
            List<Map<String, Object>> mapList = new ArrayList<Map<String, Object>>();
            // If list returned from scraper is null there was a problem with the
            // script, so return null mapList
            if (listVar != null) {
                // Convert Web Harvest list to Java List of Web Harvest Variable
                // objects
                final List<Variable> list = listVar.toList();
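                for (Variable var : list) {
                    // Get wrapped object (Map in this case); the generic cast
                    // is unchecked, so the warning is suppressed explicitly
                    @SuppressWarnings("unchecked")
                    final Map<String, Object> map = (Map<String, Object>) var
                            .getWrappedObject();
                    mapList.add(map);
                }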
            } else {
                mapList = null;
            }
            return mapList;
        }
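
    Until unwrapped values are available directly, the loop above could be
    factored into a small generic helper. This is only a sketch: Wrapped is a
    hypothetical stand-in interface that mirrors the getWrappedObject()
    accessor used above, so the example compiles without Web-Harvest on the
    classpath, and Unwrapper and unwrapAll are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Illustrative only: a hypothetical helper, not part of Web-Harvest. */
final class Unwrapper {

    /** Hypothetical stand-in for a wrapper type with getWrappedObject(). */
    interface Wrapped {
        Object getWrappedObject();
    }

    /** Unwraps every element, failing fast if one has an unexpected type. */
    static <T> List<T> unwrapAll(List<? extends Wrapped> wrapped, Class<T> type) {
        List<T> result = new ArrayList<T>(wrapped.size());
        for (Wrapped w : wrapped) {
            // Class.cast throws ClassCastException on a type mismatch
            // and passes null through unchanged.
            result.add(type.cast(w.getWrappedObject()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> row =
                Collections.<String, Object>singletonMap("title", "example");
        Wrapped w = () -> row; // Wrapped has a single method, so a lambda works
        List<Map> rows = unwrapAll(Arrays.asList(w), Map.class);
        System.out.println(rows.get(0).get("title"));
    }
}
```

    Passing the target class keeps the element type check in one place; the
    Map's own key and value types still cannot be verified at runtime.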
    
     
  • Alex Wajda

    Alex Wajda - 2011-11-17

    Regarding unwrapping variables, I answered that in another thread. I've been
    thinking about it for a while and, yes, it needs to be changed. I've added it
    to the TODO list.

     
