
Integrating HTMLUnit and Webharvest

Help
alosada
2011-11-28
2012-09-04
  • alosada

    alosada - 2011-11-28

    Hi,

    I have a serious issue when requesting detail information from a specific
    URL. It seems that the links are created in an onclick event handler.

    PS20111026_INCOMPANYSEV

    I've read a lot about it, and it seems the HTMLUnit API can help me get the
    correct link. Could anyone show me an example of integrating it into a
    webharvest script?

    I'd like to integrate HTMLUnit into a webharvest script so that, once
    HTMLUnit has led me to the detail page, I can come back and parse that URL
    with webharvest.

    Thanks in advance
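    A minimal sketch of a lighter alternative, using plain Java regular
    expressions rather than HTMLUnit: if the onclick handler simply passes a
    literal URL, e.g. window.open('...') or location.href = '...', the URL can
    be pulled out of the attribute and resolved against the page address
    without running any JavaScript. The class name, the pattern and the example
    URLs are hypothetical. Handlers that build the URL dynamically in
    JavaScript still need a JavaScript-capable client such as HTMLUnit.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OnclickLinkExtractor {

    // Hypothetical pattern: captures a quoted literal passed to window.open(...)
    // or assigned to location / location.href. Quotes and whitespace are not
    // allowed inside the captured URL.
    private static final Pattern TARGET = Pattern.compile(
            "(?:window\\.open\\s*\\(|location(?:\\.href)?\\s*=)\\s*['\"]([^'\"\\s]+)['\"]");

    // Returns the absolute URL found in the onclick code, resolved against the
    // address of the page containing the link, or empty if no literal is found.
    static Optional<String> extractUrl(String onclick, String pageUrl) {
        Matcher m = TARGET.matcher(onclick);
        if (!m.find()) {
            return Optional.empty();
        }
        // URI.resolve turns a relative reference such as "detail.jsp?id=42"
        // into an absolute URL based on the page's own address. It throws
        // IllegalArgumentException for characters that URI does not accept.
        return Optional.of(URI.create(pageUrl).resolve(m.group(1)).toString());
    }

    public static void main(String[] args) {
        String page = "http://example.com/search/results.jsp"; // hypothetical
        String[] handlers = {
            "window.open('detail.jsp?id=42','_blank')",
            "location.href = \"/company/detail?id=7\"; return false;",
            "doSearch(7); return false;" // URL built in JavaScript: no literal
        };
        for (String h : handlers) {
            // Prints http://example.com/search/detail.jsp?id=42,
            // then http://example.com/company/detail?id=7, then (no literal URL)
            System.out.println(extractUrl(h, page).orElse("(no literal URL)"));
        }
    }
}
```

    The extracted absolute URL can then be fetched and parsed by the
    Web-Harvest script as usual.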

     
  • Alex Wajda

    Alex Wajda - 2011-11-29

    Use the WH Plugin API to write a plugin. See the very basic example here:
    http://web-harvest.sourceforge.net/plugins.php

    It is very easy to do. Also take a look at the existing plugins to get some
    clues.

    Note that in WH 2.1 the Plugin API has changed a little, but nothing major;
    you will easily figure it out when you look at the source code. Basically,
    what you need to do is inherit from
    org.webharvest.runtime.processors.WebHarvestPlugin and implement the
    required methods. Then you need to register your plugin either via the WH IDE
    or (if you use WH programmatically) via
    org.webharvest.definition.DefinitionResolver.registerPlugin(Class pluginClass,
    String uri).

    In WH 2.1 the usage in the configuration is also a bit different: you need to
    use the custom XML namespace that your plugin is associated with.

    <config ...
            xmlns:my="my.namespace.1">
        <!-- ... -->
        <my:foo-bar></my:foo-bar>
        <!-- ... -->
    </config>
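    The namespace part can be illustrated without Web-Harvest at all. The
    sketch below uses only the JDK's DOM parser (class and constant names are
    hypothetical, and the namespace URI is a placeholder like the one in the
    example above). It shows how a namespace-aware XML parser sees such a
    config: my:foo-bar is an element with local name foo-bar in the namespace
    bound to the "my" prefix, and it is the namespace URI, not the prefix, that
    identifies it. Whether WH 2.1 matches plugins exactly this way is an
    assumption; this only demonstrates the XML side.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceDemo {

    // Same shape as the config snippet; the namespace URI is a placeholder.
    static final String CONFIG =
            "<config xmlns:my=\"my.namespace.1\"><my:foo-bar></my:foo-bar></config>";

    // The same namespace URI bound to a different prefix.
    static final String OTHER_PREFIX =
            "<config xmlns:x=\"my.namespace.1\"><x:foo-bar/></config>";

    // Lists the local names of all elements whose namespace URI is nsUri.
    static List<String> localNamesIn(String xml, String nsUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Not namespace-aware by default: without this, elements carry no
            // namespace URI and getElementsByTagNameNS below finds nothing.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches any local name within the given namespace.
            NodeList nodes = doc.getElementsByTagNameNS(nsUri, "*");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getLocalName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(localNamesIn(CONFIG, "my.namespace.1"));       // [foo-bar]
        System.out.println(localNamesIn(OTHER_PREFIX, "my.namespace.1")); // [foo-bar]
        System.out.println(localNamesIn(CONFIG, "some.other.ns"));        // []
    }
}
```

    The second call finds the element even though the prefix is "x", which is
    why the xmlns:... declaration, rather than the prefix letters, is what
    matters.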
    
     
