difference between gui and java libs

pol pul
2010-08-26
2012-09-04
  • pol pul

    pol pul - 2010-08-26

    I have a config xml that runs in the gui ( webharvest_all_2.jar ).

    However, in my test application, I get a NoSuchMethodError:
    net.sf.saxon.query.QueryResult.serialize.

    This is on the line:

    v=(Variable)(scraper.getContext().get("OpportunityItems"));

    What am I doing wrong? Are there important differences between the GUI and the Java libs?

    I use Eclipse, Maven, and Webharvest 2

    NB, my pom.xml is:

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.webharvest.wso2</groupId>
            <artifactId>webharvest-core</artifactId>
            <version>2.0</version>
            <type>jar</type>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>net.sf.saxon</groupId>
            <artifactId>saxon-xom</artifactId>
            <version>8.7</version>
        </dependency>
        <dependency>
            <groupId>org.htmlcleaner</groupId>
            <artifactId>htmlcleaner</artifactId>
            <version>1.55</version>
        </dependency>
        <dependency>
            <groupId>bsh</groupId>
            <artifactId>bsh</artifactId>
            <version>1.3.0</version>
        </dependency>
        <dependency>
            <groupId>commons-httpclient</groupId>
            <artifactId>commons-httpclient</artifactId>
            <version>3.1</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.15</version>
        </dependency>
    </dependencies>

    <repositories>
        <repository>
            <id>org.webharvest.wso2</id>
            <name>Web Harvest Core</name>
            <url>http://dist.wso2.org/maven2/</url>
            <snapshots><enabled>false</enabled></snapshots>
        </repository>
    </repositories>

    Logging is working and gives no errors.

     
  • Alex Wajda

    Alex Wajda - 2010-08-26

    WebHarvest ships with its own Saxon build. It seems to be v.9, but I
    wasn't able to find an identical build on the Saxon web site or in any public
    Maven repo. What I did myself was take the saxon9.jar available in the
    WebHarvest2 sources and manually install it into my local Maven repo:

    mvn install:install-file -Dpackaging=jar -DgeneratePom=true \
                             -DgroupId=net.sf.saxon \
                             -DartifactId=saxon \
                             -Dversion=9 \
                             -Dfile=saxon9.jar
    

    and then used it:

            <dependency>
                <groupId>net.sf.saxon</groupId>
                <artifactId>saxon</artifactId>
                <version>9</version>
            </dependency>
    

    Don't try to build WebHarvest2 from sources using the pom.xml from trunk; it's
    far from complete. I wrote my own based on that one and I'd love to
    share it, but unfortunately I haven't received any reply from the developers.
    The project seems to be abandoned :(

    Here's my pom.xml to build WebHarvest2 from svn trunk:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>net.sourceforge.web-harvest</groupId>
        <artifactId>web-harvest</artifactId>
        <version>2.0.0-SNAPSHOT</version>
        <description>Open Source Web Data Extraction tool written in Java. It offers a way to collect desired Web pages and
            extract useful data from them.
        </description>
        <url>http://web-harvest.sourceforge.net/</url>
        <inceptionYear>2006</inceptionYear>
        <developers>
            <developer>
                <id>vnikic</id>
                <name>Vladimir Nikic</name>
                <roles>
                    <role>Project Admin</role>
                    <role>Developer</role>
                </roles>
            </developer>
        </developers>
        <licenses>
            <license>
                <name>BSD License</name>
                <url>http://www.opensource.org/licenses/bsd-license.php</url>
                <distribution>repo</distribution>
                <comments>OWNER = Vladimir Nikic
                    YEAR = 2006-2007
                </comments>
            </license>
        </licenses>
        <scm>
            <url>http://web-harvest.svn.sourceforge.net/</url>
        </scm>
        <build>
            <sourceDirectory>src</sourceDirectory>
            <resources>
                <resource>
                    <directory>src</directory>
                    <includes>
                        <include>org/webharvest/gui/resources/**/*</include>
                    </includes>
                </resource>
                <resource>
                    <directory>licences</directory>
                    <targetPath>META-INF</targetPath>
                    <includes>
                        <include>**/*</include>
                    </includes>
                </resource>
            </resources>
            <plugins>
                <plugin>
                    <artifactId>maven-jar-plugin</artifactId>
                    <configuration>
                        <archive>
                            <manifestFile>config/MANIFEST.MF</manifestFile>
                        </archive>
                    </configuration>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <configuration>
                        <source>1.5</source>
                        <target>1.6</target>
                        <encoding>UTF-8</encoding>
                        <optimize>true</optimize>
                        <excludes>
                            <exclude>Test.java</exclude>
                        </excludes>
                    </configuration>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-source-plugin</artifactId>
                    <executions>
                        <execution>
                            <id>attach-sources</id>
                            <goals>
                                <goal>jar</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
        <dependencies>
            <dependency>
                <groupId>net.sourceforge.htmlcleaner</groupId>
                <artifactId>htmlcleaner</artifactId>
                <version>2.1</version>
            </dependency>
            <dependency>
                <groupId>org.beanshell</groupId>
                <artifactId>bsh</artifactId>
                <version>2.0b4</version>
            </dependency>
            <dependency>
                <groupId>commons-codec</groupId>
                <artifactId>commons-codec</artifactId>
                <version>1.4</version>
            </dependency>
            <dependency>
                <groupId>commons-collections</groupId>
                <artifactId>commons-collections</artifactId>
                <version>3.2.1</version>
            </dependency>
            <dependency>
                <groupId>commons-httpclient</groupId>
                <artifactId>commons-httpclient</artifactId>
                <version>3.1</version>
            </dependency>
            <dependency>
                <groupId>commons-logging</groupId>
                <artifactId>commons-logging</artifactId>
                <version>1.1.1</version>
            </dependency>
            <dependency>
                <groupId>org.apache.commons</groupId>
                <artifactId>commons-email</artifactId>
                <version>1.2</version>
            </dependency>
            <dependency>
                <groupId>commons-net</groupId>
                <artifactId>commons-net</artifactId>
                <version>2.0</version>
            </dependency>
            <dependency>
                <groupId>commons-cli</groupId>
                <artifactId>commons-cli</artifactId>
                <version>1.2</version>
            </dependency>
            <dependency>
                <groupId>log4j</groupId>
                <artifactId>log4j</artifactId>
                <version>1.2.16</version>
            </dependency>
            <dependency>
                <groupId>org.codehaus.groovy</groupId>
                <artifactId>groovy-all</artifactId>
                <version>1.7.4</version>
            </dependency>
            <dependency>
                <groupId>rhino</groupId>
                <artifactId>js</artifactId>
                <version>1.7R2</version>
            </dependency>
            <dependency>
                <groupId>jboss</groupId>
                <artifactId>jnet</artifactId>
                <version>3.2.1</version>
            </dependency>
            <dependency>
                <groupId>net.sf.saxon</groupId>
                <artifactId>saxon</artifactId>
                <version>9</version>
            </dependency>
        </dependencies>
    
        <repositories>
            <repository>
                <id>ibiblio.org</id>
                <name>ibiblio</name>
                <url>http://mirrors.ibiblio.org/pub/mirrors/maven2/</url>
            </repository>
        </repositories>
    </project>
    
     
  • newbee

    newbee - 2010-08-26

    Hi,

    I am not the main developer, but I have been using WH for some time. I created
    my own version of the code, which is much more efficient than the one shipped
    with the main release. When I sent it to the developer for inclusion in the next
    release, it was pretty much ignored. It is an open source project, so you
    are free to make any changes in your own repo, and that could work for you. As
    for your question, I believe you can use the latest Saxon jar from the Saxon
    project (not the one bundled with WH). In fact it is much better and fixes some
    bugs that I experienced earlier.

    Regards,

    Ed

     
  • pol pul

    pol pul - 2010-08-27

    Thanks already.

    After adding a lot of manual Maven installs (and indeed going for the latest
    version of Saxon, 9), my error has changed to another one, at runtime:
    java.lang.NoClassDefFoundError: net/sf/saxon/trans/XPathException

    This occurs before the HTML is loaded.
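
A NoSuchMethodError or NoClassDefFoundError on a Saxon class generally means the classes found at run time are not the ones the calling code was compiled against, for example because more than one jar supplies them. As a rough diagnostic sketch (not part of Web-Harvest; the class name ClasspathProbe and its methods are made up for illustration), the following lists every classpath location that provides a given class file, using only the JDK:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: lists every classpath location that supplies a class file. */
final class ClasspathProbe {

    /** Converts a fully qualified class name to the resource path of its class file. */
    static String toResourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns the URL of every copy of the class file visible to the system class loader. */
    static List<URL> findCopies(String className) {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        try {
            return Collections.list(loader.getResources(toResourceName(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the first error of this thread.
        String name = args.length > 0 ? args[0] : "net.sf.saxon.query.QueryResult";
        List<URL> copies = findCopies(name);
        if (copies.isEmpty()) {
            System.out.println(name + " was not found on the classpath");
        }
        for (URL url : copies) {
            System.out.println(url);
        }
        if (copies.size() > 1) {
            System.out.println("More than one location supplies " + name);
        }
    }
}
```

Run it with the same classpath as the failing application. If more than one URL is printed, two jars supply the class and the dependency list probably needs to be reduced to one of them.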

     
  • Alex Wajda

    Alex Wajda - 2010-08-27

    Did you use the exact saxon9.jar from the WebHarvest2 sources? As I said, any
    other Saxon build available on their web site or in public Maven repos will not
    work. You need to use that particular saxon9.jar, located in the /lib directory
    here:
    http://web-harvest.sourceforge.net/download/webharvest2b1-project.zip

     
  • pol pul

    pol pul - 2010-08-27

    This is indeed the one I am using.

    I then noticed another dependency in my pom.xml, on saxon-xom. I removed it, and
    now I get further again.

    The new issue is Exception in thread "main" java.lang.NoSuchMethodError:
    org.htmlcleaner.HtmlCleaner: method <init>()V not found, occurring after the
    HTML is read.
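
In that message, `<init>()V` is the JVM's name and descriptor for a constructor that takes no arguments, so the HtmlCleaner class found at run time does not declare one. A minimal sketch for checking this against whatever jar is on the classpath follows; ConstructorProbe and its methods are hypothetical names, and only JDK reflection is used:

```java
/** Hypothetical helper: checks whether a class declares a no-argument constructor. */
final class ConstructorProbe {

    /** Returns true if the class declares a constructor with no parameters (any access level). */
    static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor(); // empty parameter list
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Looks the class up by name, without initializing it, and describes what was found. */
    static String describe(String className) {
        try {
            Class<?> type = Class.forName(className, false, ConstructorProbe.class.getClassLoader());
            return hasNoArgConstructor(type)
                    ? "declares a no-arg constructor"
                    : "declares no no-arg constructor";
        } catch (ClassNotFoundException e) {
            return "not on the classpath";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error message above.
        String name = args.length > 0 ? args[0] : "org.htmlcleaner.HtmlCleaner";
        System.out.println(name + ": " + describe(name));
    }
}
```

Running it on the application's classpath shows whether the HtmlCleaner version Maven resolved has the constructor that the calling code expects.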

     
  • Alex Wajda

    Alex Wajda - 2010-08-27

    Please look at the pom.xml from my first post here and use it as a
    reference. HtmlCleaner should be version 2.1.

     
  • Alex Wajda

    Alex Wajda - 2010-09-11

    The POM in trunk is fixed.

    To build Web-Harvest use one of the following Maven commands:

    mvn clean install
    

    to build without external dependencies (handy for embedding into other
    projects)

    -OR-

    mvn clean install -Pwith-dependencies
    

    with all dependencies (to run stand-alone)

     
