Xpath not enough, how to pass var into script

  • werner001

    werner001 - 2010-06-13


    I am creating a configuration to extract a number of simple html pages. Their
    structure is like:

    • topic 1
        • name
        • etc
    • topic 2
        • name
        • etc

    I want to collect the name/etc. items together with their parent topic, but
    XPath seems to be insufficient, as I cannot pick a particular item from the
    resulting list (I know that the second result for "name" is under "topic 2",
    but there is no structural attribute for this). The result will be written
    into a DB table (the first name goes into a different column than the second
    one).

    So I want to select all "name"s first and then pick them manually with
    BeanShell or JavaScript (XQuery seems to be no help either).

    The question is: if I define a variable like this:

    <var-def name="list">
        <xpath expression="//tr/td/../following-sibling::tr/td/following-sibling::td"></xpath>
    </var-def>

    how can I manipulate it in a <script> part, and what kind of object is it?
    Is it possible at all, or is there a better idea?

    Thank you


  • Anonymous

    Anonymous - 2010-06-14

    what you need is to use XPath, and to review the part of its syntax
    concerning indices in square brackets
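
    To illustrate the bracket syntax mentioned here: a number in square
    brackets is a positional predicate in XPath. The fragment below is only a
    sketch. The URL, the variable names and the choice of td are placeholders,
    not taken from the actual page.

    ```xml
    <!-- Hypothetical: http://example.com/page.html stands in for the real page. -->
    <var-def name="page">
        <html-to-xml>
            <http url="http://example.com/page.html"/>
        </html-to-xml>
    </var-def>

    <!-- //td[2] matches every td that is the second td child of its own parent. -->
    <var-def name="secondCells">
        <xpath expression="//td[2]">
            <var name="page"/>
        </xpath>
    </var-def>

    <!-- (//td)[2] matches only the second td in the whole document. -->
    <var-def name="secondOverall">
        <xpath expression="(//td)[2]">
            <var name="page"/>
        </xpath>
    </var-def>
    ```

    The parentheses matter: without them the index is counted per parent
    element, with them it is counted over the complete result.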

  • werner001

    werner001 - 2010-06-14

    Do you mean, e.g., td? The position is variable, so this wouldn't help. Do
    you have an example?



  • Anonymous

    Anonymous - 2010-06-14

    it's not a matter of position, but of enclosing tags.

    for example, if your source looks like this:

    some other junk...


    you just need an xpath like


    for the first instance of topic.

    if you provide an excerpt of your source, I could be of more help.

  • werner001

    werner001 - 2010-06-15

    Oh, you're right, but I didn't explain it clearly enough, sorry.

    The "variable content" consists of the same tags, so it could be

    <h2>junk topic 1</h2> 
    <h2>junk topic 2</h2> 
    <h2>maybe another junk topic</h2> 
    <p>bla5 or even bla6
    <h2>maybe not</h2> 

    The actual source is but that doesn't make a difference of course.

    Is there any light at the end of the tunnel?

    thanks - werner

  • Anonymous

    Anonymous - 2010-06-15


    if you use the xpath expression


    you will get a LIST variable with ALL the topics.

    same goes for tables. you just need to find the commonality.

    if you want, please post the ACTUAL HTML so I can give you a few pointers with
    a real example...


  • werner001

    werner001 - 2010-06-15


    I want the

    's next to topic1 and topic2, but only one of them at a time.

  • Anonymous

    Anonymous - 2010-06-15

    i think that's the problem right there... you cannot get "one at a time" with

    maybe you can process this better if you rethink this and use the xpath list
    result inside a loop operation (provided by webharvest)?

    I recommend you have a look at the examples in the documentation... there's
    plenty of relevant information there.
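
    A minimal sketch of such a loop, assuming a variable named "list" already
    holds the XPath result (as in the var-def from the first post) and that
    each item is a td fragment. The names "cell" and "i" are arbitrary.

    ```xml
    <loop item="cell" index="i">
        <list>
            <!-- The list produced earlier by the xpath processor. -->
            <var name="list"/>
        </list>
        <body>
            <!-- Runs once per item; here it only extracts the cell text. -->
            <xpath expression="//td/text()">
                <var name="cell"/>
            </xpath>
        </body>
    </loop>
    ```

    Inside the body, "i" holds the position of the current item, so the body
    is the place to decide what each item means.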

  • werner001

    werner001 - 2010-06-15

    okay, here is the actual html:


    You find there the topics "Motor" and "Propeller" which both have a
    "Manufacturer" sub topic. I want to collect the manufacturers for either motor
    or propeller.
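
    One way to get the manufacturer of a single topic is to anchor the XPath at
    the topic heading instead of counting positions. The sketch below assumes a
    layout that is NOT confirmed by the thread (the posted HTML is missing):
    an h2 heading per topic, followed by a table with label/value cells. The
    URL and variable names are placeholders too.

    ```xml
    <!-- Assumed layout, for illustration only:
         <h2>Motor</h2>
         <table><tr><td>Manufacturer</td><td>...</td></tr></table>
         <h2>Propeller</h2>
         <table><tr><td>Manufacturer</td><td>...</td></tr></table> -->
    <var-def name="page">
        <html-to-xml>
            <http url="http://example.com/page.html"/>
        </html-to-xml>
    </var-def>

    <!-- Start at the "Motor" heading, take the first table after it,
         then the value cell of the row labelled "Manufacturer". -->
    <var-def name="motorManufacturer">
        <xpath expression="//h2[contains(., 'Motor')]/following-sibling::table[1]//tr[contains(td[1], 'Manufacturer')]/td[2]/text()">
            <var name="page"/>
        </xpath>
    </var-def>
    ```

    Replacing 'Motor' with 'Propeller' would give the other value. If the real
    page uses different elements, the same idea applies with those element
    names: select the topic first, then move along the following-sibling axis.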

  • werner001

    werner001 - 2010-06-15

    Hi again and thanks for your responses,

    I am using the loop operation for other purposes already, but in this case the
    selected items have a different meaning.

    I assume I should use a conditional within this loop asking for the current

    I will try - anyway, could I use a web harvest within a script block?

    thanks a lot for your help
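
    On the original question of handling the list inside a script, a hedged
    sketch follows. It assumes that context variables reach the script as
    Web-Harvest wrapper objects offering toList() and toString(), and that
    sys.defineVariable is available in the version in use. Check both against
    the manual for your release. The names "firstName" and "secondName" are
    made up for the example.

    ```xml
    <script><![CDATA[
        // Assumption: "list" was defined earlier with var-def and wraps the
        // XPath result; toList() is assumed to expose it as a java.util.List.
        java.util.List items = list.toList();

        // Pick items by position, guarding against a shorter list.
        String first  = items.size() > 0 ? items.get(0).toString() : "";
        String second = items.size() > 1 ? items.get(1).toString() : "";

        // Assumption: sys.defineVariable hands values back to the configuration.
        sys.defineVariable("firstName", first);
        sys.defineVariable("secondName", second);
    ]]></script>
    ```

    After the script has run, the two values could be read with
    <var name="firstName"/> and <var name="secondName"/>, for example when
    writing the database columns.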


