
apple stats

elhacker
2010-12-29
2012-09-04
  • elhacker

    elhacker - 2010-12-29

    Hi,

    I'm trying to programmatically download stats from the Apple iTunes Connect
    website (https://itunesconnect.apple.com/WebObjects/iTunesConnect.woa), but
    because they frequently make slight changes to the page format, my current
    script keeps failing.

    I'm new to Web-Harvest; I found it just 2 weeks ago. I was wondering whether
    it is possible to use Web-Harvest to create a config that follows the script
    example from the link below. If so, would anybody be kind enough to provide
    an example of how to use if/else and jump between URLs, as in the example
    script below?

    http://code.google.com/p/appdailysales/source/browse/trunk/appdailysales.py

    Thanking you in advance.

     
  • elhacker

    elhacker - 2011-01-04

    Does anyone have any thoughts on this? Are there any developers here who
    would be willing to duplicate the logic of the Perl script using Web-Harvest
    for a reasonable price? I've already spent many hours on this but haven't got
    very far. Please help.

     
  • Alex Wajda

    Alex Wajda - 2011-01-05

    (e.g. HtmlUnit)

     
  • Alex Wajda

    Alex Wajda - 2011-01-05

    Ignore my previous post - I sent it in a wrong thread. Can't remove it now :(

    I hate sourceforge! They seem to have the crappiest forum and other
    collaboration tools out there!

     
  • elhacker

    elhacker - 2011-01-05

    Hi wajda79,

    That's a shame; I thought someone had finally replied to my post. ;(

    From your experience, do you think Web-Harvest can mimic the Perl script I
    linked to? I can see from the various examples that it may be possible, but
    I want to be 100% sure before I invest more time trying to work it out.

    Thanks.

    e

     
  • Alex Wajda

    Alex Wajda - 2011-01-05

    Unfortunately I don't know Perl, so I can't get much from looking at that
    script. Web-Harvest does a pretty good job with flow control. It has loops,
    conditions, functions, variables, expressions and dynamic scope. It also
    comes with a decent set of embedded data processors that let you work
    efficiently with XML and regular expressions, execute snippets written in
    any of the 3 supported dynamic languages (JS, Groovy, BeanShell), exchange
    data between WH and a database or the file system, etc. WH also provides a
    simple but powerful plug-in API, so you can significantly extend WH
    functionality to suit your needs.

    So I would bet WH is quite capable of doing what that Perl script does,
    although I can't tell you exactly what that is :)

     
  • elhacker

    elhacker - 2011-01-05

    Would you be kind enough to show me an example (if you have any) of using
    if/else code, setting cookies, and jumping between URLs while keeping session
    state?

    Sorry to be a pain; I'm working on this right now and it's driving me
    crazy!!!

    Any examples would be appreciated.

    regards,

    e

     
  • Alex Wajda

    Alex Wajda - 2011-01-05

    The only thing I have to warn you about: the publicly available version has
    a number of serious problems that have been fixed in trunk. The team consists
    of just 2 developers, both working on enthusiasm alone, so as far as I can
    see a new release is not going to happen soon. Until then I strongly
    recommend that you build the latest WH version from trunk and use it for
    your work. Although it's not a release yet, it is in fact much more stable
    and predictable than 2beta1.

     
  • elhacker

    elhacker - 2011-01-05

    OK, thank you for your time and for sharing this great software.

     
  • Alex Wajda

    Alex Wajda - 2011-01-05

    If you look at the manual you'll find many small examples there.

    This is if/else, for instance:
    http://web-harvest.sourceforge.net/manual.php#case

    Cookies: WH internally uses Apache HttpClient to deal with HTTP, and it
    exposes many of its features to WH users:
    http://web-harvest.sourceforge.net/manual.php#http

    What exactly do you need from cookies? If it's only for keeping the session
    alive, you don't need to do anything explicitly. The HTTP session is kept
    alive by WH by default (in fact, all cookies are stored between the <http>
    processors).
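
For comparison with the Python script linked in the first post, here is a minimal sketch of the same "session is kept automatically" idea using only the Python 3 standard library. It is not Web-Harvest code: one cookie jar is attached to one opener, so any cookie set by an earlier response is sent with later requests. The helper name `make_session_opener` and the example URLs in the comments are assumptions made up for illustration.

```python
import http.cookiejar
import urllib.request


def make_session_opener():
    """Return (opener, jar). Every request made through opener shares jar.

    make_session_opener is a hypothetical helper name, not part of any library.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


opener, jar = make_session_opener()

# Hypothetical usage (needs network access; the URLs are placeholders):
#   opener.open("https://example.com/login", data=b"user=me&password=secret")
#   opener.open("https://example.com/report")  # sends cookies set by the login
```

The point of the design is that the script never copies cookies by hand; the processor attached to the opener stores and replays them, which is roughly what the post above describes WH doing between `<http>` processors.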

     
  • newbee

    newbee - 2011-01-05

    You can find docs and examples here:

    http://web-harvest.sourceforge.net/manual.php#case

    The code that you posted is written in Python, and at first glance you
    should be able to simulate the flow in WH. You will have to perform XPath
    searches, though.

    Not sure if v2 has timeout re-request logic. wajda79? v1 did not.

    Btw, on a different topic (this is a suggestion for v2): could we add a
    timeout attribute to the http tag that controls the timeout of the HTTP
    request? As of now it is set to indefinite, which is not always what we want.
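
To make "timeout re-request logic" concrete, here is a minimal Python 3 sketch of the pattern: give each request a finite timeout and try again a limited number of times when it fails. This is not a Web-Harvest feature or API; `fetch_with_retry` and `fetch_url` are hypothetical names, and the default values are arbitrary assumptions.

```python
import time
import urllib.request


def fetch_url(url, timeout=10.0):
    """Fetch url with a finite timeout (in seconds) instead of waiting forever."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(fetch, attempts=3, delay=0.0):
    """Call fetch() up to `attempts` times, re-raising the last error.

    OSError covers socket timeouts and urllib.error.URLError. Note that it
    also covers HTTP error statuses (HTTPError), which this sketch retries too.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as error:
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay)  # optional pause before the next try
    raise last_error


# Hypothetical usage (needs network access; the URL is a placeholder):
#   body = fetch_with_retry(lambda: fetch_url("https://example.com/", timeout=5.0))
```

Passing the fetch step in as a callable keeps the retry loop independent of how the request is made, so the same loop can wrap a login request or a report download.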

     
  • elhacker

    elhacker - 2011-01-05

    hi,

    I ported that same Perl script to CF and it had been working for months, but
    a few weeks ago Apple made some changes and now my script doesn't work. When
    I looked at the Perl script, they seem to have added the following code. I
    don't understand it exactly, but it seems to have something to do with
    blocking domains.

    cj = MyCookieJar();

    cj.set_policy(cookielib.DefaultCookiePolicy(rfc2965=True))

    I've already looked at those examples, but I was looking for something that
    flows in a logical context in a working config file.

    Anyway, I don't want to take up any more of your time. Thank you so much for
    your replies. If/when I get it working I'll let you know.
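
For readers puzzling over the two quoted `cookielib` lines: in Python's cookie library, `rfc2965=True` switches on handling of RFC 2965 (`Set-Cookie2`) cookies, which is off by default. It does not block any domain by itself. Below is a minimal Python 3 sketch of the same setup, where the `cookielib` module is named `http.cookiejar`. `MyCookieJar` is the script's own class and is not reproduced here; using a plain `CookieJar` in its place is an assumption.

```python
import http.cookiejar
import urllib.request

# Policy that also accepts RFC 2965 cookies (the default policy does not).
policy = http.cookiejar.DefaultCookiePolicy(rfc2965=True)

# Plain CookieJar standing in for the script's own MyCookieJar (assumption).
jar = http.cookiejar.CookieJar(policy=policy)

# Attach the jar to an opener so requests made through it use this policy.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
```

Whether this policy change is what fixed the script after Apple's update is not something the thread establishes; the sketch only shows what the two lines configure.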

     
  • elhacker

    elhacker - 2011-01-05

    Thanks for correcting me. Yes, it's written in Python; I just had Perl on my
    mind for some reason!

     
  • newbee

    newbee - 2011-01-06

    Sometimes it is useful to control it via the WH config file (especially if
    the crawler is driven by the config file, i.e. the same Java code is used
    for multiple configs, which is the whole point, isn't it?). In any case,
    this is just a suggestion. I found this attribute quite useful and added it
    to my own version. Since I am still on v1, I do not think my changes will be
    compatible with yours, however.

     
