#36 URGENT: changes in URL and login system broke this tool, and it has 2 weeks left to ever be relevant again

Milestone: v1.0_(example)
Status: open
Labels: urgent (1)
Priority: 1
Updated: 2019-10-29
Created: 2019-10-28
Private: No

I see previous tickets opened about both the login pages and the URLs having changed, but they pre-date the latest version of the code available, so I'm opening a new ticket, as Yahoo monkeyed around with their site code several times in the interim anyway, and YAHOO GROUPS IS CLOSING IN TWO WEEKS.

The current tool cannot get past the login page, even if you fix the URL strings. It reports "[WARN] {url here} Document Not Accessible - report to Yahoo", and dies.

Yahoo is killing Yahoo Groups. It's already impossible to post new content (as of today, 2019-10-28), and all the user-created content is going to be removed on 14 December. So that's a very short window in which to rescue a lot of content. The usual "web sucking" utilities aren't much use, because they end up in loops: so many pages and posts link to so many others, via different linking mechanisms, that there are half a dozen ways to get to the same material.
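To illustrate the looping problem, here is a minimal sketch of how a crawler can avoid revisiting the same material when it is reachable by several links. It is in Python rather than the script's Perl, purely for illustration. The `canonical` and `crawl` names and the caller-supplied `get_links` function are hypothetical, and the rule of dropping query strings and fragments is an assumption that may not suit every Yahoo Groups page.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def canonical(url):
    """Reduce a URL to scheme, host, and path.

    Assumption: variants that differ only in query string, fragment,
    or a trailing slash point at the same material.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def crawl(start, get_links):
    """Breadth-first walk that visits each canonical URL exactly once.

    get_links(url) is a hypothetical caller-supplied function that
    returns the URLs linked from the page at `url`.
    """
    first = canonical(start)
    seen = {first}          # canonical URLs already queued
    queue = deque([first])
    order = []              # canonical URLs in the order visited
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            key = canonical(link)
            if key not in seen:
                seen.add(key)
                queue.append(key)
    return order
```

Because every link is reduced to one canonical key before it is queued, a page that is reachable in half a dozen ways is still fetched only once, and the walk terminates even when pages link back to each other.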

I was able to mess around and fix the URL stuff in a test copy of your script: The lead-in string must be "https://groups.yahoo.com/neo/groups/", with "/neo/" and with "s" at the end of "groups". The group name tacked on after this must NOT end with a slash, which was hard-coded in the original script. The login page's parameter stuff after "?" has changed as well. Several other paths have changed, e.g. ".../photos" to ".../photos/photostream", and ".../messages" to ".../conversations/messages".
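The URL changes described above can be sketched roughly as follows. This is Python rather than the script's Perl, for illustration only. The helper name `group_url` and the `PATH_RENAMES` table are hypothetical, and only the two renamed paths named in this ticket are included. The new login parameters are not shown, because the ticket does not spell them out.

```python
from urllib.parse import quote

# Prefix reported in this ticket: contains "/neo/" and plural "groups".
BASE = "https://groups.yahoo.com/neo/groups/"

# Old path suffix -> new path suffix, as reported in this ticket.
PATH_RENAMES = {
    "photos": "photos/photostream",
    "messages": "conversations/messages",
}


def group_url(group, section=""):
    """Build a group URL with no trailing slash (hypothetical helper)."""
    # Strip any slash the caller left on the group name, then escape it.
    url = BASE + quote(group.strip("/"), safe="")
    if section:
        section = section.strip("/")
        # Map old section names to their new paths; pass others through.
        url += "/" + PATH_RENAMES.get(section, section)
    return url
```

For example, `group_url("mygroup", "messages")` yields a URL ending in `/mygroup/conversations/messages`, and `group_url("mygroup/")` drops the trailing slash.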

However, when it comes to using Perl to tease apart the content of Web pages, I just don't know enough to fix that part of it.

I realize you've pretty much abandoned this script, but please consider this an impassioned request for an emergency repair! :-)

Discussion

  • Mithun Bhattacharya

    It is not easy. Unfortunately, the page is very JavaScript-heavy, so I would have to dig around and see how much of the webpage is dynamically generated. I had assumed the website streams the content based on how much you browse, but I might be wrong. I will try to spend some time on it, but I can't promise a miracle at this point.

     
  • Stanton McCandlish

    I'll pray hard. :-)

     