#4 sourceforge page layout changed

Status: closed
Owner: nobody
Labels: None
Priority: 1
Updated: 2010-05-02
Created: 2008-08-25
Private: No

download.bat can no longer extract the file list and packages from normal.html because the SourceForge page format has changed.

regards

A.

Discussion

  • Uwe Diehl

    Uwe Diehl - 2008-08-29

    The problem is the parameter string passed to the sed command
    at line 331:
    bin\sed -n "......" normal.html >getgnuwin32.tmp
    This string needs to be changed.

    Since the beginning of August 2008, the result file getgnuwin32.tmp
    has always been empty.
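
To see why getgnuwin32.tmp comes out empty, it may help to replay the match outside of sed. The sketch below uses Python instead of batch/sed so it can be run anywhere. The pattern is a hand translation of the old line-331 expression as it is quoted in the patch further down this thread. Both sample anchor lines (file name, query string, id, class and onclick values) are made up for illustration and are not copied from a real SourceForge page.

```python
import re

# Python translation of the OLD sed address (an approximation, not the
# original script): the line must start with exactly `<a href="` and end
# with `">`, with nothing else inside the tag.
OLD_PATTERN = re.compile(
    r'^<a href="http://downloads\.sourceforge\.net/gnuwin32/'
    r'(.*-(?:bin|doc|lib|dep)\.zip)\?.*">$',
    re.IGNORECASE,
)

# Hypothetical sample lines, one tag per line as in normal.html.
OLD_STYLE_LINE = (
    '<a href="http://downloads.sourceforge.net/gnuwin32/'
    'zlib-1.2.3-bin.zip?modtime=1&big_mirror=0">'
)
NEW_STYLE_LINE = (
    '<a id="showfiles_download_1" class="sfx_example" '
    'href="http://downloads.sourceforge.net/gnuwin32/'
    'zlib-1.2.3-bin.zip?modtime=1&big_mirror=0" onclick="x">'
)


def extract(pattern, lines):
    """Return the captured archive name for every line the pattern matches."""
    names = []
    for line in lines:
        match = pattern.match(line)
        if match:
            names.append(match.group(1))
    return names


old_result = extract(OLD_PATTERN, [OLD_STYLE_LINE])
new_result = extract(OLD_PATTERN, [NEW_STYLE_LINE])
```

Because sed runs with -n and only prints on a match, a page in which no anchor starts with `<a href=` produces no output at all, which would leave the redirected file empty.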

     
  • Doug Sweeney

    Doug Sweeney - 2008-10-16

    Here's a context patch of the changes I made to download.bat (v0.6.19) to get the downloads to work again. Works for me, but YMMV...

    ===== snip ============
    *** download.bat.0.6.19 2007-05-18 00:27:01.344313300 -0400
    --- download.bat 2008-10-16 11:26:14.515625000 -0400
    ***************
    *** 7,12 ****
    --- 7,16 ----
    :: date : May 17, 2007
    :: version: 0.6.19
    ::
    + :: patched: Oct 15, 2008 by DJ Sweeney, <dj.sweeney@sweeneyconcepts.com>
    + :: Fixed to work with newer Sourceforge page structure
    + :: NOTE: Will likely break the NEXT time the page structure changes
    + ::
    :: Copyright (c) 2007 by Mathias Michaelis, <michaelis@tcnet.ch>
    ::
    :: -------------------------------------------------------------------
    ***************
    *** 200,205 ****
    --- 204,214 ----
    :set_wgetrc
    if ".%WGETRC%"=="." set WGETRC=%CD%\bin\wget.ini

    + ::
    + :: If -d was specified, use existing getgnuwin32.lst to download packages
    + ::
    + if defined GNUWIN32_DIRECT_DOWNLOAD goto prepare_download_packages
    +
    ::
    :: If we are in verbose mode, tell what we're going to do
    ::
    ***************
    *** 248,278 ****
    :: file.
    ::
    if not exist project.html goto restart_download_project_site
    - if defined GNUWIN32_ALL_PROJECT_SITES goto create_project_list
    bin\sed -n "/<\/html *>/Ip" project.html >project.tmp
    bin\test -s project.tmp
    if errorlevel 1 goto restart_download_project_site
    - goto end_download_project_site
    -
    ::
    ! :: If only a part of project.html could be received, the probability is
    ! :: very high that this part contains a list of all individual gnuwin32
    ! :: projects along with the corresponding urls. So try to follow this
    ! :: links.
    ::
    - :create_project_list
    bin\sed -n "s/>/>\n/g;s/</\n</g;s/\(\s*\n\)\+\s*/\n/g;s/^\n\+//;s/\n\+$//;s/^\s\+//;s/.\+/&/p" project.html >normal.html
    ! bin\sed -n "/\bid[ \t]*=[ \t]*\(\d034frelease\d034\|\bfrelease\b\)/I,/<\/table>/Ip" normal.html >project.html
    del normal.html
    bin\sed -n "$p" project.html | bin\sed -n "/<\/table>/Ip" >project.tmp
    bin\test -s project.tmp
    if errorlevel 1 goto restart_download_project_site
    ! bin\sed -n "s/\&amp;amp;/\&/Ig;s/<a[ \t]\+href[ \t]*=[ \t]*\d034\(\/project\/showfiles\.php?group_id=[0-9]\+\&package_id=[0-9]\+\)\d034>/http:\/\/sourceforge.net\1/Ip" project.html >project.tmp
    del project.html
    ! for /f "delims=" %%i in (project.tmp) do bin\wget --no-cache -O - %%i >>project.html
    goto end_download_project_site

    ::
    :: If the download of the whole project site failed, try to use only
    :: a part of it. This part probably allows to create a list of all
    :: gnuwin32 sub projects.
    --- 257,324 ----
    :: file.
    ::
    if not exist project.html goto restart_download_project_site
    bin\sed -n "/<\/html *>/Ip" project.html >project.tmp
    bin\test -s project.tmp
    if errorlevel 1 goto restart_download_project_site
    ::
    ! :: Revised to handle newer Sourceforge structure
    ! ::
    ! :: The main sourceforge link now only contains a browse list of each
    ! :: package and its current release number, BUT NO file listings!
    ! ::
    ! :: To get the current version's files, we now must do essentially the
    ! :: same thing we do for the '-a' option, but only for the current release.
    ! ::
    ! :: I've changed a few things around, but the core of this file still works
    ! :: the same.
    ! ::
    ! :: Messy and slow, but it works...
    ::
    bin\sed -n "s/>/>\n/g;s/</\n</g;s/\(\s*\n\)\+\s*/\n/g;s/^\n\+//;s/\n\+$//;s/^\s\+//;s/.\+/&/p" project.html >normal.html
    ! bin\sed -n "/\bid[ \t]*=[ \t]*\(\d034yui-main\d034\|\byui-main\b\)/I,/<\/table>/Ip" normal.html >project.html
    del normal.html
    bin\sed -n "$p" project.html | bin\sed -n "/<\/table>/Ip" >project.tmp
    bin\test -s project.tmp
    if errorlevel 1 goto restart_download_project_site
    ! REM -- copy project.html project.debug.html
    ! if defined GNUWIN32_ALL_PROJECT_SITES (
    ! call :create_project_list_all
    ! ) else (
    ! call :create_project_list_current
    ! )
    del project.html
    ! echo == Building File Lists...
    ! for /f "tokens=1,2 delims= " %%i in (project.tmp) do (
    ! if defined GNUWIN32_ALL_PROJECT_SITES (
    ! echo Getting list of files [all versions]: %%i
    ! ) else (
    ! echo Getting list of files [current version]: %%i
    ! )
    ! bin\wget --quiet --progress=dot --no-cache -O - %%j >>project.html
    ! )
    goto end_download_project_site

    ::
    + :: Create list using current release of package
    + :: == Called from main download section above
    + ::
    + :create_project_list_current
    + bin\sed -n "s/\&amp;amp;/\&/Ig;/^<a\s\+href\s*=\s*\d034\/project\/showfiles.php?group_id=[0-9]\+\&package_id=[0-9]\+\d034>/{n;s/.*/& /I;h};/<a[ \t]\+href[ \t]*=[ \t]*\d034\(\/project\/showfiles\.php?group_id=[0-9]\+\&package_id=[0-9]\+\&release_id=[0-9]\+\)\d034>/{G;s/<a[ \t]\+href[ \t]*=[ \t]*\d034\(\/project\/showfiles\.php?group_id=[0-9]\+\&package_id=[0-9]\+\&release_id=[0-9]\+\)\d034>\s*\(\S*\)/\2 http:\/\/sourceforge.net\1/I;p;d}" project.html >project.tmp.presort
    + :: Make sure no duplicate entries exist...
    + bin\sort -u project.tmp.presort -o project.tmp
    + goto :EOF
    +
    + ::
    + :: Create list including all available releases of package
    + :: == Called from main download section above
    + ::
    + :create_project_list_all
    + bin\sed -n "s/\&amp;amp;/\&/Ig;/<a[ \t]\+href[ \t]*=[ \t]*\d034\(\/project\/showfiles\.php?group_id=[0-9]\+\&package_id=[0-9]\+\)\d034>/{N;s/<a[ \t]\+href[ \t]*=[ \t]*\d034\(\/project\/showfiles\.php?group_id=[0-9]\+\&package_id=[0-9]\+\)\d034>\n\(\S*\)/\2 http:\/\/sourceforge.net\1/Imp}" project.html >project.tmp.presort
    + :: Make sure no duplicate entries exist...
    + bin\sort -u project.tmp.presort -o project.tmp
    + goto :EOF
    +
    + ::
    :: If the download of the whole project site failed, try to use only
    :: a part of it. This part probably allows to create a list of all
    :: gnuwin32 sub projects.
    ***************
    *** 319,324 ****
    --- 365,371 ----
    if not exist project.html goto getgnuwin32_empty
    if not "%GNUWIN32_VERBOSE%"=="" echo = Analysing file list ...
    bin\sed -n "s/>/>\n/g;s/</\n</g;s/\(\s*\n\)\+\s*/\n/g;s/^\n\+//;s/\n\+$//;s/^\s\+//;s/.\+/&/p" project.html >normal.html
    + REM -- copy normal.html normal.debug.html
    del project.html

    ::
    ***************
    *** 328,334 ****
    :: programmed. That's why it's a bit risky. If this script is not
    :: up-to-date, this step may fail!
    ::
    ! bin\sed -n "/^<tr\s\+class\s*=\s*""""package""""\s*>/{n;n;n;s/.*/ &/p};/^<a href=""""http:\/\/downloads.sourceforge.net\/gnuwin32\/\(.*-\(bin\|doc\|lib\|dep\)\.zip\)?.*"""">$/Is//\1/p" normal.html >getgnuwin32.tmp
    del normal.html

    ::
    --- 375,381 ----
    :: programmed. That's why it's a bit risky. If this script is not
    :: up-to-date, this step may fail!
    ::
    ! bin\sed -n "/^<tr\s\+class\s*=\s*""""package""""\s*>/{n;n;n;s/.*/ &/p};/^<a id=""""showfiles_download.*"""" class=""""sfx_.*"""" href=""""http:\/\/downloads.sourceforge.net\/gnuwin32\/\(.*-\(bin\|doc\|lib\|dep\)\.zip\)?.*"""" .*>$/Is//\1/p" normal.html >getgnuwin32.tmp
    del normal.html

    ::
    ***************
    *** 469,485 ****
    --- 516,535 ----
    bin\sed "s/ *$//;s/ /\n/g" loadlist.tmp >loadlist.txt
    del loadlist.tmp
    cd packages
    + set NO_DOWNLOADS=TRUE
    for /f %%f in (..\filelist.txt) do (
    if not exist ..\oldpacks\%%f (
    for /f %%m in (..\loadlist.txt) do (
    if not exist %%f ..\bin\wget "%%m%%f"
    )
    + set NO_DOWNLOADS=
    ) else (
    move /y ..\oldpacks\%%f .
    )
    )
    cd ..
    del loadlist.txt
    + if defined NO_DOWNLOADS echo No packages to download.

    ::
    :: Clean up things
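
The anchored patterns in the patch (`^<a ...>$`) only work because the first sed call rewrites the page so that every tag sits on its own line. As a rough illustration of that normalization step, here is a Python port of the expression that writes normal.html. It is a sketch based on reading the sed script, not something taken from download.bat, and the function name `normalize_html` is invented for this example.

```python
import re
from typing import List


def normalize_html(html: str) -> List[str]:
    """Split markup so that each tag and each text run is on its own line.

    Mirrors, step by step, the sed program used to build normal.html:
    s/>/>\\n/g; s/</\\n</g; collapse whitespace around newlines;
    trim leading/trailing newlines and leading whitespace;
    print only non-empty results.
    """
    result = []
    for line in html.splitlines():
        s = line.replace(">", ">\n").replace("<", "\n<")
        # Collapse any run of whitespace that contains a newline into one newline.
        s = re.sub(r"(\s*\n)+\s*", "\n", s)
        # Drop leading and trailing newlines, then leading whitespace.
        s = s.strip("\n")
        s = re.sub(r"^\s+", "", s)
        if s:  # sed only prints when something is left
            result.extend(s.split("\n"))
    return result


sample = '<td><a href="x">zlib</a> </td>'
normalized = normalize_html(sample)
```

After this step an address such as `/^<a id=.../` can safely assume that the opening anchor tag starts at column one and that the closing `>` ends the line.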

     
  • Andreas Stern

    Andreas Stern - 2008-10-17

    hi,

    It works great when you apply only the second replacement of the following (maybe I don't understand patch files):
    ***************
    *** 328,334 ****
    :: programmed. That's why it's a bit risky. If this script is not
    :: up-to-date, this step may fail!
    ::
    ! bin\sed -n "/^<tr\s\+class\s*=\s*""""package""""\s*>/{n;n;n;s/.*/ &/p};/^<a href=""""http:\/\/downloads.sourceforge.net\/gnuwin32\/\(.*-\(bin\|doc\|lib\|dep\)\.zip\)?.*"""">$/Is//\1/p" normal.html >getgnuwin32.tmp
    del normal.html

    ::
    --- 375,381 ----
    :: programmed. That's why it's a bit risky. If this script is not
    :: up-to-date, this step may fail!
    ::
    ! bin\sed -n "/^<tr\s\+class\s*=\s*""""package""""\s*>/{n;n;n;s/.*/ &/p};/^<a id=""""showfiles_download.*"""" class=""""sfx_.*"""" href=""""http:\/\/downloads.sourceforge.net\/gnuwin32\/\(.*-\(bin\|doc\|lib\|dep\)\.zip\)?.*"""" .*>$/Is//\1/p" normal.html >getgnuwin32.tmp
    del normal.html

    ::
    ***************

    thank you very much

    A.

     
  • Andreas Stern

    Andreas Stern - 2009-05-06

    hi,

    Meanwhile, the SourceForge page structure has changed again ;-((

    please help!

    thank you very much in advance

    A.
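
Since each layout change breaks patterns that spell out the exact attributes of the anchor tag, one possible way to make the extraction less fragile is to parse the tags and look only at the href value. The Python sketch below illustrates that idea. It does not know the 2009 page layout and is not a drop-in fix for download.bat. The class name `ArchiveLinks`, the function `archive_names` and the sample markup are all invented for this example, and it assumes archives are still linked under a `/gnuwin32/` path with a `-bin`, `-doc`, `-lib` or `-dep` suffix.

```python
import re
from html.parser import HTMLParser

# Assumption: archive URLs still contain /gnuwin32/<name>-{bin,doc,lib,dep}.zip,
# optionally followed by a query string.
ARCHIVE = re.compile(
    r"/gnuwin32/([^/?]*-(?:bin|doc|lib|dep)\.zip)(?:\?|$)", re.IGNORECASE
)


class ArchiveLinks(HTMLParser):
    """Collect archive file names from <a href=...>, ignoring other attributes."""

    def __init__(self):
        super().__init__()
        self.archives = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = ARCHIVE.search(href)
        # Keep first-seen order and skip duplicates.
        if match and match.group(1) not in self.archives:
            self.archives.append(match.group(1))


def archive_names(html):
    parser = ArchiveLinks()
    parser.feed(html)
    parser.close()
    return parser.archives


# Hypothetical markup with attributes in a different order than the patch expects.
sample_page = (
    '<a class="x" href="http://downloads.sourceforge.net/gnuwin32/'
    'zlib-1.2.3-bin.zip?use_mirror=a" id="y">zlib</a>'
    '<a href="http://downloads.sourceforge.net/gnuwin32/zlib-1.2.3-bin.zip">dup</a>'
    '<a href="/other/readme.txt">readme</a>'
)
found = archive_names(sample_page)
```

This trades the single sed call for a small parser, so attribute order and extra attributes no longer matter, but it would still break if the download URLs themselves change.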

     
  • Jay Satiro

    Jay Satiro - 2010-05-02
    • priority: 5 --> 1
    • status: open --> closed
     
