From: Leif W <warp-9.9@us...> - 2005-08-26 03:15:08
> From: "Daniel K. O." <danielosmari@...>
> Sent: 2005 August 25 Thursday 22:21
>
> pratical. 50 MB isn't pratical. A <1 MB installer able to check the
> "installed" tools and download the proper updates/new packages is
> pratical.
>
> Here's how I would (and maybe will) do it: a Python script that
> "parses"

Well, it requires installing Python, which is a few MB? Or does Python allow compilation to a binary?

> http://prdownloads.sourceforge.net/mingw (even though it isn't valid
> HTML), then displays the latest version of each package. A
> I'm suggesting Python because I know a Python programmer that have
> experience in parsing bad HTML and structured text, and can give me a
> hand at this.

Well, I looked at the Tcl code used to generate the releases table included in the download.shtml page. I have some parsing functions in PHP, but they're not very pretty at the moment. However, here are the comments I used, which basically distil the process to the bare essentials.

/*
 * Download from prdownloads page:
 *
 * back.gif               __
 * .gif,>,>,fName,<         |
 * label-size,>,fSize,<     |
 * label-date,>,fDate,<     |
 *                        --
 */

With my PHP algorithm, I load the entire file into one string variable, to avoid potential problems with doing things line by line in cases where the requisite data may span many lines, or all be on one line. I use successive strpos calls to find the next instance of a string, remember that offset, and make my next strpos call from that offset. Try to do it by eye first to understand. It's really not so bad. But the technique is vulnerable to small changes in the page, which are out of our control and may occur at any time, though in practice they occur very rarely.

Instead of rewriting functions by hand, I had the idea to somehow capture the essentials visually depicted above and verbally described below. Then upon error, I just visually inspect the file, see how the format changed, update the essential info (stepping or seeking), and have the new parse function auto-generated.
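
The successive-search technique above can be sketched in a few lines of Python, where str.find(marker, start) plays the role of PHP's strpos with an offset. This is only an illustration, not the PHP code mentioned above: the helper names seek_all and extract_field and the sample HTML fragment are hypothetical, and the real prdownloads markup may differ.

```python
def seek_all(text, markers, start=0):
    """Find each marker in order, each search starting where the last one ended.

    Returns the offset just past the final marker, or -1 if any marker is missing.
    """
    pos = start
    for marker in markers:
        found = text.find(marker, pos)  # analogous to strpos($text, $marker, $pos)
        if found == -1:
            return -1
        pos = found + len(marker)
    return pos


def extract_field(text, markers, terminator, start=0):
    """Seek past the markers, then return (value, offset of terminator).

    Returns None if a marker or the terminator cannot be found.
    """
    begin = seek_all(text, markers, start)
    if begin == -1:
        return None
    end = text.find(terminator, begin)
    if end == -1:
        return None
    return text[begin:end].strip(), end


# Hypothetical fragment, invented for illustration only.
sample = '<img src="back.gif"> <img src="zip.gif"> <a href="x">gcc-3.4.2.tar.gz</a>'

# Step past the first "back.gif", then apply the ".gif,>,>,fName,<" rule.
start = seek_all(sample, ["back.gif"])
result = extract_field(sample, [".gif", ">", ">"], "<", start)
```

Keeping each rule as a plain list of markers plus a terminator means that a format change only requires editing the list, which is roughly the "update the essential info" idea described above.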
I kind of made some progress, but it's not too good: ugly and incomplete. It's just slightly beyond me. What would be really bright is to detect what changed and update the essential seek/step config automatically. :p That's further still beyond me.

Look for the "back.gif" string. The first one is just before the junk we want. Assume a complete set is zero. Now we loop, testing either by file length or by set completion. File length is what I used, followed by a set completion check after the loop.

Look for .gif, then >, then >, and remember the position. Look for <. The file name is in between. May need to trim whitespace. Check that it's not an empty string, and maybe do other content checks. Break the loop on error, else increment the set flag.

Look for label-size, then >, and remember the position. Look for <. The file size is in between. May need to trim whitespace. Check that it's (a string that can be converted to) a number, and not size zero. Break the loop on error, else increment the set flag.

Look for label-date, then >, and remember the position. Look for <. The file date is in between. May need to trim whitespace. Check that it's not an empty string, and maybe do other content checks. Break the loop on error, else increment the set flag.

Take the set completion flag modulo the number of data pieces per set, so that it resets to zero after a full set. If it is zero, assume a complete set, store it in a data structure and continue the loop, else error.

Outside the loop, check the set flag to make sure it didn't error somewhere. If you haven't got a complete set (a complete set means the flag is zero), you bailed early, so your collected data is likely bogus. Otherwise the function may have been successful. It depends on your "other content checks" above. I skipped further tests just to get other stuff working, and was visually verifying good data parsing.

Leif
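
For anyone who wants to try the loop described above, here is a minimal Python sketch of it. It is not the PHP code referred to in this message. The names FIELDS, seek_field, parse_listing and SAMPLE are hypothetical, the sample markup is invented to match the markers in the comment template, and the size check assumes sizes appear as plain numbers.

```python
def non_empty(s):
    return s != ""


def positive_number(s):
    """True if s converts to a number greater than zero."""
    try:
        return float(s) > 0
    except ValueError:
        return False


# One entry per data piece in a set: (name, markers to seek in order, check).
FIELDS = [
    ("name", [".gif", ">", ">"], non_empty),
    ("size", ["label-size", ">"], positive_number),
    ("date", ["label-date", ">"], non_empty),
]


def seek_field(page, markers, pos):
    """Seek each marker in turn from pos; return (text up to next '<', offset) or None."""
    for marker in markers:
        found = page.find(marker, pos)
        if found == -1:
            return None
        pos = found + len(marker)
    end = page.find("<", pos)
    if end == -1:
        return None
    return page[pos:end].strip(), end


def parse_listing(page):
    """Return a list of dicts, or None if the loop bailed out mid-set."""
    found = page.find("back.gif")
    if found == -1:
        return None
    pos = found + len("back.gif")       # start just past the first back.gif
    records, current, flag = [], {}, 0  # flag counts pieces in the current set
    while pos < len(page):              # loop bounded by file length
        name, markers, is_valid = FIELDS[flag]
        hit = seek_field(page, markers, pos)
        if hit is None or not is_valid(hit[0]):
            break                       # break loop on error
        current[name], pos = hit
        flag = (flag + 1) % len(FIELDS)
        if flag == 0:                   # complete set: store it and continue
            records.append(current)
            current = {}
    # A non-zero flag means we stopped mid-set, so the data is suspect.
    return records if flag == 0 else None


# Invented sample page, for illustration only.
SAMPLE = (
    '<img src="back.gif"> Parent'
    '<img src="file.gif"> <a href="a">gcc-core.tar.gz</a>'
    '<td class="label-size">3456</td><td class="label-date">2005-08-20</td>'
    '<img src="file.gif"> <a href="b">binutils.tar.gz</a>'
    '<td class="label-size">789</td><td class="label-date">2005-08-21</td>'
)

files = parse_listing(SAMPLE)
```

As in the description, a failure on the first piece of a set leaves the flag at zero, so the function cannot tell "no more files" apart from "a bad file name". That is where the extra content checks would have to do the work.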