From: Leif W <war...@us...> - 2005-08-26 13:58:42
> From: "Julien Lecomte" <lec...@fr...>
> Sent: 2005 August 26 Friday 04:36
>
> When parsing http://prdownloads.sourceforge.net/mingw; how can you tell
> which version is current, candidate, etc ? Won't that have to be
> hard-scripted ?

Hmm, well, you can't. Which is why you also parse the project's file release page. :p Hence my comments about maybe stuffing the package stuff in the release notes: it'd just be a matter of hooking into this parser to pull the links, then visiting them later and parsing them.

Not everything in prdownloads is in the file release pages: older stuff, outdated, odds and ends.

The parsing was a little more complex, but not terribly so. I hope the text diagram stays formatted; if not, sorry. You have a loop at category name and a loop at release name, then you loop over the file names until you don't find another ?download. Then you break and go on to find the next Release Notes, until there are none, and then you look for the next <h3.

Some of the fields are not particularly interesting, like fCount. fSize may seem redundant, but it's in bytes, which I find more useful than prdownloads' sizes rounded to the nearest 1 KB. Just in case there's some discrepancy, I hang on to the prdownloads fSizes as a fallback, as something is better than nothing, though it should not be necessary.

Project:                            __
<h3,>,catName,[                 __  |
Release Notes,>,relName,<       |   |
?download,>,fName,<         --  |   |
>,>,>,fSize,<               |   |   |
>,>,fCount,<                |   |   |
>,>,fArch,<                 |   |   |
>,>,fType,<                 |   |   |
                            --  --  --

Leif
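For illustration only, here is a minimal Python sketch of the three nested loops described above. It is not the original parser. It assumes each diagram row reads as "find this marker, skip to the next `>`, capture text up to the stop character", and that the `>,>,>` runs mean "skip that many `>` before capturing". The `Scanner` class, `parse_releases`, and the `SAMPLE` markup are all hypothetical; the sample is made-up HTML shaped to fit the markers, not real SourceForge output. Bounding each inner search at the next outer marker is also an assumption, added so a release cannot pick up files from the following one.

```python
class Scanner:
    """Cursor over a string; every search starts at the current position."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self, marker):
        """Index of the next marker, or len(text) if there is none."""
        i = self.text.find(marker, self.pos)
        return len(self.text) if i < 0 else i

    def seek(self, marker, limit=None):
        """Move just past the next marker; False if absent before limit."""
        end = len(self.text) if limit is None else limit
        i = self.text.find(marker, self.pos, end)
        if i < 0:
            return False
        self.pos = i + len(marker)
        return True

    def capture(self, stop):
        """Return the text up to the next stop string and move past it."""
        i = self.peek(stop)
        value = self.text[self.pos:i].strip()
        self.pos = min(i + len(stop), len(self.text))
        return value


# (field name, number of '>' to skip first), read off the diagram rows.
FILE_FIELDS = (("fSize", 3), ("fCount", 2), ("fArch", 2), ("fType", 2))


def parse_releases(html):
    s = Scanner(html)
    categories = []
    while s.seek("<h3"):                              # category loop
        s.seek(">")
        cat = {"catName": s.capture("["), "releases": []}
        next_cat = s.peek("<h3")
        while s.seek("Release Notes", next_cat):      # release loop
            s.seek(">")
            rel = {"relName": s.capture("<"), "files": []}
            next_rel = min(s.peek("Release Notes"), next_cat)
            while s.seek("?download", next_rel):      # file loop
                s.seek(">")
                entry = {"fName": s.capture("<")}
                for field, skips in FILE_FIELDS:
                    for _ in range(skips):
                        s.seek(">")
                    entry[field] = s.capture("<")
                rel["files"].append(entry)
            cat["releases"].append(rel)
        categories.append(cat)
    return categories


# Made-up markup for demonstration; names and numbers are invented.
SAMPLE = (
    '<h3>alpha [show only this package]</h3>'
    '<a href="notes?id=1" title="Release Notes">Current</a>'
    '<td><a href="alpha-1.0.tar.gz?download">alpha-1.0.tar.gz</a></td>'
    '<td>1048576</td><td>42</td><td>i386</td><td>.gz</td>'
    '<td><a href="alpha-1.0-src.tar.gz?download">alpha-1.0-src.tar.gz</a></td>'
    '<td>2097152</td><td>7</td><td>Any</td><td>Source .gz</td>'
    '<h3>beta [show only this package]</h3>'
    '<a href="notes?id=2" title="Release Notes">Candidate</a>'
    '<td><a href="beta-0.9.tar.gz?download">beta-0.9.tar.gz</a></td>'
    '<td>4194304</td><td>3</td><td>i386</td><td>.gz</td>'
)

if __name__ == "__main__":
    for cat in parse_releases(SAMPLE):
        for rel in cat["releases"]:
            for f in rel["files"]:
                print(cat["catName"], rel["relName"], f["fName"], f["fSize"])
```

The fields come back as strings; fSize would still need converting with `int()`, and the fallback to the rounded prdownloads size is not modelled here.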