Thread: [aKregator-devel] [Bug 85624] New: idea: "web scraping" support (non-rss news site support)
From: Charles P. <pho...@ro...> - 2004-07-21 13:08:00
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

http://bugs.kde.org/show_bug.cgi?id=85624
           Summary: idea: "web scraping" support (non-rss news site support)
           Product: akregator
           Version: unspecified
          Platform: unspecified
        OS/Version: Linux
            Status: UNCONFIRMED
          Severity: wishlist
          Priority: NOR
         Component: general
        AssignedTo: akregator-devel lists sourceforge net
        ReportedBy: phoenixreads rogers com

Version:           1.0-beta5 "Pierre" (using KDE 3.2.3, Gentoo)
Compiler:          gcc version 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)
OS:                Linux (i686) release 2.6.6-win4lin-r3

I would call this a future development idea.

FYI - "Web scraping is the practice of getting information from a web page and reformatting it."

The idea is to have scripts, hopefully community-created, that would convert a non-RSS site into an RSS-formatted file. I could easily see the scripts becoming standardized and shared freely. One naming method would be {site}-{date} (e.g., www-cnn-com-20040721.py). I like Python. :)

The method would be simple: akregator would have a script associated with a feed. The script outputs a valid XML file, so instead of getting the feed from the Internet, akregator gets it from the script. If the output is invalid, akregator would treat it just like an invalid source, thus there are no security issues involved in using the scripts. Everything else about akregator remains the same.

Local URLs are supported, but unless you set up cron jobs there is no automation or central repository.
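
To make the idea concrete, here is a rough sketch of what one such per-site script might look like. It is only an illustration under assumptions: the page layout (headlines as links inside h2 tags), the example.com URL, and every function and class name below are hypothetical, and nothing here reflects an existing akregator interface. The last function mimics the "treat invalid output like an invalid source" check by testing only whether the output parses as XML.

```python
"""Hypothetical scraper script, e.g. www-example-com-20040721.py."""
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadlineParser(HTMLParser):
    """Collect (title, url) pairs from <a> tags found inside <h2> tags.

    The h2/a layout is an assumption; each real site needs its own rules.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.items = []
        self._in_h2 = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
        elif tag == "a" and self._in_h2:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a headline link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                # Resolve relative links against the site URL.
                self.items.append((title, urljoin(self.base_url, self._href)))
            self._href = None
        elif tag == "h2":
            self._in_h2 = False


def scrape_to_rss(html, site_url, site_title):
    """Convert an HTML page into a minimal RSS 2.0 document (as a string)."""
    parser = HeadlineParser(site_url)
    parser.feed(html)
    parser.close()
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Scraped from " + site_url
    for title, url in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")


def is_well_formed(xml_text):
    """Reader-side check: reject script output that does not parse as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def fetch_and_convert(url):
    """Download a page and return the RSS text a feed reader would consume."""
    with urllib.request.urlopen(url) as response:
        page = response.read().decode("utf-8", errors="replace")
    return scrape_to_rss(page, url, url)
```

A script file built this way would end by printing fetch_and_convert("http://www.example.com/") to standard output, and the reader would run is_well_formed (or its own feed parser) on that output before accepting it. Building the XML with ElementTree rather than string concatenation means headline text containing "&" or "<" is escaped automatically.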