Thread: [aKregator-devel] [Bug 85624] New: idea: "web scraping" support (non-rss news site support)


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

http://bugs.kde.org/show_bug.cgi?id=85624      
           Summary: idea: "web scraping" support (non-rss news site support)
           Product: akregator
           Version: unspecified
          Platform: unspecified
        OS/Version: Linux
            Status: UNCONFIRMED
          Severity: wishlist
          Priority: NOR
         Component: general
        AssignedTo: akregator-devel lists sourceforge net
        ReportedBy: phoenixreads rogers com

Version:           1.0-beta5 "Pierre" (using KDE 3.2.3, Gentoo)
Compiler:          gcc version 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)
OS:                Linux (i686) release 2.6.6-win4lin-r3

I would call this a future development idea.

FYI - "Web scraping is the practice of getting information from a web page and reformatting it."

The idea is to have scripts, ideally community-created, that convert a non-RSS site into an RSS-formatted file. I could easily see the scripts becoming standardized and shared freely. One naming method would be {site}-{date}
(e.g., www-cnn-com-20040721.py)
I like Python. :)
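
To illustrate the naming method, here is a minimal sketch in Python. The function name script_name is hypothetical, and it assumes that {site} is the host name with dots replaced by hyphens and that {date} is written as YYYYMMDD, which is what the example file name suggests.

```python
from datetime import date


def script_name(site: str, day: date) -> str:
    """Build a {site}-{date}.py file name (hypothetical helper).

    Assumes dots in the host name become hyphens and the date is
    formatted as YYYYMMDD, as in www-cnn-com-20040721.py.
    """
    host = site.lower().replace(".", "-")
    return f"{host}-{day:%Y%m%d}.py"


if __name__ == "__main__":
    print(script_name("www.cnn.com", date(2004, 7, 21)))
```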

The method would be simple: akregator would have a script associated with a feed. The script outputs a valid XML file, so instead of fetching the feed from the Internet, akregator gets it from the script. If the output is invalid, akregator would treat it just like an invalid source, so there are no security issues involved in using the scripts. Everything else about akregator remains the same.
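
As a rough sketch of what such a script might look like, the Python below turns a page's headline links into an RSS 2.0 document using only the standard library. Everything site-specific is an assumption made for illustration: the class name HeadlineParser, the functions page_to_rss and is_well_formed, and the idea that headlines are marked up as <a class="headline" href="...">. A real site would need its own parsing rules, and how akregator would actually invoke the script is not defined here.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class HeadlineParser(HTMLParser):
    """Collect (title, url) pairs from <a class="headline" href="..."> tags.

    The "headline" class is a made-up convention for this example.
    """

    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "headline" and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None


def page_to_rss(html, site_title, site_url):
    """Return an RSS 2.0 document (as a string) for the headlines in html."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Scraped headlines from " + site_url
    for title, url in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")


def is_well_formed(xml_text):
    """Check that the output at least parses as XML before it is used."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    sample = '<ul><li><a class="headline" href="http://example.com/a">First story</a></li></ul>'
    print(page_to_rss(sample, "Example News", "http://example.com/"))
```

The script only writes XML to standard output, so the reader could consume it the same way it consumes a downloaded feed. The is_well_formed check mirrors the point above: output that does not parse would simply be rejected like any other invalid source.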

Local URLs are supported, but unless you set up cron jobs there is no automation, and there is no central repository.