newsscrape is a web scraper that collects news headlines and analyses how the words in each headline relate to its news category.
- It extracts headlines from the Google News RSS feed.
- Each news headline is matched against its Google News category, such as Entertainment or Sports.
- A scheduler calls it at 5-minute intervals, and the collected data accumulates in a database.
- It contains R statistical computing scripts that learn which word patterns in a headline lead to a particular category.
- To test how accurately it predicts the category of a news headline, select a news title from another source (e.g. http://rss.news.yahoo.com/rss/entertainment) and pass it to the R script, which outputs the category it predicts for that title.
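
The learning and prediction steps described above can be illustrated with a short sketch. It is written in Python rather than R, and it assumes a naive Bayes word-count model with add-one smoothing; the project description does not say which method the R scripts actually use. The function names (`tokenize`, `train`, `predict`) and the sample rows are hypothetical stand-ins for the scripts and for the headlines accumulated in the database.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(headline):
    """Lowercase a headline and split it into word tokens."""
    return re.findall(r"[a-z']+", headline.lower())


def train(rows):
    """Count words per category from (headline, category) pairs."""
    word_counts = defaultdict(Counter)   # category -> word -> count
    category_counts = Counter()          # category -> number of headlines
    for headline, category in rows:
        category_counts[category] += 1
        word_counts[category].update(tokenize(headline))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, category_counts, vocabulary


def predict(model, headline):
    """Return the category with the highest naive Bayes log score."""
    word_counts, category_counts, vocabulary = model
    total = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n in category_counts.items():
        score = math.log(n / total)  # log prior of the category
        # Add-one smoothing so unseen words do not zero out the score.
        denom = sum(word_counts[category].values()) + len(vocabulary)
        for word in tokenize(headline):
            score += math.log((word_counts[category][word] + 1) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical rows standing in for the accumulated database contents.
SAMPLE_ROWS = [
    ("Actor wins award for new film", "Entertainment"),
    ("Singer announces world tour dates", "Entertainment"),
    ("Team wins championship final match", "Sports"),
    ("Striker scores twice in league match", "Sports"),
]

model = train(SAMPLE_ROWS)
print(predict(model, "Film award goes to young actor"))
```

With real data, the rows would come from the database filled by the scheduled scraper, and the headline passed to `predict` would be a title taken from another source, as in the accuracy test described above.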
Categories
Web Scrapers